
WO2008004559A1 - Clustering system, and defect kind judging device - Google Patents

Clustering system, and defect kind judging device

Info

Publication number
WO2008004559A1
WO2008004559A1 PCT/JP2007/063325 JP2007063325W
Authority
WO
WIPO (PCT)
Prior art keywords
cluster
feature
distance
feature quantity
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2007/063325
Other languages
French (fr)
Japanese (ja)
Inventor
Makoto Kurumisawa
Akio Suguro
Koji Ohnishi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AGC Inc
Original Assignee
Asahi Glass Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Asahi Glass Co Ltd filed Critical Asahi Glass Co Ltd
Priority to JP2008523694A priority Critical patent/JP5120254B2/en
Priority to CN200780025547.5A priority patent/CN101484910B/en
Publication of WO2008004559A1 publication Critical patent/WO2008004559A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis

Definitions

  • The present invention relates to a clustering system and a defect type determination device that cut out a partial image of a defect portion from an image of a detection target object, extract feature signals of the defect from the partial image, and classify the defect type.
  • Conventionally, clustering based on a distance between unknown data and learning data, for example the Mahalanobis (Mahalanobis generalized) distance, is generally performed. That is, unknown data is classified by determining whether or not it belongs to a cluster treated as a population learned in advance. For example, the determination of which population cluster the unknown data belongs to is made based on the Mahalanobis distance to each of multiple clusters (see, for example, Patent Document 1).
  • In another technique, each feature quantity used for classification is weighted so as to optimize the discrimination performed at classification time, and the cluster to which the data belongs is determined using the weighted feature quantities (see, for example, Patent Document 4).
  • Patent Document 1 Japanese Patent Laid-Open No. 2005-214682
  • Patent Document 2 Japanese Patent Laid-Open No. 2001-56861
  • Patent Document 3 Japanese Patent Application Laid-Open No. 07-105166
  • Patent Document 4 Japanese Patent Laid-Open No. 2002-99916
  • In Patent Document 4, the feature quantities are weighted based on the determination rate to improve discrimination accuracy, but there is no concept of optimizing the feature quantities for each cluster as described above. As in Patent Document 3, the feature quantities are not fully exploited, so there is a drawback that highly accurate classification is not achieved.
  • The present invention has been made in view of such circumstances, and its object is to provide a clustering system and a defect type determination device that can classify classification target data into the cluster to which it belongs faster and with higher accuracy than the conventional examples, for example classifying defects on a glass surface into clusters corresponding to defect types.
  • The present invention differs from the conventional example, in which the distance between the classification target data and each cluster is calculated with the same set of feature quantities to determine the classification destination. Instead, a set of feature quantities that brings out the differences between clusters is set for each cluster, and the distance is obtained with different feature quantities for each cluster, so classification is performed with higher accuracy. Since each feature quantity set is selected based on the characteristics of the learning data belonging to its cluster, it consists of feature quantities that distinguish that cluster from the other clusters.
  • the present invention employs the following configuration.
  • The clustering system of the present invention classifies input data into clusters, each formed from a population of learning data, according to the feature quantities (parameters) of the input data. It comprises: a feature quantity set storage unit that stores, in correspondence with each cluster, a feature quantity set, that is, a combination of feature quantities used for classification; a feature quantity extraction unit that extracts the preset feature quantities from the input data; a distance calculation unit that, for each feature quantity set corresponding to each cluster, calculates and outputs as a set distance the distance between the center of the cluster's population and the input data, based on the feature quantities included in that feature quantity set; and a rank extraction unit that arranges the set distances in ascending order.
  • In a preferred clustering system of the present invention, a plurality of the feature quantity sets are set for each cluster.
  • A preferred clustering system of the present invention classifies the input data according to classification criteria set for each cluster based on the rank of the set distances obtained for each feature quantity set.
  • The cluster classification unit detects which cluster the input data belongs to based on the rank of the set distances; the cluster having the most set distances in the higher ranks is detected as the cluster to which the input data belongs.
  • The cluster classification unit has a threshold for the number of higher-ranked set distances, and detects a cluster as the one to which the input data belongs only if its count of higher-ranked set distances is equal to or greater than the threshold.
  • The distance calculation unit multiplies each set distance by a correction coefficient set for the corresponding feature quantity set, thereby standardizing the set distances across the feature quantity sets.
  • A preferred clustering system of the present invention further includes a feature quantity set creation unit that creates the feature quantity set for each cluster. For each of a plurality of feature quantity combinations, the feature quantity set creation unit takes the average value of the learning data of the cluster's population as the origin, obtains the average distance between this origin and each learning datum of the populations of the other clusters, and selects the combination of feature quantities with the largest average distance as the feature quantity set used to distinguish that cluster from the other clusters.
  • The defect type determination device of the present invention is provided with any of the above-described clustering systems; the input data is image data of a product defect, and the defects are classified by defect type using feature quantities indicating the defect extracted from the image data.
  • the product is a glass article, and the defects of the glass article are classified by defect type.
  • The defect detection device of the present invention is provided with the above-described defect type determination device and detects the defect type of a product.
  • The manufacturing state determination device of the present invention is provided with the above-described defect type determination device, determines the type of a product defect, and detects the cause of the defect in the manufacturing process based on a correspondence between defect types and their occurrence factors.
  • A preferred manufacturing state determination device of the present invention is provided with any one of the above-described clustering systems; the input data is a set of feature quantities indicating manufacturing conditions in the manufacturing process of the product, and the data is classified by the manufacturing state of each step of the manufacturing process.
  • Preferably, the product is a glass article, and the feature quantities in the manufacturing process of the glass article are classified according to the manufacturing state of each step of the manufacturing process.
  • The manufacturing state detection device of the present invention is provided with the above-described manufacturing state determination device and detects the type of manufacturing state in each step of the manufacturing process of a product.
  • The product manufacturing management device of the present invention is provided with the above-described manufacturing state determination device, detects the type of manufacturing state in each step of the manufacturing process of a product, and performs process control in the manufacturing process based on control items corresponding to the detected type.
  • According to the present invention, an optimal combination of feature quantities, one that places each cluster far from the other clusters, is selected in advance from the plurality of feature quantities of the classification target data and stored for each cluster; the distance between the classification target data and each cluster is calculated, and the data is classified into the cluster with the smallest calculated distance, so the classification target data can be accurately classified into the corresponding cluster.
  • Furthermore, a plurality of such combinations are set for each cluster, the calculated distances between the classification target data and all the clusters are arranged in order, and the data is classified into the cluster appearing most often within a preset number of top ranks, so classification can be performed with higher accuracy than before.
  • FIG. 1 is a block diagram showing a configuration example of a clustering system according to first and second embodiments of the present invention.
  • FIG. 2 is a table explaining a process for selecting a feature quantity set based on the discrimination reference value η.
  • FIG. 3 is a table explaining a process for selecting a feature quantity set based on the discrimination reference value η.
  • FIG. 4 shows histograms illustrating the effect of the discrimination reference value η on feature quantity set selection.
  • FIG. 5 is a flowchart showing an operation example in a process of selecting a feature amount set for each cluster according to the first embodiment.
  • FIG. 6 is a flowchart showing an operation example in clustering processing for classification target data according to the first embodiment.
  • FIG. 7 is a flowchart showing an operation example for generating a rule pattern table used for clustering processing in the second embodiment.
  • FIG. 8 is a flowchart showing an operation example in clustering processing for classification target data according to the second embodiment.
  • FIG. 9 is a flowchart showing another operation example of the clustering process for the classification target data according to the second embodiment.
  • FIG. 10 is a flowchart showing an operation example in clustering processing for classification target data according to the third embodiment.
  • FIG. 12 is a flowchart showing an operation example of evaluation value calculation in the flowchart of FIG. 11.
  • FIG. 14 is a table showing learning data belonging to each cluster.
  • FIG. 16 is a conceptual diagram illustrating a method for calculating the overall corrected determination rate.
  • FIG. 17 is a result table showing the result of classifying the learning data of FIG. 14 by the clustering system in the first embodiment.
  • FIG. 18 is a result table showing the results of classifying the learning data of FIG. 14 by the clustering system in the second embodiment.
  • FIG. 19 is a result table showing the results of classifying the learning data of FIG. 14 by the clustering system in the second embodiment.
  • FIG. 20 is a block diagram showing a configuration example of an inspection apparatus using the clustering system of the present invention.
  • FIG. 21 is a flowchart showing an operation example of feature quantity set selection in the inspection apparatus of FIG.
  • FIG. 22 is a flowchart showing an operation example of clustering processing in the inspection apparatus of FIG.
  • FIG. 23 is a block diagram showing a configuration example of a defect type determination device using the clustering system of the present invention.
  • FIG. 24 is a block diagram showing a configuration example of a manufacturing management apparatus using the clustering system of the present invention.
  • FIG. 25 is a block diagram showing a configuration example of another manufacturing management apparatus using the clustering system of the present invention.
  • The clustering system of the present invention classifies the input data to be classified into clusters, each formed with learning data as its population, according to the feature quantities of the input data. A feature quantity set storage unit stores, in correspondence with each cluster, a feature quantity set that is a combination of feature quantities used for classification; a feature quantity extraction unit extracts the feature quantities from the input data based on the preset feature quantity sets; a distance calculation unit calculates, based on the feature quantities included in each feature quantity set, the distance between the cluster's population and the input data as a set distance; and a rank extraction unit arranges the set distances in ascending order and classifies the data into a cluster according to that order.
  • FIG. 1 is a block diagram showing a configuration example of the clustering system according to the embodiment.
  • the clustering system of this embodiment has a feature value set creation unit 1, a feature value extraction unit 2, a distance calculation unit 3, a feature value set storage unit 4, and a cluster database 5.
  • The feature quantity set storage unit 4 stores, in correspondence with the identification information of each cluster, a feature quantity set indicating a combination of feature quantities of the classification target data, set individually for each cluster. For example, if the classification target data is a set of feature quantities {a, b, c, d}, the feature quantity set for each cluster is set as a combination of feature quantity types such as [a, b], [a, b, c, d], or [c]. In the following description, any combination drawn from the set of feature quantities, all of them, two or more of them (in the above example, any two or three), or any single one, is referred to as a "combination of feature quantities".
  • The feature quantity set corresponding to each cluster is obtained, using learning data classified into each cluster in advance, as the combination of feature quantities for which that cluster has the largest distance from the other clusters, and is stored in the feature quantity set storage unit 4.
  • For example, the feature quantity set for cluster A is set as the combination of feature quantities that maximizes the distance between the vector of average values of each feature quantity of the learning data belonging to cluster A and the average vectors of each feature quantity of the learning data belonging to the other clusters B and C.
  • The classification target data and the learning data of the population in each cluster are composed of the same set of feature quantities.
  • Using the identification information of the cluster to be calculated as a key, the distance calculation unit 3 reads from the cluster database 5 the vector of average values of each feature quantity of that cluster's learning data and, for the feature quantity set of this cluster, calculates the distance between the feature quantity vector extracted from the classification target data and this average vector.
  • When calculating the distance, the distance calculation unit 3 eliminates differences in data units between the feature quantities and standardizes their numerical ranges; each feature quantity v(i) is normalized as follows.
  • V(i) = (v(i) − avg(i)) / std(i) … (1)
  • v(i) is the feature quantity,
  • avg(i) is the average value of the feature quantity in the learning data of the cluster to be calculated,
  • std(i) is the standard deviation of the feature quantity in the learning data of the cluster to be calculated, and
  • V(i) is the normalized feature quantity. Therefore, when calculating the distance, the distance calculation unit 3 performs this standardization of each feature quantity for each feature quantity set.
  • For each feature quantity of the classification target data used in the distance calculation, the distance calculation unit 3 performs this normalization using the average value and standard deviation of the corresponding feature quantity of the learning data.
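The standardization of equation (1) can be sketched as follows; this is an illustrative example (NumPy, with hypothetical data and variable names), not code from the patent:

```python
import numpy as np

def standardize(v, avg, std):
    """Per-cluster standardization of eq. (1): V(i) = (v(i) - avg(i)) / std(i)."""
    return (v - avg) / std

# Hypothetical per-cluster statistics from the learning data (rows: samples).
learning = np.array([[1.2, 30.0],
                     [0.9, 25.0],
                     [1.5, 40.0]])
avg = learning.mean(axis=0)          # avg(i) for each feature quantity
std = learning.std(axis=0, ddof=1)   # std(i) for each feature quantity

target = np.array([1.1, 33.0])       # classification target data
V = standardize(target, avg, std)    # normalized feature quantities V(i)
print(V)
```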
  • The Mahalanobis squared distance MHD is obtained by the following equation (2):
  • MHD = V^T · R^(−1) · V / n … (2)
  • Each element V(i) of the matrix V in equation (2) is the feature quantity obtained by normalizing the multidimensional feature quantity v(i) of the unknown data with the average value avg(i) and the standard deviation std(i) of that feature quantity in the learning data of the corresponding cluster, according to equation (1) above.
  • n is the degree of freedom; here it indicates the number of feature quantities in the feature quantity set (described later).
  • The Mahalanobis squared distance is the sum of the differences of the n transformed feature quantities; dividing by n makes the unit distance of the population average equal to 1.
  • V^T is the transposed matrix of the matrix V whose elements are the feature quantities V(i), and R^(−1) is the inverse of the correlation matrix R between the feature quantities in the learning data of the cluster.
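A minimal sketch of the distance of equation (2), assuming NumPy and illustrative data; the correlation matrix R is estimated from the cluster's learning data as described above:

```python
import numpy as np

def mahalanobis_sq(V, R_inv):
    """Eq. (2): Mahalanobis squared distance (V^T R^-1 V) / n."""
    n = len(V)                    # n = number of features in the feature set
    return float(V @ R_inv @ V) / n

# Hypothetical cluster learning data (rows: samples, cols: features).
learning = np.array([[1.2, 30.0], [0.9, 25.0], [1.5, 40.0], [1.0, 28.0]])
R = np.corrcoef(learning, rowvar=False)   # correlation matrix R
R_inv = np.linalg.inv(R)                  # R^-1

V = np.array([0.5, -0.3])   # feature quantities normalized by eq. (1)
print(mahalanobis_sq(V, R_inv))
```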
  • The feature quantity set creation unit 1 creates, for each cluster, the feature quantity set used when the distance calculation unit 3 calculates the distance between the classification target data and that cluster, and writes the result for each cluster into the feature quantity set storage unit 4 in correspondence with the cluster identification information.
  • For each cluster, the feature quantity set creation unit 1 calculates the discrimination reference value η by the following equation (3), based on the distance between the centroid vector (barycentric vector) of the learning data belonging to the target cluster and the centroid vector of the learning data belonging to all the other clusters. In the following, a combination of feature quantities is referred to as a feature quantity set.
  • η = ω_i · ω_o · (μ_i − μ_o)^2 / (ω_i · σ_i^2 + ω_o · σ_o^2) … (3)
  • μ_i is the centroid vector, the average value of the feature quantities in the feature quantity set, of the learning data belonging to the target cluster (the in-cluster population).
  • σ_i is the standard deviation of the vectors formed from the feature quantities of the learning data belonging to the in-cluster population.
  • ω_i is the ratio of the number of learning data belonging to the in-cluster population to the learning data belonging to all clusters.
  • μ_o is the centroid vector, the average value of the feature quantities in the feature quantity set, of the learning data belonging to clusters other than the target cluster (the out-of-cluster population).
  • σ_o is the standard deviation of the vectors formed from the feature quantities of the learning data belonging to the out-of-cluster population.
  • ω_o is the ratio of the number of learning data belonging to the out-of-cluster population to the learning data belonging to all clusters.
  • In equation (3), the feature quantity set creation unit 1 calculates and uses feature quantities normalized for each feature quantity according to equation (1). The ratios ω_i and ω_o may also be set in advance to numerical values that increase the separation.
  • Using equation (3), the feature quantity set creation unit 1 calculates, for each target cluster, the discrimination reference value against the other clusters for some or all combinations of the feature quantities constituting the learning data, lists the calculated values in descending order, and outputs the ordered list of discrimination reference values η.
  • The feature quantity set creation unit 1 then stores the combination of feature quantities corresponding to the largest discrimination reference value in the feature quantity set storage unit 4 as the feature quantity set of the target cluster, together with the discrimination reference value and the cluster identification information.
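As an illustrative sketch of equation (3) (NumPy; the function and variable names are assumptions, and treating the in-cluster and out-of-cluster spreads as scalar variances is a simplification):

```python
import numpy as np

def discrimination_criterion(in_cluster, out_cluster, features):
    """Eq. (3): eta = w_i*w_o*|mu_i - mu_o|^2 / (w_i*s_i^2 + w_o*s_o^2)."""
    xi = in_cluster[:, features]     # in-cluster learning data, selected features
    xo = out_cluster[:, features]    # out-of-cluster learning data
    n_i, n_o = len(xi), len(xo)
    w_i = n_i / (n_i + n_o)          # omega_i: in-cluster data ratio
    w_o = n_o / (n_i + n_o)          # omega_o: out-of-cluster data ratio
    gap = np.sum((xi.mean(axis=0) - xo.mean(axis=0)) ** 2)  # |mu_i - mu_o|^2
    return w_i * w_o * gap / (w_i * xi.var(ddof=1) + w_o * xo.var(ddof=1))

# Usage: score the combination of the 1st and 3rd feature quantities.
rng = np.random.default_rng(0)
inside = rng.normal(0.0, 1.0, size=(10, 4))    # standardized learning data
outside = rng.normal(2.0, 1.0, size=(20, 4))
print(discrimination_criterion(inside, outside, [0, 2]))
```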
  • The discrimination reference value η is computed by the feature quantity set creation unit 1 when the feature quantity set of each cluster is set. If there are four feature quantities a, b, c, and d, the discrimination reference value η is calculated for every combination: all four feature quantities, several of them, or any single one.
  • The feature quantity set creation unit 1 selects the combination with the highest numerical value, for example the combination of feature quantities b and c in FIG. 2(a).
  • As another method for determining the feature quantity set based on the reference value η, there is the BSS (backward stepwise selection) method shown in FIG. 2(b): the reference value η is first calculated using all n feature quantities included in the set of the classification target data; the discrimination reference value is then calculated for every combination of n − 1 feature quantities extracted from that set, and the combination with the maximum of these n − 1-feature discrimination reference values is selected.
  • The feature quantities are reduced one by one in this manner, each time selecting, from the reduced feature quantity set, the combination further reduced by one with the maximum discrimination reference value, and the feature quantity set creation unit 1 may be configured to select a combination that can discriminate with this reduced number of feature quantities.
  • There is also the FSS (forward stepwise selection) method: each of the n feature quantities included in the set of the classification target data is taken one at a time, the discrimination reference value η of each single feature quantity is calculated, and the feature quantity with the maximum value is selected. Next, combinations of two feature quantities, that feature quantity paired with each of the other feature quantities, are generated, the discrimination reference value of each combination is calculated, and the combination with the maximum value is selected. Then combinations of three feature quantities containing this combination are generated and their discrimination reference values are calculated.
  • In this way, starting from the immediately preceding combination with the maximum discrimination reference value, the number of feature quantities in the combination is increased by one at a time, adding a feature quantity not yet in the combination, and the discrimination reference value η of each enlarged combination is calculated; finally, from all the combinations for which the discrimination reference value was calculated, the combination with the maximum value is selected as the feature quantity set. The feature quantity set creation unit 1 may be configured in this way.
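A greedy FSS loop might look like the following sketch, reusing discrimination_criterion() from the previous example; the names and the stopping behavior (exhausting all features and keeping the best combination seen) are illustrative assumptions:

```python
import numpy as np

def forward_select(in_cluster, out_cluster, n_features):
    """Grow the combination one feature quantity at a time, keeping the
    combination with the largest criterion value seen over the whole run."""
    remaining = set(range(n_features))
    chosen = []
    best_combo, best_eta = [], -np.inf
    while remaining:
        # Score every one-feature extension of the current combination.
        eta, f = max((discrimination_criterion(in_cluster, out_cluster,
                                               chosen + [f]), f)
                     for f in remaining)
        chosen.append(f)
        remaining.remove(f)
        if eta > best_eta:
            best_eta, best_combo = eta, list(chosen)
    return best_combo, best_eta
```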
  • FIGS. 3 and 4 show the effectiveness of selecting feature quantity sets used for clustering based on the discrimination reference value.
  • FIG. 3 shows the combinations of feature quantities a and g, a and h, and d and e as candidate combinations for selecting feature quantity sets, and illustrates, with cluster 1 versus clusters 2 and 3, the selection of a feature quantity set with better classification characteristics than the conventional example.
  • In FIG. 3, μ1 corresponds to μ_i above, μ2 to μ_o, σ1 to σ_i, σ2 to σ_o, ω1 to ω_i, and ω2 to ω_o.
  • Among these, the combination of feature quantities a and h has the largest discrimination reference value η.
  • This combination is taken as the combination separating cluster 1 from the other clusters.
  • The separation in the classification results between cluster 1 and the other clusters (clusters 2 and 3) can be confirmed in FIG. 4.
  • In FIG. 4, the horizontal axis is the logarithm of the Mahalanobis distance calculated using each combination of feature quantities.
  • The vertical axis represents the number of data (histogram count) having the corresponding value.
  • A value of 1.4 on the horizontal axis means that the logarithm of the Mahalanobis distance is 1.2 or more and less than 1.4 (the bin to the left of 1.4).
  • "1.4~" indicates 1.4 or more.
  • The Mahalanobis distances in Fig. 4 are calculated for the classification target data belonging to cluster 1 and to the other clusters, using the feature quantity set corresponding to cluster 1.
  • Fig. 4 (a) shows an example of calculating the Mahalanobis distance using a combination of features a and g.
  • Fig. 4 (b) shows an example of calculating the Mahalanobis distance using a combination of features a and h.
  • Fig. 4 (c) shows an example of calculating the Mahalanobis distance using the combination of feature quantities d and e. Looking at the histograms in Fig. 4, it can be seen that when the discrimination reference value is large, cluster 1 is well separated from the other clusters.
  • FIG. 5 is a flowchart showing an operation example of the feature quantity set creation unit 1 of the clustering system according to the first embodiment
  • FIG. 6 is a flowchart showing an operation example of the clustering of the classification target data.
  • The classification target data is a set of feature quantities of scratches on a glass article.
  • These feature quantities, such as "a: length of the scratch", "b: area of the scratch", "c: width of the scratch", "d: transmittance of a predetermined region including the scratch", and "e: reflectance of a predetermined region including the scratch", are obtained from image processing and measurement results. The set of feature quantities (hereinafter referred to as the feature quantity set) is therefore {a, b, c, d, e}.
  • the distance used for clustering is calculated as the Mahalanobis distance using the standardized feature value.
  • examples of the glass article in the present embodiment include plate glass and a glass substrate for display.
  • A. Feature value set creation processing (corresponding to the flowchart in FIG. 5)
  • The user detects scratches on the glass, captures images of them to obtain image data, and performs image processing to extract feature quantities, for example measuring the length of the scratched part, thereby collecting feature quantity data consisting of the set of feature quantities. The user then sorts this feature quantity data as learning data into the clusters to be classified, based on information known in advance such as the cause and shape of each scratch, and stores the population of learning data of each cluster in the cluster database 5 from a processing terminal (not shown) in correspondence with the cluster identification information (step S1).
  • The feature quantity set creation unit 1 reads the population of learning data from the cluster database 5 in correspondence with the identification information of each cluster.
  • For each cluster, the feature quantity set creation unit 1 calculates the average value and standard deviation of each feature quantity in the cluster population, and calculates the standardized feature quantity of each learning datum from equation (1) using that average value and standard deviation.
  • The feature quantity set creation unit 1 calculates the discrimination reference value according to equation (3) for every feature quantity set, that is, for every combination of the feature quantities included in the set.
  • For each cluster, the feature quantity set creation unit 1 uses the standardized feature quantities of the in-cluster population to calculate, for each feature quantity set, the average vector (centroid vector) μ_i and the standard deviation σ_i of the learning-data vectors corresponding to that feature quantity set, and uses the standardized feature quantities of the out-of-cluster population to calculate the centroid vector μ_o and the standard deviation σ_o of the learning-data vectors corresponding to that feature quantity set, together with the ratio ω_i of the number of in-cluster learning data to the total number of learning data and the ratio ω_o of the number of out-of-cluster learning data to the total.
  • Using these centroid vectors μ_i and μ_o, standard deviations σ_i and σ_o, and ratios ω_i and ω_o, the feature quantity set creation unit 1 calculates, according to equation (3), the discrimination reference value for discriminating each cluster from the other clusters, for every combination of feature quantities in each cluster's feature quantity sets.
  • The feature quantity set creation unit 1 lists the discrimination reference values in descending order for each cluster, selects the feature quantity set corresponding to the largest value, and takes it as the feature quantity set, the combination of feature quantities, used for calculating the distance that determines membership of each cluster (step S2).
  • For use in the distance calculation by the distance calculation unit 3, the feature quantity set creation unit 1 calculates, for each feature quantity set, the correlation matrix R between the feature quantities in the in-cluster population, and the average value avg(i) and standard deviation std(i) of the feature quantities of the learning data in that population (step S3).
  • The feature quantity set creation unit 1 calculates a correction coefficient η^(−1/2) from the discrimination reference value η.
  • This correction coefficient standardizes the distances between the feature quantity sets: because the distance to the other clusters varies from cluster to cluster, the distances computed with the different feature quantity sets must be standardized to increase classification accuracy.
  • The feature quantity set creation unit 1 stores, as distance calculation data in the feature quantity set storage unit 4, the feature quantity set, the correction coefficient corresponding to the feature quantity set (in this embodiment η^(−1/2)), the inverse matrix R^(−1), the average values avg(i), and the standard deviations std(i), in association with the identification information of each cluster (step S4).
  • the feature quantity extraction unit 2 reads a feature quantity set corresponding to each cluster from the feature quantity set storage unit 4 based on the identification signal of each cluster.
  • The feature quantity extraction unit 2 extracts, for each cluster, the feature quantities of the types in the read feature quantity set from the classification target data, and stores the extracted feature quantities in the internal storage unit in association with each cluster's identification information (step S11).
  • Next, for each feature quantity extracted from the classification target data, the distance calculation unit 3 reads the corresponding average value avg(i) and standard deviation std(i) from the feature quantity set storage unit 4, standardizes the feature quantity by performing the calculation of equation (1), and replaces the feature quantity stored in the internal storage unit with the standardized one.
  • The distance calculation unit 3 generates the matrix V whose elements are the values V(i) obtained as described above, calculates its transposed matrix V^T, and sequentially calculates the Mahalanobis distance between the classification target data and each cluster according to equation (2), storing the results in the internal storage unit in correspondence with the identification information of each cluster (step S12).
  • The distance calculation unit 3 multiplies each calculated Mahalanobis distance by the correction coefficient η^(−1/2) corresponding to the feature quantity set to obtain a corrected distance, and replaces each Mahalanobis distance with it (step S13). When applying the correction coefficient, it may also be multiplied after taking the logarithm or square root of the Mahalanobis distance.
  • The distance calculation unit 3 compares the corrected distances of the clusters in the internal storage unit (step S14), detects the minimum corrected distance, and determines the cluster whose identification information corresponds to that distance as the classification destination; the classified data is stored in the cluster database 5 in correspondence with the identification information of that cluster (step S15).
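Putting steps S11 to S15 together, a classification pass of the first embodiment might be sketched as follows (NumPy; the dictionary layout and names are assumptions, not the patent's data structures):

```python
import numpy as np

def classify(target, clusters):
    """Assign `target` to the cluster with the smallest corrected distance.

    `clusters` maps a cluster id to its per-feature-set data:
      features: column indices of the cluster's feature quantity set
      avg, std: per-feature statistics of the cluster's learning data
      R_inv:    inverse correlation matrix of those features
      coef:     correction coefficient, e.g. eta ** -0.5
    """
    best_id, best_dist = None, np.inf
    for cid, c in clusters.items():
        v = target[c["features"]]
        V = (v - c["avg"]) / c["std"]               # eq. (1): standardize
        mhd = float(V @ c["R_inv"] @ V) / len(V)    # eq. (2): Mahalanobis^2 / n
        corrected = c["coef"] * mhd                 # step S13: corrected distance
        if corrected < best_dist:                   # step S14: minimum distance
            best_id, best_dist = cid, corrected
    return best_id, best_dist
```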
  • In the first embodiment, the feature quantity set used for clustering is described as one per cluster. However, as in the second embodiment described below, a plurality of feature quantity sets may be set for each cluster: the Mahalanobis distance corresponding to each feature quantity set is calculated, the corrected distances are computed and rearranged in ascending order, and the cluster to which the classification target data belongs is determined from the corrected distances within the top predetermined ranks, according to rules set in advance.
  • In the second embodiment, the distance calculation unit 3 detects which cluster the classification target data belongs to based on rule patterns, classification criteria set for each cluster from the ranking of the distances between the classification target data and each cluster obtained for each feature quantity set.
  • FIG. 7 is a flowchart showing an example of the pattern-learning operation over the order of distances for setting the rule patterns. FIGS. 8 and 9 are flowcharts showing examples of the clustering operation according to the second embodiment.
  • In the first embodiment, the feature quantity set creation unit 1 calculated, for each cluster, the discrimination reference value for a plurality of candidate feature quantity sets as combinations of feature quantities, and set the feature quantity set corresponding to the maximum of the obtained values as that cluster's feature quantity set.
  • In the second embodiment, the feature quantity set creation unit 1 instead takes, for each cluster, one or more pairings with the other clusters, and by selecting the feature quantity set with the maximum value for each pairing, obtains a plurality of discrimination reference values and sets a plurality of feature quantity sets for separating each cluster from the others.
  • The feature quantity set creation unit 1 obtains the distance calculation data for each feature quantity set, and stores the plurality of feature quantity sets, together with the distance calculation data of each, in the feature quantity set storage unit 4 in association with the cluster identification information.
  • Next, the feature quantity extraction unit 2 reads out the plurality of feature quantity sets corresponding to each cluster from the feature quantity set storage unit 4 according to the identification signal of each cluster.
  • The feature quantity extraction unit 2 extracts, for each cluster, the feature quantities of the types in each read feature quantity set from the learning data, and stores the extracted feature quantities in the internal storage unit for each feature quantity set in association with the identification information of each cluster (step S21).
  • For each feature quantity set, the distance calculation unit 3 reads from the feature quantity set storage unit 4 the average value avg(i) and standard deviation std(i) corresponding to each feature quantity extracted from the learning data, standardizes the feature quantities by performing the calculation of equation (1), and replaces the feature quantities stored in the internal storage unit with the standardized ones.
  • The distance calculation unit 3 generates the matrix V whose elements are the values V(i) obtained as described above, calculates its transposed matrix V^T, and sequentially calculates the Mahalanobis distance between the learning data and each cluster according to equation (2), storing the results in the internal storage unit for each feature quantity set in correspondence with the identification information of each cluster (step S22).
  • The distance calculation unit 3 multiplies each calculated Mahalanobis distance by the correction coefficient η^(−1/2) corresponding to the feature quantity set to obtain a corrected distance, and replaces each Mahalanobis distance with it (step S23).
  • The distance calculation unit 3 sorts the corrected distances of the clusters in the internal storage unit in ascending order (smaller corrected distances rank higher), that is, it arranges the cluster identification information so that clusters with smaller corrected distances to the data come first (step S24).
  • The distance calculation unit 3 detects the cluster identification information corresponding to each of the corrected distances from the smallest (highest rank) to the n-th, and counts the number of occurrences of each cluster's identification information among those n, that is, votes for each cluster.
  • The distance calculation unit 3 then detects rule patterns, count patterns of each cluster's identification information that are common to the learning data belonging to the same cluster. For example, with n set to 10, if the learning data of cluster B consistently yields a count pattern of 5 for cluster A, 3 for cluster B, and 2 for cluster C, this pattern is recorded as rule R1.
  • Also, for example, when cluster A occupies the first and second ranks from the top, the data is assigned to cluster A regardless of the counts of the other clusters, even if the count for cluster B is 8; this is recorded as rule R3.
  • In this way, the regularity of the cluster counts observed for the learning data classified into the same cluster is detected and stored internally as a rule pattern table for each cluster's identification information.
  • one rule may be set for each cluster, or a plurality of rules may be set.
  • Although the distance calculation unit 3 extracts the rule patterns here, rule patterns based on the count numbers or on the rank order may also be set arbitrarily.
  • Some clusters have characteristics similar to those of other clusters, and in such cases it can be more accurate to classify the classification target data from the relationship among multiple clusters, that is, from the count pattern of each cluster or from the pattern of ranks from the top; this embodiment addresses that point.
  • When the classification target data is input, the feature quantity extraction unit 2 reads the plurality of feature quantity sets corresponding to each cluster from the feature quantity set storage unit 4 according to the identification signal of each cluster. Then, for each cluster, the feature quantity extraction unit 2 extracts from the classification target data the feature quantities of the types in each read feature quantity set, and stores the extracted feature quantities in the internal storage unit for each feature quantity set in association with each cluster's identification information (step S31).
  • For each feature quantity set, the distance calculation unit 3 reads from the feature quantity set storage unit 4 the average value avg(i) and standard deviation std(i) corresponding to each feature quantity extracted from the classification target data, standardizes the feature quantities by performing the calculation of equation (1), and replaces the feature quantities stored in the internal storage unit with the standardized ones.
  • The distance calculation unit 3 generates the matrix V whose elements are the values V(i) obtained as described above, and calculates the Mahalanobis distance to each cluster according to equation (2) (step S32).
  • The distance calculation unit 3 multiplies each calculated Mahalanobis distance by the correction coefficient η^(−1/2) corresponding to the feature quantity set to obtain a corrected distance, and replaces each Mahalanobis distance with it (step S33).
  • The distance calculation unit 3 rearranges the corrected distances of the clusters in the internal storage unit in ascending order, that is, arranges the cluster identification information so that clusters with smaller corrected distances to the classification target data rank higher (step S34).
  • The distance calculation unit 3 detects the cluster identification information corresponding to the corrected distances from the smallest (highest rank) to the n-th, and counts the number of each cluster's identification information among those n, that is, votes for each cluster.
  • The distance calculation unit 3 then checks whether the count pattern (or arrangement pattern) of each cluster within the top n for the classification target data exists in the internally stored rule pattern table (step S35).
  • If, as a result of this collation, the distance calculation unit 3 finds a rule pattern in the table that matches the target pattern of the classification target data, it determines that the classification target data belongs to the cluster whose identification information corresponds to the matched rule, and classifies the data into that cluster (step S36).
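A sketch of the top-n voting and rule-pattern lookup of steps S35 and S36 (pure Python; the rule-table encoding is an illustrative assumption):

```python
from collections import Counter

def top_n_votes(ranked_cluster_ids, n=10):
    """Count how often each cluster id appears in the top n ranks
    (ranked_cluster_ids is sorted by ascending corrected distance)."""
    return Counter(ranked_cluster_ids[:n])

# Hypothetical rule table learned from the training data (FIG. 7):
# a count pattern maps to the cluster the data actually belongs to.
rule_table = {
    (("A", 5), ("B", 3), ("C", 2)): "B",   # e.g. rule R1 from the text
}

def classify_by_rules(ranked_cluster_ids, n=10):
    """Match the observed count pattern against the rule table;
    return None when no rule matches (then fall back to plain voting)."""
    votes = top_n_votes(ranked_cluster_ids, n)
    pattern = tuple(sorted(votes.items()))
    return rule_table.get(pattern)

# Usage: ranked ids for one classification target datum (5 A, 3 B, 2 C).
ranked = ["A", "A", "B", "A", "C", "B", "A", "B", "C", "A"]
print(classify_by_rules(ranked))   # -> "B", matching rule R1
```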
  • In FIG. 9, the processing from step S31 to step S35 is the same as the processing shown in FIG. 8; in step S35, the distance calculation unit 3 performs the collation of the stored rule patterns against the target pattern of the classification target data, as already described.
  • The distance calculation unit 3 detects whether or not a rule pattern matching the target pattern is found in the collation result. If a matching rule pattern is found, the process proceeds to step S47; if no matching rule pattern is found, the process proceeds to step S48 (step S46).
  • In step S47, the distance calculation unit 3 determines that the classification target data belongs to the cluster whose identification information corresponds to the matching rule, classifies the data accordingly, and stores it in the cluster database 5 in correspondence with the identification information of the destination cluster.
  • In step S48, on the other hand, the distance calculation unit 3 detects the identification information with the largest count, that is, the largest number of votes, classifies the classification target data into the corresponding cluster, and stores the classified data in the cluster database 5 in association with the identification information of that cluster.
  • The second embodiment described above prepares a table of rule patterns over the top n smallest (most similar) calculated distances between the classification target data and each cluster.
  • Alternatively, as in the third embodiment described below, a plurality of feature quantity sets may be set for each cluster, the Mahalanobis distance corresponding to each feature quantity set calculated, the corrected distances computed, and the cluster with the most corrected distances within the top predetermined ranks selected as the cluster to which the classification target data belongs.
  • In this case, the processing of step S48 in FIG. 9 is performed directly, without the process of setting rules from the learning data.
  • FIG. 10 is a flowchart showing an example of clustering operation in the third embodiment.
  • In FIG. 10, the processing from step S31 to step S34 is the same as the processing shown in FIG. 8; as described above, in step S34 the distance calculation unit 3 rearranges the corrected distances of the clusters in the internal storage unit in ascending order, that is, arranges the cluster identification information in ascending order of the corrected distance to the classification target data (step S34).
  • The distance calculation unit 3 detects the cluster identification information corresponding to each of the corrected distances from the smallest (highest rank) to the n-th, and counts the number of each cluster's identification information among those n; in other words, a voting process is performed for each cluster (step S55).
  • The distance calculation unit 3 detects the identification information with the largest count (number of votes) in the voting result, designates the corresponding cluster as the cluster to which the classification target data belongs, and stores the classified data in the cluster database 5 in correspondence with the identification information of that cluster (step S56).
  • For example, if the number of votes for the identification information of cluster A is five, that for cluster B is three, and that for cluster C is two, the distance calculation unit 3 detects the identification information of cluster A, which has the most votes.
  • A threshold may also be set for the number of votes: if the number of votes for the identification information of cluster A is below the threshold, the distance calculation unit 3 judges that the data does not belong to any cluster.
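A sketch of the third embodiment's voting with a rejection threshold (steps S55 and S56); the threshold value and names are illustrative assumptions:

```python
from collections import Counter

def classify_by_vote(ranked_cluster_ids, n=10, threshold=4):
    """Vote over the top n ranks and take the most-voted cluster;
    reject (return None) if the winner's vote count is below threshold."""
    votes = Counter(ranked_cluster_ids[:n])
    winner, count = votes.most_common(1)[0]
    return winner if count >= threshold else None   # None: no cluster

# Usage with the example from the text: A=5, B=3, C=2 votes.
ranked = ["A", "B", "A", "C", "A", "B", "A", "C", "B", "A"]
print(classify_by_vote(ranked))   # -> "A"
```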
  • Clustering is performed with the expectation that the population of each feature is a normal distribution, but depending on the type of feature (area, length, etc.), the distribution may not be a normal distribution and the population may be biased. In some cases, the calculation of the distance between the classification target data and each cluster, that is, the accuracy in determining the similarity between the classification target data and each cluster may be reduced. Therefore, depending on the feature value, it is necessary to convert the feature value of the population by a predetermined method and to improve the accuracy of similarity determination by bringing it close to the normal distribution.
  • For this purpose, the feature quantity is converted by an arithmetic expression such as a logarithm, an n-th root such as the square root or cube root, a factorial, or an expression including a function obtained by numerical calculation, to bring the distribution close to a normal distribution.
  • FIG. 11 is a flowchart showing an operation example of the setting process of the conversion method of each feature quantity.
  • This conversion method is set for each cluster in units of feature values included in the cluster.
  • This conversion method is set using learning data belonging to each cluster.
  • The following processing is described as being performed by the feature quantity set creation unit 1, although a separate processing unit corresponding to this processing may also be provided.
  • the feature value set creation unit 1 uses the identification information of the cluster to be classified as a key, reads the learning data included in this cluster from the cluster database 5, and calculates (normalizes) the feature value of each learning data (step S61). ).
  • The feature quantity set creation unit 1 performs feature quantity conversion by applying, to each of the read learning data, one of the internally stored arithmetic expressions for feature quantity conversion (step S62).
  • The feature quantity set creation unit 1 calculates an evaluation value indicating whether the distribution obtained by the conversion process is close to the normal distribution (step S63).
  • The feature quantity set creation unit 1 checks whether the evaluation value has been calculated for all the internally stored arithmetic expressions, that is, all those preset as conversion methods. If the evaluation values of the distributions obtained by converting the feature quantities with all the arithmetic expressions have been calculated, the process proceeds to step S65; otherwise, the process returns to step S62 to process the next arithmetic expression (step S64).
  • Among the distributions obtained with the set arithmetic expressions, the feature quantity set creation unit 1 detects the one with the smallest evaluation value, that is, the distribution closest to the normal distribution, determines the arithmetic expression that produced it as the conversion method, and sets it internally as the conversion method for that feature quantity of the cluster (step S65).
  • the feature quantity set creation unit 1 performs the above-described processing for each feature quantity of each cluster, and sets a conversion method corresponding to each feature quantity in each cluster.
  • Next, the calculation of the evaluation value in step S63 will be described with reference to FIG. 12, a flowchart explaining an example of the processing for obtaining the evaluation value of the distribution produced by an arithmetic expression.
  • The feature quantity set creation unit 1 converts the feature quantity of each learning datum belonging to the target cluster by the set arithmetic expression (step S71).
  • The feature quantity set creation unit 1 calculates the average value μ and standard deviation σ of the distribution (population) obtained from the converted feature quantities (step S72).
  • Using this average value μ and standard deviation σ of the population, the feature quantity set creation unit 1 calculates the z value (1) as (x − μ)/σ (step S73).
  • The feature quantity set creation unit 1 calculates the cumulative probability of each value in the population (step S74).
  • Based on the calculated cumulative probability in the population, the feature quantity set creation unit 1 calculates the z value (2) as the value of the inverse of the cumulative distribution function of the standard normal distribution (step S75).
  • The feature quantity set creation unit 1 calculates the difference between the z value (1) and the z value (2), that is, the error between the two z values, over the feature quantity distribution (step S76).
  • The feature quantity set creation unit 1 calculates the sum of the squared errors between the two z values as the evaluation value (step S77).
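Steps S71 to S77 amount to comparing the sample's z-scores with standard-normal quantiles (a Q-Q-style squared-error sum). A sketch, assuming SciPy and the common (r − 0.5)/n plotting positions for the cumulative probability:

```python
import numpy as np
from scipy.stats import norm

def normality_evaluation(values):
    """Sum of squared differences between the sample z-scores (z value (1))
    and the standard-normal quantiles of the empirical cumulative
    probabilities (z value (2)); smaller = closer to a normal distribution."""
    x = np.sort(np.asarray(values, dtype=float))
    mu, sigma = x.mean(), x.std(ddof=1)
    z1 = (x - mu) / sigma                      # z value (1), step S73
    ranks = np.arange(1, len(x) + 1)
    cum_prob = (ranks - 0.5) / len(x)          # cumulative probability, step S74
    z2 = norm.ppf(cum_prob)                    # z value (2), step S75
    return float(np.sum((z1 - z2) ** 2))       # squared-error sum, steps S76-S77

# Compare candidate conversions and keep the one with the smaller value.
data = np.random.lognormal(mean=0.0, sigma=0.6, size=200)
print(normality_evaluation(data), normality_evaluation(np.log(data)))
```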
  • FIG. 13 is a flowchart showing an operation example of calculating feature amount data of classification target data.
  • The distance calculation unit 3 extracts the feature quantities to be identified from the input classification target data in correspondence with the feature quantity set of each cluster, and performs the normalization process already described (step S81).
  • In the classification target data, the distance calculation unit 3 converts the feature quantities used for classification into the target cluster according to the conversion method (arithmetic expression) set for the feature quantities of that cluster (step S82).
  • The distance calculation unit 3 calculates the distance to the cluster to be classified (step S83).
  • The distance calculation unit 3 checks whether the feature quantities have been converted, by the conversion method set for each cluster's feature quantities, and the distances calculated for all the clusters to be classified. If so, the process proceeds to step S85; if clusters to be classified remain, the process returns to step S82 (step S84). Then, in each of the first to third embodiments, the processing that follows completion of the distance calculation is started (step S85).
  • The Mahalanobis distance used in this embodiment assumes, when calculating the distance between the target data and each cluster, that each feature quantity follows a normal distribution. Therefore, the closer the distribution of each feature quantity of the population is to a normal distribution, the more accurate the distance (similarity) to each cluster becomes, and the better the classification accuracy for each cluster can be expected to be.
• Fig. 15 shows the determination results obtained by the conventional calculation method, using feature quantities a and g as the combination of feature quantities and calculating the Mahalanobis distance for each piece of learning data shown in Fig. 14, for cluster 1 to cluster 3.
• The cluster 1 column is the Mahalanobis distance to cluster 1, the cluster 2 column is the Mahalanobis distance to cluster 2, and the cluster 3 column is the Mahalanobis distance to cluster 3.
• The category column indicates the cluster to which each piece of learning data actually belongs, and the determination result indicates the cluster with the smallest Mahalanobis distance from the learning data.
• The learning data whose numbers in the category and the determination result match are the data classified correctly.
• The column number indicates the cluster to which the learning data actually belongs, and the row number indicates the determined cluster.
• The "8" at mark R1 indicates that 8 of the 10 learning data in cluster 1 were determined to be cluster 1, and the "2" at mark R2 indicates that 2 of the 10 learning data in cluster 1 were determined to be cluster 3.
• p0 indicates the match rate between the correct answers and the determined answers, and p1 indicates the probability that the two coincide by chance.
• κ is the overall correction determination rate, which can be obtained by the following formula; the higher κ is, the higher the classification accuracy:
• κ = (p0 − p1) / (1 − p1)
• a is the number of data belonging to cluster 1 that are classified as cluster 1, and b is the number of data belonging to cluster 1 that are classified as cluster 2, so a + b indicates the number of data belonging to cluster 1.
• d is the number of data belonging to cluster 2 that are classified as cluster 2, and c is the number of data belonging to cluster 2 that are classified as cluster 1, so c + d indicates the number of data belonging to cluster 2.
• a + c is the number classified as cluster 1 among all the data a + b + c + d, and b + d is the number classified as cluster 2 among all the data a + b + c + d. A computational sketch of p0, p1, and κ from these counts is given below.
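As a concrete illustration, the following is a minimal sketch of how p0, p1, and the overall correction determination rate (the formula above coincides with Cohen's kappa) can be computed from a confusion matrix; the counts a, b, c, and d in the example are hypothetical, not values from the figures.

```python
import numpy as np

def correction_determination_rate(confusion):
    """p0, p1, and kappa from a confusion matrix whose rows are the
    true clusters and whose columns are the determined clusters."""
    m = np.asarray(confusion, dtype=float)
    total = m.sum()
    p0 = np.trace(m) / total                                       # match rate of correct answers
    p1 = float((m.sum(axis=1) / total) @ (m.sum(axis=0) / total))  # chance agreement
    return p0, p1, (p0 - p1) / (1 - p1)

# hypothetical two-cluster counts a, b, c, d
a, b, c, d = 8, 2, 1, 9
print(correction_determination_rate([[a, b], [c, d]]))  # -> (0.85, 0.5, 0.7)
```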
• Fig. 17 shows the determination results obtained by calculating the Mahalanobis distance for each piece of learning data shown in Fig. 14, for cluster 1 to cluster 3, using the calculation method of the first embodiment.
• The way of viewing Figs. 17(a) and (b) is the same as for Fig. 15, and their description is therefore omitted.
• The correct answer rate p0, the coincidence probability p1, and the overall correction determination rate κ are equivalent to those of the conventional calculation method in Fig. 15.
• Here, the feature quantity set corresponding to each cluster was calculated by the method of selecting, from all the combinations described above, the combination having the maximum discriminant reference value λ for each cluster.
• The feature quantity set corresponding to cluster 1 is the combination of feature quantities a and h, that corresponding to cluster 2 is the combination of feature quantities a and d, and that corresponding to cluster 3 is the combination of feature quantities a and g.
• Fig. 18 shows the determination results obtained by calculating the Mahalanobis distance for each piece of learning data shown in Fig. 14, for cluster 1 to cluster 3, using the calculation method of the second embodiment.
• The way of viewing Figs. 18(a) and 18(b) is the same as for Fig. 15, and their description is omitted.
• The correct answer rate p0 is 0.8333, the coincidence probability p1 is 0.3333, and the overall correction determination rate κ is 0.75, so the classification accuracy is improved compared with the conventional calculation method of Fig. 15.
• Here, the feature quantity sets corresponding to each cluster were calculated by the method of selecting, from all the combinations described above, the combinations having the first to third highest discriminant reference values λ for each cluster.
• Three combinations of feature quantities a·h, a·g, and d·e were used as the feature quantity set corresponding to cluster 1; three combinations of feature quantities a·f, a·d, and a·b as the feature quantity set corresponding to cluster 2; and three combinations of feature quantities e·g, a·c, and a·g as the feature quantity set corresponding to cluster 3.
• Fig. 19 shows the determination results obtained by using the calculation method of the second embodiment: the Mahalanobis distance is calculated for each piece of learning data shown in Fig. 14, for cluster 1 to cluster 3, the calculated Mahalanobis distance is further multiplied by the correction coefficient (λ)^(-1/2), and the distances are then ranked.
• The way of viewing Figs. 19(a) and 19(b) is the same as for Fig. 15.
• The correct answer rate p0 is 0.8333, the coincidence probability p1 is 0.3333, and the overall correction determination rate κ is 0.75, so the classification accuracy is improved compared with the conventional calculation method of Fig. 15.
• Here, the feature quantity sets corresponding to each cluster were calculated by the method of selecting, from all the combinations described above, the combinations having the first to third highest discriminant reference values λ for each cluster.
• Three combinations of feature quantities a·h, a·g, and d·e were used as the feature quantity set corresponding to cluster 1; three combinations of feature quantities a·f, a·d, and a·b as the feature quantity set corresponding to cluster 2; and three combinations of feature quantities e·g, a·c, and a·g as the feature quantity set corresponding to cluster 3. The Mahalanobis distances were then used for voting: the number of times each cluster appears among the first to third smallest distances is counted, and the cluster with the largest count is taken as the cluster to which the classification target data belongs (a sketch of this voting step follows below).
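A minimal sketch of the voting step, including the optional correction by (λ)^(-1/2) mentioned for Fig. 19; the cluster identifiers and distances in the example are hypothetical placeholders.

```python
from collections import Counter

def vote_cluster(set_distances, lambdas=None, top=3):
    """set_distances: (cluster_id, distance) pairs, one per feature
    quantity set. If `lambdas` gives the discriminant reference value
    of each pair's feature set, each distance is first multiplied by
    lambda ** -0.5 to standardize distances between feature sets."""
    if lambdas is not None:
        set_distances = [(cid, d * lam ** -0.5)
                         for (cid, d), lam in zip(set_distances, lambdas)]
    ranked = sorted(set_distances, key=lambda cd: cd[1])[:top]
    return Counter(cid for cid, _ in ranked).most_common(1)[0][0]

# hypothetical distances: three clusters x three feature sets each
dists = [("c1", 0.8), ("c1", 1.1), ("c1", 2.0),
         ("c2", 0.9), ("c2", 3.0), ("c2", 3.5),
         ("c3", 1.5), ("c3", 2.5), ("c3", 4.0)]
print(vote_cluster(dists))  # -> "c1" (two of the three smallest distances)
```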
• Fig. 21 is a flowchart explaining an operation example of selecting a feature quantity set, and Fig. 22 is a flowchart explaining an operation example of the clustering process.
• Step S1 in the flowchart of Fig. 5 corresponds to steps S101 to S105 in the flowchart of Fig. 21, and steps S2 to S4 in Fig. 21 are the same as those in the flowchart of Fig. 5.
• A flaw to be collected as learning data is illuminated by the illumination device 102 of the image acquisition unit 101, and image data of the flaw portion is acquired by the imaging device 103 (step S102). Then, the flaw feature quantities of each piece of learning data are calculated from the image data acquired by the image acquisition unit 101 (step S103).
• Next, the feature quantities of the obtained learning data are assigned to classification destinations determined by visual inspection, and the learning data of each cluster are specified (step S104).
• The processing from step S101 to step S102 is repeated until the learning data of each cluster reaches a predetermined number (a preset number of samples), for example, about 300 pieces each.
• Thereafter, the clustering unit 105 performs the processing from step S2 onward.
• The clustering unit 105 is the clustering system of the first or second embodiment.
• Steps S31 to S34, S55, and S56 in Fig. 22 are the same as those in the flowcharts described earlier, and their description is omitted.
• The illumination device 102 illuminates the glass substrate that is the object to be inspected 100, and the imaging device 103 captures the surface of the glass substrate and outputs the captured image to the image acquisition unit 101.
• When the defect candidate detection unit 104 detects a portion differing from the flat surface shape in the captured image input from the image acquisition unit 101, that portion is set as a defect candidate to be classified (step S201).
• The defect candidate detection unit 104 then cuts out the image data of the defect candidate portion from the captured image as classification target data.
• The defect candidate detection unit 104 calculates the feature quantities from the image data of the classification target data, and outputs the classification target data including the extracted feature quantity set to the clustering unit 105 (step S202); a toy sketch of these two steps follows below.
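A toy sketch of steps S201 and S202, assuming the captured image is a grayscale numpy array in which a pixel deviating strongly from the flat-surface intensity marks a defect candidate; the threshold values and the three features computed are illustrative only, not the feature quantities fixed by the specification.

```python
import numpy as np
from scipy import ndimage

def detect_and_describe(image, background=200.0, tol=30.0):
    """Step S201: flag portions differing from the flat surface and
    label each connected region as a defect candidate. Step S202:
    cut out each candidate and compute illustrative feature quantities."""
    mask = np.abs(image.astype(float) - background) > tol
    labels, _ = ndimage.label(mask)
    candidates = []
    for region in ndimage.find_objects(labels):
        h = region[0].stop - region[0].start
        w = region[1].stop - region[1].start
        candidates.append({
            "length": float(max(h, w)),         # cf. scratch length
            "width": float(min(h, w)),          # cf. scratch width
            "area": float(mask[region].sum()),  # cf. scratch area
        })
    return candidates
```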
• As described above, the inspection apparatus of the present invention can classify scratches on a glass substrate with high accuracy by scratch type.
• In the defect type determination device shown in Fig. 23, the clustering unit 105 corresponds to the clustering system of the present invention already described.
• The image acquisition device 201 includes the image acquisition unit 101, the illumination device 102, and the imaging device 103 of Fig. 20.
• The learning data of each cluster into which the classification target data is classified have already been acquired and prepared in the cluster database 5 of the clustering unit 105; the feature quantity set selection of Fig. 5 has therefore also been completed.
• A defect candidate is detected from the captured image input from the image acquisition device 202 attached to each manufacturing device; its image data is cut out, and the feature quantities are extracted and output to the data collection device 203.
• The control device 200 transfers the classification target data input to the data collection device 203 to the clustering unit 105. As described above, the clustering unit 105 classifies the input classification target data into the clusters corresponding to the scratch types.
• The manufacturing management apparatus of the present invention is composed of a control device 300, manufacturing devices 301 and 302, a notification unit 303, a recording unit 304, a defective device determination unit 305, and a defect type determination device 306.
• The defect type determination device 306 is the same as the defect type determination device described in section B above.
• The defect type determination device 306 performs image processing on the captured images from the image acquisition devices 201 and 202 provided in the manufacturing devices 301 and 302, respectively, extracts feature quantities in the corresponding defect candidate detection unit 104, and classifies the classification target data.
• The defective device determination unit 305 has a table indicating the relationship between the identification information of each classified cluster and the generation factor corresponding to that cluster; based on the cluster identification information input from the defect type determination device 306, it reads the corresponding generation factor from the table and determines the manufacturing device that is the cause. That is, the defective device determination unit 305 detects the cause of the defect in the product manufacturing process according to the cluster identification information.
• The defective device determination unit 305 notifies the operator through the notification unit 303, and stores in the recording unit 304, in association with the date and time of the determination, the identification number of the cluster into which the defect was classified, the generation factor, and the identification information of the manufacturing device, as a history. Further, the control device 300 stops the manufacturing device determined by the defective device determination unit 305 or adjusts its control parameters.
• Another manufacturing management apparatus of the present invention includes a control device 300, manufacturing devices 301 and 302, a notification unit 303, a recording unit 304, and a clustering unit 105.
• The clustering unit 105 has the same configuration as that described in sections A and B above.
• The feature quantities of the classification target data are manufacturing conditions (material quantity, processing temperature, pressure, processing speed, etc.) in the manufacturing process of an industrial product, for example a glass substrate, and are classified according to the manufacturing state of each stage of the manufacturing process.
• These feature quantities are input to the clustering unit 105 as process information detected by sensors provided in the respective manufacturing devices 301 and 302.
• According to the feature quantities of the classification target data, the clustering unit 105 classifies the manufacturing state of each process of the glass manufacturing process into clusters such as "normal state", "state in which defects are likely to occur", and "dangerous state requiring adjustment". Then, the clustering unit 105 notifies the operator of the classification result through the notification unit 303, outputs the cluster identification information of the classification result to the control device 300, and stores in the recording unit 304, in association with the date and time of the determination, the identification number of the cluster into which the manufacturing state of each process was classified, the manufacturing condition that is the most problematic feature quantity, and the identification information of the manufacturing device, as a history.
• The control device 300 has a table indicating the correspondence between the cluster identification information and the adjustment items and data for returning the manufacturing conditions to normal; it reads the adjustment items and data corresponding to the cluster identification information input from the clustering unit 105, and controls the corresponding manufacturing device with the read data (a sketch of this lookup follows below).
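As a concrete illustration, such a table lookup might be realized as follows; the cluster identifiers, adjustment items, and set points are hypothetical placeholders, and `set_parameter` is an assumed device interface, none of which come from the specification.

```python
# hypothetical table: cluster id -> adjustment items and their data
ADJUSTMENT_TABLE = {
    "defects_likely": [("processing_temperature", 620.0)],
    "dangerous":      [("processing_speed", 0.8), ("pressure", 1.2)],
}

def adjust_manufacturing_device(cluster_id, device):
    """Read the adjustment items for the classified cluster and control
    the corresponding manufacturing device with the read data (sketch)."""
    for item, value in ADJUSTMENT_TABLE.get(cluster_id, []):
        device.set_parameter(item, value)  # assumed device interface
```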
• A program for realizing the functions of the clustering system in Fig. 1 may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read into a computer system and executed to perform the clustering processing of the classification target data.
• The "computer system" here includes an OS and hardware such as peripheral devices.
• The "computer system" also includes a WWW system equipped with a homepage providing environment (or display environment).
• The "computer-readable recording medium" means a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built into the computer system.
• The "computer-readable recording medium" also includes a medium that holds the program for a certain period of time, such as a volatile memory (RAM) inside a computer system serving as a server or client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line.
• The program may be transmitted from a computer system storing it in a storage device or the like to another computer system via a transmission medium, or by transmission waves in the transmission medium.
• The "transmission medium" for transmitting a program refers to a medium having a function of transmitting information, such as a network (communication network) like the Internet or a communication line like a telephone line.
• The program may realize only part of the functions described above. Furthermore, it may be a so-called difference file (difference program) that realizes the above-mentioned functions in combination with a program already recorded in the computer system.
• The present invention can be applied to fields in which information having various kinds of features is classified and discriminated with high accuracy, such as the detection of defects in glass articles, and can also be used for manufacturing state detection devices and product manufacturing management devices. The entire contents of the specification, claims, drawings, and abstract of Japanese Patent Application No. 2006-186628 filed on July 6, 2006 are incorporated herein by reference.


Abstract

Provided is a clustering system capable of classifying target data more rapidly and precisely than prior-art examples. The clustering system classifies input data into clusters formed from populations of learning data according to the feature quantities of the input data. The clustering system comprises: a feature quantity set storage unit that stores, in correspondence with each cluster, feature quantity sets, i.e. combinations of feature quantities to be used for classification; a feature quantity extraction unit that extracts preset feature quantities from the input data; a distance calculation unit that calculates and outputs, for each feature quantity set corresponding to each cluster, the distance between the center of the population of the cluster and the input data as a set distance, based on the feature quantities contained in the feature quantity set; and a rank extraction unit that arranges the set distances in ascending order.

Description

Specification

Clustering System and Defect Type Determination Device

Technical Field

[0001] The present invention relates to a clustering system and a defect type determination device that cut out a partial image of a defect portion from an image of an object under inspection, extract feature signals of the defect from the partial image, and classify the type of the defect.

Background Art

[0002] Clustering techniques based on the distance between unknown data and learning data, for example the Mahalanobis generalized distance, have conventionally been in common use. That is, unknown data is classified, and clustering processing performed, by determining whether the data belongs to a cluster serving as a population learned in advance. For example, which population's cluster the unknown data belongs to is determined from the magnitudes of the Mahalanobis distances to a plurality of clusters (see, for example, Patent Document 1).

In addition, in order to calculate the above-described distance efficiently, clustering processing is performed by selecting a plurality of feature quantities.

[0003] A technique of determining the cluster to which unknown data belongs by voting on the results obtained from a large number of classifiers is also common; the identification results of the outputs of different sensors, or the identification results obtained when identifying unknown data in different regions of one image, are used (see, for example, Patent Document 2).

With the above clustering technique, in diagnosing disease from parameters obtained from a blood test, that is, clustering according to which disease the data belongs to, there is a method in which pairs of clusters are formed from the plurality of clusters, it is determined for every pair which of the two clusters the test data is judged to resemble, and the data is classified into the cluster judged most often according to the tally of those judgments (see, for example, Patent Document 3).

[0004] When classifying each defect on an LCD glass substrate into preset defect types, clustering is performed in which each feature quantity used for classification is optimized in accordance with the identification performed at classification, each feature quantity is weighted to correspond to this optimization, and the cluster to which a defect belongs is determined using the optimized feature quantities (see, for example, Patent Document 4).

Patent Document 1: Japanese Patent Laid-Open No. 2005-214682
Patent Document 2: Japanese Patent Laid-Open No. 2001-56861
Patent Document 3: Japanese Patent Laid-Open No. 07-105166
Patent Document 4: Japanese Patent Laid-Open No. 2002-99916

Disclosure of the Invention

Problems to Be Solved by the Invention

[0005] However, in the clustering shown in Patent Document 3, the individual combinations are not optimized, so the feature quantities serving as discrimination material are not fully exploited; moreover, as the number of clusters to be discriminated grows, the number of combinations becomes enormous and the time required for the determination processing increases.

Also, in the clustering shown in Patent Document 4, the feature quantities are weighted based on the determination rate in an attempt to improve discrimination accuracy, but there is no concept of optimizing the feature quantities for each cluster; as in Patent Document 3 above, the feature quantities are not fully exploited, so highly accurate classification is not achieved.

[0006] The present invention has been made in view of such circumstances, and provides a clustering system and a defect type determination device that exploit the feature quantities extracted from the classification target data when determining the cluster to which the data belongs, and that classify the classification target data faster and more accurately than the conventional examples, for example classifying defects on a glass surface into clusters corresponding to defect types.

Means for Solving the Problems

[0007] To solve the above problems, the present invention, unlike the conventional example in which the distance between the classification target data and each cluster is calculated with the same kinds of feature quantities to determine the classification destination, sets for each cluster a set of feature quantities that can yield a difference between clusters, and obtains the distance to each cluster with different feature quantities, so that classification is performed with higher accuracy than before. Since the above feature quantity set is determined based on the characteristics of the learning data belonging to each cluster, it is composed of feature quantities that allow the cluster to be distinguished from the other clusters.

That is, the present invention employs the following configurations.

[0008] The clustering system of the present invention classifies input data into each of the clusters formed by populations of learning data, according to the feature quantities (parameters) of the input data, and comprises: a feature quantity set storage unit in which a feature quantity set (parameter set), a combination of feature quantities used for classification, is stored in correspondence with each cluster; a feature quantity extraction unit that extracts preset feature quantities from the input data; a distance calculation unit that, for each feature quantity set corresponding to each cluster, calculates and outputs, as a set distance, the distance between the center of the population of the cluster and the input data, based on the feature quantities included in the feature quantity set; and a rank extraction unit that arranges the set distances in ascending order.

[0009] In a preferred clustering system of the present invention, a plurality of the feature quantity sets are set for each cluster.

[0010] A preferred clustering system of the present invention further comprises a cluster classification unit that detects which cluster the input data belongs to by means of a rule pattern indicating classification criteria for each cluster, set based on the ranking of the set distances obtained for each feature quantity set.

[0011] In a preferred clustering system of the present invention, the cluster classification unit detects which cluster the input data belongs to from the ranking of the set distances, and detects the cluster having the most highly ranked set distances as the cluster to which the input data belongs.

[0012] In a preferred clustering system of the present invention, the cluster classification unit has a threshold for the number of highly ranked set distances, and detects a cluster as the one to which the input data belongs if its number of highly ranked set distances is equal to or greater than the threshold.

[0013] In a preferred clustering system of the present invention, the distance calculation unit multiplies each set distance by a correction coefficient set in correspondence with the feature quantity set, thereby standardizing the set distances between feature quantity sets.

[0014] A preferred clustering system of the present invention further comprises a feature quantity set creation unit that creates the feature quantity set for each cluster; for each of a plurality of combinations of the feature quantities, the feature quantity set creation unit takes the average value of the learning data of the population of each cluster as the origin, obtains the average value of the distances between this origin and each piece of learning data of the populations of the other clusters, and selects the combination of feature quantities giving the largest average value as the feature quantity set used to distinguish that cluster from the other clusters.

[0015] A defect type determination device of the present invention is provided with any one of the clustering systems described above; the input data is image data of a product defect, and the defects in the image data are classified by defect type according to the feature quantities indicating the defect.

In a preferred defect type determination device of the present invention, the product is a glass article, and the defects of the glass article are classified by defect type.

[0016] A defect detection device of the present invention is provided with the above defect type determination device and detects the type of a product defect.

[0017] A manufacturing state determination device of the present invention is provided with the defect type determination device described above, determines the type of a product defect, and detects the cause of the defect in the manufacturing process based on the correspondence between the type and its generation factor.

[0018] A preferred manufacturing state determination device of the present invention is provided with any one of the clustering systems described above; the input data are feature quantities indicating the manufacturing conditions in the manufacturing process of the product, and these feature quantities are classified according to the manufacturing state of each step of the manufacturing process.

In a preferred manufacturing state determination device of the present invention, the product is a glass article, and the feature quantities in the manufacturing process of the glass article are classified according to the manufacturing state of each step of the manufacturing process.

[0019] A manufacturing state detection device of the present invention is provided with the manufacturing state determination device described above and detects the type of manufacturing state in each step of the manufacturing process of a product.

[0020] A product manufacturing management device of the present invention is provided with the manufacturing state determination device described above, detects the type of manufacturing state in each step of the manufacturing process of a product, and performs process control in the steps of the manufacturing process based on control items corresponding to the type.

Effects of the Invention

[0021] As described above, according to the present invention, for each classification-destination cluster, the optimal combination of feature quantities that maximizes the distance to the other clusters is set in advance from the plurality of feature quantities of the classification target data, the distance between the classification target data and each cluster is calculated, and the classification target data is classified into the cluster with the smallest calculated distance; the classification target data can therefore be classified into the corresponding cluster more accurately than by conventional techniques.

Also, according to the present invention, a plurality of the above combinations are set for each cluster, the calculated distances between the classification target data and all the clusters are arranged in ascending order, and the classification target data is classified into the cluster appearing most often within a preset number of top ranks, so classification with higher accuracy than before can be performed.

Brief Description of the Drawings

[0022]
[Fig. 1] A block diagram showing a configuration example of the clustering system according to the first and second embodiments of the present invention.
[Fig. 2] A table explaining the process of selecting a feature set based on the discriminant reference value λ.
[Fig. 3] A table explaining the process of selecting a feature set based on the discriminant reference value λ.
[Fig. 4] A diagram showing histograms explaining the effect of the discriminant reference value λ on feature set selection.
[Fig. 5] A flowchart showing an operation example of the process of selecting a feature quantity set for each cluster according to the first embodiment.
[Fig. 6] A flowchart showing an operation example of the clustering process for classification target data according to the first embodiment.
[Fig. 7] A flowchart showing an operation example of generating a rule pattern table used for the clustering process in the second embodiment.
[Fig. 8] A flowchart showing an operation example of the clustering process for classification target data according to the second embodiment.
[Fig. 9] A flowchart showing an operation example of another clustering process for classification target data according to the second embodiment.
[Fig. 10] A flowchart showing an operation example of the clustering process for classification target data according to the third embodiment.
[Fig. 11] A flowchart showing an operation example of setting an arithmetic expression as a feature quantity conversion method.
[Fig. 12] A flowchart showing an operation example of the evaluation value calculation in the flowchart of Fig. 11.
[Fig. 13] A flowchart showing an operation example of distance calculation using feature quantities converted by the set conversion method.
[Fig. 14] A table showing the learning data belonging to each cluster.
[Fig. 15] A result table showing the result of classifying the learning data of Fig. 14 by the conventional clustering method.
[Fig. 16] A conceptual diagram explaining the method of calculating the overall correction determination rate.
[Fig. 17] A result table showing the result of classifying the learning data of Fig. 14 by the clustering system of the first embodiment.
[Fig. 18] A result table showing the result of classifying the learning data of Fig. 14 by the clustering system of the second embodiment.
[Fig. 19] A result table showing the result of classifying the learning data of Fig. 14 by the clustering system of the second embodiment.
[Fig. 20] A block diagram showing a configuration example of an inspection apparatus using the clustering system of the present invention.
[Fig. 21] A flowchart showing an operation example of feature quantity set selection in the inspection apparatus of Fig. 20.
[Fig. 22] A flowchart showing an operation example of the clustering process in the inspection apparatus of Fig. 20.
[Fig. 23] A block diagram showing a configuration example of a defect type determination device using the clustering system of the present invention.
[Fig. 24] A block diagram showing a configuration example of a manufacturing management apparatus using the clustering system of the present invention.
[Fig. 25] A block diagram showing a configuration example of another manufacturing management apparatus using the clustering system of the present invention.

Explanation of Reference Numerals

1 ... feature quantity set creation unit
2 ... feature quantity extraction unit
3 ... distance calculation unit
4 ... feature quantity set storage unit
5 ... cluster database
100 ... object to be inspected
101 ... image acquisition unit
102 ... illumination device
103 ... imaging device
104 ... defect candidate detection unit
105 ... clustering unit
200, 300 ... control device
201, 202 ... image acquisition device
301, 302 ... manufacturing device
303 ... notification unit
304 ... recording unit

Best Mode for Carrying Out the Invention

The clustering system of the present invention classifies input data to be classified into clusters, each formed from a population of learning data, according to the feature quantities of the input data. It has a feature quantity set storage unit in which a feature quantity set, a combination of feature quantities used for classification, is stored in correspondence with each cluster; a feature quantity extraction unit extracts feature quantities from the input data based on the preset feature quantity sets; a distance calculation unit calculates, for each feature quantity set corresponding to each cluster, the distance between the population and the input data as a set distance, based on the feature quantities included in that set; and a rank extraction unit arranges the set distances in ascending order and performs classification into clusters according to that order.

[0025] <First Embodiment>

Hereinafter, the clustering system according to the first embodiment of the present invention will be described with reference to the drawings. Fig. 1 is a block diagram showing a configuration example of the clustering system according to this embodiment.

As shown in Fig. 1, the clustering system of this embodiment has a feature quantity set creation unit 1, a feature quantity extraction unit 2, a distance calculation unit 3, a feature quantity set storage unit 4, and a cluster database 5.

The feature quantity set storage unit 4 stores, in correspondence with the identification information of each cluster, a feature quantity set indicating a combination of feature quantities of the classification target data, set individually for each cluster. For example, when the classification target data is a set of feature quantities {a, b, c, d}, the feature quantity set of each cluster is set as a combination of feature quantities such as [a, b], [a, b, c, d], or [c]. In the following description, any of the following taken from the set of feature quantities is defined as a "combination of feature quantities": all the feature quantities, several of them (in the above example, any two or three feature quantities), or a single one.

[0026] Here, when clusters A, B, and C are set as the classification-destination clusters, the feature quantity set corresponding to each cluster is obtained, using the learning data classified into each cluster in advance, as the combination of feature quantities that maximizes the distance between that cluster and the other clusters, and is stored in the feature quantity set storage unit 4.

For example, the feature quantity set for cluster A is set as the combination of feature quantities that maximizes the distance between the vector formed by the average values of the feature quantities of the learning data belonging to cluster A and the vector formed by the average values of the feature quantities of the learning data belonging to the other clusters B and C.

The classification target data and the learning data of the population of each cluster are composed of the same set of feature quantities.

[0027] When calculating the distance between the input classification target data and each cluster, the feature quantity extraction unit 2 reads the feature quantity set corresponding to the cluster under calculation from the feature quantity set storage unit 4, extracts the feature quantities corresponding to this feature quantity set from the plural feature quantities of the classification target data, and outputs the extracted feature quantities to the distance calculation unit 3.

The distance calculation unit 3 reads from the cluster database 5, using the identification information of the cluster under calculation as a key, the vector formed by the average values of the feature quantities of that cluster's learning data, and, based on that cluster's feature quantity set, calculates the distance between the vector of feature quantities extracted from the classification target data and the vector of average values of the feature quantities of the learning data (the centroid vector indicating the centroid position of the plural learning data in the cluster).

[0028] When calculating the distance, the distance calculation unit 3 normalizes each feature quantity v(i) of the classification target data by the following equation (1), in order to eliminate differences in data units between feature quantities and to standardize the numerical values across feature quantities.

V(i) = (v(i) − avg(i)) / std(i)   … (1)

Here, v(i) is the feature quantity, avg(i) is the average value of that feature quantity in the learning data of the cluster under calculation, std(i) is the standard deviation of that feature quantity in the learning data of the cluster under calculation, and V(i) is the normalized feature quantity. Therefore, when calculating a distance, the distance calculation unit 3 must perform this standardization of each feature quantity for each feature quantity set.

The distance calculation unit 3 performs this normalization process for each feature quantity used in the distance calculation of the classification target data, using the average value and standard deviation of the corresponding feature quantity of the learning data.

[0029] As the distance, any of the standardized Euclidean distance using the standardized feature quantities described above, the Mahalanobis distance, the Minkowski distance, and so on may be used.

When the Mahalanobis distance is used, the Mahalanobis squared distance MHD is obtained by the following equation (2).

MHD = (1/n) · (V^T R^(−1) V)   … (2)

Each element V(i) of the matrix V in equation (2) is the feature quantity obtained by the above equation (1) from the multidimensional feature quantity v(i) of the unknown data, using the average value avg(i) and the standard deviation std(i) of the corresponding feature quantity of the learning data in the cluster concerned. n is the number of degrees of freedom; in this embodiment it denotes the number of feature quantities in the feature quantity set (described later).

Thus, the Mahalanobis squared distance is a value obtained by adding up the contributions of the n converted feature quantities, and dividing by n makes the unit distance of the population average equal to 1. V^T is the transpose of the matrix V whose elements are the feature quantities V(i), and R^(−1) is the inverse of the correlation matrix R between the feature quantities in the learning data of the cluster. A sketch of this calculation follows below.
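A minimal sketch of equations (1) and (2), assuming the cluster's learning data is given as an (m samples × n features) numpy array already restricted to that cluster's feature quantity set.

```python
import numpy as np

def mahalanobis_squared(v, learning):
    """Eq. (1) and eq. (2): normalize the unknown feature vector `v`
    with the cluster's statistics, then compute (1/n) * V^T R^-1 V,
    with R the correlation matrix of the cluster's learning data."""
    avg = learning.mean(axis=0)                      # avg(i)
    std = learning.std(axis=0)                       # std(i)
    V = (v - avg) / std                              # eq. (1)
    R = np.corrcoef(learning, rowvar=False)          # correlation matrix R
    return float(V @ np.linalg.inv(R) @ V) / len(V)  # eq. (2)
```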

[0030] The feature quantity set creation unit 1 calculates, for each cluster, the feature quantity set that the distance calculation unit 3 uses when calculating the distance between the classification target data and that cluster, and writes the calculation result into the feature quantity set storage unit 4 in correspondence with the identification information of each cluster.

When calculating a feature quantity set, the feature quantity set creation unit 1 calculates, for each cluster, the value λ of the discriminant criterion by the following equation (3), based on the distance between the centroid (barycentric) vector of the learning data belonging to the target cluster for which the feature quantity set is being generated and the centroid vector of the learning data belonging to all the other clusters. Hereinafter, a combination of feature quantities is described as a feature quantity set.

[0031] λ = ω_i · ω_o · (μ_i − μ_o)² / (ω_i · σ_i² + ω_o · σ_o²)   … (3)

In equation (3), μ_i is the centroid vector formed by the average values of the feature quantities, over the feature quantity set, of the learning data belonging to the target cluster (the in-cluster population); σ_i is the standard deviation of the vectors formed by the feature quantities of the learning data belonging to the in-cluster population; and ω_i is the ratio of the number of learning data belonging to the in-cluster population to the learning data belonging to all clusters. Similarly, μ_o is the centroid vector formed by the average values of the feature quantities, over the feature quantity set, of the learning data belonging to the clusters other than the target cluster (the out-of-cluster population); σ_o is the standard deviation of the vectors formed by the feature quantities of the learning data belonging to the out-of-cluster population; and ω_o is the ratio of the number of learning data belonging to the out-of-cluster population to the learning data belonging to all clusters. Here, values obtained by taking the log (logarithm) or the square root of (μ_i − μ_o) in equation (3) may also be used. When calculating each vector, the feature quantity set creation unit 1 calculates and uses feature quantities normalized for each feature quantity by equation (1). Eigenvalues calculated in advance so as to increase the separation may also be set as the ratios ω_i and ω_o.

[0032] Then, for each target cluster, the feature quantity set creation unit 1 uses equation (3) to calculate the above discriminant reference value λ against the other clusters for some or all combinations of the feature quantities constituting the learning data, arranges the calculated discriminant reference values in descending order, and outputs a ranked list of the discriminant reference values λ.

Here, the feature quantity set creation unit 1 stores the combination of feature quantities corresponding to the largest discriminant reference value λ in the feature quantity set storage unit 4, together with the value of the discriminant reference value, as the feature quantity set of the target cluster, in correspondence with the identification information of the cluster. A sketch of this criterion and of the exhaustive selection follows below.
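A sketch of equation (3) and the exhaustive selection it drives; reducing the vector-valued spread of each population to a scalar by summing per-feature variances is an assumption, since the specification describes the standard deviations without fixing the reduction.

```python
import numpy as np
from itertools import combinations

def discriminant_criterion(inside, outside):
    """Eq. (3) for one candidate combination: `inside` holds the
    normalized learning data of the target cluster, `outside` that of
    all other clusters (rows = samples, columns = selected features)."""
    n_i, n_o = len(inside), len(outside)
    w_i, w_o = n_i / (n_i + n_o), n_o / (n_i + n_o)  # ratios omega_i, omega_o
    gap = float(np.sum((inside.mean(axis=0) - outside.mean(axis=0)) ** 2))
    var_i = float(inside.var(axis=0).sum())          # assumed scalar spread
    var_o = float(outside.var(axis=0).sum())
    return w_i * w_o * gap / (w_i * var_i + w_o * var_o)

def best_feature_set(inside, outside):
    """Score every combination of feature indices (cf. Fig. 2(a)) and
    return (lambda, combination) for the largest lambda."""
    n = inside.shape[1]
    return max((discriminant_criterion(inside[:, list(c)], outside[:, list(c)]), c)
               for r in range(1, n + 1)
               for c in combinations(range(n), r))
```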

[0033] In determining the discriminant reference value λ described above, as shown in Fig. 2(a), when the learning data and classification target data have the four feature quantities a, b, c, and d, the feature quantity set creation unit 1 calculates, when setting the feature quantity set of each cluster, the discriminant reference value λ for every combination: all four feature quantities, several of them, or a single one.

The feature quantity set creation unit 1 then selects the combination with the highest value, for example the combination of feature quantities b and c in Fig. 2(a).

[0034] As another method for the discriminant reference value λ, as shown in Fig. 2(b), the feature quantity set creation unit 1 may be configured to use the BSS method: the discriminant reference value λ is calculated using all n feature quantities contained in the set of classification target data; the discriminant reference value is then calculated for every combination obtained by removing one feature quantity from the set of n, and the combination with the maximum value among those n−1 discriminant reference values is selected; the discriminant reference value is then calculated for every n−2-feature combination of those n−1 feature quantities. In this way, the feature quantities are removed from the set one at a time, the combination reduced by one more feature being selected from the reduced set by calculating the discriminant reference value, so that a combination that can discriminate with a small number of feature quantities is selected.

[0035] As yet another method for the discriminant reference value λ, as shown in Fig. 2(c), the feature quantity set creation unit 1 may be configured to use the FSS method: each of the n feature quantities contained in the set of classification target data is read out one at a time, the discriminant reference value λ of each single feature quantity is calculated, and the feature quantity with the maximum discriminant reference value is selected. Next, combinations of two feature quantities, consisting of this feature quantity and each of the remaining feature quantities, are generated, the discriminant reference value λ is calculated for each, and the combination with the maximum discriminant reference value is selected. Next, combinations of three feature quantities, consisting of this combination plus each feature quantity not contained in it, are generated, and their discriminant reference values λ are calculated. In this way, the combination with the maximum discriminant reference value λ is successively selected from the immediately preceding combinations, one feature quantity not present in the combination is added, and the discriminant reference value λ of the enlarged combination is calculated; finally, from all the combinations for which the discriminant reference value was calculated, the combination with the maximum discriminant reference value λ is selected as the feature quantity set (a greedy sketch follows below).
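A greedy FSS sketch corresponding to Fig. 2(c); `score` is a callable mapping a tuple of feature indices to the discriminant reference value λ, for example built from the sketch above.

```python
def forward_selection(score, all_features):
    """Grow the combination one feature at a time, always adding the
    feature that maximizes lambda, and remember the best combination
    seen over the whole pass (Fig. 2(c))."""
    selected, remaining = (), list(all_features)
    best = (float("-inf"), ())
    while remaining:
        lam, feat = max((score(selected + (f,)), f) for f in remaining)
        selected += (feat,)
        remaining.remove(feat)
        best = max(best, (lam, selected))
    return best  # (largest lambda, corresponding feature combination)
```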

[0036] 次に、判別基準値えによって、クラスタリングに用いる特徴量セットの選択の有効性 を、図 3および 4により示す。  Next, FIGS. 3 and 4 show the effectiveness of selecting feature quantity sets used for clustering based on the discrimination reference value.

図 3には、特徴量 a, b, c, d, eから、特徴量セットを選択する組合せとして、特徴量 aおよび gの組合せと、特徴量 aおよび hの組合せと、特徴量 dおよび eとの組合せを抽 出し、これらの組合せから、クラスタ 1と、クラスタ 2および 3とにおいて、従来例に比し て高 ヽ分類特性を有する特徴量セットの選択にっ ヽて説明する。  FIG. 3 shows combinations of feature amounts a and g, combinations of feature amounts a and h, and feature amounts d and e as combinations for selecting feature amount sets from feature amounts a, b, c, d, and e. From these combinations, cluster 1 and clusters 2 and 3 will be described with reference to selection of feature quantity sets having higher classification characteristics than conventional examples.

In Fig. 3, μ1 corresponds to the aforementioned in-cluster mean μ, μ2 to the out-of-cluster mean, σ1 to the in-cluster standard deviation σ, σ2 to the out-of-cluster standard deviation, ω1 to the in-cluster ratio ω, and ω2 to the out-of-cluster ratio.

[0037] Among these combinations, the one with the largest discrimination criterion value λ is the combination of feature quantities a and h. This combination is used to separate cluster 1 from the other clusters, and the result of classifying cluster 1 against the other clusters (clusters 2 and 3) is confirmed in Fig. 4. In Fig. 4, the horizontal axis shows the logarithm of the Mahalanobis distance calculated using each combination of feature quantities, and the vertical axis shows the number of data to be separated having the corresponding value (a histogram). Here, the value 1.4 on the horizontal axis means that the logarithm of the Mahalanobis distance is less than 1.4 and not less than 1.2 (the value to its left); the other values on the horizontal axis are read in the same way, and "1.4≤" in Fig. 4 denotes values of 1.4 or more. The Mahalanobis distances in Fig. 4 are calculated, using the feature quantity set corresponding to cluster 1, for the classification target data belonging to cluster 1 and to the other clusters respectively.

Fig. 4(a) shows the Mahalanobis distances calculated with the combination of feature quantities a and g, Fig. 4(b) those calculated with the combination of a and h, and Fig. 4(c) those calculated with the combination of d and e. The histograms in Fig. 4 show that the larger the discrimination criterion value λ, the better cluster 1 is separated from the other clusters.

[0038] Next, the operation of the clustering system according to the first embodiment of Fig. 1 is described with reference to Figs. 5 and 6. Fig. 5 is a flowchart showing an operation example of the feature quantity set creation unit 1 of the clustering system according to the first embodiment, and Fig. 6 is a flowchart showing an operation example of the clustering of classification target data.

In the following description, suppose that the classification target data is a set of feature quantities of scratches on a glass article. From image processing and measurement results, feature quantities such as "a: length of the scratch", "b: area of the scratch", "c: width of the scratch", "d: transmittance of a predetermined region including the scratch", and "e: reflectance of a predetermined region including the scratch" are obtained. The set of feature quantities (hereinafter, the feature quantity collection) is therefore {a, b, c, d, e}. In this embodiment, the distance used for clustering is calculated as the Mahalanobis distance using the normalized feature quantities. Examples of the glass article in this embodiment include plate glass and glass substrates for displays.

[0039] A. Feature quantity set creation processing (corresponding to the flowchart of Fig. 5)

The user detects scratches on the glass, captures images of them to obtain image data, and extracts feature quantities from the image data by image processing, such as measuring the length of the scratched part, thereby collecting feature quantity data consisting of the set of feature quantities. The user then assigns the feature quantity data as learning data to each cluster to be classified, such as by cause or shape of the scratch, based on information known in advance about causes and shapes, forming the learning-data population of each cluster, and stores it in the cluster database 5 from a processing terminal (not shown) in association with the identification information of each cluster (step S1).

[0040] Next, when a control command to generate the feature quantity set for each cluster is input from the processing terminal, the feature quantity set creation unit 1 reads the learning-data population from the cluster database 5 in accordance with the identification information of each cluster.

Then, for each cluster, the feature quantity set creation unit 1 calculates the mean and standard deviation of each feature quantity in the in-cluster population and, using these, calculates the normalized feature quantity of each learning datum according to Eq. (1).
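Eq. (1) appears earlier in the document; the sketch below assumes it is the usual z-score standardization (the same form is assumed for Eq. (2), applied later to classification target data).

```python
import numpy as np

def normalize(samples):
    """Standardize each feature column of a (n_samples, n_features)
    array: subtract the per-feature mean avg.(i) and divide by the
    standard deviation std.(i), as assumed for Eq. (1)."""
    avg = samples.mean(axis=0)
    std = samples.std(axis=0, ddof=1)
    return (samples - avg) / std, avg, std
```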

[0041] Next, the feature quantity set creation unit 1 calculates the discrimination criterion value λ according to Eq. (3) for every feature quantity set, that is, for every combination of feature quantities contained in the feature quantity collection.

At this time, for each cluster and each feature quantity set, the feature quantity set creation unit 1 calculates, from the normalized feature quantities of the in-cluster population, the mean vector (centroid vector) μ of the feature quantities belonging to the set and the standard deviation σ of the learning-data vectors formed from those feature quantities, and, from the normalized feature quantities of the out-of-cluster population, the corresponding centroid vector and standard deviation of the out-of-cluster learning-data vectors. It also calculates the ratio ω of the number of learning data in the in-cluster population to the total number of learning data, and the corresponding ratio for the out-of-cluster population.

[0042] Using these centroid vectors, standard deviations, and ratios, the feature quantity set creation unit 1 then calculates, according to Eq. (3), the discrimination criterion value λ, which measures the separation of each cluster from the other clusters, for every feature quantity set, that is, for every combination of feature quantities in the feature quantity collection, for each cluster.

When the calculation of all the discrimination criterion values λ is completed, the feature quantity set creation unit 1 sorts them in descending order for each cluster and selects the feature quantity set corresponding to the largest value as the feature quantity set, that is, the combination of feature quantities used for calculating the distance when judging whether data belongs to that cluster (step S2).
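Eq. (3) itself is defined earlier in the document and is not reproduced here; the sketch below therefore substitutes a generic Fisher-style separation ratio built from the quantities named above (centroid vectors, standard deviations, and data-count ratios of the in-cluster and out-of-cluster populations). Only the exhaustive search over feature combinations and the selection of the maximum (step S2) is the point being illustrated; the placeholder criterion may differ from the patent's λ.

```python
import numpy as np
from itertools import combinations

def lambda_placeholder(in_data, out_data):
    """Stand-in for Eq. (3): ratio of between-population separation
    to the ratio-weighted within-population scatter."""
    n_in, n_out = len(in_data), len(out_data)
    w_in, w_out = n_in / (n_in + n_out), n_out / (n_in + n_out)
    mu_in, mu_out = in_data.mean(axis=0), out_data.mean(axis=0)
    scatter = (w_in * in_data.var(axis=0, ddof=1).sum()
               + w_out * out_data.var(axis=0, ddof=1).sum())
    return np.sum((mu_in - mu_out) ** 2) / scatter

def best_feature_set(in_data, out_data, names):
    """Step S2 in outline: evaluate the criterion for every non-empty
    combination of feature columns and keep the maximum."""
    best = None
    for r in range(1, len(names) + 1):
        for idx in combinations(range(len(names)), r):
            lam = lambda_placeholder(in_data[:, list(idx)],
                                     out_data[:, list(idx)])
            if best is None or lam > best[0]:
                best = (lam, tuple(names[i] for i in idx))
    return best  # (largest criterion value, its feature quantity set)
```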

[0043] Next, for use in the distance calculation by the distance calculation unit 3, the feature quantity set creation unit 1 calculates the correlation coefficient matrix R between the feature quantities of each feature quantity set, together with the mean avg.(i) and standard deviation std.(i) of the feature quantities of the learning data in each in-cluster population (step S3).

[0044] Next, the feature quantity set creation unit 1 calculates the correction coefficient λ^(−1/2) from the discrimination criterion value λ. This correction coefficient standardizes the distances across the feature quantity sets: since the distance to the other clusters varies from cluster to cluster, standardization between the feature quantity sets is needed to raise the classification accuracy. The correction coefficient is not limited to λ^(−1/2); log(λ) or any other function of λ that achieves the standardization between the feature quantity sets may be used.

In addition, when calculating the centroid vector of the feature quantity set of the out-of-cluster population in Eq. (3), one of the following three types of learning data is selected as the learning data of the out-of-cluster population:

a. all learning data of the out-of-cluster population among all the learning data;

b. specific learning data of the out-of-cluster population corresponding to the purpose of the classification; or c. the learning data of the out-of-cluster population that was used for selecting the feature quantities. Here, the purpose of classification in b. is to distinguish the data clearly from the cluster of interest; as the learning data, the learning data contained in the other clusters from which this distinction is to be made are used.

The feature quantity set creation unit 1 then stores, in the feature quantity set storage unit 4 as distance calculation data, in association with the identification information of each cluster: the feature quantity set; the correction coefficient corresponding to the feature quantity set, which in this embodiment is λ^(−1/2); the inverse matrix R^(−1); the mean avg.(i); and the standard deviation std.(i) (step S4).

[0045] B. Clustering processing (corresponding to the flowchart of Fig. 6)

When classification target data is input, the feature quantity extraction unit 2 reads the feature quantity set corresponding to each cluster from the feature quantity set storage unit 4 according to the identification signal of each cluster.

Then, corresponding to the types of feature quantity in the read feature quantity set, the feature quantity extraction unit 2 extracts the feature quantities from the classification target data for each cluster, and stores the extracted feature quantities in its internal storage unit in association with the identification information of each cluster (step S11).

[0046] Next, for each feature quantity extracted from the classification target data, the distance calculation unit 3 reads the corresponding mean avg.(i) and standard deviation std.(i) from the feature quantity set storage unit 4, normalizes the feature quantity by performing the calculation of Eq. (2), and replaces the feature quantity stored in the internal storage unit with the normalized one.

The distance calculation unit 3 then generates the matrix V consisting of the elements V(i) obtained as described above, calculates its transpose V^T, and, according to Eq. (3), successively calculates the Mahalanobis distance between the classification target data and each cluster, storing it in the internal storage unit in association with the identification information of each cluster (step S12).

[0047] Next, the distance calculation unit 3 multiplies each calculated Mahalanobis distance by the correction coefficient λ^(−1/2) corresponding to the feature quantity set to obtain the corrected distance, and replaces each Mahalanobis distance with it (step S13). When applying the correction coefficient, the multiplication may also be performed after taking the logarithm or the square root of the Mahalanobis distance.

The distance calculation unit 3 then compares the corrected distances to the clusters held in the internal storage unit (step S14), detects the minimum corrected distance, takes the cluster whose identification information corresponds to that corrected distance as the cluster to which the classification target data belongs, and stores the classified data in the cluster database 5 in association with the identification information of the destination cluster (step S15).
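Steps S11 to S15 can be condensed into the following sketch. It assumes that the distance calculation data of step S4 is available as plain arrays, that the Mahalanobis distance takes the standard form V R⁻¹ Vᵀ implied by the stored V, Vᵀ, and R⁻¹, and that the correction coefficient is applied to the square root of that value, one of the variants the text allows.

```python
import numpy as np

def classify(x, clusters):
    """Nearest-cluster classification in outline (steps S11-S15).
    `clusters` maps a cluster id to a dict with: 'idx' (indices of
    its feature quantity set), 'avg' and 'std' (per-feature
    statistics), 'Rinv' (inverse correlation matrix R^-1), and
    'coef' (the correction coefficient lambda**-0.5)."""
    corrected = {}
    for cid, c in clusters.items():
        v = (x[c["idx"]] - c["avg"]) / c["std"]    # Eq. (2): standardize
        d2 = v @ c["Rinv"] @ v                     # squared Mahalanobis distance
        corrected[cid] = c["coef"] * np.sqrt(d2)   # step S13: corrected distance
    return min(corrected, key=corrected.get)       # steps S14-S15
```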

[0048] <Second Embodiment>

In the first embodiment described above, one feature quantity set per cluster was used for clustering. Alternatively, as in the second embodiment described below, a plurality of feature quantity sets may be set for each cluster; the Mahalanobis distance corresponding to each feature quantity set is calculated, the corrected distances are obtained and sorted in ascending order, and the cluster to which the classification target data belongs is determined from the corrected distances within a predetermined number of top ranks, according to rules set in advance.

[0049] That is, from the distances between the classification target data and each cluster obtained for each feature quantity set, the distance calculation unit 3 of this embodiment detects which cluster the classification target data belongs to by means of rule patterns that express the criteria for classifying the data into each cluster, set on the basis of the ranking of these distances.

The configuration of the second embodiment is the same as that of the first embodiment shown in Fig. 1; the same reference numerals are given to the respective components, and only the operations that differ from the first embodiment are described with reference to Fig. 7. The second embodiment includes a process of setting the above rule patterns from learning data. Fig. 7 is a flowchart showing an operation example of the pattern learning on the distance ranking that sets the rule patterns, and Figs. 8 and 9 are flowcharts showing operation examples of the clustering in the second embodiment.

[0050] In the first embodiment, when creating the feature quantity sets, the feature quantity set creation unit 1 calculated, for each cluster, the discrimination criterion value λ for a plurality of feature quantity sets (combinations of feature quantities), and set the feature quantity set corresponding to the maximum of the obtained values as the feature quantity set of that cluster.

In the second embodiment, on the other hand, the feature quantity set creation unit 1 obtains a plurality of discrimination criterion values λ for each cluster, against one or more combinations of the other clusters or against all of the other clusters, by setting the feature quantity set with the maximum value for each number of combined feature quantities, and thereby sets a plurality of feature quantity sets for separating each cluster from the other clusters.

The feature quantity set creation unit 1 then obtains the distance calculation data for each feature quantity set, and stores the plurality of feature quantity sets and the distance calculation data of each set in the feature quantity set storage unit 4 in association with the identification information of the cluster.

[0051] In Fig. 7, when learning data is input, the feature quantity extraction unit 2 reads the plurality of feature quantity sets corresponding to each cluster from the feature quantity set storage unit 4 according to the identification signal of each cluster.

Then, corresponding to the types of feature quantity in each read feature quantity set, the feature quantity extraction unit 2 extracts the feature quantities from the learning data for each cluster, and stores them in the internal storage unit for each feature quantity set in association with the identification information of each cluster (step S21).

[0052] Next, for each feature quantity extracted from the learning data, the distance calculation unit 3 reads from the feature quantity set storage unit 4 the mean avg.(i) and standard deviation std.(i) corresponding to that feature quantity for each feature quantity set, normalizes it by performing the calculation of Eq. (2), and replaces the feature quantity stored in the internal storage unit with the normalized one.

The distance calculation unit 3 then generates the matrix V consisting of the elements V(i) obtained as described above, calculates its transpose V^T, and, according to Eq. (3), successively calculates the Mahalanobis distance between the learning data and each cluster, storing it in the internal storage unit for each feature quantity set in association with the identification information of each cluster (step S22).

[0053] Next, the distance calculation unit 3 multiplies each calculated Mahalanobis distance by the correction coefficient λ^(−1/2) corresponding to the feature quantity set to obtain the corrected distance, and replaces each Mahalanobis distance with it (step S23).

The distance calculation unit 3 then sorts the corrected distances to the clusters held in the internal storage unit in ascending order (the smaller the corrected distance, the higher the rank); that is, the identification information of the clusters is arranged so that clusters with smaller corrected distances to the data come first (step S24).

[0054] Next, the distance calculation unit 3 detects the identification information of the cluster corresponding to each of the n smallest (top-ranked) corrected distances, and counts the occurrences of each cluster's identification information among those n; that is, it performs a voting process for each cluster.

The distance calculation unit 3 then detects, from the count patterns of the cluster identification information of each learning datum, a rule pattern common to the learning data contained in the same cluster. For example, with n = 10, if it is detected that the learning data of cluster B yield a count pattern of 5 for cluster A, 3 for cluster B, and 2 for cluster C, this is taken as rule R1.

Also, if the learning data of cluster C have in common that, whenever 3 counts of cluster C are detected, the data always belong to cluster C, even with 7 counts for cluster A and 0 for cluster B, then a rule R2 is set: if the count of cluster C is 3 or more, the data is assigned to cluster C regardless of the counts of the other clusters.

Also, for the learning data of cluster A, if cluster A occupies the first and second ranks from the top of the ordering pattern, a rule R3 is set: the data is assigned to cluster A regardless of the counts of the other clusters, even if the count of cluster B is 8.

[0055] As described above, the regularity of the per-cluster counts exhibited by the learning data classified into the same cluster is detected and stored internally as a pattern table for the identification information of each cluster. One rule may be set for each cluster, or a plurality of rules may be set. Although the above description has the distance calculation unit 3 extract the rule patterns, the user may also set the count-based or ordering-based rule patterns arbitrarily in order to change the accuracy of classification into each cluster.

Some clusters have characteristics of their feature information similar to those of other clusters, and classification of the target data can sometimes be performed more accurately from the relationship among a plurality of clusters, that is, from the target pattern consisting of the per-cluster counts or of the ordering from the top; this embodiment complements that point.

[0056] Next, the clustering processing of the second embodiment, which uses the rules described in the above table, is explained with reference to the flowchart of Fig. 8.

When classification target data is input, the feature quantity extraction unit 2 reads the plurality of feature quantity sets corresponding to each cluster from the feature quantity set storage unit 4 according to the identification signal of each cluster. Then, corresponding to the types of feature quantity in each read feature quantity set, the feature quantity extraction unit 2 extracts the feature quantities from the classification target data for each cluster, and stores them in the internal storage unit for each feature quantity set in association with the identification information of each cluster (step S31).

[0057] Next, for each feature quantity extracted from the classification target data, the distance calculation unit 3 reads from the feature quantity set storage unit 4 the mean avg.(i) and standard deviation std.(i) corresponding to that feature quantity for each feature quantity set, normalizes it by performing the calculation of Eq. (2), and replaces the feature quantity stored in the internal storage unit with the normalized one.

The distance calculation unit 3 then generates the matrix V consisting of the elements V(i) obtained as described above, calculates its transpose V^T, and, according to Eq. (3), successively calculates the Mahalanobis distance between the classification target data and each cluster, storing it in the internal storage unit for each feature quantity set in association with the identification information of each cluster (step S32).

[0058] Next, the distance calculation unit 3 multiplies each calculated Mahalanobis distance by the correction coefficient λ^(−1/2) corresponding to the feature quantity set to obtain the corrected distance, and replaces each Mahalanobis distance with it (step S33).

The distance calculation unit 3 then sorts the corrected distances to the clusters held in the internal storage unit in ascending order; that is, the identification information of the clusters is arranged so that clusters with smaller corrected distances to the classification target data come first (step S34).

After the sorting, the distance calculation unit 3 detects the identification information of the cluster corresponding to each of the n smallest (top-ranked) corrected distances, and counts the occurrences of each cluster's identification information among those n; that is, it performs a voting process for each cluster.

[0059] Next, the distance calculation unit 3 checks whether the count pattern (or ordering pattern) over the clusters in the top n ranks of the classification target data exists in the internally stored table (step S35).

When this collation detects that a rule pattern matching the target pattern of the classification target data is described in the table, the distance calculation unit 3 judges that the classification target data belongs to the cluster whose identification information corresponds to the matched rule, and classifies the data into that cluster (step S36).

[0060] Another clustering process of the second embodiment using the rules described in the above table is explained with reference to the flowchart of Fig. 9.

In this other clustering process shown in Fig. 9, the processing from step S31 to step S35 is the same as that shown in Fig. 8; as already described for step S35, the distance calculation unit 3 collates the target pattern of the classification target data against the rule patterns stored in the table.

The distance calculation unit 3 then detects whether a rule pattern matching the target pattern was found in the collation result; if a matching rule pattern was found, the processing proceeds to step S47, whereas if no matching rule pattern was found, the processing proceeds to step S48 (step S46).

[0061] When it detects that a matching rule pattern was found, the distance calculation unit 3 judges that the classification target data belongs to the cluster whose identification information corresponds to the matched rule, classifies the data into that cluster, and stores the classified data in the cluster database 5 in association with the identification information of the destination cluster (step S47). When it detects that no matching rule pattern was found, on the other hand, the distance calculation unit 3 detects the identification information with the largest count, that is, the largest number of votes, and classifies the classification target data into the cluster corresponding to that identification information.

The distance calculation unit 3 then stores the classified data in the cluster database 5 in association with the identification information of the cluster to which it belongs (step S48).
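The decision flow of Figs. 8 and 9 (sort, vote over the top n, match against the rule table, and fall back to the most-voted cluster) can be sketched as follows. The encoding of the rules as predicates over the ranked cluster ids and vote counts, and the ordering of the rules, are hypothetical; the example rules at the end paraphrase R1 to R3 from the text.

```python
from collections import Counter

def classify_by_rules(distances, rules, n=10):
    """Fig. 9 in outline. `distances` is a list of
    (corrected_distance, cluster_id) pairs over all feature quantity
    sets; `rules` is a list of (predicate, cluster_id) pairs, each
    predicate receiving the top-n ranked cluster ids and the vote
    counts. Falls back to the most-voted cluster (step S48) when no
    rule pattern matches."""
    ranked = [cid for _, cid in sorted(distances)[:n]]  # step S34
    votes = Counter(ranked)                             # voting
    for predicate, cid in rules:                        # steps S35/S46
        if predicate(ranked, votes):
            return cid                                  # step S47
    return votes.most_common(1)[0][0]                   # step S48

# Hypothetical encodings of the example rules from the text:
rules = [
    (lambda r, v: v["C"] >= 3, "C"),                            # rule R2
    (lambda r, v: r[:2] == ["A", "A"], "A"),                    # rule R3
    (lambda r, v: (v["A"], v["B"], v["C"]) == (5, 3, 2), "B"),  # rule R1
]
```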

[0062] <Third Embodiment>

In the second embodiment described above, a table of rule patterns over the top n ranks, counted from the smallest calculated distances between the classification target data and the clusters (that is, the greatest similarities), is prepared, and each classification target datum is clustered according to whether its pattern corresponds to a rule pattern in this table. Alternatively, as in the third embodiment described below, a plurality of feature quantity sets may be set for each cluster, the Mahalanobis distance corresponding to each feature quantity set calculated, and the corrected distances obtained; the cluster having the most corrected distances within a predetermined number of top ranks is then taken as the cluster to which the classification target data belongs.

The configuration of the third embodiment is the same as that of the first and second embodiments shown in Fig. 1; the same reference numerals are given to the respective components, and only the operations that differ from the second embodiment are described with reference to Fig. 10. In the third embodiment, there is no process of setting the above rules from learning data, and step S48 of Fig. 9 is performed directly. Fig. 10 is a flowchart showing an operation example of the clustering in the third embodiment.

[0063] In this other clustering process shown in Fig. 10, the processing from step S31 to step S34 is the same as that shown in Fig. 8; as already described, in step S34 the distance calculation unit 3 sorts the corrected distances to the clusters held in the internal storage unit in ascending order, that is, arranges the identification information of the clusters so that those with smaller corrected distances to the classification target data come first (step S34).

Next, the distance calculation unit 3 detects the identification information of the cluster corresponding to each of the n smallest (top-ranked) corrected distances, and counts the occurrences of each cluster's identification information among those n; that is, it performs a voting process for each cluster (step S55).

The distance calculation unit 3 then detects, in the voting result, the identification information with the largest count (number of votes), takes the cluster corresponding to that identification information as the cluster to which the classification target data belongs, and stores the classified data in the cluster database 5 in association with the identification information of that cluster (step S56).

[0064] The user may also set in advance, in the distance calculation unit 3, a cutoff threshold on the number of votes for each piece of identification information; when the number of votes of the identification information with the most votes falls short of this threshold, the data is treated as belonging to no cluster.

For example, when classifying target data against three clusters A, B, and C, if the number of votes for the identification information of cluster A is 5, that for cluster B is 3, and that for cluster C is 2, the distance calculation unit 3 detects cluster A as the identification information with the most votes.

However, if the threshold for cluster A is set to 6, the distance calculation unit 3 judges that the data belongs to no cluster, because the number of votes for the identification information of cluster A falls short of the threshold.

This makes it possible to improve the reliability of the classification of target data in clustering over clusters whose feature quantities differ only slightly from those of other clusters.
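A sketch of the third embodiment's vote-based decision (steps S34, S55, S56) with the optional vote-count cutoff of paragraph [0064]; the dictionary of per-cluster thresholds is an assumed representation.

```python
from collections import Counter

def classify_by_votes(distances, thresholds=None, n=10):
    """Return the most-voted cluster id among the clusters of the
    top-n corrected distances, or None ("belongs to no cluster")
    when its vote count falls short of that cluster's threshold."""
    ranked = [cid for _, cid in sorted(distances)[:n]]  # step S34
    cid, count = Counter(ranked).most_common(1)[0]      # steps S55-S56
    if thresholds and count < thresholds.get(cid, 0):
        return None                                     # paragraph [0064]
    return cid
```

With votes of 5, 3, and 2 for clusters A, B, and C and a threshold of 6 set for cluster A, this returns None, matching the worked example above.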

[0065] <Feature quantity conversion method>

Clustering is performed in the expectation that the population of each feature quantity follows a normal distribution. Depending on the type of feature quantity (area, length, and so on), however, the population may have a skewed, non-normal distribution, and the accuracy of calculating the distance between the classification target data and each cluster, that is, of judging the similarity between them, may deteriorate. For such feature quantities, it is therefore necessary to convert the feature quantities of the population by a predetermined method so as to bring the distribution closer to a normal distribution and improve the accuracy of the similarity judgment.

As the method of conversion toward a normal distribution, the feature quantity is converted by one of: the logarithm; an n-th root such as the square root (√) or cube root (∛); a factorial; or an arithmetic expression including a function obtained by numerical calculation.

[0066] The process of setting the conversion method for each feature quantity is explained below with reference to Fig. 11, a flowchart showing an operation example of this setting process. The conversion method is set for each cluster, for each feature quantity contained in the cluster, using the learning data belonging to that cluster. The following processing is described as being performed by the feature quantity set creation unit 1, but a separate processing unit for it may equally be provided.

Using the identification information of the cluster to be classified as a key, the feature quantity set creation unit 1 reads the learning data contained in that cluster from the cluster database 5 and calculates (normalizes) the feature quantities of each learning datum (step S61).

[0067] Next, the feature quantity set creation unit 1 converts the feature quantities by applying one of the internally stored arithmetic expressions for feature quantity conversion to each of the read learning data (step S62).

When the conversion of the feature quantities of all the learning data is completed, the feature quantity set creation unit 1 calculates an evaluation value indicating how close the distribution obtained by the conversion is to a normal distribution (step S63).

[0068] Next, the feature quantity set creation unit 1 checks whether evaluation values have been calculated for all of the internally stored arithmetic expressions, that is, for all of the conversion methods set in advance. If it detects that the evaluation value of the distribution obtained by converting the feature quantities has been calculated for every arithmetic expression, the processing proceeds to step S65; if it detects that the calculation has not been completed for all the expressions, the processing returns to step S62 in order to process the next arithmetic expression (step S64).

When the feature quantity conversion by all the arithmetic expressions is completed, the feature quantity set creation unit 1 detects, among the distributions obtained with the set arithmetic expressions, the one with the smallest evaluation value, that is, the distribution closest to a normal distribution, determines the arithmetic expression used to create that distribution as the conversion method, and sets it internally as the conversion method for that feature quantity of the cluster (step S65).

The feature quantity set creation unit 1 performs the above processing for each feature quantity of each cluster, and thereby sets a conversion method corresponding to each feature quantity in each cluster.

[0069] Next, the calculation of the evaluation value in step S63 is explained with reference to Fig. 12, a flowchart explaining an operation example of the process of obtaining the evaluation value of the distribution produced by an arithmetic expression.

The feature quantity set creation unit 1 converts the feature quantity of each learning datum belonging to the target cluster by the currently set arithmetic expression (step S71).

After converting the feature quantities of all the learning data, the feature quantity set creation unit 1 calculates the mean μ and standard deviation σ of the distribution (population) of the converted feature quantities (step S72).

Then, using the mean μ and standard deviation σ of the population, the feature quantity set creation unit 1 calculates the z value (1) as (x − μ)/σ (step S73).

[0070] Next, the feature quantity set creation unit 1 calculates the cumulative probability within the population (step S74).

After this calculation, the feature quantity set creation unit 1 calculates the z value (2) as the value of the inverse of the cumulative distribution function of the standard normal distribution at the obtained cumulative probability in the population (step S75).

The feature quantity set creation unit 1 then obtains the difference between the two z values of the feature quantity distribution, that is, the error between z value (1) and z value (2) (step S76). Having obtained the z-value errors, the feature quantity set creation unit 1 calculates their sum, that is, the sum of their squares, as the evaluation value (step S77).

The smaller the error between the two z values described above, the closer the distribution is to a normal distribution; if there is no z-value error, the distribution is normal, whereas the further the distribution departs from normality, the larger the error becomes.
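Figs. 11 and 12 together can be sketched as follows, assuming: a candidate set of conversion expressions (identity, logarithm, square root, cube root) applied to positive-valued feature quantities such as lengths and areas; the plotting position (i − 0.5)/n for the cumulative probability of step S74; and SciPy's `norm.ppf` as the inverse of the standard normal cumulative distribution function of step S75.

```python
import numpy as np
from scipy.stats import norm

# Assumed candidate conversion expressions (the text also allows
# other n-th roots, factorials, and numerically derived functions).
TRANSFORMS = {
    "identity": lambda x: x,
    "log": np.log,
    "sqrt": np.sqrt,
    "cbrt": np.cbrt,
}

def normality_error(values):
    """Fig. 12 (steps S71-S77): sum of squared differences between
    z value (1), computed from the sample mean and standard
    deviation, and z value (2), the inverse standard normal CDF at
    each sample's cumulative probability. Smaller = closer to a
    normal distribution."""
    x = np.sort(np.asarray(values, dtype=float))
    z1 = (x - x.mean()) / x.std(ddof=1)             # step S73
    p = (np.arange(1, len(x) + 1) - 0.5) / len(x)   # step S74
    z2 = norm.ppf(p)                                # step S75
    return float(np.sum((z1 - z2) ** 2))            # steps S76-S77

def best_transform(values):
    """Fig. 11 (steps S62-S65): keep the conversion expression whose
    transformed distribution has the smallest evaluation value."""
    return min(TRANSFORMS,
               key=lambda k: normality_error(TRANSFORMS[k](values)))
```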

[0071] Next, the calculation of the feature quantities of the classification target data, performed before the clustering processing of the first to third embodiments, is explained with reference to Fig. 13, a flowchart showing an operation example of calculating the feature quantity data of the classification target data.

The distance calculation unit 3 extracts the feature quantities to be identified from the input classification target data, corresponding to the feature quantity set that has been set for each cluster, and performs the normalization processing already described (step S81).

Next, the distance calculation unit 3 converts the feature quantities of the classification target data used for classification into the target cluster by the conversion method (arithmetic expression) set for the feature quantities of that cluster (step S82).

The distance calculation unit 3 then calculates the distance to the target cluster as described in the first to third embodiments (step S83).

[0072] Next, the distance calculation unit 3 checks whether, for all the clusters to be classified against, the feature quantities have been converted by the conversion method set for each cluster's feature quantities and the distance to the cluster has been calculated from the converted feature quantities. If it detects that the distances to all the target clusters have been obtained, the processing proceeds to step S85; if it detects that target clusters remain, the processing returns to step S82 (step S84). Then, in each of the first to third embodiments, the processing that follows the completion of the distance calculation is started (step S85).

With the above processing, since the Mahalanobis distance used in this embodiment expects the feature quantities to be normally distributed when obtaining the distance between the classification target data and each cluster, the closer the distribution of each feature quantity of the population is to a normal distribution, the more accurately the distance (similarity) to each cluster can be obtained, and an improvement in the accuracy of classification into each cluster can be expected.

Examples

[0073] <Calculation example>

Next, using the clustering systems of the first, second, and third embodiments described above, the classification accuracy relative to the conventional example was confirmed with the sample data shown in Fig. 14. Although the number of samples is small, it can be seen that a correct-answer rate equal to or better than that of the conventional example is obtained despite the small number of feature quantities used. In Fig. 14, ten learning data are defined for each of categories 1, 2, and 3 as clusters, and each learning datum has the eight feature quantities a, b, c, d, e, f, g, and h. In this example, the feature quantity sets used for clustering are determined from the learning data belonging to each cluster shown in Fig. 14, and clustering is then performed using the same learning set as the classification target data.

[0074] As a calculation result, Fig. 15 shows the judgment results of the conventional calculation method, in which the Mahalanobis distance to each learning datum of clusters 1 to 3 shown in Fig. 14 is calculated using the feature quantities a and g as the combination of feature quantities. In Fig. 15(a), the Cluster1 column gives the Mahalanobis distance to cluster 1, the Cluster2 column the Mahalanobis distance to cluster 2, and the Cluster3 column the Mahalanobis distance to cluster 3. The category column shows the cluster to which each learning datum actually belongs, and the judgment-result column shows the cluster at the minimum Mahalanobis distance from the learning datum. Rows where the numbers of the category and the judgment result agree represent correctly classified feature quantity data.

[0075] In Fig. 15(b), the column numbers show the clusters to which the learning data actually belong, and the row numbers show the judged clusters. For example, the "8" at mark R1 shows that 8 of the 10 data of cluster 1 were judged to be cluster 1, and the "2" at mark R2 shows that 2 of the 10 data of cluster 1 were judged to be cluster 3. p0 denotes the rate of agreement between the correct answers and the judgments, p1 denotes the probability that the two agree by chance, and κ is the overall corrected judgment rate, obtained by the following formulas; the higher κ, the higher the classification accuracy.

κ = (p0 − p1) / (1 − p1)

p0 = (a + d) / (a + b + c + d)

p1 = [(a + b)·(a + c) + (b + d)·(c + d)] / (a + b + c + d)²

[0076] The relationship among a, b, c, and d in the above formulas is explained with reference to Fig. 16.

a is the number of data belonging to cluster 1 that were classified as cluster 1, b is the number of data belonging to cluster 1 that were classified as cluster 2, and a + b is the number of data belonging to cluster 1. Similarly, d is the number of data belonging to cluster 2 that were classified as cluster 2, c is the number of data belonging to cluster 2 that were classified as cluster 1, and c + d is the number of data belonging to cluster 2. a + c is the number classified as cluster 1 among all the data a + b + c + d, and b + d is the number classified as cluster 2 among all the data a + b + c + d.
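With a, b, c, and d defined this way, the three quantities can be computed directly; a minimal sketch with hypothetical counts (not taken from Fig. 15) follows.

```python
def kappa(a, b, c, d):
    """Overall corrected judgment rate for the 2x2 counts of Fig. 16."""
    n = a + b + c + d
    p0 = (a + d) / n                                     # agreement rate
    p1 = ((a + b) * (a + c) + (b + d) * (c + d)) / n**2  # chance agreement
    return (p0 - p1) / (1 - p1)

# Hypothetical counts: 8 of 10 cluster-1 data and 9 of 10 cluster-2
# data judged correctly.
print(kappa(8, 2, 1, 9))  # 0.7
```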

[0077] Next, Fig. 17 shows the judgment results obtained by calculating, with the calculation method of the first embodiment, the Mahalanobis distance to each learning datum of clusters 1 to 3 shown in Fig. 14. Figs. 17(a) and (b) are read in the same way as Fig. 15, so their explanation is omitted. The correct-answer rate p0, the chance-agreement probability p1, and the overall corrected judgment rate κ can be seen to be equivalent to those of the conventional calculation method of Fig. 15. Here, the feature quantity set corresponding to each cluster was calculated by the method, described above, of selecting from all the combinations the combination with the maximum discrimination criterion value λ for each cluster: the combination of feature quantities a and h was used as the feature quantity set corresponding to cluster 1, the combination of a and d for cluster 2, and the combination of a and g for cluster 3.

[0078] Next, Fig. 18 shows the judgment results obtained by calculating, with the calculation method of the second embodiment, the Mahalanobis distance to each learning datum of clusters 1 to 3 shown in Fig. 14. Figs. 18(a) and (b) are read in the same way as Fig. 15, so their explanation is omitted. The correct-answer rate p0 is 0.8333, the chance-agreement probability p1 is 0.3333, and the overall corrected judgment rate κ is 0.75, showing that the classification accuracy is improved compared with the conventional calculation method of Fig. 15. Here, the feature quantity sets corresponding to each cluster were calculated by the method of selecting, from all the combinations, the combinations with the top three discrimination criterion values λ for each cluster: the three combinations a·h, a·g, and d·e were used as the feature quantity sets corresponding to cluster 1, the three combinations a·f, a·d, and a·b for cluster 2, and the three combinations e·g, a·c, and a·g for cluster 3.

For the voting judgment, the distances were arranged in ascending order of Mahalanobis distance, the number of appearances of each cluster within the three smallest was counted, and the cluster with the largest count was taken as the cluster to which the classification target data belongs.

[0079] Next, Fig. 19 shows the judgment results obtained with the calculation method of the second embodiment by calculating the Mahalanobis distance to each learning datum of clusters 1 to 3 shown in Fig. 14, multiplying the calculated Mahalanobis distances by the correction coefficient (λ)^(−1/2), and then ranking the distances. Figs. 19(a) and (b) are read in the same way as Fig. 15, so their explanation is omitted. The correct-answer rate p0 is 0.8333, the chance-agreement probability p1 is 0.3333, and the overall corrected judgment rate κ is 0.75, showing that the classification accuracy is improved compared with the conventional calculation method of Fig. 15. Here, the feature quantity sets corresponding to each cluster were calculated by the method of selecting, from all the combinations, the combinations with the top three discrimination criterion values λ for each cluster: the three combinations a·h, a·g, and d·e were used as the feature quantity sets corresponding to cluster 1, the three combinations a·f, a·d, and a·b for cluster 2, and the three combinations e·g, a·c, and a·g for cluster 3. For the voting judgment, the distances were arranged in ascending order of Mahalanobis distance, the number of appearances of each cluster within the three smallest was counted, and the cluster with the largest count was taken as the cluster to which the classification target data belongs.

[0080] The classification results shown in FIGS. 15, 17, 18, and 19 described above show that the present embodiment performs clustering faster and with higher accuracy than the conventional example, confirming the superiority of the present embodiment over the conventional example.

[0081] <Application examples of the present invention>

A. Inspection apparatus

As shown in FIG. 20, an inspection apparatus (defect detection apparatus) that classifies the types of scratches on the surface of an object to be inspected, for example a glass substrate, is described. FIG. 21 is a flowchart explaining an operation example of selecting a feature quantity set, and FIG. 22 is a flowchart explaining an operation example of the clustering process.

First, the operation of selecting the feature quantity set is described. The collection of learning data in step S1 of the flowchart of FIG. 5 corresponds to steps S101 to S105 of the flowchart of FIG. 21.

Steps S2 to S4 in FIG. 21 are the same as in the flowchart of FIG. 5, so their explanation is omitted.

[0082] Through operator operations, samples of learning data corresponding to each of the clusters into which the scratch types are to be classified are collected (step S101).

The illumination device 102 illuminates the scratch shapes collected as learning data, and the image acquisition unit 101 acquires image data of the scratch portions via the imaging device 103 (step S102). Then, the feature quantities of the scratches in each piece of learning data are calculated from the image data acquired by the image acquisition unit 101 (step S103).

The feature quantities of the obtained learning data are assigned to the classification destinations determined by visual inspection, thereby specifying the learning data belonging to each cluster (step S104).

The processing of steps S101 to S102 is repeated until the learning data of each cluster reaches a predetermined number (a preset number of samples), for example about 300 pieces each. Once the predetermined number is reached, the clustering unit 105 performs the processing from step S2 onward, already described with reference to FIG. 5. Here, the clustering unit 105 is the clustering system of the first or second embodiment.
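Expressed as code, the collection phase amounts to a loop of the following kind. The sketch is an illustration only: it folds steps S101 to S104 into one loop for brevity, and the four callables are hypothetical stand-ins for the operator's sample collection, the imaging hardware, the feature computation, and the visual assignment, none of which are prescribed by the patent.

```python
TARGET_PER_CLUSTER = 300  # the predetermined sample count suggested in the text

def collect_learning_data(cluster_ids, acquire_image, extract_features, assign_visually):
    learning = {c: [] for c in cluster_ids}
    while any(len(v) < TARGET_PER_CLUSTER for v in learning.values()):
        image = acquire_image()              # S102: illuminate and image a scratch
        features = extract_features(image)   # S103: compute its feature quantities
        cluster = assign_visually(image)     # S104: destination decided by eye
        if len(learning[cluster]) < TARGET_PER_CLUSTER:
            learning[cluster].append(features)
    return learning                          # then hand over to step S2 onward
```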

[0083] Next, the clustering process in the inspection apparatus of FIG. 4 is described with reference to FIG. 22.

Here, steps S31 to S34, S55, and S56 in FIG. 22 are the same as in the flowchart of FIG. 10, so their explanation is omitted.

In the inspection apparatus of FIG. 20, when inspection starts, the illumination device 102 illuminates the glass substrate that is the object to be inspected 100, and the imaging device 103 photographs the surface of the glass substrate and outputs the captured image to the image acquisition unit 101. When the defect candidate detection unit 104 detects a portion of the captured image input from the image acquisition unit 101 that differs from the flat surface shape, it treats that portion as a defect candidate to be classified (step S201).
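The patent does not state how a portion differing from the flat surface shape is found. Purely as an assumption, the sketch below uses the common approach of differencing against a defect-free reference image and labeling connected regions above a threshold; the reference image, the threshold value, and the scipy dependency are all choices of the sketch.

```python
import numpy as np
from scipy import ndimage

def detect_defect_candidates(image, reference, threshold=30):
    """Return one bounding box per candidate region (cf. step S201)."""
    diff = np.abs(image.astype(int) - reference.astype(int))
    labels, _ = ndimage.label(diff > threshold)  # connected deviating regions
    return ndimage.find_objects(labels)
```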

[0084] Next, the defect candidate detection unit 104 cuts out the image data of the defect candidate portion from the captured image as classification target data.

The defect candidate detection unit 104 then calculates feature quantities from the image data of the classification target data, and outputs the classification target data, consisting of the set of extracted feature quantities, to the clustering unit 105 (step S202).
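The concrete feature quantities are not specified at this point in the text. The following sketch computes a few plausible ones (area, aspect ratio, total and peak intensity) from a cut-out grayscale patch; the choice of quantities and the convention that background pixels are zero are assumptions made for illustration.

```python
import numpy as np

def scratch_features(patch):
    """patch: 2-D grayscale array cut out around one defect candidate,
    with background pixels set to zero (an assumed convention)."""
    ys, xs = np.nonzero(patch)
    if ys.size == 0:
        return np.zeros(4)
    height = np.ptp(ys) + 1
    width = np.ptp(xs) + 1
    return np.array([
        float(ys.size),       # area in pixels
        width / height,       # aspect ratio
        float(patch.sum()),   # total intensity
        float(patch.max()),   # peak brightness
    ])
```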

The subsequent clustering processing has already been described with reference to the steps of FIG. 10 and is therefore omitted here. As described above, the inspection apparatus of the present invention can classify scratches on a glass substrate with high accuracy by scratch type.

[0085] B. Defect type determination device

In the defect type determination device shown in FIG. 23, the clustering unit 105 corresponds to the clustering system of the present invention described above.

The image acquisition device 201 is composed of the image acquisition unit 101, the illumination device 102, and the imaging device 103 shown in FIG. 20.

The learning data of each cluster into which the classification target data is to be classified has already been acquired and is held in the cluster database 5 of the clustering unit 105. Accordingly, the selection of the feature quantity sets in FIG. 5 has also been completed.

[0086] Defect candidates are detected from the captured images input from the image acquisition device 202 attached to each manufacturing device; the corresponding image data is cut out, feature quantities are extracted, and the result is output to the data collection device 203. The control device 200 transfers the classification target data input to the data collection device 203 to the clustering unit 105. Then, as already described, the clustering unit 105 classifies the input classification target data into the clusters corresponding to the scratch types.

[0087] C. Manufacturing management device

As shown in FIG. 24, the manufacturing management device of the present invention is composed of a control device 300, manufacturing devices 301 and 302, a notification unit 303, a recording unit 304, a defective device determination unit 305, and a defect type determination device 306. Here, the defect type determination device 306 is the same as the defect type determination device described in section B above.

The defect type determination device 306 performs image processing on the captured images from the image acquisition devices 201 and 202, provided on the manufacturing device 301 and the manufacturing device 302 respectively, extracts feature quantities in the corresponding defect candidate detection unit 104, and classifies the classification target data.

[0088] Next, the defective device determination unit 305 has a table indicating the relationship between the identification information of each cluster and the cause of occurrence corresponding to that cluster. It reads from the table the cause of occurrence corresponding to the identification information of the classification-destination cluster input from the defect type determination device 306, and determines the manufacturing device that is the cause. That is, the defective device determination unit 305 detects the cause of defect occurrence in the product manufacturing process in accordance with the cluster identification information.

The defective device determination unit 305 then notifies the operator via the notification unit 303, and causes the recording unit 304 to store, as a history associated with the date and time of the determination, the identification number of the cluster into which the defect was classified, the cause of occurrence, and the identification information of the manufacturing device concerned. The control device 300 also stops the manufacturing device determined by the defective device determination unit 305 or controls its control parameters.
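A minimal sketch of this table lookup and history recording follows. Only the existence of the table is disclosed, so the table contents, the cause names, and the print/list stand-ins for the notification unit 303 and recording unit 304 are hypothetical.

```python
from datetime import datetime

# Hypothetical table of the defective device determination unit 305:
# cluster identification -> (cause of occurrence, manufacturing device).
CAUSE_TABLE = {
    "cluster1": ("roller flaw", "manufacturing device 301"),
    "cluster2": ("cullet contamination", "manufacturing device 302"),
}

history = []  # stands in for the recording unit 304

def judge_defective_device(cluster_id):
    cause, device = CAUSE_TABLE[cluster_id]
    print(f"{device}: {cause}")  # stands in for the notification unit 303
    history.append((datetime.now(), cluster_id, cause, device))
    return device                # the control device 300 may then stop it
```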

[0089] D. Manufacturing management device

As shown in FIG. 25, another manufacturing management device of the present invention is composed of a control device 300, manufacturing devices 301 and 302, a notification unit 303, a recording unit 304, and a clustering unit 105. Here, the clustering unit 105 has the same configuration as described in sections A and B above.

In the clustering unit 105, unlike in cases A to C described above, the feature data of the classification target data consists of feature quantities representing the manufacturing conditions (quantities of materials, processing temperature, pressure, processing speed, and the like) in the manufacturing process of an industrial product, for example a glass substrate, and the data is classified by the manufacturing state of each step of the manufacturing process. The feature quantities are input to the clustering unit 105 as process information detected by the sensors provided in the manufacturing devices 301 and 302.

[0090] That is, based on the feature quantities of the classification target data, the clustering unit 105 classifies the manufacturing state of the glass manufacturing process in each step of each manufacturing device into clusters such as "normal state", "state in which defects occur easily and adjustment is required", and "dangerous state in which adjustment is required". The clustering unit 105 then notifies the operator of the classification result via the notification unit 303, outputs the identification information of the resulting cluster to the control device 300, and causes the recording unit 304 to store, as a history associated with the date and time of the determination, the identification number of the cluster into which the manufacturing state of each step was classified, the manufacturing condition that is the most problematic feature quantity, and the identification information of the manufacturing device concerned.

The control device 300 has a table indicating the correspondence between cluster identification information and the adjustment items, together with their data, for returning the manufacturing conditions to normal. It reads the adjustment items and data corresponding to the cluster identification information input from the clustering unit 105, and controls the corresponding manufacturing device with the read data.
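The corresponding lookup in the control device 300 can be sketched in the same spirit. The state names, the adjustment items and set-point values, and the set_parameter device interface are all hypothetical, since the patent discloses only the existence and purpose of the table.

```python
# Hypothetical table of the control device 300: cluster identification ->
# adjustment items and data for returning manufacturing conditions to normal.
ADJUSTMENT_TABLE = {
    "defect-prone": [("processing temperature", 1520.0)],
    "dangerous":    [("processing temperature", 1500.0), ("processing speed", 0.8)],
}

def restore_normal(cluster_id, manufacturing_device):
    for item, value in ADJUSTMENT_TABLE.get(cluster_id, []):
        manufacturing_device.set_parameter(item, value)  # assumed interface
```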

[0091] A program for realizing the functions of the clustering system of FIG. 1 may be recorded on a computer-readable recording medium, and the clustering of the classification target data may be performed by loading the program recorded on this recording medium into a computer system and executing it. The "computer system" here includes an OS and hardware such as peripheral devices. The "computer system" also includes a WWW system provided with a homepage providing environment (or display environment). The "computer-readable recording medium" refers to a portable medium such as a flexible disk, magneto-optical disk, ROM, or CD-ROM, or a storage device such as a hard disk built into the computer system. The "computer-readable recording medium" further includes a medium that holds the program for a certain period of time, such as the volatile memory (RAM) inside a computer system serving as a server or client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line.

[0092] The program may also be transmitted from a computer system that stores the program in a storage device or the like to another computer system via a transmission medium, or by transmission waves in a transmission medium. Here, the "transmission medium" that transmits the program refers to a medium having the function of transmitting information, such as a network (communication network) like the Internet or a communication line like a telephone line. The program may also realize only part of the functions described above. Furthermore, it may be a so-called difference file (difference program), which realizes the functions described above in combination with a program already recorded in the computer system.

Industrial applicability

The present invention can be applied to fields in which information having many types of feature quantities is classified and discriminated with high accuracy, such as the detection of defects in glass articles, and can further be used in manufacturing state detection devices and product manufacturing management devices. The entire contents of the specification, claims, drawings, and abstract of Japanese Patent Application No. 2006-186628, filed on July 6, 2006, are cited here and incorporated as the disclosure of the specification of the present invention.

Claims

[1] A clustering system that classifies input data into each of clusters formed by populations of learning data according to feature quantities of the input data, the clustering system comprising:
a feature quantity set storage unit that stores, in correspondence with each of the clusters, a feature quantity set that is a combination of feature quantities used for classification;
a feature quantity extraction unit that extracts preset feature quantities from the input data;
a distance calculation unit that, for each feature quantity set corresponding to each cluster, calculates and outputs, as a set distance, the distance between the center of the population of that cluster and the input data, based on the feature quantities included in the feature quantity set; and
a rank extraction unit that arranges the set distances in ascending order.
[2] The clustering system according to claim 1, wherein a plurality of the feature quantity sets are set for each cluster.
[3] The clustering system according to claim 2, further comprising a cluster classification unit that detects which cluster the input data belongs to, based on a rule pattern indicating classification criteria for classifying the input data into each cluster, the rule pattern being set based on the ranks of the set distances obtained for each feature quantity set.
[4] The clustering system according to claim 3, wherein the cluster classification unit detects which cluster the input data belongs to from the ranks of the set distances, and detects the cluster for which many set distances are highly ranked as the cluster to which the input data belongs.
[5] The clustering system according to claim 4, wherein the cluster classification unit has a threshold for the number of highly ranked set distances, and detects a cluster as the cluster to which the input data belongs if the number of its highly ranked set distances is equal to or greater than the threshold.
[6] The clustering system according to any one of claims 1 to 5, wherein the distance calculation unit multiplies each set distance by a correction coefficient set in correspondence with the feature quantity set, thereby standardizing the set distances among the feature quantity sets.
[7] The clustering system according to any one of claims 1 to 6, further comprising a feature quantity set creation unit that creates the feature quantity set for each cluster, wherein, for each of a plurality of combinations of feature quantities, the feature quantity set creation unit takes the average value of the learning data of the population of each cluster as an origin, obtains the average value of the distances between this origin and each piece of learning data of the populations of the other clusters, and selects the combination of feature quantities having the largest average value as the feature quantity set used for discriminating that cluster from the other clusters.
[8] A defect type determination device provided with the clustering system according to any one of claims 1 to 7, wherein the input data is image data of a defect of a product, and defects in the image data are classified by defect type according to feature quantities representing the defects.
[9] The defect type determination device according to claim 8, wherein the product is a glass article and the defects of the glass article are classified by defect type.
[10] A defect detection device that detects the type of a defect of a product, provided with the defect type determination device according to claim 8 or 9.
[11] A manufacturing state determination device provided with the defect type determination device according to claim 8 or 9, which determines the type of a defect of a product and detects the cause of defect occurrence in the manufacturing process based on the correspondence between the type and the cause of occurrence corresponding to that type.
[12] A manufacturing state determination device provided with the clustering system according to any one of claims 1 to 7, wherein the input data is feature quantities representing manufacturing conditions in the manufacturing process of a product, and the feature quantities are classified by the manufacturing state of each step of the manufacturing process.
[13] The manufacturing state determination device according to claim 12, wherein the product is a glass article, and the feature quantities in the manufacturing process of the glass article are classified by the manufacturing state of each step of the manufacturing process.
[14] A manufacturing state detection device provided with the manufacturing state determination device according to claim 12 or 13, which detects the type of manufacturing state in each step of the manufacturing process of a product.
[15] A product manufacturing management device provided with the manufacturing state determination device according to claim 12 or 13, which detects the type of manufacturing state in each step of the manufacturing process of a product and performs process control in the steps of the manufacturing process based on control items corresponding to the type.
