WO2025224959A1 - Information processing device, training method, and program - Google Patents
- Publication number
- WO2025224959A1 (PCT/JP2024/016359)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- unit
- feature
- cluster
- data
- inference
- Prior art date
- Legal status
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Definitions
- This disclosure relates to an information processing device, a learning method, and a program.
- Non-Patent Document 1 describes a technique in which features extracted from image data are clustered, and the clusters obtained by the clustering are assigned as pseudo labels to each image.
- Non-Patent Document 1 had the problem that if labels were not assigned to each feature point within a cluster, it was not possible to perform learning by assigning pseudo-labels to these points. For example, the conventional technology assumes that a pseudo-label is assigned to each image corresponding to a cluster, and therefore cannot be applied to image segmentation, which requires that a pseudo-label be assigned to each pixel within an image.
- the present disclosure aims to solve the above problem and provide an information processing device capable of learning by assigning pseudo labels to each feature point of a cluster.
- the information processing device includes a feature extraction unit that extracts features from data, a clustering unit that determines the cluster to which a feature belongs by clustering the features in feature space, a pseudo label assignment unit that sequentially sets ranges within the cluster, generates pseudo labels based on cluster information corresponding to the features included within the range, and assigns the generated pseudo label to at least one feature included within the range, an inference unit that calculates an inferred value based on the feature, and a parameter update unit that updates parameters of the feature extraction unit and the inference unit based on the pseudo label and the inferred value.
- the information processing device disclosed herein enables learning in which pseudo labels are assigned to each feature point of a cluster.
- FIG. 1 is a block diagram showing an example of the configuration of an information processing device according to a first embodiment.
- FIG. 2 is a flowchart illustrating an operation of the information processing device according to the first embodiment.
- FIG. 3 is a flowchart showing a learning method according to the first embodiment.
- FIG. 4 is a diagram illustrating an overview of a process for extracting features from data.
- FIG. 5 is a diagram illustrating an overview of clustering of feature amounts.
- FIG. 6 is a diagram illustrating an overview of a process for assigning a pseudo label to each point in a cluster.
- FIG. 7 is a diagram illustrating an outline of inference using feature amounts.
- FIG. 8 is a diagram illustrating an overview of parameter updates in a feature extraction unit and an inference unit.
- FIG. 9 is a block diagram showing a hardware configuration for realizing the functions of an information processing device according to the first embodiment.
- FIG. 10 is a block diagram showing a configuration example of an information processing device according to a second embodiment.
- FIG. 11 is a flowchart showing a learning method according to the second embodiment.
- FIGS. 12A, 12B, 12C, and 12D are diagrams showing examples of data having spatial continuity.
- Embodiment 1. In supervised learning, a correct label (hereinafter simply referred to as "label") is assigned to a dataset, whereas in unsupervised learning, unlabeled data is used.
- A label is identification information used to identify the cluster into which each piece of data constituting the dataset is classified.
- A cluster is a set of points with similar features. Features are expressed in the form of numbers, categories, or the like for each piece of data constituting the dataset. Points whose features share characteristics that differ from those of other clusters are classified into the same cluster.
- the information processing device is a device that includes an inference unit and performs inference processing after training the inference unit by updating the parameters of an inference model through unsupervised learning.
- in the unsupervised learning performed by the information processing device according to the first embodiment, pseudo labels corresponding to features extracted from unlabeled data are calculated, and the calculated pseudo labels are assigned to each feature point within a cluster.
- a pseudo label is a label that is artificially assigned to unlabeled data.
- Outline of information processing device: FIG. 1 is a block diagram showing an example configuration of an information processing device 1 according to embodiment 1.
- the information processing device 1 extracts features from data acquired as a dataset, clusters the features in a feature space, sequentially sets ranges within the clusters, generates pseudo labels based on cluster information corresponding to the features included in the ranges, and assigns the pseudo labels to at least one feature included in the ranges.
- each point in the feature space represents a point at each time in time-series data of features, or a feature of each pixel of an image, etc.
- Cluster information is information that indicates the relationship between points that indicate feature quantities and the clusters into which these points are classified, and is generated in the process of clustering features extracted from data.
- Cluster information includes information that indicates which cluster a point that indicates a feature quantity has been classified into, information that indicates the reference point of the cluster, or information that indicates the characteristics or attributes of the cluster.
- Information that indicates which cluster a point that indicates a feature quantity has been classified into includes, for example, information that indicates a cluster number used to identify the cluster. Examples of reference points for a cluster include the center point, center of gravity, average value, or median of the cluster.
- Information that indicates the characteristics or attributes of a cluster is information in which the characteristics or attributes have been quantified, and may be, for example, a score used to evaluate the characteristics of the cluster.
- the score is calculated based on the feature amount. For example, the information processing device 1 assigns the average value of the scores corresponding to the feature amounts included within a range set within the cluster as a pseudo label to the target points included within the range. The information processing device 1 then sets the above range sequentially and assigns a pseudo label to each point so that all points included in the cluster become target points. This allows the information processing device 1 to learn to assign pseudo labels to all feature amount points included in the feature amount space.
- the information processing device 1 also includes a feature extraction unit 11, a clustering unit 12, a pseudo label assignment unit 13, an inference unit 14, and a parameter update unit 15.
- the information processing device 1 is implemented by a computer.
- a memory included in the computer stores information processing application programs for implementing the functions of the feature extraction unit 11, the clustering unit 12, the pseudo label assignment unit 13, the inference unit 14, and the parameter update unit 15.
- a processor included in the computer executes the information processing application read from the memory, thereby implementing the functions of the feature extraction unit 11, the clustering unit 12, the pseudo label assignment unit 13, the inference unit 14, and the parameter update unit 15.
- the feature extraction unit 11 extracts features from data. For example, when extracting features from data in a three-dimensional space, the feature extraction unit 11 selects features to be extracted depending on the purpose for which the data will be used. Position information, normal information, color information, or a combination of these, of an object existing in the three-dimensional space is selected as the feature to be extracted. Then, the feature extraction unit 11 extracts the selected feature.
- when position information is selected as the feature to be extracted, the feature extraction unit 11 extracts the position coordinates of the object as the feature. Furthermore, when shape information is selected as the feature to be extracted, the feature extraction unit 11 calculates a feature (volume, surface area, length, etc.) representing the shape of the object. When color information is selected as the feature to be extracted, the feature extraction unit 11 extracts the hue, saturation, brightness, etc. of the object.
- the feature extraction unit 11 may also scale the features. For example, if the scales of the position information are different, the feature extraction unit 11 may perform normalization to unify the ranges of each dimension. In the case of color information, the feature extraction unit 11 may scale the ranges of color values to unify them.
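- For illustration only (not part of the disclosure), one way such scaling could be realized is min-max normalization per dimension, sketched below; the feature values used are assumptions.

```python
import numpy as np

def minmax_scale(features: np.ndarray) -> np.ndarray:
    """Rescale each dimension of a feature matrix to the range [0, 1] so that
    dimensions with different units (e.g., position vs. color) become comparable."""
    mins = features.min(axis=0)
    spans = features.max(axis=0) - mins
    spans = np.where(spans == 0, 1.0, spans)   # avoid division by zero for constant dimensions
    return (features - mins) / spans

# e.g., a position coordinate in metres next to a color value in 0..255: very different scales.
points = np.array([[0.5, 200.0], [1.5, 50.0], [2.0, 125.0]])
print(minmax_scale(points))
```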
- the learning data from which features are extracted includes sensor data, web data, image data, video data, text data, and the like.
- the sensor data is data detected by a physical sensor.
- the sensor data may include data such as temperature, humidity, or atmospheric pressure detected by a weather sensor, environmental monitoring data detected by a sensor network, and patient biometric information detected by a medical sensor.
- the feature extraction unit 11 can extract features from the sensor data acquired from the sensor by the information processing device 1.
- Web data is data available on the Internet, such as information from web pages or user behavior data.
- the information processing device 1 may acquire web data by web scraping using a communication unit not shown in FIG. 1.
- the information processing device 1 may also acquire data from social media using the communication unit.
- the feature extraction unit 11 is capable of extracting features from the web data acquired by the information processing device 1 using the communication unit.
- Image data and video data are information collected using cameras or sensors, and include, for example, video data captured by surveillance cameras, medical image data, camera image data for automobile detection or segmentation, and the like.
- the feature extraction unit 11 is capable of extracting features from image data or video data acquired by the information processing device 1 from a camera or a sensor.
- Text data is text information collected from books, news articles, web pages, social media posts, etc. It is used for natural language processing (NLP) tasks or text mining.
- the feature extraction unit 11 is able to extract features from text data acquired by the information processing device 1 using the communication unit.
- the feature extraction unit 11 may also extract features by a convolution operation, for example using a convolutional neural network (CNN).
- a matrix called a convolution kernel is used in the convolution operation.
- the convolution kernel is a weight matrix used when performing a convolution operation on input data.
- the convolution kernel functions as a filter for detecting specific patterns or features.
- the feature extraction unit 11 slides the convolution kernel along each position of the data, sequentially calculates the product of each element of the convolution kernel and the corresponding element of the data, and extracts the value at a specific position of the convolution kernel as a feature. This makes it possible to extract features that indicate local characteristics of the data.
- the feature extraction unit 11 also extracts features from data based on the values of parameters set therein. For example, when the feature extraction unit 11 extracts features through a convolution operation using a CNN model, parameters for setting the size and stride of the convolution kernel affect the feature extraction. The larger the convolution kernel size, the wider the range of features extracted, and the larger the stride, the fewer the number of features extracted. Furthermore, when the feature extraction unit 11 extracts features by moving a moving window over sequence data such as time-series data or text data, the values of parameters for setting the window width and step size affect the feature extraction: the larger the window width, the wider the range of features extracted, and the larger the step size, the fewer the number of features.
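- As a concrete illustration of how the kernel size and stride affect extraction, the following sketch (an illustrative NumPy example, not the implementation of the feature extraction unit 11) slides a one-dimensional convolution kernel over data; a larger kernel covers a wider range and a larger stride yields fewer features.

```python
import numpy as np

def conv1d_features(data: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
    """Slide a 1-D convolution kernel over the data and return one feature per kernel position."""
    k = len(kernel)
    positions = range(0, len(data) - k + 1, stride)
    # At each position the feature is the sum of element-wise products (a local pattern response).
    return np.array([np.dot(data[p:p + k], kernel) for p in positions])

x = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0])
edge_kernel = np.array([-1.0, 0.0, 1.0])            # emphasizes local change
print(conv1d_features(x, edge_kernel, stride=1))    # 5 features
print(conv1d_features(x, edge_kernel, stride=2))    # 3 features: a larger stride extracts fewer features
```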
- When the data from which features are to be extracted by convolution operations is one-dimensional data, two-dimensional data, three-dimensional data, or graph data, the feature extraction unit 11 performs one-dimensional convolution, two-dimensional convolution, three-dimensional convolution, or graph convolution, respectively, to extract features.
- One-dimensional convolution: one-dimensional data includes, for example, time series data, audio data, sensor data, etc.
- the convolution kernel is defined as a one-dimensional array.
- the feature extraction unit 11 slides the convolution kernel along each position of the data and sequentially calculates the product of each element of the convolution kernel and the corresponding element of the data.
- By extracting the value at a specific position of the convolution kernel as a feature, a feature indicating local characteristics that emphasize the relationship between neighboring elements in the data is extracted.
- the position of the feature from which the feature is extracted may be the beginning, the center, or the end of the convolution kernel.
- Two-dimensional convolution: two-dimensional data includes, for example, image data.
- the convolution kernel is defined as a two-dimensional matrix.
- the feature extraction unit 11 slides the convolution kernel along each position of the data and sequentially calculates the product of each element of the convolution kernel and the corresponding element of the data.
- By extracting values at specific positions of the convolution kernel as features, features indicating local characteristics that emphasize the relationship between neighboring elements in the data are extracted.
- the position of the feature from which the feature is extracted may be the beginning, the center, or the end of the convolution kernel.
- Three-dimensional convolution: three-dimensional data includes, for example, video data.
- the convolution kernel is defined as a 3D matrix having three dimensions: width, height, and channel.
- the feature extraction unit 11 slides the convolution kernel along each position of the data and sequentially calculates the product of each element of the convolution kernel and the corresponding element of the data.
- By extracting values at specific positions of the convolution kernel as features, features indicating local characteristics that emphasize the relationship between neighboring elements in the data are extracted.
- the position of the feature from which the feature is extracted may be the beginning, the center, or the end of the convolution kernel.
- the feature extraction unit 11 may perform graph convolution, which is a convolution operation on data having a graph structure.
- Examples of data having a graph structure include social networks, power grids, and molecular structures.
- In graph convolution, a graph structure is provided as input data.
- a graph is composed of a set of nodes (vertices) and edges (lines), where the nodes represent elements of the data and the edges represent relationships between the nodes.
- the feature extraction unit 11 performs a convolution operation while moving a convolution kernel on the graph, thereby extracting specific patterns in the graph as features.
- the feature extraction unit 11 may perform two-dimensional convolution on the three-dimensional data in the spatial direction, and then perform one-dimensional convolution on the three-dimensional data in the time direction.
- the feature extraction unit 11 may convert the image data into a feature vector of a fixed length, and then perform one-dimensional convolution in the time direction.
- the feature extraction unit 11 may perform dilated convolution, which uses a convolution kernel to convolve elements at regular intervals in the data.
- the feature extraction unit 11 may extract features using a recurrent neural network (RNN) such as a long short-term memory (LSTM) or a gated recurrent unit (GRU). Furthermore, the feature extraction unit 11 may extract features using an attention-based model such as a Transformer, or an MLP-Mixer that replaces attention.
- the clustering unit 12 determines the cluster to which a feature belongs by clustering the feature in the feature space. Clustering is a process of grouping points with similar feature amounts. For example, the clustering unit 12 calculates the distance or similarity between points in the feature space and clusters the feature amounts based on the calculated distance or similarity.
- the distance can be Euclidean distance or Manhattan distance.
- the similarity can be cosine similarity.
- Typical clustering methods include, for example, k-means, Gaussian mixture models (GMM), hierarchical (agglomerative) clustering, spectral clustering, DBSCAN, OPTICS, BIRCH, or Mean Shift.
- the clustering unit 12 uses the selected clustering method to assign each feature point to a cluster. At this time, clustering is performed based on the distance or similarity between points in the feature space.
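- As an illustrative sketch of this step (using scikit-learn's k-means, which is only one of the methods listed above; the feature values are assumptions), each feature point is assigned a cluster number based on Euclidean distance in the feature space.

```python
import numpy as np
from sklearn.cluster import KMeans

# Feature points in a two-dimensional feature space (one row per point).
features = np.array([[0.1, 0.2], [0.0, 0.1], [5.0, 5.1], [5.2, 4.9], [0.2, 0.0]])

# k-means groups points that are close to each other in the feature space.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)

# cluster_labels[i] is the cluster number of feature point i
# (one form of the "cluster information" described in this disclosure).
cluster_labels = kmeans.labels_
print(cluster_labels)
```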
- the pseudo label assignment unit 13 sequentially sets ranges within a cluster, generates pseudo labels based on cluster information corresponding to feature quantities included within the ranges, and assigns pseudo labels to at least one feature quantity included within the ranges.
- the pseudo label assignment unit 13 assigns pseudo labels to all points in the feature quantity space by performing the above process for each cluster set in the feature quantity space.
- the range set within a cluster is a moving window that includes one or more feature points and is defined, for example, in the same dimensions as the features. This range is set sequentially so that every point included in the cluster becomes a target for pseudo label assignment.
- the pseudo label may be any one of the mode, median, average, minimum, and maximum values of the cluster information corresponding to the feature amount included in the range set in the cluster. For example, if the range set within a cluster is a fixed-length moving window for one-dimensional data of features, and the cluster information corresponding to the features included in the moving window is a cluster number, the pseudo label assignment unit 13 calculates one of the mode, median, average, minimum, or maximum of the cluster number corresponding to the features included in the moving window as the pseudo label. Since pseudo labels can be generated that statistically take into account cluster information corresponding to the features contained in the clusters, the information processing device 1 can more accurately represent the features or patterns of the data.
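- For one-dimensional features whose cluster information is a cluster number, the mode within a fixed-length moving window could, for example, be computed as in the following sketch (illustrative only; the window size and the clipping at the boundaries are assumptions).

```python
import numpy as np
from collections import Counter

def window_mode_pseudo_labels(cluster_numbers, window_size=5):
    """Assign to each point the mode of the cluster numbers inside a moving window centered on it."""
    half = window_size // 2
    n = len(cluster_numbers)
    pseudo = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)    # clip the window at the sequence ends
        counts = Counter(cluster_numbers[lo:hi])
        pseudo.append(counts.most_common(1)[0][0])         # mode within the window
    return np.array(pseudo)

cluster_numbers = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1]   # cluster number per feature point
print(window_mode_pseudo_labels(cluster_numbers))  # pseudo label per feature point
```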
- the pseudo label assignment unit 13 may change the size of the range (moving window) set within the cluster as needed. Changing the size of the range makes it possible to extract features over different ranges. For example, when the processing accuracy is set using an input device not shown in FIG. 1, the pseudo label assignment unit 13 sets a range size according to this processing accuracy and performs the pseudo label assignment process. In this case, if the size of the range is changed significantly, the range from which features are extracted becomes wider, and if the size of the range is decreased, more localized features are extracted. The pseudo label assignment unit 13 may also change the size of the range to the size set by the user using the input device.
- the pseudo label assignment unit 13 may assign pseudo labels to one-dimensional features using a filter with a feedback loop, such as an IIR (Infinite Impulse Response) filter. For example, the pseudo label assignment unit 13 adjusts the parameters of the IIR filter to control the position and size of the moving window, sequentially setting the moving window within the cluster, and assigns the information obtained by the IIR filter as pseudo labels to the features within the moving window.
- the pseudo label assignment unit 13 may assign pseudo labels to two-dimensional features using an image filter.
- An image filter is a filter that uses a convolution operation to identify a specific area on two-dimensional data and transform the information within that area.
- Image filters include smoothing filters, edge detection filters, and sharpening filters.
- the pseudo label assignment unit 13 adjusts the parameters of the image filter to control the position and size of the moving window, sequentially setting the moving window within the cluster, and assigns the information obtained by the image filter as a pseudo label to the feature within the moving window.
- the pseudo label assignment unit 13 may assign pseudo labels using an averaging filter, a weighted average filter, a median filter, or a Laplacian filter.
- the averaging filter is, for example, a filter that replaces the value of each pixel in an image with the average value of the values of its surrounding pixels.
- the pseudo label assignment unit 13 uses the averaging filter to sequentially set moving windows on clusters in the image and assigns the average value of all pixel values included in the moving windows as a pseudo label to the feature to be assigned.
- a weighted average filter is, for example, a filter that calculates an average value by weighting the values of pixels surrounding a pixel of interest.
- the pseudo label assignment unit 13 uses the weighted average filter to sequentially set a moving window within a cluster on the image, calculate the average value of all pixel values included in the moving window, and assign the calculated value as a pseudo label to the feature to be assigned.
- a median filter is a filter that, for example, calculates the median of the pixel values included in a moving window and uses the calculated value as the value of the pixel at the center of the moving window.
- the pseudo label assignment unit 13 uses a median filter to sequentially set moving windows within clusters on the image, calculates the median of the values of all pixels included in the moving window, and assigns the calculated value as a pseudo label to the feature to be assigned.
- a Laplacian filter for example, is a filter that emphasizes edges or features in an image and detects sudden changes in the image using a two-dimensional differential operator.
- the pseudo label assignment unit 13 uses a Laplacian filter to sequentially set moving windows within clusters on the image, calculates values that emphasize abrupt changes among the pixels included in the moving window, and assigns the calculated values as pseudo labels to the feature to be assigned.
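- A minimal sketch of applying such image filters to per-pixel cluster information is given below (illustrative only; it uses SciPy's uniform and median filters with an assumed 3x3 window and assumed score values).

```python
import numpy as np
from scipy.ndimage import uniform_filter, median_filter

# Per-pixel cluster information (e.g., a score for each pixel of an image).
scores = np.array([[0.9, 0.8, 0.1],
                   [0.7, 0.2, 0.1],
                   [0.8, 0.1, 0.0]])

# Averaging filter: the pseudo label of each pixel is the mean of its 3x3 neighborhood.
pseudo_avg = uniform_filter(scores, size=3, mode='nearest')

# Median filter: the pseudo label of each pixel is the median of the same neighborhood.
pseudo_med = median_filter(scores, size=3, mode='nearest')

print(pseudo_avg)
print(pseudo_med)
```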
- the pseudo label assignment unit 13 may assign pseudo labels to the feature quantities of the graph data using a label propagation method or a label diffusion method.
- Label propagation is a method of updating the labels of nodes in a graph based on the labels of surrounding nodes.
- label diffusion is a method of updating the labels of nodes in a graph, but in label diffusion, the labels are updated taking into account not only the node labels but also information such as edge weights and distances.
- the pseudo label assignment unit 13 assigns an initial label to each node and updates the label using the mode or average value of the labels of adjacent nodes. This allows the labels to be propagated or diffused throughout the graph.
- the pseudo label assignment unit 13 assigns pseudo labels based on the labels of the feature quantities included in the moving window. By repeating these processes until the label updates converge, it is possible to assign pseudo labels to each feature quantity point within the cluster.
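- The label update by the mode of adjacent nodes could, for a small graph, look like the following sketch (the adjacency list, initial labels, and iteration limit are assumptions for illustration).

```python
from collections import Counter

def propagate_labels(adjacency, labels, max_iters=10):
    """Repeatedly replace each node's label with the mode of its neighbors' labels until no label changes."""
    labels = dict(labels)
    for _ in range(max_iters):
        changed = False
        for node, neighbors in adjacency.items():
            if not neighbors:
                continue
            mode = Counter(labels[n] for n in neighbors).most_common(1)[0][0]
            if labels[node] != mode:
                labels[node] = mode
                changed = True
        if not changed:        # label updates have converged
            break
    return labels

adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
initial_labels = {0: 'A', 1: 'A', 2: 'B', 3: 'B', 4: 'B'}   # initial label per node
print(propagate_labels(adjacency, initial_labels))
```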
- the inference unit 14 calculates an inferred value based on the features extracted from the data by the feature extraction unit 11. For example, the inference unit 14 uses a model for calculating an inferred value from the features.
- the model is a model of the relationship between the input feature quantities and the output inferred value. Examples of the model include a neural network, a generalized linear model, a generalized additive model, a decision tree, a decision tree ensemble, a support vector machine, and a nearest neighbor model.
- the parameter values of the model are updated by the parameter update unit 15 (described later), and the model is trained so that it can generate an appropriate inferred value from the input data.
- the inference unit 14 uses the trained model to perform processing such as prediction, classification, or regression on the input data.
- When the model is a neural network, activation functions such as ReLU (Rectified Linear Unit), Sigmoid, Softmax, and Tanh are used. Note that it is also possible to output the value obtained by adding a bias to the dot product of the input and the weights without using an activation function.
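- A small sketch of such an inference computation (a single fully connected layer with a Softmax output, using NumPy; the layer sizes and random values are assumptions) is shown below.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))      # subtract the maximum for numerical stability
    return e / e.sum()

def dense(x, weights, bias, activation=None):
    """Return activation(x . W + b); with activation=None the raw affine value is returned."""
    z = x @ weights + bias
    return activation(z) if activation else z

rng = np.random.default_rng(0)
feature = rng.normal(size=4)                    # a feature vector extracted from the data
w, b = rng.normal(size=(4, 3)), np.zeros(3)     # parameters later adjusted by the parameter update unit
print(dense(feature, w, b, softmax))            # inferred value as a 3-class probability vector
```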
- the parameter update unit 15 updates the parameters of the feature extraction unit 11 and the inference unit 14 based on the pseudo label and the inference value. For example, the parameter update unit 15 updates the parameters of the feature extraction unit 11 and the inference unit 14 so that the inference value output from the inference unit 14 matches the pseudo label output from the pseudo label assignment unit 13.
- the parameter update unit 15 uses a loss function to evaluate the degree of agreement between the inferred value and the pseudo label.
- the loss function is an index that represents the difference between the inferred value and the pseudo label, and is used to evaluate the performance of the model. The value of the loss function becomes smaller as the inferred value and the pseudo label value become closer.
- loss functions include cross-entropy when the inferred values are categorical, and mean absolute error (MAE) or mean squared error (MSE) when the inferred values are numeric.
- the parameter update unit 15 updates the parameters of the feature extraction unit 11 and the inference unit 14 so as to minimize the value of the loss function. For example, if the feature extraction unit 11 and the inference unit 14 are capable of gradient calculation of a neural network or the like, the parameter update unit 15 calculates the gradient of the loss function using the gradient descent method and uses the calculated gradient to calculate the amount of update for each parameter of the feature extraction unit 11 and the inference unit 14. The parameter update unit 15 repeats the above procedure to perform learning until the value of the loss function converges. As a result, the parameters are updated in a direction that minimizes the loss function. Gradient descent methods include stochastic gradient descent (SGD), mini-batch gradient descent, Adam, or RMSProp.
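- One way this update step could be realized is sketched below with PyTorch (the model sizes, optimizer, learning rate, and data shapes are assumptions, not the parameters of the disclosure).

```python
import torch
import torch.nn as nn

feature_extractor = nn.Conv1d(1, 8, kernel_size=3, padding=1)   # stands in for the feature extraction unit
inference_head = nn.Linear(8, 10)                                # stands in for the inference unit
optimizer = torch.optim.SGD(
    list(feature_extractor.parameters()) + list(inference_head.parameters()), lr=0.01)
loss_fn = nn.CrossEntropyLoss()            # suitable when the inferred values are categorical

data = torch.randn(1, 1, 32)                       # one unlabeled 1-D sequence of length 32
pseudo_labels = torch.randint(0, 10, (32,))        # one pseudo label per position (from the assignment unit)

features = feature_extractor(data)                 # (1, 8, 32)
logits = inference_head(features.squeeze(0).t())   # (32, 10): an inferred value per position
loss = loss_fn(logits, pseudo_labels)              # difference between inferred values and pseudo labels

optimizer.zero_grad()
loss.backward()     # gradient of the loss with respect to both units' parameters
optimizer.step()    # update the parameters in the direction that decreases the loss
```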
- FIG. 2 is a flowchart showing the operation of the information processing device 1.
- the parameter values of the feature extraction unit 11 are initialized (step ST1).
- the feature extraction unit 11 initializes parameters such as the weights of a model for extracting features with random values or predefined values.
- the information processing device 1 acquires data for learning (step ST2).
- the data acquired by the information processing device 1 is output to the feature extraction unit 11.
- the learning dataset is a dataset for unsupervised learning that does not have a correct answer label assigned.
- the information processing device 1 then performs a learning process using a learning dataset (step ST3).
- the feature extraction unit 11 extracts features from the data.
- the clustering unit 12 determines the cluster to which each feature belongs by clustering the features in feature space.
- the pseudo label assignment unit 13 sequentially sets ranges within the cluster, generates pseudo labels based on cluster information corresponding to the features included within the range, and assigns the generated pseudo label to at least one feature included within the range.
- the inference unit 14 calculates inference values based on the features, and the parameter update unit 15 updates the parameters of the feature extraction unit 11 and the inference unit 14 based on the pseudo labels and inference values.
- It is determined whether the condition for ending learning is satisfied (step ST4). If the condition is satisfied (step ST4; YES), the process proceeds to step ST5. If it is not satisfied (step ST4; NO), the process returns to step ST3 and the learning process is repeated. That is, feature extraction by the feature extraction unit 11, clustering by the clustering unit 12, assignment of pseudo labels by the pseudo label assignment unit 13, inference by the inference unit 14, and parameter updates by the parameter update unit 15 are repeatedly executed until the condition for ending learning is satisfied.
- the condition for ending learning may be, for example, that a predetermined number of repetitions has been reached, or that a predetermined learning time has elapsed.
- the condition for ending learning may also be the following condition (1) or (2).
- (1) Learning is terminated when the number of samples whose pseudo labels change becomes equal to or less than a predetermined value.
- the acquired data is divided into data for learning and data for verification, and the information processing device 1 uses the learning data to update the parameters of the feature extraction unit 11 and the inference unit 14 in step ST3.
- (2) If the value of a loss function representing the difference between an inferred value and a pseudo label does not decrease for a predetermined number of consecutive evaluations on the verification data, the value of the loss function is deemed to have converged and learning is terminated.
- the information processing device 1 may perform clustering and pseudo-labeling, and inference and parameter updating at a 1:1 ratio, or at different ratios. For example, the information processing device 1 may perform clustering and pseudo-labeling once, and then repeat inference and parameter updating ten times according to the stochastic gradient descent method.
- the inference unit 14 performs inference processing using the model trained in step ST4 (step ST5). Inference includes prediction, classification, regression, etc. for new data. When the features of new data are input, the trained model outputs a prediction or classification result.
- FIG. 3 is a flowchart showing the learning method according to the first embodiment, illustrating the detailed processing of step ST3 in FIG. 2.
- the feature extraction unit 11 extracts features from the learning data (step ST1A).
- Fig. 4 is a diagram showing an outline of a process for extracting a feature quantity Z from data X.
- data X is one-dimensional data composed of elements x_i.
- Feature quantity Z is a feature quantity vector composed of feature quantities z_i, where i is an integer equal to or greater than 1.
- the feature extraction unit 11 calculates the neighborhood of x_i included in the data X.
- the neighborhood is a set of other elements existing around the element x_i. For example, it is the k nearest neighbors of the element x_i or the elements within a certain distance range.
- the feature extraction unit 11 calculates a feature z_i using a feature function based on the neighborhood data.
- the feature z_i calculated using the feature function is the element of the feature Z corresponding to each element x_i of the data. In this way, a feature Z including the features z_i is extracted from the one-dimensional data X.
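- A sketch of this neighborhood-based extraction for one-dimensional data is shown below (the feature function used here, the local mean and standard deviation of the neighborhood, is an illustrative assumption).

```python
import numpy as np

def extract_features(x: np.ndarray, k: int = 2) -> np.ndarray:
    """For each element x_i, build a feature z_i from the elements within k positions of it."""
    n = len(x)
    features = []
    for i in range(n):
        lo, hi = max(0, i - k), min(n, i + k + 1)
        neighborhood = x[lo:hi]                                        # elements existing around x_i
        z_i = np.array([neighborhood.mean(), neighborhood.std()])      # hypothetical feature function
        features.append(z_i)
    return np.stack(features)                                          # feature Z: one row per element of X

X = np.array([1.0, 1.2, 0.9, 5.0, 5.1, 4.8])
print(extract_features(X))
```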
- the clustering unit 12 determines the cluster to which the feature belongs by clustering the feature in the feature space (step ST2A).
- FIG. 5 is a diagram showing an overview of clustering of feature quantity Z.
- cluster C is a cluster into which the feature quantity z_i is classified.
- Feature quantity z_i is assigned a label c_i.
- the clustering unit 12 clusters the feature quantity z_i into cluster C according to a clustering algorithm set therein. Examples of clustering algorithms include k-means, hierarchical clustering, and DBSCAN. These algorithms determine the cluster C to which the feature quantity z_i belongs.
- the pseudo label assignment unit 13 sequentially sets ranges (moving windows) within the cluster, generates pseudo labels based on cluster information corresponding to features included within the ranges, and assigns the pseudo labels to at least one feature included within the ranges (step ST3A).
- Fig. 6 is a diagram showing an outline of the process of assigning a pseudo label a_i to each point in cluster C.
- pseudo label A is composed of pseudo labels a_i that are generated based on the labels c_i included in cluster C.
- the pseudo label assignment unit 13 sets a moving window W within cluster C according to the function Neighbor(c_i) and obtains the label c_i of each element within the moving window W.
- the pseudo label assignment unit 13 calculates a pseudo label a_i by aggregating the labels c_i of the elements within the moving window W according to the Aggregate function. For example, the mode or average value of the aggregated labels c_i is calculated as the pseudo label a_i.
- the pseudo label assignment unit 13 assigns the calculated pseudo label a_i to the element to be assigned. As a result, the set of pseudo labels a_i included in the pseudo label A is obtained.
- the inference unit 14 calculates an inference value based on the feature amount (step ST4A).
- Fig. 7 is a diagram showing an outline of inference using the feature quantity Z.
- the inference result Y is the result of inference using the feature quantity Z, and is a set of elements y_i of the inference result based on the feature quantities z_i.
- the parameter update unit 15 updates the parameters of the feature extraction unit 11 and the inference unit 14 based on the pseudo label and the inference value (step ST5A).
- FIG. 8 is a diagram showing an overview of parameter updates for the feature extraction unit 11 and the inference unit 14.
- the loss function Loss(a_i, y_i) is a function that quantifies the difference between the pseudo label a_i and the element y_i of the inference result, and is used to evaluate the degree of match between them.
- the parameter update unit 15 updates each parameter of the feature extraction unit 11 and the inference unit 14 in a direction that minimizes the value of the loss function Loss(a_i, y_i). When the value of the loss function becomes equal to or less than a threshold or a certain number of iterations is reached, the parameter values are considered to have converged, and learning ends.
- FIG. 9 is a block diagram showing a hardware configuration that realizes the functions of the information processing device 1.
- the functions of the feature extraction unit 11, clustering unit 12, pseudo-label assignment unit 13, inference unit 14, and parameter update unit 15 provided in the information processing device 1 are realized by processing circuits. That is, the information processing device 1 includes a processing circuit for executing the processes of steps ST1A to ST5A shown in FIG. 3.
- the processing circuit may be a CPU (Central Processing Unit) that executes a program stored in a memory.
- CPU Central Processing Unit
- the feature extraction unit 11 acquires a learning dataset received from an external device by a communication unit included in the information processing device 1, for example, via the input interface 100. Furthermore, if the dataset is stored in a memory unit included in the information processing device 1, the feature extraction unit 11 reads and acquires the dataset from the memory unit via the input interface 100. In this case, the information processing device 1 does not need to be equipped with a communication unit.
- the inference unit 14 outputs the inference result to an external device, for example, via the output interface 101.
- the inference unit 14 may also transmit the inference result to an external device by controlling the communication unit via the output interface 101.
- the functions of the feature extraction unit 11, clustering unit 12, pseudo label assignment unit 13, inference unit 14, and parameter update unit 15 included in the information processing device 1 are realized by software, firmware, or a combination of software and firmware.
- the software or firmware is written as a program and stored in the memory 103 .
- the processor 102 reads and executes programs stored in the memory 103 to implement the functions of the feature extraction unit 11, clustering unit 12, pseudo label assignment unit 13, inference unit 14, and parameter update unit 15 provided in the information processing device 1.
- the information processing device 1 includes a memory 103 for storing programs that, when executed by the processor 102, result in the processing of steps ST1A to ST5A shown in FIG. 3 being performed. These programs cause a computer to execute the procedures or methods of the processing performed by the feature extraction unit 11, clustering unit 12, pseudo label assignment unit 13, inference unit 14, and parameter update unit 15.
- the memory 103 may be a computer-readable storage medium that stores programs for causing a computer to function as the feature extraction unit 11, clustering unit 12, pseudo label assignment unit 13, inference unit 14, and parameter update unit 15.
- Memory 103 may be, for example, a non-volatile or volatile semiconductor memory such as RAM (Random Access Memory), ROM (Read Only Memory), flash memory, EPROM (Erasable Programmable Read Only Memory), or EEPROM (Electrically Erasable Programmable Read Only Memory) (registered trademark), or a magnetic disk, flexible disk, optical disc, compact disc, mini disc, DVD, etc.
- Some of the functions of the feature extraction unit 11, clustering unit 12, pseudo label assignment unit 13, inference unit 14 and parameter update unit 15 provided in the information processing device 1 may be realized by dedicated hardware, and other functions may be realized by software or firmware.
- the functions of the feature extraction unit 11, the clustering unit 12, and the pseudo-labeling unit 13 may be realized by a processing circuit that is dedicated hardware, and the functions of the inference unit 14 and the parameter update unit 15 may be realized by the processor 102 reading and executing a program stored in the memory 103. In this way, the processing circuit can realize the above functions by hardware, software, firmware, or a combination of these.
- the information processing device 1 includes a feature extraction unit 11 that extracts features from data, a clustering unit 12 that determines a cluster to which a feature belongs by clustering the features in a feature space, a pseudo label assignment unit 13 that sequentially sets ranges within a cluster, generates pseudo labels based on cluster information corresponding to the features included within the ranges, and assigns the generated pseudo label to at least one feature included within the ranges, an inference unit 14 that calculates an inferred value based on the features, and a parameter update unit 15 that updates parameters of the feature extraction unit 11 and the inference unit 14 based on the pseudo label and the inferred value.
- the information processing device 1 can perform learning by assigning a pseudo label to each point of the feature of the cluster. For example, when the learning data is image data, the information processing device 1 can assign a pseudo label to each pixel in the image, and can be applied to image segmentation.
- the feature extraction unit 11 extracts features by convolution operations. This enables the information processing device 1 to extract features that indicate local characteristics of data.
- the pseudo label is one of the mode, median, mean, minimum, or maximum of the cluster information corresponding to the feature amounts included in the range. Because it is possible to generate pseudo labels that statistically consider the cluster information corresponding to the feature amounts included in the cluster, the information processing device 1 can more accurately represent the features or patterns of the data.
- the learning method includes step ST1A in which a feature extraction unit 11 extracts features from data; step ST2A in which a clustering unit 12 determines a cluster to which a feature belongs by clustering the features in a feature space; step ST3A in which a pseudo label assignment unit 13 sequentially sets ranges within a cluster, generates pseudo labels based on cluster information corresponding to the features included in the ranges, and assigns the generated pseudo label to at least one feature included in the range; step ST4A in which an inference unit 14 calculates an inference value based on the feature; and step ST5A in which a parameter update unit 15 updates parameters of the feature extraction unit 11 and the inference unit 14 based on the pseudo label and the inference value.
- the information processing device 1 can perform learning in which pseudo labels are assigned to each point of the feature quantity of a cluster.
- the computer that executes the program according to the first embodiment functions as a feature extraction unit 11 that extracts features from data, a clustering unit 12 that determines the cluster to which a feature belongs by clustering the features in the feature space, a pseudo label assignment unit 13 that sequentially sets ranges within the cluster, generates pseudo labels based on cluster information corresponding to the features included in the ranges, and assigns the generated pseudo label to at least one feature included in the range, an inference unit 14 that calculates an inferred value based on the feature, and a parameter update unit 15 that updates the parameters of the feature extraction unit 11 and the inference unit 14 based on the pseudo label and the inferred value.
- a computer that executes the program according to the first embodiment is capable of performing learning by assigning pseudo labels to each point of the feature quantity of a cluster.
- Embodiment 2. Although the first embodiment does not mention the characteristics of the training data, the information processing device according to the second embodiment acquires data having spatial continuity as training data. As a result, the information processing device according to the second embodiment can assign pseudo labels that take into account the spatial continuity of the training data by extracting features from spatially continuous local data.
- Outline of information processing device: FIG. 10 is a block diagram showing an example configuration of an information processing device 1A according to embodiment 2.
- the information processing device 1A acquires data having spatial continuity and extracts features from the spatially continuous local data.
- the information processing device 1A then sequentially sets ranges within clusters obtained by clustering the features, generates pseudo labels based on cluster information corresponding to the features included within the ranges, and assigns the pseudo labels to at least one feature included within the ranges.
- the information processing device 1A includes a feature extraction unit 11A, a clustering unit 12A, a pseudo-label assignment unit 13A, an inference unit 14A, a parameter update unit 15A, and a data acquisition unit 16.
- the information processing device 1A is realized by a computer.
- a memory included in the computer stores programs constituting information processing applications for realizing the functions of the feature extraction unit 11A, the clustering unit 12A, the pseudo label assignment unit 13A, the inference unit 14A, the parameter update unit 15A, and the data acquisition unit 16.
- a processor included in the computer executes the information processing application read from the memory, thereby realizing the functions of the feature extraction unit 11A, the clustering unit 12A, the pseudo label assignment unit 13A, the inference unit 14A, the parameter update unit 15A, and the data acquisition unit 16.
- the data acquisition unit 16 acquires data having spatial continuity.
- the data acquisition unit 16 acquires the data having spatial continuity by connecting to a sensor device or a server via a network using a communication unit not shown in FIG. 10 .
- Examples of data with spatial continuity include time series data, audio data, image data, video data, spectrograms, text data, three-dimensional point clouds, range Doppler, and graph data.
- Time series data and audio data are one-dimensional data with temporal continuity.
- Image data is two-dimensional data that represents two-dimensional space.
- Video data is data that represents two-dimensional space and also has a time dimension.
- Spectrograms have the dimensions of time and frequency.
- Text data is one-dimensional data with an order dimension.
- Three-dimensional point clouds are data in three-dimensional space.
- Range Doppler is data with the dimensions of distance and velocity.
- Graph data is data represented by node connections.
- feature points extracted from spatially continuous local data may be scalars or vectors.
- features of univariate time-series data are represented by scalars
- features of multivariate time-series data are represented by vectors.
- Color image data is represented by a three-dimensional vector in which each pixel value has color information of R (red), G (green), and B (blue).
- the feature extraction unit 11A extracts features from spatially continuous local data acquired by the data acquisition unit 16. These features also have spatial continuity.
- the feature extraction unit 11A extracts features by performing one-dimensional convolution, two-dimensional convolution, three-dimensional convolution, or graph convolution.
- the clustering unit 12A determines the cluster to which the feature value belongs by clustering the feature values having spatial continuity.
- the clustering unit 12A may perform hard clustering or soft clustering.
- Hard clustering is a method of assigning each feature point to only a unique cluster, and assigns, as cluster information, for example, a single cluster number (an integer value) to each point in the cluster. Examples of hard clustering include k-means clustering.
- Soft clustering is a method in which each point of a feature quantity has the possibility of belonging to multiple clusters, and a score (real value) is assigned to each cluster as cluster information. Examples of soft clustering include GMM.
- the pseudo label assignment unit 13A sequentially sets ranges within a cluster, generates pseudo labels based on cluster information corresponding to the features included in the ranges, and assigns the pseudo label to at least one feature included in the ranges. For example, the pseudo label assignment unit 13A aggregates information on spatially continuous local clusters within the above range and assigns the aggregated information as a pseudo label. In the case of hard clustering, the pseudo label assignment unit 13A assigns information obtained by aggregating cluster numbers (integer values) as pseudo labels. The aggregation may be the mode, median, mean, minimum, maximum, or a quantile of the cluster numbers assigned to the feature points included in the range set within the cluster. In the case of soft clustering, the pseudo label assignment unit 13A assigns, for example, information obtained by aggregating the scores (real values) for the clusters as pseudo labels.
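- The two aggregations could, within one window, look like the following sketch (the window contents and scores are assumed values for illustration).

```python
import numpy as np
from collections import Counter

# Hard clustering: each feature point in the window carries a single cluster number.
hard_window = [2, 2, 3, 2, 1]
hard_pseudo_label = Counter(hard_window).most_common(1)[0][0]    # e.g., mode of the cluster numbers
print(hard_pseudo_label)                                         # -> 2

# Soft clustering: each feature point carries a score (e.g., a GMM responsibility) per cluster.
soft_window = np.array([[0.7, 0.2, 0.1],
                        [0.6, 0.3, 0.1],
                        [0.2, 0.7, 0.1]])
mean_scores = soft_window.mean(axis=0)            # aggregate the real-valued scores over the window
soft_pseudo_label = int(mean_scores.argmax())     # e.g., the cluster with the highest aggregated score
print(mean_scores, soft_pseudo_label)
```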
- the inference unit 14A calculates an inferred value based on the feature extracted from the data by the feature extraction unit 11A. For example, the inference unit 14A calculates the inferred value using a neural network. When the feature is input to the input layer of the neural network, the feature propagates from the input layer to the output layer via the hidden layer, and a final inferred value is generated. In the fully connected layer, a bias is added to the inner product of the input and the weight, and an output value is calculated using an activation function. When the activation function is a sigmoid function, an inferred value for a binary category is output. When the activation function is a Softmax function, an inferred value for a multi-value category is output.
- the parameter update unit 15A updates the parameters of the feature extraction unit 11A and the inference unit 14A based on the pseudo label and the inference value. For example, the parameter update unit 15A updates the parameters of the feature extraction unit 11A and the inference unit 14A so that the inference value output from the inference unit 14A matches the pseudo label output from the pseudo label assignment unit 13A.
- FIG. 11 is a flowchart showing a learning method according to the second embodiment, illustrating the detailed processing of step ST3 in FIG. 2.
- the data acquisition unit 16 acquires data having spatial continuity (step ST1B).
- the feature extraction unit 11A extracts feature amounts from spatially continuous local data (step ST2B).
- the clustering unit 12A determines the cluster to which the feature belongs by clustering the feature in the feature space (step ST3B).
- the pseudo label assignment unit 13A sequentially sets moving windows within the cluster, generates pseudo labels based on cluster information corresponding to the features included in the moving window, and assigns the pseudo label to at least one feature included in the range (step ST4B).
- the inference unit 14A calculates an inference value based on the feature amount (step ST5B).
- the parameter update unit 15A updates the parameters of the feature extraction unit 11A and the inference unit 14A based on the pseudo label and the inference value (step ST6B).
- FIG. 12A is a diagram showing an overview of the process of assigning pseudo labels to feature points within a cluster, which is time-series data.
- the pseudo label assignment unit 13A sets a moving window W1 for cluster D1, which is time-series data.
- the moving window W1 includes point P1, which is the target of assigning a pseudo label, and its neighboring point PN.
- the pseudo label assignment unit 13A calculates a pseudo label by aggregating the label numbers of points P1 and PN included in the moving window W1.
- the pseudo label assignment unit 13A assigns the pseudo label to point P1, which is the target of assignment. By repeating this process, pseudo labels are assigned to all points included in cluster D1. This enables the information processing device 1A to perform learning by assigning pseudo labels to each feature point, which is time-series data.
- FIG. 12B is a diagram showing an overview of the process of assigning pseudo labels to feature points within a cluster, which is two-dimensional data.
- the pseudo label assignment unit 13A sets a moving window W2 within cluster D2, which is two-dimensional data.
- the moving window W2 includes point P2, which is the target of assigning a pseudo label, and its neighboring points.
- the pseudo label assignment unit 13A calculates a pseudo label by aggregating the label numbers of the points included in the moving window W2.
- the pseudo label assignment unit 13A assigns a pseudo label to point P2, which is the target of assignment.
- pseudo labels are assigned to all points included in cluster D2. This enables the information processing device 1A to perform learning by assigning pseudo labels to each feature point, which is two-dimensional data.
- Figure 12C is a diagram showing an overview of the process of assigning pseudo labels to feature points within a cluster, which is three-dimensional data.
- the pseudo label assignment unit 13A sets a moving window W3 within cluster D3.
- the moving window W3 includes point P3, which is the target of assigning a pseudo label, and its neighboring points.
- the pseudo label assignment unit 13A calculates a pseudo label by aggregating the label numbers of the points included in the moving window W3.
- the pseudo label assignment unit 13A assigns the pseudo label to point P3, which is the target of assignment.
- pseudo labels are assigned to all points included in cluster D3. This enables the information processing device 1A to perform learning by assigning pseudo labels to each feature point, which is three-dimensional data.
- FIG. 12D is a diagram showing an overview of the process of assigning pseudo labels to feature points within a cluster, which is graph data.
- the pseudo label assignment unit 13A sets a moving window W4 within cluster D4.
- the moving window W4 includes point P4, which is the target of assigning a pseudo label, and its neighboring points.
- the pseudo label assignment unit 13A calculates a pseudo label by aggregating the label numbers of the points included in the moving window W4.
- the pseudo label assignment unit 13A assigns the pseudo label to point P4, which is the target of assignment.
- pseudo labels are assigned to all points included in cluster D4. This enables the information processing device 1A to perform learning by assigning pseudo labels to each feature point, which is graph data.
- the information processing device 1A acquires time series data as a learning dataset, performs hard clustering on each point of the feature extracted from the time series data, and assigns cluster information, which is an integer value indicating a cluster number, to each point.
- the time-series data is, for example, time-series data of acceleration and angular acceleration detected by sensors attached to a person's hands and feet. This time-series data is segmented by the type of movement of the person wearing the sensors, such as walking, running, or sitting. Since the true value of each segment, i.e., the type of movement, is unknown, no correct labels are provided.
- the operation of the information processing device 1A according to the first modification will be described below with reference to the flowcharts of FIGS. 2 and 3.
- the feature extraction unit 11A extracts features by one-dimensional convolution calculation using a CNN.
- the information processing device 1A randomly initializes parameters of a one-dimensional convolutional layer in a CNN (step ST1).
- the data acquisition unit 16 acquires time-series data of acceleration and angular acceleration detected by sensors attached to the person's hands and feet (step ST2). This enables the information processing device 1A to perform learning in which pseudo labels are assigned to each point of spatially continuous local features.
- the information processing device 1A uses the time-series data acquired by the data acquisition unit 16 as learning data and repeatedly executes the following learning process (processing from step ST1A to step ST5A) a specified number of times (step ST3).
- the feature extraction unit 11A extracts a time series of features from time series data using a one-dimensional convolutional layer of a CNN (step ST1A).
- the clustering unit 12A performs hard clustering on each point of the time-series feature extracted by the feature extraction unit 11A using k-means with a cluster count of 10 (step ST2A). As a result, an integer value indicating a cluster number is assigned as cluster information to each point of the feature included in a cluster.
- the pseudo label assignment unit 13A sequentially sets a moving window of size 5 within the cluster at each time point, takes a majority vote on the cluster information within the moving window to convert it into a pseudo label, and sequentially assigns the pseudo label to the feature at the target time point (step ST3A).
- the inference unit 14A converts the feature quantity at each time point into a 10-dimensional vector using a fully connected layer of a CNN and a Softmax function (step ST4A).
- the parameter update unit 15A updates each parameter of the one-dimensional convolutional layer of the CNN used by the feature extraction unit 11A and the fully connected layer of the CNN used by the inference unit 14A using stochastic gradient descent so as to minimize the value of the cross-entropy loss function calculated based on the pseudo label at each time point and the 10-dimensional vector (step ST5A).
- the processes from step ST1A to step ST5A are repeatedly executed until the designated number of times is reached (step ST4).
- the inference unit 14A performs segmentation on the inference result data (step ST5). For example, the inference unit 14A acquires a time series of features extracted from the time series data by a one-dimensional convolutional layer of a CNN, and converts the features at each time point into a 10-dimensional vector using a fully connected layer and a Softmax function. The inference unit 14A outputs, as a segment, the cluster in which the 10-dimensional vector is maximized at each time point. This allows the information processing device 1A to segment the time series data of acceleration and angular acceleration detected by sensors attached to a person's hands and feet by the type of movement, such as walking, running, or sitting, of the person wearing the sensor.
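- As a rough illustration only, the following sketch mirrors the training loop of Variation 1 described above. The cluster count of 10, the moving window of size 5 with majority voting, the one-dimensional convolutional layer, the fully connected layer with Softmax, the cross-entropy loss, and stochastic gradient descent follow the steps above; the six sensor channels, the feature width of 32, the kernel size, the sequence length, and the iteration count are illustrative assumptions, not values taken from this embodiment.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

x = torch.randn(1, 6, 1000)   # assumed (batch, sensor channels, time) acceleration/angular-acceleration series

feature_extractor = nn.Conv1d(6, 32, kernel_size=9, padding=4)   # step ST1A: 1-D convolutional layer of a CNN
inference_head = nn.Linear(32, 10)                                # step ST4A: fully connected layer (10 clusters)
optimizer = torch.optim.SGD(list(feature_extractor.parameters())
                            + list(inference_head.parameters()), lr=0.1)
loss_fn = nn.CrossEntropyLoss()   # cross-entropy; Softmax is applied implicitly inside this loss

def majority_vote(cluster_ids, window=5):
    """Step ST3A: majority vote of the cluster numbers inside a moving window of size 5."""
    half = window // 2
    padded = np.pad(cluster_ids, half, mode="edge")
    return np.array([np.bincount(padded[t:t + window]).argmax() for t in range(len(cluster_ids))])

for _ in range(20):                                   # steps ST3/ST4: repeat a specified number of times
    z = feature_extractor(x).squeeze(0).T             # step ST1A: one 32-dim feature per time point
    cluster_ids = KMeans(n_clusters=10, n_init=10).fit_predict(z.detach().numpy())   # step ST2A: k-means, 10 clusters
    pseudo = torch.from_numpy(majority_vote(cluster_ids)).long()
    loss = loss_fn(inference_head(z), pseudo)         # steps ST4A/ST5A: cross-entropy against the pseudo labels
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

segments = inference_head(feature_extractor(x).squeeze(0).T).argmax(dim=1)   # step ST5: segment per time point
```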
- the information processing device 1A acquires image data, which is two-dimensional data, as a learning dataset, and performs soft clustering on each point of feature quantities extracted from the image data.
- the information processing device 1A segments the image data on a pixel-by-pixel basis.
- the information processing device 1A uses a trained model trained in advance for segmentation, but performs segmentation in a target domain that is different from the source domain trained in advance. It is assumed that there is no training data in the target domain.
- the operation of the information processing device 1A in Variation 2 will be described below with reference to the flowcharts of FIGS. 2 and 3 .
- the feature extraction unit 11A extracts features by two-dimensional convolution calculation using a CNN.
- the information processing device 1A randomly initializes parameters of a two-dimensional convolutional layer in a CNN (step ST1).
- the data acquisition unit 16 acquires image data (step ST2).
- the information processing device 1A uses the image data acquired by the data acquisition unit 16 as learning data and repeatedly executes the following learning process (processing from step ST1A to step ST5A) a specified number of times (step ST3).
- The feature extraction unit 11A extracts features from the image data using a two-dimensional convolutional layer of a CNN (step ST1A).
- the clustering unit 12A performs soft clustering on the feature of each pixel extracted from the image data by the feature extraction unit 11A using a GMM with 100 clusters (step ST2A).
- A cluster likelihood, which is a score for the cluster, is assigned as cluster information to the feature of each pixel included in a cluster.
- The pseudo label assignment unit 13A sequentially sets a moving window, which is a 3x3 averaging filter, within the cluster, averages the cluster likelihoods of the pixels using the averaging filter, takes the cluster with the maximum averaged likelihood as the pseudo label, and sequentially assigns the pseudo label to the feature of the target pixel (step ST3A).
- the inference unit 14A converts the feature quantity of each pixel into a 100-dimensional vector using a fully connected layer of a CNN and a Softmax function (step ST4A).
- the parameter update unit 15A updates each parameter of the two-dimensional convolutional layer of the CNN used by the feature extraction unit 11A and the fully connected layer of the CNN used by the inference unit 14A using stochastic gradient descent so as to minimize the value of the cross-entropy loss function calculated based on the pseudo label of each pixel and the 100-dimensional vector (step ST5A).
- the processes from step ST1A to step ST5A are repeatedly executed until the designated number of times is reached (step ST4).
- the inference unit 14A performs segmentation on the inference result data (step ST5). For example, the inference unit 14A acquires the feature of each pixel extracted from the image data by the two-dimensional convolutional layer of the CNN, and converts the feature of each pixel into a 100-dimensional vector using a fully connected layer and a Softmax function. The inference unit 14A outputs, as a segment, the cluster in which the 100-dimensional vector for each pixel is maximized. This allows the information processing device 1A to segment image data in a target domain that is different from the pre-trained source domain.
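- The pseudo-label step of Variation 2 can be sketched in the same spirit. Only the GMM with 100 clusters, the 3x3 averaging filter over the cluster likelihoods, and taking the cluster with the maximum averaged likelihood as the pseudo label follow the description above; the image size, the feature width, and the GMM settings are assumptions.

```python
import torch
import torch.nn as nn
from sklearn.mixture import GaussianMixture
from scipy.ndimage import uniform_filter

img = torch.randn(1, 3, 64, 64)                                   # assumed RGB image in the target domain
feature_extractor = nn.Conv2d(3, 16, kernel_size=3, padding=1)    # step ST1A: 2-D convolutional layer of a CNN
inference_head = nn.Linear(16, 100)                                # step ST4A: 100-dimensional output per pixel

z = feature_extractor(img)                                         # (1, 16, 64, 64)
pixels = z.squeeze(0).permute(1, 2, 0).reshape(-1, 16)             # one 16-dim feature per pixel

# Step ST2A: soft clustering with a GMM (100 clusters); predict_proba gives each pixel's cluster likelihoods.
gmm = GaussianMixture(n_components=100, covariance_type="diag", max_iter=20)
likelihood = gmm.fit(pixels.detach().numpy()).predict_proba(pixels.detach().numpy())
likelihood_maps = likelihood.reshape(64, 64, 100)

# Step ST3A: a 3x3 averaging filter is applied to each cluster's likelihood map; the cluster whose
# averaged likelihood is largest becomes the pseudo label of the pixel.
smoothed = uniform_filter(likelihood_maps, size=(3, 3, 1))
pseudo = torch.from_numpy(smoothed.argmax(axis=2).reshape(-1)).long()

# Steps ST4A/ST5A compare the per-pixel 100-dimensional output with the pseudo labels via cross-entropy.
loss = nn.CrossEntropyLoss()(inference_head(pixels), pseudo)
```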
- the information processing device 1A includes a data acquisition unit 16 that acquires spatially continuous data, and a feature extraction unit 11A that extracts feature vectors from spatially continuous local data among the acquired data. This enables the information processing device 1A to perform learning by assigning pseudo-labels to each point of spatially continuous local feature values.
- the data acquisition unit 16 acquires time-series data. This enables the information processing device 1A to perform learning by assigning pseudo-labels to each point of the feature quantity, which is the time-series data.
- the clustering unit 12A determines clusters by hard clustering. This enables the information processing device 1A to perform learning by assigning pseudo labels calculated based on cluster information that is assigned only to unique clusters to each feature point within a cluster.
- the cluster information is an integer value indicating the cluster number. This enables the information processing device 1A to perform learning by assigning a pseudo label calculated based on the cluster number to each feature point within a cluster.
- the clustering unit 12A determines clusters by soft clustering. This enables the information processing device 1A to perform learning by assigning pseudo labels calculated based on cluster information that can be assigned to multiple clusters to each feature point within a cluster.
- the cluster information is a score for the cluster. This enables the information processing device 1A to perform learning by assigning a pseudo label calculated based on the score for the cluster to each feature point within the cluster.
- the information processing device disclosed herein can be used for various information processing tasks, such as unsupervised learning.
- 1, 1A: Information processing device; 11, 11A: Feature extraction unit; 12, 12A: Clustering unit; 13, 13A: Pseudo label assignment unit; 14, 14A: Inference unit; 15, 15A: Parameter update unit; 16: Data acquisition unit; 100: Input interface; 101: Output interface; 102: Processor; 103: Memory.
Abstract
Description
本開示は、情報処理装置、学習方法およびプログラムに関する。 This disclosure relates to an information processing device, a learning method, and a program.
データから抽出した特徴量をクラスタリングし、クラスタリングで得られたクラスタを疑似ラベルとして、分類器を学習する技術が知られている。例えば、非特許文献1には、画像データから抽出した特徴量をクラスタリングし、クラスタリングで得られたクラスタを、画像一枚ごとにクラスタを疑似ラベルとして付与する技術が記載されている。 A technique is known in which features extracted from data are clustered, and the clusters obtained by the clustering are used as pseudo labels to train a classifier. For example, Non-Patent Document 1 describes a technique in which features extracted from image data are clustered, and the clusters obtained by the clustering are assigned as pseudo labels to each image.
非特許文献1に記載される従来の技術では、クラスタ内の特徴量の各点にラベルが付与されていない場合、これらの点に疑似ラベルを付与した学習をすることができないという課題があった。例えば、従来の技術は、クラスタに相当する画像一枚ごとに疑似ラベルを付与することを想定しているので、画像内のピクセルごとに疑似ラベルを付与することが求められる画像のセグメンテーションに適用することができない。 The conventional technology described in Non-Patent Document 1 had the problem that if labels were not assigned to each feature point within a cluster, it was not possible to perform learning by assigning pseudo-labels to these points. For example, the conventional technology assumes that a pseudo-label is assigned to each image corresponding to a cluster, and therefore cannot be applied to image segmentation, which requires that a pseudo-label be assigned to each pixel within an image.
本開示は上記課題を解決するものであり、クラスタの特徴量の各点に疑似ラベルを付与した学習が可能な情報処理装置を得ることを目的とする。 The present disclosure aims to solve the above problem and provide an information processing device capable of learning by assigning pseudo labels to each feature point of a cluster.
本開示に係る情報処理装置は、データから特徴量を抽出する特徴量抽出部と、特徴量空間における特徴量のクラスタリングにより特徴量が属するクラスタを決定するクラスタリング部と、クラスタ内で範囲を順次設定し、範囲内に含まれる特徴量に対応するクラスタ情報に基づいて疑似ラベルを生成し、生成した疑似ラベルを範囲内に含まれる少なくとも一つの特徴量に付与する疑似ラベル付与部と、特徴量に基づいて推論値を算出する推論部と、疑似ラベルおよび推論値に基づいて特徴量抽出部および推論部のパラメータを更新するパラメータ更新部と、を備える。 The information processing device according to the present disclosure includes a feature extraction unit that extracts features from data, a clustering unit that determines the cluster to which a feature belongs by clustering the features in feature space, a pseudo label assignment unit that sequentially sets ranges within the cluster, generates pseudo labels based on cluster information corresponding to the features included within the range, and assigns the generated pseudo label to at least one feature included within the range, an inference unit that calculates an inferred value based on the feature, and a parameter update unit that updates parameters of the feature extraction unit and the inference unit based on the pseudo label and the inferred value.
本開示に係る情報処理装置によれば、クラスタの特徴量の各点に疑似ラベルを付与した学習が可能である。 The information processing device disclosed herein enables learning in which pseudo labels are assigned to each feature point of a cluster.
Embodiment 1.
In supervised learning, a correct label (hereinafter simply referred to as "label") is assigned to a dataset, whereas in unsupervised learning, unlabeled data is used. Here, a label is identification information used to identify the cluster into which the data constituting the dataset is classified. A cluster is a set of points with similar features. Features are expressed in the form of numbers or categories, etc., of each piece of data constituting the dataset. Points with features that share characteristics different from those of other clusters are classified within the same cluster.
The information processing device according to Embodiment 1 includes an inference unit; it trains the inference unit by updating the parameters of an inference model through unsupervised learning, and performs inference processing with the trained inference unit. Here, in the unsupervised learning performed by the information processing device according to Embodiment 1, pseudo labels corresponding to features extracted from unlabeled data are calculated, and the calculated pseudo labels are assigned to each feature point within a cluster. Note that a pseudo label is a label that is artificially assigned to unlabeled data.
(Outline of information processing device)
FIG. 1 is a block diagram showing an example configuration of the information processing device 1 according to Embodiment 1. In FIG. 1, the information processing device 1 extracts features from data acquired as a dataset, clusters the features in a feature space, sequentially sets ranges within the clusters, generates pseudo labels based on cluster information corresponding to the features included in the ranges, and assigns the pseudo labels to at least one feature included in the ranges. For example, each point in the feature space represents a point at each time in time-series data of features, or a feature of each pixel of an image, etc.
Cluster information is information that indicates the relationship between points that indicate feature quantities and the clusters into which these points are classified, and is generated in the process of clustering features extracted from data. Cluster information includes information that indicates which cluster a point that indicates a feature quantity has been classified into, information that indicates the reference point of the cluster, or information that indicates the characteristics or attributes of the cluster. Information that indicates which cluster a point that indicates a feature quantity has been classified into includes, for example, information that indicates a cluster number used to identify the cluster. Examples of reference points for a cluster include the center point, center of gravity, average value, or median of the cluster. Information that indicates the characteristics or attributes of a cluster is information in which the characteristics or attributes have been quantified, and may be, for example, a score used to evaluate the characteristics of the cluster.
When the cluster information is a score for evaluating the characteristics of a cluster, the score is calculated based on the feature amount. For example, the information processing device 1 assigns the average value of the scores corresponding to the feature amounts included within a range set within the cluster as a pseudo label to the target points included within the range. The information processing device 1 then sets the above range sequentially and assigns a pseudo label to each point so that all points included in the cluster become target points. This allows the information processing device 1 to learn to assign pseudo labels to all feature amount points included in the feature amount space.
As shown in FIG. 1, the information processing device 1 also includes a feature extraction unit 11, a clustering unit 12, a pseudo label assignment unit 13, an inference unit 14, and a parameter update unit 15. For example, the information processing device 1 is implemented by a computer. A memory included in the computer stores information processing application programs for implementing the functions of the feature extraction unit 11, the clustering unit 12, the pseudo label assignment unit 13, the inference unit 14, and the parameter update unit 15. A processor included in the computer executes the information processing application read from the memory, thereby implementing the functions of the feature extraction unit 11, the clustering unit 12, the pseudo label assignment unit 13, the inference unit 14, and the parameter update unit 15.
(Feature extraction unit)
The feature extraction unit 11 extracts features from data. For example, when extracting features from data in a three-dimensional space, the feature extraction unit 11 selects the features to be extracted depending on the purpose for which the data will be used. Position information, shape information, color information, or a combination of these, of an object existing in the three-dimensional space is selected as the feature to be extracted. Then, the feature extraction unit 11 extracts the selected feature.
For example, when position information is selected as the feature, the feature extracting unit 11 extracts the position coordinates of the object as the feature.
Furthermore, when shape information is selected as the feature to be extracted, the feature extraction unit 11 calculates a feature (volume, surface area, length, etc.) representing the shape of the object. When color information is selected as the feature to be extracted, the feature extraction unit 11 extracts the hue, saturation, brightness, etc. of the object.
The feature extraction unit 11 may also scale the features. For example, if the scales of the position information are different, the feature extraction unit 11 may perform normalization to unify the ranges of each dimension. In the case of color information, the feature extraction unit 11 may scale the ranges of color values to unify them.
The learning data from which features are extracted includes sensor data, web data, image data, video data, text data, and the like.
The sensor data is data detected by a physical sensor. For example, the sensor data may include data such as temperature, humidity, or atmospheric pressure detected by a weather sensor, environmental monitoring data detected by a sensor network, and patient biometric information detected by a medical sensor. The feature extraction unit 11 can extract features from the sensor data acquired from the sensor by the information processing device 1.
Web data is data available on the Internet, such as information from web pages or user behavior data. The information processing device 1 may acquire web data by web scraping using a communication unit not shown in FIG. 1. The information processing device 1 may also acquire data from social media using the communication unit. The feature extraction unit 11 is capable of extracting features from the web data acquired by the information processing device 1 using the communication unit.
Image data and video data are information collected using cameras or sensors, and include, for example, video data captured by surveillance cameras, medical image data, camera image data for automobile detection or segmentation, and the like.
The feature extraction unit 11 is capable of extracting features from image data or video data acquired by the information processing device 1 from a camera or a sensor.
Text data is text information collected from books, news articles, web pages, social media posts, etc. It is used for natural language processing (NLP) tasks or text mining. The feature extraction unit 11 is able to extract features from text data acquired by the information processing device 1 using the communication unit.
(Feature extraction using convolution operations)
The feature extraction unit 11 may also extract features by a convolution operation. For example, a convolutional neural network (CNN) is used for the convolution operation. A matrix called a convolution kernel is used in the convolution operation. The convolution kernel is a weight matrix used when performing a convolution operation on input data. The convolution kernel functions as a filter for detecting specific patterns or features. The feature extraction unit 11 slides the convolution kernel along each position of the data, sequentially calculates the product of each element of the convolution kernel and the corresponding element of the data, and extracts the value at a specific position of the convolution kernel as a feature. This makes it possible to extract features that indicate local characteristics of the data.
The feature extraction unit 11 also extracts features from data based on the values of parameters set therein. For example, when the feature extraction unit 11 extracts features through a convolution operation using a CNN model, parameters for setting the size and stride of the convolution kernel affect the feature extraction. The larger the convolution kernel size, the wider the range of features extracted, and the larger the stride, the fewer the number of features extracted.
Furthermore, when the feature extraction unit 11 extracts features by moving a moving window over sequence data such as time-series data or text data, the values of parameters for setting the window width and step size affect the feature extraction: the larger the window width, the wider the range of features extracted, and the larger the step size, the fewer the number of features.
When the data from which features are to be extracted by convolution operations is one-dimensional data, two-dimensional data, three-dimensional data, or graph data, the feature extraction unit 11 performs one-dimensional convolution, two-dimensional convolution, three-dimensional convolution, or graph convolution to extract features.
(1) One-dimensional convolution
For example, one-dimensional data includes time-series data, audio data, and sensor data. When extracting features from one-dimensional data by convolution, the length of the data to be convolved becomes the kernel size, and the convolution kernel is defined as a one-dimensional array.
The feature extraction unit 11 slides the convolution kernel along each position of the data and sequentially calculates the product of each element of the convolution kernel and the corresponding element of the data. By extracting the value at a specific position of the convolution kernel as a feature, a feature indicating a local feature that emphasizes the relationship between neighboring elements in the data is extracted.
The position of the feature from which the feature is extracted may be the beginning, the center, or the end of the convolution kernel.
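As a small, self-contained illustration of the one-dimensional convolution described above (the input values and kernel weights are arbitrary examples), the kernel is slid along the data and the weighted sum at each position becomes a local feature:

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 8.0, 7.0, 3.0, 1.0])   # 1-D input data (e.g. a short sensor signal)
kernel = np.array([0.25, 0.5, 0.25])                 # convolution kernel defined as a 1-D array

# Slide the kernel along the data and take the weighted sum at every position.
features = np.array([np.dot(kernel, x[i:i + len(kernel)])
                     for i in range(len(x) - len(kernel) + 1)])
print(features)   # each value is a local feature reflecting neighbouring elements
```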
(2) Two-dimensional convolution
For example, two-dimensional data includes image data. When extracting features from two-dimensional data by convolution calculation, the convolution kernel is defined as a two-dimensional matrix. The feature extraction unit 11 slides the convolution kernel along each position of the data and sequentially calculates the product of each element of the convolution kernel and the corresponding element of the data. By extracting values at specific positions of the convolution kernel as features, features indicating local features that emphasize the relationship between neighboring elements in the data are extracted.
The position of the feature from which the feature is extracted may be the beginning, the center, or the end of the convolution kernel.
(3) Three-dimensional convolution
For example, three-dimensional data includes video data. When extracting features from three-dimensional data using a convolution operation, the convolution kernel is defined as a three-dimensional matrix having three dimensions: width, height, and channel. The feature extraction unit 11 slides the convolution kernel along each position of the data and sequentially calculates the product of each element of the convolution kernel and the corresponding element of the data. By extracting values at specific positions of the convolution kernel as features, features indicating local features that emphasize the relationship between neighboring elements in the data are extracted.
The position of the feature from which the feature is extracted may be the beginning, the center, or the end of the convolution kernel.
(4) Graph convolution
The feature extraction unit 11 may perform graph convolution, which is a convolution operation on data having a graph structure. Examples of data having a graph structure include social networks, power grids, and molecular structures. In graph convolution, a graph structure is provided as input data. A graph is composed of a set of nodes (vertices) and edges, where the nodes represent elements of the data and the edges represent relationships between the nodes. The feature extraction unit 11 performs a convolution operation while moving a convolution kernel on the graph, thereby extracting specific patterns in the graph as features.
Alternatively, the feature extraction unit 11 may perform two-dimensional convolution on the three-dimensional data in the spatial direction, and then perform one-dimensional convolution on the three-dimensional data in the time direction.
The feature extraction unit 11 may convert the image data into a feature vector of a fixed length, and then perform one-dimensional convolution in the time direction.
The feature extraction unit 11 may perform dilated convolution, which uses a convolution kernel to convolve elements spaced at regular intervals in the data.
Although feature extraction using a CNN has been described, the feature extraction unit 11 may extract features using a recurrent neural network (RNN) such as a long short-term memory (LSTM) or a gated recurrent unit (GRU).
Furthermore, the feature extraction unit 11 may extract features using an attention-based model such as a Transformer, or an MLP-Mixer as an alternative to attention.
(Clustering unit)
The clustering unit 12 determines the cluster to which a feature belongs by clustering the feature in the feature space. Clustering is a process of grouping points with similar feature amounts. For example, the clustering unit 12 calculates the distance or similarity between points in the feature space and clusters the feature amounts based on the calculated distance or similarity. The distance can be Euclidean distance or Manhattan distance. The similarity can be cosine similarity.
Typical clustering methods include, for example, k-means, Gaussian mixture models (GMM), hierarchical (agglomerative) clustering, spectral clustering, DBSCAN, OPTICS, BIRCH, and Mean Shift. The clustering unit 12 assigns each feature point to a cluster using the selected clustering method. At this time, clustering is performed based on the distance or similarity between points in the feature space.
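A minimal example of this clustering step, using k-means on synthetic two-dimensional features (the data and the cluster count of 2 are arbitrary), is shown below; the resulting cluster numbers and cluster centers are examples of the cluster information described earlier.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
Z = np.vstack([rng.normal(0.0, 0.3, (50, 2)),      # synthetic feature points forming two groups
               rng.normal(3.0, 0.3, (50, 2))])

km = KMeans(n_clusters=2, n_init=10).fit(Z)
cluster_numbers = km.labels_                        # cluster information: integer cluster number per point
cluster_centers = km.cluster_centers_               # cluster information: reference point of each cluster
```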
(Pseudo label assignment unit)
The pseudo label assignment unit 13 sequentially sets ranges within a cluster, generates pseudo labels based on cluster information corresponding to feature quantities included within the ranges, and assigns pseudo labels to at least one feature quantity included within the ranges. The pseudo label assignment unit 13 assigns pseudo labels to all points in the feature quantity space by performing the above process for each cluster set in the feature quantity space. The range set within a cluster is a moving window that includes points of one or more feature quantities, and defines the cluster in the same dimension as the feature quantities, for example. This range is sequentially set so that all points included in the cluster are targets for assigning pseudo labels.
The pseudo label may be any one of the mode, median, average, minimum, and maximum values of the cluster information corresponding to the feature amount included in the range set in the cluster.
For example, if the range set within a cluster is a fixed-length moving window for one-dimensional data of features, and the cluster information corresponding to the features included in the moving window is a cluster number, the pseudo label assignment unit 13 calculates one of the mode, median, average, minimum, or maximum of the cluster number corresponding to the features included in the moving window as the pseudo label.
Since pseudo labels can be generated that statistically take into account cluster information corresponding to the features contained in the clusters, the information processing device 1 can more accurately represent the features or patterns of the data.
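A small sketch of this pseudo-label generation for one-dimensional features follows; the cluster numbers are made up, the window length of 3 is arbitrary, and the mode (majority vote) is used here, although the median, average, minimum, or maximum could be substituted as described above.

```python
import numpy as np

cluster_numbers = np.array([2, 2, 1, 2, 2, 0, 0, 0, 1, 0])   # cluster number of each feature point

def window_pseudo_labels(c, window=3):
    half = window // 2
    padded = np.pad(c, half, mode="edge")
    return np.array([np.bincount(padded[i:i + window]).argmax()   # mode within the moving window
                     for i in range(len(c))])

print(window_pseudo_labels(cluster_numbers))   # -> [2 2 2 2 2 0 0 0 0 0]
```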
The pseudo label assignment unit 13 may change the size of the range (moving window) set within the cluster as needed. Changing the size of the range makes it possible to extract features over different ranges. For example, when the processing accuracy is set using an input device not shown in FIG. 1, the pseudo label assignment unit 13 sets a range size according to this processing accuracy and performs the pseudo label assignment process. In this case, if the size of the range is increased, the range from which features are extracted becomes wider, and if the size of the range is decreased, more localized features are extracted. The pseudo label assignment unit 13 may also change the size of the range to a size set by the user using the input device.
The pseudo label assignment unit 13 may assign pseudo labels to one-dimensional features using a filter with a feedback loop, such as an IIR (Infinite Impulse Response) filter. For example, the pseudo label assignment unit 13 adjusts the parameters of the IIR filter to control the position and size of the moving window, sequentially setting the moving window within the cluster, and assigns the information obtained by the IIR filter as pseudo labels to the features within the moving window.
The pseudo label assignment unit 13 may assign pseudo labels to two-dimensional features using an image filter. An image filter is a filter that uses a convolution operation to identify a specific area on two-dimensional data and transform the information within that area. Image filters include smoothing filters, edge detection filters, and sharpening filters. For example, the pseudo label assignment unit 13 adjusts the parameters of the image filter to control the position and size of the moving window, sequentially setting the moving window within the cluster, and assigns the information obtained by the image filter as a pseudo label to the feature within the moving window.
The pseudo label assignment unit 13 may assign pseudo labels using an averaging filter, a weighted average filter, a median filter, or a Laplacian filter.
The averaging filter is, for example, a filter that replaces the value of each pixel in an image with the average value of the values of its surrounding pixels. When the value of each pixel in an image is a feature, the pseudo label assignment unit 13 uses the averaging filter to sequentially set moving windows on clusters in the image and assigns the average value of all pixel values included in the moving windows as a pseudo label to the feature to be assigned.
A weighted average filter is, for example, a filter that calculates an average value by weighting the values of pixels surrounding a pixel of interest. When the value of each pixel in an image is a feature, the pseudo label assignment unit 13 uses the weighted average filter to sequentially set a moving window within a cluster on the image, calculate the weighted average of the values of all pixels included in the moving window, and assign the calculated value as a pseudo label to the feature to be assigned.
A median filter is, for example, a filter that calculates the median of the pixel values included in a moving window and uses the calculated value as the value of the pixel at the center of the moving window. When the value of each pixel in an image is a feature, the pseudo label assignment unit 13 uses a median filter to sequentially set moving windows within clusters on the image, calculates the median of the values of all pixels included in the moving window, and assigns the calculated value as a pseudo label to the feature to be assigned.
A Laplacian filter is, for example, a filter that emphasizes edges or features in an image and detects abrupt changes in the image using a two-dimensional differential operator. When the value of each pixel in an image is a feature, the pseudo label assignment unit 13 uses a Laplacian filter to sequentially set moving windows within clusters on the image, detects abrupt changes among the values of the pixels included in the moving window, and assigns the detected value as a pseudo label to the feature to be assigned.
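The following toy example illustrates, under assumed pixel values, how an averaging filter and a median filter act as the moving window when each pixel carries a cluster-related value; scipy's uniform_filter and median_filter are used for the 3x3 windows.

```python
import numpy as np
from scipy.ndimage import uniform_filter, median_filter

pixel_values = np.array([[1., 1., 1., 5., 5.],
                         [1., 9., 1., 5., 5.],
                         [1., 1., 1., 5., 5.],
                         [0., 0., 0., 5., 5.],
                         [0., 0., 0., 5., 5.]])

averaged = uniform_filter(pixel_values, size=3)   # averaging filter: mean of each 3x3 window
medianed = median_filter(pixel_values, size=3)    # median filter: median of each 3x3 window
# The outlier 9 is pulled toward its neighbours by the averaging filter
# and removed entirely by the median filter.
```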
The pseudo label assignment unit 13 may assign pseudo labels to the feature quantities of the graph data using a label propagation method or a label diffusion method.
Label propagation is a method of updating the labels of nodes in a graph based on the labels of surrounding nodes. Similarly, label diffusion is a method of updating the labels of nodes in a graph, but in label diffusion, the labels are updated taking into account not only the node labels but also information such as edge weights and distances.
For example, the pseudo label assignment unit 13 assigns an initial label to each node and updates the label using the mode or average value of the labels of adjacent nodes. This allows the labels to be propagated or diffused throughout the graph. The pseudo label assignment unit 13 assigns pseudo labels based on the labels of the feature quantities included in the moving window. By repeating these processes until the label updates converge, it is possible to assign pseudo labels to each feature quantity point within the cluster.
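As a toy sketch of label propagation (the graph, the initial cluster numbers, and the self-inclusive majority vote are illustrative assumptions), each node repeatedly takes the most frequent label among itself and its neighbours until no label changes:

```python
import numpy as np

adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1],        # two small, densely connected groups of nodes
             3: [4, 5], 4: [3, 5], 5: [3, 4]}
labels = np.array([0, 0, 1, 1, 1, 0])                # initial cluster numbers, with one noisy node per group

changed = True
while changed:
    changed = False
    for node, neighbours in adjacency.items():
        new_label = np.bincount(labels[neighbours + [node]]).argmax()   # mode over the node and its neighbours
        if new_label != labels[node]:
            labels[node] = new_label
            changed = True

print(labels)   # -> [0 0 0 1 1 1]; the noisy labels are corrected by their neighbourhoods
```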
(Inference unit)
The inference unit 14 calculates an inferred value based on the feature quantities extracted from the data by the feature quantity extraction unit 11. For example, the inference unit 14 uses a model for calculating an inferred value from the feature quantities. The model is a model of the relationship between the input feature quantities and the output inferred value. Examples of the model include a neural network, a generalized linear model, a generalized additive model, a decision tree, a decision tree ensemble, a support vector machine, and a nearest neighbor model. The parameter values of the model are updated by the parameter update unit 15 (described later), and the model is trained so that it can generate an appropriate inferred value from the input data. The inference unit 14 uses the trained model to perform processing such as prediction, classification, or regression on the input data.
For example, when a feature is input to the input layer of a neural network in the inference unit 14, the feature propagates from the input layer through the hidden layers to the output layer in the neural network, and a final inference value is generated. In the fully connected layer, a bias is added to the dot product of the input and the weights, and an output value is calculated using an activation function. Activation functions include ReLU (Rectified Linear Unit), Sigmoid, Softmax, and Tanh. Note that it is also possible to output the value obtained by adding a bias to the dot product of the input and the weights without using an activation function.
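A minimal sketch of this inference step follows; the feature dimension of 32 and the 10 output classes are arbitrary, and the fully connected layer plus Softmax corresponds to the propagation through the output layer described above.

```python
import torch
import torch.nn as nn

features = torch.randn(4, 32)                   # four feature vectors of (assumed) dimension 32
head = nn.Linear(32, 10)                        # fully connected layer: dot product of input and weights plus bias
probs = torch.softmax(head(features), dim=1)    # activation function (Softmax) gives the inferred values
predicted = probs.argmax(dim=1)                 # class with the largest inferred value per feature vector
```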
(Parameter update unit)
The parameter update unit 15 updates the parameters of the feature extraction unit 11 and the inference unit 14 based on the pseudo label and the inference value. For example, the parameter update unit 15 updates the parameters of the feature extraction unit 11 and the inference unit 14 so that the inference value output from the inference unit 14 matches the pseudo label output from the pseudo label assignment unit 13.
For example, the parameter update unit 15 uses a loss function to evaluate the degree of agreement between the inferred value and the pseudo label. The loss function is an index that represents the difference between the inferred value and the pseudo label, and is used to evaluate the performance of the model. The value of the loss function becomes smaller as the inferred value and the pseudo label value become closer.
Note that loss functions include cross-entropy when the inferred values are categorical, and mean absolute error (MAE) or mean squared error (MSE) when the inferred values are numeric.
The parameter update unit 15 updates the parameters of the feature extraction unit 11 and the inference unit 14 so as to minimize the value of the loss function.
For example, if the feature extraction unit 11 and the inference unit 14 are implemented with models for which gradients can be computed, such as neural networks, the parameter update unit 15 calculates the gradient of the loss function using the gradient descent method and uses the calculated gradient to calculate the amount of update for each parameter of the feature extraction unit 11 and the inference unit 14. The parameter update unit 15 repeats the above procedure until the value of the loss function converges. As a result, the parameters are updated in a direction that minimizes the loss function.
Gradient descent methods include stochastic gradient descent (SGD), mini-batch gradient descent, Adam, or RMSProp.
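A compact sketch of one such update follows, assuming toy shapes: the cross-entropy between the inferred values and the pseudo labels is back-propagated through both the inference model and the feature extractor, and stochastic gradient descent moves their parameters.

```python
import torch
import torch.nn as nn

feature_extractor = nn.Conv1d(1, 8, kernel_size=3, padding=1)   # stands in for the feature extraction unit 11
inference_head = nn.Linear(8, 4)                                  # stands in for the inference unit 14
optimizer = torch.optim.SGD(list(feature_extractor.parameters())
                            + list(inference_head.parameters()), lr=0.01)

data = torch.randn(1, 1, 20)                        # toy 1-D input data
pseudo_labels = torch.randint(0, 4, (20,))          # pseudo label for every feature point

features = feature_extractor(data).squeeze(0).T     # (20, 8): one feature per point
loss = nn.CrossEntropyLoss()(inference_head(features), pseudo_labels)   # difference between inferred values and pseudo labels

optimizer.zero_grad()
loss.backward()    # gradients of the loss with respect to both modules' parameters
optimizer.step()   # update in the direction that decreases the loss
```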
Next, the operation of the information processing device 1 will be described.
FIG. 2 is a flowchart showing the operation of the information processing device 1.
The parameter values of the feature extraction unit 11 are initialized (step ST1). For example, the feature extraction unit 11 initializes parameters such as the weights of a model for extracting features with random values or predefined values.
The information processing device 1 acquires data for learning (step ST2). The data acquired by the information processing device 1 is output to the feature extraction unit 11. The learning dataset is a dataset for unsupervised learning that does not have correct labels assigned.
The information processing device 1 then performs a learning process using the learning dataset (step ST3). For example, the feature extraction unit 11 extracts features from the data. The clustering unit 12 determines the cluster to which each feature belongs by clustering the features in feature space. The pseudo label assignment unit 13 sequentially sets ranges within the cluster, generates pseudo labels based on cluster information corresponding to the features included within the range, and assigns the generated pseudo label to at least one feature included within the range. The inference unit 14 calculates inference values based on the features, and the parameter update unit 15 updates the parameters of the feature extraction unit 11 and the inference unit 14 based on the pseudo labels and inference values.
It is determined whether the learning termination condition is satisfied (step ST4). If it is satisfied (step ST4; YES), the process proceeds to step ST5. Otherwise (step ST4; NO), the process returns to step ST3 and the learning process is repeated. That is, feature extraction by the feature extraction unit 11, clustering by the clustering unit 12, assignment of pseudo labels by the pseudo label assignment unit 13, inference by the inference unit 14, and parameter updating by the parameter update unit 15 are repeatedly executed until the termination condition is satisfied. The termination condition may be, for example, reaching a predetermined number of repetitions or reaching a predetermined learning time.
The learning condition may be the following condition (1) or (2).
(1) The condition is that learning is terminated when the number of samples whose pseudo labels change becomes equal to or less than a predetermined value.
(2) The acquired data is divided into data for learning and data for verification, and the information processing device 1 uses the learning data to update the parameters of the feature extraction unit 11 and the inference unit 14 in step ST3. The learning condition is that if the value of a loss function representing the difference between an inferred value and a pseudo label does not decrease a predetermined number of times consecutively using the verification data, the value of the loss function is deemed to have converged and learning is terminated.
Furthermore, the information processing device 1 may perform clustering and pseudo-labeling, and inference and parameter updating at a 1:1 ratio, or at different ratios.
For example, the information processing device 1 may perform clustering and pseudo-labeling once, and then repeat inference and parameter updating ten times according to the stochastic gradient descent method.
The inference unit 14 performs inference processing using the model trained in step ST4 (step ST5). Inference includes prediction, classification, regression, etc. for new data. When the features of new data are input, the trained model outputs a prediction or classification result.
Next, a learning method according to the first embodiment will be described.
FIG. 3 is a flowchart showing the learning method according to the first embodiment, illustrating the detailed processing of step ST3 in FIG. 2.
The feature extraction unit 11 extracts features from the learning data (step ST1A).
FIG. 4 is a diagram showing an outline of a process for extracting a feature quantity Z from data X. In FIG. 4, data X is one-dimensional data composed of elements x_i. Feature quantity Z is a feature vector composed of feature quantities z_i, where i is an integer equal to or greater than 1.
The feature extraction unit 11 calculates the feature z_i using the relational expression z_i = Feature(Neighbor(x_i)). The feature extraction unit 11 calculates the neighborhood of x_i included in the data X. The neighborhood is a set of other elements existing around the element x_i, for example, the k nearest neighbors of the element x_i or the elements within a certain distance range.
The feature extraction unit 11 calculates a feature z_i using the Feature function based on the neighborhood data. The features z_i calculated using the Feature function are the elements of the feature quantity corresponding to the elements x_i of the data. In this way, the feature quantity Z including the features z_i is extracted from the one-dimensional data X.
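The relational expression z_i = Feature(Neighbor(x_i)) can be illustrated with a toy choice of both functions; here Neighbor() is taken as the element and its immediate neighbours and Feature() as their mean, both purely illustrative assumptions.

```python
import numpy as np

X = np.array([1.0, 2.0, 6.0, 7.0, 8.0, 2.0])        # 1-D data composed of elements x_i

def neighbor(i, k=1):
    return X[max(0, i - k): i + k + 1]               # Neighbor(x_i): x_i and its surrounding elements

def feature(values):
    return values.mean()                             # Feature(): an aggregate of the neighbourhood

Z = np.array([feature(neighbor(i)) for i in range(len(X))])   # one feature z_i per element x_i
```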
The clustering unit 12 determines the cluster to which the feature belongs by clustering the feature in the feature space (step ST2A).
FIG. 5 is a diagram showing an overview of clustering of feature quantity Z. In FIG. 5, cluster C is a cluster into which feature quantity z i is classified. Feature quantity z i is assigned a label c i . The clustering unit 12 clusters feature quantity z i into cluster C according to a clustering algorithm set therein. Examples of clustering algorithms include k-means, hierarchical clustering, and DBSCAN. These algorithms determine which cluster C feature quantity z i belongs to.
The clustering unit 12 assigns a label c corresponding to the feature z using the relational expression c = Cluster( z ), thereby determining the cluster C to which the feature z belongs. Note that the value of c is information representing the cluster C to which the feature z belongs.
The pseudo label assignment unit 13 sequentially sets ranges (moving windows) within the cluster, generates pseudo labels based on cluster information corresponding to features included within the ranges, and assigns the pseudo labels to at least one feature included within the ranges (step ST3A).
FIG. 6 is a diagram showing an outline of the process of assigning a pseudo label a_i to each point in cluster C. In FIG. 6, the pseudo label A is composed of pseudo labels a_i that are assigned in a pseudo manner based on the labels c_i included in cluster C. The pseudo label assignment unit 13 sets a moving window W within cluster C according to the function Neighbor(c_i) and calculates the label c_i of each element within the moving window W. As a result, a set of labels for the elements included in the moving window W is obtained.
Next, the pseudo label assignment unit 13 calculates a pseudo label a_i based on the label obtained by aggregating the labels c_i of the elements within the moving window W according to the Aggregate function.
For example, the mode or average value of the aggregated labels c_i is calculated as the pseudo label a_i. The pseudo label assignment unit 13 assigns the calculated pseudo label a_i to the element to be assigned. As a result, the set of pseudo labels a_i included in the pseudo label A is obtained.
The inference unit 14 calculates an inference value based on the feature amount (step ST4A).
FIG. 7 is a diagram showing an outline of inference using the feature quantity Z. In FIG. 7, the inference result Y is the result of inference using the feature quantity Z, and is a set of elements y_i of the inference result based on the feature quantities z_i. The inference unit 14 performs inference according to the relational expression y_i = Predict(z_i).
For example, when a feature quantity z_i is input, the inference unit 14 generates the inference result Y using an inference model that outputs an element y_i of the inferred value. This makes it possible to infer the target information from the feature quantity z_i.
The parameter update unit 15 updates the parameters of the feature extraction unit 11 and the inference unit 14 based on the pseudo label and the inference value (step ST5A).
FIG. 8 is a diagram showing an overview of parameter updates for the feature extraction unit 11 and the inference unit 14. In FIG. 8, the loss function Loss(a_i, y_i) quantifies the difference between the pseudo label a_i and the element y_i of the inference result, and is used to evaluate the degree of match between them. The parameter update unit 15 updates each parameter of the feature extraction unit 11 and the inference unit 14 in the direction that minimizes the value of the loss function Loss(a_i, y_i). When the value of the loss function becomes equal to or less than a threshold, or a certain number of iterations is reached, the parameter values are considered to have converged and learning ends.
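As a rough sketch of this update step, the PyTorch fragment below performs one gradient step on a convolutional feature extractor and a linear inference head, using a cross-entropy loss between pseudo labels and inferred values. Channel counts, sequence length, learning rate, and the random pseudo labels are placeholders chosen for illustration, not values from this disclosure.

```python
import torch
import torch.nn as nn

extractor = nn.Conv1d(in_channels=6, out_channels=64, kernel_size=3, padding=1)
head = nn.Linear(64, 10)
optimizer = torch.optim.SGD(list(extractor.parameters()) + list(head.parameters()), lr=0.01)
loss_fn = nn.CrossEntropyLoss()                # plays the role of Loss(a_i, y_i)

x = torch.randn(1, 6, 128)                     # one input sequence of length 128
pseudo_labels = torch.randint(0, 10, (128,))   # a_i from the pseudo label assignment step

z = extractor(x)                    # features Z, shape (1, 64, 128)
logits = head(z.squeeze(0).t())     # inferred values y_i, shape (128, 10)
loss = loss_fn(logits, pseudo_labels)

optimizer.zero_grad()
loss.backward()
optimizer.step()                    # move the parameters toward a smaller loss
```

Iterating this step until the loss falls below a threshold, or until a fixed number of iterations, corresponds to the convergence condition described above.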
Next, a hardware configuration for realizing the functions of the information processing device 1 will be described.
FIG. 9 is a block diagram showing a hardware configuration that realizes the functions of the information processing device 1. The functions of the feature extraction unit 11, clustering unit 12, pseudo-label assignment unit 13, inference unit 14, and parameter update unit 15 provided in the information processing device 1 are realized by processing circuits. That is, the information processing device 1 includes a processing circuit for executing the processes of steps ST1A to ST5A shown in FIG. 3. The processing circuit may be a CPU (Central Processing Unit) that executes a program stored in a memory.
The feature extraction unit 11 acquires a learning dataset received from an external device by a communication unit included in the information processing device 1, for example, via the input interface 100. Furthermore, if the dataset is stored in a memory unit included in the information processing device 1, the feature extraction unit 11 reads and acquires the dataset from the memory unit via the input interface 100. In this case, the information processing device 1 does not need to be equipped with a communication unit.
The inference unit 14 outputs the inference result to an external device, for example, via the output interface 101. The inference unit 14 may also transmit the inference result to an external device by controlling the communication unit via the output interface 101.
The functions of the feature extraction unit 11, clustering unit 12, pseudo label assignment unit 13, inference unit 14, and parameter update unit 15 included in the information processing device 1 are realized by software, firmware, or a combination of software and firmware.
The software or firmware is written as a program and stored in the memory 103 .
The processor 102 reads and executes programs stored in the memory 103 to implement the functions of the feature extraction unit 11, clustering unit 12, pseudo label assignment unit 13, inference unit 14, and parameter update unit 15 provided in the information processing device 1. For example, the information processing device 1 includes a memory 103 for storing programs that, when executed by the processor 102, result in the processing of steps ST1A to ST5A shown in FIG. 3 being performed. These programs cause a computer to execute the procedures or methods of the processing performed by the feature extraction unit 11, clustering unit 12, pseudo label assignment unit 13, inference unit 14, and parameter update unit 15. The memory 103 may be a computer-readable storage medium that stores programs for causing a computer to function as the feature extraction unit 11, clustering unit 12, pseudo label assignment unit 13, inference unit 14, and parameter update unit 15.
The memory 103 may be, for example, a non-volatile or volatile semiconductor memory such as a RAM (Random Access Memory), ROM (Read Only Memory), flash memory, EPROM (Erasable Programmable Read Only Memory), or EEPROM (Electrically-EPROM) (registered trademark), or a magnetic disk, flexible disk, optical disk, compact disk, mini disk, or DVD.
Some of the functions of the feature extraction unit 11, clustering unit 12, pseudo label assignment unit 13, inference unit 14 and parameter update unit 15 provided in the information processing device 1 may be realized by dedicated hardware, and other functions may be realized by software or firmware.
For example, the functions of the feature extraction unit 11, the clustering unit 12, and the pseudo-labeling unit 13 may be realized by a processing circuit that is dedicated hardware, and the functions of the inference unit 14 and the parameter update unit 15 may be realized by the processor 102 reading and executing a program stored in the memory 103. In this way, the processing circuit can realize the above functions by hardware, software, firmware, or a combination of these.
As described above, the information processing device 1 according to the first embodiment includes a feature extraction unit 11 that extracts features from data, a clustering unit 12 that determines a cluster to which a feature belongs by clustering the features in a feature space, a pseudo label assignment unit 13 that sequentially sets ranges within a cluster, generates pseudo labels based on cluster information corresponding to the features included within the ranges, and assigns the generated pseudo label to at least one feature included within the ranges, an inference unit 14 that calculates an inferred value based on the features, and a parameter update unit 15 that updates parameters of the feature extraction unit 11 and the inference unit 14 based on the pseudo label and the inferred value.
This enables the information processing device 1 to perform learning by assigning a pseudo label to each point of the feature of the cluster. For example, when the learning data is image data, the information processing device 1 can assign a pseudo label to each pixel in the image, and can be applied to image segmentation.
In the information processing device 1 according to embodiment 1, the feature extraction unit 11 extracts features by convolution operations. This enables the information processing device 1 to extract features that indicate local characteristics of data.
In the information processing device 1 according to embodiment 1, the pseudo label is one of the mode, median, mean, minimum, or maximum of the cluster information corresponding to the feature amounts included in the range. Because it is possible to generate pseudo labels that statistically consider the cluster information corresponding to the feature amounts included in the cluster, the information processing device 1 can more accurately represent the features or patterns of the data.
The learning method according to the first embodiment includes step ST1A in which a feature extraction unit 11 extracts features from data; step ST2A in which a clustering unit 12 determines a cluster to which a feature belongs by clustering the features in a feature space; step ST3A in which a pseudo label assignment unit 13 sequentially sets ranges within a cluster, generates pseudo labels based on cluster information corresponding to the features included in the ranges, and assigns the generated pseudo label to at least one feature included in the range; step ST4A in which an inference unit 14 calculates an inference value based on the feature; and step ST5A in which a parameter update unit 15 updates parameters of the feature extraction unit 11 and the inference unit 14 based on the pseudo label and the inference value.
By executing the above learning method, the information processing device 1 can perform learning in which pseudo labels are assigned to each point of the feature quantity of a cluster.
The computer that executes the program according to the first embodiment functions as a feature extraction unit 11 that extracts features from data, a clustering unit 12 that determines the cluster to which a feature belongs by clustering the features in the feature space, a pseudo label assignment unit 13 that sequentially sets ranges within the cluster, generates pseudo labels based on cluster information corresponding to the features included in the ranges, and assigns the generated pseudo label to at least one feature included in the range, an inference unit 14 that calculates an inferred value based on the feature, and a parameter update unit 15 that updates the parameters of the feature extraction unit 11 and the inference unit 14 based on the pseudo label and the inferred value.
A computer that executes the program according to the first embodiment is capable of performing learning by assigning pseudo labels to each point of the feature quantity of a cluster.
Embodiment 2.
Although the first embodiment does not mention the characteristics of the training data, the information processing device according to the second embodiment acquires data having spatial continuity as training data. As a result, the information processing device according to the second embodiment can assign pseudo labels taking into account the spatial continuity of the training data by extracting features from spatially continuous local data.
(Outline of information processing device)
FIG. 10 is a block diagram showing an example configuration of an information processing device 1A according to embodiment 2. In FIG. 10, the information processing device 1A acquires data having spatial continuity and extracts features from the spatially continuous local data. The information processing device 1A then sequentially sets ranges within clusters obtained by clustering the features, generates pseudo labels based on cluster information corresponding to the features included within the ranges, and assigns the pseudo labels to at least one feature included within the ranges.
The information processing device 1A includes a feature extraction unit 11A, a clustering unit 12A, a pseudo-label assignment unit 13A, an inference unit 14A, a parameter update unit 15A, and a data acquisition unit 16.
For example, the information processing device 1A is realized by a computer. A memory included in the computer stores programs constituting information processing applications for realizing the functions of the feature extraction unit 11A, the clustering unit 12A, the pseudo label assignment unit 13A, the inference unit 14A, the parameter update unit 15A, and the data acquisition unit 16. A processor included in the computer executes the information processing application read from the memory, thereby realizing the functions of the feature extraction unit 11A, the clustering unit 12A, the pseudo label assignment unit 13A, the inference unit 14A, the parameter update unit 15A, and the data acquisition unit 16.
(Data acquisition unit)
The data acquisition unit 16 acquires data having spatial continuity. For example, the data acquisition unit 16 acquires the data having spatial continuity by connecting to a sensor device or a server via a network using a communication unit not shown in FIG. 10 .
Examples of data with spatial continuity include time series data, audio data, image data, video data, spectrograms, text data, three-dimensional point clouds, range Doppler, and graph data. Time series data and audio data are one-dimensional data with temporal continuity. Image data is two-dimensional data that represents two-dimensional space. Video data is data that represents two-dimensional space and also has a time dimension. Spectrograms have the dimensions of time and frequency. Text data is one-dimensional data with an order dimension. Three-dimensional point clouds are data in three-dimensional space. Range Doppler is data with the dimensions of distance and velocity. Graph data is data represented by node connections.
Furthermore, feature points extracted from spatially continuous local data may be scalars or vectors. For example, features of univariate time-series data are represented by scalars, and features of multivariate time-series data are represented by vectors. Color image data is represented by a three-dimensional vector in which each pixel value has color information of R (red), G (green), and B (blue).
(Feature extraction unit)
The feature extraction unit 11A extracts features from spatially continuous local data acquired by the data acquisition unit 16. These features also have spatial continuity. When the data from which features are to be extracted by convolution calculation is one-dimensional data, two-dimensional data, three-dimensional data, or graph data, the feature extraction unit 11A extracts features by performing one-dimensional convolution, two-dimensional convolution, three-dimensional convolution, or graph convolution, respectively.
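As an illustration of matching the convolution to the dimensionality of the data, the sketch below selects a 1-D, 2-D, or 3-D convolution layer in PyTorch. The channel counts and kernel size are placeholders; graph convolution is only mentioned in a comment because it would need a dedicated graph-learning library and is omitted to keep the sketch self-contained.

```python
import torch.nn as nn


def make_extractor(data_ndim, in_channels=1, out_channels=32, kernel_size=3):
    """Return a convolution layer matching the spatial dimensionality of the data.
    Graph data would instead use a graph-convolution layer from a graph library."""
    conv = {1: nn.Conv1d, 2: nn.Conv2d, 3: nn.Conv3d}[data_ndim]
    return conv(in_channels, out_channels, kernel_size, padding=kernel_size // 2)


extractor_1d = make_extractor(1)  # time series, audio, text
extractor_2d = make_extractor(2)  # images, spectrograms, range-Doppler maps
extractor_3d = make_extractor(3)  # video, voxelized 3-D data
```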
(Clustering unit)
The clustering unit 12A determines the cluster to which the feature value belongs by clustering the feature values having spatial continuity.
For example, the clustering unit 12A may perform hard clustering or soft clustering.
Hard clustering is a method of assigning each feature point to exactly one cluster; as cluster information, for example, a single cluster number (an integer value) is assigned to each point in the cluster. Examples of hard clustering include k-means clustering.
Soft clustering is a method in which each feature point may belong to multiple clusters; as cluster information, for example, a score (a real value) is assigned for each cluster. Examples of soft clustering include GMM (Gaussian Mixture Model).
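The two options can be sketched with scikit-learn as follows; the feature points are random and the cluster counts and dimensions are illustrative. K-means returns a single integer cluster number per point (hard clustering), while a Gaussian mixture returns a score per cluster per point (soft clustering).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

z = np.random.rand(200, 16)   # 200 feature points in a 16-dimensional feature space

# Hard clustering: one integer cluster number per feature point
hard_labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(z)

# Soft clustering: a real-valued score (posterior probability) per cluster per point
soft_scores = GaussianMixture(n_components=10, random_state=0).fit(z).predict_proba(z)

print(hard_labels.shape)   # (200,)
print(soft_scores.shape)   # (200, 10)
```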
(Pseudo label assignment unit)
The pseudo label assignment unit 13A sequentially sets ranges within a cluster, generates pseudo labels based on cluster information corresponding to feature amounts included in the ranges, and assigns the pseudo label to at least one feature amount included in the ranges.
For example, the pseudo label assignment unit 13A aggregates the spatially continuous local cluster information within the above range and assigns the resulting pseudo label.
In the case of hard clustering, the pseudo label assignment unit 13A assigns, as the pseudo label, information obtained by aggregating the cluster numbers (integer values). The aggregation may be the mode, median, mean, minimum, maximum, or a quantile of the cluster numbers assigned to the feature points included in the range set within the cluster.
In the case of soft clustering, the pseudo label assignment unit 13A assigns, as the pseudo label, information obtained by aggregating the scores (real values) for the clusters.
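For the soft-clustering case, one possible aggregation is to average the per-cluster score vectors inside the moving window, as in the sketch below (the hard-clustering case with a mode was sketched earlier). The window size and data are illustrative and the helper name is hypothetical.

```python
import numpy as np


def soft_pseudo_labels(scores, window_size=5):
    """Average the per-cluster scores inside a moving window; the averaged
    score vector serves as the pseudo label for the centre position."""
    n = scores.shape[0]
    half = window_size // 2
    pseudo = np.zeros_like(scores)
    for i in range(n):
        pseudo[i] = scores[max(0, i - half): i + half + 1].mean(axis=0)
    return pseudo


scores = np.random.dirichlet(np.ones(10), size=50)   # 50 points, 10 cluster scores each
print(soft_pseudo_labels(scores).shape)              # (50, 10)
```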
(Inference unit)
The inference unit 14A calculates an inferred value based on the feature extracted from the data by the feature extraction unit 11A. For example, the inference unit 14A calculates the inferred value using a neural network. When the feature is input to the input layer of the neural network, the feature propagates from the input layer to the output layer via the hidden layer, and a final inferred value is generated. In the fully connected layer, a bias is added to the inner product of the input and the weight, and an output value is calculated using an activation function. When the activation function is a sigmoid function, an inferred value for a binary category is output. When the activation function is a Softmax function, an inferred value for a multi-value category is output.
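A small PyTorch sketch of the two output activations mentioned above, with illustrative feature and class dimensions:

```python
import torch
import torch.nn as nn

features = torch.randn(8, 64)                                      # 8 feature points, 64-dimensional

binary_head = nn.Sequential(nn.Linear(64, 1), nn.Sigmoid())        # binary-category inference value
multi_head = nn.Sequential(nn.Linear(64, 10), nn.Softmax(dim=1))   # multi-category inference value

print(binary_head(features).shape)   # (8, 1)
print(multi_head(features).shape)    # (8, 10)
```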
(Parameter update unit)
The parameter update unit 15A updates the parameters of the feature extraction unit 11A and the inference unit 14A based on the pseudo label and the inference value. For example, the parameter update unit 15A updates the parameters of the feature extraction unit 11A and the inference unit 14A so that the inference value output from the inference unit 14A matches the pseudo label output from the pseudo label assignment unit 13A.
Next, a learning method according to the second embodiment will be described.
FIG. 11 is a flowchart showing a learning method according to the second embodiment, illustrating the detailed processing of step ST3 in FIG. 2.
The data acquisition unit 16 acquires data having spatial continuity (step ST1B).
The feature extraction unit 11A extracts feature amounts from spatially continuous local data (step ST2B).
The clustering unit 12A determines the cluster to which the feature belongs by clustering the feature in the feature space (step ST3B).
The pseudo label assignment unit 13A sequentially sets moving windows within the cluster, generates pseudo labels based on cluster information corresponding to the features included in the moving window, and assigns the pseudo label to at least one feature included in the range (step ST4B).
The inference unit 14A calculates an inference value based on the feature amount (step ST5B).
The parameter update unit 15A updates the parameters of the feature extraction unit 11A and the inference unit 14A based on the pseudo label and the inference value (step ST6B).
FIG. 12A is a diagram showing an overview of the process of assigning pseudo labels to feature points within a cluster, which is time-series data. In FIG. 12A, the pseudo label assignment unit 13A sets a moving window W1 for cluster D1, which is time-series data. The moving window W1 includes point P1, which is the target of assigning a pseudo label, and its neighboring point PN. The pseudo label assignment unit 13A calculates a pseudo label by aggregating the label numbers of points P1 and PN included in the moving window W1. The pseudo label assignment unit 13A assigns the pseudo label to point P1, which is the target of assignment. By repeating this process, pseudo labels are assigned to all points included in cluster D1. This enables the information processing device 1A to perform learning by assigning pseudo labels to each feature point, which is time-series data.
FIG. 12B is a diagram showing an overview of the process of assigning pseudo labels to feature points within a cluster, which is two-dimensional data. In FIG. 12B, the pseudo label assignment unit 13A sets a moving window W2 within cluster D2, which is two-dimensional data. The moving window W2 includes point P2, which is the target of assigning a pseudo label, and its neighboring points. The pseudo label assignment unit 13A calculates a pseudo label by aggregating the label numbers of the points included in the moving window W2. The pseudo label assignment unit 13A assigns a pseudo label to point P2, which is the target of assignment. By repeating this process, pseudo labels are assigned to all points included in cluster D2. This enables the information processing device 1A to perform learning by assigning pseudo labels to each feature point, which is two-dimensional data.
FIG. 12C is a diagram showing an overview of the process of assigning pseudo labels to feature points within a cluster, which is three-dimensional data. In FIG. 12C, the pseudo label assignment unit 13A sets a moving window W3 within cluster D3. The moving window W3 includes point P3, which is the target of assigning a pseudo label, and its neighboring points. The pseudo label assignment unit 13A calculates a pseudo label by aggregating the label numbers of the points included in the moving window W3. The pseudo label assignment unit 13A assigns the pseudo label to point P3, which is the target of assignment. By repeating this process, pseudo labels are assigned to all points included in cluster D3. This enables the information processing device 1A to perform learning by assigning pseudo labels to each feature point, which is three-dimensional data.
FIG. 12D is a diagram showing an overview of the process of assigning pseudo labels to feature points within a cluster, which is graph data. In FIG. 12D, the pseudo label assignment unit 13A sets a moving window W4 within cluster D4. The moving window W4 includes point P4, which is the target of assigning a pseudo label, and its neighboring points. The pseudo label assignment unit 13A calculates a pseudo label by aggregating the label numbers of the points included in the moving window W4. The pseudo label assignment unit 13A assigns the pseudo label to point P4, which is the target of assignment. By repeating this process, pseudo labels are assigned to all points included in cluster D4. This enables the information processing device 1A to perform learning by assigning pseudo labels to each feature point, which is graph data.
(Variation 1)
In variant example 1, the information processing device 1A acquires time series data as a learning dataset, performs hard clustering on each point of the feature extracted from the time series data, and assigns cluster information, which is an integer value indicating a cluster number, to each point.
The time-series data is, for example, time-series data of acceleration and angular acceleration detected by sensors attached to a person's hands and feet. This time-series data is segmented by the type of movement of the person wearing the sensor, such as walking, running, or sitting. Since the true value of the segment, i.e., the type of movement, is unknown, no training data is provided.
The operation of the information processing device 1A according to the first modification will be described below with reference to the flowcharts of FIGS. 2 and 3.
The feature extraction unit 11A extracts features by one-dimensional convolution calculation using a CNN.
First, the information processing device 1A randomly initializes parameters of a one-dimensional convolutional layer in a CNN (step ST1).
The data acquisition unit 16 acquires time-series data of acceleration and angular acceleration detected by sensors attached to the person's hands and feet (step ST2), which enables the information processing device 1A to perform learning by assigning pseudo-labels to each point of spatially continuous local feature quantities.
Thereafter, the information processing device 1A uses the time-series data acquired by the data acquisition unit 16 as learning data and repeatedly executes the following learning process (processing from step ST1A to step ST5A) a specified number of times (step ST3).
The feature extraction unit 11A extracts a time series of features from time series data using a one-dimensional convolutional layer of a CNN (step ST1A).
The clustering unit 12A performs hard clustering on each point of the time-series feature extracted by the feature extraction unit 11A using k-means with a cluster count of 10 (step ST2A). As a result, an integer value indicating a cluster number is assigned as cluster information to each point of the feature included in a cluster.
The pseudo label assignment unit 13A sequentially sets a moving window of size 5 within the cluster at each time point, takes a majority vote on the cluster information within the moving window to convert it into a pseudo label, and sequentially assigns the pseudo label to the feature at the target time point (step ST3A).
The inference unit 14A converts the feature quantity at each time point into a 10-dimensional vector using a fully connected layer of a CNN and a Softmax function (step ST4A).
The parameter update unit 15A updates each parameter of the one-dimensional convolutional layer of the CNN used by the feature extraction unit 11A and the fully connected layer of the CNN used by the inference unit 14A using stochastic gradient descent so as to minimize the value of the cross-entropy loss function calculated based on the pseudo label at each time point and the 10-dimensional vector (step ST5A).
The processes from step ST1A to step ST5A are repeatedly executed until the designated number of times is reached (step ST4).
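Putting the loop together, the sketch below runs a single pass of steps ST1A to ST5A on a synthetic sensor sequence: a one-dimensional convolutional layer extracts features, k-means with 10 clusters assigns cluster numbers, a moving window of size 5 takes a majority vote to produce pseudo labels, a fully connected layer yields 10-dimensional inference values, and stochastic gradient descent minimizes the cross-entropy loss. The sequence length, channel count, and learning rate are assumptions for illustration only; in practice the block is repeated the specified number of times.

```python
from collections import Counter

import numpy as np
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

T, C_IN = 512, 6                                            # illustrative sequence length and channels
extractor = nn.Conv1d(C_IN, 64, kernel_size=3, padding=1)   # 1-D convolutional layer
head = nn.Linear(64, 10)                                     # fully connected layer (Softmax folded into the loss)
optimizer = torch.optim.SGD(list(extractor.parameters()) + list(head.parameters()), lr=0.01)

x = torch.randn(1, C_IN, T)        # stand-in for the acceleration / angular-acceleration series

# ST1A: extract a time series of features
z = extractor(x)                                    # (1, 64, T)
z_np = z.detach().squeeze(0).t().numpy()            # (T, 64), one feature point per time step

# ST2A: hard clustering with k-means, 10 clusters
cluster_ids = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(z_np)

# ST3A: moving window of size 5, majority vote -> pseudo label per time step
pseudo = np.empty(T, dtype=np.int64)
for t in range(T):
    window = cluster_ids[max(0, t - 2): t + 3]
    pseudo[t] = Counter(window.tolist()).most_common(1)[0][0]

# ST4A: convert each time step's feature into a 10-dimensional inference value
logits = head(z.squeeze(0).t())                     # (T, 10)

# ST5A: cross-entropy between pseudo labels and inference values, SGD update
loss = nn.CrossEntropyLoss()(logits, torch.from_numpy(pseudo))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```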
The inference unit 14A performs segmentation on the inference result data (step ST5). For example, the inference unit 14A acquires a time series of features extracted from the time series data by a one-dimensional convolutional layer of a CNN, and converts the features at each time point into a 10-dimensional vector using a fully connected layer and a Softmax function. The inference unit 14A outputs, as a segment, the cluster in which the 10-dimensional vector is maximized at each time point.
This allows the information processing device 1A to segment the time series data of acceleration and angular acceleration detected by sensors attached to a person's hands and feet by the type of movement, such as walking, running, or sitting, of the person wearing the sensor.
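As a sketch of this final inference step, each time step's 10-dimensional vector is reduced to the index of its largest component, and consecutive identical indices are grouped into segments; the shapes below continue the illustrative values used above.

```python
from itertools import groupby

import torch

logits = torch.randn(512, 10)             # per-time-step 10-dimensional inference values
labels = logits.argmax(dim=1).tolist()    # cluster with the largest value at each time step

# Group consecutive identical labels into segments: (start index, length, label)
segments, start = [], 0
for label, run in groupby(labels):
    length = len(list(run))
    segments.append((start, length, label))
    start += length
print(segments[:5])
```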
(Variation 2)
In Modification 2, the information processing device 1A acquires image data, which is two-dimensional data, as a learning dataset, and performs soft clustering on each point of feature quantities extracted from the image data. The information processing device 1A segments the image data on a pixel-by-pixel basis.
The information processing device 1A uses a trained model trained in advance for segmentation, but performs segmentation in a target domain that is different from the source domain trained in advance. It is assumed that there is no training data in the target domain. The operation of the information processing device 1A in Variation 2 will be described below with reference to the flowcharts of FIGS. 2 and 3 .
The feature extraction unit 11A extracts features by two-dimensional convolution calculation using a CNN.
First, the information processing device 1A randomly initializes parameters of a two-dimensional convolutional layer in a CNN (step ST1).
The data acquisition unit 16 acquires image data (step ST2).
Thereafter, the information processing device 1A uses the image data acquired by the data acquisition unit 16 as learning data and repeatedly executes the following learning process (processing from step ST1A to step ST5A) a specified number of times (step ST3).
The feature extraction unit 11A extracts per-pixel features from the image data using a two-dimensional convolutional layer of a CNN (step ST1A).
The clustering unit 12A performs soft clustering on the feature of each pixel extracted from the image data by the feature extraction unit 11A using a GMM with 100 clusters (step ST2A). A cluster likelihood, which is a score for the cluster, is assigned to the feature of each pixel included in a cluster as cluster information.
The pseudo label assignment unit 13A sequentially sets a moving window, which is an averaging filter of size 3x3, for the cluster, averages the cluster likelihood of each pixel using the averaging filter, calculates the maximum cluster likelihood as a pseudo label, and sequentially assigns pseudo labels to the features of the pixels to be assigned (step ST3A).
The inference unit 14A converts the feature quantity of each pixel into a 100-dimensional vector using a fully connected layer of a CNN and a Softmax function (step ST4A).
The parameter update unit 15A updates each parameter of the two-dimensional convolutional layer of the CNN used by the feature extraction unit 11A and the fully connected layer of the CNN used by the inference unit 14A using stochastic gradient descent so as to minimize the value of the cross-entropy loss function calculated based on the pseudo label of each pixel and the 100-dimensional vector (step ST5A).
The processes from step ST1A to step ST5A are repeatedly executed until the designated number of times is reached (step ST4).
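The pixel-wise pseudo labelling of this variation can be sketched as follows: per-pixel features from a two-dimensional convolution are soft-clustered with a Gaussian mixture, the per-cluster likelihood maps are smoothed with a 3x3 averaging filter, and the most likely cluster after smoothing is read off as each pixel's pseudo label (one reading of the step above). The image size, channel counts, and random input are assumptions for illustration; the cross-entropy update then proceeds as in Variation 1.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.mixture import GaussianMixture

H, W, K = 32, 32, 100                       # illustrative image size; K = 100 clusters as in this variation
extractor = nn.Conv2d(3, 16, kernel_size=3, padding=1)

image = torch.randn(1, 3, H, W)
feat = extractor(image)                                                       # (1, 16, H, W)
pixels = feat.detach().squeeze(0).permute(1, 2, 0).reshape(-1, 16).numpy()    # (H*W, 16)

# ST2A: soft clustering; per-pixel likelihood (score) for each of the K clusters
scores = GaussianMixture(n_components=K, random_state=0).fit(pixels).predict_proba(pixels)
score_maps = torch.from_numpy(scores.reshape(H, W, K)).permute(2, 0, 1).unsqueeze(0).float()

# ST3A: 3x3 averaging filter over each likelihood map, then the most likely
# cluster at each pixel becomes that pixel's pseudo label
smoothed = F.avg_pool2d(score_maps, kernel_size=3, stride=1, padding=1)       # (1, K, H, W)
pseudo = smoothed.squeeze(0).argmax(dim=0)                                    # (H, W)
print(pseudo.shape)
```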
The inference unit 14A performs segmentation on the inference result data (step ST5). For example, the inference unit 14A acquires the feature of each pixel extracted from the image data by the two-dimensional convolutional layer of the CNN, and converts the feature of each pixel into a 100-dimensional vector using a fully connected layer and a Softmax function. The inference unit 14A outputs, as a segment, the cluster in which the 100-dimensional vector for each pixel is maximized.
This allows the information processing device 1A to segment image data in a target domain that is different from the pre-trained source domain.
As described above, the information processing device 1A according to embodiment 2 includes a data acquisition unit 16 that acquires spatially continuous data, and a feature extraction unit 11A that extracts feature vectors from spatially continuous local data among the acquired data. This enables the information processing device 1A to perform learning by assigning pseudo-labels to each point of spatially continuous local feature values.
In the information processing device 1A according to embodiment 2, the data acquisition unit 16 acquires time-series data. This enables the information processing device 1A to perform learning by assigning pseudo-labels to each point of the feature quantity, which is the time-series data.
In the information processing device 1A according to the second embodiment, the clustering unit 12A determines clusters by hard clustering. This enables the information processing device 1A to perform learning by assigning pseudo labels calculated based on cluster information that is assigned only to unique clusters to each feature point within a cluster.
In the information processing device 1A according to the second embodiment, the cluster information is an integer value indicating the cluster number. This enables the information processing device 1A to perform learning by assigning a pseudo label calculated based on the cluster number to each feature point within a cluster.
In the information processing device 1A according to the second embodiment, the clustering unit 12A determines clusters by soft clustering. This enables the information processing device 1A to perform learning by assigning pseudo labels calculated based on cluster information that can be assigned to multiple clusters to each feature point within a cluster.
In the information processing device 1A according to the second embodiment, the cluster information is a score for the cluster. This enables the information processing device 1A to perform learning by assigning a pseudo label calculated based on the score for the cluster to each feature point within the cluster.
It is possible to combine the various embodiments, modify any of the components in each embodiment, or omit any of the components in each embodiment.
The information processing device disclosed herein can be used for various information processing tasks, such as unsupervised learning.
1, 1A: Information processing device, 11, 11A: Feature extraction unit, 12, 12A: Clustering unit, 13, 13A: Pseudo label assignment unit, 14, 14A: Inference unit, 15, 15A: Parameter update unit, 16: Data acquisition unit, 100: Input interface, 101: Output interface, 102: Processor, 103: Memory.
Claims (11)
1. An information processing device comprising:
a feature extraction unit that extracts features from data;
a clustering unit that determines a cluster to which the feature belongs by clustering the features in a feature space;
a pseudo label assignment unit that sequentially sets ranges within the cluster, generates pseudo labels based on cluster information corresponding to the features included in the ranges, and assigns the generated pseudo labels to at least one of the features included in the ranges;
an inference unit that calculates an inference value based on the feature; and
a parameter update unit that updates parameters of the feature extraction unit and the inference unit based on the pseudo label and the inference value.
2. The information processing device according to claim 1, further comprising a data acquisition unit that acquires data having spatial continuity, wherein the feature extraction unit extracts the feature from spatially continuous local data among the acquired data.
3. The information processing device according to claim 2, wherein the data acquisition unit acquires time-series data.
4. The information processing device according to any one of claims 1 to 3, wherein the clustering unit determines the cluster by hard clustering.
5. The information processing device according to claim 4, wherein the cluster information is an integer value indicating a cluster number.
6. The information processing device according to any one of claims 1 to 3, wherein the clustering unit determines the cluster by soft clustering.
7. The information processing device according to claim 6, wherein the cluster information is a score for the cluster.
8. The information processing device according to claim 1, wherein the feature extraction unit extracts the feature by a convolution operation.
9. The information processing device according to any one of claims 1 to 8, wherein the pseudo label is one of a mode, a median, an average, a minimum value, and a maximum value of the cluster information corresponding to the features included in the range.
10. A learning method executed by an information processing device, the learning method comprising:
a step in which a feature extraction unit extracts features from data;
a step in which a clustering unit determines a cluster to which the feature belongs by clustering the features in a feature space;
a step in which a pseudo label assignment unit sequentially sets ranges within the cluster, generates pseudo labels based on cluster information corresponding to the features included in the ranges, and assigns the generated pseudo labels to at least one of the features included in the ranges;
a step in which an inference unit calculates an inference value based on the feature; and
a step in which a parameter update unit updates parameters of the feature extraction unit and the inference unit based on the pseudo label and the inference value.
11. A program for causing a computer to function as:
a feature extraction unit that extracts features from data;
a clustering unit that determines a cluster to which the feature belongs by clustering the features in a feature space;
a pseudo label assignment unit that sequentially sets ranges within the cluster, generates pseudo labels based on cluster information corresponding to the features included in the ranges, and assigns the generated pseudo labels to at least one of the features included in the ranges;
an inference unit that calculates an inference value based on the feature; and
a parameter update unit that updates parameters of the feature extraction unit and the inference unit based on the pseudo label and the inference value.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2024/016359 WO2025224959A1 (en) | 2024-04-26 | 2024-04-26 | Information processing device, training method, and program |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025224959A1 true WO2025224959A1 (en) | 2025-10-30 |
Family
ID=97489521
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2024/016359 Pending WO2025224959A1 (en) | 2024-04-26 | 2024-04-26 | Information processing device, training method, and program |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025224959A1 (en) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116150354A (en) * | 2022-11-04 | 2023-05-23 | 马上消费金融股份有限公司 | Intention recognition method and device, electronic equipment and storage medium |
| JP2023092974A (en) * | 2021-12-22 | 2023-07-04 | 富士通株式会社 | Information processing program, information processing method, and information processing apparatus |
| WO2023166578A1 (en) * | 2022-03-02 | 2023-09-07 | 日本電気株式会社 | Labeling assistance system, labeling assistance method, and labeling assistance program |
| JP2024029832A (en) * | 2022-08-23 | 2024-03-07 | 富士通株式会社 | Judgment program, judgment device, and judgment method |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24936877; Country of ref document: EP; Kind code of ref document: A1 |