JP6951913B2 - Classification model generator, image data classification device and their programs

Info

Publication number: JP6951913B2
Application number: JP2017170806A
Authority: JP
Inventors: 吉彦河合; 佐野　雅規; 雅規佐野
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2017-09-06
Filing date: 2017-09-06
Publication date: 2021-10-20
Anticipated expiration: 2037-09-06
Also published as: JP2019046334A

Description

The present invention relates to a classification model generation device that generates a classification model composed of a convolutional neural network for classifying image data, to an image data classification device that classifies image data using the classification model, and to programs for these devices.

Conventionally, methods based on a convolutional neural network (CNN) have been used to classify image data (Non-Patent Documents 1, 2, etc.).
An example of a CNN is outlined here with reference to FIGS. 12 and 13. As shown in FIG. 12, a CNN is composed of an input layer I, a hidden layer H, and an output layer O. Each layer has a structure in which a plurality of nodes (units) are connected by edges. To simplify the explanation, FIG. 12 shows a reduced number of layers and a small input image.

The input layer I is the layer that receives the image data (input image) to be classified.
The hidden layer H extracts features (feature maps) from the input image through a plurality of convolution layers C (C_1, C_2, ...) and pooling layers P (P_1, P_2, ...), followed by fully connected layers F (F_1, F_2, ...). The hidden layer H is not limited to the configuration of FIG. 12; for example, convolution layers C may be placed consecutively, or a normalization layer may be added.

The convolution layer C performs convolution operations with a plurality of convolution filters on the input image or on the feature map output by the preceding layer. In the example of FIG. 12, the convolution layer C_1 applies four convolution filters to a 24 × 24-pixel input image and generates four 20 × 20-pixel feature maps M_1 (4@20×20).

As shown in FIG. 13, the convolution layer C moves the convolution filter Cf step by step over the image of the preceding layer (the L-th layer), convolves the region corresponding to the size of the filter (here, 3 × 3 pixels) at each position, and applies an activation function f (for example, the rectified linear function max(0, x)) to obtain the pixel values of the next layer (the (L+1)-th layer). The example shown here uses four convolution filters Cf and generates four feature maps of the (L+1)-th layer from the image of the L-th layer.
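The convolution and activation step described above can be sketched as follows. This is a minimal pure-Python illustration, not the implementation of the publication: the function name `conv2d_relu` is hypothetical, and the 5 × 5 filter size is only inferred from the 24 × 24 to 20 × 20 sizes quoted for FIG. 12.

```python
def conv2d_relu(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and apply max(0, x)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            # Weighted sum of the region under the filter
            # (cross-correlation, the usual CNN convention).
            s = sum(image[y + i][x + j] * kernel[i][j]
                    for i in range(kh) for j in range(kw))
            row.append(max(0.0, s))  # activation function f = max(0, x)
        output.append(row)
    return output


# Assumed sizes: a 24x24 input and a 5x5 filter give a 20x20 feature map.
image = [[1.0] * 24 for _ in range(24)]
feature_map = conv2d_relu(image, [[1.0] * 5 for _ in range(5)])
```

Running one such pass per filter would give the four feature maps of the example.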

The pooling layer P subsamples the feature map M generated by the convolution layer C. In the example of FIG. 12, the pooling layer P_1 subsamples the four 20 × 20 feature maps M_1 (4@20×20) by 1/2 both horizontally and vertically and generates four 10 × 10 feature maps M_2 (4@10×10).
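A 1/2 subsampling of this kind might look like the sketch below. The text does not say which pooling rule is used, so max pooling over 2 × 2 blocks is an assumption, and `pool_2x2` is a hypothetical name.

```python
def pool_2x2(feature_map):
    """Halve a feature map in both directions by taking the maximum of each
    2x2 block (max pooling is an assumption; the text only says 'subsample')."""
    return [
        [max(feature_map[y][x], feature_map[y][x + 1],
             feature_map[y + 1][x], feature_map[y + 1][x + 1])
         for x in range(0, len(feature_map[0]) - 1, 2)]
        for y in range(0, len(feature_map) - 1, 2)
    ]


# A 20x20 map M_1 becomes a 10x10 map M_2.
m1 = [[float(y * 20 + x) for x in range(20)] for y in range(20)]
m2 = pool_2x2(m1)
```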

The fully connected layer F is a multilayer perceptron that takes, as a one-dimensional vector, the feature maps generated through the convolution layers C and pooling layers P. It is composed of a plurality of layers (F_1, F_2, ...), and every node of each layer is connected to every node of the next layer.
The output layer O outputs the classification result of the input image as probability values. It has as many nodes as there are classes, each connected to all outputs of the fully connected layer F, and outputs a probability value for each node through an activation function (for example, the softmax function).
In the learning stage, the CNN learns the parameters (network) of each layer from a plurality of image data whose classification is known; in the classification stage, it uses the learned parameters to classify image data whose classification is unknown.
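The softmax activation mentioned for the output layer O can be sketched as below. The scores are made-up illustrative values, and `softmax` is a hypothetical helper, not code from the publication.

```python
import math


def softmax(scores):
    """Convert raw output-node scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# One score per class; the class with the highest probability is the result.
probabilities = softmax([2.0, 1.0, 0.1])
predicted_class = probabilities.index(max(probabilities))
```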

Quoc V. Le, "Building high-level features using large scale unsupervised learning", ICASSP, 2013
Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, "ImageNet classification with deep convolutional neural networks", NIPS, 2012

To extract features from image data, the CNN described above performs convolution while moving the convolution filter. The orientation of this filter is always fixed and does not depend on the content of the image data.
For example, as shown in FIG. 14, suppose the same object (here, an image of a "house") appears tilted at different angles in different image data. If convolution with the convolution filter Cf is applied to the same region of the object (the image region of the "chimney") in FIGS. 14(a), (b), and (c), different features are extracted each time, even though the region belongs to the same part of the same object.
As a result, even when the image data of FIGS. 14(a), (b), and (c) contain the same object, a conventional CNN classifies them as image data containing different objects because the object is tilted.
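The effect can be reproduced with a toy example. The 3 × 3 patches and the filter weights below are made-up values chosen only for illustration: the same edge, upright and turned by 90 degrees, gives different responses to a filter whose orientation is fixed.

```python
def filter_response(patch, kernel):
    """Weighted sum of a 3x3 patch with a 3x3 filter of fixed orientation."""
    return sum(patch[i][j] * kernel[i][j] for i in range(3) for j in range(3))


# Hypothetical filter that responds to brightness increasing left to right.
kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

upright = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]  # edge along the right side
tilted = [[0, 0, 0], [0, 0, 0], [1, 1, 1]]   # the same edge turned by 90 degrees

response_upright = filter_response(upright, kernel)
response_tilted = filter_response(tilted, kernel)
```

The two responses differ (3 versus 0), which is the behaviour the text attributes to a conventional convolution layer.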

To recognize the objects in these image data as the same object, the CNN must be trained with learning data consisting of image data in which the object is tilted in various directions.
Thus, conventional CNN-based image data classification requires learning data containing objects in various orientations, and the amount of learning data and the time required for learning become enormous.

An object of the present invention is therefore to provide a classification model generation device that, simply by learning a CNN (classification model) from image data of an object in a single orientation, generates a classification model capable of recognizing the object as the same object regardless of its orientation in the image data and of classifying the image data accordingly, as well as an image data classification device that classifies image data using that classification model, and programs for these devices.

To solve the above problem, the classification model generation device according to the present invention generates, from a plurality of image data whose classification is known, a classification model that is a convolutional neural network for classifying image data whose classification is unknown, and comprises a region-specific main direction estimation means and a classification model learning means.

In this configuration, the classification model generation device uses the region-specific main direction estimation means to estimate, from the image data whose classification is known, the principal direction (main direction) of the edge components of the image for each filter region to which the convolution filter of the first convolution layer of the convolutional neural network is applied. The main direction of the edge components can be estimated using a Sobel filter or the like.

The classification model generation device then uses the classification model learning means to learn the convolutional neural network from the image data whose classification is known and teacher data indicating the classification, thereby generating the classification model. In the first convolution layer, the classification model learning means rotates each filter region so that a predetermined reference orientation of the filter region is at a constant direction relative to the main direction of the edge components estimated by the region-specific main direction estimation means, and performs the convolution operation by applying the convolution filter, which is a spatial filter, to the rotated region.
This allows the classification model learning means to extract, in the first convolution layer, features that are almost invariant to the orientation of objects in the image data.
The classification model generation device can be operated by a classification model generation program that causes a computer to function as each of the means described above.

To solve the above problem, the image data classification device according to the present invention classifies image data whose classification is unknown using a classification model that is a convolutional neural network generated by the classification model generation device, and comprises a region-specific main direction estimation means and a classification means.

In this configuration, the image data classification device uses the region-specific main direction estimation means to estimate, from the image data whose classification is unknown, the main direction of the edge components of the image for each filter region to which the convolution filter of the first convolution layer of the convolutional neural network is applied. The main direction of the edge components can be estimated using a Sobel filter or the like.

The image data classification device then uses the classification means to classify the image data whose classification is unknown with the convolutional neural network that serves as the classification model. In the first convolution layer, the classification means rotates each filter region so that a predetermined reference orientation of the filter region is at a constant direction relative to the main direction of the edge components estimated by the region-specific main direction estimation means, and performs the convolution operation by applying the convolution filter, which is a spatial filter, to the rotated region.
This allows the classification means to extract, in the first convolution layer, features that are almost invariant to the orientation of objects in the image data.
The image data classification device can be operated by an image data classification program that causes a computer to function as each of the means described above.

Alternatively, to solve the above problem, the image data classification device according to the present invention may be a device that generates, from a plurality of image data whose classification is known, a classification model that is a convolutional neural network for classifying image data whose classification is unknown, and that classifies image data whose classification is unknown, the device comprising a region-specific main direction estimation means, a classification model learning means, and a classification means.

The present invention provides the following advantageous effects.
According to the present invention, in the first convolution layer, the predetermined reference orientation of the filter region of the convolution filter is rotated so as to be at a constant direction relative to the main direction of the edge components of the image in that filter region before the convolution operation is performed, so that features almost invariant to the orientation of objects in the image data can be extracted.
Consequently, the classification model need only be learned using image data of objects in a single orientation as learning data, which reduces both the amount of learning data and the time required for learning. In addition, the present invention can recognize an object as the same object and classify the image data whether or not the object in the image data is tilted.

FIG. 1 is a block diagram showing the configuration of the image data classification device according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram illustrating the size and movement amount of the filter region of the convolution filter applied in the convolution layer.
FIG. 3 shows examples of Sobel filters: (a) a vertical Sobel filter and (b) a horizontal Sobel filter.
FIG. 4 is an explanatory diagram illustrating the method of finding the main direction of the edge components: (a) shows the gradient intensity and gradient direction of the edge components of a filter region as a vector for each pixel, and (b) shows a histogram of the cumulative gradient intensity over the quantized gradient directions.
FIG. 5 is an explanatory diagram illustrating the rotation of a filter region: (a) shows the relationship between the reference direction of the filter region and the main direction of the edge components, (b) shows the rotated filter region, and (c) shows the pixels to which the convolution filter is applied in the rotated filter region.
FIG. 6 is an explanatory diagram illustrating an example of performing convolution by moving the filter region while rotating it.
FIG. 7 illustrates the filter regions of the first convolution layer of the present invention; (a) to (c) illustrate an example in which features of substantially the same direction are extracted from the same region of the same object.
FIG. 8 is a flowchart showing the operation of the image data classification device according to the embodiment of the present invention in the learning mode.
FIG. 9 is a flowchart showing the operation of the image data classification device according to the embodiment of the present invention in the classification mode.
FIG. 10 is a block diagram showing the configuration of a classification model generation device according to another embodiment of the present invention.
FIG. 11 is a block diagram showing the configuration of an image data classification device according to another embodiment of the present invention.
FIG. 12 is a network diagram showing an example of the structure of a convolutional neural network.
FIG. 13 is an explanatory diagram illustrating the processing of a convolution layer of a convolutional neural network.
FIG. 14 illustrates the regions of a conventional convolution filter; (a) to (c) illustrate an example in which features of different directions are extracted from the same region of the same object.

Embodiments of the present invention are described below with reference to the drawings.
<Configuration of the image data classification device>
First, the configuration of the image data classification device 1 according to an embodiment of the present invention is described with reference to FIG. 1.
The image data classification device 1 learns a convolutional neural network (CNN; hereinafter referred to as the classification model) for classifying image data according to the objects in the image data, and classifies image data using that classification model. The image data classification device 1 has two operation modes: a mode for learning the classification model (hereinafter, the "learning mode") and a mode for classifying image data (hereinafter, the "classification mode").

In the learning mode, the image data classification device 1 receives, as learning data, a plurality of image data whose classification is known together with teacher data indicating their classification, and learns the classification model. If the classification targets are persons, for example, the teacher data is information that uniquely identifies each person (such as the person's name).
In the classification mode, the image data classification device 1 receives image data whose classification is unknown and outputs the result of classifying it with the classification model (the classification result).
The configuration of the image data classification device 1, which operates in these two modes, is described in detail below.

The image data classification device 1 comprises a learning data input means 10, a classification data input means 11, a region-specific main direction estimation means 12, a region-specific main direction storage means 13, a classification model learning means 14, a classification model storage means 15, and a classification means 16.

The learning data input means 10 receives, as learning data, image data whose classification is known and teacher data indicating the classification. It outputs the received image data to the region-specific main direction estimation means 12 and the classification model learning means 14, and outputs the received teacher data to the classification model learning means 14.

The classification data input means 11 receives image data whose classification is unknown. It outputs the received image data to the region-specific main direction estimation means 12 and the classification means 16.

The region-specific main direction estimation means 12 estimates the main direction of the edge components of the image data for each image region (filter region) to which the convolution filter is applied in the convolution performed by the first convolution layer of the classification model (CNN). It receives image data from the learning data input means 10 in the learning mode and from the classification data input means 11 in the classification mode.

FIG. 2 shows an example of the size and movement amount of the filter regions applied in the convolution layer. In this example, the filter regions R, R, ..., R have the same size as the convolution filter (here, 3 × 3 pixels) and are shifted by the stride of the convolution filter (here, one pixel in both the horizontal and vertical directions). The size and stride of the convolution filter are, of course, not limited to these values.
The region-specific main direction estimation means 12 estimates the main direction of the edge components for each filter region R illustrated in FIG. 2. A general method, such as one using a Sobel filter, can be used to estimate the main direction.
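The layout of the filter regions can be enumerated as in the sketch below. `filter_region_origins` is a hypothetical helper that returns the top-left corner of each region, and the 5 × 5 image size is an illustrative assumption; the 3 × 3 region size and stride of 1 are the values of the example in the text.

```python
def filter_region_origins(height, width, size=3, stride=1):
    """Top-left (row, column) corner of every filter region that fits
    inside an image of the given height and width."""
    return [(top, left)
            for top in range(0, height - size + 1, stride)
            for left in range(0, width - size + 1, stride)]


# A 5x5 image with 3x3 regions and stride 1 contains 3 x 3 = 9 filter regions.
origins = filter_region_origins(5, 5)
```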

A method of estimating the main direction of the edge components of a filter region R using Sobel filters is briefly described here.
First, the region-specific main direction estimation means 12 uses the Sobel filters illustrated in FIG. 3 ((a) vertical Sobel filter, (b) horizontal Sobel filter) to compute, for each pixel of the filter region R, the gradient intensity and gradient direction of the edge component from the values of the neighboring pixels.
Let f_x(x, y) be the value obtained by applying the vertical Sobel filter of FIG. 3(a) to the pixel at coordinates (x, y) of the filter region R, and f_y(x, y) the value obtained by applying the horizontal Sobel filter of FIG. 3(b). The region-specific main direction estimation means 12 obtains the gradient intensity G(x, y) of the edge component at pixel (x, y) by equation (1) and its gradient direction θ(x, y) by equation (2).
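Equations (1) and (2) are not reproduced in this text. For Sobel responses f_x and f_y, the gradient intensity and gradient direction are conventionally computed as below; this standard form is assumed here and is not copied from the original publication.

```latex
G(x, y) = \sqrt{f_x(x, y)^2 + f_y(x, y)^2} \tag{1}

\theta(x, y) = \tan^{-1}\frac{f_y(x, y)}{f_x(x, y)} \tag{2}
```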

In this way, as shown in FIG. 4(a), the gradient intensity (vector length) and gradient direction (vector direction) of the edge component are obtained for each pixel of the filter region R. The region-specific main direction estimation means 12 then quantizes the per-pixel gradient directions of FIG. 4(a) (for example, in units of 5°) and, as shown in FIG. 4(b), generates a histogram in which the gradient intensities are accumulated for each gradient direction.

The region-specific main direction estimation means 12 then takes the gradient direction at which the cumulative gradient intensity peaks in the histogram of FIG. 4(b) as the main direction of the edge components of the filter region R. If no clear peak can be detected, the means 12 treats the region as having no main direction and sets the main direction to, for example, 0°. A clear peak may be judged to be absent when, for example, the ratio of the second-largest cumulative value to the largest cumulative value of the gradient intensity exceeds a predetermined ratio.
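The estimation procedure can be sketched as follows. Several points are assumptions rather than details from the publication: the standard 3 × 3 Sobel kernels stand in for FIG. 3 (which is not reproduced here), `atan2` is used so that directions cover the full 0° to 360° range, border pixels are clamped, and the threshold `max_ratio = 0.9` and all function names are hypothetical.

```python
import math

# Standard 3x3 Sobel kernels (assumed; FIG. 3 is not reproduced in the text).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # gives f_x
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]  # gives f_y


def sobel_at(image, y, x):
    """Return (f_x, f_y) at pixel (y, x); neighbours outside the image are clamped."""
    h, w = len(image), len(image[0])
    fx = fy = 0.0
    for i in range(3):
        for j in range(3):
            v = image[min(max(y + i - 1, 0), h - 1)][min(max(x + j - 1, 0), w - 1)]
            fx += SOBEL_X[i][j] * v
            fy += SOBEL_Y[i][j] * v
    return fx, fy


def main_direction(image, top, left, size=3, bin_deg=5, max_ratio=0.9):
    """Main edge direction in degrees for the filter region whose top-left
    corner is (top, left); 0.0 when no clear histogram peak exists."""
    n_bins = 360 // bin_deg
    hist = [0.0] * n_bins
    for y in range(top, top + size):
        for x in range(left, left + size):
            fx, fy = sobel_at(image, y, x)
            g = math.hypot(fx, fy)                      # gradient intensity
            if g == 0.0:
                continue
            theta = math.degrees(math.atan2(fy, fx)) % 360.0  # gradient direction
            hist[int(round(theta / bin_deg)) % n_bins] += g   # quantize, accumulate
    ranked = sorted(hist, reverse=True)
    # No clear peak: empty histogram, or runner-up too close to the maximum.
    if ranked[0] == 0.0 or ranked[1] / ranked[0] > max_ratio:
        return 0.0
    return float(hist.index(ranked[0]) * bin_deg)


# Brightness increases downward, so the gradient points along +y (90 degrees).
image = [[0.0] * 5 for _ in range(2)] + [[1.0] * 5 for _ in range(3)]
direction = main_direction(image, top=1, left=1)
```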
Returning to FIG. 1, the description of the configuration of the image data classification device 1 continues.

The region-specific main direction estimation means 12 stores the main direction of the edge components of each of the filter regions R, R, ..., R (FIG. 2) in the region-specific main direction storage means 13 in association with the position of the filter region R.
Once it has estimated the main direction of the edge components for all filter regions R of the image data, the region-specific main direction estimation means 12 issues an "estimation completion notification" indicating that the estimation is complete, to the classification model learning means 14 in the learning mode and to the classification means 16 in the classification mode.

The region-specific main direction storage means 13 stores the position of each filter region of the image data in association with the main direction of the edge components estimated for that filter region by the region-specific main direction estimation means 12. It can be implemented with a general storage medium such as a hard disk or semiconductor memory.
The main direction of the edge components of each filter region stored in the region-specific main direction storage means 13 is referenced by the classification model learning means 14 in the learning mode and by the classification means 16 in the classification mode.

The classification model learning means 14 learns the convolutional neural network (CNN), which is the classification model for classifying image data whose classification is unknown, using the plurality of learning data (image data and teacher data) received from the learning data input means 10 and the main direction of the edge components of each filter region stored in the region-specific main direction storage means 13. Initial values of the classification model parameters and the like are stored in the classification model storage means 15, and through learning the classification model learning means 14 updates the classification model parameters stored there.

In the convolution performed by the first convolution layer of the CNN, the classification model learning means 14 rotates each filter region of the image data by an angle determined by the main direction of its edge components and performs the convolution operation on the rotated filter region.
As shown in FIG. 5(a), suppose, for example, that the main direction of the edge components of a filter region R in the image data is at 30° from a predetermined reference direction (here, the horizontal rightward direction of the image, taken as 0°). As shown in FIG. 5(b), the classification model learning means 14 then takes the region obtained by rotating the filter region R by 30° about its center O as a new filter region R_N. Rotating the filter region R only rotates the area subject to convolution by the given angle; the image within the region is not rotated.

Then, as shown in FIG. 5(c), the classification model learning means 14 applies the convolution filter to the pixel values of the pixel regions (b1, b2, ..., b9) of the rotated filter region R_N, instead of the pixel values of the pixel regions (a1, a2, ..., a9) of the filter region R before rotation, and performs the convolution operation. Strictly speaking, the pixel values of the pixel regions (b1, b2, ..., b9) of the rotated filter region R_N are the pixel values of the pixels corresponding to the center positions of the respective pixel regions (b1, b2, ..., b9).
As a result, whatever the main direction of the edge component of the filter region R is, the classification model learning means 14 can apply the convolution filter in the same direction relative to that main direction.
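The sampling of a rotated filter region can be sketched in Python as follows. This is a minimal illustration and not the patented implementation: the function names, the choice of rotation sense (image y axis pointing down), the clamping at image borders and the nearest-pixel lookup via `round` are all assumptions made for the example. Only the idea of reading the pixel under the center of each rotated cell comes from the text.

```python
import math


def rotated_sample_points(cx, cy, angle_deg, size=3):
    """Centers of the size x size cells of a filter region rotated by
    angle_deg about its center (cx, cy). Hypothetical helper."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    half = (size - 1) / 2.0
    points = []
    for row in range(size):
        for col in range(size):
            dx, dy = col - half, row - half
            # Rotate the offset from the center; the sign convention
            # (y axis pointing down) is an assumption.
            x = cx + dx * cos_t - dy * sin_t
            y = cy + dx * sin_t + dy * cos_t
            points.append((x, y))
    return points


def rotated_convolution(image, kernel, cx, cy, angle_deg):
    """Apply `kernel` to the region centered at (cx, cy) after rotating
    the region (not the image) by angle_deg."""
    size = len(kernel)
    height, width = len(image), len(image[0])
    total = 0.0
    points = rotated_sample_points(cx, cy, angle_deg, size)
    for idx, (x, y) in enumerate(points):
        # Use the pixel under the center of each rotated cell,
        # clamped to the image (border handling is an assumption).
        xi = min(max(int(round(x)), 0), width - 1)
        yi = min(max(int(round(y)), 0), height - 1)
        total += image[yi][xi] * kernel[idx // size][idx % size]
    return total
```

With an angle of 0 the function reduces to an ordinary (unrotated) filter application, which is a convenient sanity check.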

Then, as shown in FIG. 6, when sequentially moving the filter region over the image data, the classification model learning means 14 rotates the filter region according to the main direction of the edge component and applies the convolution filter Cf to the rotated filter region to perform the convolution process.
In this way, in the first convolution layer of the CNN, the classification model learning means 14 performs the convolution process so that, in every filter region R, the orientation of the convolution filter is constant relative to the main direction of the edge component. As a result, regardless of whether an object in the image data is tilted, a nearly invariant feature for each filter region can be propagated to the next layer of the CNN.
Returning to FIG. 1, the description of the configuration of the image data classification device 1 continues.

The classification model learning means 14 performs the convolution process according to the main direction of the edge component only in the first convolution layer; the subsequent processing (the second and later convolution layers, the pooling layers, the fully connected layers and the output layer; see FIG. 12) is the same as in a conventional CNN.
The classification model learning means 14 then updates the parameters of the classification model (the convolution filters, the inter-layer weights [weight matrices] of the fully connected layers, and so on), for example by the error backpropagation method, in the direction that eliminates the error between the classification result output from the output layer for the input image data and the known classification result given as teacher data (so that the value of the error function approaches 0). Since this parameter update is a general CNN technique, a detailed description is omitted here.
As described later, a convolution filter applied to a filter region rotated by a predetermined angle can be updated by the error backpropagation method.

The classification model storage means 15 stores the classification model learned by the classification model learning means 14. It can be configured with a general storage medium such as a hard disk or a semiconductor memory. The learned classification model is referenced by the classification means 16.
The classification model storage means 15 stores in advance the structure of the classification model (the structure of the convolution layers, pooling layers, fully connected layers and so on, and the size, number and movement width of the convolution filters) together with the initial values of the classification model parameters (the convolution filters, the inter-layer weights [weight matrices] of the fully connected layers, and so on). The parameters of the classification model are updated by the classification model learning means 14 during operation in the learning mode.

The classification means 16 classifies the image data input from the classification data input means 11 by using the main direction of the edge component for each filter region stored in the region-specific main direction storage means 13 and the classification model stored in the classification model storage means 15.
In the convolution process of the first convolution layer of the classification model, the classification means 16 rotates each filter region of the image data by a predetermined angle according to the main direction of the edge component, and performs the convolution operation on the rotated filter region.

The convolution process in the first convolution layer of this classification model is the same as the processing of the classification model learning means 14 described with reference to FIGS. 5 and 6, except that the learned convolution coefficients stored in the classification model storage means 15 are used, so its description is omitted.
After the convolution process in the first convolution layer, the classification means 16 propagates the features of the image data using the classification model stored in the classification model storage means 15, and outputs the classification result corresponding to the output-layer node with the highest probability value.

The configuration of the image data classification device 1 according to the embodiment of the present invention has been described above. The image data classification device 1 can be operated by a program (image data classification program) that causes a computer to function as each of the means described above.

With the image data classification device 1 configured as described above, learning the classification model from image data containing objects in a single orientation is enough for the device to classify image data accurately regardless of the orientation of the objects.

For example, as shown in FIG. 7, when the same object (here, an image of a house) appears tilted at different angles in different image data, the image data classification device 1, when applying the convolution filter Cf to the same part of the object (the image region of the chimney) in FIGS. 7(a), 7(b) and 7(c), performs the convolution process on a filter region oriented in the same direction relative to the main direction of the edge component. The image data classification device 1 can therefore extract nearly the same features from the same part of the same object, so it suffices to learn the classification model from image data containing objects in a single orientation.

<Operation of the image data classification device>
Next, the operation of the image data classification device 1 according to the embodiment of the present invention will be described with reference to FIGS. 8 and 9. The operation is described separately for the learning mode and the classification mode.

(Learning mode)
The operation of the image data classification device 1 in the learning mode will be described with reference to FIG. 8 (see FIG. 1 for the configuration as appropriate).

In step S1, the learning data input means 10 inputs, as learning data, image data whose classification is known and teacher data indicating that classification.
Then, through steps S2 to S6 below, the region-specific main direction estimation means 12 estimates the main direction of the edge component for each filter region to which the convolution filter is applied in the image data input in step S1.

In step S2, the region-specific main direction estimation means 12 sets, for the image data input in step S1, the initial position of the filter region to which the convolution filter is applied (for example, the upper left of the image).

In step S3, the region-specific main direction estimation means 12 estimates the main direction of the edge component of the image in the filter region. Specifically, it uses a Sobel filter to obtain the gradient magnitude and gradient direction of each pixel of the image in the filter region. It then quantizes the gradient directions, accumulates the gradient magnitudes for each quantized gradient direction, and takes the gradient direction whose accumulated gradient magnitude peaks as the main direction of the edge component. If the ratio of the second-largest accumulated value to the largest accumulated value is greater than a predetermined ratio, no main direction is considered to exist (main direction = 0°).
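The estimation in step S3 can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the device's actual procedure: the number of direction bins (`bins=12`, i.e. 30° steps), the threshold `max_ratio=0.8`, the function name and the decision to evaluate the Sobel filter only where its 3x3 window fits inside the patch are all hypothetical choices; the text specifies only "a predetermined ratio" and does not describe border handling.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def estimate_main_direction(patch, bins=12, max_ratio=0.8):
    """Return the dominant gradient direction of `patch` in degrees,
    or 0.0 when no single direction dominates."""
    bin_width = 360.0 / bins
    hist = [0.0] * bins
    # Sobel responses are computed only where the 3x3 window fits.
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            gx = gy = 0.0
            for j in range(3):
                for i in range(3):
                    v = patch[y + j - 1][x + i - 1]
                    gx += SOBEL_X[j][i] * v
                    gy += SOBEL_Y[j][i] * v
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 360.0
            # Quantize to the nearest bin and accumulate the magnitude.
            hist[int(round(angle / bin_width)) % bins] += magnitude
    ranked = sorted(hist, reverse=True)
    # Second-largest / largest above the threshold: no main direction.
    if ranked[0] == 0.0 or ranked[1] / ranked[0] > max_ratio:
        return 0.0
    return hist.index(ranked[0]) * bin_width
```

A patch whose intensity increases from top to bottom yields a gradient direction of 90° in this convention, while a flat patch falls back to 0°.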

In step S4, the region-specific main direction estimation means 12 stores the position of the filter region in association with the main direction of the edge component estimated in step S3 in the region-specific main direction storage means 13.
In step S5, the region-specific main direction estimation means 12 determines whether the main direction of the edge component has been estimated for the images of all filter regions in the image data.

If the main direction of the edge component has not yet been estimated for the images of all filter regions (No in step S5), in step S6 the region-specific main direction estimation means 12 moves the filter region to the position determined by the movement width of the convolution filter. It then returns to step S3 and estimates the main direction of the edge component for the image of the next filter region.
If, on the other hand, the main direction of the edge component has been estimated for the images of all filter regions (Yes in step S5), the classification model learning means 14 performs the operations from step S7 onward.

The classification model learning means 14 performs the processing of the first convolution layer through steps S7 to S11 below.

In step S7, the classification model learning means 14 sets, for the image data input in step S1, the initial position of the filter region to which the convolution filter is applied.
In step S8, the classification model learning means 14 reads the main direction of the edge component corresponding to the position of the filter region from the region-specific main direction storage means 13, and sets the region obtained by rotating the filter region so that its predetermined reference direction coincides with the main direction as a new filter region.

In step S9, the classification model learning means 14 applies the convolution filter to the filter region that was rotated by the predetermined angle toward the main direction in step S8, and performs the convolution operation.
In step S10, the classification model learning means 14 determines whether the convolution operation has been performed on all filter regions in the image data.

If the convolution operation has not yet been performed on all filter regions (No in step S10), in step S11 the classification model learning means 14 moves the filter region to the position determined by the movement width of the convolution filter. It then returns to step S8 and performs the convolution operation on the next filter region.

If, on the other hand, the convolution operation has been performed on all filter regions (Yes in step S10), the classification model learning means 14 proceeds to step S12. Although not illustrated, when a plurality of convolution filters are used in the first convolution layer, the classification model learning means 14 executes steps S7 to S11 once for each convolution filter.

In step S12, the classification model learning means 14 executes the processing of the second and later convolution layers, the pooling layers, the fully connected layers and the output layer on the feature map generated by the first convolution layer in the operations up to step S11.
In step S13, the classification model learning means 14 updates the parameters of the classification model by the error backpropagation method, based on the error between the classification result output from the output layer in step S12 and the teacher data input in step S1, and stores them in the classification model storage means 15.

In step S14, the classification model learning means 14 determines whether learning of the classification model is complete. Learning is judged complete when the error in step S13 has become smaller than a predetermined threshold.
If learning of the classification model is not complete (No in step S14), the process returns to step S1, where the learning data input means 10 inputs new learning data, and the classification model learning means 14 continues learning the classification model.
If learning of the classification model is complete (Yes in step S14), the image data classification device 1 ends the operation.

Through the above operation, when learning the CNN classification model, the image data classification device 1 performs the convolution process in the first convolution layer by applying the convolution filter in a constant direction relative to the main direction of the edge component of each filter region.
Because the device can thus learn from features that are nearly invariant to the tilt of objects in the image data, the learning data need not include image data showing the objects in various orientations, and both the amount of learning data and the learning time can be reduced compared with the conventional approach.

(Classification mode)
Next, the operation of the image data classification device 1 in the classification mode will be described with reference to FIG. 9 (see FIG. 1 for the configuration as appropriate).

In step S20, the classification data input means 11 inputs image data whose classification is unknown.
Then, through steps S21 to S25, the region-specific main direction estimation means 12 estimates the main direction of the edge component for each filter region to which the convolution filter is applied in the image data input in step S20. Steps S21 to S25 are the same as steps S2 to S6 described with reference to FIG. 8, so their description is omitted.

Then, the classification means 16 performs the processing of the first convolution layer through steps S26 to S30. Steps S26 to S30 are the same as steps S7 to S11 described with reference to FIG. 8, except that the acting entity is the classification means 16 instead of the classification model learning means 14, so their description is omitted.

In step S31, the classification means 16 executes the processing of the second and later convolution layers, the pooling layers, the fully connected layers and the output layer on the feature map generated by the first convolution layer in the operations up to step S30.
In step S32, the classification means 16 outputs the classification result corresponding to the output-layer node with the highest probability value in step S31.

Through the above operation, when classifying image data with the CNN classification model, the image data classification device 1 performs the convolution process in the first convolution layer by applying the convolution filter in a constant direction relative to the main direction of the edge component of each filter region.
Because the device thus extracts features that are nearly invariant to the tilt of objects in the image data, it can classify image data showing the same object in different orientations as having the same content.

<Modifications>
The configuration and operation of the image data classification device 1 according to the embodiment of the present invention have been described above, but the present invention is not limited to this embodiment.
(Modification 1)
The image data classification device 1 executes two different operation modes in a single device: a mode for learning the classification model (learning mode) and a mode for classifying image data (classification mode). These processes may, however, be performed by separate devices.

Specifically, the device that learns the classification model can be configured as the classification model generation device 2 shown in FIG. 10.
As shown in FIG. 10, the classification model generation device 2 includes the learning data input means 10, the region-specific main direction estimation means 12, the region-specific main direction storage means 13, the classification model learning means 14 and the classification model storage means 15. This configuration is the configuration of the image data classification device 1 described with reference to FIG. 1 with the classification data input means 11 and the classification means 16 removed.
The classification model generation device 2 performs only the operation of learning the classification model. Its operation is the same as that described with reference to FIG. 8.
The classification model generation device 2 can be operated by a program (classification model generation program) that causes a computer to function as each of the means described above.

(Modification 2)
The device that classifies image data using the classification model can be configured as the image data classification device 1B shown in FIG. 11.
The image data classification device 1B includes the classification data input means 11, the region-specific main direction estimation means 12, the region-specific main direction storage means 13, the classification model storage means 15 and the classification means 16. This configuration is the configuration of the image data classification device 1 described with reference to FIG. 1 with the learning data input means 10 and the classification model learning means 14 removed. The classification model stored in the classification model storage means 15 is the one generated by the classification model generation device 2 of FIG. 10.
The image data classification device 1B performs only the operation of classifying image data. Its operation is the same as that described with reference to FIG. 9.
The image data classification device 1B can be operated by a program (image data classification program) that causes a computer to function as each of the means described above.

By running the operation of learning the classification model and the operation of classifying image data with the classification model on different devices (the classification model generation device 2 and the image data classification device 1B) in this way, a classification model generated by one classification model generation device 2 can be used by a plurality of image data classification devices 1B.

(Modification 3)
Here, the region-specific main direction estimation means 12 estimates the main direction of the edge component using a Sobel filter, but the estimation is not limited to this.
For example, the region-specific main direction estimation means 12 may use the gradient magnitude and gradient direction of edge components that serve as image features in SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features) and similar methods. Alternatively, the region-specific main direction estimation means 12 may estimate the main direction of the input image data using the result of machine learning on images of the size of the convolution filter, in a plurality of patterns whose edge main directions are known in advance.

<Updating the convolution filter>
Finally, it is explained that, in the classification model learning means 14 (FIG. 1), a convolution filter applied to a filter region rotated by a predetermined angle can be updated (learned) by the error backpropagation method.
Let $u_{ij}^{L}$ be the output value (weighted sum) at coordinates (i, j) of the L-th layer of the CNN and let f be the activation function. The activation (the value of the activation function) $z_{ij}^{L}$ can then be expressed by the following equation (3).
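The equation itself is not reproduced in this text. From the definitions just given (activation as the activation function applied to the weighted sum), equation (3) can be reconstructed as follows; this is a reconstruction, not the original typesetting.

```latex
z_{ij}^{L} = f\left(u_{ij}^{L}\right) \tag{3}
```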

Here, letting $h_{pq}$ be the coefficients of the convolution filter, the output value $u_{ij}^{L}$ of a conventional convolution layer can be expressed by the following equation (4), where (p, q) denotes coordinates within the convolution filter.
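Equation (4) is not reproduced in this text. A reconstruction consistent with the stated symbols is given below; it assumes a single input channel and no bias term, neither of which can be confirmed from the surrounding text.

```latex
u_{ij}^{L} = \sum_{p} \sum_{q} z_{i+p,\,j+q}^{L-1} \, h_{pq} \tag{4}
```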

In the present invention, on the other hand, the coordinates (i+p, j+q) to which the convolution filter is applied are rotated by a predetermined angle according to the main direction of the edge component. This rotation angle is obtained by the region-specific main direction estimation means 12 and is therefore known before the CNN is learned by the classification model learning means 14. Letting the rotated coordinates be ((i+p)′, (j+q)′), the output value $u_{ij}^{L}$ of the first convolution layer in the classification model learning means 14 can be expressed by the following equation (5).
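Equation (5) is not reproduced in this text. Replacing the sampled coordinates of the conventional weighted sum by the rotated coordinates named above gives the following reconstruction (again assuming a single channel and no bias term).

```latex
u_{ij}^{L} = \sum_{p} \sum_{q} z_{(i+p)',\,(j+q)'}^{L-1} \, h_{pq} \tag{5}
```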

In the present invention, whether a convolution filter applied to a region rotated by a predetermined angle can be updated by the error backpropagation method is equivalent to whether the error function is differentiable (whether the gradient of the error function can be obtained). It is shown below that the error function is differentiable in the present invention.
Let the error function be E. By the chain rule of partial differentiation, the gradient of the error function E can be expressed by the following equation (6).
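Equation (6) is not reproduced in this text. Applying the chain rule to the filter coefficients $h_{pq}$ through every weighted sum $u_{ij}^{L}$ they contribute to gives the following reconstruction.

```latex
\frac{\partial E}{\partial h_{pq}}
  = \sum_{i} \sum_{j}
    \frac{\partial E}{\partial u_{ij}^{L}} \,
    \frac{\partial u_{ij}^{L}}{\partial h_{pq}} \tag{6}
```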

Here, let $\delta_{ij}^{L}$ denote the partial derivative of the error function E with respect to the weighted sum $u_{ij}^{L}$, as shown in the following equation (7).
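Equation (7) is not reproduced in this text; the sentence above defines it directly, so it can be written as:

```latex
\delta_{ij}^{L} \equiv \frac{\partial E}{\partial u_{ij}^{L}} \tag{7}
```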

Then, using equation (5), equation (6) can be rewritten as the following equation (8).
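Equation (8) is not reproduced in this text. Since the rotated-coordinate weighted sum is linear in $h_{pq}$, its derivative with respect to $h_{pq}$ is the sampled input $z_{(i+p)',(j+q)'}^{L-1}$, which suggests the following reconstruction.

```latex
\frac{\partial E}{\partial h_{pq}}
  = \sum_{i} \sum_{j} \delta_{ij}^{L} \, z_{(i+p)',\,(j+q)'}^{L-1} \tag{8}
```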

$z_{(i+p)',(j+q)'}^{L-1}$ in equation (8) is an output value of the previous layer (the (L−1)-th layer), and the rotated coordinates are fixed because the main direction of the edge component has already been determined. Error propagation therefore only requires that $\delta_{ij}^{L}$ can be obtained. $\delta_{ij}^{L}$ can be obtained by the same method as in the conventional case, regardless of whether the convolution filter is rotated by a predetermined angle.
First, by the chain rule of partial differentiation, $\delta_{ij}^{L}$ can be transformed as in the following equation (9). The weighted sum at coordinates (s, t) is denoted $u_{st}^{L}$.
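Equation (9) is not reproduced in this text. Expanding $\delta_{ij}^{L}$ through the weighted sums of the next layer, as the following sentences imply, gives this reconstruction.

```latex
\delta_{ij}^{L}
  = \sum_{s} \sum_{t}
    \frac{\partial E}{\partial u_{st}^{L+1}} \,
    \frac{\partial u_{st}^{L+1}}{\partial u_{ij}^{L}}
  = \sum_{s} \sum_{t}
    \delta_{st}^{L+1} \,
    \frac{\partial u_{st}^{L+1}}{\partial u_{ij}^{L}} \tag{9}
```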

Here, from equations (3) and (5), the term $\partial u_{st}^{L+1} / \partial u_{ij}^{L}$ in equation (9) can be transformed into the following equation (10).
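Equation (10) is not reproduced in this text. The reconstruction below writes the next layer's weighted sum in terms of the activations of layer L. It uses unrotated indices (s+p, t+q), an assumption chosen to agree with the later statement that only s+p = i and t+q = j contribute; the original may have been written differently.

```latex
\frac{\partial u_{st}^{L+1}}{\partial u_{ij}^{L}}
  = \frac{\partial}{\partial u_{ij}^{L}}
    \left( \sum_{p} \sum_{q} f\left(u_{s+p,\,t+q}^{L}\right) h_{pq} \right) \tag{10}
```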

Substituting equation (10) into equation (9) gives the following equation (11).
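Equation (11) is not reproduced in this text. Substituting the expanded derivative of the next layer's weighted sum into the chain-rule expression for $\delta_{ij}^{L}$ gives the following reconstruction, under the same unrotated-index assumption noted for the layer after the first.

```latex
\delta_{ij}^{L}
  = \sum_{s} \sum_{t} \delta_{st}^{L+1} \,
    \frac{\partial}{\partial u_{ij}^{L}}
    \left( \sum_{p} \sum_{q} f\left(u_{s+p,\,t+q}^{L}\right) h_{pq} \right) \tag{11}
```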

Because the term $\partial(\ldots)/\partial u_{ij}^{L}$ in equation (11) is a partial derivative with respect to $u_{ij}^{L}$, only the combinations of (s, t) and (p, q) for which $u_{s+p,t+q}^{L} = u_{ij}^{L}$, that is, s+p = i and t+q = j, need to be considered (all other terms are 0). Equation (11) therefore becomes the following equation (12).
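Equation (12) is not reproduced in this text. Keeping only the terms with s = i−p and t = j−q, as the sentence above states, leads to the following reconstruction, which also matches the symbols $f'$ and $\delta_{i-p,j-q}^{L+1}$ referred to in the text that follows.

```latex
\delta_{ij}^{L}
  = \sum_{p} \sum_{q}
    \delta_{i-p,\,j-q}^{L+1} \, h_{pq} \, f'\left(u_{ij}^{L}\right) \tag{12}
```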

Here, f'(...) is the derivative of the known activation function f, and δ_{i-p,j-q}^{L+1} is a value propagated back from the subsequent layer, so δ_{ij}^{L} can be obtained.
Thus, in the present invention as well, the error function E is differentiable, and the classification model can be trained by performing forward propagation and backpropagation in the CNN.
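The equation images for Equations (8) to (12) are not reproduced in this text. The following is one reading that is consistent with the surrounding description, not a copy of the original formulas. It assumes that the (L+1)-th layer is a convolutional layer placed directly after the L-th layer, that each layer computes a weighted sum u with filter weights w_{pq} and bias b followed by z = f(u), and that primed coordinates denote positions after rotation. The layer index on the weights is omitted, and the exact forms of Equations (3), (5), and (6) are assumptions.

```latex
\begin{align}
\frac{\partial E}{\partial w_{pq}}
  &= \sum_{i}\sum_{j} \delta_{ij}^{L}\, z_{(i+p)',(j+q)'}^{L-1}
  \tag{8} \\
\delta_{ij}^{L} = \frac{\partial E}{\partial u_{ij}^{L}}
  &= \sum_{s}\sum_{t} \frac{\partial E}{\partial u_{st}^{L+1}}\,
     \frac{\partial u_{st}^{L+1}}{\partial u_{ij}^{L}}
  \tag{9} \\
\frac{\partial u_{st}^{L+1}}{\partial u_{ij}^{L}}
  &= \frac{\partial}{\partial u_{ij}^{L}}
     \Bigl( \sum_{p}\sum_{q} w_{pq}\, f\bigl(u_{s+p,t+q}^{L}\bigr) + b \Bigr)
  \tag{10} \\
\delta_{ij}^{L}
  &= \sum_{s}\sum_{t} \delta_{st}^{L+1}\,
     \frac{\partial}{\partial u_{ij}^{L}}
     \Bigl( \sum_{p}\sum_{q} w_{pq}\, f\bigl(u_{s+p,t+q}^{L}\bigr) + b \Bigr)
  \tag{11} \\
\delta_{ij}^{L}
  &= \sum_{p}\sum_{q} \delta_{i-p,j-q}^{L+1}\, w_{pq}\, f'\bigl(u_{ij}^{L}\bigr)
  \tag{12}
\end{align}
```

In this reading, the step from (11) to (12) keeps only the terms with s = i - p and t = j - q, which is why the propagated error appears with the shifted index δ_{i-p,j-q}^{L+1}. The rotation enters only through the fixed input values in (8), so it does not change how δ_{ij}^{L} is computed.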

1, 1B Image data classification device
2 Classification model generation device
10 Learning data input means
11 Classification data input means
12 Region-specific main direction estimation means
13 Region-specific main direction storage means
14 Classification model learning means
15 Classification model storage means
16 Classification means

Claims

A classification model generation device that generates, from a plurality of image data whose classification is known, a classification model that is a convolutional neural network for classifying image data whose classification is unknown, the device comprising:
a region-specific main direction estimation means for estimating, from the image data whose classification is known, the main direction of the edge component of the image for each filter region to which the convolution filter of the first convolutional layer of the convolutional neural network is applied; and
a classification model learning means for training the convolutional neural network and generating the classification model from the image data whose classification is known and teacher data indicating the classification contents,
wherein the classification model learning means, in the first convolutional layer, performs the convolution operation for each filter region after rotating the filter region so that a predetermined reference orientation of the filter region points in a fixed direction relative to the main direction of the edge component estimated by the region-specific main direction estimation means.

The classification model generation device according to claim 1, wherein the region-specific main direction estimation means calculates, for each pixel in the filter region, the gradient intensity and gradient direction of the edge component from the pixel values of neighboring pixels close to that pixel, accumulates the gradient intensity for each quantized gradient direction, and estimates as the main direction the gradient angle with the largest accumulated gradient intensity.

An image data classification device that classifies image data whose classification is unknown by using a classification model that is a convolutional neural network generated by the classification model generation device according to claim 1 or 2, the device comprising:
a region-specific main direction estimation means for estimating, from the image data whose classification is unknown, the main direction of the edge component of the image for each filter region to which the convolution filter of the first convolutional layer of the convolutional neural network is applied; and
a classification means for classifying the image data whose classification is unknown by means of the convolutional neural network that is the classification model,
wherein the classification means, in the first convolutional layer, performs the convolution operation for each filter region after rotating the filter region so that a predetermined reference orientation of the filter region points in a fixed direction relative to the main direction of the edge component estimated by the region-specific main direction estimation means.

The image data classification device according to claim 3, wherein the region-specific main direction estimation means calculates, for each pixel in the filter region, the gradient intensity and gradient direction of the edge component from the pixel values of neighboring pixels, accumulates the gradient intensity for each quantized gradient direction, and estimates as the main direction the gradient angle with the largest accumulated gradient intensity.

An image data classification device that generates, from a plurality of image data whose classification is known, a classification model that is a convolutional neural network for classifying image data whose classification is unknown, and that classifies the image data whose classification is unknown, the device comprising:
a region-specific main direction estimation means for estimating, from image data, the main direction of the edge component of the image for each filter region to which the convolution filter of the first convolutional layer of the convolutional neural network is applied;
a classification model learning means for training the convolutional neural network and generating the classification model from the image data whose classification is known and teacher data indicating the classification contents; and
a classification means for classifying the image data whose classification is unknown by means of the convolutional neural network that is the classification model,
wherein the classification model learning means, in the first convolutional layer, performs the convolution operation for each filter region after rotating the filter region so that a predetermined reference orientation of the filter region points in a fixed direction relative to the main direction of the edge component estimated by the region-specific main direction estimation means, and
wherein the classification means, in the first convolutional layer, performs the convolution operation for each filter region after rotating the filter region so that the predetermined reference orientation of the filter region points in a fixed direction relative to the main direction of the edge component estimated by the region-specific main direction estimation means.

A classification model generation program for causing a computer to function as the classification model generation device according to claim 1 or 2.

An image data classification program for causing a computer to function as the image data classification device according to any one of claims 3 to 5.
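
The main direction estimation recited in claims 2 and 4 can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the implementation of the invention: the function name `estimate_main_direction`, the use of central differences (`numpy.gradient`) for the per-pixel gradient, the choice of 8 quantization bins, and the centring of bins on multiples of the bin width are all hypothetical choices made for illustration.

```python
import numpy as np


def estimate_main_direction(region, num_bins=8):
    """Return the dominant gradient angle (radians) of a 2-D filter region.

    Assumptions (not from the source): central-difference gradients,
    num_bins equal-width angle bins centred on k * (2*pi / num_bins).
    """
    region = np.asarray(region, dtype=float)

    # Per-pixel gradients from neighbouring pixel values.
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (cols, x).
    gy, gx = np.gradient(region)

    intensity = np.hypot(gx, gy)                  # gradient intensity
    direction = np.arctan2(gy, gx) % (2 * np.pi)  # gradient direction in [0, 2*pi)

    # Quantize each direction to the nearest bin centre.
    bin_width = 2 * np.pi / num_bins
    bins = np.rint(direction / bin_width).astype(int) % num_bins

    # Accumulate gradient intensity per quantized direction.
    hist = np.bincount(bins.ravel(), weights=intensity.ravel(), minlength=num_bins)

    # The direction with the largest accumulated intensity is the main direction.
    return float(np.argmax(hist) * bin_width)


if __name__ == "__main__":
    # Pixel values increase from left to right, so the gradient points along +x.
    ramp = np.tile(np.arange(5.0), (5, 1))
    print(estimate_main_direction(ramp))
    # Transposed: values increase from top to bottom, so the gradient points along +y.
    print(estimate_main_direction(ramp.T))
```

In a full pipeline, the returned angle would determine how far each filter region of the first convolutional layer is rotated before the convolution is applied. That rotation step is omitted here because the source does not specify the interpolation used.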