
WO2022208843A1 - Training data processing device, training data processing method, and training data processing program - Google Patents


Info

Publication number
WO2022208843A1
WO2022208843A1 (PCT/JP2021/014159)
Authority
WO
WIPO (PCT)
Prior art keywords
image
data
learning
learning data
brightness
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2021/014159
Other languages
French (fr)
Japanese (ja)
Inventor
Shota Yamada (翔大 山田)
Hirokazu Kakinuma (弘員 柿沼)
Hidenobu Nagata (秀信 長田)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NTT Inc
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Priority to JP2023510112A priority Critical patent/JPWO2022208843A1/ja
Priority to PCT/JP2021/014159 priority patent/WO2022208843A1/en
Publication of WO2022208843A1 publication Critical patent/WO2022208843A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current


Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis

Definitions

  • Embodiments of the present invention relate to a learning data processing device, a learning data processing method, and a learning data processing program.
  • Relighting is a technique for generating a relit image by changing the lighting environment of an input image to a desired one. Relighting techniques use deep learning to generate the desired relit image from the input image.
  • Non-Patent Document 1 trains a deep generative model that generates a relit image, using learning data in which an input image is paired with a teacher image obtained by changing only the lighting environment of the input image.
  • Non-Patent Document 2 proposes, for creating such learning data in a real environment, preparing a special facility surrounded by a large number of cameras and lights and shooting under a variety of shooting and lighting conditions.
  • In these methods, the input image used for learning is assumed to have no global shadows on the face region caused by the lighting being blocked by buildings, trees, or the like. For removing such shadows, a shadow removal method has been proposed (Non-Patent Document 3).
  • However, when a deep generative model that generates a relit image as in Non-Patent Document 1 is trained using the learning data created as in Non-Patent Document 2, the number of lighting environment patterns in the learning data is small. Therefore, after the deep generative model has been trained, when a relit image is generated from an input image containing shadows or highlights that are not included in the training data, those shadows or highlights remain in the generated relit image.
  • To remove the shadows from the relit image, additional shadow removal processing such as that of Non-Patent Document 3 is then required.
  • The present invention seeks to provide a technique that enables training of deep generative models that are robust to shadows or highlights.
  • A learning data processing device includes a data input unit, a data augmentation unit, and a data output unit.
  • The data input unit acquires learning data used for training the deep generative model, including an input image, the lighting environment of the input image, a teacher image that is an image obtained by changing only the lighting environment of the input image, the lighting environment of the teacher image, and a target region image indicating the brightness change target region in the input image.
  • The data augmentation unit creates a brightness-adjusted image by adjusting the brightness of the input image, creates a mask image indicating the brightness change region to which the brightness change is applied, and synthesizes the target region image, the brightness-adjusted image, and the mask image to create a data-augmented image.
  • The data output unit creates new learning data by replacing the input image in the learning data with the data-augmented image, and outputs the new learning data as learning data used for training the deep generative model.
  • FIG. 1 is a block diagram showing an example of the configuration of a deep generative model learning system comprising a learning data processing device according to the first embodiment of the present invention.
  • FIG. 2 is a diagram showing an example of a mask image data set possessed by the learning data processing device.
  • FIG. 3 is a diagram illustrating an example of a hardware configuration of a learning data processing device.
  • FIG. 4 is a flow chart showing an example of the processing operation of the learning data processing device.
  • FIG. 5 is a diagram showing an example of an input image that is one of learning data.
  • FIG. 6 is a diagram showing an example of a target region image, which is one of learning data.
  • FIG. 7 is a diagram showing an example of a shadow/highlight image created by the learning data processing device during processing.
  • FIG. 8 is a diagram showing another example of a shadow/highlight image.
  • FIG. 9 is a diagram showing an example of a mask image.
  • FIG. 10 is a diagram showing another example of the mask image.
  • FIG. 11 is a diagram showing an example of a reverse shadow/highlight imparting area image to be combined.
  • FIG. 12 is a diagram showing another example of a reverse shadow/highlight imparting area image to be combined.
  • FIG. 13 is a diagram illustrating an example of a data augmented image created by the learning data processing device.
  • FIG. 14 is a diagram showing another example of a data augmented image created by the learning data processing device.
  • FIG. 15 is a flow chart showing an example of the processing operation of the learning data processing device according to the second embodiment of the present invention.
  • FIG. 16 is a block diagram showing an example of the configuration of a deep generative model learning system including a learning data processing device according to the third embodiment of the present invention.
  • FIG. 17 is a flow chart showing an example of the processing operation of the learning data processing device according to the third embodiment.
  • FIG. 1 is a block diagram showing an example of the configuration of a deep generative model learning system including a learning data processing device 100 according to the first embodiment of the present invention.
  • The deep generative model learning system includes the learning data processing device 100, a learning device 200, and a learning data storage unit 300.
  • the deep generative model learning system may be configured such that each of these units is integrated as one device or housing, or may be configured from a plurality of devices. Also, multiple devices may be remotely located and connected via a network.
  • the learning data storage unit 300 stores learning data necessary for learning in the learning device 200.
  • The learning data includes the input image and its lighting environment, the teacher image (an image obtained by changing only the lighting environment of the input image) and its lighting environment, and a target area image showing the brightness change target area, that is, the area of the input image to which a shadow or highlight is to be applied (for example, the face region of a person).
  • The lighting environment is expressed, for example, as either vector data using spherical harmonics or an environment map image representing the surroundings reflected around the image.
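  • For context, one common concrete form of such spherical-harmonic vector data is a second-order (9-coefficient) real SH expansion per colour channel. The sketch below is only illustrative; the patent does not fix the encoding, and the `light` array is a hypothetical example.

```python
import numpy as np

def sh_basis(d):
    """Evaluate the 9 real spherical-harmonic basis functions (bands 0-2)
    for a unit direction vector d = (x, y, z)."""
    x, y, z = d / np.linalg.norm(d)
    return np.array([
        0.282095,                      # Y_0^0
        0.488603 * y,                  # Y_1^-1
        0.488603 * z,                  # Y_1^0
        0.488603 * x,                  # Y_1^1
        1.092548 * x * y,              # Y_2^-2
        1.092548 * y * z,              # Y_2^-1
        0.315392 * (3 * z * z - 1),    # Y_2^0
        1.092548 * x * z,              # Y_2^1
        0.546274 * (x * x - y * y),    # Y_2^2
    ])

# A lighting environment stored as a 9-coefficient vector per colour channel
# (hypothetical values); shading at a surface normal is the per-channel dot
# product with the basis evaluated at that normal.
light = np.random.default_rng(0).normal(size=(3, 9))
normal = np.array([0.0, 0.0, 1.0])
shading = light @ sh_basis(normal)
```
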
  • One epoch corresponds to transferring all of the prepared learning data from the learning data storage unit 300 to the learning data processing device 100 once.
  • The learning data processing device 100 performs data preprocessing, including data augmentation, on the learning data acquired from the learning data storage unit 300.
  • Data augmentation refers to a process of adding an image effect that simulates shadows or highlights to learning data.
  • The learning data processing device 100 passes the preprocessed learning data to the learning device 200.
  • the learning device 200 uses the learning data passed from the learning data processing device 100 to learn the deep generative model.
  • the learning device 200 uses the learned deep generation model to generate a re-illuminated image from an arbitrary input image.
  • the input image may be acquired via the learning data processing device 100, or may be acquired via an input device or a network (not shown).
  • the learning device 200 updates the parameters of the deep generative model and records the deep generative model by evaluating the generated re-illumination image and the learning data.
  • The learning data processing device 100 includes a data input unit 110, a data augmentation unit 120, and a data output unit 130.
  • The data input unit 110 acquires the learning data, that is, the input image and its lighting environment, the teacher image and its lighting environment, and the target area image, from the learning data storage unit 300.
  • Of the learning data, the data input unit 110 passes the input image and its lighting environment and the teacher image and its lighting environment to the data output unit 130.
  • The data input unit 110 uses random parameters to determine whether to perform data augmentation that increases the influence of illumination, and, when augmentation is to be performed, passes the input image and the target area image to the data augmentation unit 120.
  • The data augmentation unit 120 includes a brightness adjustment unit 121, a mask area creation unit 122, a mask image storage unit 123, and an image synthesis unit 124.
  • The brightness adjustment unit 121 passes the input image received from the data input unit 110 to the image synthesis unit 124. The brightness adjustment unit 121 also creates a shadow/highlight image, which is a brightness-adjusted image obtained by adjusting the brightness of the input image. The brightness adjustment unit 121 then passes the created shadow/highlight image and the target region image received from the data input unit 110 to the mask area creation unit 122.
  • the mask image storage unit 123 stores a mask image data set, which is a data set of irregular mask images.
  • FIG. 2 is a diagram showing an example of this mask image data set.
  • An existing data set may be used for the mask image, or an image created using Perlin noise may be used.
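  • As an illustration of how such an irregular mask could be produced, the following sketch thresholds bilinearly interpolated random value noise, a simple stand-in for Perlin noise (the actual mask data set and noise implementation are not specified in this document).

```python
import numpy as np

def noise_mask(h, w, scale=8, threshold=0.5, seed=None):
    """Create an irregular binary mask by thresholding smoothly
    interpolated random value noise (a cheap stand-in for Perlin noise)."""
    rng = np.random.default_rng(seed)
    coarse = rng.random((scale + 1, scale + 1))
    ys = np.linspace(0, scale, h)
    xs = np.linspace(0, scale, w)
    y0 = np.floor(ys).astype(int).clip(0, scale - 1)
    x0 = np.floor(xs).astype(int).clip(0, scale - 1)
    fy = (ys - y0)[:, None]
    fx = (xs - x0)[None, :]
    # Bilinearly interpolate the coarse noise grid up to (h, w)
    top = coarse[y0][:, x0] * (1 - fx) + coarse[y0][:, x0 + 1] * fx
    bot = coarse[y0 + 1][:, x0] * (1 - fx) + coarse[y0 + 1][:, x0 + 1] * fx
    smooth = top * (1 - fy) + bot * fy
    return (smooth > threshold).astype(np.float32)

mask = noise_mask(128, 128, seed=0)   # irregular mask with values in {0.0, 1.0}
```
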
  • The mask area creation unit 122 creates a shadow/highlight application area image, indicating the area to be shaded or highlighted, from the shadow/highlight image and target area image passed from the brightness adjustment unit 121 and from the mask image.
  • the mask image may be a mask image stored in advance in the mask image storage unit 123, or may be created by subjecting the shadow/highlight image to arbitrary binarization processing.
  • the mask area creation unit 122 can determine which of the pre-stored mask image and the created mask image is to be used using a random parameter.
  • The mask area creation unit 122 passes the shadow/highlight image and the created shadow/highlight application area image to the image synthesis unit 124.
  • The image synthesis unit 124 synthesizes the input image passed from the brightness adjustment unit 121 with the shadow/highlight image and the shadow/highlight application area image passed from the mask area creation unit 122 to create a data-augmented image.
  • The image synthesis unit 124 passes the created data-augmented image to the data output unit 130.
  • the data output unit 130 normalizes or standardizes each of the input image and teacher image passed from the data input unit 110 when data extension is not performed. Then, the data output unit 130 passes the normalized or standardized input image and the lighting environment of the input image, and the normalized or standardized teacher image and the lighting environment of the teacher image to the learning device 200 as learning data.
  • When data augmentation is performed, the data output unit 130 replaces the input image passed from the data input unit 110 with the data-augmented image passed from the image synthesis unit 124 of the data augmentation unit 120. That is, in this case, the data output unit 130 normalizes or standardizes the input image that has been replaced with the data-augmented image, together with the teacher image. The data output unit 130 then passes the normalized or standardized input image and its lighting environment and the normalized or standardized teacher image and its lighting environment to the learning device 200 as learning data. In other words, the data output unit 130 passes new learning data, different from the original learning data, to the learning device 200.
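  • Taken together, one preprocessing pass over a single learning-data sample could be sketched as follows. The dictionary keys, augmentation probability, and the simple whole-region darkening used here are illustrative assumptions, standing in for the full shadow/highlight synthesis described with the flowchart of FIG. 4.

```python
import numpy as np

def preprocess(sample, p_augment=0.5, rng=None):
    """One preprocessing pass (sketch; key names and parameters are
    illustrative, not the patent's). `sample` holds the input image,
    its lighting environment, the teacher image and its lighting
    environment, and the target-region mask."""
    rng = rng if rng is not None else np.random.default_rng()
    out = dict(sample)
    if rng.random() < p_augment:                     # random-parameter gate
        alpha = rng.uniform(0.3, 0.8)                # darken: simulated shadow
        i = sample['input'].astype(np.float32)
        j = alpha * i                                # brightness-adjusted image
        m = sample['target_mask'].astype(np.float32)[..., None]
        out['input'] = np.clip(m * j + (1 - m) * i, 0, 255).astype(np.uint8)
    out['input'] = out['input'].astype(np.float32) / 255.0    # normalise
    out['teacher'] = sample['teacher'].astype(np.float32) / 255.0
    return out
```
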
  • FIG. 3 is a diagram showing an example of the hardware configuration of the learning data processing device 100.
  • the learning data processing device 100 includes a processor 11, a program memory 12, a data memory 13, an input/output interface 14, and a communication interface 15, for example.
Program memory 12, data memory 13, input/output interface 14, and communication interface 15 are connected to processor 11 via a bus 16.
  • the learning data processing device 100 may be composed of, for example, a general-purpose computer such as a personal computer.
  • the processor 11 includes a multi-core/multi-threaded CPU (Central Processing Unit), and is capable of concurrently executing multiple pieces of information processing.
  • The program memory 12 includes, as storage media, a non-volatile memory that can be written and read at any time, such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive), and a non-volatile memory such as a ROM (Read Only Memory), and stores the programs necessary for executing the various control processes according to the first embodiment of the present invention when executed by a processor 11 such as a CPU. That is, by reading and executing the programs stored in the program memory 12, the processor 11 can function as the data input unit 110, the data augmentation unit 120, and the data output unit 130 shown in FIG. 1. These processing function units may be realized by sequential processing on one CPU thread, or in a form allowing simultaneous parallel processing on separate CPU threads.
  • These processing function units may also be realized by separate CPUs; that is, the learning data processing device 100 may include multiple CPUs. In addition, at least some of these processing function units may be implemented in the form of integrated circuits such as ASICs (Application Specific Integrated Circuits), FPGAs (Field-Programmable Gate Arrays), GPUs (Graphics Processing Units), or various other hardware circuits. The programs stored in the program memory 12 can include a learning data processing program as shown in FIG. 4.
  • The data memory 13 uses, as storage media, a combination of a non-volatile memory that can be written and read at any time, such as an HDD or an SSD, and a volatile memory such as a RAM (Random Access Memory), and is used to store the various data needed for data preprocessing, including data augmentation.
  • In the data memory 13, a mask image data set storage area 13A for storing mask image data sets can be reserved. That is, the data memory 13 can function as the mask image storage unit 123.
  • In the data memory 13, a temporary storage area 13B can also be reserved for storing the various data obtained and created during data preprocessing, including data augmentation.
  • the input/output interface 14 is an interface with an input device such as a keyboard and mouse (not shown) and an output device such as a liquid crystal monitor.
  • The input/output interface 14 may also include an interface with a reader/writer for memory cards or disk media. If the mask image data set is provided recorded on a memory card or disk medium, the processor 11 can read it through the input/output interface 14 and store it in the mask image data set storage area 13A of the data memory 13.
  • the communication interface 15 includes, for example, one or more wired or wireless communication interface units, and enables transmission and reception of various information with devices on the network according to the communication protocol used on the network.
  • As a wired interface, for example, a wired LAN or a USB (Universal Serial Bus) interface is used. As a wireless interface, for example, an interface conforming to a low-power wireless data communication standard such as a wireless LAN is used.
  • The processor 11 can receive and acquire learning data from the learning data storage unit 300 via the communication interface 15.
  • processor 11 can obtain mask image data sets from devices on the network.
  • processor 11 can transmit learning data to learning device 200 via communication interface 15 .
  • FIG. 4 is a flowchart showing an example of the processing operation of the learning data processing device 100.
  • When the user instructs execution of the learning data processing program from an input device (not shown) through the input/output interface 14, the processor 11 starts the operation shown in this flowchart. Alternatively, the processor 11 may start the operation shown in this flowchart in response to an execution instruction received from the learning device 200 over the network via the communication interface 15.
  • the processor 11 operates as the data input unit 110 to acquire learning data from the learning data storage unit 300 (step S11).
  • The acquired learning data is stored in the temporary storage area 13B of the data memory 13.
  • the learning data includes the input image and the lighting environment of the input image, the teacher image and the lighting environment of the teacher image, and the target area image.
  • Next, the processor 11, operating as the data input unit 110, determines whether or not to perform data augmentation that increases the influence of illumination, using random parameters (step S12). If data augmentation is not to be performed (NO in step S12), the processor 11 proceeds to step S20, described later.
  • If data augmentation is to be performed (YES in step S12), the processor 11 performs the operation of the brightness adjustment unit 121 and first acquires the input image and the target area image (step S13). That is, the processor 11 reads the input image and the target area image from the temporary storage area 13B. Passing the input image and the target area image from the data input unit 110 to the brightness adjustment unit 121 in the configuration description corresponds to this saving to and reading from the temporary storage area 13B. The same applies to the following description.
  • FIG. 5 is a diagram showing an example of the input image I.
  • FIG. 6 is a diagram showing an example of the target area image Mf .
  • the processor 11 performs luminance adjustment on the entire input image I to create a shadow/highlight image (step S14).
  • This brightness adjustment includes a brightness decrease when adding effect A, which simulates a shadow, as data augmentation, and a brightness increase when adding effect B, which simulates a highlight, as data augmentation.
  • The brightness adjustment technique may be, for example, linear correction or gamma correction, and is preselected by the user.
  • For the brightness adjustment parameter α, the user sets upper and lower limits in advance under the condition that α < 1.0 for effect A and α > 1.0 for effect B, in both linear correction and gamma correction, and α is determined randomly within this range.
  • A shadow/highlight image J with adjusted brightness is then created as shown in Equation 1.
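  • Equation 1 itself is not reproduced in this text. The standard linear and gamma corrections consistent with the stated constraints on α would look like the following sketch (assuming 8-bit images; the exact formula in the patent may differ).

```python
import numpy as np

def adjust_brightness(img, alpha, mode="linear"):
    """Brightness adjustment for 8-bit images: alpha < 1.0 darkens
    (effect A, simulated shadow); alpha > 1.0 brightens (effect B,
    simulated highlight)."""
    x = img.astype(np.float32) / 255.0
    if mode == "linear":
        j = alpha * x                 # linear correction: J = alpha * I
    else:
        j = x ** (1.0 / alpha)        # gamma correction: J = I ** (1/alpha)
    return np.clip(np.rint(j * 255.0), 0, 255).astype(np.uint8)

# alpha is drawn at random within a user-set range, e.g. for effect A:
rng = np.random.default_rng()
alpha = rng.uniform(0.3, 0.8)
```
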
  • FIG. 7 is a diagram showing an example of a shadow/highlight image J that has undergone luminance adjustment in the case of adding an effect A imitating a shadow as data extension.
  • the shadow/highlight image J in this case is a shadow image.
  • FIG. 8 is a diagram showing an example of a shadow/highlight image J subjected to luminance adjustment in the case of adding an effect B simulating a highlight as data extension.
  • the shadow/highlight image J in this case is a highlight image.
  • the processor 11 stores the shadow/highlight image J thus created in the temporary storage area 13B.
  • Next, the processor 11 executes the operation of the mask area creation unit 122 and determines whether to use the mask image data set stored in advance in the mask image storage unit 123, that is, in the mask image data set storage area 13A (step S15). That is, the processor 11 determines, based on random parameters, whether to use a mask image prepared in advance or to create a mask image from the shadow/highlight image J.
  • If a pre-stored mask image is to be used (YES in step S15), the processor 11 acquires a mask image Md from the mask image data set stored in the mask image storage unit 123, using, for example, random parameters (step S16).
  • FIG. 9 is a diagram showing an example of the acquired mask image Md .
  • the processor 11 stores this acquired mask image Md in the temporary storage area 13B.
  • If a pre-stored mask image is not to be used (NO in step S15), the processor 11 reads from the temporary storage area 13B the highlight image, that is, the shadow/highlight image J whose brightness was adjusted for adding effect B, which imitates a highlight, as data augmentation.
  • the processor 11 creates a mask image Md by performing arbitrary binarization processing on the shadow/highlight image J (step S17).
  • FIG. 10 is a diagram showing an example of a mask image Md created from this highlight image. The processor 11 stores this created mask image Md in the temporary storage area 13B.
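  • As one concrete example of such binarization, Otsu's method could be applied to the grayscale shadow/highlight image; this is an assumption, since the binarization method is left arbitrary here.

```python
import numpy as np

def binarize(gray, threshold=None):
    """Create a binary mask M_d from an 8-bit grayscale shadow/highlight
    image. If no threshold is given, use Otsu's method (maximising the
    between-class variance over all candidate thresholds)."""
    if threshold is None:
        hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
        p = hist / hist.sum()
        omega = np.cumsum(p)                    # class-0 probability
        mu = np.cumsum(p * np.arange(256))      # class-0 mean * omega
        mu_t = mu[-1]                           # global mean
        with np.errstate(divide="ignore", invalid="ignore"):
            sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
            threshold = int(np.nanargmax(sigma_b))
    return (gray > threshold).astype(np.float32)
```
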
  • The processor 11 then, operating as the mask area creation unit 122, creates the shadow/highlight application area image M using the mask image Md and the target region image Mf (step S18).
  • Next, the processor 11 executes the operation of the image synthesis unit 124, reads the input image I, the shadow/highlight image J, and the shadow/highlight application area image M from the temporary storage area 13B, and synthesizes them according to Equation 3 to create the data-augmented image I′ (step S19).
  • The shadow/highlight image J read here is the shadow image when the mask image Md was acquired from the mask image data set, or the corresponding highlight image when the mask image Md was created from the shadow/highlight image J.
  • The processor 11 can determine which shadow/highlight image J is to be read according to what was stored in the temporary storage area 13B in step S16 or step S17. Alternatively, when the mask image Md is stored in the temporary storage area 13B in step S16 or S17, the shadow/highlight image J that is not used for image synthesis may be deleted from the temporary storage area 13B.
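  • Equation 3 is not reproduced in this text, but the reverse application area image 1 − M suggests alpha-blend compositing of J and I over the application area M. The following is a sketch under that assumption.

```python
import numpy as np

def synthesize(input_img, shadow_highlight_img, area_mask):
    """Assumed Equation-3-style compositing: I' = M * J + (1 - M) * I,
    where M is the shadow/highlight application area image and 1 - M
    the reverse application area image. The mask is broadcast over
    colour channels."""
    m = area_mask[..., None] if area_mask.ndim == 2 else area_mask
    i = input_img.astype(np.float32)
    j = shadow_highlight_img.astype(np.float32)
    return np.clip(np.rint(m * j + (1.0 - m) * i), 0, 255).astype(np.uint8)
```
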
  • FIGS. 11 and 12 are diagrams showing examples of the reverse shadow/highlight application area image to be combined, indicated by 1 − M in Equation 3.
  • FIG. 11 corresponds to the mask image Md of FIG. 9, and FIG. 12 corresponds to the mask image Md of FIG. 10.
  • FIG. 13 shows a data-augmented image I′ created from the input image I, a shadow/highlight image J that is a shadow image, and the reverse shadow/highlight application area image 1 − M of FIG. 11. FIG. 14 shows a data-augmented image I′ created from the input image I, a shadow/highlight image J that is a highlight image, and the reverse shadow/highlight application area image 1 − M of FIG. 12.
  • the processor 11 then operates as the data output unit 130 and transmits learning data (step S20).
  • If data augmentation was not performed (NO in step S12), the processor 11 reads the input image and teacher image stored in the temporary storage area 13B, normalizes or standardizes them, and stores them again in the temporary storage area 13B. The processor 11 then reads the input image and its lighting environment and the teacher image and its lighting environment from the temporary storage area 13B and transmits them to the learning device 200 via the communication interface 15.
  • If data augmentation was performed (YES in step S12), the processor 11 reads the data-augmented image I′ from the temporary storage area 13B. The processor 11 then normalizes or standardizes the data-augmented image I′ and saves the result as the input image I, overwriting the input image I already saved in the temporary storage area 13B. That is, the processor 11 rewrites the input image I stored in the temporary storage area 13B to the normalized or standardized data-augmented image I′. The processor 11 also reads the teacher image stored in the temporary storage area 13B, normalizes or standardizes it, and stores it again in the temporary storage area 13B. The processor 11 then reads the input image and its lighting environment and the teacher image and its lighting environment from the temporary storage area 13B and transmits them to the learning device 200 via the communication interface 15.
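  • Normalization and standardization here could be, for example, the following typical operations (the patent does not fix the formulas).

```python
import numpy as np

def normalize(img):
    """Scale 8-bit pixel values to the range [0, 1]."""
    return img.astype(np.float32) / 255.0

def standardize(img):
    """Shift and scale an image to zero mean and unit variance
    (epsilon guards against constant images)."""
    x = img.astype(np.float32)
    return (x - x.mean()) / (x.std() + 1e-8)
```
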
  • As described above, in the first embodiment, the data input unit 110 acquires from the learning data storage unit 300 learning data used for training the deep generative model, including the input image I, the lighting environment of the input image I, a teacher image that is an image obtained by changing only the lighting environment of the input image I, the lighting environment of the teacher image, and the target region image Mf indicating the brightness change target region in the input image I.
  • The data augmentation unit 120 creates the shadow/highlight image J, which is a brightness-adjusted image obtained by adjusting the brightness of the input image I, and acquires or creates the mask image Md indicating the brightness change region to which the brightness change is applied.
  • The learning data processing device 100 then, using the data output unit 130, creates new learning data by replacing the input image I in the learning data with the data-augmented image I′, and outputs the new learning data to the learning device 200 as learning data used for training the deep generative model.
  • In this way, the learning data processing apparatus 100 creates the data-augmented image I′ based on the learning data and creates new learning data including the data-augmented image I′, so that the number of pieces of learning data used for training the deep generative model can be increased. Therefore, the learning device 200 can train the deep generative model using learning data in which the influence of irregular lighting environments is increased, making it possible to realize training of a deep generative model that is robust to shadows or highlights.
  • The data augmentation unit 120 includes the brightness adjustment unit 121, which creates the shadow/highlight image J, a brightness-adjusted image, by lowering the brightness of the entire input image I, and the image synthesis unit 124, which synthesizes the target region image Mf, the shadow/highlight image J, and the mask image Md to create a data-augmented image I′ in which the portion corresponding to the brightness change region, within the region corresponding to the brightness change target region in the input image I, is darkened.
  • With this configuration, the learning data processing apparatus 100 adds an image effect simulating a shadow to the learning data, thereby artificially increasing the variety of lighting environment patterns in the learning data, which makes it possible to realize training of a deep generative model in the learning device 200 that is robust to shadows.
  • The data augmentation unit 120 likewise includes the brightness adjustment unit 121, which creates the shadow/highlight image J, a brightness-adjusted image, by raising the brightness of the entire input image I, and the image synthesis unit 124, which synthesizes the target region image Mf, the shadow/highlight image J, and the mask image Md to create a data-augmented image I′ in which the portion corresponding to the brightness change region, within the region corresponding to the brightness change target region in the input image I, is brightened.
  • With this configuration, the learning data processing apparatus 100 adds an image effect simulating a highlight to the learning data, thereby artificially increasing the variety of lighting environment patterns in the learning data, which makes it possible to realize training of a deep generative model in the learning device 200 that is robust to highlights.
  • The data augmentation unit 120 also includes the mask area creation unit 122, which creates the mask image Md by performing arbitrary binarization processing on the shadow/highlight image J, the brightness-adjusted image.
  • With this configuration, the learning data processing apparatus 100 creates the mask image Md based on the shadow/highlight image J, a brightness-adjusted version of the input image I, and generates the data-augmented image I′ using this mask image Md; as a result, the probability of generating a data-augmented image I′ that deviates greatly from the input image I can be reduced.
  • The data augmentation unit 120 further includes the mask area creation unit 122, which acquires the mask image Md from the mask image data set, a data set of irregular mask images stored in advance in the mask image storage unit 123.
  • With this configuration, the learning data processing apparatus 100 can create data-augmented images I′ using various mask images Md that are independent of the input image I, making it easy to increase the amount of learning data.
  • In the first embodiment, either a data-augmented image I′ obtained by adding an image effect simulating a shadow to the learning data or a data-augmented image I′ obtained by adding an image effect simulating a highlight to the learning data is created; that is, only one of the two types of data-augmented images I′ is created. However, both of these two types of data-augmented images I′ may be created.
  • FIG. 15 is a flow chart showing an example of the processing operation of the learning data processing device according to the second embodiment of the present invention.
  • In the second embodiment, the determination process of step S15 in the first embodiment is omitted, and the processor 11 performs both the acquisition of a mask image Md in step S16 and the creation of a mask image Md in step S17.
  • Since the processor 11 includes a multi-threaded CPU, these processes can be performed concurrently in separate threads.
  • Alternatively, the process of step S16 and the process of step S17 may be performed sequentially; in that case, step S17 may follow step S16, or the order may be reversed.
  • In step S18, the processor 11 creates two types of shadow/highlight application-area images M using the respective mask images Md, and in the process of step S19 creates two types of data-augmented images I′.
  • Then, in step S20, the processor 11 transmits learning data including these two types of data-augmented images I′ to the learning device 200.
  • As described above, the learning data processing apparatus 100 according to the second embodiment uses both a shadow/highlight image J obtained by reducing the brightness of the entire input image I and a shadow/highlight image J obtained by increasing it, and creates both a data-augmented image I′ in which the portion corresponding to the brightness change region, within the region corresponding to the brightness change target region of the input image I, is darkened, and a data-augmented image I′ in which that portion is brightened.
  • With this configuration, the learning data processing apparatus 100 adds image effects simulating both shadows and highlights to the learning data, thereby increasing the number of pseudo lighting-environment patterns in the learning data. This makes it possible to realize deep generative model learning in the learning device 200 that is robust to both shadows and highlights.
  • FIG. 16 is a block diagram showing an example of the configuration of a deep generative model learning system including the learning data processing device 100 according to the third embodiment of the present invention.
  • In the third embodiment, the learning data processing apparatus 100 includes an evaluation unit 140 in addition to the configuration of the first embodiment. The image synthesizing unit 124 of the data augmentation unit 120 passes the created data-augmented image I′ not only to the data output unit 130 but also to the evaluation unit 140.
  • The evaluation unit 140 has an internal anomaly detection model and uses it to evaluate the data-augmented image I′.
  • The anomaly detection model is trained by metric learning, using as learning data an image group A with shadows and highlights obtained from actual images, and an image group B with shadows and highlights arbitrarily created so as to deviate greatly from actual images.
  • The evaluation unit 140 acquires the data-augmented image I′ from the data augmentation unit 120, inputs it to the anomaly detection model, and obtains an evaluation value. When the evaluation value exceeds a user-set threshold, the evaluation unit 140 regards the data-augmented image I′ as an image deviating greatly from an actual image, discards it, and returns control to the data augmentation unit 120 to perform data augmentation again.
  • FIG. 17 is a flowchart showing an example of the processing operation of the learning data processing device 100 according to the third embodiment. Following the processing of step S19, the processor 11 reads out the data-augmented image I′ stored in the temporary storage area 13B and evaluates it using the anomaly detection model learned in advance (step S31).
  • In step S32, the processor 11 determines whether the obtained evaluation value is equal to or less than the threshold. If it is (YES in step S32), the data-augmented image I′ does not deviate greatly from an actual image and is considered suitable as learning data for the learning device 200, so the process proceeds to step S20. As a result, learning data including the data-augmented image I′ is transmitted to the learning device 200.
  • If, on the other hand, the evaluation value exceeds the threshold (NO in step S32), the processor 11 determines that the data-augmented image I′ deviates greatly from an actual image, deletes it from the temporary storage area 13B, and repeats the process from step S13. As a result, a new data-augmented image I′ can be created with a different mask image.
  • As described above, in the third embodiment, the evaluation unit 140 evaluates the data-augmented image I′ created by the data augmentation unit 120, and if it is an image deviating greatly from an actual image, causes the data augmentation unit 120 to create a data-augmented image I′ again.
  • With this configuration, the learning data processing device 100 evaluates the data-augmented image I′ and can thereby prevent learning data unsuitable for learning the deep generative model in the learning device 200 from being created.
  • The learning data processing device 100 of the second embodiment may also include the evaluation unit 140, as in the third embodiment.
  • In the first embodiment, the brightness adjustment unit 121 creates both a shadow image and a highlight image as the shadow/highlight image J, and the mask region creation unit 122 randomly selects one of them for use. Instead, the brightness adjustment unit 121 may randomly generate only one of the images, and the mask region creation unit 122 may use a mask image corresponding to the shadow/highlight image J thus generated.
  • In the first embodiment, one data-augmented image I′ of one type is created, and in the second embodiment, two types of data-augmented image I′ are created, one each. The number of data-augmented images I′ may also be increased by creating more than one image of each type.
  • When the data-augmented image I′ is created in step S19 and the mask image Md has been obtained from the mask image data set, the shadow image has been used as the shadow/highlight image J; however, the highlight image may be used as the shadow/highlight image J in that case as well. Which image to use as the shadow/highlight image J may be determined by a random parameter, or both may be used to create two types of data-augmented image I′.
  • The learning data storage unit 300 may be configured as part of the learning data processing device 100; that is, the data memory 13 may be provided with a storage area serving as the learning data storage unit 300.
  • The learning device 200 may incorporate the functions of the learning data processing device 100 of each embodiment.
  • The method described in each embodiment can be stored, as a program (software means) executable by a computer, in a recording medium such as a magnetic disk (floppy (registered trademark) disk, hard disk, etc.), an optical disk (CD-ROM, DVD, MO, etc.), or a semiconductor memory (ROM, RAM, flash memory, etc.), or can be transmitted and distributed via a communication medium.
  • The programs stored on the medium also include a setting program for configuring, in the computer, the software means (including not only execution programs but also tables and data structures) to be executed by the computer.
  • A computer that realizes this apparatus reads the program recorded on the recording medium, in some cases builds the software means using the setting program, and executes the above-described processes with its operation controlled by the software means.
  • The term "recording medium" as used in this specification is not limited to media for distribution, and includes storage media such as magnetic disks and semiconductor memories provided in the computer or in devices connected via a network.
  • The present invention is not limited to the above embodiments and can be modified in various ways at the implementation stage without departing from the gist of the invention. The embodiments may also be combined wherever possible, in which case combined effects can be obtained. Furthermore, the above embodiments include inventions at various stages, and various inventions can be extracted by appropriately combining the plurality of disclosed constituent elements.
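The evaluate-and-regenerate flow of the third embodiment (steps S31 and S32 above) can be sketched as follows. The anomaly score function, the threshold value, and the retry limit here are illustrative stand-ins, not details taken from the embodiments:

```python
import random

def evaluate_and_retry(make_augmented, score, threshold, rng, max_tries=10):
    """Sketch of the third-embodiment loop: create a data-augmented image,
    score it with the anomaly-detection model, and regenerate while the
    score exceeds the user-set threshold.  `score` stands in for the
    trained metric-learning model; `max_tries` is an added assumption."""
    for _ in range(max_tries):
        image = make_augmented(rng)
        if score(image) <= threshold:
            return image  # close enough to an actual image: accept (step S20)
    return None           # give up after max_tries (assumption, not in the text)

rng = random.Random(1)
accepted = evaluate_and_retry(
    make_augmented=lambda r: r.random(),  # dummy "image": a single number
    score=lambda img: img,                # dummy anomaly score
    threshold=0.3,
    rng=rng,
)
```

In the actual device, `make_augmented` corresponds to repeating the process from step S13 with a different mask image, and `score` to the inference of the pre-trained anomaly detection model.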

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

A training data processing device according to one embodiment of the present invention comprises a data input unit, data expansion unit, and data output unit. The data input unit acquires training data to be used for training a deep generative model including: an input image and an illumination environment for the input image; a teacher image obtained by changing only the illumination environment from the input image, and an illumination environment for the teacher image; and a targeted region image indicating a brightness change region in the input image. The data expansion unit creates a brightness adjustment image obtained by subjecting the input image to brightness adjustment and also creates a mask image indicating a brightness change region in which brightness change is to be made, and synthesizes the targeted region image, the brightness adjustment image, and the mask image so as to create a data expansion image. The data output unit changes the input image in the training data to the data expansion image so as to create new training data, and outputs the new training data as training data to be used for training the deep generative model.

Description

LEARNING DATA PROCESSING DEVICE, LEARNING DATA PROCESSING METHOD, AND LEARNING DATA PROCESSING PROGRAM

 Embodiments of the present invention relate to a learning data processing device, a learning data processing method, and a learning data processing program.

 Relighting is a technique for generating, from an input image, a re-illuminated image in which the lighting environment in the image has been changed to a desired one. This relighting technique uses deep learning to generate the desired re-illuminated image from the input image.

 In methods using deep learning, for example as proposed in Non-Patent Document 1, a deep generative model that generates re-illuminated images is trained using learning data in which an input image is paired with a teacher image obtained by changing only the lighting environment of the input image.

 For example, Non-Patent Document 2 proposes preparing special equipment in which the subject is surrounded by a large number of cameras and lights, and shooting under a variety of shooting and lighting conditions, when creating such learning data in a real environment.

 The input image used for learning is desirably an image without global shadows on the face region caused by the lighting environment being blocked by buildings, trees, or the like; Non-Patent Document 3, for example, proposes a method for removing such shadows.

T. Sun, et al., "Single Image Portrait Relighting," SIGGRAPH 2019.
K. Guo, et al., "The Relightables: Volumetric Performance Capture of Humans with Realistic Relighting," SIGGRAPH 2020.
X. Zhang, et al., "Portrait Shadow Manipulation," SIGGRAPH 2020.

 With special equipment such as that proposed in Non-Patent Document 2, the number of shooting-condition and lighting-condition patterns that would have to be set to cover the irregular shadows and highlights that can occur in real environments becomes enormous, which is not realistic.

 Therefore, when a deep generative model that generates re-illuminated images, as in Non-Patent Document 1, is trained using learning data created as in Non-Patent Document 2, the learning data contain few lighting-environment patterns. Consequently, after the deep generative model is trained, when a re-illuminated image is generated from an input image bearing shadows or highlights not included in the learning data, those shadows or highlights remain in the generated re-illuminated image.

 To remove the shadows in the re-illuminated image, additional shadow-removal processing such as that of Non-Patent Document 3 then becomes necessary.

 The present invention seeks to provide a technique that makes it possible to realize learning of a deep generative model that is robust to shadows and highlights.

 To solve the above problems, a learning data processing device according to one aspect of the present invention includes a data input unit, a data augmentation unit, and a data output unit. The data input unit acquires learning data used for training a deep generative model, including: an input image and the lighting environment of the input image; a teacher image, which is an image obtained by changing only the lighting environment of the input image, and the lighting environment of the teacher image; and a target region image indicating the brightness change target region in the input image. The data augmentation unit creates a brightness-adjusted image obtained by performing brightness adjustment on the input image, creates a mask image indicating the brightness change region to which a brightness change is to be applied, and combines the target region image, the brightness-adjusted image, and the mask image to create a data-augmented image. The data output unit creates new learning data by replacing the input image in the learning data with the data-augmented image, and outputs the new learning data as learning data used for training the deep generative model.

 According to one aspect of the present invention, it is possible to provide a technique that makes it possible to realize learning of a deep generative model that is robust to shadows and highlights.

FIG. 1 is a block diagram showing an example of the configuration of a deep generative model learning system including a learning data processing device according to the first embodiment of the present invention.
FIG. 2 is a diagram showing an example of a mask image data set possessed by the learning data processing device.
FIG. 3 is a diagram showing an example of the hardware configuration of the learning data processing device.
FIG. 4 is a flowchart showing an example of the processing operation of the learning data processing device.
FIG. 5 is a diagram showing an example of an input image, which is one item of the learning data.
FIG. 6 is a diagram showing an example of a target region image, which is one item of the learning data.
FIG. 7 is a diagram showing an example of a shadow/highlight image created by the learning data processing device during processing.
FIG. 8 is a diagram showing another example of a shadow/highlight image.
FIG. 9 is a diagram showing an example of a mask image.
FIG. 10 is a diagram showing another example of a mask image.
FIG. 11 is a diagram showing an example of an inverted shadow/highlight application-area image to be combined.
FIG. 12 is a diagram showing another example of an inverted shadow/highlight application-area image to be combined.
FIG. 13 is a diagram showing an example of a data-augmented image created by the learning data processing device.
FIG. 14 is a diagram showing another example of a data-augmented image created by the learning data processing device.
FIG. 15 is a flowchart showing an example of the processing operation of the learning data processing device according to the second embodiment of the present invention.
FIG. 16 is a block diagram showing an example of the configuration of a deep generative model learning system including a learning data processing device according to the third embodiment of the present invention.
FIG. 17 is a flowchart showing an example of the processing operation of the learning data processing device according to the third embodiment.

 Hereinafter, embodiments according to the present invention will be described with reference to the drawings.

 [First Embodiment]
 (Configuration example)
 FIG. 1 is a block diagram showing an example of the configuration of a deep generative model learning system including a learning data processing device 100 according to the first embodiment of the present invention. The deep generative model learning system includes the learning data processing device 100, a learning device 200, and a learning data storage unit 300. These units may be integrated into a single device or housing, or the system may be composed of a plurality of devices; the devices may also be located remotely from one another and connected via a network.

 The learning data storage unit 300 stores the learning data necessary for learning in the learning device 200. The learning data include: an input image and the lighting environment of the input image; a teacher image, which is an image obtained by changing only the lighting environment of the input image, and the lighting environment of the teacher image; and a target region image indicating the brightness change target region in the input image, that is, the region to which a shadow or highlight is to be applied (for example, the face of a person). The lighting environment is held, for example, either as vector data using spherical harmonics or as an environment map image expressing the reflections around the image. Passing all of the prepared learning data from the learning data storage unit 300 to the learning data processing device 100 once constitutes one epoch, and in each epoch the order of the learning data is randomly shuffled before being passed to the learning data processing device 100.
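As an illustrative sketch (the field names, file names, and dataset size below are hypothetical, not taken from the embodiments), one learning-data record and the per-epoch random shuffle could look like:

```python
import random

# One learning-data record holding the five elements described above.
# Field names are illustrative placeholders, not from the patent text.
def make_record(i):
    return {
        "input_image": f"input_{i}.png",
        "input_lighting": f"sh_vector_{i}",       # e.g. spherical-harmonics vector
        "teacher_image": f"teacher_{i}.png",
        "teacher_lighting": f"sh_vector_{i}_t",
        "target_region_image": f"region_{i}.png",  # e.g. the face region
    }

dataset = [make_record(i) for i in range(8)]

def epoch(dataset, rng):
    """Yield all records exactly once, in a freshly randomized order
    (one epoch as described above)."""
    order = list(dataset)
    rng.shuffle(order)
    return order

rng = random.Random(0)
epoch1 = epoch(dataset, rng)
epoch2 = epoch(dataset, rng)  # a later epoch sees a different order
```

Each epoch visits every record once; only the order changes between epochs.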

 The learning data processing device 100 performs data preprocessing, including data augmentation, on the learning data acquired from the learning data storage unit 300. Data augmentation here refers to processing that adds an image effect simulating a shadow or highlight to the learning data. The learning data processing device 100 passes the preprocessed learning data to the learning device 200.

 The learning device 200 trains the deep generative model using the learning data passed from the learning data processing device 100.

 The learning device 200 also generates a re-illuminated image from an arbitrary input image using the trained deep generative model. The input image may be acquired via the learning data processing device 100, or via an input device or network (not shown). Furthermore, the learning device 200 updates the parameters of the deep generative model and records the model by evaluating the generated re-illuminated image against the learning data.

 As shown in FIG. 1, the learning data processing device 100 includes a data input unit 110, a data augmentation unit 120, and a data output unit 130.

 The data input unit 110 acquires the learning data, that is, the input image and its lighting environment, the teacher image and its lighting environment, and the target region image, from the learning data storage unit 300. Of these, the data input unit 110 passes the input image and its lighting environment, and the teacher image and its lighting environment, to the data output unit 130. The data input unit 110 also decides, using a random parameter, whether to perform data augmentation that increases the influence of lighting; when data augmentation is to be performed, it passes the input image and the target region image of the learning data to the data augmentation unit 120.
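The random decision of whether to apply the lighting-effect data augmentation to a given sample could be sketched as follows; the probability value `p` is an assumed parameter, since the text only mentions a random parameter:

```python
import random

def decide_augmentation(rng, p=0.5):
    """Return True when this sample should receive the lighting-effect
    data augmentation.  p is an assumed augmentation probability; the
    patent text only states that a random parameter is used."""
    return rng.random() < p

rng = random.Random(42)
decisions = [decide_augmentation(rng) for _ in range(1000)]
```

Over many samples, roughly a fraction `p` of the learning data is routed to the data augmentation unit and the remainder passes straight to the data output unit.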

 The data augmentation unit 120 includes a brightness adjustment unit 121, a mask region creation unit 122, a mask image storage unit 123, and an image synthesizing unit 124.

 The brightness adjustment unit 121 passes the input image received from the data input unit 110 to the image synthesizing unit 124. The brightness adjustment unit 121 also creates a shadow/highlight image, which is a brightness-adjusted image obtained by performing brightness adjustment on the input image. The brightness adjustment unit 121 then passes the created shadow/highlight image and the target region image received from the data input unit 110 to the mask region creation unit 122.
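A minimal sketch of the brightness adjustment, assuming 8-bit images and a simple gain factor (the exact adjustment method is not specified in the text): a gain below 1 yields a shadow image, a gain above 1 a highlight image.

```python
import numpy as np

def brightness_adjust(image, gain):
    """Create a shadow/highlight image by scaling the brightness of the
    entire image: gain < 1 darkens (shadow), gain > 1 brightens
    (highlight).  Values are clipped to the 8-bit range [0, 255].
    The gain-based scheme is an assumption for illustration."""
    out = image.astype(np.float32) * gain
    return np.clip(out, 0, 255).astype(np.uint8)

img = np.full((4, 4, 3), 100, dtype=np.uint8)
shadow_img = brightness_adjust(img, 0.5)     # darkened copy
highlight_img = brightness_adjust(img, 2.0)  # brightened copy
clipped = brightness_adjust(np.full((2, 2, 3), 200, dtype=np.uint8), 2.0)
```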

 The mask image storage unit 123 stores a mask image data set, which is a data set of irregular mask images. FIG. 2 shows an example of this mask image data set. For the mask images, an existing data set may be used, or images created using Perlin noise may be used.
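True Perlin noise uses gradient interpolation and is more involved; as a hedged stand-in, the sketch below thresholds bilinearly smoothed value noise into an irregular binary mask of the kind described above:

```python
import numpy as np

def noise_mask(shape, cell=4, threshold=0.5, seed=0):
    """Create an irregular binary mask in the spirit of the Perlin-noise
    masks mentioned above: a coarse random grid is bilinearly upsampled
    into smooth noise, then thresholded.  This is simple value noise,
    a stand-in for true Perlin noise."""
    rng = np.random.default_rng(seed)
    h, w = shape
    coarse = rng.random((h // cell + 2, w // cell + 2))
    ys = np.linspace(0, coarse.shape[0] - 1.001, h)
    xs = np.linspace(0, coarse.shape[1] - 1.001, w)
    y0, x0 = ys.astype(int), xs.astype(int)
    fy, fx = ys - y0, xs - x0
    # Bilinear interpolation of the coarse grid up to the full resolution.
    top = coarse[y0][:, x0] * (1 - fx) + coarse[y0][:, x0 + 1] * fx
    bot = coarse[y0 + 1][:, x0] * (1 - fx) + coarse[y0 + 1][:, x0 + 1] * fx
    smooth = top * (1 - fy)[:, None] + bot * fy[:, None]
    return (smooth > threshold).astype(np.uint8)

mask = noise_mask((32, 32))
```

Varying `cell`, `threshold`, and `seed` produces differently shaped irregular masks, analogous to the variety in the stored mask image data set.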

 The mask region creation unit 122 creates a shadow/highlight application-area image, which indicates the region to which a shadow or highlight is to be applied, from the shadow/highlight image and the target region image passed from the brightness adjustment unit 121, together with a mask image. The mask image may be a mask image stored in advance in the mask image storage unit 123, or may be created by performing an arbitrary binarization process on the shadow/highlight image. The mask region creation unit 122 can decide, using a random parameter, whether to use a pre-stored mask image or a newly created one. The mask region creation unit 122 passes the shadow/highlight image and the created shadow/highlight application-area image to the image synthesizing unit 124.
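A sketch of building the shadow/highlight application-area image, assuming the arbitrary binarization is a simple luminance threshold and that the target region image is a binary mask (both assumptions; the text leaves the binarization method open):

```python
import numpy as np

def application_area(shadow_image, target_region, threshold=128):
    """Build the shadow/highlight application-area image: binarize the
    brightness-adjusted image, then keep only the pixels inside the
    target region (e.g. the face area).  The luminance proxy and the
    threshold value are assumptions for illustration."""
    gray = shadow_image.mean(axis=2)               # simple luminance proxy
    binary = (gray > threshold).astype(np.uint8)   # arbitrary binarization
    return binary * (target_region > 0).astype(np.uint8)

shadow = np.zeros((4, 4, 3), dtype=np.uint8)
shadow[:2] = 200                                   # bright upper half
target = np.zeros((4, 4), dtype=np.uint8)
target[:, :2] = 1                                  # target region: left half
region = application_area(shadow, target)
```

Only pixels that are both above the binarization threshold and inside the target region survive, which keeps the effect confined to the brightness change target region.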

 The image synthesizing unit 124 combines the input image passed from the brightness adjustment unit 121 with the shadow/highlight image and the shadow/highlight application-area image passed from the mask region creation unit 122 to create a data-augmented image. The image synthesizing unit 124 passes the created data-augmented image to the data output unit 130.
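The synthesis step can be sketched as a masked blend: inside the application area the brightness-adjusted pixel replaces the original, and elsewhere the input pixel is kept. This is one plausible reading of the combination, not the patent's exact formula:

```python
import numpy as np

def composite(input_image, shading_image, region_mask):
    """Create a data-augmented image: inside the shadow/highlight
    application area take the brightness-adjusted pixel, elsewhere keep
    the original input pixel.  A hard (non-feathered) blend is an
    assumption for illustration."""
    m = (region_mask > 0)[..., None]  # broadcast the mask over channels
    return np.where(m, shading_image, input_image)

inp = np.full((4, 4, 3), 100, dtype=np.uint8)
shaded = np.full((4, 4, 3), 40, dtype=np.uint8)   # darkened (shadow) version
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 1                                # application area
augmented = composite(inp, shaded, mask)
```

Using a highlight image instead of the darkened `shaded` array produces the brightened variant of the data-augmented image in the same way.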

 When data augmentation is not performed, the data output unit 130 normalizes or standardizes the input image and the teacher image passed from the data input unit 110. The data output unit 130 then passes the normalized or standardized input image and its lighting environment, and the normalized or standardized teacher image and its lighting environment, to the learning device 200 as learning data.

 When data augmentation is performed, the data output unit 130 replaces the input image passed from the data input unit 110 with the data-augmented image passed from the image synthesizing unit 124 of the data augmentation unit 120. In this case, the data output unit 130 normalizes or standardizes the teacher image and the input image thus replaced by the data-augmented image. The data output unit 130 then passes the normalized or standardized input image and its lighting environment, and the normalized or standardized teacher image and its lighting environment, to the learning device 200 as learning data. That is, the data output unit 130 passes to the learning device 200 new learning data different from the original learning data.
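The normalization and standardization mentioned above could be sketched as follows; the concrete schemes (min-max scaling to [0, 1], and zero-mean unit-variance scaling) are assumptions, since the text does not fix them:

```python
import numpy as np

def normalize(image):
    """Scale 8-bit pixel values into [0, 1] (assumed normalization)."""
    return image.astype(np.float32) / 255.0

def standardize(image):
    """Shift and scale pixel values to zero mean and unit variance
    (assumed standardization); epsilon guards against division by zero."""
    x = image.astype(np.float32)
    return (x - x.mean()) / (x.std() + 1e-8)

img = np.arange(48, dtype=np.uint8).reshape(4, 4, 3)
n = normalize(img)
s = standardize(img)
```

Either transform would be applied identically to the (possibly augmented) input image and the teacher image before they are passed to the learning device 200.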

 FIG. 3 shows an example of the hardware configuration of the learning data processing device 100. The learning data processing device 100 includes, for example, a processor 11, a program memory 12, a data memory 13, an input/output interface 14, and a communication interface 15. The program memory 12, the data memory 13, the input/output interface 14, and the communication interface 15 are connected to the processor 11 via a bus 16. The learning data processing device 100 may be configured as a general-purpose computer such as a personal computer.

 The processor 11 includes a multi-core, multi-threaded CPU (Central Processing Unit) and can execute multiple information processes concurrently.

 The program memory 12 uses, as storage media, a combination of a non-volatile memory that can be written and read at any time, such as an HDD (Hard Disk Drive) or SSD (Solid State Drive), and a non-volatile memory such as a ROM (Read Only Memory), and stores the programs necessary for the processor 11, such as a CPU, to execute the various control processes according to the first embodiment of the present invention. That is, by reading and executing the programs stored in the program memory 12, the processor 11 can function as the data input unit 110, the data augmentation unit 120, and the data output unit 130 shown in FIG. 1. These processing function units may be realized by the sequential processing of a single CPU thread, or in a form allowing concurrent parallel processing on separate CPU threads. They may also be realized by separate CPUs; that is, the learning data processing device 100 may include a plurality of CPUs. Furthermore, at least some of these processing function units may be implemented in the form of various other hardware circuits, including integrated circuits such as an ASIC (Application Specific Integrated Circuit), FPGA (field-programmable gate array), or GPU (Graphics Processing Unit). The programs stored in the program memory 12 can include a learning data processing program as shown in FIG. 3.

 The data memory 13 uses, as storage media, a combination of a non-volatile memory that can be written and read at any time, such as an HDD or SSD, and a volatile memory such as a RAM (Random Access Memory), and is used to pre-store the various data necessary for performing data preprocessing, including data augmentation. For example, a mask image data set storage area 13A for storing the mask image data set can be reserved in the data memory 13; that is, the data memory 13 can function as the mask image storage unit 123. A temporary storage area 13B, used to store the various data acquired and created in the course of data preprocessing including data augmentation, can also be reserved in the data memory 13.

 The input/output interface 14 is an interface with input devices such as a keyboard and mouse (not shown) and output devices such as a liquid crystal monitor. The input/output interface 14 may also include an interface with a reader/writer for memory cards or disk media. When the mask image data set is provided recorded on a memory card or disk medium, the processor 11 can read it via the input/output interface 14 and store it in the mask image data set storage area 13A of the data memory 13.

 The communication interface 15 includes, for example, one or more wired or wireless communication interface units, and enables various kinds of information to be exchanged with devices on a network in accordance with the communication protocol used by that network. As a wired interface, a wired LAN or a USB (Universal Serial Bus) interface, for example, is used; as a wireless interface, a mobile phone communication system such as 4G or 5G, a wireless LAN, or an interface adopting a low-power wireless data communication standard such as Bluetooth (registered trademark), for example, is used. For example, when the learning data storage unit 300 is located on a file server or the like on the network, the processor 11 can receive and acquire learning data from the learning data storage unit 300 via the communication interface 15. Similarly, the processor 11 can also acquire the mask image data set from a device on the network. Furthermore, when the learning device 200 is located on a server device or the like on the network, the processor 11 can transmit learning data to the learning device 200 via the communication interface 15.

 (Operation)
 Next, the operation of the learning data processing device 100 will be described.

 FIG. 4 is a flowchart showing an example of the processing operation of the learning data processing device 100. When the user instructs execution of the learning data processing program from an input device (not shown) via the input/output interface 14, the processor 11 starts the operation shown in this flowchart. Alternatively, the processor 11 may start the operation shown in this flowchart in response to an execution instruction from the learning device 200 on the network, received via the communication interface 15.

 First, the processor 11 operates as the data input unit 110 and acquires learning data from the learning data storage unit 300 (step S11). The acquired learning data is stored in the temporary storage area 13B of the data memory 13. The learning data includes an input image and its lighting environment, a teacher image and its lighting environment, and a target area image.

 The processor 11 then decides, using random parameters, whether to perform data augmentation that increases the influence of lighting (step S12). If data augmentation is not to be performed (NO in step S12), the processor 11 proceeds to the process of step S20, described later.

 If, on the other hand, data augmentation is to be performed (YES in step S12), the processor 11 operates as the brightness adjustment unit 121 and first acquires the input image and the target area image (step S13). That is, the processor 11 reads the input image and the target area image from the temporary storage area 13B. In the description of the configuration, passing the input image and the target area image from the data input unit 110 to the brightness adjustment unit 121 means storing them in and reading them from the temporary storage area 13B in this way. The same applies throughout the following description.

 FIG. 5 is a diagram showing an example of the input image I, and FIG. 6 is a diagram showing an example of the target area image M_f.

 The processor 11 then performs brightness adjustment on the entire input image I to create a shadow/highlight image (step S14). This brightness adjustment includes an adjustment that lowers the brightness when adding an effect A simulating a shadow as data augmentation, and an adjustment that raises the brightness when adding an effect B simulating a highlight as data augmentation. As the brightness adjustment method, linear correction or gamma correction, for example, can be used, and the method is selected by the user in advance. For the brightness adjustment parameter γ, in both linear correction and gamma correction, the user sets an upper limit and a lower limit in advance under the condition γ < 1.0 for effect A and γ > 1.0 for effect B, and γ is determined randomly within this range. The brightness-adjusted shadow/highlight image J is created as shown in Equation 1 below.

 J = f_γ(I)   (1)

 Here f_γ denotes the brightness correction selected in advance by the user (linear correction or gamma correction), applied with the parameter γ.
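Step S14 can be sketched as follows. The patent constrains only the range of γ (γ < 1.0 for effect A, γ > 1.0 for effect B) and leaves the choice between linear and gamma correction to the user; the gamma-correction convention below (exponent 1/γ) is an assumption chosen so that, as with linear correction, γ < 1.0 darkens the image.

```python
import random

def linear_correction(image, gamma):
    # J = gamma * I, clipped to [0, 255]; gamma < 1.0 darkens (effect A),
    # gamma > 1.0 brightens (effect B).
    return [[min(255, int(round(p * gamma))) for p in row] for row in image]

def gamma_correction(image, gamma):
    # Assumed convention: J = 255 * (I / 255) ** (1 / gamma), so that
    # gamma < 1.0 darkens and gamma > 1.0 brightens, matching the text.
    return [[min(255, int(round(255 * (p / 255) ** (1 / gamma)))) for p in row]
            for row in image]

def make_shadow_highlight_image(image, gamma_range, method=linear_correction):
    # gamma_range is the user-set (lower, upper) interval; a range below 1.0
    # yields a shadow image (effect A), a range above 1.0 a highlight image
    # (effect B). gamma is drawn at random within the range, as in step S14.
    gamma = random.uniform(*gamma_range)
    return method(image, gamma)
```

The images here are plain nested lists of 8-bit grayscale values; a real implementation would use an array library, but the arithmetic is the same.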

 FIG. 7 is a diagram showing an example of a shadow/highlight image J whose brightness has been adjusted to add an effect A simulating a shadow as data augmentation; in this case the shadow/highlight image J is a shadow image. FIG. 8 is a diagram showing an example of a shadow/highlight image J whose brightness has been adjusted to add an effect B simulating a highlight as data augmentation; in this case the shadow/highlight image J is a highlight image.

 The processor 11 stores the shadow/highlight image J thus created in the temporary storage area 13B.

 The processor 11 then operates as the mask area creation unit 122 and decides whether to use the mask image data set stored in advance in the mask image storage unit 123, that is, in the mask image data set storage area 13A (step S15). In other words, the processor 11 decides, based on random parameters, whether to use a mask image prepared in advance or to create a mask image from the shadow/highlight image J.

 If it decides to use the mask image data set (YES in step S15), the processor 11 acquires a mask image M_d from the mask image data set stored in the mask image storage unit 123, for example based on random parameters (step S16). FIG. 9 is a diagram showing an example of the acquired mask image M_d. The processor 11 stores the acquired mask image M_d in the temporary storage area 13B.

 If, on the other hand, it decides not to use the mask image data set (NO in step S15), the processor 11 reads, from among the shadow/highlight images J stored in the temporary storage area 13B, the highlight image, that is, the shadow/highlight image J whose brightness was adjusted to add the effect B simulating a highlight as data augmentation. The processor 11 then creates a mask image M_d by applying an arbitrary binarization process to this shadow/highlight image J (step S17). FIG. 10 is a diagram showing an example of the mask image M_d created from this highlight image. The processor 11 stores the created mask image M_d in the temporary storage area 13B.
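The binarization in step S17 is explicitly left "arbitrary" by the text; a minimal sketch using a fixed global threshold (the threshold value below is a hypothetical choice, and Otsu's method or any other binarization would serve equally) might look like this:

```python
def binarize(image, threshold=192):
    # Pixels at or above the threshold (the strongly brightened highlight
    # regions of J) become 1, everything else 0; the result serves as the
    # mask image M_d.
    return [[1 if p >= threshold else 0 for p in row] for row in image]
```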

 Once the mask image M_d has been acquired or created in this way, the processor 11 reads the target area image M_f and the mask image M_d from the temporary storage area 13B and creates a shadow/highlight application area image based on them (step S18). That is, as shown in Equation 2 below, the processor 11 takes the logical product of the target area image M_f and the mask image M_d and applies a Gaussian filter g(k, σ) to obtain the shadow/highlight application area image M. The parameters of the Gaussian filter are determined by the user in advance; by default, the filter size is k = 11 and σ = 5.0. The processor 11 then stores the created shadow/highlight application area image M in the temporary storage area 13B.

 M = g(k, σ) ∗ (M_f ∧ M_d)   (2)
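Step S18 can be sketched in pure Python as follows, assuming the Gaussian filter g(k, σ) is a normalized separable kernel applied with edge replication at the borders (the patent specifies only the defaults k = 11, σ = 5.0, so the boundary handling is an assumption):

```python
import math

def gaussian_kernel(k, sigma):
    # Normalized 1-D Gaussian kernel of (odd) size k.
    c = k // 2
    weights = [math.exp(-((i - c) ** 2) / (2 * sigma ** 2)) for i in range(k)]
    total = sum(weights)
    return [w / total for w in weights]

def gaussian_blur(image, k, sigma):
    # Separable Gaussian filter g(k, sigma): one horizontal pass, one
    # vertical pass, replicating edge pixels at the borders.
    kern = gaussian_kernel(k, sigma)
    c = k // 2
    h, w = len(image), len(image[0])
    horiz = [[sum(kern[j] * row[min(w - 1, max(0, x + j - c))] for j in range(k))
              for x in range(w)] for row in image]
    return [[sum(kern[j] * horiz[min(h - 1, max(0, y + j - c))][x] for j in range(k))
             for x in range(w)] for y in range(h)]

def application_area_image(target_area, mask, k=11, sigma=5.0):
    # Equation 2: logical product of the binary masks M_f and M_d, then
    # Gaussian smoothing, yielding a soft weighting image M in [0, 1].
    product = [[a * b for a, b in zip(row_f, row_d)]
               for row_f, row_d in zip(target_area, mask)]
    return gaussian_blur(product, k, sigma)
```

Smoothing the hard logical product softens the boundary of the applied shadow or highlight, so the composited effect fades out instead of ending at a sharp edge.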

 After that, the processor 11 operates as the image composition unit 124, reads the input image I, the shadow/highlight image J, and the shadow/highlight application area image M from the temporary storage area 13B, and composites them as shown in Equation 3 below to create a data-augmented image I' (step S19).

 I′ = M ⊙ J + (1 − M) ⊙ I   (3)
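Assuming the composition of Equation 3 is the per-pixel blend I′ = M·J + (1 − M)·I, consistent with the appearance of the inverted image 1 − M in the surrounding text, step S19 can be sketched as:

```python
def composite(input_image, shadow_highlight_image, area_image):
    # I' = M * J + (1 - M) * I, applied per pixel: where M is 1 the
    # brightness-adjusted pixel of J is used, where M is 0 the original
    # pixel of I is kept, and intermediate M values blend the two.
    return [[m * j + (1.0 - m) * i
             for i, j, m in zip(row_i, row_j, row_m)]
            for row_i, row_j, row_m in zip(input_image,
                                           shadow_highlight_image,
                                           area_image)]
```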

 The shadow/highlight image J read here is the shadow image when the mask image M_d was acquired from the mask image data set, and the corresponding highlight image when the mask image M_d was created from a shadow/highlight image J. In step S16 or step S17 above, the processor 11 can store which case applies in the temporary storage area 13B and read it out to determine which shadow/highlight image J to read. Alternatively, when storing the mask image M_d in the temporary storage area 13B in step S16 or step S17, the shadow/highlight image J not used for image composition may be deleted from the temporary storage area 13B.

 FIGS. 11 and 12 are diagrams showing examples of the inverted shadow/highlight application area image, denoted 1 − M in Equation 3, that is used in image composition. FIG. 11 corresponds to the mask image M_d of FIG. 9, and FIG. 12 corresponds to the mask image M_d of FIG. 10.

 The processor 11 then stores the created data-augmented image I' in the temporary storage area 13B. FIGS. 13 and 14 are diagrams showing examples of the created data-augmented image I'. FIG. 13 shows a data-augmented image I' created from the input image I, a shadow/highlight image J that is a shadow image, and the inverted shadow/highlight application area image 1 − M of FIG. 11; FIG. 14 shows a data-augmented image I' created from the input image I, a shadow/highlight image J that is a highlight image, and the inverted shadow/highlight application area image 1 − M of FIG. 12.

 The processor 11 then operates as the data output unit 130 and transmits the learning data (step S20).

 That is, if it determined in step S12 that data augmentation is not to be performed, the processor 11 reads the input image and the teacher image stored in the temporary storage area 13B, normalizes or standardizes them, and stores them in the temporary storage area 13B again. The processor 11 then reads the input image and its lighting environment, and the teacher image and its lighting environment, from the temporary storage area 13B and transmits them to the learning device 200 via the communication interface 15.

 If, on the other hand, it determined in step S12 that data augmentation is to be performed and a data-augmented image I' was generated by the processing of steps S13 to S19, the processor 11 reads the data-augmented image I' stored in the temporary storage area 13B. The processor 11 then normalizes or standardizes the data-augmented image I' and saves the result as the input image I, overwriting the input image I already stored in the temporary storage area 13B. That is, the processor 11 rewrites the input image I stored in the temporary storage area 13B with the normalized or standardized data-augmented image I'. The processor 11 also reads the teacher image stored in the temporary storage area 13B, normalizes or standardizes it, and stores it in the temporary storage area 13B again. The processor 11 then reads the input image and its lighting environment, and the teacher image and its lighting environment, from the temporary storage area 13B and transmits them to the learning device 200 via the communication interface 15.
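The patent does not fix the normalization or standardization formulas applied before transmission; two common choices, min-max scaling of 8-bit pixels to [0, 1] and zero-mean/unit-variance standardization, are sketched here as plausible instances:

```python
def normalize(image):
    # Scale 8-bit pixel values into [0, 1].
    return [[p / 255.0 for p in row] for row in image]

def standardize(image):
    # Shift to zero mean and scale to unit variance over the whole image.
    flat = [p for row in image for p in row]
    mean = sum(flat) / len(flat)
    var = sum((p - mean) ** 2 for p in flat) / len(flat)
    std = var ** 0.5 if var > 0 else 1.0
    return [[(p - mean) / std for p in row] for row in image]
```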

 In the learning data processing device 100 according to the first embodiment described above, the data input unit 110 acquires from the learning data storage unit 300 the learning data used for training the deep generative model, which includes an input image I and its lighting environment, a teacher image, which is an image obtained from the input image I by changing only the lighting environment, and its lighting environment, and a target area image M_f indicating the brightness change target area in the input image I. The data augmentation unit 120 creates a shadow/highlight image J, a brightness-adjusted image obtained by adjusting the brightness of the input image I, acquires or creates a mask image M_d indicating the brightness change area to which the brightness change is applied, and composites the target area image M_f, the shadow/highlight image J, and the mask image M_d to create a data-augmented image I'. The data output unit 130 then changes the input image I in the learning data to the data-augmented image I' to create new learning data, and outputs the new learning data to the learning device 200 as learning data used for training the deep generative model.

 In this way, the learning data processing device 100 according to the first embodiment creates a data-augmented image I' based on the learning data and creates new learning data including this data-augmented image I', thereby increasing the number of pieces of learning data used for training the deep generative model. The learning device 200 can therefore train the deep generative model using learning data in which the influence of irregular lighting environments has been increased, making it possible to realize training of a deep generative model that is robust against shadows and highlights.

 Furthermore, according to the first embodiment, the data augmentation unit 120 includes a brightness adjustment unit 121 that creates the shadow/highlight image J, a brightness-adjusted image, by lowering the brightness of the entire input image I, and an image composition unit 124 that composites the target area image M_f, the shadow/highlight image J, and the mask image M_d to create a data-augmented image I' in which, within the area corresponding to the brightness change target area of the input image I, the portion corresponding to the brightness change area is darkened.

 In this way, the learning data processing device 100 according to the first embodiment adds an image effect simulating a shadow to the learning data, thereby artificially increasing the number of lighting environment patterns in the learning data, making it possible for the learning device 200 to realize training of a deep generative model that is robust against shadows.

 Furthermore, according to the first embodiment, the data augmentation unit 120 includes a brightness adjustment unit 121 that creates the shadow/highlight image J, a brightness-adjusted image, by raising the brightness of the entire input image I, and an image composition unit 124 that composites the target area image M_f, the shadow/highlight image J, and the mask image M_d to create a data-augmented image I' in which, within the area corresponding to the brightness change target area of the input image I, the portion corresponding to the brightness change area is brightened.

 In this way, the learning data processing device 100 according to the first embodiment adds an image effect simulating a highlight to the learning data, thereby artificially increasing the number of lighting environment patterns in the learning data, making it possible for the learning device 200 to realize training of a deep generative model that is robust against highlights.

 Furthermore, according to the first embodiment, the data augmentation unit 120 includes a mask area creation unit 122 that creates the mask image M_d by applying an arbitrary binarization process to the shadow/highlight image J, the brightness-adjusted image.

 In this way, the learning data processing device 100 according to the first embodiment creates a mask image M_d based on the shadow/highlight image J, the brightness-adjusted image, that is, based on the input image I, and creates the data-augmented image I' using this mask image M_d, which reduces the probability that a data-augmented image I' deviating greatly from the input image I will be created.

 Furthermore, according to the first embodiment, the data augmentation unit 120 also includes the mask area creation unit 122, which acquires a mask image M_d from the mask image data set MIDS, a data set of irregular mask images stored in advance in the mask image storage unit 123.

 In this way, the learning data processing device 100 according to the first embodiment can create data-augmented images I' using various mask images M_d that do not depend on the input image I, making it possible to easily increase the number of pieces of learning data.

 [Second Embodiment]
 In the first embodiment, either a data-augmented image I' obtained by data augmentation that adds an image effect simulating a shadow to the learning data or a data-augmented image I' obtained by data augmentation that adds an image effect simulating a highlight is created; that is, only one of the two data-augmented images I' is created. However, both of these two kinds of data-augmented image I' may be created.

 FIG. 15 is a flowchart showing an example of the processing operation of the learning data processing device according to the second embodiment of the present invention. In this embodiment, the determination process of step S15 in the first embodiment is omitted, and the processor 11 performs both the acquisition of a mask image M_d in step S16 and the creation of a mask image M_d in step S17. When the processor 11 includes a multi-threaded CPU, these processes can be performed simultaneously and in parallel on separate threads. Of course, the process of step S16 and the process of step S17 may instead be performed sequentially; in that case, the process of step S17 may be performed after the process of step S16, or vice versa.

 In the process of step S18, the processor 11 creates two kinds of shadow/highlight application area image M using the respective mask images M_d, and in the process of step S19 creates two kinds of data-augmented image I'.

 The processor 11 then transmits, in step S20 above, learning data including these two kinds of data-augmented image I' to the learning device 200.

 The learning data processing device 100 according to the second embodiment described above uses a shadow/highlight image J obtained by lowering the brightness of the entire input image I and a shadow/highlight image J obtained by raising the brightness of the entire input image I to create a data-augmented image I' in which, within the area corresponding to the brightness change target area of the input image I, the portion corresponding to the brightness change area is darkened, and a data-augmented image I' in which the portion corresponding to the brightness change area is brightened.

 In this way, the learning data processing device 100 according to the second embodiment adds image effects simulating shadows and highlights to the learning data, thereby artificially increasing the number of lighting environment patterns in the learning data, making it possible for the learning device 200 to realize training of a deep generative model that is robust against shadows and highlights.

 [Third Embodiment]
 FIG. 16 is a block diagram showing an example of the configuration of a deep generative model learning system including the learning data processing device 100 according to the third embodiment of the present invention. The learning data processing device 100 according to this embodiment includes an evaluation unit 140 in addition to the configuration of the first embodiment. The image composition unit 124 of the data augmentation unit 120 passes the created data-augmented image I' not only to the data output unit 130 but also to this evaluation unit 140.

 The evaluation unit 140 holds an anomaly detection model internally and evaluates the data-augmented image I'. Here, the anomaly detection model has been trained by metric learning, using as learning data an image group A with shadows and highlights obtained from real images and an image group B with shadows and highlights deliberately created so as to deviate greatly from real images. The evaluation unit 140 acquires the data-augmented image I' from the data augmentation unit 120 and inputs it into the anomaly detection model to obtain an evaluation value. When the evaluation value exceeds a threshold set by the user, the evaluation unit 140 regards the data-augmented image I' as an image that deviates greatly from real images, discards it, and causes the data augmentation unit 120 to perform data augmentation again.

 FIG. 17 is a flowchart showing an example of the processing operation of the learning data processing device 100 according to the third embodiment. Following the processing of step S19, the processor 11 reads the data-augmented image I' stored in the temporary storage area 13B and evaluates it with the anomaly detection model trained in advance (step S31).

 The processor 11 then determines whether the obtained evaluation value is at or below the threshold (step S32). If the evaluation value is at or below the threshold (YES in step S32), the data-augmented image I' does not deviate greatly from real images and is regarded as suitable for the learning data of the learning device 200, and the processing proceeds to step S20 described above. As a result, learning data including the data-augmented image I' is transmitted to the learning device 200.

 If, on the other hand, the evaluation value is not at or below the threshold, that is, if it exceeds the threshold (NO in step S32), the processor 11 regards the data-augmented image I' as an image that deviates greatly from real images, deletes it from the temporary storage area 13B, and repeats the processing from step S13 above. This makes it possible to change the mask image and create a new data-augmented image I'.
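The evaluate-and-retry flow of steps S31 and S32 amounts to the following loop. The augmentation and scoring callables are hypothetical stand-ins for steps S13 to S19 and the anomaly detection model, and the retry cap is a safeguard not stated in the text, which simply repeats until an acceptable image is produced:

```python
def create_accepted_augmentation(augment, score, threshold, max_retries=100):
    # Repeat data augmentation (steps S13-S19) until the anomaly-detection
    # evaluation value of the candidate I' is at or below the user-set
    # threshold; candidates above the threshold are discarded.
    for _ in range(max_retries):
        candidate = augment()
        if score(candidate) <= threshold:
            return candidate
    return None  # retry cap exhausted (a safeguard, not part of the flow)
```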

 In the learning data processing device 100 according to the third embodiment described above, the evaluation unit 140 evaluates the data-augmented image I' created by the data augmentation unit 120 and, if the data-augmented image I' is an image that deviates greatly from real images, causes the data augmentation unit 120 to create a data-augmented image I' again.

 In this way, the learning data processing device 100 according to the third embodiment evaluates the data-augmented image I', making it possible to prevent learning data unsuitable for use in training the deep generative model in the learning device 200 from being created.

 It goes without saying that the evaluation unit 140 of this third embodiment may also be added to the learning data processing device 100 according to the second embodiment described above.

 [Other Embodiments]
 In the first and third embodiments described above, the brightness adjustment unit 121 creates both a shadow image and a highlight image as shadow/highlight images J, and the mask area creation unit 122 randomly selects one of them for use. Instead, the brightness adjustment unit 121 may randomly generate one of the two images, and the mask area creation unit 122 may use a mask image corresponding to the shadow/highlight image J generated by the brightness adjustment unit 121.

 Furthermore, while one data-augmented image I' of one kind is created in the first embodiment, and one each of two kinds of data-augmented image I' in the second embodiment, the number of data-augmented images I' created may be increased by acquiring or creating multiple mask images.

 In the first and third embodiments, when the data-augmented image I' is created in step S19 above, the shadow image is used as the shadow/highlight image J if the mask image M_d was acquired from the mask image data set; however, the highlight image may be used as the shadow/highlight image J in such a case as well. Which image to use as the shadow/highlight image J may be determined by a random parameter, or both may be used to create two kinds of data-augmented image I'.

 The learning data storage unit 300 may also be configured as part of the learning data processing device 100; that is, a storage area serving as the learning data storage unit 300 may be provided in the data memory 13.

 Furthermore, the functions of the learning data processing device 100 of the embodiments may be incorporated into the learning device 200.

 The methods described in the embodiments can be stored, as programs (software means) executable by a computer, in recording media such as magnetic disks (floppy (registered trademark) disks, hard disks, etc.), optical disks (CD-ROM, DVD, MO, etc.), and semiconductor memories (ROM, RAM, flash memory, etc.), and can also be transmitted and distributed via communication media. The programs stored on the media include a setting program that configures, within the computer, the software means (including not only execution programs but also tables and data structures) to be executed by the computer. A computer that realizes the present device reads the program recorded on the recording medium, builds the software means by the setting program as the case may be, and executes the above-described processing with its operation controlled by the software means. The recording media referred to in this specification are not limited to those for distribution, and include storage media such as magnetic disks and semiconductor memories provided inside the computer or in devices connected via a network.

 In short, the present invention is not limited to the above embodiments and can be modified in various ways at the implementation stage without departing from the gist of the invention. The embodiments may also be combined as appropriate to the extent possible, in which case the combined effects are obtained. Furthermore, the above embodiments include inventions at various stages, and various inventions can be extracted by appropriately combining the disclosed constituent elements.

 11… Processor
 12… Program memory
 13… Data memory
 13A… Mask image data set storage area
 13B… Temporary storage area
 14… Input/output interface
 15… Communication interface
 16… Bus
 100… Learning data processing device
 110… Data input unit
 120… Data extension unit
 121… Brightness adjustment unit
 122… Mask area creation unit
 123… Mask image storage unit
 124… Image synthesizing unit
 130… Data output unit
 140… Evaluation unit
 200… Learning device
 300… Learning data storage unit
 I… Input image
 I'… Data augmented image
 J… Shadow/highlight image
 Md… Mask image
 Mf… Target area image
 MIDS… Mask image data set
 1-M… Inverted shadow/highlight added area image

Claims (8)

1. A learning data processing device comprising:
 a data input unit that acquires learning data used for training a deep generative model, the learning data including an input image and an illumination environment of the input image, a teacher image that is an image obtained by changing only the illumination environment of the input image and an illumination environment of the teacher image, and a target area image indicating a brightness change target area in the input image;
 a data extension unit that creates a brightness-adjusted image by performing brightness adjustment on the input image, creates a mask image indicating a brightness change area to which a brightness change is to be applied, and synthesizes the target area image, the brightness-adjusted image, and the mask image to create a data augmented image; and
 a data output unit that creates new learning data by replacing the input image in the learning data with the data augmented image, and outputs the new learning data as the learning data used for training the deep generative model.

2. The learning data processing device according to claim 1, wherein the data extension unit includes: a brightness adjustment unit that creates the brightness-adjusted image by lowering the brightness of the entire input image; and an image synthesizing unit that synthesizes the target area image, the brightness-adjusted image, and the mask image to create the data augmented image in which, within the area corresponding to the brightness change target area in the input image, the portion corresponding to the brightness change area is darkened.

3. The learning data processing device according to claim 1, wherein the data extension unit includes: a brightness adjustment unit that creates the brightness-adjusted image by raising the brightness of the entire input image; and an image synthesizing unit that synthesizes the target area image, the brightness-adjusted image, and the mask image to create the data augmented image in which, within the area corresponding to the brightness change target area in the input image, the portion corresponding to the brightness change area is brightened.

4. The learning data processing device according to claim 3, wherein the data extension unit further includes a mask area creation unit that creates the mask image by performing arbitrary binarization processing on the brightness-adjusted image.

5. The learning data processing device according to claim 2 or 3, wherein the data extension unit further includes a mask area creation unit that acquires the mask image from a mask image data set, which is a data set of irregular mask images stored in advance.

6. The learning data processing device according to any one of claims 1 to 5, further comprising an evaluation unit that evaluates the data augmented image and, when the data augmented image greatly deviates from a real image, causes the data extension unit to create the data augmented image again.

7. A learning data processing method in a learning data processing device that has a processor and creates new learning data from learning data used for training a deep generative model, the method comprising:
 acquiring, by the processor, the learning data including an input image and an illumination environment of the input image, a teacher image that is an image obtained by changing only the illumination environment of the input image and an illumination environment of the teacher image, and a target area image indicating a brightness change target area in the input image;
 creating, by the processor, a brightness-adjusted image by performing brightness adjustment on the input image, creating a mask image indicating a brightness change area to which a brightness change is to be applied, and synthesizing the target area image, the brightness-adjusted image, and the mask image to create a data augmented image; and
 creating, by the processor, new learning data by replacing the input image in the learning data with the data augmented image, and outputting the new learning data as the learning data used for training the deep generative model.

8. A learning data processing program that causes a processor to function as each unit of the learning data processing device according to any one of claims 1 to 6.
PCT/JP2021/014159 2021-04-01 2021-04-01 Training data processing device, training data processing method, and training data processing program Ceased WO2022208843A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2023510112A JPWO2022208843A1 (en) 2021-04-01 2021-04-01
PCT/JP2021/014159 WO2022208843A1 (en) 2021-04-01 2021-04-01 Training data processing device, training data processing method, and training data processing program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/014159 WO2022208843A1 (en) 2021-04-01 2021-04-01 Training data processing device, training data processing method, and training data processing program

Publications (1)

Publication Number Publication Date
WO2022208843A1 (en) 2022-10-06

Family

ID=83458289

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/014159 Ceased WO2022208843A1 (en) 2021-04-01 2021-04-01 Training data processing device, training data processing method, and training data processing program

Country Status (2)

Country Link
JP (1) JPWO2022208843A1 (en)
WO (1) WO2022208843A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2024143813A (en) * 2023-03-30 2024-10-11 横河電機株式会社 Apparatus, method and program

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018161692A (en) * 2017-03-24 2018-10-18 キヤノン株式会社 Information processing apparatus, information processing method, and program

Also Published As

Publication number Publication date
JPWO2022208843A1 (en) 2022-10-06

Similar Documents

Publication Publication Date Title
Chintha et al. Recurrent convolutional structures for audio spoof and video deepfake detection
JP6657137B2 (en) Information processing apparatus, information processing method, and program
US20200167161A1 (en) Synthetic depth image generation from cad data using generative adversarial neural networks for enhancement
CN113344777B (en) Face swapping and replaying method and device based on 3D face decomposition
JP6778670B2 (en) Information processing device, information processing method, and program
CN109902018B (en) Method for acquiring test case of intelligent driving system
US20170262623A1 (en) Physics-based captcha
KR20220098218A (en) Image-to-image transformation using unpaired data for supervised learning
CN107274358A (en) Image Super-resolution recovery technology based on cGAN algorithms
CN111310156B (en) Automatic identification method and system for slider verification code
US11170203B2 (en) Training data generation method for human facial recognition and data generation apparatus
CN116664422B (en) Image highlight processing method, device, electronic device and readable storage medium
WO2022208843A1 (en) Training data processing device, training data processing method, and training data processing program
CN114090968A (en) Ownership verification method and device for data set
CN113343951A (en) Face recognition countermeasure sample generation method and related equipment
KR20220135890A (en) Method and system for collecting virtual environment-based data for artificial intelligence object recognition model
CN110070017B (en) A method and device for generating a false-eye image of a human face
Dy et al. MCGAN: mask controlled generative adversarial network for image retargeting
CN114282651B (en) Training methods and systems for optical flow prediction models and video generation methods and systems
JP2023030207A (en) LEARNING DEVICE, LEARNING METHOD AND LEARNING PROGRAM
CN115861044B (en) Complex cloud background simulation method, device and equipment based on generative confrontation network
US12347110B2 (en) System and method for synthetic data generation using dead leaves images
CN111915533A (en) A high-precision image information extraction method based on low dynamic range
JP2021056542A (en) Pose detection of object from image data
Jiang Addressing Vulnerabilities in AI-Image Detection: Challenges and Proposed Solutions

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21935005

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023510112

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21935005

Country of ref document: EP

Kind code of ref document: A1