
WO2024180744A1 - Combining device, combining method, and combining program - Google Patents

Combining device, combining method, and combining program

Info

Publication number
WO2024180744A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
weight
synthesis
trained
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/JP2023/007682
Other languages
French (fr)
Japanese (ja)
Inventor
真徳 山田
智也 山下
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NTT Inc
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Priority to PCT/JP2023/007682 priority Critical patent/WO2024180744A1/en
Priority to JP2025503529A priority patent/JPWO2024180744A1/ja
Publication of WO2024180744A1 publication Critical patent/WO2024180744A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Definitions

  • the present invention relates to a synthesis device, a synthesis method, and a synthesis program.
  • Non-Patent Document 1 describes continuous learning, Non-Patent Document 2 describes federated learning, and Non-Patent Document 3 describes a method for synthesizing models trained on the same data
  • the present invention has been made in consideration of the above, and aims to generate a model that can be used with both dataset A and dataset B from model A trained with dataset A and model B trained with dataset B.
  • the synthesis device is characterized by having an acquisition unit that acquires a first model trained using first learning data and a second model trained using second learning data, and an identification unit that uses the weight of the first model for the input data and the weight of the second model to identify the weight of a synthesis model obtained by synthesizing the first model and the second model based on the flatness of the gradient of the loss function of each weight.
  • FIG. 1 is a schematic diagram illustrating the schematic configuration of a synthesis apparatus.
  • FIG. 2 is a diagram for explaining the synthesis process.
  • FIG. 3 is a flowchart showing the synthesis process procedure.
  • FIG. 4 is a diagram for explaining the embodiment.
  • FIG. 5 is a diagram for explaining the embodiment.
  • FIG. 6 is a diagram illustrating a computer that executes a synthesis program.
  • the synthesis device of this embodiment generates a synthetic model that can be used for both datasets A and B using model A trained on dataset A and model B trained on dataset B.
  • the synthesis device uses the symmetry in the rearrangement of weights to perform permutation, which rearranges the weights so as not to change the model output.
  • the synthesis device generates a synthetic model by determining the average of the weight of model A and the rearranged weight of model B as the weight of the synthetic model, as described below.
  • Fig. 1 is a schematic diagram illustrating the schematic configuration of a synthesis device.
  • Fig. 2 is a diagram for explaining synthesis processing.
  • a synthesis device 10 is realized by a general-purpose computer such as a personal computer, and includes an input unit 11, an output unit 12, a communication control unit 13, a storage unit 14, and a control unit 15.
  • the input unit 11 is realized using input devices such as a keyboard and a mouse, and inputs various instruction information such as a command to start processing to the control unit 15 in response to input operations by an operator.
  • the output unit 12 is realized by a display device such as a liquid crystal display, a printing device such as a printer, etc.
  • the communication control unit 13 is realized by a NIC (Network Interface Card) or the like, and controls communication between the control unit 15 and an external device such as a server via a network.
  • the communication control unit 13 controls communication between the control unit 15 and a management device or the like that manages various information such as the datasets to be subjected to the synthesis process described below and the trained models.
  • the storage unit 14 is realized by a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or a storage device such as a hard disk or an optical disk.
  • the storage unit 14 stores in advance the processing program that operates the synthesis device 10 and data used during execution of the processing program, or stores it temporarily each time processing is performed.
  • the storage unit 14 may be configured to communicate with the control unit 15 via the communication control unit 13.
  • the control unit 15 is realized using a CPU (Central Processing Unit), NP (Network Processor), FPGA (Field Programmable Gate Array), etc., and executes a processing program stored in memory. As a result, the control unit 15 functions as an acquisition unit 15a and an identification unit 15b, as exemplified in FIG. 1, and executes the synthesis process. Note that each of these functional units may be implemented in different hardware. The control unit 15 may also include other functional units.
  • the acquisition unit 15a acquires a first model trained using the first learning data and a second model trained using the second learning data. For example, the acquisition unit 15a acquires model A, which is trained using data set A used in the synthesis process described below and is currently in operation, and model B, which is trained using data set B collected thereafter.
  • the acquisition unit 15a acquires these pieces of data via the input unit 11, or from a management device that manages various pieces of information via the communication control unit 13.
  • the acquisition unit 15a may also store the acquired communication data in the storage unit 14.
  • the acquisition unit 15a may also transfer this information to the identification unit 15b described below without storing it in the storage unit 14.
  • the identification unit 15b uses the weight of the first model for the input data and the weight of the second model to identify the weight of the combined model that combines the first model and the second model based on the flatness of the gradient of the loss function of each weight.
  • the identification unit 15b rearranges the weights without changing the output of the second model, and identifies the weight of the composite model by averaging the rearranged weights and the weights of the first model.
  • weight matching is a known method that rearranges weights at high speed without using data, and optimizes the following equation (9).
  • the identification unit 15b therefore performs weight matching that takes into account the flatness of the landscape of the weight loss function, i.e., the flatness of the gradient. For example, the identification unit 15b performs the optimization by adding a term that represents the flatness of the gradient of the weight loss function, as shown in the following formula (10).
  • the identification unit 15b searches for a permutation operator P that maximizes the following equation (11).
  • β is a constant that balances the flatness term.
  • the optimal solution for the weight of model A trained on data set A is expressed by the following formula (13)
  • the optimal solution for the weight of model B trained on data set B is expressed by the following formula (14).
  • the identification unit 15b searches for the minimum value of h, which is defined as the sum of f and g.
  • when the identification unit 15b brings the solutions of f and g closer together, it selects the flattest among the solutions of g with differing flatness, as shown in FIG. 2.
  • in FIG. 2, an intuitive image of the movable g is shown by a dashed line. Note that f is assumed to be flat because it is the result of a search using SGD (stochastic gradient descent).
  • the identification unit 15b generates a composite model by identifying the weight of the composite model based on the flatness of the gradient of the loss function of the weight.
  • the identification unit 15b may use proxy data generated by data condensation.
  • the identification unit 15b generates proxy data using a data condensation technique called gradient matching. This makes it easier to identify the composite model.
  • the synthesis process of this embodiment can be applied to, for example, general-purpose AI.
  • it has a high affinity with tasks at which deep learning excels, such as image recognition, natural language processing, and voice recognition, and can be used in face recognition systems and the like.
  • for example, it can be applied when company A, having generated face recognition model A for its employees, merges with company B.
  • the synthesis process according to the present embodiment includes a detection process and a search process.
  • the flowchart in FIG. 3 starts, for example, when an operation input instructing the start of the synthesis process is made.
  • the acquisition unit 15a acquires model A trained on dataset A and model B trained on dataset B (step S1). For example, the acquisition unit 15a acquires model A trained on dataset A and in operation. The acquisition unit 15a also acquires model B trained on dataset B acquired thereafter.
  • the identification unit 15b uses the weight of model A and the weight of model B to identify the weight of a composite model obtained by combining model A and model B based on the flatness of the gradient of the loss function of each weight (step S2).
  • the identification unit 15b rearranges the weights without changing the output of model B, and identifies the weight of the composite model by averaging the rearranged weights and the weights of model A. In this way, the composite model is generated.
  • the identification unit 15b generates proxy data using a data condensation technique such as gradient matching, and uses the proxy data when calculating the loss function.
  • the identification unit 15b also outputs the generated composite model to the operation device, for example, via the output unit 12. This completes the series of synthesis processes.
  • the acquiring unit 15a acquires the first model trained using the first learning data and the second model trained using the second learning data.
  • the identifying unit 15b identifies the weight of a composite model obtained by combining the first model and the second model, using the weight of the first model for the input data and the weight of the second model, based on the flatness of the gradient of the loss function of each weight.
  • the identification unit 15b rearranges the weights without changing the output of the second model, and identifies the weight of the composite model by averaging the rearranged weights and the weights of the first model.
  • the identification unit 15b uses the proxy data generated by data condensation. This relaxes the operating conditions and makes it easier to identify the composite model.
  • FIGS. 4 and 5 are diagrams for explaining the examples.
  • a synthetic model was generated by the synthesis process of the above embodiment using model A trained with MNIST and model B trained with FashionMNIST.
  • the model was set to MLP, MNIST and FashionMNIST were used as datasets, and source code created with reference to "https://github.com/samuela/git-re-basin" was used. Then, the accuracy and loss of the synthetic model using a synthetic dataset combining both MNIST and FashionMNIST were evaluated.
  • the horizontal axis in FIGS. 4 and 5 shows the synthesis ratio between model A trained on MNIST and model B trained on FashionMNIST, and the vertical axis shows the accuracy of the synthesized model.
  • model A shows an accuracy of nearly 100% for MNIST, but an accuracy of nearly 0% for FashionMNIST. Therefore, in Figures 4 and 5, model A has an accuracy of about 50% for the synthetic dataset.
  • model B shows an accuracy of nearly 100% for FashionMNIST, but an accuracy of nearly 0% for MNIST. Therefore, in Figures 4 and 5, model B also has an accuracy of about 50% for the synthetic dataset.
  • FIG. 4 shows two cases where flatness is not taken into account: without permutation (dashed line) and with permutation (solid line). For each case, results are shown during learning (Train, thick line) and operation (Test).
  • Figure 5 shows two examples of cases similar to those in Figure 4 when flatness is taken into account.
  • a program in which the process executed by the synthesis device 10 according to the above embodiment is written in a language executable by a computer can also be created.
  • the synthesis device 10 can be implemented by installing a synthesis program that executes the above synthesis process as package software or online software on a desired computer.
  • the above synthesis program can be executed by an information processing device, so that the information processing device can function as the synthesis device 10.
  • the information processing device also includes mobile communication terminals such as smartphones, mobile phones, and PHS (Personal Handyphone System), as well as slate terminals such as PDA (Personal Digital Assistant).
  • the function of the synthesis device 10 may also be implemented on a cloud server.
  • FIG. 6 is a diagram showing an example of a computer that executes a synthesis program.
  • the computer 1000 has, for example, a memory 1010, a CPU 1020, a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These components are connected by a bus 1080.
  • the memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM 1012.
  • the ROM 1011 stores a boot program such as a BIOS (Basic Input Output System).
  • the hard disk drive interface 1030 is connected to a hard disk drive 1031.
  • the disk drive interface 1040 is connected to a disk drive 1041.
  • a removable storage medium such as a magnetic disk or optical disk is inserted into the disk drive 1041.
  • the serial port interface 1050 is connected to a mouse 1051 and a keyboard 1052, for example.
  • the video adapter 1060 is connected to a display 1061, for example.
  • the hard disk drive 1031 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. Each piece of information described in the above embodiment is stored, for example, in the hard disk drive 1031 or memory 1010.
  • the synthesis program is stored in the hard disk drive 1031, for example, as a program module 1093 in which instructions to be executed by the computer 1000 are written.
  • the program module 1093 in which each process executed by the synthesis device 10 described in the above embodiment is written is stored in the hard disk drive 1031.
  • data used for information processing by the synthesis program is stored as program data 1094, for example, in the hard disk drive 1031.
  • the CPU 1020 reads the program module 1093 and program data 1094 stored in the hard disk drive 1031 into the RAM 1012 as necessary, and executes each of the above-mentioned procedures.
  • the program module 1093 and program data 1094 related to the synthesis program are not limited to being stored in the hard disk drive 1031, but may be stored in a removable storage medium, for example, and read by the CPU 1020 via the disk drive 1041 or the like.
  • the program module 1093 and program data 1094 related to the synthesis program may be stored in another computer connected via a network, such as a LAN (Local Area Network) or a WAN (Wide Area Network), and read by the CPU 1020 via the network interface 1070.
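The gradient-matching form of data condensation mentioned above can be illustrated with a minimal sketch. This is not the patented method or any library's API: it uses a plain linear model, matches gradients at a single probe weight vector, and for brevity optimizes only the synthetic labels; all shapes and constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Real dataset: a linear model y = X @ theta_true + noise.
X = rng.normal(size=(200, 4))
theta_true = rng.normal(size=4)
y = X @ theta_true + 0.05 * rng.normal(size=200)

def grad(theta, X, y):
    # Gradient of the mean squared error w.r.t. theta (constant factor dropped).
    return X.T @ (X @ theta - y) / len(y)

# Condensed proxy set: 10 synthetic inputs with learnable labels.
Xs = rng.normal(size=(10, 4))
ys = np.zeros(10)

theta = rng.normal(size=4)       # probe weights at which the gradients are matched
g_real = grad(theta, X, y)
init_gap = np.linalg.norm(grad(theta, Xs, ys) - g_real)

# Gradient descent on J = ||grad_syn - grad_real||^2; only the synthetic
# labels ys are optimized here (dJ/dys derived analytically).
lr = 0.5
for _ in range(2000):
    g_syn = grad(theta, Xs, ys)
    ys -= lr * (-(2 / len(ys)) * Xs @ (g_syn - g_real))

final_gap = np.linalg.norm(grad(theta, Xs, ys) - g_real)
assert final_gap < 0.1 * init_gap
```

In practice, gradient matching also optimizes the synthetic inputs and matches gradients across many weight samples; the 10-point proxy set here stands in for the condensed data used when calculating the loss function.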


Abstract

In the present invention, an acquisition unit (15a) acquires a first model that has been trained using first training data and a second model that has been trained using second training data. An identification unit (15b) uses a weight with respect to input data of the first model and a weight of the second model to identify a weight of a combined model combining the first and second models, on the basis of the flatness of the gradient of a loss function for each weight.

Description

Synthesis apparatus, synthesis method, and synthesis program

The present invention relates to a synthesis device, a synthesis method, and a synthesis program.

Conventionally, methods are known for generating a model that performs well on both dataset A and dataset B when the two datasets are acquired independently. For example, continuous learning for expanding the scope of application of data (see Non-Patent Document 1), federated learning for learning without gathering the data in one place, in view of privacy (see Non-Patent Document 2), and a method for synthesizing models trained on the same data (see Non-Patent Document 3) are known.

“Introduction to Continual Learning”, [online], ContinualAI Wiki, [retrieved February 6, 2023], Internet <URL: https://wiki.continualai.org/the-continualai-wiki/introduction-to-continual-learning>
“Federated learning”, [online], WIKIPEDIA, [retrieved February 6, 2023], Internet <URL: https://en.wikipedia.org/wiki/Federated_learning>
Samuel K. Ainsworth, Jonathan Hayase, Siddhartha Srinivasa, “GIT RE-BASIN: MERGING MODELS MODULO PERMUTATION SYMMETRIES”, December 2022

However, with the conventional techniques, it can be difficult to generate a model that can be used with both dataset A and dataset B from model A trained with dataset A and model B trained with dataset B. For example, with the conventional techniques, learning with dataset A and learning with dataset B cannot be performed independently, resulting in operational constraints.

The present invention has been made in consideration of the above, and aims to generate a model that can be used with both dataset A and dataset B from model A trained with dataset A and model B trained with dataset B.

In order to solve the above-mentioned problems and achieve the object, the synthesis device according to the present invention is characterized by having an acquisition unit that acquires a first model trained using first learning data and a second model trained using second learning data, and an identification unit that uses the weight of the first model for the input data and the weight of the second model to identify the weight of a synthesis model obtained by synthesizing the first model and the second model, based on the flatness of the gradient of the loss function of each weight.

According to the present invention, it is possible to generate a model that can be used with both datasets A and B from model A trained with dataset A and model B trained with dataset B.

FIG. 1 is a schematic diagram illustrating the schematic configuration of a synthesis apparatus.
FIG. 2 is a diagram for explaining the synthesis process.
FIG. 3 is a flowchart showing the synthesis process procedure.
FIG. 4 is a diagram for explaining the embodiment.
FIG. 5 is a diagram for explaining the embodiment.
FIG. 6 is a diagram illustrating a computer that executes a synthesis program.

Below, one embodiment of the present invention will be described in detail with reference to the drawings. Note that the present invention is not limited to this embodiment. In the drawings, the same parts are denoted by the same reference numerals.

[Outline of synthesis apparatus]
The synthesis device of this embodiment generates a synthetic model that can be used for both datasets A and B, using model A trained on dataset A and model B trained on dataset B.

First, a composite dataset D of dataset A and dataset B is defined as in the following equation (1).

Figure JPOXMLDOC01-appb-M000001

In supervised learning, let l be the loss function and θ be the weight of the model for the input data. Then the weight θ_A of model A, the weight θ_B of model B, and the weight θ_{A+B} of the synthetic model trained on the composite dataset are expressed by the following equations (2), (3), and (4), respectively.

Figure JPOXMLDOC01-appb-M000002
Figure JPOXMLDOC01-appb-M000003
Figure JPOXMLDOC01-appb-M000004

If an operation ★ that satisfies the following equation (5) can be identified, it becomes possible to train on dataset A and dataset B separately and to combine the trained weights into a composite model that can be used on dataset A+B.

Figure JPOXMLDOC01-appb-M000005

Here, in deep learning, there is symmetry with respect to operations on the network structure. The synthesis device therefore exploits the symmetry of weight rearrangements and performs a permutation that rearranges the weights without changing the model output.

In that case, the following equation (6) holds.

Figure JPOXMLDOC01-appb-M000006

By rewriting the above equation (6) with the rule shown in the following equation (7), the following equation (8) is obtained.

Figure JPOXMLDOC01-appb-M000007

Figure JPOXMLDOC01-appb-M000008

Then, as described below, the synthesis device generates the composite model by determining the average of the weight of model A and the rearranged weight of model B as the weight of the composite model.
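The permutation symmetry and the weight averaging described above can be sketched for a small two-layer MLP. This is a minimal illustration with made-up shapes and random weights, not the patent's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(x, W1, b1, W2, b2):
    # Two-layer MLP: x -> ReLU(x @ W1 + b1) -> @ W2 + b2
    h = np.maximum(x @ W1 + b1, 0.0)
    return h @ W2 + b2

# Toy 4-8-3 network standing in for model B.
W1 = rng.normal(size=(4, 8)); b1 = rng.normal(size=8)
W2 = rng.normal(size=(8, 3)); b2 = rng.normal(size=3)

# Permute the hidden units: reorder the columns of W1 (and b1) and the
# rows of W2 in the same way. The model output is unchanged.
perm = rng.permutation(8)
W1p, b1p, W2p = W1[:, perm], b1[perm], W2[perm, :]

x = rng.normal(size=(5, 4))
assert np.allclose(mlp_forward(x, W1, b1, W2, b2),
                   mlp_forward(x, W1p, b1p, W2p, b2))

# Weights standing in for model A; the composite weights are the average
# of model A's weights and model B's permuted weights.
W1a = rng.normal(size=(4, 8)); b1a = rng.normal(size=8)
W2a = rng.normal(size=(8, 3)); b2a = rng.normal(size=3)
W1c, b1c = (W1a + W1p) / 2, (b1a + b1p) / 2
W2c, b2c = (W2a + W2p) / 2, (b2a + b2) / 2
```

The assertion checks the symmetry claim directly: because the permutation is applied consistently to the hidden layer's incoming and outgoing weights, the network computes the same function.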

[Configuration of synthesis device]
FIG. 1 is a schematic diagram illustrating the schematic configuration of the synthesis device, and FIG. 2 is a diagram for explaining the synthesis process. First, as illustrated in FIG. 1, the synthesis device 10 is realized by a general-purpose computer such as a personal computer, and includes an input unit 11, an output unit 12, a communication control unit 13, a storage unit 14, and a control unit 15.

The input unit 11 is realized using input devices such as a keyboard and a mouse, and inputs various instruction information, such as a command to start processing, to the control unit 15 in response to input operations by an operator. The output unit 12 is realized by a display device such as a liquid crystal display, a printing device such as a printer, or the like.

The communication control unit 13 is realized by a NIC (Network Interface Card) or the like, and controls communication between the control unit 15 and an external device such as a server via a network. For example, the communication control unit 13 controls communication between the control unit 15 and a management device or the like that manages various kinds of information, such as the datasets subjected to the synthesis process described below and the trained models.

The storage unit 14 is realized by a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or by a storage device such as a hard disk or an optical disk. The storage unit 14 stores in advance the processing program that operates the synthesis device 10 and the data used during execution of the processing program, or stores them temporarily each time processing is performed. The storage unit 14 may also be configured to communicate with the control unit 15 via the communication control unit 13.

The control unit 15 is realized using a CPU (Central Processing Unit), NP (Network Processor), FPGA (Field Programmable Gate Array), or the like, and executes a processing program stored in memory. As a result, the control unit 15 functions as an acquisition unit 15a and an identification unit 15b, as illustrated in FIG. 1, and executes the synthesis process. Note that these functional units may each be implemented on different hardware. The control unit 15 may also include other functional units.

The acquisition unit 15a acquires a first model trained using first learning data and a second model trained using second learning data. For example, the acquisition unit 15a acquires model A, which was trained using dataset A, used in the synthesis process described below, and is currently in operation, and model B, which was trained using dataset B collected thereafter.

The acquisition unit 15a acquires these data via the input unit 11, or via the communication control unit 13 from a management device that manages various kinds of information. The acquisition unit 15a may store the acquired data in the storage unit 14, or may transfer the data to the identification unit 15b described below without storing it in the storage unit 14.

The identification unit 15b uses the weight of the first model for the input data and the weight of the second model to identify the weight of the composite model obtained by combining the first model and the second model, based on the flatness of the gradient of the loss function of each weight.

Specifically, the identification unit 15b rearranges the weights without changing the output of the second model, and identifies the weight of the composite model by averaging the rearranged weights and the weights of the first model.

For example, a method called weight matching is conventionally known that can perform the rearrangement at high speed without using data. Weight matching performs the optimization of the following equation (9).

Figure JPOXMLDOC01-appb-M000009
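Equation (9) is not reproduced in this text, but weight matching in the sense of Non-Patent Document 3 (Git Re-Basin) can be sketched as a linear assignment problem over hidden units. The similarity score and the shapes below are illustrative assumptions, not the patent's exact formulation:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def weight_match(W1a, W2a, W1b, W2b):
    # similarity[i, j]: alignment score for mapping hidden unit j of
    # model B onto hidden unit i of model A, using both the incoming
    # (W1) and outgoing (W2) weights of each unit.
    similarity = W1a.T @ W1b + W2a @ W2b.T
    _, cols = linear_sum_assignment(similarity, maximize=True)
    return cols  # permutation: B's unit cols[i] takes the role of A's unit i

rng = np.random.default_rng(1)
W1a = rng.normal(size=(30, 8))   # input -> hidden
W2a = rng.normal(size=(8, 20))   # hidden -> output
# Model B here is model A with its hidden units shuffled, so weight
# matching should recover the shuffle exactly.
shuffle = rng.permutation(8)
W1b, W2b = W1a[:, shuffle], W2a[shuffle, :]

perm = weight_match(W1a, W2a, W1b, W2b)
assert np.allclose(W1b[:, perm], W1a) and np.allclose(W2b[perm, :], W2a)

# The merged weights are then the average of model A's weights and
# model B's permuted weights.
W1_merged = (W1a + W1b[:, perm]) / 2
```

Because the assignment is solved per layer on the weights alone, no training data is needed, which matches the "high speed without using data" property described above.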

The identification unit 15b therefore performs weight matching that takes into account the flatness of the landscape of the weight loss function, that is, the flatness of the gradient. For example, the identification unit 15b performs the optimization by adding a term representing the flatness of the gradient of the weight loss function, as shown in the following equation (10).

Figure JPOXMLDOC01-appb-M000010

Specifically, the identification unit 15b searches for a permutation operator P that maximizes the following equation (11). Here, β is a constant that balances the flatness term.

Figure JPOXMLDOC01-appb-M000011
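Equations (10) and (11) are likewise not reproduced here, but the idea of a flatness-weighted matching objective can be sketched as follows. The sharpness proxy (mean loss increase under small random weight perturbations) and the linear model are illustrative assumptions only:

```python
import numpy as np

rng = np.random.default_rng(2)

def loss(theta, X, y):
    # Squared-error loss of a linear model y ≈ X @ theta.
    return float(np.mean((X @ theta - y) ** 2))

def sharpness(theta, X, y, eps=1e-2, n=64):
    # Crude flatness proxy: mean loss increase under small random
    # perturbations of the weights (smaller = flatter landscape).
    base = loss(theta, X, y)
    deltas = rng.normal(scale=eps, size=(n, theta.size))
    return float(np.mean([loss(theta + d, X, y) for d in deltas])) - base

def matching_score(theta_a, theta_b_perm, X, y, beta=1.0):
    # Sketch of a flatness-aware objective: reward alignment of the two
    # weight vectors and penalize sharpness of the merged (averaged)
    # weights, weighted by the balancing constant beta.
    merged = (theta_a + theta_b_perm) / 2
    return float(theta_a @ theta_b_perm) - beta * sharpness(merged, X, y)

X = rng.normal(size=(128, 5))
theta = rng.normal(size=5)
y = X @ theta                       # zero loss at theta itself
s = matching_score(theta, theta, X, y)
assert sharpness(theta, X, y) >= 0.0  # at a minimum, perturbation can only raise the loss
```

A search over permutation operators P would then pick the candidate whose merged weights score highest, i.e., the one sitting in the flattest region of the loss landscape.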

Here, the composite dataset of two different datasets A and B can be expressed by the following equation (12). For example, s = 0 represents dataset A only, and s = 1 represents dataset B only.

Figure JPOXMLDOC01-appb-M000012

The optimal solution for the weight of model A trained on dataset A is expressed by the following equation (13), and the optimal solution for the weight of model B trained on dataset B is expressed by the following equation (14).

Figure JPOXMLDOC01-appb-M000013
Figure JPOXMLDOC01-appb-M000014

 また、合成データセットについて、次式(15)が成立する。 Furthermore, for the synthetic data set, the following equation (15) holds:

Figure JPOXMLDOC01-appb-M000015

 ここで、上記式(15)の各項のloss関数をそれぞれ、h、f、gと簡略化して記すと、次式(16)のように表せる。 If we simplify the loss functions of each term in the above formula (15) and write them as h, f, and g, respectively, we can express it as the following formula (16).

Figure JPOXMLDOC01-appb-M000016

 期待値は線形演算であるため、特定部15bは、fとgとの和で定義されるhの最小値を探索することになる。 Since the expected value is a linear calculation, the determination unit 15b searches for the minimum value of h, which is defined as the sum of f and g.

　ここで、従来のweight matchingでは、fとgとの最適解をできるだけ近づけるように探索する。ただし、モデルの出力は変わらないため、lossの大きさ自体は変わらない。その際に、gのflatnessは考慮されない。 In conventional weight matching, the search brings the optimal solutions of f and g as close together as possible. However, since the output of the model does not change, the magnitude of the loss itself does not change. In doing so, the flatness of g is not taken into account.

　そこで、特定部15bは、fとgとの解を近づける際に、図2に例示するように、flatnessの異なるgのうち、よりflatなものをgの解として選択する。図2には、移動可能なgの直感的なイメージが破線で示されている。なお、fはSGD(stochastic gradient descent、確率的勾配降下法)で探索した結果であるため、flatであることが仮定される。 Therefore, when bringing the solutions of f and g closer together, the identification unit 15b selects, as the solution for g, the flatter of the candidates for g with different flatness, as illustrated in Fig. 2. In Fig. 2, an intuitive image of the movable g is shown by dashed lines. Note that f is assumed to be flat because it is the result of a search by SGD (stochastic gradient descent).
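As a hedged sketch of how the flatness of two candidate solutions might be compared (the perturbation-based estimator below is one possible choice for illustration, not the specific estimator prescribed by the embodiment), the average loss increase under small random weight perturbations can serve as a sharpness measure:

```python
import numpy as np

def sharpness(loss_fn, w, eps=1e-2, n=32, seed=0):
    # negative flatness: average loss increase under small random perturbations,
    # E[L(w + eps * u)] - L(w); a smaller value means a flatter landscape at w
    rng = np.random.default_rng(seed)
    rise = np.mean([loss_fn(w + eps * rng.normal(size=w.shape)) for _ in range(n)])
    return rise - loss_fn(w)

# two quadratic bowls with the same minimum but different curvature
steep = lambda w: 10.0 * np.sum(w ** 2)
flat = lambda w: 0.1 * np.sum(w ** 2)
w0 = np.zeros(4)
assert sharpness(steep, w0) > sharpness(flat, w0) > 0.0  # steeper bowl = sharper
```

Given such an estimate, the flatter candidate for g would be preferred when its solution can be moved without changing the loss.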

 このように、特定部15bは、weightのloss関数の勾配のflatnessに基づいて、合成モデルのweightを特定することにより、合成モデルを生成する。 In this way, the identification unit 15b generates a composite model by identifying the weight of the composite model based on the flatness of the gradient of the loss function of the weight.

 また、特定部15bは、loss関数を算出する際に、データ凝縮により生成された代理データを用いてもよい。例えば、特定部15bは、gradient matchingといわれるデータ凝縮の手法を用いて、代理データを生成する。これにより、合成モデルの特定がより容易に可能となる。 In addition, when calculating the loss function, the identification unit 15b may use proxy data generated by data condensation. For example, the identification unit 15b generates proxy data using a data condensation technique called gradient matching. This makes it easier to identify the composite model.
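A minimal sketch of gradient matching, under the simplifying assumption of a linear regression model (the dataset sizes, probe weights, and step sizes below are all illustrative choices): a handful of synthetic points is optimized so that its loss gradients mimic those of the full dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
# "real" dataset: 200 points for a linear model with squared-error loss
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

def grad(w, Xd, yd):
    # gradient of ||Xd w - yd||^2 / (2n) with respect to w
    return Xd.T @ (Xd @ w - yd) / len(yd)

probes = [rng.normal(size=3) for _ in range(8)]  # weights at which gradients are compared

def mismatch(params):
    # total squared distance between synthetic-data and real-data gradients
    Xs, ys = params[:12].reshape(4, 3), params[12:]
    return sum(np.sum((grad(w, Xs, ys) - grad(w, X, y)) ** 2) for w in probes)

def num_grad(f, p, h=1e-5):
    # central-difference gradient (small parameter count, so this is cheap)
    g = np.zeros_like(p)
    for i in range(p.size):
        e = np.zeros(p.size); e[i] = h
        g[i] = (f(p + e) - f(p - e)) / (2 * h)
    return g

params = rng.normal(size=16)  # 4 synthetic points (Xs) plus 4 labels (ys)
before = mismatch(params)
cur = before
for _ in range(100):  # descent on the synthetic points with backtracking
    g = num_grad(mismatch, params)
    step = 1e-2
    while mismatch(params - step * g) > cur and step > 1e-8:
        step /= 2
    params = params - step * g
    cur = mismatch(params)
assert cur < before  # the 4-point condensed set now mimics the real gradients better
```

The resulting synthetic points can then stand in for the original training data when the loss function and its flatness are evaluated during matching.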

 なお、本実施形態の合成処理は、例えば、一般的なAIに適用可能である。特に、Deep Learningが得意とする画像認識、自然言語処理、音声認識等との親和性が高く、顔認証システム等に用いることが可能である。例えば、会社Aの社員の顔認証モデルAを生成した後に、会社Aが会社Bと合併することになった場合に、会社Bの社員も顔認証システムを使えるように、会社A、Bの双方が使える合成モデルを生成することが可能となる。 The synthesis process of this embodiment can be applied to, for example, general AI. In particular, it has a high affinity with Deep Learning's specialties of image recognition, natural language processing, voice recognition, etc., and can be used in face recognition systems, etc. For example, if company A merges with company B after generating face recognition model A for employees of company A, it is possible to generate a synthetic model that can be used by both companies A and B so that employees of company B can also use the face recognition system.

[合成処理]
　次に、図3を参照して、本実施形態に係る合成装置10による合成処理について説明する。本実施形態の合成処理は、取得処理と特定処理とを含む。図3のフローチャートは、例えば、合成処理の開始を指示する操作入力があったタイミングで開始される。
[Synthesis Processing]
 Next, the synthesis process performed by the synthesis device 10 according to the present embodiment will be described with reference to Fig. 3. The synthesis process of the present embodiment includes an acquisition process and an identification process. The flowchart in Fig. 3 starts, for example, at the timing when an operation input instructing the start of the synthesis process is received.

 まず、取得部15aが、データセットAで学習されたモデルAと、データセットBで学習されたモデルBとを取得する(ステップS1)。例えば、取得部15aは、データセットAで学習され運用中のモデルAを取得する。また、取得部15aは、その後に取得されたデータセットBで学習されたモデルBを取得する。 First, the acquisition unit 15a acquires model A trained on dataset A and model B trained on dataset B (step S1). For example, the acquisition unit 15a acquires model A trained on dataset A and in operation. The acquisition unit 15a also acquires model B trained on dataset B acquired thereafter.

 次に、特定部15bが、モデルAのweightおよびモデルBのweightを用いて、各weightのloss関数の勾配のflatnessに基づいて、モデルAとモデルBとを合成した合成モデルのweightを特定する(ステップS2)。 Next, the identification unit 15b uses the weight of model A and the weight of model B to identify the weight of a composite model obtained by combining model A and model B based on the flatness of the gradient of the loss function of each weight (step S2).

 具体的には、特定部15bは、モデルBの出力を変えずにweightの並び替えを行い、並び替えられたweightとモデルAのweightとを平均することにより、合成モデルのweightを特定する。これにより、合成モデルが生成される。 Specifically, the identification unit 15b rearranges the weights without changing the output of model B, and identifies the weight of the composite model by averaging the rearranged weights and the weights of model A. In this way, the composite model is generated.
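The permute-then-average step can be sketched as follows for a one-hidden-layer ReLU MLP (the names and the mixing ratio λ are illustrative assumptions): permuting hidden units leaves model B's output unchanged, after which the aligned weights are averaged with model A's.

```python
import numpy as np

def mlp(w1, b1, w2, x):
    return w2 @ np.maximum(w1 @ x + b1, 0.0)  # one-hidden-layer ReLU MLP

rng = np.random.default_rng(1)
h, d = 6, 4
w1_b, b1_b = rng.normal(size=(h, d)), rng.normal(size=h)
w2_b = rng.normal(size=(2, h))

# permuting hidden units (rows of w1/b1, columns of w2) preserves the output
perm = rng.permutation(h)
w1_p, b1_p, w2_p = w1_b[perm], b1_b[perm], w2_b[:, perm]
x = rng.normal(size=d)
assert np.allclose(mlp(w1_b, b1_b, w2_b, x), mlp(w1_p, b1_p, w2_p, x))

# merge: average model A's weights with the permutation-aligned model B
w1_a, b1_a = rng.normal(size=(h, d)), rng.normal(size=h)
w2_a = rng.normal(size=(2, h))
lam = 0.5  # synthesis ratio between the two models
w1_m = (1 - lam) * w1_a + lam * w1_p
b1_m = (1 - lam) * b1_a + lam * b1_p
w2_m = (1 - lam) * w2_a + lam * w2_p
```

The permutation used here would, in the embodiment, come from the flatness-aware matching described above rather than being drawn at random.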

 その際に、特定部15bは、gradient matching等のデータ凝縮の手法を用いて代理データを生成し、loss関数を算出する際に、代理データを用いる。 At that time, the identification unit 15b generates proxy data using a data condensation technique such as gradient matching, and uses the proxy data when calculating the loss function.

 また、特定部15bは、生成した合成モデルを、例えば出力部12を介して運用装置に対して出力する。これにより、一連の合成処理が終了する。 The identification unit 15b also outputs the generated composite model to the operation device, for example, via the output unit 12. This completes the series of composite processes.

[効果]
 以上、説明したように、取得部15aは、第1の学習用データを用いて学習された第1のモデルと、第2の学習用データを用いて学習された第2のモデルとを取得する。特定部15bが、第1のモデルの入力データに対するweightおよび第2のモデルのweightを用いて、各weightのloss関数の勾配のflatnessに基づいて、第1のモデルと第2のモデルとを合成した合成モデルのweightを特定する。
[Effects]
As described above, the acquiring unit 15a acquires the first model trained using the first learning data and the second model trained using the second learning data. The identifying unit 15b identifies the weight of a composite model obtained by combining the first model and the second model, using the weight of the first model for the input data and the weight of the second model, based on the flatness of the gradient of the loss function of each weight.

 具体的には、特定部15bは、第2のモデルの出力を変えずにweightの並び替えを行い、並び替えられたweightと第1のモデルのweightとを平均することにより、合成モデルのweightを特定する。 Specifically, the identification unit 15b rearranges the weights without changing the output of the second model, and identifies the weight of the composite model by averaging the rearranged weights and the weights of the first model.

 これにより、データセットAで学習したモデルAおよびデータセットBで学習したモデルBから、データセットAとデータセットBとの両方で使えるモデルを生成することが可能となる。 This makes it possible to generate a model that can be used with both datasets A and B from model A trained on dataset A and model B trained on dataset B.

　また、特定部15bが、loss関数を算出する際に、データ凝縮により生成された代理データを用いる。これにより、運用の条件が緩和され、合成モデルの特定がより容易に可能となる。 In addition, the identification unit 15b uses proxy data generated by data condensation when calculating the loss function. This relaxes the operating conditions and makes it easier to identify the composite model.

[実施例]
　図4および図5は、実施例を説明するための図である。本実施例では、MNISTで学習したモデルAとFashionMNISTで学習したモデルBとを用いて、上記の実施形態の合成処理により、合成モデルを生成した。その際に、モデルはMLPとし、データセットはMNIST、FashionMNISTを使用し、また「https://github.com/samuela/git-re-basin」を参照して作成したソースコードを使用した。そして、MNIST、FashionMNISTの両方を合わせた合成データセットによる合成モデルの精度とlossとを評価した。
[Example]
 Figs. 4 and 5 are diagrams for explaining the example. In this example, a synthetic model was generated by the synthesis process of the above embodiment, using model A trained on MNIST and model B trained on FashionMNIST. The model was an MLP, the datasets were MNIST and FashionMNIST, and source code created with reference to "https://github.com/samuela/git-re-basin" was used. The accuracy and loss of the synthetic model were then evaluated on a combined dataset containing both MNIST and FashionMNIST.

　図4および図5の横軸は、MNISTにより学習されたモデルAと、FashionMNISTにより学習されたモデルBとの合成の割合λを示し、縦軸は合成モデルの精度を示す。 The horizontal axis in Figs. 4 and 5 shows the ratio λ at which model A trained on MNIST and model B trained on FashionMNIST are combined, and the vertical axis shows the accuracy of the synthesized model.

 ここで、モデルAはMNISTに対して100%近い精度が示されるが、FashionMNISTに対しては0%に近い精度が示される。したがって、図4および図5において、モデルAは合成データセットに対しては、精度が50%程度となっている。また、モデルBは、FashionMNISTに対して100%近い精度が示されるが、MNISTに対しては0%に近い精度が示される。したがって、図4および図5において、モデルBも同様に、合成データセットに対しては精度が50%程度となっている。 Here, model A shows an accuracy of nearly 100% for MNIST, but an accuracy of nearly 0% for FashionMNIST. Therefore, in Figures 4 and 5, model A has an accuracy of about 50% for the synthetic dataset. Also, model B shows an accuracy of nearly 100% for FashionMNIST, but an accuracy of nearly 0% for MNIST. Therefore, in Figures 4 and 5, model B also has an accuracy of about 50% for the synthetic dataset.

　まず、図4には、flatnessを考慮しない場合であって、破線で示すPermutationを行わない場合、実線で示すPermutationを行った場合の2パターンのケースについて例示されている。また、各ケースについて、学習時(Train、太線)と運用時(Test)とについて例示されている。 First, Fig. 4 illustrates two cases in which flatness is not taken into account: the case without permutation, shown by dashed lines, and the case with permutation, shown by solid lines. For each case, both training (Train, thick lines) and operation (Test) are illustrated.

　また、図5には、flatnessを考慮した場合について、図4の場合と同様の2パターンのケースについて例示されている。 Fig. 5 illustrates the same two cases as Fig. 4, but with flatness taken into account.

 図4および図5のいずれにおいても、実線で示すPermutationを行った場合の方が、破線で示すPermutationを行わない場合より合成モデルの精度が高いことが確認された。 In both Figures 4 and 5, it was confirmed that the accuracy of the synthetic model was higher when permutation was performed (shown by the solid line) than when permutation was not performed (shown by the dashed line).

　また、図5に示したflatnessを考慮した場合の方が、図4に示したflatnessを考慮しない場合より、合成モデルの精度が高いことが確認された。 It was also confirmed that the accuracy of the synthetic model was higher when flatness was taken into account (Fig. 5) than when it was not (Fig. 4).

[プログラム]
 上記実施形態に係る合成装置10が実行する処理をコンピュータが実行可能な言語で記述したプログラムを作成することもできる。一実施形態として、合成装置10は、パッケージソフトウェアやオンラインソフトウェアとして上記の合成処理を実行する合成プログラムを所望のコンピュータにインストールさせることによって実装できる。例えば、上記の合成プログラムを情報処理装置に実行させることにより、情報処理装置を合成装置10として機能させることができる。また、その他にも、情報処理装置にはスマートフォン、携帯電話機やPHS(Personal Handyphone System)等の移動体通信端末、さらには、PDA(Personal Digital Assistant)等のスレート端末等がその範疇に含まれる。また、合成装置10の機能を、クラウドサーバに実装してもよい。
[program]
A program in which the process executed by the synthesis device 10 according to the above embodiment is written in a language executable by a computer can also be created. As an embodiment, the synthesis device 10 can be implemented by installing a synthesis program that executes the above synthesis process as package software or online software on a desired computer. For example, the above synthesis program can be executed by an information processing device, so that the information processing device can function as the synthesis device 10. In addition, the information processing device also includes mobile communication terminals such as smartphones, mobile phones, and PHS (Personal Handyphone System), as well as slate terminals such as PDA (Personal Digital Assistant). The function of the synthesis device 10 may also be implemented on a cloud server.

 図6は、合成プログラムを実行するコンピュータの一例を示す図である。コンピュータ1000は、例えば、メモリ1010と、CPU1020と、ハードディスクドライブインタフェース1030と、ディスクドライブインタフェース1040と、シリアルポートインタフェース1050と、ビデオアダプタ1060と、ネットワークインタフェース1070とを有する。これらの各部は、バス1080によって接続される。 FIG. 6 is a diagram showing an example of a computer that executes a synthesis program. The computer 1000 has, for example, a memory 1010, a CPU 1020, a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These components are connected by a bus 1080.

 メモリ1010は、ROM(Read Only Memory)1011およびRAM1012を含む。ROM1011は、例えば、BIOS(Basic Input Output System)等のブートプログラムを記憶する。ハードディスクドライブインタフェース1030は、ハードディスクドライブ1031に接続される。ディスクドライブインタフェース1040は、ディスクドライブ1041に接続される。ディスクドライブ1041には、例えば、磁気ディスクや光ディスク等の着脱可能な記憶媒体が挿入される。シリアルポートインタフェース1050には、例えば、マウス1051およびキーボード1052が接続される。ビデオアダプタ1060には、例えば、ディスプレイ1061が接続される。 The memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM 1012. The ROM 1011 stores a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to a hard disk drive 1031. The disk drive interface 1040 is connected to a disk drive 1041. A removable storage medium such as a magnetic disk or optical disk is inserted into the disk drive 1041. The serial port interface 1050 is connected to a mouse 1051 and a keyboard 1052, for example. The video adapter 1060 is connected to a display 1061, for example.

 ここで、ハードディスクドライブ1031は、例えば、OS1091、アプリケーションプログラム1092、プログラムモジュール1093およびプログラムデータ1094を記憶する。上記実施形態で説明した各情報は、例えばハードディスクドライブ1031やメモリ1010に記憶される。 Here, the hard disk drive 1031 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. Each piece of information described in the above embodiment is stored, for example, in the hard disk drive 1031 or memory 1010.

 また、合成プログラムは、例えば、コンピュータ1000によって実行される指令が記述されたプログラムモジュール1093として、ハードディスクドライブ1031に記憶される。具体的には、上記実施形態で説明した合成装置10が実行する各処理が記述されたプログラムモジュール1093が、ハードディスクドライブ1031に記憶される。 The synthesis program is stored in the hard disk drive 1031, for example, as a program module 1093 in which instructions to be executed by the computer 1000 are written. Specifically, the program module 1093 in which each process executed by the synthesis device 10 described in the above embodiment is written is stored in the hard disk drive 1031.

 また、合成プログラムによる情報処理に用いられるデータは、プログラムデータ1094として、例えば、ハードディスクドライブ1031に記憶される。そして、CPU1020が、ハードディスクドライブ1031に記憶されたプログラムモジュール1093やプログラムデータ1094を必要に応じてRAM1012に読み出して、上述した各手順を実行する。 In addition, data used for information processing by the synthesis program is stored as program data 1094, for example, in the hard disk drive 1031. Then, the CPU 1020 reads the program module 1093 and program data 1094 stored in the hard disk drive 1031 into the RAM 1012 as necessary, and executes each of the above-mentioned procedures.

 なお、合成プログラムに係るプログラムモジュール1093やプログラムデータ1094は、ハードディスクドライブ1031に記憶される場合に限られず、例えば、着脱可能な記憶媒体に記憶されて、ディスクドライブ1041等を介してCPU1020によって読み出されてもよい。あるいは、合成プログラムに係るプログラムモジュール1093やプログラムデータ1094は、LAN(Local Area Network)やWAN(Wide Area Network)等のネットワークを介して接続された他のコンピュータに記憶され、ネットワークインタフェース1070を介してCPU1020によって読み出されてもよい。 The program module 1093 and program data 1094 related to the synthesis program are not limited to being stored in the hard disk drive 1031, but may be stored in a removable storage medium, for example, and read by the CPU 1020 via the disk drive 1041 or the like. Alternatively, the program module 1093 and program data 1094 related to the synthesis program may be stored in another computer connected via a network, such as a LAN (Local Area Network) or a WAN (Wide Area Network), and read by the CPU 1020 via the network interface 1070.

 以上、本発明者によってなされた発明を適用した実施形態について説明したが、本実施形態による本発明の開示の一部をなす記述および図面により本発明は限定されることはない。すなわち、本実施形態に基づいて当業者等によりなされる他の実施形態、実施例および運用技術等は全て本発明の範疇に含まれる。 The above describes an embodiment of the invention made by the inventor, but the present invention is not limited to the descriptions and drawings that form part of the disclosure of the present invention according to this embodiment. In other words, other embodiments, examples, operational techniques, etc. made by those skilled in the art based on this embodiment are all included in the scope of the present invention.

 10 合成装置
 11 入力部
 12 出力部
 13 通信制御部
 14 記憶部
 15 制御部
 15a 取得部
 15b 特定部
REFERENCE SIGNS LIST
10 Synthesis device
11 Input unit
12 Output unit
13 Communication control unit
14 Storage unit
15 Control unit
15a Acquisition unit
15b Identification unit

Claims (5)

 第1の学習用データを用いて学習された第1のモデルと、第2の学習用データを用いて学習された第2のモデルとを取得する取得部と、
 前記第1のモデルの入力データに対するweightおよび前記第2のモデルのweightを用いて、各weightのloss関数の勾配のflatnessに基づいて、前記第1のモデルと前記第2のモデルとを合成した合成モデルのweightを特定する特定部と、
 を有することを特徴とする合成装置。
an acquisition unit that acquires a first model trained using the first learning data and a second model trained using the second learning data;
A determination unit that determines a weight of a composite model obtained by combining the first model and the second model based on a flatness of a gradient of a loss function of each weight by using a weight for the input data of the first model and a weight of the second model;
A synthesis apparatus comprising:
　前記特定部は、前記第2のモデルの出力を変えずに前記weightの並び替えを行い、並び替えられたweightと前記第1のモデルのweightとを平均することにより、前記合成モデルのweightを特定することを特徴とする請求項1に記載の合成装置。 The synthesis device according to claim 1, characterized in that the identification unit identifies the weight of the synthesis model by rearranging the weights without changing the output of the second model and averaging the rearranged weights and the weights of the first model.

　前記特定部は、前記loss関数を算出する際に、データ凝縮により生成された代理データを用いることを特徴とする請求項1に記載の合成装置。 The synthesis device according to claim 1, characterized in that the determination unit uses proxy data generated by data condensation when calculating the loss function.

　合成装置が実行する合成方法であって、
 第1の学習用データを用いて学習された第1のモデルと、第2の学習用データを用いて学習された第2のモデルとを取得する取得工程と、
 前記第1のモデルの入力データに対するweightおよび前記第2のモデルのweightを用いて、各weightのloss関数の勾配のflatnessに基づいて、前記第1のモデルと前記第2のモデルとを合成した合成モデルのweightを特定する特定工程と、
 を含んだことを特徴とする合成方法。
A synthesis method performed by a synthesis device, comprising:
An acquisition step of acquiring a first model trained using the first training data and a second model trained using the second training data;
A step of specifying a weight of a composite model obtained by combining the first model and the second model based on the flatness of the gradient of a loss function of each weight using the weight of the first model for the input data and the weight of the second model;
A synthesis method comprising:
 第1の学習用データを用いて学習された第1のモデルと、第2の学習用データを用いて学習された第2のモデルとを取得する取得ステップと、
 前記第1のモデルの入力データに対するweightおよび前記第2のモデルのweightを用いて、各weightのloss関数の勾配のflatnessに基づいて、前記第1のモデルと前記第2のモデルとを合成した合成モデルのweightを特定する特定ステップと、
 をコンピュータに実行させるための合成プログラム。
An acquisition step of acquiring a first model trained using the first training data and a second model trained using the second training data;
A step of specifying a weight of a composite model obtained by combining the first model and the second model based on a flatness of a gradient of a loss function of each weight using a weight for the input data of the first model and a weight of the second model;
A synthesis program for causing a computer to execute the above.
PCT/JP2023/007682 2023-03-01 2023-03-01 Combining device, combining method, and combining program Pending WO2024180744A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2023/007682 WO2024180744A1 (en) 2023-03-01 2023-03-01 Combining device, combining method, and combining program
JP2025503529A JPWO2024180744A1 (en) 2023-03-01 2023-03-01

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2023/007682 WO2024180744A1 (en) 2023-03-01 2023-03-01 Combining device, combining method, and combining program

Publications (1)

Publication Number Publication Date
WO2024180744A1 true WO2024180744A1 (en) 2024-09-06

Family

ID=92589402

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/007682 Pending WO2024180744A1 (en) 2023-03-01 2023-03-01 Combining device, combining method, and combining program

Country Status (2)

Country Link
JP (1) JPWO2024180744A1 (en)
WO (1) WO2024180744A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020148998A1 (en) * 2019-01-18 2020-07-23 オムロン株式会社 Model integration device, method, and program, and inference, inspection, and control system
JP2022191762A (en) * 2021-06-16 2022-12-28 株式会社日立製作所 Integration device, learning device, and integration method


Also Published As

Publication number Publication date
JPWO2024180744A1 (en) 2024-09-06


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23925299

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2025503529

Country of ref document: JP

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 2025503529

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE