
WO2024180744A1 - Combining device, combining method, and combining program - Google Patents

Combining device, combining method, and combining program

Info

Publication number
WO2024180744A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
weight
synthesis
trained
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/JP2023/007682
Other languages
French (fr)
Japanese (ja)
Inventor
真徳 山田
智也 山下
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NTT Inc
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Priority to PCT/JP2023/007682 priority Critical patent/WO2024180744A1/en
Priority to JP2025503529A priority patent/JPWO2024180744A1/ja
Publication of WO2024180744A1 publication Critical patent/WO2024180744A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Definitions

  • the present invention relates to a synthesis device, a synthesis method, and a synthesis program.
  • Non-Patent Document 1 describes continuous learning, Non-Patent Document 2 describes federated learning, and Non-Patent Document 3 describes a method for synthesizing models trained on the same data
  • the present invention has been made in consideration of the above, and aims to generate a model that can be used with both dataset A and dataset B from model A trained with dataset A and model B trained with dataset B.
  • the synthesis device is characterized by having an acquisition unit that acquires a first model trained using first learning data and a second model trained using second learning data, and an identification unit that uses the weight of the first model for the input data and the weight of the second model to identify the weight of a synthesis model obtained by synthesizing the first model and the second model based on the flatness of the gradient of the loss function of each weight.
  • FIG. 1 is a schematic diagram illustrating the schematic configuration of a synthesis apparatus.
  • FIG. 2 is a diagram for explaining the synthesis process.
  • FIG. 3 is a flowchart showing the synthesis process procedure.
  • FIG. 4 is a diagram for explaining the embodiment.
  • FIG. 5 is a diagram for explaining the embodiment.
  • FIG. 6 is a diagram illustrating a computer that executes a synthesis program.
  • the synthesis device of this embodiment generates a synthetic model that can be used for both datasets A and B using model A trained on dataset A and model B trained on dataset B.
  • the synthesis device uses the symmetry in the rearrangement of weights to perform permutation, which rearranges the weights so as not to change the model output.
  • the synthesis device generates a synthetic model by determining the average of the weight of model A and the rearranged weight of model B as the weight of the synthetic model, as described below.
  • Fig. 1 is a schematic diagram illustrating the schematic configuration of a synthesis device.
  • Fig. 2 is a diagram for explaining synthesis processing.
  • a synthesis device 10 is realized by a general-purpose computer such as a personal computer, and includes an input unit 11, an output unit 12, a communication control unit 13, a storage unit 14, and a control unit 15.
  • the input unit 11 is realized using input devices such as a keyboard and a mouse, and inputs various instruction information such as a command to start processing to the control unit 15 in response to input operations by an operator.
  • the output unit 12 is realized by a display device such as a liquid crystal display, a printing device such as a printer, etc.
  • the communication control unit 13 is realized by a NIC (Network Interface Card) or the like, and controls communication between the control unit 15 and an external device such as a server via a network.
  • the communication control unit 13 controls communication between the control unit 15 and a management device or the like that manages various information such as the datasets to be subjected to the synthesis process described below and the trained models.
  • the storage unit 14 is realized by a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or a storage device such as a hard disk or an optical disk.
  • the storage unit 14 stores in advance the processing program that operates the synthesis device 10 and data used during execution of the processing program, or stores it temporarily each time processing is performed.
  • the storage unit 14 may be configured to communicate with the control unit 15 via the communication control unit 13.
  • the control unit 15 is realized using a CPU (Central Processing Unit), NP (Network Processor), FPGA (Field Programmable Gate Array), etc., and executes a processing program stored in memory. As a result, the control unit 15 functions as an acquisition unit 15a and an identification unit 15b, as exemplified in FIG. 1, and executes the synthesis process. Note that each of these functional units may be implemented in different hardware. The control unit 15 may also include other functional units.
  • the acquisition unit 15a acquires a first model trained using the first learning data and a second model trained using the second learning data. For example, the acquisition unit 15a acquires model A, which is trained using data set A used in the synthesis process described below and is currently in operation, and model B, which is trained using data set B collected thereafter.
  • the acquisition unit 15a acquires these pieces of data via the input unit 11, or from a management device that manages various pieces of information via the communication control unit 13.
  • the acquisition unit 15a may also store the acquired communication data in the storage unit 14.
  • the acquisition unit 15a may also transfer this information to the identification unit 15b described below without storing it in the storage unit 14.
  • the identification unit 15b uses the weight of the first model for the input data and the weight of the second model to identify the weight of the combined model that combines the first model and the second model based on the flatness of the gradient of the loss function of each weight.
  • the identification unit 15b rearranges the weights without changing the output of the second model, and identifies the weight of the composite model by averaging the rearranged weights and the weights of the first model.
  • weight matching is a known method that rearranges weights at high speed without using data, and optimizes the following equation (9).
  • the identification unit 15b therefore performs weight matching that takes into account the flatness of the landscape of the weight loss function, i.e., the flatness of the gradient. For example, the identification unit 15b performs the optimization by adding a term that represents the flatness of the gradient of the weight loss function, as shown in the following formula (10).
  • the identification unit 15b searches for a permutation operator P that maximizes the following equation (11).
  • β is a constant that balances the flatness term.
  • the optimal solution for the weight of model A trained on data set A is expressed by the following formula (13)
  • the optimal solution for the weight of model B trained on data set B is expressed by the following formula (14).
  • the identification unit 15b searches for the minimum value of h, which is defined as the sum of f and g.
  • when the identification unit 15b brings the solutions of f and g closer together, it selects the flattest among the solutions of g with differing flatness, as shown in FIG. 2.
  • in FIG. 2, an intuitive image of the movable g is shown by a dashed line. Note that f is assumed to be flat because it is the result of a search using SGD (stochastic gradient descent).
  • the identification unit 15b generates a composite model by identifying the weight of the composite model based on the flatness of the gradient of the loss function of the weight.
  • the identification unit 15b may use proxy data generated by data condensation.
  • the identification unit 15b generates proxy data using a data condensation technique called gradient matching. This makes it easier to identify the composite model.
  • the synthesis process of this embodiment can be applied to, for example, general-purpose AI.
  • it has a high affinity with tasks at which deep learning excels, such as image recognition, natural language processing, and voice recognition, and can be used in face recognition systems and the like.
  • for example, it can be applied when company A, having generated face recognition model A for its employees, merges with company B.
  • the synthesis process according to the present embodiment includes a detection process and a search process.
  • the flowchart in FIG. 3 starts, for example, when an operation input instructing the start of the synthesis process is made.
  • the acquisition unit 15a acquires model A trained on dataset A and model B trained on dataset B (step S1). For example, the acquisition unit 15a acquires model A trained on dataset A and in operation. The acquisition unit 15a also acquires model B trained on dataset B acquired thereafter.
  • the identification unit 15b uses the weight of model A and the weight of model B to identify the weight of a composite model obtained by combining model A and model B based on the flatness of the gradient of the loss function of each weight (step S2).
  • the identification unit 15b rearranges the weights without changing the output of model B, and identifies the weight of the composite model by averaging the rearranged weights and the weights of model A. In this way, the composite model is generated.
  • the identification unit 15b generates proxy data using a data condensation technique such as gradient matching, and uses the proxy data when calculating the loss function.
  • the identification unit 15b also outputs the generated composite model to the operation device, for example, via the output unit 12. This completes the series of synthesis processes.
  • the acquiring unit 15a acquires the first model trained using the first learning data and the second model trained using the second learning data.
  • the identifying unit 15b identifies the weight of a composite model obtained by combining the first model and the second model, using the weight of the first model for the input data and the weight of the second model, based on the flatness of the gradient of the loss function of each weight.
  • the identification unit 15b rearranges the weights without changing the output of the second model, and identifies the weight of the composite model by averaging the rearranged weights and the weights of the first model.
  • the identification unit 15b uses the proxy data generated by data condensation. This relaxes the operating conditions and makes it easier to identify the composite model.
  • FIGS. 4 and 5 are diagrams for explaining the examples.
  • a synthetic model was generated by the synthesis process of the above embodiment using model A trained with MNIST and model B trained with FashionMNIST.
  • the model was set to MLP, MNIST and FashionMNIST were used as datasets, and source code created with reference to "https://github.com/samuela/git-re-basin" was used. Then, the accuracy and loss of the synthetic model using a synthetic dataset combining both MNIST and FashionMNIST were evaluated.
  • the horizontal axis in FIGS. 4 and 5 shows the synthesis ratio between model A trained on MNIST and model B trained on FashionMNIST, and the vertical axis shows the accuracy of the synthesized model.
  • model A shows an accuracy of nearly 100% for MNIST, but an accuracy of nearly 0% for FashionMNIST. Therefore, in Figures 4 and 5, model A has an accuracy of about 50% for the synthetic dataset.
  • model B shows an accuracy of nearly 100% for FashionMNIST, but an accuracy of nearly 0% for MNIST. Therefore, in Figures 4 and 5, model B also has an accuracy of about 50% for the synthetic dataset.
  • FIG. 4 shows two cases where flatness is not taken into account: without permutation (dashed line) and with permutation (solid line). For each case, results are shown during learning (Train, thick line) and operation (Test).
  • Figure 5 shows two examples of cases similar to those in Figure 4 when flatness is taken into account.
  • a program in which the process executed by the synthesis device 10 according to the above embodiment is written in a language executable by a computer can also be created.
  • the synthesis device 10 can be implemented by installing a synthesis program that executes the above synthesis process as package software or online software on a desired computer.
  • the above synthesis program can be executed by an information processing device, so that the information processing device can function as the synthesis device 10.
  • the information processing device also includes mobile communication terminals such as smartphones, mobile phones, and PHS (Personal Handyphone System), as well as slate terminals such as PDA (Personal Digital Assistant).
  • the function of the synthesis device 10 may also be implemented on a cloud server.
  • FIG. 6 is a diagram showing an example of a computer that executes a synthesis program.
  • the computer 1000 has, for example, a memory 1010, a CPU 1020, a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These components are connected by a bus 1080.
  • the memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM 1012.
  • the ROM 1011 stores a boot program such as a BIOS (Basic Input Output System).
  • the hard disk drive interface 1030 is connected to a hard disk drive 1031.
  • the disk drive interface 1040 is connected to a disk drive 1041.
  • a removable storage medium such as a magnetic disk or optical disk is inserted into the disk drive 1041.
  • the serial port interface 1050 is connected to a mouse 1051 and a keyboard 1052, for example.
  • the video adapter 1060 is connected to a display 1061, for example.
  • the hard disk drive 1031 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. Each piece of information described in the above embodiment is stored, for example, in the hard disk drive 1031 or memory 1010.
  • the synthesis program is stored in the hard disk drive 1031, for example, as a program module 1093 in which instructions to be executed by the computer 1000 are written.
  • the program module 1093 in which each process executed by the synthesis device 10 described in the above embodiment is written is stored in the hard disk drive 1031.
  • data used for information processing by the synthesis program is stored as program data 1094, for example, in the hard disk drive 1031.
  • the CPU 1020 reads the program module 1093 and program data 1094 stored in the hard disk drive 1031 into the RAM 1012 as necessary, and executes each of the above-mentioned procedures.
  • the program module 1093 and program data 1094 related to the synthesis program are not limited to being stored in the hard disk drive 1031, but may be stored in a removable storage medium, for example, and read by the CPU 1020 via the disk drive 1041 or the like.
  • the program module 1093 and program data 1094 related to the synthesis program may be stored in another computer connected via a network, such as a LAN (Local Area Network) or a WAN (Wide Area Network), and read by the CPU 1020 via the network interface 1070.
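The gradient-matching form of data condensation mentioned above can be illustrated with a minimal sketch. This is not the patented method or any library's API: it uses a plain linear model, matches gradients at a single probe weight vector, and for brevity optimizes only the synthetic labels; all shapes and constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Real dataset: a linear model y = X @ theta_true + noise.
X = rng.normal(size=(200, 4))
theta_true = rng.normal(size=4)
y = X @ theta_true + 0.05 * rng.normal(size=200)

def grad(theta, X, y):
    # Gradient of the mean squared error w.r.t. theta (constant factor dropped).
    return X.T @ (X @ theta - y) / len(y)

# Condensed proxy set: 10 synthetic inputs with learnable labels.
Xs = rng.normal(size=(10, 4))
ys = np.zeros(10)

theta = rng.normal(size=4)       # probe weights at which the gradients are matched
g_real = grad(theta, X, y)
init_gap = np.linalg.norm(grad(theta, Xs, ys) - g_real)

# Gradient descent on J = ||grad_syn - grad_real||^2; only the synthetic
# labels ys are optimized here (dJ/dys derived analytically).
lr = 0.5
for _ in range(2000):
    g_syn = grad(theta, Xs, ys)
    ys -= lr * (-(2 / len(ys)) * Xs @ (g_syn - g_real))

final_gap = np.linalg.norm(grad(theta, Xs, ys) - g_real)
assert final_gap < 0.1 * init_gap
```

In practice, gradient matching also optimizes the synthetic inputs and matches gradients across many weight samples; the 10-point proxy set here stands in for the condensed data used when calculating the loss function.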


Abstract

In the present invention, an acquisition unit (15a) acquires a first model that has been trained using first training data and a second model that has been trained using second training data. An identification unit (15b) uses a weight with respect to input data of the first model and a weight of the second model to identify a weight of a combined model combining the first and second models, on the basis of the flatness of the gradient of a loss function for each weight.

Description

Synthesis apparatus, synthesis method, and synthesis program

The present invention relates to a synthesis device, a synthesis method, and a synthesis program.

Conventionally, methods are known for generating a model that performs well on both dataset A and dataset B when the two datasets are acquired independently. For example, continuous learning for expanding the scope of application of data (see Non-Patent Document 1), federated learning for learning without gathering the data in one place, in view of privacy (see Non-Patent Document 2), and a method for synthesizing models trained on the same data (see Non-Patent Document 3) are known.

“Introduction to Continual Learning”, [online], ContinualAI Wiki, [retrieved February 6, 2023], Internet <URL: https://wiki.continualai.org/the-continualai-wiki/introduction-to-continual-learning>
“Federated learning”, [online], WIKIPEDIA, [retrieved February 6, 2023], Internet <URL: https://en.wikipedia.org/wiki/Federated_learning>
Samuel K. Ainsworth, Jonathan Hayase, Siddhartha Srinivasa, “GIT RE-BASIN: MERGING MODELS MODULO PERMUTATION SYMMETRIES”, December 2022

However, with the conventional techniques, it can be difficult to generate a model that can be used with both dataset A and dataset B from model A trained with dataset A and model B trained with dataset B. For example, with the conventional techniques, learning with dataset A and learning with dataset B cannot be performed independently, resulting in operational constraints.

The present invention has been made in consideration of the above, and aims to generate a model that can be used with both dataset A and dataset B from model A trained with dataset A and model B trained with dataset B.

In order to solve the above-mentioned problems and achieve the object, the synthesis device according to the present invention is characterized by having an acquisition unit that acquires a first model trained using first learning data and a second model trained using second learning data, and an identification unit that uses the weight of the first model for the input data and the weight of the second model to identify the weight of a synthesis model obtained by synthesizing the first model and the second model, based on the flatness of the gradient of the loss function of each weight.

According to the present invention, it is possible to generate a model that can be used with both datasets A and B from model A trained with dataset A and model B trained with dataset B.

FIG. 1 is a schematic diagram illustrating the schematic configuration of a synthesis apparatus.
FIG. 2 is a diagram for explaining the synthesis process.
FIG. 3 is a flowchart showing the synthesis process procedure.
FIG. 4 is a diagram for explaining the embodiment.
FIG. 5 is a diagram for explaining the embodiment.
FIG. 6 is a diagram illustrating a computer that executes a synthesis program.

Below, one embodiment of the present invention will be described in detail with reference to the drawings. Note that the present invention is not limited to this embodiment. In the drawings, the same parts are denoted by the same reference numerals.

[Outline of synthesis apparatus]
The synthesis device of this embodiment generates a synthetic model that can be used for both datasets A and B, using model A trained on dataset A and model B trained on dataset B.

First, a composite dataset D of dataset A and dataset B is defined as in the following equation (1).

Figure JPOXMLDOC01-appb-M000001

In supervised learning, let l be the loss function and θ be the weight of the model for the input data. Then the weight θ_A of model A, the weight θ_B of model B, and the weight θ_{A+B} of the synthetic model trained on the composite dataset are expressed by the following equations (2), (3), and (4), respectively.

Figure JPOXMLDOC01-appb-M000002
Figure JPOXMLDOC01-appb-M000003
Figure JPOXMLDOC01-appb-M000004

If an operation ★ that satisfies the following equation (5) can be identified, it becomes possible to train on dataset A and dataset B separately and to combine the trained weights into a composite model that can be used on dataset A+B.

Figure JPOXMLDOC01-appb-M000005

Here, in deep learning, there is symmetry with respect to operations on the network structure. The synthesis device therefore exploits the symmetry of weight rearrangements and performs a permutation that rearranges the weights without changing the model output.

In that case, the following equation (6) holds.

Figure JPOXMLDOC01-appb-M000006

By rewriting the above equation (6) with the rule shown in the following equation (7), the following equation (8) is obtained.

Figure JPOXMLDOC01-appb-M000007

Figure JPOXMLDOC01-appb-M000008

Then, as described below, the synthesis device generates the composite model by determining the average of the weight of model A and the rearranged weight of model B as the weight of the composite model.
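The permutation symmetry and the weight averaging described above can be sketched for a small two-layer MLP. This is a minimal illustration with made-up shapes and random weights, not the patent's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(x, W1, b1, W2, b2):
    # Two-layer MLP: x -> ReLU(x @ W1 + b1) -> @ W2 + b2
    h = np.maximum(x @ W1 + b1, 0.0)
    return h @ W2 + b2

# Toy 4-8-3 network standing in for model B.
W1 = rng.normal(size=(4, 8)); b1 = rng.normal(size=8)
W2 = rng.normal(size=(8, 3)); b2 = rng.normal(size=3)

# Permute the hidden units: reorder the columns of W1 (and b1) and the
# rows of W2 in the same way. The model output is unchanged.
perm = rng.permutation(8)
W1p, b1p, W2p = W1[:, perm], b1[perm], W2[perm, :]

x = rng.normal(size=(5, 4))
assert np.allclose(mlp_forward(x, W1, b1, W2, b2),
                   mlp_forward(x, W1p, b1p, W2p, b2))

# Weights standing in for model A; the composite weights are the average
# of model A's weights and model B's permuted weights.
W1a = rng.normal(size=(4, 8)); b1a = rng.normal(size=8)
W2a = rng.normal(size=(8, 3)); b2a = rng.normal(size=3)
W1c, b1c = (W1a + W1p) / 2, (b1a + b1p) / 2
W2c, b2c = (W2a + W2p) / 2, (b2a + b2) / 2
```

The assertion checks the symmetry claim directly: because the permutation is applied consistently to the hidden layer's incoming and outgoing weights, the network computes the same function.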

[Configuration of synthesis device]
FIG. 1 is a schematic diagram illustrating the schematic configuration of the synthesis device, and FIG. 2 is a diagram for explaining the synthesis process. First, as illustrated in FIG. 1, the synthesis device 10 is realized by a general-purpose computer such as a personal computer, and includes an input unit 11, an output unit 12, a communication control unit 13, a storage unit 14, and a control unit 15.

The input unit 11 is realized using input devices such as a keyboard and a mouse, and inputs various instruction information, such as a command to start processing, to the control unit 15 in response to input operations by an operator. The output unit 12 is realized by a display device such as a liquid crystal display, a printing device such as a printer, or the like.

The communication control unit 13 is realized by a NIC (Network Interface Card) or the like, and controls communication between the control unit 15 and an external device such as a server via a network. For example, the communication control unit 13 controls communication between the control unit 15 and a management device or the like that manages various kinds of information, such as the datasets subjected to the synthesis process described below and the trained models.

The storage unit 14 is realized by a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or by a storage device such as a hard disk or an optical disk. The storage unit 14 stores in advance the processing program that operates the synthesis device 10 and the data used during execution of the processing program, or stores them temporarily each time processing is performed. The storage unit 14 may also be configured to communicate with the control unit 15 via the communication control unit 13.

The control unit 15 is realized using a CPU (Central Processing Unit), NP (Network Processor), FPGA (Field Programmable Gate Array), or the like, and executes a processing program stored in memory. As a result, the control unit 15 functions as an acquisition unit 15a and an identification unit 15b, as illustrated in FIG. 1, and executes the synthesis process. Note that these functional units may each be implemented on different hardware. The control unit 15 may also include other functional units.

The acquisition unit 15a acquires a first model trained using first learning data and a second model trained using second learning data. For example, the acquisition unit 15a acquires model A, which was trained using dataset A, used in the synthesis process described below, and is currently in operation, and model B, which was trained using dataset B collected thereafter.

The acquisition unit 15a acquires these data via the input unit 11, or via the communication control unit 13 from a management device that manages various kinds of information. The acquisition unit 15a may store the acquired data in the storage unit 14, or may transfer the data to the identification unit 15b described below without storing it in the storage unit 14.

The identification unit 15b uses the weight of the first model for the input data and the weight of the second model to identify the weight of the composite model obtained by combining the first model and the second model, based on the flatness of the gradient of the loss function of each weight.

Specifically, the identification unit 15b rearranges the weights without changing the output of the second model, and identifies the weight of the composite model by averaging the rearranged weights and the weights of the first model.

For example, a method called weight matching is conventionally known that can perform the rearrangement at high speed without using data. Weight matching performs the optimization of the following equation (9).

Figure JPOXMLDOC01-appb-M000009
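Equation (9) is not reproduced in this text, but weight matching in the sense of Non-Patent Document 3 (Git Re-Basin) can be sketched as a linear assignment problem over hidden units. The similarity score and the shapes below are illustrative assumptions, not the patent's exact formulation:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def weight_match(W1a, W2a, W1b, W2b):
    # similarity[i, j]: alignment score for mapping hidden unit j of
    # model B onto hidden unit i of model A, using both the incoming
    # (W1) and outgoing (W2) weights of each unit.
    similarity = W1a.T @ W1b + W2a @ W2b.T
    _, cols = linear_sum_assignment(similarity, maximize=True)
    return cols  # permutation: B's unit cols[i] takes the role of A's unit i

rng = np.random.default_rng(1)
W1a = rng.normal(size=(30, 8))   # input -> hidden
W2a = rng.normal(size=(8, 20))   # hidden -> output
# Model B here is model A with its hidden units shuffled, so weight
# matching should recover the shuffle exactly.
shuffle = rng.permutation(8)
W1b, W2b = W1a[:, shuffle], W2a[shuffle, :]

perm = weight_match(W1a, W2a, W1b, W2b)
assert np.allclose(W1b[:, perm], W1a) and np.allclose(W2b[perm, :], W2a)

# The merged weights are then the average of model A's weights and
# model B's permuted weights.
W1_merged = (W1a + W1b[:, perm]) / 2
```

Because the assignment is solved per layer on the weights alone, no training data is needed, which matches the "high speed without using data" property described above.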

The identification unit 15b therefore performs weight matching that takes into account the flatness of the landscape of the weight loss function, that is, the flatness of the gradient. For example, the identification unit 15b performs the optimization by adding a term representing the flatness of the gradient of the weight loss function, as shown in the following equation (10).

Figure JPOXMLDOC01-appb-M000010

Specifically, the identification unit 15b searches for a permutation operator P that maximizes the following equation (11). Here, β is a constant that balances the flatness term.

Figure JPOXMLDOC01-appb-M000011
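Equations (10) and (11) are likewise not reproduced here, but the idea of a flatness-weighted matching objective can be sketched as follows. The sharpness proxy (mean loss increase under small random weight perturbations) and the linear model are illustrative assumptions only:

```python
import numpy as np

rng = np.random.default_rng(2)

def loss(theta, X, y):
    # Squared-error loss of a linear model y ≈ X @ theta.
    return float(np.mean((X @ theta - y) ** 2))

def sharpness(theta, X, y, eps=1e-2, n=64):
    # Crude flatness proxy: mean loss increase under small random
    # perturbations of the weights (smaller = flatter landscape).
    base = loss(theta, X, y)
    deltas = rng.normal(scale=eps, size=(n, theta.size))
    return float(np.mean([loss(theta + d, X, y) for d in deltas])) - base

def matching_score(theta_a, theta_b_perm, X, y, beta=1.0):
    # Sketch of a flatness-aware objective: reward alignment of the two
    # weight vectors and penalize sharpness of the merged (averaged)
    # weights, weighted by the balancing constant beta.
    merged = (theta_a + theta_b_perm) / 2
    return float(theta_a @ theta_b_perm) - beta * sharpness(merged, X, y)

X = rng.normal(size=(128, 5))
theta = rng.normal(size=5)
y = X @ theta                       # zero loss at theta itself
s = matching_score(theta, theta, X, y)
assert sharpness(theta, X, y) >= 0.0  # at a minimum, perturbation can only raise the loss
```

A search over permutation operators P would then pick the candidate whose merged weights score highest, i.e., the one sitting in the flattest region of the loss landscape.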

Here, the composite dataset of two different datasets A and B can be expressed by the following equation (12). For example, s = 0 represents dataset A only, and s = 1 represents dataset B only.

Figure JPOXMLDOC01-appb-M000012

The optimal solution for the weight of model A trained on dataset A is expressed by the following equation (13), and the optimal solution for the weight of model B trained on dataset B is expressed by the following equation (14).

Figure JPOXMLDOC01-appb-M000013
Figure JPOXMLDOC01-appb-M000014

 また、合成データセットについて、次式(15)が成立する。 Furthermore, for the synthetic data set, the following equation (15) holds:

Figure JPOXMLDOC01-appb-M000015

 ここで、上記式(15)の各項のloss関数をそれぞれ、h、f、gと簡略化して記すと、次式(16)のように表せる。 If we simplify the loss functions of each term in the above formula (15) and write them as h, f, and g, respectively, we can express it as the following formula (16).

Figure JPOXMLDOC01-appb-M000016

 期待値は線形演算であるため、特定部15bは、fとgとの和で定義されるhの最小値を探索することになる。 Since the expected value is a linear calculation, the determination unit 15b searches for the minimum value of h, which is defined as the sum of f and g.

　ここで、従来のweight matchingでは、fとgとの最適解をできるだけ近づけるように探索する。ただし、モデルの出力は変わらないため、lossの大きさ自体は変わらない。その際に、gのflatnessは考慮されない。 In conventional weight matching, the search brings the optimal solutions of f and g as close together as possible. However, since the output of the model does not change, the magnitude of the loss itself does not change. In doing so, the flatness of g is not taken into account.

　そこで、特定部15bは、fとgとの解を近づける際に、図2に例示するように、flatnessの異なるgのうち、よりflatなものをgの解として選択する。図2には、移動可能なgの直感的なイメージが破線で示されている。なお、fはSGD(stochastic gradient descent、確率的勾配降下法)で探索した結果であるため、flatであることが仮定される。 Therefore, when bringing the solutions of f and g closer together, the identification unit 15b selects, as the solution for g, the flatter of the candidates for g with different flatness, as illustrated in Fig. 2. In Fig. 2, an intuitive image of the movable g is shown by dashed lines. Note that f is assumed to be flat because it is the result of a search by SGD (stochastic gradient descent).
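As a hedged sketch of how the flatness of two candidate solutions might be compared (the perturbation-based estimator below is one possible choice for illustration, not the specific estimator prescribed by the embodiment), the average loss increase under small random weight perturbations can serve as a sharpness measure:

```python
import numpy as np

def sharpness(loss_fn, w, eps=1e-2, n=32, seed=0):
    # negative flatness: average loss increase under small random perturbations,
    # E[L(w + eps * u)] - L(w); a smaller value means a flatter landscape at w
    rng = np.random.default_rng(seed)
    rise = np.mean([loss_fn(w + eps * rng.normal(size=w.shape)) for _ in range(n)])
    return rise - loss_fn(w)

# two quadratic bowls with the same minimum but different curvature
steep = lambda w: 10.0 * np.sum(w ** 2)
flat = lambda w: 0.1 * np.sum(w ** 2)
w0 = np.zeros(4)
assert sharpness(steep, w0) > sharpness(flat, w0) > 0.0  # steeper bowl = sharper
```

Given such an estimate, the flatter candidate for g would be preferred when its solution can be moved without changing the loss.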

 このように、特定部15bは、weightのloss関数の勾配のflatnessに基づいて、合成モデルのweightを特定することにより、合成モデルを生成する。 In this way, the identification unit 15b generates a composite model by identifying the weight of the composite model based on the flatness of the gradient of the loss function of the weight.

 また、特定部15bは、loss関数を算出する際に、データ凝縮により生成された代理データを用いてもよい。例えば、特定部15bは、gradient matchingといわれるデータ凝縮の手法を用いて、代理データを生成する。これにより、合成モデルの特定がより容易に可能となる。 In addition, when calculating the loss function, the identification unit 15b may use proxy data generated by data condensation. For example, the identification unit 15b generates proxy data using a data condensation technique called gradient matching. This makes it easier to identify the composite model.
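A minimal sketch of gradient matching, under the simplifying assumption of a linear regression model (the dataset sizes, probe weights, and step sizes below are all illustrative choices): a handful of synthetic points is optimized so that its loss gradients mimic those of the full dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
# "real" dataset: 200 points for a linear model with squared-error loss
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

def grad(w, Xd, yd):
    # gradient of ||Xd w - yd||^2 / (2n) with respect to w
    return Xd.T @ (Xd @ w - yd) / len(yd)

probes = [rng.normal(size=3) for _ in range(8)]  # weights at which gradients are compared

def mismatch(params):
    # total squared distance between synthetic-data and real-data gradients
    Xs, ys = params[:12].reshape(4, 3), params[12:]
    return sum(np.sum((grad(w, Xs, ys) - grad(w, X, y)) ** 2) for w in probes)

def num_grad(f, p, h=1e-5):
    # central-difference gradient (small parameter count, so this is cheap)
    g = np.zeros_like(p)
    for i in range(p.size):
        e = np.zeros(p.size); e[i] = h
        g[i] = (f(p + e) - f(p - e)) / (2 * h)
    return g

params = rng.normal(size=16)  # 4 synthetic points (Xs) plus 4 labels (ys)
before = mismatch(params)
cur = before
for _ in range(100):  # descent on the synthetic points with backtracking
    g = num_grad(mismatch, params)
    step = 1e-2
    while mismatch(params - step * g) > cur and step > 1e-8:
        step /= 2
    params = params - step * g
    cur = mismatch(params)
assert cur < before  # the 4-point condensed set now mimics the real gradients better
```

The resulting synthetic points can then stand in for the original training data when the loss function and its flatness are evaluated during matching.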

 なお、本実施形態の合成処理は、例えば、一般的なAIに適用可能である。特に、Deep Learningが得意とする画像認識、自然言語処理、音声認識等との親和性が高く、顔認証システム等に用いることが可能である。例えば、会社Aの社員の顔認証モデルAを生成した後に、会社Aが会社Bと合併することになった場合に、会社Bの社員も顔認証システムを使えるように、会社A、Bの双方が使える合成モデルを生成することが可能となる。 The synthesis process of this embodiment can be applied to, for example, general AI. In particular, it has a high affinity with Deep Learning's specialties of image recognition, natural language processing, voice recognition, etc., and can be used in face recognition systems, etc. For example, if company A merges with company B after generating face recognition model A for employees of company A, it is possible to generate a synthetic model that can be used by both companies A and B so that employees of company B can also use the face recognition system.

[合成処理]
　次に、図3を参照して、本実施形態に係る合成装置10による合成処理について説明する。本実施形態の合成処理は、取得処理と特定処理とを含む。図3のフローチャートは、例えば、合成処理の開始を指示する操作入力があったタイミングで開始される。
[Synthesis Processing]
 Next, the synthesis process performed by the synthesis device 10 according to the present embodiment will be described with reference to Fig. 3. The synthesis process of the present embodiment includes an acquisition process and an identification process. The flowchart in Fig. 3 starts, for example, at the timing when an operation input instructing the start of the synthesis process is received.

 まず、取得部15aが、データセットAで学習されたモデルAと、データセットBで学習されたモデルBとを取得する(ステップS1)。例えば、取得部15aは、データセットAで学習され運用中のモデルAを取得する。また、取得部15aは、その後に取得されたデータセットBで学習されたモデルBを取得する。 First, the acquisition unit 15a acquires model A trained on dataset A and model B trained on dataset B (step S1). For example, the acquisition unit 15a acquires model A trained on dataset A and in operation. The acquisition unit 15a also acquires model B trained on dataset B acquired thereafter.

 次に、特定部15bが、モデルAのweightおよびモデルBのweightを用いて、各weightのloss関数の勾配のflatnessに基づいて、モデルAとモデルBとを合成した合成モデルのweightを特定する(ステップS2)。 Next, the identification unit 15b uses the weight of model A and the weight of model B to identify the weight of a composite model obtained by combining model A and model B based on the flatness of the gradient of the loss function of each weight (step S2).

 具体的には、特定部15bは、モデルBの出力を変えずにweightの並び替えを行い、並び替えられたweightとモデルAのweightとを平均することにより、合成モデルのweightを特定する。これにより、合成モデルが生成される。 Specifically, the identification unit 15b rearranges the weights without changing the output of model B, and identifies the weight of the composite model by averaging the rearranged weights and the weights of model A. In this way, the composite model is generated.
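The permute-then-average step can be sketched as follows for a one-hidden-layer ReLU MLP (the names and the mixing ratio λ are illustrative assumptions): permuting hidden units leaves model B's output unchanged, after which the aligned weights are averaged with model A's.

```python
import numpy as np

def mlp(w1, b1, w2, x):
    return w2 @ np.maximum(w1 @ x + b1, 0.0)  # one-hidden-layer ReLU MLP

rng = np.random.default_rng(1)
h, d = 6, 4
w1_b, b1_b = rng.normal(size=(h, d)), rng.normal(size=h)
w2_b = rng.normal(size=(2, h))

# permuting hidden units (rows of w1/b1, columns of w2) preserves the output
perm = rng.permutation(h)
w1_p, b1_p, w2_p = w1_b[perm], b1_b[perm], w2_b[:, perm]
x = rng.normal(size=d)
assert np.allclose(mlp(w1_b, b1_b, w2_b, x), mlp(w1_p, b1_p, w2_p, x))

# merge: average model A's weights with the permutation-aligned model B
w1_a, b1_a = rng.normal(size=(h, d)), rng.normal(size=h)
w2_a = rng.normal(size=(2, h))
lam = 0.5  # synthesis ratio between the two models
w1_m = (1 - lam) * w1_a + lam * w1_p
b1_m = (1 - lam) * b1_a + lam * b1_p
w2_m = (1 - lam) * w2_a + lam * w2_p
```

The permutation used here would, in the embodiment, come from the flatness-aware matching described above rather than being drawn at random.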

 その際に、特定部15bは、gradient matching等のデータ凝縮の手法を用いて代理データを生成し、loss関数を算出する際に、代理データを用いる。 At that time, the identification unit 15b generates proxy data using a data condensation technique such as gradient matching, and uses the proxy data when calculating the loss function.

 また、特定部15bは、生成した合成モデルを、例えば出力部12を介して運用装置に対して出力する。これにより、一連の合成処理が終了する。 The identification unit 15b also outputs the generated composite model to the operation device, for example, via the output unit 12. This completes the series of composite processes.

[効果]
 以上、説明したように、取得部15aは、第1の学習用データを用いて学習された第1のモデルと、第2の学習用データを用いて学習された第2のモデルとを取得する。特定部15bが、第1のモデルの入力データに対するweightおよび第2のモデルのweightを用いて、各weightのloss関数の勾配のflatnessに基づいて、第1のモデルと第2のモデルとを合成した合成モデルのweightを特定する。
[Effects]
As described above, the acquiring unit 15a acquires the first model trained using the first learning data and the second model trained using the second learning data. The identifying unit 15b identifies the weight of a composite model obtained by combining the first model and the second model, using the weight of the first model for the input data and the weight of the second model, based on the flatness of the gradient of the loss function of each weight.

 具体的には、特定部15bは、第2のモデルの出力を変えずにweightの並び替えを行い、並び替えられたweightと第1のモデルのweightとを平均することにより、合成モデルのweightを特定する。 Specifically, the identification unit 15b rearranges the weights without changing the output of the second model, and identifies the weight of the composite model by averaging the rearranged weights and the weights of the first model.

 これにより、データセットAで学習したモデルAおよびデータセットBで学習したモデルBから、データセットAとデータセットBとの両方で使えるモデルを生成することが可能となる。 This makes it possible to generate a model that can be used with both datasets A and B from model A trained on dataset A and model B trained on dataset B.

　また、特定部15bが、loss関数を算出する際に、データ凝縮により生成された代理データを用いる。これにより、運用の条件が緩和され、合成モデルの特定がより容易に可能となる。 In addition, the identification unit 15b uses proxy data generated by data condensation when calculating the loss function. This relaxes the operating conditions and makes it easier to identify the composite model.

[実施例]
　図4および図5は、実施例を説明するための図である。本実施例では、MNISTで学習したモデルAとFashionMNISTで学習したモデルBとを用いて、上記の実施形態の合成処理により、合成モデルを生成した。その際に、モデルはMLPとし、データセットはMNIST、FashionMNISTを使用し、また「https://github.com/samuela/git-re-basin」を参照して作成したソースコードを使用した。そして、MNIST、FashionMNISTの両方を合わせた合成データセットによる合成モデルの精度とlossとを評価した。
[Example]
 Figs. 4 and 5 are diagrams for explaining the example. In this example, a synthetic model was generated by the synthesis process of the above embodiment, using model A trained on MNIST and model B trained on FashionMNIST. The model was an MLP, the datasets were MNIST and FashionMNIST, and source code created with reference to "https://github.com/samuela/git-re-basin" was used. The accuracy and loss of the synthetic model were then evaluated on a combined dataset containing both MNIST and FashionMNIST.

　図4および図5の横軸は、MNISTにより学習されたモデルAと、FashionMNISTにより学習されたモデルBとの合成の割合λを示し、縦軸は合成モデルの精度を示す。 The horizontal axis in Figs. 4 and 5 shows the ratio λ at which model A trained on MNIST and model B trained on FashionMNIST are combined, and the vertical axis shows the accuracy of the synthesized model.

 ここで、モデルAはMNISTに対して100%近い精度が示されるが、FashionMNISTに対しては0%に近い精度が示される。したがって、図4および図5において、モデルAは合成データセットに対しては、精度が50%程度となっている。また、モデルBは、FashionMNISTに対して100%近い精度が示されるが、MNISTに対しては0%に近い精度が示される。したがって、図4および図5において、モデルBも同様に、合成データセットに対しては精度が50%程度となっている。 Here, model A shows an accuracy of nearly 100% for MNIST, but an accuracy of nearly 0% for FashionMNIST. Therefore, in Figures 4 and 5, model A has an accuracy of about 50% for the synthetic dataset. Also, model B shows an accuracy of nearly 100% for FashionMNIST, but an accuracy of nearly 0% for MNIST. Therefore, in Figures 4 and 5, model B also has an accuracy of about 50% for the synthetic dataset.

　まず、図4には、flatnessを考慮しない場合であって、破線で示すPermutationを行わない場合、実線で示すPermutationを行った場合の2パターンのケースについて例示されている。また、各ケースについて、学習時(Train、太線)と運用時(Test)とについて例示されている。 First, Fig. 4 illustrates two cases in which flatness is not taken into account: the case without permutation, shown by dashed lines, and the case with permutation, shown by solid lines. For each case, both training (Train, thick lines) and operation (Test) are illustrated.

　また、図5には、flatnessを考慮した場合について、図4の場合と同様の2パターンのケースについて例示されている。 Fig. 5 illustrates the same two cases as Fig. 4, but with flatness taken into account.

 図4および図5のいずれにおいても、実線で示すPermutationを行った場合の方が、破線で示すPermutationを行わない場合より合成モデルの精度が高いことが確認された。 In both Figures 4 and 5, it was confirmed that the accuracy of the synthetic model was higher when permutation was performed (shown by the solid line) than when permutation was not performed (shown by the dashed line).

　また、図5に示したflatnessを考慮した場合の方が、図4に示したflatnessを考慮しない場合より、合成モデルの精度が高いことが確認された。 It was also confirmed that the accuracy of the synthetic model was higher when flatness was taken into account (Fig. 5) than when it was not (Fig. 4).

[プログラム]
 上記実施形態に係る合成装置10が実行する処理をコンピュータが実行可能な言語で記述したプログラムを作成することもできる。一実施形態として、合成装置10は、パッケージソフトウェアやオンラインソフトウェアとして上記の合成処理を実行する合成プログラムを所望のコンピュータにインストールさせることによって実装できる。例えば、上記の合成プログラムを情報処理装置に実行させることにより、情報処理装置を合成装置10として機能させることができる。また、その他にも、情報処理装置にはスマートフォン、携帯電話機やPHS(Personal Handyphone System)等の移動体通信端末、さらには、PDA(Personal Digital Assistant)等のスレート端末等がその範疇に含まれる。また、合成装置10の機能を、クラウドサーバに実装してもよい。
[program]
A program in which the process executed by the synthesis device 10 according to the above embodiment is written in a language executable by a computer can also be created. As an embodiment, the synthesis device 10 can be implemented by installing a synthesis program that executes the above synthesis process as package software or online software on a desired computer. For example, the above synthesis program can be executed by an information processing device, so that the information processing device can function as the synthesis device 10. In addition, the information processing device also includes mobile communication terminals such as smartphones, mobile phones, and PHS (Personal Handyphone System), as well as slate terminals such as PDA (Personal Digital Assistant). The function of the synthesis device 10 may also be implemented on a cloud server.

 図6は、合成プログラムを実行するコンピュータの一例を示す図である。コンピュータ1000は、例えば、メモリ1010と、CPU1020と、ハードディスクドライブインタフェース1030と、ディスクドライブインタフェース1040と、シリアルポートインタフェース1050と、ビデオアダプタ1060と、ネットワークインタフェース1070とを有する。これらの各部は、バス1080によって接続される。 FIG. 6 is a diagram showing an example of a computer that executes a synthesis program. The computer 1000 has, for example, a memory 1010, a CPU 1020, a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These components are connected by a bus 1080.

 メモリ1010は、ROM(Read Only Memory)1011およびRAM1012を含む。ROM1011は、例えば、BIOS(Basic Input Output System)等のブートプログラムを記憶する。ハードディスクドライブインタフェース1030は、ハードディスクドライブ1031に接続される。ディスクドライブインタフェース1040は、ディスクドライブ1041に接続される。ディスクドライブ1041には、例えば、磁気ディスクや光ディスク等の着脱可能な記憶媒体が挿入される。シリアルポートインタフェース1050には、例えば、マウス1051およびキーボード1052が接続される。ビデオアダプタ1060には、例えば、ディスプレイ1061が接続される。 The memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM 1012. The ROM 1011 stores a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to a hard disk drive 1031. The disk drive interface 1040 is connected to a disk drive 1041. A removable storage medium such as a magnetic disk or optical disk is inserted into the disk drive 1041. The serial port interface 1050 is connected to a mouse 1051 and a keyboard 1052, for example. The video adapter 1060 is connected to a display 1061, for example.

 ここで、ハードディスクドライブ1031は、例えば、OS1091、アプリケーションプログラム1092、プログラムモジュール1093およびプログラムデータ1094を記憶する。上記実施形態で説明した各情報は、例えばハードディスクドライブ1031やメモリ1010に記憶される。 Here, the hard disk drive 1031 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. Each piece of information described in the above embodiment is stored, for example, in the hard disk drive 1031 or memory 1010.

 また、合成プログラムは、例えば、コンピュータ1000によって実行される指令が記述されたプログラムモジュール1093として、ハードディスクドライブ1031に記憶される。具体的には、上記実施形態で説明した合成装置10が実行する各処理が記述されたプログラムモジュール1093が、ハードディスクドライブ1031に記憶される。 The synthesis program is stored in the hard disk drive 1031, for example, as a program module 1093 in which instructions to be executed by the computer 1000 are written. Specifically, the program module 1093 in which each process executed by the synthesis device 10 described in the above embodiment is written is stored in the hard disk drive 1031.

 また、合成プログラムによる情報処理に用いられるデータは、プログラムデータ1094として、例えば、ハードディスクドライブ1031に記憶される。そして、CPU1020が、ハードディスクドライブ1031に記憶されたプログラムモジュール1093やプログラムデータ1094を必要に応じてRAM1012に読み出して、上述した各手順を実行する。 In addition, data used for information processing by the synthesis program is stored as program data 1094, for example, in the hard disk drive 1031. Then, the CPU 1020 reads the program module 1093 and program data 1094 stored in the hard disk drive 1031 into the RAM 1012 as necessary, and executes each of the above-mentioned procedures.

 なお、合成プログラムに係るプログラムモジュール1093やプログラムデータ1094は、ハードディスクドライブ1031に記憶される場合に限られず、例えば、着脱可能な記憶媒体に記憶されて、ディスクドライブ1041等を介してCPU1020によって読み出されてもよい。あるいは、合成プログラムに係るプログラムモジュール1093やプログラムデータ1094は、LAN(Local Area Network)やWAN(Wide Area Network)等のネットワークを介して接続された他のコンピュータに記憶され、ネットワークインタフェース1070を介してCPU1020によって読み出されてもよい。 The program module 1093 and program data 1094 related to the synthesis program are not limited to being stored in the hard disk drive 1031, but may be stored in a removable storage medium, for example, and read by the CPU 1020 via the disk drive 1041 or the like. Alternatively, the program module 1093 and program data 1094 related to the synthesis program may be stored in another computer connected via a network, such as a LAN (Local Area Network) or a WAN (Wide Area Network), and read by the CPU 1020 via the network interface 1070.

 以上、本発明者によってなされた発明を適用した実施形態について説明したが、本実施形態による本発明の開示の一部をなす記述および図面により本発明は限定されることはない。すなわち、本実施形態に基づいて当業者等によりなされる他の実施形態、実施例および運用技術等は全て本発明の範疇に含まれる。 The above describes an embodiment of the invention made by the inventor, but the present invention is not limited to the descriptions and drawings that form part of the disclosure of the present invention according to this embodiment. In other words, other embodiments, examples, operational techniques, etc. made by those skilled in the art based on this embodiment are all included in the scope of the present invention.

 10 合成装置
 11 入力部
 12 出力部
 13 通信制御部
 14 記憶部
 15 制御部
 15a 取得部
 15b 特定部
REFERENCE SIGNS LIST
10 Synthesis device
11 Input unit
12 Output unit
13 Communication control unit
14 Storage unit
15 Control unit
15a Acquisition unit
15b Identification unit

Claims (5)

 第1の学習用データを用いて学習された第1のモデルと、第2の学習用データを用いて学習された第2のモデルとを取得する取得部と、
 前記第1のモデルの入力データに対するweightおよび前記第2のモデルのweightを用いて、各weightのloss関数の勾配のflatnessに基づいて、前記第1のモデルと前記第2のモデルとを合成した合成モデルのweightを特定する特定部と、
 を有することを特徴とする合成装置。
an acquisition unit that acquires a first model trained using the first learning data and a second model trained using the second learning data;
A determination unit that determines a weight of a composite model obtained by combining the first model and the second model based on a flatness of a gradient of a loss function of each weight by using a weight for the input data of the first model and a weight of the second model;
A synthesis apparatus comprising:
　前記特定部は、前記第2のモデルの出力を変えずに前記weightの並び替えを行い、並び替えられたweightと前記第1のモデルのweightとを平均することにより、前記合成モデルのweightを特定することを特徴とする請求項1に記載の合成装置。 The synthesis device according to claim 1, characterized in that the identification unit identifies the weight of the synthesis model by rearranging the weights without changing the output of the second model and averaging the rearranged weights and the weights of the first model.

　前記特定部は、前記loss関数を算出する際に、データ凝縮により生成された代理データを用いることを特徴とする請求項1に記載の合成装置。 The synthesis device according to claim 1, characterized in that the determination unit uses proxy data generated by data condensation when calculating the loss function.

　合成装置が実行する合成方法であって、
 第1の学習用データを用いて学習された第1のモデルと、第2の学習用データを用いて学習された第2のモデルとを取得する取得工程と、
 前記第1のモデルの入力データに対するweightおよび前記第2のモデルのweightを用いて、各weightのloss関数の勾配のflatnessに基づいて、前記第1のモデルと前記第2のモデルとを合成した合成モデルのweightを特定する特定工程と、
 を含んだことを特徴とする合成方法。
A synthesis method performed by a synthesis device, comprising:
An acquisition step of acquiring a first model trained using the first training data and a second model trained using the second training data;
A step of specifying a weight of a composite model obtained by combining the first model and the second model based on the flatness of the gradient of a loss function of each weight using the weight of the first model for the input data and the weight of the second model;
A synthesis method comprising:
 第1の学習用データを用いて学習された第1のモデルと、第2の学習用データを用いて学習された第2のモデルとを取得する取得ステップと、
 前記第1のモデルの入力データに対するweightおよび前記第2のモデルのweightを用いて、各weightのloss関数の勾配のflatnessに基づいて、前記第1のモデルと前記第2のモデルとを合成した合成モデルのweightを特定する特定ステップと、
 をコンピュータに実行させるための合成プログラム。
An acquisition step of acquiring a first model trained using the first training data and a second model trained using the second training data;
A step of specifying a weight of a composite model obtained by combining the first model and the second model based on a flatness of a gradient of a loss function of each weight using a weight for the input data of the first model and a weight of the second model;
A synthesis program for causing a computer to execute the above.
PCT/JP2023/007682 2023-03-01 2023-03-01 Combining device, combining method, and combining program Pending WO2024180744A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2023/007682 WO2024180744A1 (en) 2023-03-01 2023-03-01 Combining device, combining method, and combining program
JP2025503529A JPWO2024180744A1 (en) 2023-03-01 2023-03-01

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2023/007682 WO2024180744A1 (en) 2023-03-01 2023-03-01 Combining device, combining method, and combining program

Publications (1)

Publication Number Publication Date
WO2024180744A1 true WO2024180744A1 (en) 2024-09-06

Family

ID=92589402

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/007682 Pending WO2024180744A1 (en) 2023-03-01 2023-03-01 Combining device, combining method, and combining program

Country Status (2)

Country Link
JP (1) JPWO2024180744A1 (en)
WO (1) WO2024180744A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020148998A1 (en) * 2019-01-18 2020-07-23 オムロン株式会社 Model integration device, method, and program, and inference, inspection, and control system
JP2022191762A (en) * 2021-06-16 2022-12-28 株式会社日立製作所 Integration device, learning device, and integration method


Also Published As

Publication number Publication date
JPWO2024180744A1 (en) 2024-09-06


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23925299

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2025503529

Country of ref document: JP

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 2025503529

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE