WO2024014035A1

WO2024014035A1 - Data prediction support method and data prediction system

Info

Publication number: WO2024014035A1
Application number: PCT/JP2023/006982
Authority: WO
Inventors: 和樹南波; 将人内海; 喜仁木下; 洋飯村; 大輔浜場; 広晃小川; 潤山崎; 大地渡邊
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2022-07-12
Filing date: 2023-02-27
Publication date: 2024-01-18
Anticipated expiration: 2025-01-12
Also published as: JP2024010530A

Abstract

This system extracts a plurality of dataset candidates from a plurality of pieces of observation data having periodicity. Each dataset candidate is a set of two or more pieces of observation data for a target period. For each dataset candidate, the system calculates, for each piece of observation data in the candidate, vector data whose elements are the magnitudes of a plurality of different frequency components of that observation data; classifies the two or more pieces of vector data into one or more data groups on the basis of the distance between pieces of vector data; and calculates, on the basis of the number of data pieces in each data group, the goodness of fit to a prediction model that takes observation data as input and outputs prediction data. On the basis of the goodness of fit calculated for each dataset candidate, the system outputs a dataset candidate to be used for prediction with the prediction model.

Description

Data prediction support method and data prediction system

The present invention generally relates to data prediction.

In energy businesses such as electric power and gas, in the communication business, and in transportation businesses such as taxi and delivery services, demand values are predicted in order to operate equipment and allocate resources in accordance with consumer demand.

For example, in the electric power business there is a physical constraint that the amount of electricity generated must always match the amount demanded. Because a necessary and sufficient number of generators must be put on standby in advance, electricity demand must be predicted accurately.

Power consumption trends are diversifying with the spread of renewable energy and distributed power sources such as electric vehicles, and retail contract switching is becoming more frequent with electricity liberalization. As a result, changes in power consumption trends occur that cannot be captured by a single demand forecast for an entire jurisdiction area, which makes highly accurate demand forecasting difficult. Techniques for predicting power demand are disclosed in, for example, Patent Documents 1 and 2.

The power demand forecasting system disclosed in Patent Document 1 classifies consumers into a plurality of groups based on performance data indicating the amount of power consumed by a plurality of consumers at each predetermined time, selectively extracts some consumers from each of the groups, and predicts the total power demand, which is the sum of the consumers' power demands, based on the performance data of the extracted consumers.

The power demand prediction device disclosed in Patent Document 2 classifies power measurement data from smart meters into groups with similar trends, and constructs a prediction model and calculates a predicted value for each group.

Patent Document 1: Japanese Patent Application Publication No. 2016-181060
Patent Document 2: Japanese Patent Application Publication No. 2017-182266

It is conceivable to extract, from a plurality of observation data, a dataset composed of two or more observation data, to classify the two or more observation data in the dataset into a plurality of groups, and to construct, for each group, a prediction model that takes observation data (time-series data of observed values) as input and outputs prediction data (time-series data of predicted values).

According to Patent Document 1, a group can be a group of consumers that share attributes such as location, contract type, contract power, and industry. However, even among consumers with the same attributes, power consumption trends may differ depending on whether renewable energy or electric vehicles have been introduced and on differences in the consumers' behavioral characteristics. When observation data with different trends are mixed in the same group, the accuracy of the prediction model decreases.

According to Patent Document 2, a group can be a group whose observation data themselves have similar trends. In this case, however, some groups do not contain enough observation data, and therefore prediction with sufficient accuracy is not possible.

The present invention has been made in consideration of the above points, and an object thereof is to extract a dataset useful for constructing a prediction model with high prediction accuracy.

A data prediction system extracts a plurality of dataset candidates from a data source that includes a plurality of observation data, each having periodicity. Each of the dataset candidates is a set of two or more observation data for a target period. In each dataset candidate, each observation data is all or part of one of the plurality of observation data. For each of the dataset candidates, the data prediction system calculates, for each observation data in the candidate, vector data whose elements are the magnitudes of a plurality of different frequency components of that observation data; classifies the two or more vector data into one or more data groups based on the distance between vector data; and calculates, based on the number of data in each data group, the goodness of fit to a prediction model that takes observation data as input and outputs prediction data. Based on the goodness of fit calculated for each of the dataset candidates, the data prediction system outputs, as a compatible dataset, the dataset candidate to be used in prediction processing with the prediction model.
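The vector data described above can be illustrated with a discrete Fourier transform. The following is a minimal sketch, not the implementation of the invention: the function name `frequency_vector`, the choice of a plain DFT, the exclusion of the constant (zero-frequency) component, and the normalization by the series length are all assumptions made for illustration.

```python
import cmath
import math


def frequency_vector(series, n_components):
    """Return the magnitudes of DFT components k = 1 .. n_components.

    Assumption: the constant (k = 0) component is skipped and each
    magnitude is divided by the series length.
    """
    n = len(series)
    magnitudes = []
    for k in range(1, n_components + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(series)
        )
        magnitudes.append(abs(coeff) / n)
    return magnitudes


# Hypothetical observation data: two days of hourly values with a 24-hour cycle.
daily = [10 + 3 * math.sin(2 * math.pi * h / 24) for h in range(48)]
vector = frequency_vector(daily, n_components=4)

# Two cycles fit in the 48-sample window, so component k = 2 dominates.
dominant = max(range(len(vector)), key=vector.__getitem__) + 1
```

Each observation data in a dataset candidate would yield one such vector, and the distance between vectors then serves as the basis for grouping.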

According to the present invention, it is possible to extract a dataset useful for constructing a prediction model with high prediction accuracy.

FIG. 1 is a diagram showing a device configuration according to a first embodiment of a data prediction system.
FIG. 2 is a diagram showing a device configuration when the embodiment of FIG. 1 is implemented in an electric power supply and demand management system.
FIG. 3 is a diagram showing the internal device configuration according to the first embodiment of the data prediction system.
FIG. 4 is a diagram showing a data flow according to the first embodiment of the data prediction system.
FIG. 5 is a diagram showing a processing flow according to the first embodiment of the data prediction system.
FIG. 6 is a diagram showing details of the matching extraction unit.
FIG. 7 is a diagram showing a processing flow of the matching extraction unit.
FIG. 8 is a diagram showing an overview of a part of the processing of the matching extraction unit.
FIG. 9 is a diagram showing an overview of the rest of the processing of the matching extraction unit.
FIG. 10 is a diagram showing details of the trend classification unit.
FIG. 11 is a diagram showing details of the group prediction unit.
FIG. 12 is a diagram showing details of the overall prediction unit.
FIG. 13 is a diagram showing an overview of an example of an effect.
FIG. 14 is a diagram showing a data flow of the trend classification unit according to a fourth embodiment.
FIG. 15 is a diagram showing a data flow of the matching extraction unit according to a sixth embodiment.
FIG. 16 is a diagram showing a data flow of the matching extraction unit according to a seventh embodiment.
FIG. 17 is a diagram showing a data flow of the matching extraction unit according to an eighth embodiment.

In the following description, an "interface device" may be one or more interface devices. The one or more interface devices may be at least one of the following:
- One or more I/O (Input/Output) interface devices. An I/O interface device is an interface device for at least one of an I/O device and a remote display computer. The I/O interface device for the display computer may be a communication interface device. The at least one I/O device may be a user interface device, for example, either an input device such as a keyboard and a pointing device or an output device such as a display device.
- One or more communication interface devices. The one or more communication interface devices may be one or more communication interface devices of the same type (for example, one or more NICs (Network Interface Cards)), or two or more communication interface devices of different types (for example, an NIC and an HBA (Host Bus Adapter)).

In the following description, "memory" refers to one or more memory devices, and may typically be a main storage device. At least one memory device in the memory may be a volatile memory device or a non-volatile memory device.

In the following description, a "persistent storage device" refers to one or more persistent storage devices. A persistent storage device is typically a non-volatile storage device (for example, an auxiliary storage device), specifically, for example, an HDD (Hard Disk Drive) or an SSD (Solid State Drive).

In the following description, a "storage device" may be, of the memory and the persistent storage device, at least the memory.

In the following description, a "processor" refers to one or more processor devices. The at least one processor device is typically a microprocessor device such as a CPU (Central Processing Unit), but may be another type of processor device such as a GPU (Graphics Processing Unit). The at least one processor device may be single-core or multi-core. The at least one processor device may be a processor core. The at least one processor device may be a processor device in a broad sense, such as a hardware circuit (for example, an FPGA (Field-Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit)) that performs part or all of the processing.

In the following description, a function may be described using the expression "yyy unit". A function may be realized by a processor executing one or more computer programs, by one or more hardware circuits (for example, an FPGA or an ASIC), or by a combination of these. When a function is realized by a processor executing a program, the prescribed processing is performed while using a storage device and/or an interface device as appropriate, so the function may be regarded as at least a part of the processor. Processing described with a function as the subject may be processing performed by a processor or by a device having that processor. A program may be installed from a program source. The program source may be, for example, a program distribution computer or a computer-readable recording medium (for example, a non-transitory recording medium). The description of each function is an example; a plurality of functions may be combined into one function, and one function may be divided into a plurality of functions.

Hereinafter, some embodiments of the present invention will be described in detail with reference to the drawings.
(1) First embodiment
(1-1) Configuration of data prediction system according to this embodiment

FIG. 1 shows the overall device configuration of the data prediction system according to this embodiment.

When applied to the electric power business, for example, the data processing system 1 analyzes observed values of past power demand and calculates predicted values such as the power demand and the transaction price for an arbitrary target period. Based on the predicted values, power supply and demand can be managed, including the formulation and execution of generator operation plans, the formulation and execution of plans for procuring power from other electric utilities, and plans for forming power transmission and distribution facilities.

The data processing system 1 includes a user 2, a data prediction system 3, an observation data provider 4, an observation data storage device 5, an external data provider 6, an external data storage device 7, supply and demand management equipment 8, a control device 9, and a communication path 10. The communication path 10 may be a communication network such as a LAN (Local Area Network) or a WAN (Wide Area Network), and connects the various devices and terminals that make up the data processing system 1 so that they can communicate with one another. The control device 9 may use the prediction data calculated by the data prediction system 3 to create and execute plans for the operation and control of equipment such as generators and communication stations, for market transactions, for equipment formation, and the like.

As a specific example, when the data processing system 1 is implemented in a system that manages power supply and demand, the device configuration shown in FIG. 2 is conceivable.

The user 2 corresponds to the operator of the supply and demand management equipment 8. The observation data provider 4 corresponds to a consumer, and the observation data storage device 5 corresponds to a power measuring device. The external data provider 6 corresponds to a public data provider, and the external data storage device 7 corresponds to a public data storage device. The supply and demand management equipment 8 includes generators, power storage equipment, switches, and the like, and the control device 9 is, for example, at least one of a market transaction management device, a generator control device, a power storage equipment control device, and a switch control device.

Note that the public data may be, for example, data including at least one of the following:
- Weather data such as temperature, humidity, solar radiation, wind speed, and atmospheric pressure.
- Calendar data such as year, month, and day, day of the week, and flag values indicating an arbitrarily defined type of day.
- Data indicating whether sudden events such as typhoons or public events have occurred.
- Data on industrial dynamics (for example, the number of energy consumers, attributes indicating the type of energy consumer such as factory, office, or household, the industry of energy consumers, and production volume and sales amount per industry or per company).
- Data showing the topographic or climatic characteristics of each region.
- Data such as the number of communication terminals connected to a communication base station.
- Past observation data itself.

The observation data storage device 5 is an example of a data source and stores an observation data group. The observation data group is composed of a plurality of observation data, each having periodicity. Observation data is data that serves as input for data prediction, and may be time-series data of past observed values.

The observation data may be, for example, data including at least one of the following. The observation data may be data for each measuring instrument (for example, data for each smart meter), or data aggregated over a plurality of measuring instruments (for example, observation data taken as the average of the observation data of all smart meters belonging to a predetermined area).
- Energy consumption data for electricity, gas, water, and the like.
- Energy production data for solar power generation, wind power generation, and the like.
- Data on the transaction prices of energy traded on wholesale exchanges.
- Communication traffic data measured at communication base stations and the like.
- Historical data of the position information of moving objects such as automobiles.

The observation data storage device 5 searches for observation data, transmits observation data, or both, in response to data acquisition requests from other devices.

The external data storage device 7 stores external data, which is input data for data prediction and is linked to one or more observation data. The external data may be linked in advance, or may be linked dynamically by a processing unit such as the group classification unit 3522. The external data may be data representing past values or data representing future values.

The external data may be, for example, data including at least one of the following. Like the observation data, the external data may be time-series data of values.
- Weather data such as temperature, humidity, solar radiation, wind speed, and atmospheric pressure.
- Calendar data such as year, month, and day, day of the week, and flag values indicating an arbitrarily defined type of day.
- Data indicating whether sudden events such as typhoons or public events have occurred.
- Data on industrial dynamics (for example, the number of energy consumers, attributes indicating the type of energy consumer such as factory, office, or household, the industry of energy consumers, and production volume and sales amount per industry or per company).
- Data showing position information and the topographic or climatic characteristics of each region.
- Data such as the number of communication terminals connected to a communication base station.
- Past observation data itself.

The external data storage device 7 searches for external data, transmits external data, or both, in response to data acquisition requests from other devices.

As described above, the data processing system 1 includes the data prediction system 3 and a power control system (for example, the supply and demand management equipment 8) that controls at least one of power generation equipment and power storage equipment. The data prediction system 3 outputs prediction data by performing prediction processing using a compatible dataset, which is described later. The power control system receives the prediction data, uses it to create a plan for at least one of power generation and power storage, and controls at least one of the power generation equipment and the power storage equipment based on the created plan. Because the prediction accuracy of the data prediction system 3 is high, as described later, the created plan is suitable for control. Suitable power generation and/or power storage is therefore realized, and suitable control of power supply and demand can be expected.
(1-2) Device internal configuration

FIG. 3 shows the device configurations of the data prediction system 3, the observation data storage device 5, and the external data storage device 7 included in the data processing system 1.

The data prediction system 3 includes an input device 32, an output device 33, a communication device 34, a storage device 35, and a CPU (Central Processing Unit) 31 connected to them. The data prediction system 3 is an information processing system such as a personal computer, a server computer, or a handheld computer. The data prediction system 3 may be such a physical computer system (one or more physical computers), or a logical computer system based on a physical computer system (for example, a system provided as a cloud computing service). The input device 32 and the output device 33 may be omitted. The communication device 34 is an example of an interface device, and the CPU 31 is an example of a processor.

The input device 32 is, for example, a keyboard or a mouse, and the output device 33 is, for example, a display or a printer. The communication device 34 includes, for example, an NIC (Network Interface Card) for connecting to a wireless LAN or a wired LAN. The storage device 35 is a storage medium such as a RAM (Random Access Memory) or a ROM (Read Only Memory). The output results and intermediate results of each processing unit may be output as appropriate via the output device 33.

The storage device 35 stores one or more computer programs for realizing functions such as a matching extraction unit 351, a trend classification unit 352, a group prediction unit 353, and an overall prediction unit 354. These functions are realized when the CPU 31 executes these computer programs.

The storage device 35 also has a storage area 355 for storing a compatible dataset 355A and a storage area 356 for storing overall prediction data 356A. The storage areas 355 and 356 may be a single storage area.

The compatible dataset 355A may be database information, text information, or the like used to calculate the overall prediction data 356A, and is a dataset (a set of two or more observation data) extracted as a part of the observation data group 521A (a plurality of observation data).

The overall prediction data 356A is generated as follows. The two or more observation data included in the compatible dataset 355A are classified into one or more groups whose transition patterns are similar. For each group, a prediction model is constructed and prediction data is generated using that prediction model. The overall prediction data 356A is the prediction data for the whole, generated from the prediction data of the individual groups. The overall prediction data 356A may be database information, text information, or image information in which the calculated values are plotted as a graph.
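The group-by-group flow above can be sketched as follows. This is only an illustration under strong assumptions: the text here does not specify the prediction model or how group predictions are combined, so a mean profile per group stands in for a trained model, and the overall prediction is assumed to be the element-wise sum of the group predictions scaled by group size. The names `fit_group_model` and `predict_overall` and the sample values are hypothetical.

```python
def fit_group_model(group_series):
    """Stand-in model (assumption): the mean profile of the group's members."""
    n = len(group_series)
    return [sum(values) / n for values in zip(*group_series)]


def predict_overall(groups):
    """Predict per group, then sum the group predictions element-wise.

    Assumption: each group's prediction is its mean profile multiplied by
    the number of members, and the overall prediction is the sum over groups.
    """
    horizon = len(next(iter(groups.values()))[0])
    overall = [0.0] * horizon
    per_group = {}
    for name, series_list in groups.items():
        profile = fit_group_model(series_list)
        per_group[name] = [p * len(series_list) for p in profile]
        overall = [o + g for o, g in zip(overall, per_group[name])]
    return per_group, overall


# Hypothetical groups of observation data with similar transition patterns.
groups = {
    "daytime_peak": [[1.0, 3.0, 1.0], [2.0, 4.0, 2.0]],
    "night_peak": [[3.0, 1.0, 3.0]],
}
per_group, overall = predict_overall(groups)
```

In practice the stand-in would be replaced by whatever model is trained for each group, possibly with the linked external data as additional input.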

The observation data storage device 5 includes at least a communication device 51, a storage device 52, and a CPU 53 connected to them. The storage device 52 has a storage area 521 for storing the observation data group 521A.

The external data storage device 7 includes at least a communication device 71, a storage device 72, and a CPU 73 connected to them. The storage device 72 has a storage area 721 for storing an external data group 721A. The external data group 721A is composed of a plurality of external data.

The data prediction system 3 performs data prediction using the observation data and the external data acquired from the observation data storage device 5 and the external data storage device 7.

The matching extraction unit 351 extracts a plurality of dataset candidates from the observation data group 521A, which is a set of observation data having periodicity. A dataset candidate is a set of two or more observation data. For each extracted dataset candidate, the matching extraction unit 351 converts each observation data in the candidate into data representing the magnitudes of a plurality of mutually different frequency components, and then maps that data into a multidimensional feature space whose dimensions correspond one-to-one to the frequency components. The matching extraction unit 351 classifies the data into a plurality of groups by grouping data that are close to one another in the feature space. For each dataset candidate, the matching extraction unit 351 calculates an index representing variation, such as the variance, of the number of data contained in each group, and outputs the candidate with the smallest index among the dataset candidates as the compatible dataset to be used for data prediction.
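The selection rule described for the matching extraction unit 351 (group the feature vectors of each candidate, measure how unevenly the data are spread over the groups, and pick the candidate with the smallest variation) can be sketched as below. The grouping method is not fixed by this passage, so the greedy threshold grouping in `group_sizes` is an assumption, as are the function names, the threshold, and the sample vectors; the variation index used is the population variance of the group sizes.

```python
import math
from statistics import pvariance


def group_sizes(vectors, threshold):
    """Greedy grouping (assumption): a vector joins the first group whose
    first member lies within `threshold`; otherwise it starts a new group."""
    leaders, sizes = [], []
    for v in vectors:
        for i, leader in enumerate(leaders):
            if math.dist(v, leader) <= threshold:
                sizes[i] += 1
                break
        else:
            leaders.append(v)
            sizes.append(1)
    return sizes


def select_candidate(candidates, threshold):
    """Return the candidate whose group sizes have the smallest variance."""
    scores = {
        name: pvariance(group_sizes(vectors, threshold))
        for name, vectors in candidates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical feature vectors (magnitudes of two frequency components).
candidates = {
    "candidate_a": [(1.0, 0.1), (1.1, 0.1), (0.9, 0.2), (1.0, 0.0), (0.1, 1.0)],
    "candidate_b": [(1.0, 0.1), (1.1, 0.1), (0.1, 1.0), (0.2, 0.9)],
}
best, scores = select_candidate(candidates, threshold=0.5)
```

Here `candidate_a` splits into groups of 4 and 1, while `candidate_b` splits into two groups of 2, so `candidate_b` has the smaller variance and is selected. Note that a candidate forming a single group also has zero variance under this index, so a practical index might need an additional condition; that refinement is outside what this passage states.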
(1-3) Processing and data flow of data prediction system according to this embodiment

The data flow and the processing flow of the data prediction system 3 in this embodiment are described with reference to FIGS. 4 and 5.

The data prediction system 3 in this embodiment acquires the observation data group 521A from the observation data storage device 5 and the external data group 721A from the external data storage device 7.

The observation data group 521A is input to the matching extraction unit 351. The matching extraction unit 351 extracts a plurality of data set candidates from the input observation data group 521A and outputs one of them as the compatible data set 355A (S301).

The compatible data set 355A is input to the trend classification unit 352 together with the external data group 721A. The trend classification unit 352 links the observation data in the input compatible data set 355A with the external data in the external data group 721A. The trend classification unit 352 converts each piece of observation data in the compatible data set 355A (typically time-series data) into a multidimensional vector whose elements are the magnitudes of frequency components, and groups together data whose vectors are close to each other, thereby classifying the data in the compatible data set 355A and the external data group 721A into one or more groups (S302). Each group contains observation data, and external data is linked to that observation data. The "distance" may be a general distance measure that satisfies the axioms of a metric, such as the Euclidean, Mahalanobis, Manhattan, Chebyshev, or Minkowski distance, or a similarity measure such as cosine similarity. The grouping may be performed by, for example, a hierarchical clustering method such as the Ward, single-linkage, complete-linkage, or centroid method; a neighborhood-optimizing clustering method such as k-means, the EM algorithm, or spectral clustering; or a decision-boundary-optimizing clustering method such as an unsupervised SVM (Support Vector Machine), the VQ algorithm, or SOM (Self-Organizing Maps).
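As a minimal sketch of some of the measures listed above, the following computes the Minkowski distance (which reduces to the Manhattan distance for p = 1 and the Euclidean distance for p = 2), the Chebyshev distance, and the cosine similarity between two vectors. The function names and the sample vectors are illustrative assumptions, and the Mahalanobis distance is omitted because it additionally needs a covariance matrix.

```python
import math


def minkowski(u, v, p):
    """Minkowski distance of order p (p=1: Manhattan, p=2: Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def chebyshev(u, v):
    """Chebyshev distance: the largest coordinate-wise difference."""
    return max(abs(a - b) for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (a similarity, not a metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical two-dimensional frequency-magnitude vectors.
u = [3.0, 0.0]
v = [0.0, 4.0]
euclidean = minkowski(u, v, 2)
manhattan = minkowski(u, v, 1)
largest_gap = chebyshev(u, v)
similarity = cosine_similarity(u, v)
```

Note that a similarity grows as vectors become more alike, so a grouping step that expects "small means close" would have to use something like one minus the similarity instead.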

The groups obtained by the trend classification unit 352 are input to the group-by-group prediction unit 353. The group-by-group prediction unit 353 constructs a prediction model for each group created by the trend classification unit 352, and uses each prediction model to calculate prediction data for that group (for example, a time series of predicted values) (S303).

The group-specific prediction data calculated by the group-by-group prediction unit 353 is input to the overall prediction unit 354. The overall prediction unit 354 calculates and outputs the overall prediction data using the group-specific prediction data calculated by the group-by-group prediction unit 353 and the residual observation data that was not extracted by the matching extraction unit 351 (S304). When the observation data is data for each consumer, the overall prediction data is prediction data for the entire area to which those consumers belong.

The overall prediction data 356A calculated by the overall prediction unit 354 is input to the supply and demand management equipment 8. The supply and demand management equipment 8 uses the input overall prediction data 356A to control generators, power storage equipment, switches, and the like.

The details of each unit are described below.
(1-4) Details of each component
(1-4-1) Matching extraction unit 351

An embodiment of the matching extraction unit 351 is described with reference to FIGS. 6, 7, and 8.

FIG. 6 shows the data flow inside the matching extraction unit 351.

The matching extraction unit 351 includes a candidate generation unit 3511, an index calculation unit 3513, a group classification unit 3514, and a candidate selection unit 3515. The storage area 3512 is an area in the storage device 35, in which the data set candidates 3512A are stored as intermediate output.

The candidate generation unit 3511 extracts a plurality of data set candidates 3512A from the observation data group 521A and stores each data set candidate 3512A in the storage area 3512. The data set candidates may be extracted at random, or by extracting some of the observation data from each of a plurality of data classifications. A data classification may be either of the following.
- A set of observation data with the same label (the label may depend on an attribute of the measuring instrument from which the observation data was obtained, for example, factory, commercial facility, or home).
- A set of observation data with the same or similar statistics (the statistic may be the average, maximum, minimum, variance, or the like of the data transition represented by the observation data).
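The label-based extraction described above could be sketched as follows. This is only an illustration under stated assumptions: the meter identifiers, the two labels, the number drawn per label, and the function name `stratified_candidate` are all hypothetical, and the source does not prescribe a sampling procedure.

```python
import random
from collections import defaultdict


def stratified_candidate(labels, per_label, rng):
    """Draw `per_label` observation ids from every label (data classification).

    labels: dict mapping an observation id to its label.
    """
    by_label = defaultdict(list)
    for obs_id, label in labels.items():
        by_label[label].append(obs_id)
    picked = []
    for label in sorted(by_label):
        # Sample without replacement inside each classification.
        picked.extend(rng.sample(sorted(by_label[label]), per_label))
    return sorted(picked)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Hypothetical observation data group: 4 factory meters and 6 home meters.
labels = {f"m{i}": ("factory" if i < 4 else "home") for i in range(10)}
# N = 3 data set candidates, each with 2 factory and 2 home meters.
candidates = [stratified_candidate(labels, 2, rng) for _ in range(3)]
```

Purely random extraction would replace the per-label loop with a single `rng.sample` over all ids.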

The number of pieces of observation data included in each data set candidate may be determined arbitrarily by the user 2.

For each data set candidate 3512A, the index calculation unit 3513 converts each piece of observation data in the candidate into multidimensional vector data whose elements are the magnitudes of frequency components. The conversion may be, for example, normalization of the values representing the data transition, a Fourier transform or wavelet transform of those values, or both. The frequency components to be calculated by the conversion may be specified in advance. The specified frequency components may be a plurality of different periodic components, selected from periods in units of years, months, weeks, or days, or of 1 hour or 0.5 hours. For example, two or more of the following periodic components, at which characteristic components tend to appear in the electric power field, may be adopted: one year (365 days), half a year (180 days), three months (90 days), one month (30 days), one week (7 days), one day, half a day (12 hours), 6 hours, 1 hour, and 30 minutes (0.5 hours). Instead of being determined in advance, components whose values vary widely among the vector data may be adopted automatically as the frequency components.
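One way to picture this conversion is a discrete Fourier sum evaluated only at the periods specified in advance. The sketch below is an assumption-laden illustration rather than the method of the embodiment: the function name `periodicity_vector`, the mean removal, the amplitude scaling, and the synthetic hourly series are all choices made here.

```python
import cmath
import math


def periodicity_vector(series, periods):
    """Return the Fourier amplitude of `series` at each period (in samples)."""
    n = len(series)
    mean = sum(series) / n
    vector = []
    for period in periods:
        # Correlate the mean-removed series with a complex sinusoid
        # whose period is `period` samples.
        coeff = sum((x - mean) * cmath.exp(-2j * math.pi * t / period)
                    for t, x in enumerate(series))
        vector.append(2.0 * abs(coeff) / n)
    return vector


# Synthetic hourly consumption over 4 days: a daily component of
# amplitude 3 and a half-daily component of amplitude 1 around a base of 10.
series = [10 + 3 * math.sin(2 * math.pi * t / 24)
          + 1 * math.sin(2 * math.pi * t / 12) for t in range(96)]
vec = periodicity_vector(series, [24, 12])  # [daily, half-daily] magnitudes
```

The amplitudes are recovered cleanly here because the series length is a whole multiple of both periods; with other lengths the values would be affected by spectral leakage.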

For each data set candidate 3512A, the group classification unit 3514 groups together vector data that are close to each other after conversion by the index calculation unit 3513, thereby classifying the two or more pieces of vector data calculated for the two or more pieces of observation data in the compatible data set 355A into one or more groups. The number of groups may be determined arbitrarily in advance, or may be the number that minimizes an information criterion such as AIC (Akaike Information Criterion) or BIC (Bayesian Information Criterion).
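For reference, the two information criteria named above have the following standard definitions, where $\hat{L}$ is the maximized likelihood of the clustering model fitted with $G$ groups, $k$ is its number of free parameters, and $n$ is the number of pieces of vector data. The symbols $G$, $k$, $n$, and $\hat{L}$ are notation introduced here for illustration, and the source does not state which likelihood model is assumed.

```latex
\mathrm{AIC}(G) = 2k - 2\ln\hat{L}, \qquad
\mathrm{BIC}(G) = k\ln n - 2\ln\hat{L}, \qquad
G^{*} = \operatorname*{arg\,min}_{G}\ \mathrm{IC}(G)
```

Here $\mathrm{IC}$ stands for whichever of the two criteria is used, and $G^{*}$ is the number of groups that would be selected.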

The candidate selection unit 3515 calculates, for each data set candidate 3512A, an index representing variation, such as the variance, of the number of data in each group. The candidate selection unit 3515 outputs the data set candidate 3512A with the smallest index among the plurality of data set candidates 3512A as the compatible data set 355A to be used for data prediction (prediction processing using a prediction model).

FIG. 7 shows the processing flow of the matching extraction unit 351. This processing flow corresponds to the internal processing of S301 shown in FIG. 5.

The matching extraction unit 351 repeats the processing of S3012 to S3014 N times (S3011). Instead of repeating the sequence S3012 to S3014 N times, the matching extraction unit 351 may repeat each of S3012, S3013, and S3014 N times in turn. N is an integer of 2 or more.

First, the matching extraction unit 351 extracts a data set candidate 3512A from the observation data group 521A (S3012).

Next, the index calculation unit 3513 converts each piece of observation data in the data set candidate 3512A extracted in S3012 into multidimensional vector data (a periodicity index) whose elements are the magnitudes of frequency components (S3013).

Next, the group classification unit 3514 classifies the two or more pieces of vector data obtained for the two or more pieces of observation data in the data set candidate 3512A into a plurality of groups by grouping together vector data that are close to each other (S3014). A group (data group) is a group of vector data in which the distance between vector data is equal to or less than a threshold. The threshold may apply to the distance from representative vector data determined from the two or more pieces of vector data, or to the relative distance between the calculated vector data.
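The threshold-based reading of a group can be illustrated with a simple single-pass grouping in which each vector joins the first group whose representative lies within the threshold. This is a hedged sketch only: taking the first member of a group as its representative, the Euclidean distance, the threshold value, and the sample vectors are assumptions made here, and the clustering methods named earlier could be substituted.

```python
import math


def group_by_threshold(vectors, threshold):
    """Assign each vector to the first group whose representative is within
    `threshold`; open a new group when no representative is close enough."""
    representatives = []  # assumption: a group's first member represents it
    assignments = []
    for vec in vectors:
        for gid, rep in enumerate(representatives):
            if math.dist(vec, rep) <= threshold:
                assignments.append(gid)
                break
        else:
            representatives.append(vec)
            assignments.append(len(representatives) - 1)
    return assignments


# Hypothetical (daily, half-daily) magnitude vectors of five meters.
vectors = [(3.0, 1.0), (3.1, 0.9), (0.5, 2.0), (0.4, 2.2), (3.0, 1.2)]
groups = group_by_threshold(vectors, 0.5)
```

With these inputs the first, second, and fifth vectors share one group and the third and fourth share another.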

Finally, for each of the N data set candidates, the candidate selection unit 3515 calculates an index representing variation, such as the variance, of the number of data in each group of that candidate, and outputs the data set candidate with the smallest index as the compatible data set 355A (S3015).
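The selection step can be sketched as follows, using the population variance of the per-group counts as the variation index. The candidate names, the group assignments, and the decision to count empty groups as zero are illustrative assumptions.

```python
from collections import Counter
from statistics import pvariance


def group_size_variance(assignments, n_groups):
    """Population variance of the number of data in each of `n_groups` groups.

    Groups that received no data count as size 0 (an assumption made here so
    that a candidate collapsing into one group is not scored as balanced).
    """
    counts = Counter(assignments)
    sizes = [counts.get(gid, 0) for gid in range(n_groups)]
    return pvariance(sizes)


# Hypothetical group assignments of nine observations per candidate.
candidates = {
    "candidate 1": [0] * 7 + [1] + [2],          # group sizes 7, 1, 1
    "candidate 2": [0] * 3 + [1] * 3 + [2] * 3,  # group sizes 3, 3, 3
}
scores = {name: group_size_variance(a, 3) for name, a in candidates.items()}
best = min(scores, key=scores.get)  # candidate with the smallest index
```

Here candidate 1 scores 8 and candidate 2 scores 0, so candidate 2 would be output as the compatible data set.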

This completes the extraction process performed by the matching extraction unit 351.

The processing of the matching extraction unit 351 is described more specifically with reference to FIGS. 8A and 8B. As an example, the observation data in the observation data group 521A is assumed to be time-series data of power consumption.

First, the candidate generation unit 3511 extracts N data set candidates from the observation data group 521A. Each data set candidate includes two or more pieces of observation data 521B1 (power consumption transition data). The number of pieces of observation data may be the same or different among the N data set candidates. Each of the N data set candidates consists of two or more pieces of observation data for a target period. The "target period" may be a past period, such as the past year or half year up to the present, and that past period may correspond to the prediction target period (a future period); for example, when the prediction target period is January to December, the past period may also be January to December.

Next, the index calculation unit 3513 applies a Fourier transform to each piece of observation data 521B1, converting it into vector data 521B2 whose elements are the magnitudes of frequency components. A transform other than the Fourier transform, such as a wavelet transform, may be employed. In FIG. 8A, for convenience, the observation data 521B1 is converted into two-dimensional vector data consisting of a one-day periodic component and a half-day periodic component. Instead, it may be converted into higher-dimensional vector data that also includes long-period components, such as one-year and half-year periods, and short-period components, such as one-hour and 30-minute periods.

Next, the group classification unit 3514 groups the vector data of frequency-component magnitudes so that data separated by a small distance fall into the same group. As an example, groups 1 to 3, delimited by the group boundaries 521B3 shown as dotted lines, are created. To simplify the explanation, data set candidates 1 and 2 have the same number of groups, but the number of groups may differ from one data set candidate to another.

Next, the candidate selection unit 3515 counts, for each data set candidate, the number of data in each group (see reference numeral 521B4). From these counts, the candidate selection unit 3515 calculates, for each data set candidate, the variance of the number of data per group (see reference numeral 521B5). Finally, the candidate selection unit 3515 outputs data set candidate 2, which has the smallest variance, as the compatible data set 355A.
(1-4-2) Trend classification unit 352

The trend classification unit 352 receives the compatible data set 355A and the external data group 721A as input, and links the observation data in the compatible data set 355A with the external data in the external data group 721A. The trend classification unit 352 classifies the two or more pieces of observation data in the compatible data set 355A into one or more groups according to the degree of similarity between the feature data (vector data) of those pieces of observation data.

The processing of the trend classification unit 352 is described more specifically with reference to FIG. 9.

The trend classification unit 352 includes a feature conversion unit 3521 and a group classification unit 3522. The storage area 3523 is an area of the storage device 35, and stores the observation data with group information 3523A and the external data with group information 3523B as intermediate output.

The feature conversion unit 3521 receives the compatible data set 355A as input and converts each piece of observation data in the compatible data set 355A into feature data. The feature data may be the multidimensional vector data (periodicity index) described above.

The group classification unit 3522 receives as input the vector data for each piece of observation data in the compatible data set 355A (the feature data output from the feature conversion unit 3521) and the external data group 721A. The group classification unit 3522 links each piece of vector data with the external data linked to the corresponding observation data. Next, the group classification unit 3522 groups together vector data that are close to each other, thereby classifying the two or more pieces of vector data for the two or more pieces of observation data in the compatible data set 355A into one or more groups. For these groups (data groups), the distance, the classification into groups (grouping process), and the number of groups may each be as described above for the matching extraction unit 351.

Finally, the group classification unit 3522 stores the observation data with group information 3523A and the external data with group information 3523B in the storage area 3523 and outputs them to the group-by-group prediction unit 353. The observation data with group information 3523A includes each piece of observation data in the compatible data set 355A, each linked with group information indicating the group into which its feature data was classified. The external data with group information 3523B includes the external data linked to the observation data in the compatible data set 355A, each piece linked with group information indicating the group into which the feature data of its associated observation data was classified.
(1-4-3) Group-by-group prediction unit 353

The group-by-group prediction unit 353 receives as input the observation data with group information 3523A and the external data with group information 3523B output from the trend classification unit 352, together with the external data group 721A, and for each group constructs a prediction model and calculates prediction data for the prediction target period.

The processing of the group-by-group prediction unit 353 is described more specifically with reference to FIG. 10.

The group-by-group prediction unit 353 includes a construction unit 3531 and a group calculation unit 3532. The storage area 3533 is an area of the storage device 35, and stores the group-specific prediction data 3533A.

The construction unit 3531 receives the observation data with group information 3523A and the external data with group information 3523B as input, and constructs a prediction model for each group. For each group, the prediction model may be a regression, classification, clustering, or similar model, or a combination of such models, whose objective variable is the data in the observation data with group information 3523A that belongs to the group and whose explanatory variable is the data in the external data with group information 3523B that belongs to the group. The average or median of the data belonging to a group in the observation data with group information 3523A may be used to construct the prediction model. A different model may be adopted for each group.

The group calculation unit 3532 receives as input the group-specific prediction models output from the construction unit 3531 and the external data group 721A, and calculates prediction data for the prediction target period of each group. The external data group 721A may include actual values or forecast values, for the prediction target period, of the data linked to the observation data used to construct the prediction model of the group for which prediction data is calculated.
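As a minimal sketch of model construction and prediction per group, the following fits one least-squares line per group, taking the group mean of the observation data as the objective variable and a single external series as the explanatory variable. Everything concrete here is an assumption for illustration: the source allows many model types, and the function names, the single temperature series shared by all groups, and the sample values are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y ~ a * x + b with one explanatory variable."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx


def build_group_models(observations, groups, external):
    """Fit one model per group on the group mean of its observation data."""
    models = {}
    for gid in sorted(set(groups.values())):
        members = [observations[i] for i in observations if groups[i] == gid]
        target = [mean(vals) for vals in zip(*members)]  # mean per time step
        models[gid] = fit_line(external, target)
    return models


def predict(models, external_forecast):
    """Apply each group's model to external values for the target period."""
    return {gid: [a * x + b for x in external_forecast]
            for gid, (a, b) in models.items()}


# Hypothetical data: temperature as external data, three meters, two groups.
temperature = [10.0, 20.0, 30.0, 40.0]
observations = {
    "m1": [21.0, 41.0, 61.0, 81.0],
    "m2": [23.0, 43.0, 63.0, 83.0],
    "m3": [40.0, 30.0, 20.0, 10.0],
}
groups = {"m1": 0, "m2": 0, "m3": 1}
models = build_group_models(observations, groups, temperature)
forecast = predict(models, [25.0])  # forecast temperature for the target period
```

Because the model is fitted on the group mean, its output is a per-member average; scaling to a group total would be a separate step.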

Finally, the group calculation unit 3532 stores the prediction data calculated for each group in the storage area 3533 as the group-specific prediction data 3533A, and outputs it to the overall prediction unit 354.
(1-4-4) Overall prediction unit 354

The overall prediction unit 354 receives as input the group-specific prediction data 3533A output from the group-by-group prediction unit 353, the observation data group 521A, the compatible data set 355A, and the external data group 721A, and calculates the overall prediction data (for example, prediction data for the entire area to which the consumers corresponding to the observation data in the compatible data set 355A belong).

The processing of the overall prediction unit 354 is described more specifically with reference to FIG. 11.

The overall prediction unit 354 includes a residual prediction unit 3541 and an overall calculation unit 3542.

The residual prediction unit 3541 first creates residual data by summing the observation data in the observation data group 521A other than the compatible data set 355A. The residual data may instead be the set of observation data in the observation data group 521A other than the compatible data set 355A. Next, the residual prediction unit 3541 constructs a prediction model with the residual data as the objective variable and, as the explanatory variable, the data in the external data group 721A that is linked to the residual data. The prediction model may be a regression, classification, clustering, or similar model, or a combination of such models. The average or median of the residual data may be used to construct the prediction model. Finally, the residual prediction unit 3541 inputs into the prediction model the actual or predicted values, for the prediction target period, of the external data group 721A used as the explanatory variable, calculates the predicted value of the residual data, and outputs it to the overall calculation unit 3542.

The overall calculation unit 3542 uses the group-specific prediction data 3533A and the predicted value of the residual data output from the residual prediction unit 3541 to calculate the predicted value for all consumers. Specifically, for example, the overall calculation unit 3542 sums the values of the group-specific prediction data 3533A and the predicted value of the residual data, and outputs the result as the overall prediction data 356A.
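The residual data and the final summation can be sketched in a few lines. The function names and sample values below are hypothetical, and the sketch assumes that the group-specific predictions are already expressed as group totals so that a plain sum is meaningful.

```python
def residual_series(observations, compatible_ids):
    """Sum, per time step, the observation data outside the compatible data set."""
    rest = [series for obs_id, series in observations.items()
            if obs_id not in compatible_ids]
    return [sum(vals) for vals in zip(*rest)]


def overall_prediction(group_predictions, residual_prediction):
    """Sum, per time step, every group's prediction and the residual prediction."""
    series = list(group_predictions.values()) + [residual_prediction]
    return [sum(vals) for vals in zip(*series)]


# Hypothetical observation data group; m1 and m2 form the compatible data set.
observations = {"m1": [1, 2], "m2": [3, 4], "m3": [5, 6], "m4": [7, 8]}
residual = residual_series(observations, {"m1", "m2"})

# Hypothetical predictions for two groups plus a residual prediction.
total = overall_prediction({0: [10, 11], 1: [20, 21]}, [12, 14])
```

In the embodiment the residual prediction would come from its own prediction model; the literal list used here only stands in for that output.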

With the above processing, the first calculation process in this embodiment ends, and the calculation process of the data prediction system 3 in this embodiment ends.
(1-5) Description of an example of the effect of this embodiment

The effects of the data prediction system 3 according to this embodiment are described with reference to FIG. 12.

FIG. 12 is a diagram outlining an example of the effect obtained when prediction data for the prediction target period (for example, the observation data predicted for that period) is calculated using the compatible data set 355A extracted by the matching extraction unit 351. As an example, the observation data is power consumption collected by smart meters. As shown in the legend 24, in each graph the dotted line represents the observation data (the time series of observed values in the past period corresponding to the prediction target period), the solid line represents the prediction data (the time series of values predicted for the prediction target period), and the broken line represents the observation data to be predicted (the time series of values actually observed in the prediction target period). The observation data shown by the dotted line is the observation data included in the compatible data set 355A. The prediction data shown by the solid line is the data output as the group-specific prediction data 3533A or the overall prediction data 356A. The black dots 25 schematically represent the vector data (data converted from the observation data) calculated by the index calculation unit 3513 and plotted in the feature space. The feature space is a coordinate space whose axes are the different periodic components, and the vector data is plotted in it, schematically, as coordinates.

Case 21 is a case in which observation data is extracted so that the breakdown of attributes representing the type of consumer, such as factory, office, or general household, is uniform, and prediction data is calculated for each group of data sharing the same attribute. In this case there are two attributes, factory and household, and the data is classified into groups by attribute. Group A, which has the factory attribute, contains a mixture of data with different power consumption trends. The difference in trends is caused by, for example, differences in the amount of solar power generation installed or in the type of business. Because data with different trends is mixed in group A, demand cannot be predicted accurately for group A, and as a result the overall prediction data that is finally output deviates from the observation data to be predicted.

Case 22 is a case in which observation data is extracted so that the breakdown of attributes representing the type of consumer, such as factory, office, or general household, is uniform, and prediction data is calculated for each group of data that are separated by a small distance in the feature space. In this case, data with the same attribute but different trends is classified into different groups, but a group such as group B may arise for which sufficient training data to construct a prediction model cannot be secured (that is, a group with an extremely small number of data). As a result, demand prediction for that group becomes difficult, and errors remain in the overall prediction data.

Case 23 is a case in which observation data is extracted by the extraction method according to this embodiment, and prediction data is calculated for each group of data that are separated by a small distance in the feature space. In this case, a data set is selected for which the variation (for example, the variance) in the number of data per group is small, so the number of data tends to be even across groups. As a result, groups with too few data are prevented from occurring, and accurate demand prediction becomes possible.

That is, this embodiment can solve a technical problem of classifying data into groups of the same attribute and predicting for each group, specifically, that prediction accuracy may decrease because data with different trends in observed values is mixed in a group of the same attribute. This is because, for each data set candidate, vector data whose elements are the magnitudes of a plurality of different frequency components is calculated for each piece of observation data, and the data is classified into groups based on the distance between the vector data, so that mixing data with different trends in observed values in the same group is avoided.

The present embodiment can also solve a technical problem of classifying data into groups with similar observed-value trends and predicting per group, specifically, that prediction accuracy can decrease because the number of data needed for prediction is insufficient. This is because a plurality of dataset candidates is extracted, and the compatible dataset 355A is the candidate with the smallest variation in the number of data per group, that is, a dataset for which the number of data for building a prediction model is expected not to be extremely insufficient for any group.
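
As a rough sketch of this selection, the variance of the per-group data counts can be computed for each candidate and the candidate with the smallest variance chosen. The candidate names and group labels below are hypothetical, and the sketch assumes every candidate has been classified into the same number of groups (a candidate with a single group would trivially have zero variance).

```python
from collections import Counter
from statistics import pvariance


def count_variance(group_labels):
    """Population variance of the number of data in each group."""
    counts = list(Counter(group_labels).values())
    return pvariance(counts)


def select_compatible(candidates):
    """candidates: dict mapping a candidate name to the group label of each
    observation data in that candidate. Returns the name with least variance."""
    return min(candidates, key=lambda name: count_variance(candidates[name]))


# Hypothetical candidates of 12 observation data each, grouped into A/B/C.
candidates = {
    "candidate_1": ["A"] * 8 + ["B"] * 1 + ["C"] * 3,  # group B is starved
    "candidate_2": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,  # evenly filled
}
compatible = select_compatible(candidates)
```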

For each of the plurality of periodic components, the period may be any period (for example, any of: a period longer than one year, one year, half a year, three months, one month, one week, one day, half a day, six hours, one hour, and 0.5 hours). Because these are periods at which periodicity of the observed values is expected, the compatible dataset 355A can be more strongly expected to be appropriate. Frequency components and time components may also be used together. For example, a Fourier transform may be employed when only frequency components are used, and a wavelet transform may be employed when frequency components and time components are used together.

(2) Second embodiment (change in processing frequency of matching extraction unit 351)

In the first embodiment, the matching extraction unit 351 is executed for every prediction process to newly extract the compatible dataset 355A; however, this is not limiting, and whether the matching extraction unit 351 executes the extraction process may be decided arbitrarily.

Specifically, execution of the matching extraction unit 351 may be triggered by, for example, the elapse of a fixed period arbitrarily set by the user 2, the completion of a fixed number of executions, or an increase or decrease beyond a certain level in the number of consumers stored in the observation data storage device 5.
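
The three example triggers can be combined in a simple check such as the following sketch. The field names and the default limits (30 days, 100 predictions, a 5% change in consumers) are illustrative assumptions, not values given in the embodiment.

```python
from dataclasses import dataclass


@dataclass
class ExtractionState:
    days_since_extraction: int
    predictions_since_extraction: int
    consumers_at_extraction: int


def should_reextract(state, current_consumers,
                     max_days=30, max_predictions=100, max_consumer_change=0.05):
    """Return True when any of the three example triggers has fired."""
    change = abs(current_consumers - state.consumers_at_extraction)
    change_ratio = change / state.consumers_at_extraction
    return (state.days_since_extraction >= max_days
            or state.predictions_since_extraction >= max_predictions
            or change_ratio >= max_consumer_change)


state = ExtractionState(days_since_extraction=3,
                        predictions_since_extraction=12,
                        consumers_at_extraction=1000)
after_small_change = should_reextract(state, current_consumers=1010)  # 1% change
after_large_change = should_reextract(state, current_consumers=1100)  # 10% change
```

When the check returns False, the previously extracted compatible dataset would simply be reused for the next prediction.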

Making execution of the extraction process by the matching extraction unit 351 optional has the effect of reducing the processing load.

(3) Third embodiment (change in processing content of candidate selection unit 3515)

In the first embodiment, when the candidate selection unit 3515 in the matching extraction unit 351 selects the compatible dataset 355A from the plurality of dataset candidates 3512A, it uses the variance (an example of variation) of the number of data per group in each candidate as the evaluation index; however, this is not limiting, and an extended evaluation index may be used (the evaluation index of a dataset candidate is an example of the degree of fit to a prediction model that takes observation data as input and outputs prediction data).

Specifically, when evaluating one dataset candidate 3512A, the candidate selection unit 3515 may, in addition to using the variance for that candidate as an evaluation index, calculate an evaluation index for the residual data of the dataset candidate 3512A (the observation data in the observation data group 521A other than the dataset candidate 3512A). That is, each observation data in the residual data may also be converted into the vector data described above, and the vector data may be classified into one or more groups. The candidate selection unit 3515 may take the linear sum of the variance of the dataset candidate 3512A and the evaluation index of the residual data as an overall evaluation index, and may use that overall evaluation index as the evaluation index of the dataset candidate 3512A.

The residual data may be the sum of the observation data in the observation data group 521A other than the dataset candidate 3512A, and may be data obtained by the same method as in the residual prediction unit 3541. The candidate selection unit 3515 may, by the same method as the residual prediction unit 3541, construct a prediction model that predicts the residual data (a prediction model with the residual data as its objective variable), calculate the training error or likelihood of that prediction model, and use the training error or likelihood as the evaluation index of the residual data.

As described above, the overall evaluation index may be the linear sum of the index representing the variation of the dataset candidate 3512A and the evaluation index of the residual data. The weights of the linear sum may be arbitrary values set in advance, or may be the ratio between the number of data in the extracted observation data group and the number of data in the observation data group that was not extracted. When the likelihood of the prediction model is used as the evaluation index of the residual data, the linear sum may be the linear sum of the variation index and either the reciprocal of the likelihood or the training error.

Finally, the candidate selection unit 3515 outputs the dataset candidate with the smallest overall evaluation index value as the compatible dataset 355A.
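
The extended evaluation index can be illustrated with the short sketch below. It assumes the data-count ratio is used for the weights, and it assumes that the share of extracted data weights the variance term while the share of residual data weights the residual term; the text does not fix which weight goes with which term. All numbers are hypothetical.

```python
def overall_index(count_variance, residual_index, n_extracted, n_residual):
    """Linear sum of the two indices, weighted by the share of data in each part
    (the assignment of weights to terms is an assumption of this sketch)."""
    total = n_extracted + n_residual
    return ((n_extracted / total) * count_variance
            + (n_residual / total) * residual_index)


# Hypothetical candidates: (variance of group counts, training error of the
# residual-data model, number of extracted data, number of residual data).
candidates = {
    "candidate_1": (2.0, 10.0, 20, 80),
    "candidate_2": (3.0, 4.0, 20, 80),
}
scores = {name: overall_index(*values) for name, values in candidates.items()}
compatible = min(scores, key=scores.get)  # smallest overall index wins
```

In this example candidate_1 has the more even groups, but candidate_2 is selected because its residual data is much easier to predict. In practice the two terms are on different scales, so some normalization before summing would likely be needed.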

Using the extended evaluation index makes it possible to extract a compatible dataset 355A that allows more accurate prediction data to be calculated.

(4) Fourth embodiment (change in configuration of trend classification unit 352)

In the first embodiment, the number of groups in the trend classification unit 352 is, for example, a number arbitrarily decided in advance; however, this is not limiting, and the number of groups may be the number that maximizes the accuracy verified using observation data from the most recent past before the prediction target period.

This will be described concretely with reference to FIG. 13. The trend classification unit 352 has a group structure determination unit 3524, which is an extension of the group classification unit 3522.

The group structure determination unit 3524 is composed of a group number candidate setting unit 35241, a group classification unit 35242, an accuracy verification unit 35243, and a group number determination unit 35244.

The group number candidate setting unit 35241 receives as input, from the feature conversion unit 3521, the vector data obtained by converting each observation data in the compatible dataset 355A into vector data whose elements are the magnitudes of a plurality of different frequency components, and generates candidate values for the number of groups to use when classifying the vector data into groups. The group number candidates may be a set of natural numbers arbitrarily decided in advance, or a set of values obtained by stepping, in arbitrarily decided increments, from a minimum to a maximum number of groups decided based on the number of input data. Finally, the group number candidate setting unit 35241 outputs the vector data and the set of group number candidates to the group classification unit 35242.

The group classification unit 35242 receives as input the vector data and the set of group number candidates output from the group number candidate setting unit 35241, and the external data group 721A. Next, for each group number candidate, the group classification unit 35242 groups together the vector data that are close to one another, thereby classifying each vector data, and the external data group 721A linked to it, into the number of groups specified by that candidate. Finally, the group classification unit 35242 outputs each group number candidate, together with the corresponding observation data with group information and external data with group information, to the accuracy verification unit 35243.

The accuracy verification unit 35243 receives as input the group number candidates output from the group classification unit 35242, together with the corresponding observation data with group information and external data with group information, and builds a prediction model and calculates predicted values for each group number candidate. The prediction model may be built and the predicted values calculated by the same method as in the group-specific prediction unit 353 and the overall prediction unit 354. Here, the prediction target period used for verification is preferably a period that is as close as possible to the target period the data prediction system 3 is to predict and for which observation data is available as measured values. Next, the accuracy verification unit 35243 compares the predicted values calculated for each group number candidate with the observation data obtained as actual values, and evaluates the prediction accuracy. The accuracy index may be calculated using an error measure such as the absolute error. Finally, the accuracy verification unit 35243 outputs the observation data with group information, the external data with group information, and the accuracy index value for each group number candidate to the group number determination unit 35244.

The group number determination unit 35244 receives as input, from the accuracy verification unit 35243, the observation data with group information, the external data with group information, and the accuracy index value for each group number candidate, and determines the number of groups that gives the best accuracy. Specifically, it compares the error measure of each group number candidate and outputs the observation data with group information and the external data with group information for the group number candidate with the smallest error.
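
The comparison performed across group number candidates can be sketched as below. The predictions for each candidate would come from models built per candidate; here they are hypothetical fixed values, and the mean absolute error is used as one possible error measure.

```python
def mean_absolute_error(predicted, actual):
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def choose_group_count(predictions_by_count, actual):
    """Return the group count whose verification-period prediction has the
    smallest mean absolute error, together with the error of every candidate."""
    errors = {k: mean_absolute_error(pred, actual)
              for k, pred in predictions_by_count.items()}
    return min(errors, key=errors.get), errors


# Hypothetical actual values for a recent verification period, and the
# predictions obtained with 2, 3, and 4 groups.
actual = [100.0, 120.0, 110.0]
predictions_by_count = {
    2: [90.0, 130.0, 100.0],
    3: [98.0, 123.0, 111.0],
    4: [105.0, 112.0, 118.0],
}
best_count, errors = choose_group_count(predictions_by_count, actual)
```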

Using the group structure determination unit 3524 makes it possible to calculate the predicted values for the target period more accurately, at the cost of an increased processing load.

(5) Fifth embodiment (change in configuration of overall prediction unit 354)

In the first embodiment, the overall prediction unit 354 calculates the overall prediction data 356A from the group-specific prediction data 3533A and the calculation result of the residual prediction unit 3541; however, this is not limiting, and the overall prediction data 356A may be calculated using only the group-specific prediction data 3533A, without using the residual prediction unit 3541.

In that case, the overall calculation unit 3542 calculates a coefficient by dividing the total number of consumers by the number of consumers from which the observation data in the compatible dataset 355A was acquired, and calculates the overall prediction data 356A by multiplying the sum of the group-specific prediction data 3533A by that coefficient.
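
A minimal sketch of this scaling follows. The function name and the numbers are illustrative assumptions; each group-specific prediction is taken to be a list of values over the same time steps.

```python
def overall_prediction(group_predictions, total_consumers, sampled_consumers):
    """Scale the summed group-specific predictions up to all consumers."""
    coefficient = total_consumers / sampled_consumers
    n_steps = len(group_predictions[0])
    summed = [sum(group[t] for group in group_predictions) for t in range(n_steps)]
    return [coefficient * value for value in summed]


# Hypothetical: two groups predicted over two time steps, from a compatible
# dataset covering 100 of 1000 consumers.
overall = overall_prediction([[10.0, 12.0], [5.0, 6.0]],
                             total_consumers=1000, sampled_consumers=100)
```

With these values the coefficient is 10, so the summed predictions 15.0 and 18.0 scale to 150.0 and 180.0.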

Calculating the overall prediction data 356A without using the residual prediction unit 3541 makes it possible to obtain the overall prediction data 356A with a low processing load, at the cost of a certain increase in prediction error.

(6) Sixth embodiment (change in configuration of matching extraction unit 351)

In the first embodiment, the number of data included in the compatible dataset 355A in the matching extraction unit 351 may be decided arbitrarily by the user 2; however, this is not limiting, and the data size may be decided by taking into account the relationship between the data size and the accuracy index obtained when a compatible dataset of that size is used, the processing load when a compatible dataset of that size is used, or both.

This will be described concretely with reference to FIG. 14. The matching extraction unit 351 may include a data size determination unit 3516. When selecting the dataset candidate to output as the compatible dataset, the matching extraction unit 351 uses the data size determined by the data size determination unit 3516, in addition to the index of variation of the dataset candidates or the evaluation index described in the third embodiment.

The data size determination unit 3516 models the relationship between data size and prediction accuracy, processing load, or both, from the data sizes of compatible datasets, the actual accuracy of the prediction results obtained as the overall prediction data 356A, and the actual processing load when the predicted values were calculated. It receives information on the target prediction accuracy, processing load, or both from the input device 32, and identifies from the model (an example of a relational model) a data size that satisfies the target. The data size determination unit 3516 outputs the identified data size to the candidate selection unit 3515, and the candidate selection unit 3515 narrows the dataset candidates down to those having the input data size and then selects the dataset candidate to output as the compatible dataset.

This is not limiting: instead of the actual prediction accuracy, the data size determination unit 3516 may use the evaluation index of each dataset candidate calculated by the candidate selection unit 3515, and may narrow the data sizes down to those for which the evaluation index exceeds a certain threshold.

The data size determined by the data size determination unit 3516 may also be output to the candidate generation unit 3511 instead of the candidate selection unit 3515. In that case, the candidate generation unit 3511 generates dataset candidates having the data size received as input.

As described above, the data size determination unit 3516 may construct a relational model, which is a model that takes (A) below as output and at least one of (B) and (C) below as input, and may use the relational model to estimate a data size that satisfies a condition on at least one of (B) and (C).
(A) The data size of a dataset composed of two or more observation data.
(B) The prediction accuracy when the prediction process is performed using a dataset of that data size, or the degree of fit of that dataset to the prediction model.
(C) The processing load when the prediction process is performed using a dataset of that data size.

The data size determination unit 3516 may use the relational model to estimate a data size that satisfies a condition on at least one of (B) and (C). The compatible dataset may be a dataset of the estimated data size.
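
One very simple form such a relational model could take is a least-squares line that maps a prediction error to a data size, fitted on past results. The sketch below assumes this linear form and uses hypothetical history values; the embodiment does not specify the model, and a real relationship would likely be nonlinear and could also take the processing load as input.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical history: data size of past compatible datasets and the
# prediction error (for example, in percent) observed with each size.
past_sizes = [100, 200, 300, 400]
past_errors = [8.0, 6.0, 4.0, 2.0]

# Relational model: input (B) prediction error, output (A) data size.
slope, intercept = fit_line(past_errors, past_sizes)


def estimate_data_size(target_error):
    return slope * target_error + intercept


size_for_target = estimate_data_size(5.0)
```

With this history the fitted line is size = 500 - 50 * error, so a target error of 5.0 corresponds to an estimated data size of 250.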

By using the data size determination unit 3516 and running the prediction process over a certain past period, trends of deterioration or improvement in prediction accuracy can be grasped, and whether the compatible dataset 355A needs to be updated can be judged.

(7) Seventh embodiment (addition of external data input to matching extraction unit 351)

In the first embodiment, the matching extraction unit 351 performs group classification using only the index indicating the periodicity of the observation data; however, this is not limiting, and the classification may also use information from external data.

This will be described concretely with reference to FIG. 15. The group classification unit 3514 also receives the external data group 721A as input and performs group classification using the information in the external data as well. The external data may be, for example, position information of the entity at which the observation data was observed. Taking electricity demand prediction as an example, placing data from electricity consumers whose positions are close to one another in the same group makes it possible to reflect location-dependent influences such as weather and topography accurately in the prediction model, and an improvement in prediction accuracy can be expected.

(8) Eighth embodiment (addition of observation data complementation unit 3517 to matching extraction unit 351)

In the first embodiment, the observation data group 521A input to the matching extraction unit 351 is used as is; however, this is not limiting, and the past actual values of one observation data may be substituted with the past actual values of other observation data.

This will be described concretely with reference to FIG. 16. The matching extraction unit 351 includes an observation data complementation unit 3517. In the matching extraction unit 351, the observation data group 521A is processed by the observation data complementation unit 3517 and then input to the candidate generation unit 3511.

As the concrete processing, for an observation data series for which a sufficient amount of past actual values has not been accumulated, one or more similar observation data series may be identified from recent trends, and the past actual values of the target series may be complemented with the past actual values of the similar observation data, or with a statistic such as the average of the past actual values of a plurality of similar observation data. When deciding which observation data to use for complementation, external data such as position information and attribute information linked to the candidate observation data and to the observation data to be complemented may be compared, and the decision may be made from the degree of similarity of the external data.
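
The complementation can be sketched as follows, using the average of the most similar series. The similarity measure (Euclidean distance between recent values), the number of similar series `k`, and the sample values are assumptions of this sketch, and all donor histories are assumed to have the same length.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def complement_history(target_recent, donors, k=2):
    """Estimate the missing past values of a short series.

    donors: list of (history, recent) pairs for series with long histories.
    The k donors whose recent values are closest to `target_recent` are
    chosen, and their histories are averaged element by element.
    """
    ranked = sorted(donors, key=lambda donor: euclidean(donor[1], target_recent))
    chosen = [history for history, _ in ranked[:k]]
    length = len(chosen[0])
    return [sum(h[t] for h in chosen) / len(chosen) for t in range(length)]


# Hypothetical: a new consumer with only two recent values, and three
# existing consumers with three past values each.
target_recent = [1.0, 2.0]
donors = [
    ([1.0, 1.0, 1.0], [1.0, 2.0]),
    ([3.0, 3.0, 3.0], [1.0, 2.2]),
    ([9.0, 9.0, 9.0], [5.0, 5.0]),
]
estimated_history = complement_history(target_recent, donors, k=2)
```

The third donor is excluded because its recent values differ most from the target, so the estimate is the average of the first two histories.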

The above processing makes it possible to use observation data that could not previously be used in the processing of the data prediction system because of a lack of past actual values, and an improvement in accuracy can be expected. An example of a case where past actual values are insufficient is when the number of entities from which observation data is observed, such as electricity consumers, increases.

Several embodiments of the present invention have been described above, but these are examples for explaining the present invention and are not intended to limit the scope of the present invention to these embodiments. The present invention can also be implemented in various other forms. For example, a form that combines any two or more of the embodiments described above may be adopted.

1 ... Data processing system, 3 ... Data prediction system

Claims

A data prediction system comprising:
an interface device connected to a data source containing a plurality of observation data each having periodicity;
a storage device; and
a processor connected to the interface device and the storage device,
wherein the processor extracts a plurality of dataset candidates from the plurality of observation data and stores them in the storage device,
each of the plurality of dataset candidates is a set of two or more observation data for a target period,
in each dataset candidate, each observation data is all or part of one of the plurality of observation data, and
the processor:
calculates, for each of the plurality of dataset candidates and for each observation data in the dataset candidate, vector data having as elements the magnitudes of a plurality of different frequency components of the observation data;
classifies, for each of the plurality of dataset candidates, two or more vector data into one or more data groups based on the distance between the vector data;
calculates, for each of the plurality of dataset candidates, based on the number of data in each data group, the degree of fit to a prediction model that takes observation data as input and outputs prediction data; and
outputs, as a compatible dataset, a dataset candidate to be used for a prediction process using the prediction model, based on the degree of fit calculated for each of the plurality of dataset candidates.

The data prediction system according to claim 1,
wherein, for each observation data, the vector data is data at the coordinates to which the observation data corresponds in a coordinate system whose axes are the respective ones of a plurality of different periodic components.

The data prediction system according to claim 1,
wherein, for each of the plurality of dataset candidates, the degree of fit to the prediction model is one of the following:
・a value representing the variation in the number of data among the data groups; or
・the linear sum of the value representing the variation and the training error, or the reciprocal of the likelihood, obtained when a prediction model is built using residual data, which is the observation data other than the dataset candidate among the plurality of observation data.

The data prediction system according to claim 1,
wherein, within each data group, the distance between vector data is less than or equal to a threshold.

The data prediction system according to claim 1,
wherein the processor constructs a relational model, which is a model that takes the following (A) as output and at least one of the following (B) and (C) as input:
(A) the data size of a dataset composed of two or more observation data;
(B) the prediction accuracy when the prediction process is performed using a dataset of that data size, or the degree of fit of that dataset to the prediction model;
(C) the processing load when the prediction process is performed using a dataset of that data size,
the processor uses the relational model to estimate a data size that satisfies a condition on at least one of (B) and (C), and
the compatible dataset is a dataset of the estimated data size.

A data processing system comprising:
the data prediction system according to claim 1; and
a power control system that controls at least one of power generation equipment and power storage equipment,
wherein the data prediction system outputs the prediction data by performing the prediction process using the compatible dataset, and
the power control system receives the prediction data, uses the prediction data to create a plan for at least one of power generation and power storage, and uses the created plan to control at least one of the power generation equipment and the power storage equipment.

A data prediction support method, wherein:
a computer extracts a plurality of dataset candidates from a data source containing a plurality of observation data each having periodicity,
each of the plurality of dataset candidates is a set of two or more observation data for a target period,
in each dataset candidate, each observation data is all or part of one of the plurality of observation data,
the computer calculates, for each of the plurality of dataset candidates and for each observation data in the dataset candidate, vector data having as elements the magnitudes of a plurality of different frequency components of the observation data,
the computer classifies, for each of the plurality of dataset candidates, two or more vector data into one or more data groups based on the distance between the vector data,
the computer calculates, for each of the plurality of dataset candidates, based on the number of data in each data group, the degree of fit to a prediction model that takes observation data as input and outputs prediction data, and
the computer outputs, as a compatible dataset, a dataset candidate to be used for a prediction process using the prediction model, based on the degree of fit calculated for each of the plurality of dataset candidates.