WO2024014035A1 - Data prediction support method and data prediction system - Google Patents

Info

Publication number
WO2024014035A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
prediction
group
dataset
observation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2023/006982
Other languages
French (fr)
Japanese (ja)
Inventor
和樹 南波
将人 内海
喜仁 木下
洋 飯村
大輔 浜場
広晃 小川
潤 山崎
大地 渡邊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi Ltd
Original Assignee
Hitachi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Ltd
Publication of WO2024014035A1
Anticipated expiration
Ceased

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Definitions

  • the present invention generally relates to data prediction.
  • Demand values are predicted in order to operate equipment and allocate resources in accordance with consumer demand.
  • The power demand forecasting system disclosed in Patent Document 1 classifies consumers into a plurality of groups based on performance data indicating the amount of power consumed by a plurality of consumers at each predetermined time, selectively extracts a portion of the consumers from each of the groups, and predicts the total power demand, which is the sum of the power demands of the consumers, based on the performance data of the extracted consumers.
  • the power demand prediction device disclosed in Patent Document 2 classifies power measurement data from a smart meter into groups with similar trends, and constructs a prediction model and calculates a predicted value for each group.
  • It is conceivable to extract a dataset consisting of two or more observational data from multiple observational data, classify the two or more observational data in the dataset into multiple groups, and construct, for each group, a prediction model that uses observation data (time-series data of observed values) as input and prediction data (time-series data of predicted values) as output.
  • a group can be a group of consumers having the same attributes such as location, contract type, contract power, and industry.
  • electricity consumption trends may differ depending on whether renewable energy or electric vehicles have been introduced or differences in the behavioral characteristics of the consumers.
  • the accuracy of the prediction model decreases.
  • a group can be a group in which the observed data itself has a similar tendency.
  • the number of observation data is insufficient, and therefore prediction cannot be made with sufficient accuracy.
  • The present invention was made in consideration of the above points, and its purpose is to extract a data set useful for constructing a prediction model with high prediction accuracy.
  • a data prediction system extracts multiple dataset candidates from a data source that includes multiple observational data, each with periodicity.
  • Each of the plurality of data set candidates is a set of two or more observation data for the target period.
  • the observation data is all or part of one of the plurality of observation data.
  • For each dataset candidate, the data prediction system calculates, for each observational data in the dataset candidate, vector data whose elements are the magnitudes of multiple different frequency components of the observational data, classifies the vector data into one or more data groups based on the distance between vector data, and, based on the number of data in each data group, calculates the goodness of fit to a prediction model that uses observed data as input and predicted data as output.
  • the data prediction system outputs a dataset candidate used for prediction processing using a prediction model as a compatible dataset based on the degree of fitness calculated for each of the plurality of dataset candidates.
  • FIG. 1 is a diagram showing a device configuration according to a first embodiment of a data prediction system.
  • FIG. 2 is a diagram showing an apparatus configuration when the embodiment of FIG. 1 is implemented in an electric power supply and demand management system.
  • FIG. 3 is a diagram showing the internal configuration of the data prediction system according to the first embodiment.
  • FIG. 4 is a diagram showing a data flow according to the first embodiment of the data prediction system.
  • FIG. 5 is a diagram showing a processing flow according to the first embodiment of the data prediction system.
  • FIG. 6 is a diagram showing details of the matching extraction unit.
  • FIG. 7 is a diagram showing the processing flow of the matching extraction unit.
  • FIG. 8A is a diagram illustrating an overview of a part of the processing of the matching extraction unit.
  • FIG. 8B is a diagram illustrating an overview of the remainder of the processing of the matching extraction unit.
  • FIG. 3 is a diagram showing details of a trend classification section.
  • FIG. 3 is a diagram showing details of a group-by-group prediction unit.
  • FIG. 3 is a diagram showing details of the overall prediction unit.
  • A further figure shows an outline of an example of an effect.
  • A further figure shows the data flow according to the fourth embodiment of the trend classification unit.
  • FIG. 12 is a diagram showing a data flow according to a sixth embodiment of a matching extraction unit.
  • FIG. 11 is a diagram showing a data flow according to a seventh embodiment of a matching extraction unit.
  • FIG. 7 is a diagram showing a data flow according to an eighth embodiment of a matching extraction unit.
  • an "interface device” may be one or more interface devices.
  • The one or more interface devices may be at least one of the following:
  • - One or more I/O (Input/Output) interface devices. An I/O interface device is an interface device for at least one of an I/O device and a remote display computer. The I/O interface device for the display computer may be a communication interface device. The at least one I/O device may be a user interface device, e.g., an input device such as a keyboard or pointing device, or an output device such as a display device.
  • - One or more communication interface devices. The one or more communication interface devices may be one or more communication interface devices of the same type (for example, one or more NICs (Network Interface Cards)), or two or more communication interface devices of different types (for example, a NIC and an HBA (Host Bus Adapter)).
  • HBA Host Bus Adapter
  • memory refers to one or more memory devices, typically a main storage device. At least one memory device in the memory may be a volatile memory device or a non-volatile memory device.
  • persistent storage refers to one or more persistent storage devices.
  • the persistent storage device is typically a nonvolatile storage device (for example, an auxiliary storage device), and specifically, for example, an HDD (Hard Disk Drive) or an SSD (Solid State Drive).
  • HDD Hard Disk Drive
  • SSD Solid State Drive
  • A "storage device" may be at least the memory, out of the memory and the persistent storage device.
  • a "processor” refers to one or more processor devices.
  • the at least one processor device is typically a microprocessor device such as a CPU (Central Processing Unit), but may be another type of processor device such as a GPU (Graphics Processing Unit).
  • At least one processor device may be single-core or multi-core.
  • the at least one processor device may be a processor core.
  • At least one processor device may be a broadly defined processor device such as a hardware circuit (for example, a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC)) that performs part or all of the processing.
  • FPGA field-programmable gate array
  • ASIC application-specific integrated circuit
  • Functions may be explained using the expression "yyy unit"; a function may be realized by one or more computer programs being executed by a processor, by one or more hardware circuits (for example, an FPGA or ASIC), or by a combination thereof.
  • When a function is realized by a program being executed by a processor, the specified processing is performed using a storage device and/or an interface device as appropriate, so the function may be regarded as at least a part of the processor.
  • a process described using a function as a subject may be a process performed by a processor or a device having the processor.
  • The program may be installed from a program source.
  • The program source may be, for example, a program distribution computer or a computer-readable recording medium (for example, a non-transitory recording medium).
  • the description of each function is an example, and a plurality of functions may be combined into one function, or one function may be divided into a plurality of functions.
  • FIG. 1 shows the device configuration of the entire data prediction system according to this embodiment.
  • When applied to the electric power business field, the data processing system 1 analyzes observed values of past electric power demand and calculates predicted values of electric power demand and transaction prices for any target period. Based on the predicted values, it becomes possible to manage the supply and demand of electricity, including formulating and executing generator operation plans, formulating and executing plans for procuring electricity from other electric utilities, and planning the formation of power transmission and distribution facilities.
  • The data processing system 1 includes a user 2, a data prediction system 3, an observation data provider 4, an observation data storage device 5, an external data provider 6, an external data storage device 7, a supply and demand management facility 8, a control device 9, and a communication path 10.
  • the communication path 10 may be, for example, a communication network such as a LAN (Local Area Network) or a WAN (Wide Area Network), and is a communication path that connects various devices and terminals that make up the data processing system 1 so that they can communicate with each other.
  • the control device 9 may use the prediction data calculated by the data prediction system 3 to create and execute plans regarding the operation and control of equipment such as generators and communication stations, market transactions, equipment formation, and the like.
  • the device configuration shown in FIG. 2 can be considered.
  • the user 2 corresponds to the operator of the supply and demand management equipment 8.
  • the observation data provider 4 corresponds to a consumer
  • the observation data storage device 5 corresponds to a power measuring device.
  • the external data provider 6 corresponds to a public data provider
  • the external data storage device 7 corresponds to a public data storage device.
  • the supply and demand management equipment 8 includes a generator, power storage equipment, a switch, etc.
  • The control device 9 includes, for example, at least one of a market transaction management device, a generator control device, a power storage equipment control device, and a switch control device.
  • the public data may be, for example, data including at least one of the following.
  • - Weather data such as temperature, humidity, solar radiation, wind speed, and atmospheric pressure.
  • - Calendar date data such as year, month, day, day of the week, and flag values indicating arbitrarily set day types.
  • - Data indicating the occurrence of sudden events such as typhoons and events.
  • - Data on industrial dynamics (for example, the number of energy consumers, attributes indicating the type of energy consumers such as factories, offices, and households, the industry of energy consumers, and the production volume and sales amount for each industry and company).
  • - Data showing the topography or climate characteristics of each region.
  • - Data such as the number of communication terminals connected to a communication base station.
  • the observation data storage device 5 is an example of a data source and stores observation data groups.
  • the observation data group is composed of a plurality of observation data each having periodicity.
  • the observed data is data that serves as input data for performing data prediction, and may be time-series data of past observed values.
  • The observation data may be data for each measuring instrument (for example, data for each smart meter), or data aggregated over multiple measuring instruments (for example, observation data as an average value of the observation data of all smart meters belonging to a predetermined area).
  • The observation data may be, for example, data including at least one of the following.
  • - Energy consumption data such as electricity, gas, and water.
  • - Energy production data such as solar power generation and wind power generation.
  • - Communication amount data measured at communication base stations, etc.
  • - Historical data of location information of moving objects such as cars.
  • the observation data storage device 5 searches for and/or transmits observation data in response to data acquisition requests from other devices.
  • The external data storage device 7 stores external data, which is input data for performing data prediction and is linked to one or more observation data.
  • the external data may be linked in advance, or may be linked dynamically by a processing unit such as the group classification unit 3522. Furthermore, the external data may be data representing past values or data representing future values.
  • The external data may be time-series data of values, similar to observation data.
  • The external data may be, for example, data including at least one of the following.
  • - Weather data such as temperature, humidity, solar radiation, wind speed, and atmospheric pressure.
  • - Calendar date data such as year, month, day, day of the week, and flag values indicating arbitrarily set day types.
  • - Data indicating the occurrence of sudden events such as typhoons and events.
  • - Data on industrial dynamics (for example, the number of energy consumers, attributes indicating the type of energy consumers such as factories, offices, and households, the industry of energy consumers, and the production volume and sales amount for each industry and company).
  • - Data showing location information and regional topographic or climate characteristics.
  • - Data such as the number of communication terminals connected to a communication base station.
  • the external data storage device 7 searches for and/or transmits external data in response to data acquisition requests from other devices.
  • the data processing system 1 includes the data prediction system 3 and the power control system (for example, the supply and demand management equipment 8) that controls at least one of the power generation equipment and the power storage equipment.
  • the data prediction system 3 outputs predicted data by performing a prediction process using a compatible data set, which will be described later.
  • The power control system receives the prediction data, uses the prediction data to create a plan for at least one of power generation and power storage, and controls at least one of the power generation equipment and the power storage equipment based on the created plan.
  • Because the prediction accuracy of the data prediction system 3 is high, the created plan is suitable for control. Suitable power generation and/or storage is therefore realized, and suitable power supply and demand control can be expected.
  • FIG. 3 shows the device configuration of the data prediction system 3, observed data storage device 5, and external data storage device 7 included in the data processing system 1.
  • the data prediction system 3 includes an input device 32, an output device 33, a communication device 34, a storage device 35, and a CPU (Central Processing Unit) 31 connected thereto.
  • the data prediction system 3 is an information processing system such as a personal computer, a server computer, or a handheld computer.
  • The data prediction system 3 may be such a physical computer system (one or more physical computers) or a logical computer system based on a physical computer system (for example, a system provided as a cloud computing service).
  • the input device 32 and the output device 33 may be omitted.
  • the communication device 34 is an example of an interface device
  • the CPU 31 is an example of a processor.
  • the input device 32 is composed of, for example, a keyboard or a mouse
  • the output device 33 is composed of, for example, a display or a printer.
  • the communication device 34 is configured to include, for example, a NIC (Network Interface Card) for connecting to a wireless LAN or a wired LAN.
  • the storage device 35 is a storage medium such as a RAM (Random Access Memory) or a ROM (Read Only Memory). The output results and intermediate results of each processing unit may be outputted as appropriate via the output device 33.
  • The storage device 35 stores one or more computer programs for realizing functions such as a matching extraction unit 351, a trend classification unit 352, a group-by-group prediction unit 353, and an overall prediction unit 354. When these computer programs are executed by the CPU 31, these functions are realized.
  • The storage device 35 has a storage area 355 for storing a compatible data set 355A, and a storage area 356 for storing overall prediction data 356A. Note that the storage areas 355 and 356 may be a single storage area.
  • The compatible data set 355A may be database information or text information used to calculate the overall prediction data 356A, and may be a data set (two or more observation data) extracted from the observation data group 521A (a plurality of observation data).
  • the overall prediction data 356A is generated as follows. That is, two or more pieces of observed data included in the compatible data set 355A are classified into one or more groups having similar transition patterns. A prediction model is constructed for each group, and prediction data is generated using the prediction model. The overall prediction data generated from the prediction data of each group is the overall prediction data 356A.
  • the overall prediction data 356A may be database information, text information, or image information that is a graph of calculated values.
  • the observation data storage device 5 includes at least a communication device 51, a storage device 52, and a CPU 53 connected thereto.
  • the storage device 52 has a storage area 521 for storing observation data group 521A.
  • the external data storage device 7 includes at least a communication device 71, a storage device 72, and a CPU 73 connected thereto.
  • the storage device 72 has a storage area 721 for storing an external data group 721A.
  • the external data group 721A is composed of a plurality of external data.
  • the data prediction system 3 performs data prediction using observation data and external data acquired from the observation data storage device 5 and the external data storage device 7.
  • the matching extraction unit 351 extracts a plurality of data set candidates from the observation data group 521A, which is a collection of periodic observation data.
  • a dataset candidate is a collection of two or more observed data.
  • The matching extraction unit 351 converts each observed data included in the data set candidate into data representing the magnitudes of a plurality of different frequency components, and then maps the data into a multidimensional feature space whose dimensions correspond one-to-one to the frequency components.
  • the matching extraction unit 351 classifies data into a plurality of groups by grouping data that are close to each other in the feature space.
  • The matching extraction unit 351 calculates, for each dataset candidate, an index representing variation, such as the variance, of the number of data included in each group, and outputs the dataset candidate with the minimum index among the plurality of dataset candidates as the compatible data set used for data prediction.
  • (1-3) Processing and data flow of the data prediction system according to this embodiment
  • the data prediction system 3 in this embodiment acquires an observation data group 521A and an external data group 721A from the observation data storage device 5 and the external data storage device 7, respectively.
  • the observed data group 521A is input to the matching extraction unit 351.
  • The matching extraction unit 351 extracts a plurality of data set candidates from the input observation data group 521A, and outputs one of the plurality of data set candidates as the compatible data set 355A (S301).
  • The compatible data set 355A is input to the trend classification unit 352 together with the external data group 721A.
  • the trend classification unit 352 links the observed data included in the input compatible data set 355A with the external data included in the external data group 721A.
  • The trend classification unit 352 converts each observed data (typically time series data) included in the compatible data set 355A into a multidimensional vector whose elements are the magnitudes of frequency components and, by grouping together data whose vectors are close in distance, classifies the data included in the compatible data set 355A and the external data group 721A into one or more groups (S302).
  • a group includes observation data, and external data is linked to the observation data.
  • The "distance" may be a general distance measure that satisfies the distance axioms, such as Euclidean distance, Mahalanobis distance, Manhattan distance, Chebyshev distance, or Minkowski distance, or may be a degree of similarity such as cosine similarity.
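The measures named above can be illustrated with a minimal sketch. The function names and the two sample vectors are hypothetical and not part of the source; Mahalanobis distance is omitted because it additionally requires a covariance matrix estimated from the data.

```python
import math

def minkowski(u, v, p):
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

def chebyshev(u, v):
    """Chebyshev distance: the largest coordinate-wise difference."""
    return max(abs(a - b) for a, b in zip(u, v))

def cosine_similarity(u, v):
    """Cosine similarity: a degree of similarity, not a distance."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical two-dimensional vector data (e.g. daily and half-daily components)
u = [3.0, 0.0]
v = [0.0, 4.0]

euclidean = minkowski(u, v, 2)
manhattan = minkowski(u, v, 1)
largest_gap = chebyshev(u, v)
similarity = cosine_similarity(u, v)
```

Note that a larger cosine similarity means closer vectors, so it would have to be inverted (for example, used as one minus the similarity) before being compared against a distance threshold.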
  • The grouping process can be performed using hierarchical clustering methods such as Ward's method, the single linkage method, the complete linkage method, or the centroid method, or using partitioning-optimization clustering methods such as k-means, the EM algorithm, or spectral clustering.
  • Alternatively, the processing may be performed using a clustering method that optimizes the decision boundary, such as unsupervised SVM (Support Vector Machine), the VQ algorithm, or SOM (Self-Organizing Maps).
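As one concrete instance of the methods listed above, the following is a minimal sketch of k-means (Lloyd's algorithm) with Euclidean distance. The source does not say which method is preferred; the function name, the sample vectors, and the choice of initial centroids are assumptions made purely for illustration.

```python
import math

def kmeans(points, initial_centroids, iterations=20):
    """Plain k-means starting from the given initial centroids."""
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid
        for i, p in enumerate(points):
            distances = [math.dist(p, c) for c in centroids]
            labels[i] = distances.index(min(distances))
        # Update step: each centroid moves to the mean of its members
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical vector data: two series dominated by one component, three by the other
points = [[0.9, 0.1], [1.0, 0.2], [0.2, 0.8], [0.1, 0.9], [0.15, 1.0]]
labels, centroids = kmeans(points, initial_centroids=[points[0], points[2]])
```

A production implementation would normally use several random restarts or a library routine; the fixed initial centroids here only keep the example deterministic.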
  • the plurality of groups obtained by the trend classification section 352 are input to the group-by-group prediction section 353.
  • The group-by-group prediction unit 353 constructs a prediction model for each group created by the trend classification unit 352, and calculates prediction data for each group (for example, time series data of predicted values) using each prediction model (S303).
  • the group-specific prediction data calculated by the group-specific prediction unit 353 is input to the overall prediction unit 354.
  • The overall prediction unit 354 calculates and outputs the overall prediction data using the group-specific prediction data calculated by the group-by-group prediction unit 353 and the residual observed data not extracted by the matching extraction unit 351 (S304).
  • The overall prediction data is prediction data for the entire area to which the consumers belong.
  • the overall prediction data 356A calculated by the overall prediction unit 354 is input to the supply and demand management equipment 8.
  • the supply and demand management equipment 8 uses the input overall forecast data 356A to control the generator, power storage equipment, switch, etc.
  • FIG. 6 shows the data flow inside the matching extraction unit 351.
  • The matching extraction unit 351 includes a candidate generation unit 3511, an index calculation unit 3513, a group classification unit 3514, and a candidate selection unit 3515.
  • the storage area 3512 is an area in the storage device 35, and the data set candidate 3512A is stored as an intermediate output.
  • the candidate generation unit 3511 extracts a plurality of dataset candidates 3512A from the observed data group 521A, and stores each dataset candidate 3512A in the storage area 3512.
  • Data set candidates may be extracted at random, or by extracting some observed data from each of a plurality of data classifications.
  • A data classification may be either of the following.
  • - A collection of observation data with the same label (the label may depend on the attribute (for example, factory, commercial facility, home, etc.) to which the measuring instrument from which the observation data was obtained belongs).
  • - A collection of observed data with the same or similar statistics (statistics may be the average value, maximum value, minimum value, variance, etc. of the data transition represented by the observed data).
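The two extraction strategies described above (random extraction, and extracting some observation data from each data classification) can be sketched as follows. The identifiers, labels, and candidate sizes are hypothetical; the source leaves the number of observation data per candidate to the user.

```python
import random

def random_candidate(series_ids, size, rng):
    """Draw one dataset candidate uniformly at random, without replacement."""
    return rng.sample(series_ids, size)

def stratified_candidate(ids_by_class, per_class, rng):
    """Draw per_class observation series from every data classification."""
    candidate = []
    for label in sorted(ids_by_class):
        candidate.extend(rng.sample(ids_by_class[label], per_class))
    return candidate

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical observation data identifiers, grouped by label
ids_by_class = {
    "factory": ["f1", "f2", "f3"],
    "commercial": ["c1", "c2", "c3"],
    "home": ["h1", "h2", "h3", "h4"],
}
all_ids = [i for ids in ids_by_class.values() for i in ids]

# N = 3 random candidates of 4 series each, plus one stratified candidate
candidates = [random_candidate(all_ids, 4, rng) for _ in range(3)]
stratified = stratified_candidate(ids_by_class, 2, rng)
```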
  • the number of observation data included in each dataset candidate may be arbitrarily determined by the user 2.
  • the index calculation unit 3513 converts the observed data into multidimensional vector data in which each element has the magnitude of a frequency component.
  • the conversion process may be, for example, a process of normalizing a value representing a data transition, a Fourier transform or a wavelet transform process of a value representing a data transition, or both. Any frequency component may be designated in advance as the frequency component to be calculated by the conversion process.
  • the specified frequency component may be a plurality of different periodic components, and the periodic component may be selected from yearly, monthly, weekly, daily, 1 hour, and 0.5 hour period components.
  • For example, two or more of the following periodic components, which tend to appear characteristically in the electric power field, may be employed: one year (365 days), half a year (180 days), three months (90 days), one month (30 days), one week (7 days), one day, half a day (12 hours), 6 hours, 1 hour, and 30 minutes (0.5 hours).
  • Alternatively, components whose values vary widely among the vector data may be adopted automatically as the frequency components.
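The conversion described above (normalization followed by a Fourier transform evaluated at designated periods) can be sketched as below. This is an illustration under stated assumptions rather than the patented implementation: the function names are hypothetical, the load curve is synthetic, and the discrete Fourier sum is evaluated directly at each designated period instead of using a full FFT.

```python
import cmath
import math

def normalize(values):
    """Scale a series to zero mean and unit variance (only centred if constant)."""
    mean = sum(values) / len(values)
    centred = [v - mean for v in values]
    std = math.sqrt(sum(c * c for c in centred) / len(values))
    return [c / std for c in centred] if std > 0 else centred

def component_magnitude(values, period):
    """Magnitude of the Fourier component with the given period (in samples)."""
    coeff = sum(x * cmath.exp(-2j * math.pi * t / period)
                for t, x in enumerate(values))
    return abs(coeff) / len(values)

def periodicity_vector(values, periods):
    """Vector data whose elements are magnitudes of the designated components."""
    z = normalize(values)
    return [component_magnitude(z, p) for p in periods]

# Synthetic hourly load over 14 days: strong daily cycle, weaker half-day cycle
load = [10 + 3 * math.sin(2 * math.pi * t / 24) + math.sin(2 * math.pi * t / 12)
        for t in range(24 * 14)]

# Two-dimensional vector: [daily component, half-daily component]
vector = periodicity_vector(load, periods=[24, 12])
```

Because the observation window here is a whole number of days, the two components do not leak into each other; with windows that are not multiples of the period, a window function or a longer series would be needed.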
  • The group classification unit 3514 groups together vector data that are close in distance after conversion by the index calculation unit 3513, thereby classifying the two or more vector data calculated for the two or more observed data in the dataset candidate 3512A into one or more groups.
  • the number of groups may be arbitrarily determined in advance, or may be a number that minimizes an information amount criterion such as AIC (Akaike Information Criterion) or BIC (Bayesian Information Criterion).
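For reference, the standard definitions of the two information criteria named above are given below. The source does not state which statistical model supplies the likelihood for a given number of groups, so the symbols are generic: $\hat{L}$ is the maximized likelihood of the fitted model, $k$ is its number of free parameters, and $n$ is the number of data.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L},
\qquad
\mathrm{BIC} = k\ln n - 2\ln\hat{L}
```

Under either criterion, increasing the number of groups raises $k$ and is accepted only if the gain in $\ln\hat{L}$ outweighs the penalty term; BIC penalizes additional groups more heavily than AIC once $\ln n > 2$.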
  • the candidate selection unit 3515 calculates, for each dataset candidate 3512A, an index representing variation such as the variance of the number of data in a group.
  • the candidate selection unit 3515 outputs the dataset candidate 3512A with the smallest index among the plurality of dataset candidates 3512A as the compatible dataset 355A used for data prediction (prediction processing using a prediction model).
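The selection rule above can be sketched as follows. The group labels are hypothetical, and the use of the population variance (rather than the sample variance) is an assumption, since the source only says "variance". Groups that receive no data do not appear in the labels and are therefore not counted.

```python
from collections import Counter
from statistics import pvariance

def group_size_variance(labels):
    """Population variance of the number of data per group."""
    sizes = list(Counter(labels).values())
    return pvariance(sizes)

def select_candidate(labels_by_candidate):
    """Return the name of the candidate whose group sizes vary least."""
    return min(labels_by_candidate,
               key=lambda name: group_size_variance(labels_by_candidate[name]))

# Hypothetical group labels assigned to the nine series of each candidate
labels_by_candidate = {
    "candidate 1": [1, 1, 1, 1, 1, 1, 2, 2, 3],  # group sizes 6, 2, 1
    "candidate 2": [1, 1, 1, 2, 2, 2, 3, 3, 3],  # group sizes 3, 3, 3
}

best = select_candidate(labels_by_candidate)
```

The design intent is that an evenly populated set of groups leaves every group-specific prediction model with enough observation data, whereas a skewed candidate leaves some groups with too few.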
  • FIG. 7 shows the processing flow of the matching extraction unit 351. This processing flow corresponds to the internal processing of S301 shown in FIG.
  • the matching extraction unit 351 repeats the processing from S3012 to S3014 N times (S3011). Note that instead of repeating the processes from S3012 to S3014 N times, the matching extraction unit 351 may repeat each process of S3012, S3013, and S3014 N times. N is an integer of 2 or more.
  • the matching extraction unit 351 extracts the dataset candidate 3512A from the observed data group 521A (S3012).
  • The index calculation unit 3513 converts each observation data included in the dataset candidate 3512A extracted in S3012 into multidimensional vector data (a periodicity index) whose elements are the magnitudes of frequency components (S3013).
  • the group classification unit 3514 classifies two or more vector data regarding two or more observed data in the dataset candidate 3512A into a plurality of groups by grouping together vector data that are close to each other (S3014).
  • a group (data group) is a group of vector data in which the distance between vector data is equal to or less than a threshold value.
  • the threshold may be a distance threshold from representative vector data determined based on two or more vector data, or a relative distance threshold between calculated vector data.
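The threshold-based definition of a group can be sketched as a simple single-pass grouping in which each group is represented by one vector. Taking the first vector assigned to a group as its representative is an assumption made for brevity; the source only says that the representative is determined based on two or more vector data. The function name and sample vectors are hypothetical.

```python
import math

def group_by_threshold(vectors, threshold):
    """Assign each vector to the first group whose representative is within threshold."""
    representatives = []  # one representative vector per group
    labels = []
    for v in vectors:
        for g, rep in enumerate(representatives):
            if math.dist(v, rep) <= threshold:
                labels.append(g)
                break
        else:
            # No existing group is close enough: open a new group
            representatives.append(v)
            labels.append(len(representatives) - 1)
    return labels

# Hypothetical two-dimensional vector data
vectors = [[0.9, 0.1], [0.2, 0.8], [1.0, 0.2], [0.1, 0.9], [0.5, 0.5]]
labels = group_by_threshold(vectors, threshold=0.3)
```

With this rule the number of groups is not fixed in advance; it follows from the threshold, which is one way the number of groups can differ between dataset candidates.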
  • The candidate selection unit 3515 calculates, for each dataset candidate, an index representing variation, such as the variance, of the number of data in each group, selects the dataset candidate with the smallest index, and outputs it as the compatible data set 355A (S3015).
  • the processing contents of the matching extraction unit 351 will be explained in more detail using FIGS. 8A and 8B.
  • the observed data in the observed data group 521A is time series data of power consumption.
  • the candidate generation unit 3511 extracts N data set candidates from the observed data group 521A.
  • Each data set candidate includes two or more observation data 521B1 (power consumption transition data).
  • the number of observed data in the dataset candidates may be the same or different.
  • each of the N data set candidates is composed of two or more pieces of observation data for the target period.
  • the "target period" may be a past period such as the past year or half year from the present, and the past period may correspond to the forecast target period (future period). For example, when the forecast target period is January to December, the corresponding past period may also be January to December.
  • the index calculation unit 3513 converts each observation data 521B1 into vector data 521B2 whose elements are the magnitudes of frequency components by performing Fourier transform on each observation data 521B1. Transforms other than Fourier transform, such as wavelet transform, may be employed.
  • observation data 521B1 is converted into two-dimensional vector data consisting of a daily periodic component and a semi-daily periodic component. Instead, long-period components such as yearly and half-yearly periods, or short-period components such as 1-hour and 30-minute periods, may be added so that the observation data is converted into vector data with more dimensions.
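The conversion into a periodicity index can be sketched with NumPy's real FFT. This is a sketch under stated assumptions: the series covers a whole number of days, it is sampled at a fixed interval, and the magnitudes are normalized by the series length. The function name `periodicity_index` and the synthetic load curve are illustrative only.

```python
import numpy as np


def periodicity_index(series, samples_per_day):
    """Return [daily magnitude, semi-daily magnitude] of a time series.

    Assumes the series spans a whole number of days, so that the daily and
    semi-daily components fall exactly on FFT bins.
    """
    series = np.asarray(series, dtype=float)
    n = len(series)
    # Remove the mean so the constant offset does not dominate the spectrum.
    spectrum = np.abs(np.fft.rfft(series - series.mean())) / n
    n_days = n / samples_per_day
    daily_bin = int(round(n_days))           # one cycle per day
    semi_daily_bin = int(round(2 * n_days))  # two cycles per day
    return np.array([spectrum[daily_bin], spectrum[semi_daily_bin]])


# Synthetic example: 7 days of 30-minute readings (48 samples per day).
t = np.arange(48 * 7) / 48.0  # time in days
load = 5.0 + 2.0 * np.sin(2 * np.pi * t) + 0.5 * np.sin(4 * np.pi * t)
vector = periodicity_index(load, samples_per_day=48)
```

With this normalization, a sinusoid of amplitude A contributes A/2 to its bin, so the example yields a vector close to [1.0, 0.25].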
  • the group classification unit 3514 groups together vector data (magnitudes of frequency components) that are separated by a small distance.
  • groups 1 to 3 classified by group boundaries 521B3 indicated by dotted lines are created. Note that, to simplify the explanation, the number of groups is the same for dataset candidates 1 and 2, but the number of groups may be different for each dataset candidate.
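One way to realize grouping by a distance threshold from representative vector data is a greedy pass over the vectors. The sketch below assumes that the representative of a group is the mean of its current members and that Euclidean distance is used; neither choice is stated in the specification, and the function name is hypothetical.

```python
import numpy as np


def group_by_distance(vectors, threshold):
    """Greedy grouping of vectors by distance to a group representative.

    Each vector joins the first group whose representative (mean of its
    current members) lies within the threshold; otherwise it opens a new group.
    Returns one group label per vector.
    """
    members = []  # members[g] holds the indices of the vectors in group g
    labels = []
    for i, v in enumerate(vectors):
        for g, idx in enumerate(members):
            representative = np.mean([vectors[j] for j in idx], axis=0)
            if np.linalg.norm(np.asarray(v) - representative) <= threshold:
                idx.append(i)
                labels.append(g)
                break
        else:
            members.append([i])
            labels.append(len(members) - 1)
    return labels


vectors = np.array([[1.0, 0.2], [1.1, 0.25], [0.2, 0.9], [0.25, 1.0], [3.0, 3.0]])
labels = group_by_distance(vectors, threshold=0.5)
```

Because membership is decided against a representative, this sketch does not guarantee that every pair of vectors in a group is within the threshold; a relative (pairwise) threshold would need a different procedure.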
  • the candidate selection unit 3515 totals the number of data for each group for each dataset candidate (see reference numeral 521B4).
  • the candidate selection unit 3515 calculates, for each dataset candidate, the variance of the number of data per group from the per-group totals (see reference numeral 521B5). Finally, the candidate selection unit 3515 outputs dataset candidate 2, which has the minimum variance, as the compatible dataset 355A.

(1-4-2) Trend classification unit 352
  • the trend classification unit 352 receives the compatible data set 355A and the external data group 721A as input, and links the observed data included in the compatible data set 355A with the external data included in the external data group 721A.
  • the trend classification unit 352 classifies the two or more pieces of observation data included in the compatible data set 355A into one or more groups based on the degree of similarity between the feature data (vector data) of the observation data.
  • the trend classification section 352 includes a feature amount conversion section 3521 and a group classification section 3522.
  • the storage area 3523 is one area of the storage device 35 and stores the observation data with group information 3523A and the external data with group information 3523B as intermediate outputs.
  • the feature value conversion unit 3521 receives the compatible data set 355A as input, and converts each observed data included in the compatible data set 355A into feature data.
  • the feature amount data may be the multidimensional vector data (periodicity index) described above.
  • the group classification unit 3522 receives vector data for each observed data in the compatible data set 355A (feature data output from the feature conversion unit 3521) and the external data group 721A as input.
  • the group classification unit 3522 links vector data with external data that is linked to observation data corresponding to the vector data.
  • the group classification unit 3522 groups together vector data that are close to each other, thereby classifying the two or more vector data for the two or more observed data in the compatible data set 355A into one or more groups. The definition of a group (data group), the distance, the grouping process, and the number of groups may be as described above for the matching extraction unit 351.
  • the group classification unit 3522 stores the observation data with group information 3523A and the external data with group information 3523B in the storage area 3523, and outputs them to the group-by-group prediction unit 353.
  • the observation data with group information 3523A includes each observation data in the compatible data set 355A, and each observation data is associated with group information indicating the group to which the feature amount data of the observation data is classified.
  • the external data with group information 3523B includes the external data linked to the observed data in the compatible data set 355A, and each external data is associated with group information representing the group into which the feature data of the linked observed data is classified.

(1-4-3) Group-specific prediction unit 353
  • the group-by-group prediction unit 353 receives as input the observed data with group information 3523A and the external data with group information 3523B output from the trend classification unit 352, together with the external data group 721A, constructs a prediction model for each group, and calculates prediction data for the prediction target period.
  • the group prediction unit 353 is composed of a construction unit 3531 and a group calculation unit 3532.
  • the storage area 3533 is an area of the storage device 35, and the group-specific prediction data 3533A is stored in the storage area 3533.
  • the construction unit 3531 receives the observation data with group information 3523A and the external data with group information 3523B as input, and constructs a prediction model for each group.
  • the prediction model uses the data belonging to the group in the observation data with group information 3523A as the objective variable and the data belonging to the group in the external data with group information 3523B as the explanatory variable. The model may be a regression, classification, or clustering model, or a model that combines them.
  • in constructing the prediction model, a value obtained by taking the average or median of the data belonging to a given group of the observation data with group information 3523A may be adopted. Further, different models may be adopted for each group.
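As one possible instance of group-by-group model construction, the sketch below fits an ordinary least-squares linear model per group, with observed values as the objective variable and external data (for example, temperature) as the explanatory variable. Linear regression is only one of the model types the text permits, and the function names and toy data are hypothetical.

```python
import numpy as np


def fit_group_models(observed, external, groups):
    """Fit one least-squares linear model (with intercept) per group.

    observed: (n,) objective values, external: (n, k) explanatory values,
    groups: (n,) integer group labels.
    """
    models = {}
    for g in np.unique(groups):
        mask = groups == g
        X = np.column_stack([external[mask], np.ones(mask.sum())])
        coef, *_ = np.linalg.lstsq(X, observed[mask], rcond=None)
        models[int(g)] = coef
    return models


def predict_group(models, g, external_future):
    """Predict values for group g from future explanatory data of shape (m, k)."""
    X = np.column_stack([external_future, np.ones(len(external_future))])
    return X @ models[g]


# Toy data: group 0 follows 2*temp + 1, group 1 follows -temp + 10.
external = np.array([[0.0], [10.0], [20.0], [0.0], [10.0], [20.0]])
groups = np.array([0, 0, 0, 1, 1, 1])
observed = np.where(groups == 0, 2.0 * external[:, 0] + 1.0, -external[:, 0] + 10.0)

models = fit_group_models(observed, external, groups)
forecast_0 = predict_group(models, 0, np.array([[30.0]]))
forecast_1 = predict_group(models, 1, np.array([[30.0]]))
```

Fitting the groups separately lets each group keep its own relationship with the external data, which a single pooled model would average away.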
  • the group calculation unit 3532 receives the group-by-group prediction model output from the construction unit 3531 and the external data group 721A as input, and calculates prediction data for the prediction target period of each group.
  • the external data group 721A may include actual values and forecast values in the prediction target period of data associated with observation data used to construct the prediction model of the group for which prediction data is to be calculated.
  • the group calculation unit 3532 stores the prediction data calculated for each group in the storage area 3533 as group-specific prediction data 3533A, and outputs it to the overall prediction unit 354.

(1-4-4) Overall prediction unit 354
  • the overall prediction unit 354 receives as input the group-specific prediction data 3533A output from the group-specific prediction unit 353, the observed data group 521A, the compatible data set 355A, and the external data group 721A, and calculates the overall prediction data (for example, prediction data for the entire area to which the customers corresponding to the observation data in the compatible data set 355A belong).
  • the overall prediction unit 354 includes a residual prediction unit 3541 and an overall calculation unit 3542.
  • the residual prediction unit 3541 first creates residual data by taking the sum of observed data other than the compatible data set 355A among the observed data group 521A.
  • the residual data may be a set of observed data other than the compatible data set 355A among the observed data group 521A.
  • the residual prediction unit 3541 constructs a prediction model using the residual data as an objective variable and the data associated with the residual data among the data included in the external data group 721A as an explanatory variable.
  • the predictive model may be a regression, classification, clustering, or other model, or a combination of these models. In constructing the prediction model, a value obtained by taking an average value or a median value from the residual data may be adopted.
  • the residual prediction unit 3541 inputs the actual values and predicted values for the prediction period of the external data group 721A used as explanatory variables into the prediction model, calculates the predicted value of the residual data, and outputs it to the overall calculation unit 3542.
  • the overall calculation unit 3542 uses the group-specific prediction data 3533A and the predicted value of the residual data output from the residual prediction unit 3541 to calculate the predicted value for the entire customer. Specifically, for example, the overall calculation unit 3542 takes the sum of each value of the group-by-group prediction data 3533A and the predicted value of the residual data, and outputs the sum as the overall prediction data 356A.
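The creation of residual data and the final summation can be sketched as follows. The dictionary layout, the function names, and the numeric values are assumptions made for illustration; the residual forecast is supplied directly here rather than produced by a fitted model.

```python
import numpy as np


def residual_series(observed, selected_ids):
    """Sum the observed series that are not part of the compatible data set."""
    rest = [series for cid, series in observed.items() if cid not in selected_ids]
    return np.sum(rest, axis=0)


def overall_prediction(group_predictions, residual_prediction):
    """Overall forecast = sum of the group forecasts + residual forecast."""
    return np.sum(list(group_predictions.values()), axis=0) + residual_prediction


# Hypothetical observed series keyed by consumer ID.
observed = {
    "c1": np.array([1.0, 2.0]),
    "c2": np.array([3.0, 4.0]),
    "c3": np.array([5.0, 6.0]),
}
residual = residual_series(observed, selected_ids={"c1"})  # c2 + c3

group_predictions = {"A": np.array([1.0, 2.0]), "B": np.array([10.0, 20.0])}
residual_forecast = np.array([100.0, 100.0])  # stand-in for the model output
total = overall_prediction(group_predictions, residual_forecast)
```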
  • FIG. 12 is a diagram showing an overview of an example of the effect obtained when prediction data for the prediction target period (for example, a prediction of the observed data for that period) is calculated using the compatible data set 355A extracted by the matching extraction unit 351.
  • the dotted line represents observed data (the time series of observed values in the past period corresponding to the forecast period).
  • the solid line represents the predicted data (the time series of observed values predicted for the forecast period).
  • the broken lines represent the observed data to be predicted (the time series of observed values actually observed for the prediction target period).
  • the observed data indicated by the dotted line is the observed data included in the compatible data set 355A.
  • the prediction data indicated by the solid line is data output as the group-based prediction data 3533A or the overall prediction data 356A.
  • the black dots 25 simulate vector data (data converted from observed data) calculated by the index calculation unit 3513 and plotted on the feature space.
  • the feature space is a coordinate system space with axes each having a plurality of different periodic components.
  • Vector data is plotted as coordinates in the feature space in a simulated manner.
  • Case 21 is a case in which observed data is extracted so that the breakdown of attributes representing the type of factory, office, general household, etc. is uniform, and predicted data is calculated for each group classified by the same attribute.
  • data with different power consumption trends are mixed in group A that has the factory attribute. Differences in trends are caused by, for example, differences in the amount of solar power generation installed and differences in business types.
  • Case 22 is a case in which observation data is extracted so that the breakdown of attributes representing types such as factories, offices, and general households is uniform, and predicted data is calculated for each group of data separated by small distances in the feature space. In this case, data with the same attributes but different trends are classified into different groups, but a group such as group B, for which sufficient training data to build a prediction model cannot be secured (in other words, a group with an extremely small number of data), may arise. As a result, it becomes difficult to predict demand for this group, and errors remain in the overall prediction data.
  • Case 23 is a case in which observed data is extracted using the extraction method according to the present embodiment, and predicted data is calculated for each group of data with a small distance scale in the feature space.
  • a data set is selected in which the variation (for example, variance) in the number of data for each group is small, so the number of data is likely to be equal between the groups.
  • the technical problem that arises when data are classified into groups with similar trends in observed values and predictions are made for each group, specifically the risk that prediction accuracy decreases because some groups lack the amount of data required for prediction, can be solved in this embodiment. This is because multiple dataset candidates are extracted and the compatible dataset 355A is the candidate with the smallest variation in the number of data per group, and is therefore a dataset in which no group is expected to be extremely short of data.
  • the period may be any period (for example, a period longer than one year, one year, half a year, three months, one month, one week, one day, half a day, six hours, one hour, or 0.5 hours). Since such a period is one in which periodicity of observed values is expected, the compatible data set 355A can be expected to be more appropriate.
  • the frequency component and the time component may be used together. For example, when only frequency components are used, Fourier transform may be employed, and when frequency components and time components are used together, wavelet transform may be employed.
  • the matching extraction unit 351 is executed every time prediction processing is performed to extract a new compatible data set 355A, but the present invention is not limited to this, and the execution timing may be arbitrary.
  • the execution of the matching extraction unit 351 may be triggered, for example, after a certain period arbitrarily set by the user 2 has elapsed, after prediction has been executed a certain number of times, or when the number of consumers stored in the observation data storage device 5 increases or decreases by more than a certain value.
  • when the candidate selection unit 3515 selects the compatible data set 355A from the plurality of data set candidates 3512A, the index representing the variation in the number of data per group was used as the evaluation index, but the evaluation index is not limited to this, and an expanded evaluation index may be used.
  • the candidate selection unit 3515 uses, as evaluation indices, the variance of the dataset candidate 3512A and also an evaluation index of the residual data of the dataset candidate 3512A (the observed data in the observed data group 521A other than the dataset candidate 3512A). Each observation data in the residual data may also be converted into the above-mentioned vector data, and the vector data may be classified into one or more groups.
  • the candidate selection unit 3515 may set the linear sum of the variance of the dataset candidate 3512A and the evaluation index of the residual data as the overall evaluation index, and use the overall evaluation index as the evaluation index of the dataset candidate 3512A.
  • the residual data may be the sum of observed data other than the relevant data set candidate 3512A among the observed data group 521A, and may be data obtained by the same method as the residual prediction unit 3541.
  • the candidate selection unit 3515 may construct a prediction model that predicts the residual data (a prediction model using the residual data as an objective variable) in the same manner as the residual prediction unit 3541, calculate the training error or likelihood of the prediction model, and use the training error or likelihood as an evaluation index for the residual data.
  • the overall evaluation index may be the linear sum of the index representing the variation of the data set candidate 3512A and the evaluation index of the residual data.
  • as the weight of the linear sum, an arbitrary value set in advance or the ratio between the number of data in the extracted observation data group and that in the unextracted observation data group can be used.
  • when the likelihood of the prediction model is used as the evaluation index of the residual data, the above-mentioned linear sum may be a linear sum of the reciprocal of the likelihood (or the training error) and the variation index.
  • the candidate selection unit 3515 outputs the dataset candidate with the minimum overall evaluation index value as the compatible dataset 355A.
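The expanded evaluation index can be sketched as a weighted linear sum. The sketch assumes a single weight `w` applied to the variation index and `1 - w` applied to the residual-data index, with `w` derived from the share of extracted data; the exact weighting scheme is not specified in the text, so this form, the function names, and the numbers are illustrative assumptions.

```python
def overall_index(variation_index, residual_index, weight):
    """Linear sum of the variation index and the residual-data index."""
    return weight * variation_index + (1.0 - weight) * residual_index


def select_candidate(candidates, weight):
    """Return the candidate name with the smallest overall evaluation index.

    candidates: dict mapping a name to (variation_index, residual_index).
    """
    return min(candidates, key=lambda name: overall_index(*candidates[name], weight))


# Assumed weighting: share of extracted data among all observation data.
n_extracted, n_unextracted = 50, 50
weight = n_extracted / (n_extracted + n_unextracted)

candidates = {
    "candidate_1": (0.0, 8.0),  # evenly sized groups, residual poorly predicted
    "candidate_2": (2.0, 1.0),  # slightly uneven groups, residual well predicted
}
best = select_candidate(candidates, weight)
```

In this toy case the variance alone would favor candidate_1, while the overall index favors candidate_2 because its residual data is easier to predict.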
  • the number of groups in the trend classification unit 352 is arbitrarily determined in advance, but is not limited to this.
  • the trend classification section 352 includes a group structure determination section 3524 that is an extension of the group classification section 3522.
  • the group structure determining unit 3524 includes a group number candidate setting unit 35241, a group classification unit 35242, an accuracy verification unit 35243, and a group number determining unit 35244.
  • the group number candidate setting unit 35241 receives as input, from the feature amount conversion unit 3521, the vector data obtained by converting each observed data included in the compatible data set 355A (vector data whose elements are the magnitudes of different frequency components), and generates candidate values for the number of groups to be used when classifying the data into groups.
  • the group number candidates may be a set of natural numbers arbitrarily determined in advance, or a set of values increased in arbitrarily determined steps from a minimum value up to a maximum number of groups determined based on the number of input data.
  • the group number candidate setting unit 35241 outputs the vector data and the set of group number candidates to the group classification unit 35242.
  • the group classification unit 35242 receives as input the vector data and the set of group number candidates output from the group number candidate setting unit 35241, and the external data group 721A. Next, for each group number candidate, the group classification unit 35242 classifies the vector data and the external data linked to it into that number of groups by grouping together vector data that are close to each other. Finally, the group classification unit 35242 outputs the group number candidates, the corresponding observation data with group information, and the external data with group information to the accuracy verification unit 35243.
  • the accuracy verification unit 35243 receives as input the group number candidates output from the group classification unit 35242, the corresponding observed data with group information, and the external data with group information, and constructs a prediction model and calculates predicted values for each group number candidate. The construction of the prediction model and the calculation of the predicted values may be performed in the same manner as in the group prediction unit 353 and the overall prediction unit 354. Here, as the prediction target period, it is preferable to select a period that is as close as possible to the target period to be predicted by the data prediction system 3 and for which observation data has been obtained as actual measured values. Next, the accuracy verification unit 35243 compares the predicted value calculated for each group number candidate with the observed data obtained as the actual value, and evaluates the prediction accuracy. The accuracy index may be calculated using an error measure such as absolute error. Finally, the observed data with group information, the external data with group information, and the accuracy index value for each group number candidate are output to the group number determining unit 35244.
  • the group number determining unit 35244 receives as input the observation data with group information, the external data with group information, and the accuracy index value for each group number candidate from the accuracy verification unit 35243, and determines the number of groups that provides the best accuracy. Specifically, the error measures of the group number candidates are compared, and the observation data with group information and the external data with group information for the group number candidate with the minimum error are output.
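The comparison of group number candidates can be sketched with the mean absolute error as the accuracy index. The sketch assumes that the predicted values for each candidate have already been computed for a verification period with known actual values; the function names and the numbers are hypothetical.

```python
import numpy as np


def mean_absolute_error(predicted, actual):
    """Mean absolute error between a predicted and an actual series."""
    return float(np.mean(np.abs(np.asarray(predicted) - np.asarray(actual))))


def choose_group_count(predictions_by_k, actual):
    """Return the group number candidate with the smallest error, plus all errors.

    predictions_by_k: dict mapping a group number candidate to its predicted series.
    """
    errors = {k: mean_absolute_error(p, actual) for k, p in predictions_by_k.items()}
    best_k = min(errors, key=errors.get)
    return best_k, errors


actual = [10.0, 12.0, 14.0]  # observed values for the verification period
predictions_by_k = {
    2: [9.0, 12.0, 16.0],
    3: [10.0, 12.5, 14.0],
    4: [12.0, 12.0, 12.0],
}
best_k, errors = choose_group_count(predictions_by_k, actual)
```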
  • the overall prediction unit 354 is configured to calculate the overall prediction data 356A from the group-specific prediction data 3533A and the calculation results of the residual prediction unit 3541, but the present invention is not limited to this.
  • the overall prediction data 356A may be calculated using only the group-specific prediction data 3533A, without using the calculation result of the residual prediction unit 3541.
  • the overall calculation unit 3542 calculates a coefficient by dividing the total number of consumers by the number of consumers from which the observed data included in the compatible data set 355A was obtained, and calculates the overall prediction data 356A by multiplying the sum of the group-specific prediction data 3533A by the coefficient.
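The scale-up calculation can be written in a few lines. The function name and the consumer counts below are hypothetical values chosen for illustration.

```python
def scaled_overall_prediction(group_predictions, total_consumers, sampled_consumers):
    """Scale the summed group forecasts up to the whole consumer population."""
    coefficient = total_consumers / sampled_consumers
    # Sum the group forecasts time step by time step.
    summed = [sum(values) for values in zip(*group_predictions.values())]
    return [coefficient * value for value in summed]


group_predictions = {"A": [1.0, 2.0], "B": [10.0, 20.0]}
overall = scaled_overall_prediction(
    group_predictions, total_consumers=1000, sampled_consumers=100
)
```

This variant implicitly assumes that the consumers in the compatible data set are representative of the consumers outside it, since no separate residual model corrects for differences.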
  • the number of data included in the compatible data set 355A can be arbitrarily determined by the user 2 in the matching extraction unit 351, but the present invention is not limited to this.
  • the data size may be determined in consideration of its relationship with the accuracy index obtained when a compatible data set of that data size is used, with the processing load incurred when a compatible data set of that data size is used, or with both.
  • the suitability extraction section 351 may include a data size determination section 3516.
  • in selecting the compatible data set, the matching extraction unit 351 uses the data size determined by the data size determination unit 3516 in addition to the variation index of the data set candidates or the evaluation index described in the third embodiment.
  • the data size determination unit 3516 models the relationship between the data size and the prediction accuracy, the processing load, or both, based on the data size of the compatible data set, the actual accuracy of the prediction result obtained as the overall prediction data 356A, and the actual processing load when the predicted value was calculated. Information on the target prediction accuracy, processing load, or both is received from the input device 32, and a data size that satisfies them is identified from the model (an example of a relational model). The data size determination unit 3516 outputs the identified data size to the candidate selection unit 3515, and the candidate selection unit 3515 narrows the dataset candidates down to those having the input data size and selects the compatible dataset from among them.
  • the data size determination unit 3516 may use the evaluation index of each dataset candidate calculated by the candidate selection unit 3515, instead of the prediction accuracy record, when narrowing down the data size.
  • the data size determined by the data size determination unit 3516 may be output to the candidate generation unit 3511 instead of being output to the candidate selection unit 3515.
  • the candidate generation unit 3511 generates a dataset candidate having the data size of the value received as input.
  • the data size determination unit 3516 constructs a relational model that takes the following (A) as an output and at least one of the following (B) and (C) as an input, and may use the relational model to perform estimation with respect to (B) and (C).
  • (A) Data size of a dataset composed of two or more observational data.
  • (B) Prediction accuracy when a prediction process is performed using a dataset of the data size, or the degree of compatibility of the dataset with a prediction model.
  • (C) Processing load when prediction processing is performed using a data set of the data size.
  • the data size determination unit 3516 may use a relational model to estimate a data size that satisfies the conditions regarding at least one of (B) and (C).
  • the suitable data set may be a data set having the estimated data size.
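A relational model of this kind can be sketched with a first-degree polynomial fit. The sketch assumes that only the prediction error (B) is used as input, that the relationship between error and data size is roughly linear over the recorded range, and that past records are available; the record values below are made-up illustrative numbers, and the linear form is an assumption rather than the method of the specification.

```python
import numpy as np

# Hypothetical past records: data size used and the prediction error observed.
sizes = np.array([100.0, 200.0, 300.0, 400.0])
errors = np.array([8.0, 6.0, 4.0, 2.0])

# Relational model with (B) prediction error as input and (A) data size as output.
coefficients = np.polyfit(errors, sizes, deg=1)


def estimate_data_size(target_error):
    """Estimate the data size expected to achieve the target prediction error."""
    return float(np.polyval(coefficients, target_error))


estimated_size = estimate_data_size(3.0)
```

For the toy records, the fitted line is size = 500 - 50 * error, so a target error of 3.0 gives an estimated size of about 350. A processing-load input (C) could be added as a second explanatory variable in the same way.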
  • the matching extraction unit 351 performs group classification using only the index indicating the periodicity of observed data, but the classification is not limited to this, and may also be performed using external data information.
  • the group classification unit 3514 also receives the external data group 721A as input, and performs group classification using the external data information as well.
  • as the external data, for example, position information of the subject from which the observation data was observed may be used.
  • in electricity demand forecasting, assigning data from electricity consumers with similar location information to the same group allows location-dependent influences such as weather and topography to be reflected accurately in the prediction model, and improved prediction accuracy can be expected.
  • the observation data group 521A input to the matching extraction unit 351 is used as is, but the present invention is not limited to this, and missing values may be complemented with substitute values.
  • the matching extraction section 351 includes an observed data complementation section 3517.
  • the observation data group 521A is processed by the observation data complementation unit 3517 and then input to the candidate generation unit 3511.
  • for an observation data series for which a sufficient amount of past performance values has not been accumulated, one or more similar observation data series are identified based on recent trends, and the past performance values of the target observation data series may be complemented with the past performance values of a similar observation data series or with a statistic, such as the average, of the past performance values of a plurality of similar observation data series.
  • in identifying similar observation data series, external data such as location information and attribute information linked to the observation data used for complementation and to the observation data to be complemented may be compared, and the determination may be made based on the degree of similarity of the external data.
  • this makes it possible to use observation data that could not previously be used in the processing of the data prediction system due to a lack of past performance values, and an improvement in accuracy can be expected.
  • an example of a case where past performance values are insufficient is a case where the number of entities from which observation data is observed, such as electricity consumers, increases.
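The complementation of missing past values can be sketched as follows. The sketch assumes that similarity is measured by the Euclidean distance between recent values and that the average of the k most similar series is used as the substitute; the function name, the parameter `k`, and the toy series are hypothetical.

```python
import numpy as np


def complement_history(target_recent, donors_recent, donors_past, k=2):
    """Estimate missing past values of a target series from similar series.

    target_recent: recent values of the series that lacks history.
    donors_recent / donors_past: dicts mapping a series ID to its recent
    values and to its past values for the period to be complemented.
    Returns the averaged past values and the IDs of the series that were used.
    """
    target = np.asarray(target_recent, dtype=float)
    ranked = sorted(
        donors_recent,
        key=lambda d: np.linalg.norm(np.asarray(donors_recent[d], dtype=float) - target),
    )
    nearest = ranked[:k]
    return np.mean([donors_past[d] for d in nearest], axis=0), nearest


target_recent = [1.0, 2.0, 3.0]
donors_recent = {"a": [1.0, 2.0, 3.1], "b": [1.1, 2.0, 3.0], "c": [9.0, 9.0, 9.0]}
donors_past = {"a": [10.0, 20.0], "b": [12.0, 22.0], "c": [100.0, 100.0]}

filled, used = complement_history(target_recent, donors_recent, donors_past, k=2)
```

Series "c" is excluded because its recent trend is far from the target, so the substitute history is the average of series "a" and "b".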

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Marketing (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Public Health (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Water Supply & Treatment (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

This system extracts a plurality of data set candidates from a plurality of pieces of observation data having periodicity. Each data set candidate is a set of two or more pieces of observation data for a target period. For each data set candidate, the system calculates, for each piece of observation data in the data set candidate, vector data including, as elements, the magnitudes of a plurality of different frequency components of the observation data; classifies the two or more pieces of vector data into one or more data groups on the basis of the distance between pieces of the vector data; and calculates, on the basis of the number of data pieces in each data group, the goodness of fit to a prediction model that takes observation data as input and outputs prediction data. On the basis of the goodness of fit calculated for each data set candidate, the system outputs a data set candidate to be used for prediction using the prediction model.

Description

Data prediction support method and data prediction system

The present invention generally relates to data prediction.

In energy businesses such as electric power and gas, in the telecommunications business, and in transportation businesses such as taxi and delivery services, demand values are predicted in order to operate equipment and allocate resources in line with consumer demand.

For example, in the electric power business, there is a physical constraint that the amount of electricity generated must always match the amount demanded. Because a necessary and sufficient number of generators must be put on standby in advance, electricity demand needs to be predicted accurately.

Power consumption trends are diversifying with the spread of renewable energy and distributed power sources such as electric vehicles, and retail contracts are being switched more frequently as a result of electricity liberalization. Consequently, changes in power consumption trends arise that cannot be captured by a demand forecast made collectively for a jurisdiction area, and highly accurate demand forecasting becomes difficult. Techniques for predicting power demand are disclosed, for example, in Patent Documents 1 and 2.

The power demand forecasting system disclosed in Patent Document 1 classifies consumers into a plurality of groups based on performance data indicating the power consumption of a plurality of consumers at each predetermined time, selectively extracts some consumers from each group, and predicts the total power demand, which is the sum of the consumers' power demands, based on the performance data of the extracted consumers.

The power demand prediction device disclosed in Patent Document 2 classifies smart meter power measurement data into groups with similar trends, and constructs a prediction model and calculates predicted values for each group.

Japanese Patent Application Publication No. 2016-181060
Japanese Patent Application Publication No. 2017-182266

One conceivable approach is to extract, from a plurality of observation data, a dataset composed of two or more observation data, classify the two or more observation data in the dataset into a plurality of groups, and construct, for each group, a prediction model that takes observation data (time-series data of observed values) as input and outputs prediction data (time-series data of predicted values).

According to Patent Document 1, a group can be a group of consumers that share attributes such as location, contract type, contracted power, and industry. However, even among consumers with the same attributes, power consumption patterns may differ depending on whether renewable energy or electric vehicles have been introduced and on differences in consumer behavior. When observation data with different trends are mixed in the same group, the accuracy of the prediction model decreases.

According to Patent Document 2, a group can be a group of observation data whose trends are themselves similar. In this case, however, some groups do not contain enough observation data, and prediction with sufficient accuracy is therefore not possible.

The present invention has been made in view of the above points, and its object is to extract a dataset useful for constructing a prediction model with high prediction accuracy.

The data prediction system extracts a plurality of dataset candidates from a data source containing a plurality of observation data, each of which has periodicity. Each dataset candidate is a set of two or more observation data for a target period. In each dataset candidate, each observation data is all or part of one of the plurality of observation data. For each dataset candidate, the data prediction system calculates, for each observation data in the candidate, vector data whose elements are the magnitudes of a plurality of different frequency components of that observation data; classifies the two or more vector data into one or more data groups based on the distances between the vector data; and, based on the number of data in each data group, calculates the degree of fit to a prediction model that takes observation data as input and outputs prediction data. Based on the degree of fit calculated for each dataset candidate, the data prediction system outputs, as a compatible dataset, the dataset candidate to be used in prediction processing using the prediction model.

According to the present invention, it is possible to extract a dataset useful for constructing a prediction model with high prediction accuracy.

The drawings are as follows:
- A diagram showing the device configuration according to the first embodiment of the data prediction system.
- A diagram showing the device configuration when the form of FIG. 1 is implemented in an electric power supply and demand management system.
- A diagram showing the internal device configuration according to the first embodiment of the data prediction system.
- A diagram showing the data flow according to the first embodiment of the data prediction system.
- A diagram showing the processing flow according to the first embodiment of the data prediction system.
- A diagram showing details of the matching extraction unit.
- A diagram showing the processing flow of the matching extraction unit.
- A diagram showing an overview of part of the processing of the matching extraction unit.
- A diagram showing an overview of the remainder of the processing of the matching extraction unit.
- A diagram showing details of the trend classification unit.
- A diagram showing details of the group-specific prediction unit.
- A diagram showing details of the overall prediction unit.
- A diagram showing an overview of an example of the effect.
- A diagram showing the data flow of the trend classification unit according to the fourth embodiment.
- A diagram showing the data flow of the matching extraction unit according to the sixth embodiment.
- A diagram showing the data flow of the matching extraction unit according to the seventh embodiment.
- A diagram showing the data flow of the matching extraction unit according to the eighth embodiment.

In the following description, an "interface device" may be one or more interface devices. The one or more interface devices may be at least one of the following:
- One or more I/O (Input/Output) interface devices. An I/O interface device is an interface device for at least one of an I/O device and a remote display computer. The I/O interface device for the display computer may be a communication interface device. The at least one I/O device may be a user interface device, for example, either an input device such as a keyboard or pointing device, or an output device such as a display device.
- One or more communication interface devices. The one or more communication interface devices may be one or more communication interface devices of the same type (for example, one or more NICs (Network Interface Cards)) or two or more communication interface devices of different types (for example, a NIC and an HBA (Host Bus Adapter)).

In the following description, a "memory" is one or more memory devices and may typically be a main storage device. At least one memory device in the memory may be a volatile memory device or a non-volatile memory device.

In the following description, a "persistent storage device" is one or more persistent storage devices. A persistent storage device is typically a non-volatile storage device (for example, an auxiliary storage device), specifically, for example, an HDD (Hard Disk Drive) or an SSD (Solid State Drive).

In the following description, a "storage device" may be at least the memory out of the memory and the persistent storage device.

In the following description, a "processor" is one or more processor devices. The at least one processor device is typically a microprocessor device such as a CPU (Central Processing Unit), but may be another type of processor device such as a GPU (Graphics Processing Unit). The at least one processor device may be single-core or multi-core. The at least one processor device may be a processor core. The at least one processor device may also be a processor device in the broad sense, such as a hardware circuit (for example, an FPGA (Field-Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit)) that performs part or all of the processing.

In the following description, functions may be described using the expression "yyy unit". A function may be realized by a processor executing one or more computer programs, by one or more hardware circuits (for example, an FPGA or ASIC), or by a combination of these. When a function is realized by a processor executing a program, the prescribed processing is performed while using a storage device and/or an interface device as appropriate, so the function may be regarded as at least part of the processor. Processing described with a function as the subject may be regarded as processing performed by a processor or by a device having that processor. A program may be installed from a program source. The program source may be, for example, a program distribution computer or a computer-readable recording medium (for example, a non-transitory recording medium). The description of each function is an example; a plurality of functions may be combined into one function, and one function may be divided into a plurality of functions.

Hereinafter, several embodiments of the present invention will be described in detail with reference to the drawings.
(1) First embodiment
(1-1) Configuration of the data prediction system according to this embodiment

FIG. 1 shows the device configuration of the entire data prediction system according to this embodiment.

When applied to, for example, the electric power business, the data processing system 1 analyzes observed values of past power demand and calculates predicted values such as power demand and transaction prices for an arbitrary target period. The predicted values make it possible to manage power supply and demand, including formulating and executing generator operation plans, formulating and executing plans for procuring power from other electric utilities, and planning the build-out of transmission and distribution facilities.

The data processing system 1 is composed of a user 2, a data prediction system 3, an observation data provider 4, an observation data storage device 5, an external data provider 6, an external data storage device 7, supply and demand management equipment 8, a control device 9, and a communication path 10. The communication path 10 may be a communication network such as a LAN (Local Area Network) or WAN (Wide Area Network), and connects the various devices and terminals that make up the data processing system 1 so that they can communicate with one another. The control device 9 may use the prediction data calculated by the data prediction system 3 to create and execute plans for the operation and control of equipment such as generators and communication stations, market transactions, equipment build-out, and the like.

As a specific example, when the data processing system 1 is implemented in a system that manages power supply and demand, the device configuration shown in FIG. 2 is conceivable.

The user 2 corresponds to the operator of the supply and demand management equipment 8. The observation data provider 4 corresponds to a consumer, and the observation data storage device 5 corresponds to a power measuring device. The external data provider 6 corresponds to a public data provider, and the external data storage device 7 corresponds to a public data storage device. The supply and demand management equipment 8 includes generators, power storage equipment, switches, and the like, and the control device 9 is, for example, at least one of a market transaction management device, a generator control device, a power storage equipment control device, and a switch control device.

Note that the public data may be, for example, data including at least one of the following:
- Weather data such as temperature, humidity, solar radiation, wind speed, and atmospheric pressure.
- Calendar data such as the date, the day of the week, and flag values indicating arbitrarily defined day types.
- Data indicating whether sudden events such as typhoons or other events have occurred.
- Industrial dynamics data (for example, the number of energy consumers; attributes indicating the type of energy consumer, such as factory, office, or household; the industry of each energy consumer; and production volume or sales by industry or by company).
- Data indicating the topographic or climatic characteristics of each region.
- Data such as the number of communication terminals connected to a communication base station.
- Past observation data itself.

The observation data storage device 5 is an example of a data source and stores an observation data group. The observation data group is composed of a plurality of observation data, each of which has periodicity. Observation data is data that serves as input for data prediction, and may be time-series data of past observed values.

The observation data may be, for example, data including at least one of the following. The observation data may be data for each measuring instrument (for example, data for each smart meter) or data aggregated over a plurality of measuring instruments (for example, observation data that is the average of the observation data of all smart meters belonging to a predetermined area).
- Energy consumption data for electricity, gas, water, and the like.
- Energy production data for solar power generation, wind power generation, and the like.
- Transaction price data for energy traded on wholesale exchanges.
- Communication traffic data measured at communication base stations and the like.
- Historical location data of moving objects such as automobiles.

In response to a data acquisition request from another device, the observation data storage device 5 searches for observation data, transmits it, or both.

The external data storage device 7 stores external data, which is input data for data prediction and is linked to one or more observation data. The external data may be linked in advance, or may be linked dynamically by a processing unit such as the group classification unit 3522. The external data may represent past values or future values.

The external data may be, for example, data including at least one of the following. Like the observation data, the external data may be time-series data of values.
- Weather data such as temperature, humidity, solar radiation, wind speed, and atmospheric pressure.
- Calendar data such as the date, the day of the week, and flag values indicating arbitrarily defined day types.
- Data indicating whether sudden events such as typhoons or other events have occurred.
- Industrial dynamics data (for example, the number of energy consumers; attributes indicating the type of energy consumer, such as factory, office, or household; the industry of each energy consumer; and production volume or sales by industry or by company).
- Location information and data indicating the topographic or climatic characteristics of each region.
- Data such as the number of communication terminals connected to a communication base station.
- Past observation data itself.

In response to a data acquisition request from another device, the external data storage device 7 searches for external data, transmits it, or both.

As described above, the data processing system 1 includes the data prediction system 3 and a power control system (for example, the supply and demand management equipment 8) that controls at least one of power generation equipment and power storage equipment. The data prediction system 3 outputs prediction data by performing prediction processing using a compatible dataset, which is described later. The power control system receives the prediction data, uses it to create a plan for at least one of power generation and power storage, and controls at least one of the power generation equipment and the power storage equipment based on the created plan. Because the prediction accuracy of the data prediction system 3 is high, as described later, the created plan is well suited to control; suitable power generation and/or storage is therefore realized, and suitable power supply and demand control can be expected.
(1-2) Device internal configuration

FIG. 3 shows the device configurations of the data prediction system 3, the observation data storage device 5, and the external data storage device 7 included in the data processing system 1.

The data prediction system 3 is composed of an input device 32, an output device 33, a communication device 34, a storage device 35, and a CPU (Central Processing Unit) 31 connected to them. The data prediction system 3 is an information processing system such as a personal computer, a server computer, or a handheld computer. The data prediction system 3 may be such a physical computer system (one or more physical computers) or a logical computer system based on a physical computer system (for example, a system provided as a cloud computing service). The input device 32 and the output device 33 may be omitted. The communication device 34 is an example of an interface device, and the CPU 31 is an example of a processor.

The input device 32 is, for example, a keyboard or a mouse, and the output device 33 is, for example, a display or a printer. The communication device 34 includes, for example, a NIC (Network Interface Card) for connecting to a wireless LAN or a wired LAN. The storage device 35 is a storage medium such as RAM (Random Access Memory) or ROM (Read Only Memory). The output results and intermediate results of each processing unit may be output as appropriate via the output device 33.

The storage device 35 stores one or more computer programs for realizing functions such as the matching extraction unit 351, the trend classification unit 352, the group-specific prediction unit 353, and the overall prediction unit 354. These functions are realized when the CPU 31 executes the computer programs.

The storage device 35 also has a storage area 355 for storing the compatible dataset 355A and a storage area 356 for storing the overall prediction data 356A. The storage areas 355 and 356 may be a single storage area.

The compatible dataset 355A may be database information, text information, or the like used to calculate the overall prediction data 356A, and is a dataset (a set of two or more observation data) extracted as a part of the observation data group 521A (a plurality of observation data).

The overall prediction data 356A is generated as follows. The two or more observation data in the compatible dataset 355A are classified into one or more groups whose transition patterns are similar. For each group, a prediction model is constructed and prediction data is generated using that model. The overall prediction data 356A is the prediction data for the whole that is generated from the prediction data of the groups. The overall prediction data 356A may be database information, text information, image information in which the calculated values are graphed, or the like.

The observation data storage device 5 is composed of at least a communication device 51, a storage device 52, and a CPU 53 connected to them. The storage device 52 has a storage area 521 for storing the observation data group 521A.

The external data storage device 7 is composed of at least a communication device 71, a storage device 72, and a CPU 73 connected to them. The storage device 72 has a storage area 721 for storing the external data group 721A. The external data group 721A is composed of a plurality of external data.

The data prediction system 3 performs data prediction using the observation data and external data acquired from the observation data storage device 5 and the external data storage device 7.

The matching extraction unit 351 extracts a plurality of dataset candidates from the observation data group 521A, which is a collection of observation data having periodicity. A dataset candidate is a set of two or more observation data. For each extracted dataset candidate, the matching extraction unit 351 converts each observation data in the candidate into data representing the magnitudes of a plurality of different frequency components, and then maps that data into a multidimensional feature space whose dimensions correspond one-to-one to the frequency components. The matching extraction unit 351 classifies the data into a plurality of groups by grouping data that are close to one another in the feature space. For each dataset candidate, the matching extraction unit 351 calculates an index of variation, such as the variance, of the number of data contained in each group, and outputs the candidate with the smallest index among the plurality of dataset candidates as the compatible dataset to be used for data prediction.
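As an illustration of the selection step, the following minimal Python sketch scores each dataset candidate by the variance of its group sizes and picks the candidate with the smallest value. The function names, the candidate names, and the group labels are assumptions made for this example; the description only states that an index of variation such as the variance is calculated over the number of data in each group.

```python
from collections import Counter


def group_size_variance(labels):
    """Population variance of the group sizes for one dataset candidate."""
    counts = list(Counter(labels).values())
    mean = sum(counts) / len(counts)
    return sum((c - mean) ** 2 for c in counts) / len(counts)


def select_compatible_candidate(candidate_labels):
    """Return the candidate whose group sizes vary the least, plus all scores."""
    scores = {name: group_size_variance(lbls) for name, lbls in candidate_labels.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical group labels (one label per observation data) for two candidates.
candidate_labels = {
    "candidate_a": [0, 0, 0, 0, 0, 1, 2, 2],  # group sizes 5, 1, 2
    "candidate_b": [0, 0, 0, 1, 1, 1, 2, 2],  # group sizes 3, 3, 2
}
best, scores = select_compatible_candidate(candidate_labels)
print(best, scores)
```

With these labels the more evenly populated candidate is chosen, which matches the intent that no group should be left with too few observation data for its prediction model.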
(1-3) Processing and data flow of data prediction system according to this embodiment

The data flow and processing flow of the data prediction system 3 in this embodiment will be described with reference to FIGS. 4 and 5.

The data prediction system 3 in this embodiment acquires the observation data group 521A from the observation data storage device 5 and the external data group 721A from the external data storage device 7.

The observation data group 521A is input to the matching extraction unit 351. The matching extraction unit 351 extracts a plurality of dataset candidates from the input observation data group 521A and outputs one of them as the compatible dataset 355A (S301).

The compatible dataset 355A is input to the trend classification unit 352 together with the external data group 721A. The trend classification unit 352 links the observation data in the input compatible dataset 355A with the external data in the external data group 721A. The trend classification unit 352 converts each observation data (typically time-series data) in the compatible dataset 355A into a multidimensional vector whose elements are the magnitudes of frequency components, and groups data whose vectors are close to one another, thereby classifying the data in the compatible dataset 355A and the external data group 721A into one or more groups (S302). Each group contains observation data, and external data is linked to the observation data. The "distance" may be a general distance measure that satisfies the axioms of a distance, such as the Euclidean, Mahalanobis, Manhattan, Chebyshev, or Minkowski distance, or a similarity measure such as cosine similarity. The grouping may be performed by, for example, a hierarchical clustering method such as Ward's method, the single-linkage method, the complete-linkage method, or the centroid method; a neighborhood-optimizing clustering method such as k-means, the EM algorithm, or spectral clustering; or a decision-boundary-optimizing clustering method such as an unsupervised SVM (Support Vector Machine), the VQ algorithm, or SOM (Self-Organizing Maps).
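Several of the distance measures named above can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the function names and sample vectors are assumptions, the Mahalanobis distance is omitted because it needs a covariance estimate, and `assign_to_nearest` shows a single nearest-centroid assignment step rather than a complete clustering method.

```python
import math


def minkowski(u, v, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def manhattan(u, v):
    return minkowski(u, v, 1)


def euclidean(u, v):
    return minkowski(u, v, 2)


def chebyshev(u, v):
    return max(abs(a - b) for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Similarity in [-1, 1]; larger means more alike (not a distance)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def assign_to_nearest(vectors, centroids, dist=euclidean):
    """Label each vector with the index of its closest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: dist(vec, centroids[k]))
        for vec in vectors
    ]


# Hypothetical frequency-magnitude vectors.
u = [1.0, 2.0, 2.0]
v = [4.0, 6.0, 2.0]
labels = assign_to_nearest(
    [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]],
    centroids=[[0.0, 0.0], [5.0, 5.0]],
)
print(manhattan(u, v), euclidean(u, v), chebyshev(u, v), cosine_similarity(u, v), labels)
```

Because cosine similarity grows as vectors become more alike, a grouping step that minimizes a distance would need to use a quantity such as one minus the similarity instead.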

The groups obtained by the trend classification unit 352 are input to the group-specific prediction unit 353. The group-specific prediction unit 353 constructs a prediction model for each group created by the trend classification unit 352 and uses each prediction model to calculate prediction data for that group (for example, time-series data of predicted values) (S303).

The group-specific prediction data calculated by the group-specific prediction unit 353 is input to the overall prediction unit 354. The overall prediction unit 354 calculates and outputs the overall prediction data using the group-specific prediction data calculated by the group-specific prediction unit 353 and the residual observation data that was not extracted by the matching extraction unit 351 (S304). When the observation data is data for each consumer, the overall prediction data is prediction data for the entire area to which the consumers belong.
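One simple way to combine the group-specific predictions is sketched below. It is an assumption made for illustration, not the method specified here: the group forecasts are summed per time step and then scaled by the ratio between the total observed consumption (extracted plus residual) and the observed consumption of the extracted dataset. All names and numbers are hypothetical.

```python
def overall_forecast(group_forecasts, extracted_total, residual_total):
    """Sum group forecasts per time step and scale up to the whole area.

    group_forecasts: list of equal-length forecast series, one per group.
    extracted_total: observed consumption of the extracted (compatible) dataset.
    residual_total:  observed consumption of the data that was not extracted.
    """
    summed = [sum(step) for step in zip(*group_forecasts)]
    scale = (extracted_total + residual_total) / extracted_total
    return [value * scale for value in summed]


# Hypothetical forecasts for two groups over three time steps.
group_forecasts = [
    [1.0, 2.0, 3.0],
    [0.5, 0.5, 1.0],
]
result = overall_forecast(group_forecasts, extracted_total=40.0, residual_total=60.0)
print(result)
```

The scaling factor stands in for whatever use the overall prediction unit makes of the residual observation data; a different weighting would change only the `scale` line.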

The overall prediction data 356A calculated by the overall prediction unit 354 is input to the supply and demand management equipment 8. The supply and demand management equipment 8 uses the input overall prediction data 356A to control generators, power storage equipment, switches, and the like.

The details of each unit are described below.
(1-4) Details of each component
(1-4-1) Matching extraction unit 351

An embodiment of the matching extraction unit 351 will be described with reference to FIGS. 6, 7, and 8.

FIG. 6 shows the data flow inside the matching extraction unit 351.

The matching extraction unit 351 is composed of a candidate generation unit 3511, an index calculation unit 3513, a group classification unit 3514, and a candidate selection unit 3515. The storage area 3512 is an area in the storage device 35 in which the dataset candidates 3512A are stored as intermediate output.

 候補生成部3511は、観測データ群521Aから複数のデータセット候補3512Aを抽出し、記憶領域3512に各データセット候補3512Aを格納する。データセット候補の抽出は、無作為抽出でも良いし、複数のデータ分類の各々から一部の観測データを抽出することでも良い。データ分類は、下記のうちのいずれでも良い。
・ラベルが同じ観測データの集合(ラベルは、観測データが得られた計測器が属する属性(例えば、工場、商業施設、家庭など)に依存して良い)。
・統計量が同じまたは類似の観測データの集合(統計量は、観測データが表すデータ推移の平均値、最大値、最小値、分散などで良い)。
The candidate generation unit 3511 extracts a plurality of dataset candidates 3512A from the observed data group 521A, and stores each dataset candidate 3512A in the storage area 3512. Data set candidates may be extracted at random, or by extracting some observed data from each of a plurality of data classifications. Data classification may be any of the following.
- A collection of observation data with the same label (the label may depend on the attribute (for example, factory, commercial facility, home, etc.) to which the measuring instrument from which the observation data was obtained belongs).
- A collection of observed data with the same or similar statistics (statistics may be the average value, maximum value, minimum value, variance, etc. of the data transition represented by the observed data).
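The second extraction option above (taking some observation data from each data classification) can be sketched as follows. This is a minimal illustration only, assuming that the data classification is a label per measuring instrument; the function name `generate_candidates`, the parameter `per_label`, and the meter IDs are hypothetical and do not appear in the specification.

```python
import random
from collections import defaultdict

def generate_candidates(labels, n_candidates, per_label, seed=0):
    """Return n_candidates lists of observation IDs, drawing per_label IDs from each label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for obs_id, label in labels.items():
        by_label[label].append(obs_id)
    candidates = []
    for _ in range(n_candidates):
        picked = []
        for label in sorted(by_label):
            ids = by_label[label]
            # Draw without replacement inside one label; never ask for more than exist.
            picked.extend(rng.sample(ids, min(per_label, len(ids))))
        candidates.append(sorted(picked))
    return candidates

# Hypothetical labels: the attribute of the facility where each meter is installed.
labels = {"m1": "factory", "m2": "factory", "m3": "factory",
          "m4": "home", "m5": "home", "m6": "home"}
candidates = generate_candidates(labels, n_candidates=3, per_label=2)
```

Purely random extraction would correspond to a single `rng.sample` over all IDs instead of the per-label loop.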

 また、各データセット候補に含まれる観測データの数は、利用者2が任意に定めて良い。 Furthermore, the number of observation data included in each dataset candidate may be arbitrarily determined by the user 2.

 指標算出部3513は、各データセット候補3512Aについて、データセット候補3512Aに含まれる観測データ毎に、観測データを、周波数成分の大きさを各要素に持つ多次元ベクトルデータに変換する。変換の処理は、例えばデータ推移を表す値を正規化する処理か、データ推移を表す値のフーリエ変換やウェーブレット変換処理、あるいはその両方で良い。変換処理により算出する周波数成分として、事前に任意の周波数成分が指定されて良い。指定される周波数成分は、異なる複数の周期成分でよく、周期成分は、年、月、週、日単位や、1時間、0.5時間単位の周期成分から選択されて良い。例として、電力分野に置いて特徴的な成分が現れやすい1年(365日)、半年(180日)、3月(90日)、1月(30日)、1週間(7日)、1日、半日(12時間)、6時間、1時間および30分(0.5時間)のうちの二つ以上の周期成分が採用されて良い。事前に周波数成分を定める方法の他に、ベクトルデータ間で値のばらつきが大きいような成分が機械的に周波数成分として採用されても良い。 For each data set candidate 3512A, the index calculation unit 3513 converts each observation data included in the data set candidate 3512A into multidimensional vector data whose elements are the magnitudes of frequency components. The conversion may be, for example, normalization of the values representing the data transition, a Fourier transform or wavelet transform of those values, or both. Any frequency components may be designated in advance as the frequency components to be calculated by the conversion. The designated frequency components may be a plurality of different periodic components, selected from periodic components in units of years, months, weeks, or days, or in units of 1 hour or 0.5 hours. For example, two or more of the following periodic components, in which characteristic components tend to appear in the electric power field, may be adopted: one year (365 days), half a year (180 days), three months (90 days), one month (30 days), one week (7 days), one day, half a day (12 hours), 6 hours, 1 hour, and 30 minutes (0.5 hours). Besides determining the frequency components in advance, components whose values vary widely among the vector data may be adopted automatically as the frequency components.
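As a rough illustration of this conversion, the sketch below evaluates a discrete Fourier sum only at the designated periods and returns the amplitudes as a vector. It is a simplified example under stated assumptions: the series is evenly sampled, the periods are given in numbers of samples, and the function name `periodicity_index` and the synthetic series are hypothetical. Subtracting the mean is one simple form of normalization and is not prescribed by the text.

```python
import cmath
import math

def periodicity_index(series, periods):
    """Magnitude of the Fourier component at each designated period (in samples)."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]  # remove the constant offset
    index = []
    for period in periods:
        freq = 1.0 / period  # cycles per sample
        coeff = sum(x * cmath.exp(-2j * math.pi * freq * t)
                    for t, x in enumerate(centered))
        index.append(2 * abs(coeff) / n)  # amplitude of that periodic component
    return index

# Synthetic hourly data over 10 days: a daily cycle of amplitude 3
# plus a half-day cycle of amplitude 1, on top of a constant load of 5.
series = [3 * math.sin(2 * math.pi * t / 24) + math.sin(2 * math.pi * t / 12) + 5
          for t in range(240)]
vector = periodicity_index(series, periods=[24, 12])
```

Here `vector` is a two-dimensional vector of the one-day and half-day components, analogous to the two-dimensional case used for illustration in FIG. 8A.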

 グループ分類部3514は、各データセット候補3512Aについて、指標算出部3513により変換された後のベクトルデータの距離が近いデータ同士をまとめることで、適合データセット355Aにおける二つ以上の観測データについて算出された二つ以上のベクトルデータを1つ乃至複数のグループに分類する。グループの数は、事前に任意に決定された数でも良いし、AIC(Akaike Information Criterion)やBIC(Bayesian Information Criterion)等の情報量基準が最小となるような数でも良い。 For each data set candidate 3512A, the group classification unit 3514 groups together data whose vector data, as converted by the index calculation unit 3513, are close in distance, thereby classifying the two or more vector data calculated for the two or more observation data in the compatible data set 355A into one or more groups. The number of groups may be a number determined arbitrarily in advance, or a number that minimizes an information criterion such as AIC (Akaike Information Criterion) or BIC (Bayesian Information Criterion).

 候補選定部3515は、グループのデータ数の分散などばらつきを表す指標を、データセット候補3512A毎に計算する。候補選定部3515は、複数のデータセット候補3512Aの内、指標が最も小さいデータセット候補3512Aを、データ予測(予測モデルを用いた予測処理)に用いる適合データセット355Aとして出力する。 The candidate selection unit 3515 calculates, for each dataset candidate 3512A, an index representing variation such as the variance of the number of data in a group. The candidate selection unit 3515 outputs the dataset candidate 3512A with the smallest index among the plurality of dataset candidates 3512A as the compatible dataset 355A used for data prediction (prediction processing using a prediction model).

 図7は、適合抽出部351の処理フローを示す。本処理フローは、図5に示したS301の内部処理に相当する。 FIG. 7 shows the processing flow of the matching extraction unit 351. This processing flow corresponds to the internal processing of S301 shown in FIG.

 適合抽出部351は、S3012からS3014の処理をN回繰り返す(S3011)。なお、S3012からS3014の処理をN回繰り返す代わりに、適合抽出部351は、S3012、S3013、S3014の各処理をN回ずつ繰り返しても良い。Nは、2以上の整数である。 The matching extraction unit 351 repeats the processing from S3012 to S3014 N times (S3011). Note that instead of repeating the processes from S3012 to S3014 N times, the matching extraction unit 351 may repeat each process of S3012, S3013, and S3014 N times. N is an integer of 2 or more.

 まず、適合抽出部351は、観測データ群521Aからデータセット候補3512Aを抽出する(S3012)。 First, the matching extraction unit 351 extracts the dataset candidate 3512A from the observed data group 521A (S3012).

 次に、指標算出部3513は、S3012で抽出されたデータセット候補3512Aに含まれる各観測データを、周波数成分の大きさを各要素に持つ多次元ベクトルデータ(周期性指標)に変換する(S3013)。 Next, the index calculation unit 3513 converts each observation data included in the data set candidate 3512A extracted in S3012 into multidimensional vector data (a periodicity index) whose elements are the magnitudes of frequency components (S3013).

 次に、グループ分類部3514は、距離が近いベクトルデータ同士をまとめることで、データセット候補3512Aにおける二つ以上の観測データについての二つ以上のベクトルデータを複数のグループに分類する(S3014)。グループ(データグループ)は、ベクトルデータ間の距離が閾値以下のベクトルデータのグループである。閾値は、二つ以上のベクトルデータに基づき決定された代表的なベクトルデータからの距離の閾値でも良いし、算出されたベクトルデータ間の相対的な距離の閾値でも良い。 Next, the group classification unit 3514 classifies two or more vector data regarding two or more observed data in the dataset candidate 3512A into a plurality of groups by grouping together vector data that are close to each other (S3014). A group (data group) is a group of vector data in which the distance between vector data is equal to or less than a threshold value. The threshold may be a distance threshold from representative vector data determined based on two or more vector data, or a relative distance threshold between calculated vector data.
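One possible reading of the threshold-based grouping in S3014 is sketched below. It assumes Euclidean distance and, as a simplification, uses the first vector placed in a group as that group's representative vector; the specification does not fix either choice, and the function name `group_by_threshold` and the sample vectors are hypothetical.

```python
import math

def group_by_threshold(vectors, threshold):
    """Assign each vector to the first group whose representative lies within threshold."""
    representatives = []  # one representative vector per group
    assignment = []       # group number for each input vector
    for v in vectors:
        for g, rep in enumerate(representatives):
            if math.dist(v, rep) <= threshold:
                assignment.append(g)
                break
        else:
            # No existing group is close enough, so this vector starts a new group.
            representatives.append(v)
            assignment.append(len(representatives) - 1)
    return assignment

# Hypothetical periodicity vectors (one-day component, half-day component).
vectors = [(3.0, 1.0), (3.1, 0.9), (0.5, 2.0), (0.4, 2.2), (3.0, 1.1)]
groups = group_by_threshold(vectors, threshold=0.5)
```

With this scheme the number of groups follows from the threshold. A method with a fixed or criterion-selected number of groups would be an equally valid alternative.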

 最後に、候補選定部3515は、複数(N)のデータセット候補それぞれについて、当該データセット候補についてのグループのデータ数について、分散などばらつきを表す指標を算出し、指標が最小のデータセット候補を、適合データセット355Aとして出力する(S3015)。 Finally, for each of the plurality of (N) dataset candidates, the candidate selection unit 3515 calculates an index representing variation, such as variance, with respect to the number of data in the group for the dataset candidate, and selects the dataset candidate with the smallest index. , and output as a compatible data set 355A (S3015).
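The selection in S3015 can be illustrated with a short sketch that computes the variance of the number of data per group for each candidate and picks the minimum. The function names and the two example assignments are hypothetical, and the population variance is used here as one example of an index representing variation. Groups that receive no data do not appear in the count.

```python
from collections import Counter
from statistics import pvariance

def group_size_variance(assignment):
    """Population variance of the number of observations per group."""
    return pvariance(list(Counter(assignment).values()))

def select_candidate(assignments):
    """Return the index of the candidate whose group sizes vary the least, plus all scores."""
    scores = [group_size_variance(a) for a in assignments]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Group number assigned to each observation in two hypothetical candidates.
candidate_1 = [0, 0, 0, 0, 0, 0, 1, 1, 2]  # group sizes 6, 2, 1
candidate_2 = [0, 0, 0, 1, 1, 1, 2, 2, 2]  # group sizes 3, 3, 3
best, scores = select_candidate([candidate_1, candidate_2])
```

The second candidate has equal group sizes, so its variance is zero and it would be output as the compatible data set.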

 以上を以って、適合抽出部351による抽出処理が完了する。 With the above, the extraction process by the matching extraction unit 351 is completed.

 図8Aおよび図8Bを用いて、適合抽出部351の処理内容をより具体的に説明する。例として、観測データ群521Aにおける観測データは、電力消費の時系列データとする。 The processing contents of the matching extraction unit 351 will be explained in more detail using FIGS. 8A and 8B. As an example, the observed data in the observed data group 521A is time series data of power consumption.

 まず、候補生成部3511は、観測データ群521AからNのデータセット候補を抽出する。各データセット候補には、二つ以上の観測データ521B1(電力消費量の推移データ)が含まれる。Nのデータセット候補において、データセット候補における観測データの数は同じでも良いし異なっていても良い。また、Nのデータセット候補は、それぞれ、対象期間についての二つ以上の観測データで構成される。「対象期間」は、現在から過去1年間や半年といった過去の期間で良く、当該過去の期間は、予測対象期間(将来の期間)に対応した期間で良い(例えば予測対象期間が1月~12月の場合、当該過去の期間も1月~12月で良い)。 First, the candidate generation unit 3511 extracts N data set candidates from the observed data group 521A. Each data set candidate includes two or more observation data 521B1 (power consumption transition data). The number of observation data may be the same or may differ among the N data set candidates. Each of the N data set candidates is composed of two or more observation data for the target period. The "target period" may be a past period such as the past year or half year up to the present, and the past period may correspond to the prediction target period (a future period); for example, if the prediction target period is January to December, the past period may also be January to December.

 次に、指標算出部3513は、各観測データ521B1にフーリエ変換を施すことで、各観測データ521B1を、周波数成分の大きさを要素とするベクトルデータ521B2に変換する。フーリエ変換以外の変換、例えばウェーブレット変換が採用されても良い。図8Aは、便宜的に、観測データ521B1は、1日周期成分と半日周期成分の2次元のベクトルデータに変換されているが、それに代えて、1年周期、半年周期などの長周期成分や、1時間周期、30分周期などの短周期成分も持つ、より多次元のベクトルデータに変換されても良い。 Next, the index calculation unit 3513 applies a Fourier transform to each observation data 521B1, converting it into vector data 521B2 whose elements are the magnitudes of frequency components. A transform other than the Fourier transform, such as a wavelet transform, may be adopted. In FIG. 8A, for convenience, the observation data 521B1 is converted into two-dimensional vector data consisting of a one-day periodic component and a half-day periodic component; instead, it may be converted into higher-dimensional vector data that also has long-period components such as one-year and half-year periods and short-period components such as one-hour and 30-minute periods.

 次に、グループ分類部3514は、周波数成分の大きさに変換したベクトルデータを、距離尺度の小さいデータ同士でグループ化する。例として、点線で表したグループの境界521B3により分類されるグループ1乃至3が作成される。なお、説明の簡略化のためデータセット候補1および2で同一のグループ数としているが、データセット候補毎にグループ数が異なっても良い。 Next, the group classification unit 3514 groups the vector data of frequency-component magnitudes so that data separated by a small distance measure fall into the same group. As an example, groups 1 to 3, delimited by the group boundaries 521B3 shown as dotted lines, are created. Note that, to simplify the explanation, data set candidates 1 and 2 have the same number of groups, but the number of groups may differ for each data set candidate.

 次に、候補選定部3515は、各データセット候補について、グループ毎のデータ数を集計する(符号521B4を参照)。候補選定部3515は、グループ毎の集計したデータ数から、データセット候補毎にグループ毎のデータ数の分散を計算する(符号521B5を参照)。最後に、候補選定部3515は、分散が最小となるデータセット候補2を、適合データセット355Aとして出力する。
(1-4-2)傾向分類部352
Next, the candidate selection unit 3515 totals the number of data for each group for each dataset candidate (see reference numeral 521B4). The candidate selection unit 3515 calculates the variance of the number of data for each group for each dataset candidate from the total number of data for each group (see reference numeral 521B5). Finally, the candidate selection unit 3515 outputs the dataset candidate 2 with the minimum variance as the compatible dataset 355A.
(1-4-2) Trend classification unit 352

 傾向分類部352は、適合データセット355Aと外部データ群721Aとを入力として取得し、適合データセット355Aに含まれる観測データと外部データ群721Aに含まれる外部データとを紐づける。傾向分類部352は、適合データセット355Aに含まれる観測データの特徴量データ(ベクトルデータ)同士の類似の度合いにより、適合データセット355Aに含まれる二つ以上の観測データを、一つ乃至複数のグループに分類する。 The trend classification unit 352 receives the compatible data set 355A and the external data group 721A as input, and links the observation data included in the compatible data set 355A with the external data included in the external data group 721A. The trend classification unit 352 classifies the two or more observation data included in the compatible data set 355A into one or more groups according to the degree of similarity between the feature data (vector data) of those observation data.

 図9を用いて傾向分類部352の処理内容をより具体的に説明する。 The processing contents of the trend classification unit 352 will be explained in more detail using FIG. 9.

 傾向分類部352は、特徴量変換部3521およびグループ分類部3522から構成される。記憶領域3523は、記憶装置35の一領域であり、記憶領域3523には、中間出力のグループ情報付観測データ3523A、およびグループ情報付外部データ3523Bが格納される。 The trend classification section 352 includes a feature amount conversion section 3521 and a group classification section 3522. The storage area 3523 is one area of the storage device 35, and the intermediate output observation data with group information 3523A and the external data with group information 3523B are stored.

 特徴量変換部3521は、適合データセット355Aを入力として受け取り、適合データセット355Aに含まれる各観測データを特徴量データに変換する。特徴量データは、上述した多次元ベクトルデータ(周期性指標)で良い。 The feature value conversion unit 3521 receives the compatible data set 355A as input, and converts each observed data included in the compatible data set 355A into feature data. The feature amount data may be the multidimensional vector data (periodicity index) described above.

 グループ分類部3522は、適合データセット355Aにおける観測データ毎のベクトルデータ(特徴量変換部3521から出力された特徴量データ)と、外部データ群721Aとを入力として受け取る。グループ分類部3522は、ベクトルデータに、当該ベクトルデータに対応した観測データに紐づいている外部データを紐づける。次に、グループ分類部3522は、各ベクトルデータ間の距離が近い組み合わせをグループ化することで、適合データセット355A内の二つ以上の観測データについての二つ以上のベクトルデータを一つ乃至複数のグループに分類する。グループ(データグループ)において、距離、グループへの分類(グループ化の処理)、および、グループの数は、それぞれ上述した通り(適合抽出部351について述べた通り)で良い。 The group classification unit 3522 receives vector data for each observed data in the compatible data set 355A (feature data output from the feature conversion unit 3521) and the external data group 721A as input. The group classification unit 3522 links vector data with external data that is linked to observation data corresponding to the vector data. Next, the group classification unit 3522 groups the combinations in which the distance between each vector data is close, so that two or more vector data regarding two or more observed data in the compatible data set 355A are grouped into one or more vector data. Classify into groups. In a group (data group), the distance, classification into groups (grouping process), and number of groups may be as described above (as described for the matching extraction unit 351).

 最後に、グループ分類部3522は、グループ情報付観測データ3523A、およびグループ情報付外部データ3523Bを、記憶領域3523に格納し、グループ別予測部353へ出力する。グループ情報付観測データ3523Aは、適合データセット355Aにおける各観測データを含み、当該各観測データに、当該観測データの特徴量データの分類先グループを表すグループ情報が紐づけられている。グループ情報付外部データ3523Bは、適合データセット355Aにおける観測データに紐づけられた外部データを含み、当該外部データに、当該外部データが紐づけられた観測データの特徴量データの分類先グループを表すグループ情報が紐づけられている。
(1-4-3)グループ別予測部353
Finally, the group classification unit 3522 stores the observation data with group information 3523A and the external data with group information 3523B in the storage area 3523, and outputs them to the group-by-group prediction unit 353. The observation data with group information 3523A includes each observation data in the compatible data set 355A, and each observation data is associated with group information indicating the group to which the feature amount data of the observation data is classified. The external data with group information 3523B includes external data linked to the observed data in the compatible data set 355A, and represents the group to which the feature data of the observed data to which the external data is linked is classified. Group information is linked.
(1-4-3) Group-specific prediction unit 353

 グループ別予測部353では、傾向分類部352から出力されたグループ情報付観測データ3523A、グループ情報付外部データ3523B、および外部データ群721Aを入力として取得し、グループ毎に予測モデルの構築、および予測対象期間の予測データの算出を行う。 The group-by-group prediction unit 353 receives as input the observed data with group information 3523A, the external data with group information 3523B, and the external data group 721A output from the trend classification unit 352, and constructs a prediction model and makes predictions for each group. Calculate forecast data for the target period.

 図10を用いてグループ別予測部353の処理内容をより具体的に説明する。 The processing contents of the group-by-group prediction unit 353 will be explained in more detail using FIG. 10.

 グループ別予測部353は、構築部3531およびグループ算出部3532から構成される。記憶領域3533は、記憶装置35の一領域であり、記憶領域3533には、グループ別予測データ3533Aが格納される。 The group prediction unit 353 is composed of a construction unit 3531 and a group calculation unit 3532. The storage area 3533 is an area of the storage device 35, and the group-specific prediction data 3533A is stored in the storage area 3533.

 構築部3531は、グループ情報付観測データ3523Aと、グループ情報付外部データ3523Bを入力として受け取り、グループ別に予測モデルを構築する。グループ毎に、予測モデルは、グループ情報付観測データ3523Aのある当該グループに属すデータを目的変数とし、グループ情報付外部データ3523Bの当該グループに属すデータを説明変数とした、回帰、分類、クラスタリング等のモデルや、それらを組み合わせたモデルで良い。予測モデルの構築には、グループ情報付観測データ3523Aのあるグループに属すデータから平均値や中央値を取った値が採用されて良い。また、グループ別に異なるモデルが採用されても良い。 The construction unit 3531 receives the observation data with group information 3523A and the external data with group information 3523B as input, and constructs a prediction model for each group. For each group, the prediction model may be a regression, classification, clustering, or similar model, or a combination of such models, in which the data belonging to that group in the observation data with group information 3523A is the objective variable and the data belonging to that group in the external data with group information 3523B is the explanatory variable. In constructing the prediction model, the mean or median of the data belonging to a group in the observation data with group information 3523A may be used. A different model may be adopted for each group.
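A minimal sketch of group-wise model construction is given below, assuming the simplest case the paragraph allows: a single-variable linear regression per group, with the group mean of the observation data as the objective variable. For brevity the example shares one external series (a hypothetical temperature series) across all groups, whereas the specification links external data to each observation. The function names and numbers are illustrative only.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = a * x + b with one explanatory variable."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx

def build_group_models(observations, groups, external):
    """Fit one model per group on the mean series of that group's members."""
    models = {}
    for g in sorted(set(groups)):
        members = [obs for obs, gi in zip(observations, groups) if gi == g]
        target = [mean(values) for values in zip(*members)]  # mean across members per time step
        models[g] = fit_line(external, target)
    return models

temperature = [10.0, 20.0, 30.0]            # explanatory variable (assumed)
observations = [[12.0, 22.0, 32.0],         # group 0
                [14.0, 24.0, 34.0],         # group 0
                [50.0, 40.0, 30.0]]         # group 1
groups = [0, 0, 1]
models = build_group_models(observations, groups, temperature)
# Forecast for a hypothetical future temperature of 25 degrees.
forecast = {g: a * 25.0 + b for g, (a, b) in models.items()}
```

Because each model here is fitted on a group mean, its output is a per-member average; scaling it by the group size would be needed before summing across groups.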

 グループ算出部3532は、構築部3531から出力されたグループ別予測モデルと、外部データ群721Aを入力として受け取り、それぞれのグループの予測対象期間の予測データを算出する。外部データ群721Aは、予測データの算出対象とするグループの予測モデル構築に用いた観測データに紐づくデータの、予測対象期間における実績値や予報値を含んでも良い。 The group calculation unit 3532 receives the group-by-group prediction model output from the construction unit 3531 and the external data group 721A as input, and calculates prediction data for the prediction target period of each group. The external data group 721A may include actual values and forecast values in the prediction target period of data associated with observation data used to construct the prediction model of the group for which prediction data is to be calculated.

 最後に、グループ算出部3532は、各グループについて算出した予測データを、グループ別予測データ3533Aとして記憶領域3533に格納し、全体予測部354へ出力する。
(1-4-4)全体予測部354
Finally, the group calculation unit 3532 stores the prediction data calculated for each group in the storage area 3533 as group-specific prediction data 3533A, and outputs it to the overall prediction unit 354.
(1-4-4) Overall prediction unit 354

 全体予測部354は、グループ別予測部353から出力されたグループ別予測データ3533A、観測データ群521A、適合データセット355A、および外部データ群721Aを入力として取得し、全体の予測データ(例えば、適合データセット355Aに属する観測データに対応した需要家が属するエリア全体についての予測データ)を算出する。 The overall prediction unit 354 receives as input the group-specific prediction data 3533A output from the group-by-group prediction unit 353, the observed data group 521A, the compatible data set 355A, and the external data group 721A, and calculates the overall prediction data (for example, prediction data for the entire area to which the customers corresponding to the observation data in the compatible data set 355A belong).

 図11を用いて全体予測部354の処理内容をより具体的に説明する。 The processing contents of the overall prediction unit 354 will be explained in more detail using FIG. 11.

 全体予測部354は、残余予測部3541および全体算出部3542から構成される。 The overall prediction unit 354 includes a residual prediction unit 3541 and an overall calculation unit 3542.

 残余予測部3541は、まず観測データ群521Aのうち適合データセット355A以外の観測データの総和を取ることで、残余データを作成する。残余データは、観測データ群521Aのうち適合データセット355A以外の観測データの集合でも良い。次に、残余予測部3541は、残余データを目的変数とし、外部データ群721Aに含まれるデータの内の残余データに紐づくデータを説明変数とし、予測モデルの構築を行う。予測モデルは、回帰、分類、クラスタリング等のモデルや、それらを組み合わせたモデルで良い。当該予測モデルの構築には、残余データから平均値や中央値を取った値が採用されても良い。最後に、残余予測部3541は、説明変数として用いた外部データ群721Aの予測対象期間の実績値や予測値を予測モデルに入力し、残余データの予測値を算出し、全体算出部3542へ出力する。 The residual prediction unit 3541 first creates residual data by taking the sum of the observation data in the observed data group 521A other than the compatible data set 355A. The residual data may instead be the set of observation data in the observed data group 521A other than the compatible data set 355A. Next, the residual prediction unit 3541 constructs a prediction model using the residual data as the objective variable and, as the explanatory variable, the data in the external data group 721A that is linked to the residual data. The prediction model may be a regression, classification, clustering, or similar model, or a combination of such models. In constructing the prediction model, the mean or median of the residual data may be used. Finally, the residual prediction unit 3541 inputs into the prediction model the actual values or forecast values, for the prediction target period, of the external data group 721A used as explanatory variables, calculates the predicted value of the residual data, and outputs it to the overall calculation unit 3542.

 全体算出部3542は、グループ別予測データ3533A、および残余予測部3541から出力された残余データの予測値を用いて、需要家全体の予測値を算出する。具体的には、例えば、全体算出部3542は、グループ別予測データ3533Aの各値と残余データの予測値との総和を取り、全体予測データ356Aとして出力する。 The overall calculation unit 3542 uses the group-specific prediction data 3533A and the predicted value of the residual data output from the residual prediction unit 3541 to calculate the predicted value for all customers. Specifically, for example, the overall calculation unit 3542 takes the sum of each value of the group-specific prediction data 3533A and the predicted value of the residual data, and outputs the sum as the overall prediction data 356A.
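The two summations described for the overall prediction unit 354 can be sketched as follows. The example assumes that every series is aligned on the same time steps and that each group forecast already represents the total for its group; the function names, meter IDs, and values are hypothetical.

```python
def residual_series(observations, selected_ids):
    """Sum, per time step, every observation that is not in the selected data set."""
    rest = [series for obs_id, series in observations.items()
            if obs_id not in selected_ids]
    return [sum(values) for values in zip(*rest)]

def overall_forecast(group_forecasts, residual_forecast):
    """Sum the group-wise forecasts and the residual forecast at each time step."""
    series = list(group_forecasts.values()) + [residual_forecast]
    return [sum(values) for values in zip(*series)]

# Residual data: meters m3 and m4 were not extracted into the compatible data set.
observations = {"m1": [1.0, 2.0], "m2": [3.0, 4.0],
                "m3": [5.0, 6.0], "m4": [7.0, 8.0]}
residual = residual_series(observations, selected_ids={"m1", "m2"})

# Overall prediction: two group forecasts plus a forecast of the residual data.
total = overall_forecast({0: [4.0, 5.0], 1: [2.0, 2.5]},
                         residual_forecast=[10.0, 11.0])
```

The residual series would serve as the objective variable of a separate prediction model, and that model's output is what enters `overall_forecast` as `residual_forecast`.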

 以上の処理を以て、本実施形態における第一の演算処理が終了し、そして本実施形態におけるデータ予測システム3の演算処理が終了する。
(1-5)本実施形態の効果の一例の説明
With the above processing, the first calculation process in this embodiment is completed, and the calculation process of the data prediction system 3 in this embodiment is completed.
(1-5) Description of an example of the effect of this embodiment

 図12を参照し、本実施の形態によるデータ予測システム3の効果を説明する。 With reference to FIG. 12, the effects of the data prediction system 3 according to this embodiment will be explained.

 図12は、適合抽出部351により抽出した適合データセット355Aを用いて予測対象期間についての予測データ(例えば、予測対象期間における予測される観測データ)を算出した場合の効果の一例の概要を示す図である。例として、スマートメータにより収集された電力消費量を示す観測データを対象とする。なお、凡例24に示す通り、グラフでは、点線が観測データ(予測対象期間に対応の過去の期間における観測値の時系列)、実線が予測データ(予測対象期間についての予測された観測値の時系列)、破線が予測対象の観測データ(予測対象期間について実際に観測された観測値の時系列)をそれぞれ表す。点線の観測データは、適合データセット355Aに含まれる観測データである。実線の予測データは、グループ別予測データ3533Aや全体予測データ356Aとして出力されるデータである。黒い点25は、指標算出部3513により算出され特徴空間上にプロットされたベクトルデータ(観測データから変換されたデータ)を模擬的に示している。特徴空間は、異なる複数の周期成分をそれぞれ軸とした座標系の空間である。模擬的に、当該特徴空間に、ベクトルデータが座標としてプロットされる。 FIG. 12 is a diagram outlining an example of the effect obtained when prediction data for the prediction target period (for example, the observation data predicted for that period) is calculated using the compatible data set 355A extracted by the matching extraction unit 351. As an example, the observation data is power consumption collected by smart meters. As shown in legend 24, in the graphs the dotted lines represent observation data (the time series of observed values in the past period corresponding to the prediction target period), the solid lines represent prediction data (the time series of observed values predicted for the prediction target period), and the broken lines represent the observation data to be predicted (the time series of values actually observed in the prediction target period). The observation data shown by the dotted lines is the observation data included in the compatible data set 355A. The prediction data shown by the solid lines is the data output as the group-specific prediction data 3533A or the overall prediction data 356A. The black dots 25 schematically show vector data (data converted from observation data) calculated by the index calculation unit 3513 and plotted in the feature space. The feature space is a coordinate space whose axes are the respective periodic components. The vector data is plotted schematically as coordinates in that feature space.

 ケース21は、工場やオフィスや一般家庭などの種別を表す属性の内訳が均一になるよう観測データを抽出し、同一属性で分類したグループ単位で予測データを算出するケースである。このケースでは、工場と家庭の二種類の属性が存在しており、属性別でデータがグループに分類されている。このケースでは、工場の属性を持つグループAの中に電力消費傾向が異なるデータが混在している。傾向の違いの原因は、例えば太陽光発電の導入量の違いや業態の違いである。グループA内に異なる傾向のデータが混在した結果、グループAで精度よく需要予測ができず、その結果、最終的に出力される全体予測データと、予測対象の観測データが乖離する。 Case 21 is a case in which observed data is extracted so that the breakdown of attributes representing the type of factory, office, general household, etc. is uniform, and predicted data is calculated for each group classified by the same attribute. In this case, there are two types of attributes: factory and household, and data is classified into groups by attribute. In this case, data with different power consumption trends are mixed in group A that has the factory attribute. Differences in trends are caused by, for example, differences in the amount of solar power generation installed and differences in business types. As a result of data with different trends being mixed in group A, it is not possible to accurately predict demand in group A, and as a result, there is a discrepancy between the overall forecast data that is finally output and the observed data to be predicted.

 ケース22は、工場やオフィスや一般家庭などの種別を表す属性の内訳が均一になるよう観測データを抽出し、特徴空間上の距離尺度が小さいデータ同士をまとめたグループ単位で予測データを算出するケースである。このケースでは、同一属性で傾向が異なるデータは別グループへ分類されるが、グループBのように予測モデルを構築するための学習用データを十分に確保できないようなグループ(つまりデータ数が極めて少ないグループ)が生じ得る。結果、当該グループについて需要予測が難しくなり、全体予測データに誤差が残る結果となる。 Case 22 is a case in which observation data is extracted so that the breakdown of attributes representing types such as factories, offices, and general households is uniform, and prediction data is calculated for each group formed by gathering data separated by a small distance measure in the feature space. In this case, data with the same attribute but different trends is classified into separate groups, but a group for which sufficient training data to construct a prediction model cannot be secured (that is, a group with very few data), such as group B, can arise. As a result, demand prediction for that group becomes difficult, and errors remain in the overall prediction data.

 ケース23は、本実施の形態による抽出方法により観測データを抽出し、特徴空間上の距離尺度が小さいデータ同士をまとめたグループ単位で予測データを算出するケースである。このケースでは、グループ別のデータ数のばらつき(例えば分散)が小さくなるようなデータセットが選択されるため、グループ間でデータ数が均等になりやすい。その結果、データ数が少なくなるようなグループの発生を防止し、正確な需要予測が可能となる。 Case 23 is a case in which observed data is extracted using the extraction method according to the present embodiment, and predicted data is calculated for each group of data with a small distance scale in the feature space. In this case, a data set is selected in which the variation (for example, variance) in the number of data for each group is small, so the number of data is likely to be equal between the groups. As a result, it is possible to prevent groups with a small amount of data from occurring and to make accurate demand predictions possible.

 すなわち、本実施の形態では、同一属性のグループにデータを分類してグループ別に予測をすることの技術的課題、具体的には、観測値の推移傾向が異なるデータが同一属性のグループに混在するために予測精度が低下し得ることを解決することができる。なぜなら、各データセット候補について、観測データ毎に、異なる複数の周波数成分の大きさを要素に持つベクトルデータが算出され、ベクトルデータ間の距離を基に、データがグループに分類され、故に、同一グループに観測値の推移傾向が異なるデータが混在することは避けられるからである。 That is, the present embodiment can solve a technical problem of classifying data into groups with the same attribute and making predictions for each group, specifically, that prediction accuracy may decrease because data with different trends in observed values are mixed in a group with the same attribute. This is because, for each data set candidate, vector data whose elements are the magnitudes of a plurality of different frequency components is calculated for each observation data, and the data is classified into groups based on the distances between the vector data, so that data with different trends in observed values are kept from being mixed in the same group.

 また、本実施の形態では、観測値の推移傾向が類似のグループにデータを分類してグループ別に予測をすることの技術的課題、具体的には、予測に必要なデータ数が不足するために予測精度が低下し得ることのいずれも、本実施の形態では解決することができる。なぜなら、複数のデータセット候補が抽出され、適合データセット355Aは、複数のデータセット候補のうちグループのデータ数のばらつきが最も小さいデータセットである、つまり、いずれのグループについても予測モデル構築用のデータ数が極端に不足しないことが期待されるデータセットであるからである。 The present embodiment can also solve a technical problem of classifying data into groups with similar trends in observed values and making predictions for each group, specifically, that prediction accuracy may decrease because the number of data needed for prediction is insufficient. This is because a plurality of data set candidates are extracted and the compatible data set 355A is the candidate with the smallest variation in the number of data per group, that is, a data set in which no group is expected to be extremely short of data for constructing its prediction model.

 なお、複数の周期成分の各々について、周期は、任意の周期(例えば、1年より長い期間、1年、半年、3月、1月、1週間、1日、半日、6時間、1時間および0.5時間のうちのいずれか)で良い。この周期は、観測値の周期性が期待される周期であるため、適合データセット355Aが適切であることがより期待される。また、周波数成分と時間成分が併用されて良い。例えば、周波数成分のみが用いられる場合、フーリエ変換が採用され、周波数成分と時間成分が併用される場合、ウェーブレット変換が採用されて良い。
(2)第二の実施の形態(適合抽出部351の処理頻度の変更)
Note that, for each of the plurality of periodic components, the period may be any period (for example, a period longer than one year, one year, half a year, three months, one month, one week, one day, half a day, six hours, one hour, or 0.5 hours). Because such a period is one at which the observed values are expected to be periodic, the compatible data set 355A can be further expected to be appropriate. Frequency components and time components may also be used together. For example, a Fourier transform may be adopted when only frequency components are used, and a wavelet transform may be adopted when frequency components and time components are used together.
(2) Second embodiment (changing the processing frequency of the matching extraction unit 351)

 第一の実施の形態では、予測処理のたびに適合抽出部351を実行して適合データセット355Aを新しく抽出する構成としているが、これに限らず、適合抽出部351による抽出処理の実行の有無は任意として良い。 In the first embodiment, the matching extraction unit 351 is executed every time prediction processing is performed to extract a new compatible data set 355A; however, the configuration is not limited to this, and whether or not the matching extraction unit 351 executes the extraction process may be chosen freely.

 具体的には、適合抽出部351の実行の契機は、例えば利用者2が任意に設定した一定期間の経過や、一定回数の実行の後、あるいは観測データ記憶装置5に記憶されている需要家の一定以上の増減が生じた場合などとして良い。 Specifically, execution of the matching extraction unit 351 may be triggered, for example, when a certain period arbitrarily set by the user 2 has elapsed, after a certain number of executions, or when the number of customers stored in the observation data storage device 5 has increased or decreased by a certain amount or more.
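The three example triggers could be combined as in the sketch below. The function name `should_reextract`, its parameters, and all default thresholds are assumptions made for illustration; the specification leaves the period, the count, and the size of the change to the user.

```python
def should_reextract(days_elapsed, runs_elapsed, customers_before, customers_now,
                     max_days=30, max_runs=100, max_change_ratio=0.1):
    """Return True when any of the three example triggers fires."""
    if days_elapsed >= max_days:      # the set period has elapsed
        return True
    if runs_elapsed >= max_runs:      # the set number of executions has been reached
        return True
    # Relative increase or decrease in the number of stored customers.
    change = abs(customers_now - customers_before) / customers_before
    return change >= max_change_ratio

rerun_small_change = should_reextract(5, 10, customers_before=1000, customers_now=1020)
rerun_large_change = should_reextract(5, 10, customers_before=1000, customers_now=1200)
rerun_after_period = should_reextract(31, 0, customers_before=1000, customers_now=1000)
```

When the function returns False, the previously extracted compatible data set would simply be reused for the next prediction.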

 適合抽出部351による抽出処理の実行の有無を任意とすることで、処理負荷の低減の効果が得られる。
(3)第三の実施の形態(候補選定部3515の処理内容の変更)
By making it optional whether or not the matching extraction unit 351 executes the extraction process, the effect of reducing the processing load can be obtained.
(3) Third embodiment (change in processing content of candidate selection unit 3515)

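 第一の実施の形態では、適合抽出部351において、候補選定部3515は、複数のデータセット候補3512Aから適合データセット355Aを選定する際、各候補におけるグループ別のデータ数の分散(ばらつきの一例)を評価指標としていたが、これに限らず、拡張した評価指標を用いても良い(データセット候補の評価指標は、観測データを入力とし予測データを出力とする予測モデルへの適合度の一例である)。 In the first embodiment, when the candidate selection unit 3515 of the matching extraction unit 351 selects the compatible data set 355A from the plurality of data set candidates 3512A, it uses the variance of the number of data per group in each candidate (an example of variation) as the evaluation index; however, the evaluation index is not limited to this, and an extended evaluation index may be used (the evaluation index of a data set candidate is an example of the degree of fit to a prediction model that takes observation data as input and outputs prediction data).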

 具体的には、候補選定部3515は、一のデータセット候補3512Aの評価の際に、当該候補についての分散を評価指標とすることに加えて、当該データセット候補3512Aの残余データ(観測データ群521Aのうちの当該データセット候補3512A以外の観測データ)の評価指標を算出して良い(すなわち、残余データにおける各観測データも上述のベクトルデータに変換してベクトルデータを一つ以上のグループに分類して良い)。候補選定部3515は、当該データセット候補3512Aの分散と残余データの評価指標との線形和を取った値を全体の評価指標として、当該全体の評価指標を、当該データセット候補3512Aの評価指標として良い。 Specifically, when evaluating one data set candidate 3512A, the candidate selection unit 3515 may, in addition to using the variance for that candidate as an evaluation index, calculate an evaluation index for the residual data of the data set candidate 3512A (the observation data in the observed data group 521A other than the data set candidate 3512A); that is, each observation data in the residual data may also be converted into the vector data described above and the vector data classified into one or more groups. The candidate selection unit 3515 may take the linear sum of the variance of the data set candidate 3512A and the evaluation index of the residual data as the overall evaluation index, and use that overall evaluation index as the evaluation index of the data set candidate 3512A.

 The residual data may be the sum of the observation data in the observation data group 521A other than the dataset candidate 3512A, and may be data obtained in the same way as in the residual prediction unit 3541. The candidate selection unit 3515 may construct, in the same way as the residual prediction unit 3541, a prediction model that predicts the residual data (a prediction model whose objective variable is the residual data), calculate the training error or likelihood of that prediction model, and use the training error or likelihood as the evaluation index of the residual data.

 As described above, the overall evaluation index may be the linear sum of the index representing the variation of the dataset candidate 3512A and the evaluation index of the residual data. The weights of the linear sum may be arbitrary values set in advance, or the ratio between the number of data in the extracted observation data group and the number in the unextracted observation data group. When the likelihood of the prediction model is used as the evaluation index of the residual data, the linear sum may be the linear sum of the variation index and either the reciprocal of the likelihood or the training error.

 Finally, the candidate selection unit 3515 outputs the dataset candidate with the smallest overall evaluation index value as the compatible dataset 355A.
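The selection rule of this embodiment can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the candidate selection unit 3515: the function names, the single weight `w`, and the sample numbers are hypothetical, and the residual-data index is assumed to be a training error (or reciprocal likelihood) that has already been computed elsewhere.

```python
from statistics import pvariance

def overall_index(group_counts, residual_index, w=0.5):
    """Linear sum of the variation index and the residual-data index.

    group_counts: number of data items in each group of one dataset candidate.
    residual_index: training error (or reciprocal likelihood) of a model
        fitted to the residual data; assumed to be computed elsewhere.
    w: weight of the variation term (hypothetical parameter).
    """
    variation = pvariance(group_counts)  # variance of per-group data counts
    return w * variation + (1.0 - w) * residual_index

def select_candidate(candidates, w=0.5):
    """Return the name of the candidate with the smallest overall index."""
    return min(
        candidates,
        key=lambda name: overall_index(*candidates[name], w=w),
    )

# Hypothetical candidates: (per-group data counts, residual training error).
candidates = {
    "candidate_a": ([10, 10, 10], 4.0),
    "candidate_b": ([2, 8, 20], 1.0),
    "candidate_c": ([9, 10, 11], 2.0),
}
best = select_candidate(candidates)
```

With these made-up numbers, `candidate_b` has the lowest residual error but very uneven groups, while `candidate_a` has perfectly even groups but a high residual error, so the linear sum favours `candidate_c`. As the text notes, `w` could instead be derived from the ratio of extracted to unextracted data counts.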

 Using the extended evaluation index makes it possible to extract a compatible dataset 355A that enables more accurate calculation of prediction data.
(4) Fourth embodiment (change in configuration of trend classification unit 352)

 In the first embodiment, the number of groups in the trend classification unit 352 is, for example, a number arbitrarily determined in advance. However, the number of groups is not limited to this and may be the number that maximizes the accuracy verified using observation data from the most recent past before the prediction target period.

 This is described specifically with reference to FIG. 13. The trend classification unit 352 has a group structure determination unit 3524, which is an extension of the group classification unit 3522.

 The group structure determination unit 3524 comprises a group number candidate setting unit 35241, a group classification unit 35242, an accuracy verification unit 35243, and a group number determination unit 35244.

 The group number candidate setting unit 35241 receives as input, from the feature conversion unit 3521, the values obtained by converting each observation data included in the compatible dataset 355A into vector data whose elements are the magnitudes of a plurality of different frequency components, and generates candidate values for the number of groups to be used when classifying the vector data into groups. The group number candidates may be a set of natural numbers arbitrarily determined in advance, or a set of values obtained by increasing, in steps of an arbitrarily determined size, from a minimum to a maximum number of groups determined based on the number of input data. Finally, the group number candidate setting unit 35241 outputs the vector data and the set of group number candidates to the group classification unit 35242.

 The group classification unit 35242 receives as input the vector data and the set of group number candidates output from the group number candidate setting unit 35241, as well as the external data group 721A. Next, by grouping together combinations of vector data that are close in distance, the group classification unit 35242 classifies the vector data, and the external data group 721A linked to each of them, into the number of groups specified by each group number candidate. Finally, the group classification unit 35242 outputs each group number candidate, together with the corresponding observation data with group information and external data with group information, to the accuracy verification unit 35243.
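The conversion into frequency-component vectors and the distance-based grouping described above might look like the following sketch. The text here does not fix a clustering algorithm, so plain k-means is used as one possible distance-based method; the function names, the seeding with the first `k` vectors, and the toy series are assumptions.

```python
import cmath
import math

def frequency_magnitudes(series, n_components):
    """Magnitudes of the first n_components non-zero DFT frequencies."""
    n = len(series)
    magnitudes = []
    for k in range(1, n_components + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(series)
        )
        magnitudes.append(abs(coeff) / n)
    return magnitudes

def kmeans(vectors, k, iterations=20):
    """Plain k-means; the first k vectors seed the centroids (an assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every vector to its nearest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy observation data: two series with a slow cycle, two with a fast cycle.
series_list = [
    [1.0 * math.sin(2 * math.pi * 1 * t / 8) for t in range(8)],
    [1.0 * math.sin(2 * math.pi * 2 * t / 8) for t in range(8)],
    [2.0 * math.sin(2 * math.pi * 1 * t / 8) for t in range(8)],
    [1.5 * math.sin(2 * math.pi * 2 * t / 8) for t in range(8)],
]
vectors = [frequency_magnitudes(s, n_components=2) for s in series_list]
labels = kmeans(vectors, k=2)
```

In this toy input the slow-cycle series end up in one group and the fast-cycle series in the other, regardless of amplitude. In the embodiment, the same grouping would be repeated once per group number candidate.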

 The accuracy verification unit 35243 receives as input the group number candidates output from the group classification unit 35242, together with the corresponding observation data with group information and external data with group information, and constructs a prediction model and calculates predicted values for each group number candidate. The prediction model may be constructed and the predicted values calculated in the same way as in the group-specific prediction unit 353 and the overall prediction unit 354. Here, the prediction target period used for verification is preferably a period that is as close as possible to the target period to be predicted by the data prediction system 3 and for which observation data are available as measured values. Next, the accuracy verification unit 35243 compares the predicted values calculated for each group number candidate with the observation data obtained as actual values, and evaluates the prediction accuracy. The accuracy index may be calculated using an error measure such as the absolute error. Finally, the accuracy verification unit 35243 outputs the observation data with group information, the external data with group information, and the accuracy index value for each group number candidate to the group number determination unit 35244.

 The group number determination unit 35244 receives as input, from the accuracy verification unit 35243, the observation data with group information, the external data with group information, and the accuracy index value for each group number candidate, and determines the number of groups that gives the best accuracy. Specifically, it compares the error measures of the group number candidates and outputs the observation data with group information and the external data with group information for the group number candidate with the smallest error.
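The verification-and-selection loop over group number candidates can be sketched as follows. The prediction step itself is abstracted behind a caller-supplied function, since it would follow the group-specific and overall prediction units; `predict_with_k`, the toy predictions, and the use of mean absolute error are assumptions for illustration.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predictions and measured values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def choose_group_count(candidates, predict_with_k, actual):
    """Return (best_k, errors), where best_k minimises the validation error.

    candidates: iterable of group number candidates.
    predict_with_k: hypothetical callable mapping a group number to the
        predictions for the verification period.
    actual: measured observation data for the same period.
    """
    errors = {
        k: mean_absolute_error(predict_with_k(k), actual) for k in candidates
    }
    best_k = min(errors, key=errors.get)
    return best_k, errors

# Toy verification period with made-up predictions per group number.
actual = [100.0, 120.0, 90.0]
toy_predictions = {
    2: [110.0, 110.0, 100.0],
    3: [101.0, 118.0, 92.0],
    4: [95.0, 130.0, 85.0],
}
best_k, errors = choose_group_count([2, 3, 4], toy_predictions.__getitem__, actual)
```

Each candidate requires a full model build and prediction, which is the extra processing load this embodiment trades for accuracy.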

 Using the group structure determination unit 3524 makes it possible to calculate the predicted values for the target period more accurately, at the cost of an increased processing load.
(5) Fifth embodiment (change in configuration of overall prediction unit 354)

 In the first embodiment, the overall prediction unit 354 calculates the overall prediction data 356A from the group-specific prediction data 3533A and the calculation result of the residual prediction unit 3541. However, the configuration is not limited to this, and the overall prediction data 356A may be calculated using only the group-specific prediction data 3533A, without using the residual prediction unit 3541.

 In that case, the overall calculation unit 3542 calculates a coefficient by dividing the total number of consumers by the number of consumers from which the observation data included in the compatible dataset 355A were obtained, and calculates the overall prediction data 356A by multiplying the sum of the group-specific prediction data 3533A by that coefficient.
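The scaling described above amounts to a single ratio applied to the summed group predictions. A minimal sketch, in which the function name, argument names, and sample values are assumptions:

```python
def overall_prediction(group_predictions, total_consumers, sampled_consumers):
    """Scale the summed group-specific predictions up to all consumers.

    group_predictions: {group name: list of predicted values per time step}.
    total_consumers: number of consumers in the whole population.
    sampled_consumers: number of consumers whose observation data are in
        the compatible dataset.
    """
    if sampled_consumers <= 0:
        raise ValueError("sampled_consumers must be positive")
    coefficient = total_consumers / sampled_consumers
    n_steps = len(next(iter(group_predictions.values())))
    # Sum the group predictions time step by time step.
    summed = [
        sum(series[t] for series in group_predictions.values())
        for t in range(n_steps)
    ]
    return [coefficient * value for value in summed]

group_predictions = {"group_1": [10.0, 12.0], "group_2": [5.0, 6.0]}
result = overall_prediction(
    group_predictions, total_consumers=1000, sampled_consumers=100
)
```

Here the coefficient is 1000 / 100 = 10, so the summed predictions [15.0, 18.0] become [150.0, 180.0]. The scaling implicitly assumes that the sampled consumers are representative of the rest, which is the source of the added prediction error mentioned for this embodiment.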

 Calculating the overall prediction data 356A without using the residual prediction unit 3541 makes it possible to obtain the overall prediction data 356A with a low processing load, at the cost of a certain increase in prediction error.
(6) Sixth embodiment (change in configuration of matching extraction unit 351)

 In the first embodiment, the number of data included in the compatible dataset 355A may be arbitrarily determined by the user 2 in the matching extraction unit 351. However, this is not limiting: the data size may be determined by taking into account the relationship between the data size and an index of the accuracy obtained when a compatible dataset of that size is used, the processing load when a compatible dataset of that size is used, or both.

 This is described specifically with reference to FIG. 14. The matching extraction unit 351 may include a data size determination unit 3516. When selecting the dataset candidate to be output as the compatible dataset, the matching extraction unit 351 uses the data size determined by the data size determination unit 3516, in addition to the variation index of the dataset candidates or the evaluation index described in the third embodiment.

 The data size determination unit 3516 models the relationship between the data size and the prediction accuracy, the processing load, or both, based on the data sizes of compatible datasets, the actual accuracy of the prediction results obtained as the overall prediction data 356A, and the actual processing load when the predicted values were calculated. It receives a target prediction accuracy, a target processing load, or both from the input device 32, and identifies from the model (an example of a relational model) a data size that satisfies the target prediction accuracy, processing load, or both. The data size determination unit 3516 outputs the identified data size to the candidate selection unit 3515, and the candidate selection unit 3515 narrows the dataset candidates down to those having the input data size and then selects the dataset candidate to be output as the compatible dataset.

 Alternatively, instead of the actual prediction accuracy, the data size determination unit 3516 may use the evaluation index of each dataset candidate calculated by the candidate selection unit 3515, and narrow the candidates down to data sizes for which the evaluation index exceeds a certain threshold.

 The data size determined by the data size determination unit 3516 may also be output to the candidate generation unit 3511 instead of the candidate selection unit 3515. In that case, the candidate generation unit 3511 generates dataset candidates having the data size received as input.

 As described above, the data size determination unit 3516 may construct a relational model, which is a model that outputs (A) below and takes at least one of (B) and (C) below as input.
(A) The data size of a dataset composed of two or more observation data.
(B) The prediction accuracy when prediction processing is performed using a dataset of that data size, or the degree of fit of that dataset to the prediction model.
(C) The processing load when prediction processing is performed using a dataset of that data size.

 Using the relational model, the data size determination unit 3516 may estimate a data size that satisfies a condition on at least one of (B) and (C). The compatible dataset may be a dataset of the estimated data size.
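One way to obtain a data size that satisfies targets on accuracy and load is sketched below. Note the simplification: the relational model in the text has the data size as its output, whereas this sketch fits the forward relations (size to error, size to load) by least squares and then searches candidate sizes. The model forms `error = a + b / size` and `load = c + d * size`, the function names, and the history values are all assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return mean_y - slope * mean_x, slope

def estimate_data_size(history, max_error, max_load, candidate_sizes):
    """Smallest candidate size whose modelled error and load meet both targets.

    history: list of (data size, observed error, observed load) records.
    Returns None when no candidate satisfies the conditions.
    """
    sizes = [record[0] for record in history]
    # Assumed model forms: error = a + b / size, load = c + d * size.
    a, b = fit_line([1.0 / s for s in sizes], [record[1] for record in history])
    c, d = fit_line(sizes, [record[2] for record in history])
    for size in sorted(candidate_sizes):
        if a + b / size <= max_error and c + d * size <= max_load:
            return size
    return None

# Made-up past results: (data size, prediction error, processing load).
history = [(50, 3.0, 7.0), (100, 2.0, 12.0), (200, 1.5, 22.0)]
chosen = estimate_data_size(
    history, max_error=1.8, max_load=20.0, candidate_sizes=[50, 100, 150, 200]
)
```

With this history, sizes 50 and 100 miss the error target and size 200 exceeds the load target, so 150 is returned. Supplying only one of the two targets would correspond to using only (B) or only (C).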

 By using the data size determination unit 3516 and performing the prediction processing over a certain past period, the trend of deterioration or improvement in prediction accuracy can be grasped, and whether the compatible dataset 355A needs to be updated can be determined.
(7) Seventh embodiment (addition of external data input to matching extraction unit 351)

 In the first embodiment, the matching extraction unit 351 performs group classification using only indices representing the periodicity of the observation data. However, this is not limiting, and information from external data may also be used for the classification.

 This is described specifically with reference to FIG. 15. The group classification unit 3514 also receives the external data group 721A as input and performs group classification using the information in the external data as well. The external data may be, for example, position information of the entity at which the observation data were observed. Taking electricity demand prediction as an example, placing data from electricity consumers whose positions are close to each other in the same group allows location-dependent influences such as weather and terrain to be reflected accurately in the prediction model, and an improvement in prediction accuracy can be expected.
(8) Eighth embodiment (addition of observation data complementation unit 3517 to matching extraction unit 351)

 In the first embodiment, the observation data group 521A input to the matching extraction unit 351 is used as is. However, this is not limiting, and the past actual values of one observation data may be substituted with the past actual values of other observation data.

 This is described specifically with reference to FIG. 16. The matching extraction unit 351 includes an observation data complementation unit 3517. In the matching extraction unit 351, the observation data group 521A is processed by the observation data complementation unit 3517 and then input to the candidate generation unit 3511.

 As specific processing, for an observation data series for which a sufficient amount of past actual values has not been accumulated, one or more similar observation data series may be identified from recent trends, and the past actual values of the target series may be complemented with the past actual values of the similar observation data, or with a statistic such as the average of the past actual values of a plurality of similar observation data. When determining the observation data to be used for complementation, external data such as position information and attribute information linked to the candidate observation data and to the observation data to be complemented may be compared, and the decision may be made based on the degree of similarity of the external data.
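The complementation step can be sketched as follows, assuming that similarity of recent trends is measured by Euclidean distance over a shared recent window and that the statistic is the mean. The function name, the `n_similar` parameter, and the consumer data are hypothetical.

```python
import math

def complement_history(target_recent, donors, n_similar=2):
    """Fill in a missing history with the mean history of similar series.

    target_recent: recent values of the series that lacks past values.
    donors: {name: (recent values, past values)} for series with history.
    n_similar: how many of the most similar series to average (assumption).
    """
    # Rank donor series by distance between recent trends.
    ranked = sorted(
        donors, key=lambda name: math.dist(target_recent, donors[name][0])
    )
    chosen = ranked[:n_similar]
    pasts = [donors[name][1] for name in chosen]
    # Average the past values of the chosen series, time step by time step.
    complemented = [sum(values) / len(values) for values in zip(*pasts)]
    return chosen, complemented

donors = {
    "consumer_a": ([1.0, 2.0, 3.0], [10.0, 20.0]),
    "consumer_b": ([1.0, 2.0, 4.0], [12.0, 22.0]),
    "consumer_c": ([9.0, 9.0, 9.0], [90.0, 95.0]),
}
chosen, complemented = complement_history([1.0, 2.0, 3.5], donors)
```

In this toy data the new series resembles `consumer_a` and `consumer_b`, so their past values are averaged. Replacing the distance over recent values with a distance over position or attribute information would correspond to the variant based on external data.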

 This processing makes it possible to use observation data that could not previously be used in the processing of the data prediction system because of insufficient past actual values, and an improvement in accuracy can be expected. An example of a case in which past actual values are insufficient is when the number of entities for which observation data are observed, such as electricity consumers, increases.

 Several embodiments of the present invention have been described above, but they are examples for explaining the present invention and are not intended to limit the scope of the present invention to these embodiments. The present invention can also be implemented in various other forms. For example, a form combining any two or more of the embodiments described above may be adopted.

 1: data processing system; 3: data prediction system

Claims (7)

 A data prediction system comprising:
 an interface device connected to a data source containing a plurality of observation data each having periodicity;
 a storage device; and
 a processor connected to the interface device and the storage device,
 wherein the processor extracts a plurality of dataset candidates from the plurality of observation data and stores them in the storage device,
 each of the plurality of dataset candidates is two or more observation data for a target period,
 in each dataset candidate, each observation data is all or part of one of the plurality of observation data, and
 the processor:
  calculates, for each of the plurality of dataset candidates and for each observation data in the dataset candidate, vector data whose elements are the magnitudes of a plurality of different frequency components of the observation data;
  classifies, for each of the plurality of dataset candidates, two or more vector data into one or more data groups based on the distance between vector data;
  calculates, for each of the plurality of dataset candidates, based on the number of data in each data group, a degree of fit to a prediction model that takes observation data as input and outputs prediction data; and
  outputs, as a compatible dataset, a dataset candidate to be used for prediction processing using the prediction model, based on the degree of fit calculated for each of the plurality of dataset candidates.
 The data prediction system according to claim 1, wherein, for each observation data, the vector data is data of the coordinates to which the observation data corresponds in a coordinate system whose axes are a plurality of different periodic components.
 The data prediction system according to claim 1, wherein, for each of the plurality of dataset candidates, the degree of fit to the prediction model is one of the following:
  ・a value representing the variation in the number of data among the data groups; or
  ・a linear sum of the value representing the variation and either a training error or the reciprocal of a likelihood obtained when a prediction model is constructed using residual data, the residual data being the observation data among the plurality of observation data other than the dataset candidate.
 The data prediction system according to claim 1, wherein, in each data group, the distance between vector data is less than or equal to a threshold.
 The data prediction system according to claim 1, wherein
 the processor constructs a relational model, which is a model that outputs (A) below and takes at least one of (B) and (C) below as input:
  (A) the data size of a dataset composed of two or more observation data;
  (B) the prediction accuracy when the prediction processing is performed using a dataset of that data size, or the degree of fit of that dataset to the prediction model;
  (C) the processing load when the prediction processing is performed using a dataset of that data size,
 the processor uses the relational model to estimate a data size that satisfies a condition on at least one of (B) and (C), and
 the compatible dataset is a dataset of the estimated data size.
 A data processing system comprising:
 the data prediction system according to claim 1; and
 a power control system that controls at least one of power generation equipment and power storage equipment,
 wherein the data prediction system outputs the prediction data by performing the prediction processing using the compatible dataset, and
 the power control system receives the prediction data, creates a plan for at least one of power generation and power storage using the prediction data, and controls at least one of the power generation equipment and the power storage equipment based on the created plan.
 A data prediction support method in which:
 a computer extracts a plurality of dataset candidates from a data source containing a plurality of observation data each having periodicity,
  each of the plurality of dataset candidates being a set of two or more observation data for a target period, and
  each observation data in each dataset candidate being all or part of one of the plurality of observation data;
 the computer calculates, for each of the plurality of dataset candidates and for each observation data in the dataset candidate, vector data whose elements are the magnitudes of a plurality of different frequency components of the observation data;
 the computer classifies, for each of the plurality of dataset candidates, two or more vector data into one or more data groups based on the distance between vector data;
 the computer calculates, for each of the plurality of dataset candidates, based on the number of data in each data group, a degree of fit to a prediction model that takes observation data as input and outputs prediction data; and
 the computer outputs, as a compatible dataset, a dataset candidate to be used for prediction processing using the prediction model, based on the degree of fit calculated for each of the plurality of dataset candidates.
PCT/JP2023/006982 2022-07-12 2023-02-27 Data prediction support method and data prediction system Ceased WO2024014035A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022-111913 2022-07-12
JP2022111913A JP2024010530A (en) 2022-07-12 2022-07-12 Data prediction support method and data prediction system

Publications (1)

Publication Number Publication Date
WO2024014035A1 true WO2024014035A1 (en) 2024-01-18

Family

ID=89536374

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/006982 Ceased WO2024014035A1 (en) 2022-07-12 2023-02-27 Data prediction support method and data prediction system

Country Status (2)

Country Link
JP (1) JP2024010530A (en)
WO (1) WO2024014035A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010204966A (en) * 2009-03-03 2010-09-16 Nippon Telegr & Teleph Corp <Ntt> Sampling device, sampling method, sampling program, class distinction device and class distinction system
JP2022500747A (en) * 2018-09-10 2022-01-04 グーグル エルエルシーGoogle LLC Biased data rejection using machine learning models
JP2021128388A (en) * 2020-02-10 2021-09-02 株式会社イシダ Data generation device, learned model generation device, weighing machine and data generation method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MASATO UTSUMI, YU IKEMOTO, HIROAKI OGAWA, IKUO SHIGEMORI, TOHRU WATANABE: "6-120: Short-term Electricity Demand Forecast based on Cluster Analysis using Periodic Component of Demand,", THE ANNUAL MEETING RECORD I.E.E. JAPAN (2016); MARCH 16-18, 2016, INSTITUTE OF ELECTRICAL ENGINEERS OF JAPAN, JP, vol. 6, 18 March 2016 (2016-03-18) - 18 March 2016 (2016-03-18), JP, pages 181 - 182, XP009553086 *

Also Published As

Publication number Publication date
JP2024010530A (en) 2024-01-24

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23839215

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 23839215

Country of ref document: EP

Kind code of ref document: A1