
WO2017168967A1 - Device for determining data analysis method candidate - Google Patents


Info

Publication number
WO2017168967A1
WO2017168967A1 PCT/JP2017/001371 JP2017001371W WO2017168967A1 WO 2017168967 A1 WO2017168967 A1 WO 2017168967A1 JP 2017001371 W JP2017001371 W JP 2017001371W WO 2017168967 A1 WO2017168967 A1 WO 2017168967A1
Authority
WO
WIPO (PCT)
Prior art keywords
analysis
data
analysis method
candidate determination
target data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2017/001371
Other languages
French (fr)
Japanese (ja)
Inventor
敦子 青木
坂上 聡子
岩田 雅史
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mitsubishi Electric Corp
Original Assignee
Mitsubishi Electric Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitsubishi Electric Corp
Priority to CN201780007854.4A (CN108885628A)
Priority to JP2018508418A (JP6472573B2)
Publication of WO2017168967A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • This invention relates to a technique for determining data analysis method candidates.
  • Patent Document 1 discloses a software analysis device that, when a derivative product is developed, selects the software components to be reused or changed together, based on past development records and change records of software products.
  • In the software analysis device of Patent Document 1, when a user selects a software component that has been implemented as source code, software components likely to be used together with it are extracted based on the distance between software components and presented to the user.
  • Patent Document 2 discloses an information processing device that recommends a source code.
  • The information processing device of Patent Document 2 converts the source code of a program under development into intermediate code, extracts similar intermediate code from the intermediate code stored in a database, and recommends the source code that corresponds to the extracted intermediate code.
  • The technique of Patent Document 1 has the problem that it cannot be used unless software components implemented as source code exist.
  • In addition, because the software components to be reused are selected using only the distance between components, reusable software components cannot be selected based on the similarity of the analysis target data.
  • The technique of Patent Document 2 places no limitation on the language of the source code, but has the problem that source code cannot be recommended unless intermediate code generated from a program implemented as source code exists.
  • The present invention has been made in view of the above problems, and its object is to determine analysis method candidates for analysis target data regardless of whether source code or intermediate code exists.
  • A data analysis method candidate determination device according to the present invention determines analysis method candidates for analysis target data that is to be subjected to data analysis, and includes:
  • an analysis case storage unit that stores, as analysis cases, data in which data attributes and analysis methods are associated with each of a plurality of analyzed data items that have been subjected to data analysis in the past;
  • an analysis target data storage unit that stores data attribute information for the analysis target data; and
  • an analysis method candidate determination unit that calculates a data attribute similarity, which is the similarity between the data attributes of the analysis target data and the data attributes of the analyzed data, and that determines, based on the data attribute similarity, at least one of the analysis methods of the analyzed data as an analysis method candidate for the analysis target data.
  • In the data analysis method candidate determination device according to the present invention, the analysis method candidates are determined based on the data attribute similarity, so they can be determined without the source code of each analysis method.
  • FIG. 1 is a block diagram showing the configuration of the data analysis method candidate determination device according to Embodiment 1.
  • FIG. 2 is a diagram illustrating data attributes.
  • FIG. 3 is a diagram illustrating the hardware configuration of the data analysis method candidate determination device according to Embodiment 1.
  • FIG. 4 is a flowchart showing the operation of the data analysis method candidate determination device according to Embodiment 1.
  • FIG. 5 is a flowchart showing the process in step S15 of FIG. 4.
  • FIG. 6 is a diagram showing an example of setting the distance evaluation axes.
  • FIG. 7 is a block diagram illustrating the configuration of the data analysis method candidate determination device according to Embodiment 2.
  • FIG. 8 is a flowchart showing the operation of the data analysis method candidate determination device according to Embodiment 2.
  • Further drawings show, for a modification of Embodiment 2, a block diagram of the configuration and a flowchart of the operation of the data analysis method candidate determination device.
  • Further drawings show, for Embodiment 3, a block diagram of the configuration and two flowcharts of the operation of the data analysis method candidate determination device, together with function flowchart A (FIG. 15) and function flowchart B.
  • Further drawings show, for Embodiment 4, a block diagram of the configuration and flowcharts of the operation of the data analysis method candidate determination device.
  • Further drawings show, for Embodiment 5, a block diagram of the configuration and flowcharts of the operation of the data analysis method candidate determination device.
  • FIG. 1 is a block diagram illustrating a configuration of a data analysis technique candidate determination device 11 according to the first embodiment.
  • The data analysis method candidate determination device 11 determines analysis method candidates for the analysis target data and recommends them to the user.
  • The data analysis method candidate determination device 11 includes an analysis target data storage unit 2, an analysis case storage unit 3, and an analysis method candidate determination unit 4.
  • The components of the data analysis method candidate determination device 11 need not all be provided in a single device. They may be distributed over a plurality of devices connected to each other by a network such as the Internet, so that the data analysis method candidate determination device 11 is configured as a system as a whole.
  • The data analysis method candidate determination device 11 can use the input unit 5 and the output unit 6.
  • The input unit 5 is an input interface through which the user inputs commands or search conditions to the data analysis method candidate determination device 11.
  • The output unit 6 is an output interface that outputs to the user the analysis method candidates determined by the analysis method candidate determination unit 4.
  • In FIG. 1, the input unit 5 and the output unit 6 are shown as separate from the data analysis method candidate determination device 11, but the device may include them.
  • The analysis target data storage unit 2 is configured by a recording medium such as an HDD (Hard Disk Drive) or an SD card, and stores the analysis target data and the data attributes of the analysis target data.
  • The data analyzed by the data analysis method candidate determination device 11 includes: time-series data such as temperature, humidity, vibration, speed, acceleration, pressure, amount of solar radiation, distance, weight, current, voltage, electric energy, number of revolutions, numbers, device usage history, access logs, and mobile GPS data; discrete data such as weather observations or weather forecasts; document data such as reports, inspection records, work histories, forms, and plans; and statistical data such as demographics or white papers.
  • The analysis target data is data that is yet to be analyzed. In addition, analyzed data that has been analyzed in the past, and data analysis results newly created by data analysis, prediction, estimation, and the like, may also be stored in the analysis target data storage unit 2.
  • The analysis target data storage unit 2 may also hold data that has not been analyzed in the past but is available for use, together with the data attributes of that data.
  • The analysis target data storage unit 2 only needs to store the data attributes of the analysis target data; the analysis target data itself does not necessarily have to be stored.
  • Examples of analysis target data that is not itself stored in the analysis target data storage unit 2 include open data provided by local governments, data posted to an SNS (social networking service), and data distributed and stored in a cloud environment that can be accessed from the data analysis method candidate determination device 11.
  • FIG. 2 is a diagram illustrating data attributes.
  • FIG. 2 shows data attributes for data A, data B, and data C, respectively.
  • Data attributes represent data characteristics, and include, for example, data acquisition intervals, data acquisition methods, distinction between actual values, predicted values, and processed values, data types, related data, and related devices.
  • Access authority for the data may also be used as a data attribute.
  • The analysis case storage unit 3 is configured by a recording medium such as an HDD (Hard Disk Drive) or an SD card.
  • The analysis case storage unit 3 stores, as analysis cases, data in which data attributes and analysis methods are associated with analyzed data that has been subjected to data analysis in the past.
  • An analysis case does not have to be one created by the data analysis method candidate determination device 11. Existing analysis cases, publicly known cases from the literature, trial applications at the examination stage, cases that were not adopted, and cases in which the analysis method was changed are all desirable. An analysis case may also include user evaluation information for the analysis method.
  • The analysis method may be described in source code, or in intermediate code that can be executed by a program.
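  • As an illustration of the association described above, the following is a minimal sketch of how an analysis case could be represented in memory. The class names, field names, and the integer evaluation score are assumptions made for this example; the source does not specify a storage format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class AnalysisCase:
    """One past analysis case: analyzed data, its attributes, and the methods used."""
    data_name: str
    data_attributes: Dict[str, str]        # e.g. acquisition interval, acquisition method
    analysis_methods: List[str]            # a case may combine several methods
    user_evaluation: Optional[int] = None  # optional user evaluation (assumed integer score)


class AnalysisCaseStore:
    """Hypothetical in-memory stand-in for the analysis case storage unit 3."""

    def __init__(self) -> None:
        self._cases: List[AnalysisCase] = []

    def add(self, case: AnalysisCase) -> None:
        self._cases.append(case)

    def cases_using(self, method: str) -> List[AnalysisCase]:
        """Return every stored case that used the given analysis method."""
        return [c for c in self._cases if method in c.analysis_methods]


store = AnalysisCaseStore()
store.add(AnalysisCase(
    data_name="public transport ride history",
    data_attributes={"acquisition_method": "log"},
    analysis_methods=["regression analysis"],
))
```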
  • The analysis method candidate determination unit 4 selects, from the analysis methods used in past analysis cases, the analysis methods to be used for data analysis of the analysis target data, and determines them as analysis method candidates.
  • The analysis method candidates determined here are output from the output unit 6, for example in text format, and recommended to the user. Alternatively, the analysis method candidates may be output in list format together with representative past cases. In that case, the user can easily understand examples or features of the analysis method candidates.
  • FIG. 3 is a diagram illustrating a hardware configuration of the data analysis technique candidate determination apparatus 11.
  • the data analysis technique candidate determination device 11 includes a processor 20, a memory 21, and a recording medium 22.
  • The analysis method candidate determination unit 4 is realized as a function of the processor 20 when the processor 20, such as a CPU (Central Processing Unit), executes a software program stored in the memory 21, such as a RAM (Random Access Memory). This function may also be realized by a plurality of processors working in cooperation.
  • Alternatively, the analysis method candidate determination unit 4 may be realized by a signal processing circuit that implements the operation with a hardware electric circuit.
  • The term "processing circuit" can be used in place of the term "unit".
  • FIG. 4 is a flowchart showing the operation of the data analysis technique candidate determination apparatus 11.
  • First, the user selects the analysis target data and the analysis purpose via the input unit 5 (step S11).
  • For the analysis target data, for example, a list of the data stored in the analysis target data storage unit 2 may be displayed for the user to select from, or the user may be allowed to input new analysis target data as an electronic file or the like.
  • When analysis target data is newly input, the data is stored in the analysis target data storage unit 2.
  • For the analysis purpose, a list such as a pull-down menu may be displayed for the user to select from, or the user may be allowed to enter it as a character string.
  • The analysis purpose selected by the user is stored in the analysis target data storage unit 2. The analysis purpose is not limited to one; there may be several.
  • In the following, "television viewing data" and "viewer viewing preference analysis" are used as examples of the analysis target data and the analysis purpose, respectively.
  • Next, the analysis target data is read from the analysis target data storage unit 2 into the analysis method candidate determination unit 4 (step S12). That is, the television viewing data collected from each television terminal is read as the analysis target data.
  • Next, the data attributes and the analysis purpose of the analysis target data are read from the analysis target data storage unit 2 into the analysis method candidate determination unit 4 (step S13). That is, the data acquisition interval, the location of the data acquisition device, and the owner information of the data acquisition device, for example, are read as the data attributes of the "television viewing data", and "viewer viewing preference analysis" is read as the analysis purpose.
  • Next, analysis cases whose data attributes or analysis purpose are the same as or similar to those of the analysis target data are read from the analysis case storage unit 3 into the analysis method candidate determination unit 4 (step S14).
  • Analysis cases whose data is similar to the analysis target data "television viewing data" include, for example, "TV region audience rating survey", "regional favorite talent analysis", "popular movie genre survey", "power usage survey", and "production efficiency analysis in factories".
  • Analysis cases with a similar analysis purpose include, for example, "Internet browsing history analysis", "product purchase status analysis", "visit store analysis", "point card holding status analysis", "public transport ride history", and "visit facility analysis during travel".
  • In step S15, the analysis method candidate determination unit 4 determines the analysis method candidates for the analysis target data. The details of step S15 are described later.
  • After step S15, the analysis method candidates determined in step S15 are output to the output unit 6 and recommended to the user (step S16), and the process ends.
  • FIG. 5 is a flowchart showing analysis method candidate determination processing by the analysis method candidate determination unit 4 in step S15 of FIG.
  • First, for an analysis case read in step S14 of FIG. 4, the data attribute similarity between the analysis target data and the analyzed data is calculated (step S151).
  • The processing is described concretely using "public transit history" data as an example of the analyzed data of an analysis case.
  • The data attribute similarity Sz is calculated with respect to data attributes such as "the public transportation route estimated from".
  • The data attribute similarity Sz is calculated by the following equation, for example.
  • Here, N is the number of items registered as data attributes,
  • Lmax_i is the maximum distance of the i-th data attribute item, and
  • L_i is the distance of the i-th data attribute item.
  • A distance evaluation axis is set for each data attribute item, and the distance L_i of the i-th data attribute item is calculated using that distance evaluation axis.
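  • The equation for Sz is not reproduced in this text. The following sketch shows one form that is consistent with the definitions of N, Lmax_i, and L_i given above: the mean over all attribute items of the normalized closeness 1 - L_i / Lmax_i. This particular formula is an assumption made for illustration and is not confirmed as the equation used in the source.

```python
from typing import Sequence


def data_attribute_similarity(distances: Sequence[float],
                              max_distances: Sequence[float]) -> float:
    """Assumed form: Sz = (1/N) * sum_i (1 - L_i / Lmax_i).

    distances[i]     -> L_i, distance of the i-th data attribute item
    max_distances[i] -> Lmax_i, maximum distance of the i-th item (must be > 0)
    """
    if not distances or len(distances) != len(max_distances):
        raise ValueError("need one maximum distance per attribute item")
    if any(lmax <= 0 for lmax in max_distances):
        raise ValueError("maximum distances must be positive")
    n = len(distances)
    return sum(1.0 - li / lmax for li, lmax in zip(distances, max_distances)) / n


# Two attribute items: the first matches exactly, the second is at maximum distance.
sz = data_attribute_similarity([0, 2], [10, 2])
```

  Under this assumed form, Sz is 1 when every item has distance 0 and 0 when every item is at its maximum distance.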
  • FIG. 6 shows an example of setting the distance evaluation axes.
  • For the data acquisition interval, for example, the distance is set to 10 if the acquisition interval of at least one of the analysis target data and the analyzed data is irregular. If the data acquisition interval of the analyzed data is shorter than that of the analysis target data, the distance is set to 0. If the acquisition interval of one of the analysis target data and the analyzed data is 100 times or more that of the other, the distance is set to 5.
  • For the data acquisition method, for example, the distance is 0 if the methods are the same, 2 if one is a log and the other is a terminal input, and 1 if both are sensor logs but the sensor types differ.
  • The distance evaluation axis may be set as rules for each data attribute item, or by a mathematical expression. The number of rules need not be limited, and a maximum distance may be provided for each evaluation axis. On each distance evaluation axis set as shown in FIG. 6, the largest distance is taken as the maximum distance. FIG. 6 describes the case where the distance takes only positive values, but the distance may take negative values, or may take values of two or more dimensions instead of one dimension.
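  • The rule-based axes described above can be sketched as small functions. The distance values 10, 0, 5, 2, and 1 are taken from the text. The order in which overlapping rules are checked, the representation of an irregular interval as `None`, the `kind` labels, and the caller-supplied `fallback` for cases the quoted rules do not cover are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


def interval_distance(target_s: Optional[float],
                      analyzed_s: Optional[float],
                      fallback: int) -> int:
    """Distance on the data acquisition interval axis (seconds; None = irregular)."""
    if target_s is None or analyzed_s is None:
        return 10                      # at least one side is irregular
    if max(target_s, analyzed_s) >= 100 * min(target_s, analyzed_s):
        return 5                       # one interval is 100x or more the other
    if analyzed_s < target_s:
        return 0                       # analyzed data is sampled more finely
    return fallback                    # not covered by the quoted rules


@dataclass(frozen=True)
class AcquisitionMethod:
    kind: str                          # "log" or "terminal_input" (hypothetical labels)
    sensor_type: Optional[str] = None  # only meaningful for sensor logs


def method_distance(a: AcquisitionMethod, b: AcquisitionMethod, fallback: int) -> int:
    """Distance on the data acquisition method axis."""
    if a == b:
        return 0                       # same method
    if {a.kind, b.kind} == {"log", "terminal_input"}:
        return 2                       # one is a log, the other a terminal input
    if a.kind == b.kind == "log" and a.sensor_type != b.sensor_type:
        return 1                       # both sensor logs, different sensor types
    return fallback
```

  Checking the "100 times or more" rule before the "shorter interval" rule is a design choice of this sketch; the source does not state which rule takes precedence when both apply.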
  • Next, the analysis purpose similarity Sp between the analyzed data and the analysis target data is calculated (step S152).
  • The analysis purpose of the analysis target data and the analysis purpose of the analyzed data are compared as character strings, and their similarity is calculated as the analysis purpose similarity Sp.
  • The analysis purpose similarity Sp can be obtained by using, for example, the cosine similarity or the Levenshtein distance.
  • When the analysis purpose similarity Sp between the analysis purpose character string A of the analysis target data and the analysis purpose character string B of the analyzed data is obtained as the cosine similarity, it is calculated by the following equation.

$$S_p = \frac{A \cdot B}{|A|\,|B|}$$

  • Here, A · B is the inner product of character string A and character string B, |A| is the magnitude (norm) of character string A, and |B| is the magnitude (norm) of character string B.
  • As an example, suppose the analysis purpose character string A of the analysis target data is "analysis of viewer's viewing preference" and the analysis purpose character string B of the analyzed data is "popular movie genre survey".
  • The calculation method is as follows.
  • Keywords are extracted by decomposing character string A into words, giving "viewing, viewers, preferences, analysis"; in the same way, "popularity, movie, genre, survey" is obtained from character string B.
  • A similar word database in which similar words are defined can be provided in the analysis target data storage unit 2 or the analysis case storage unit 3, and similar words can be linked by referring to it.
  • The analysis purpose similarity Sp is then calculated as follows.
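  • A minimal sketch of such a keyword-based cosine similarity follows. Each character string is treated as a bag of keywords, and an optional similar-word map replaces each keyword with a canonical word before the vectors are compared. The function name and the single similar-word link ("survey" to "analysis") are hypothetical and chosen only for this example; the resulting value is not the figure computed in the source.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Optional


def keyword_cosine_similarity(words_a: Iterable[str],
                              words_b: Iterable[str],
                              similar_words: Optional[Dict[str, str]] = None) -> float:
    """Cosine similarity between two keyword lists, after linking similar words."""
    link = similar_words or {}
    a = Counter(link.get(w, w) for w in words_a)   # keyword -> count for string A
    b = Counter(link.get(w, w) for w in words_b)   # keyword -> count for string B
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


keywords_a = ["viewing", "viewers", "preferences", "analysis"]
keywords_b = ["popularity", "movie", "genre", "survey"]

# Hypothetical similar-word database with a single link.
sp = keyword_cosine_similarity(keywords_a, keywords_b, {"survey": "analysis"})
```

  With this one hypothetical link the two vectors share one of four keywords each, so the sketch yields 1 / (2 × 2) = 0.25; without any links the similarity is 0.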
  • When the analysis purpose is described in source code or intermediate code, the processing procedure shown in that source code or intermediate code can be converted into a representation such as UML (Unified Modeling Language) or a function flowchart.
  • The analysis purpose similarity Sp may then be calculated from the similarity of the processing procedures.
  • A method for calculating the analysis purpose similarity Sp is described below with reference to function flowchart A shown in FIG. 15 and function flowchart B.
  • Step S21 is a step of inputting X
  • step S22 is a step of substituting X / 5 for Y
  • step S23 is a step of outputting Y
  • step S24 is a step of inputting Z
  • step S25 is a step of substituting Y ⁇ Z into
  • Step S26 is a step for outputting Y.
  • Step S31 is a step of inputting X
  • step S32 is a step of a subroutine relating to X
  • step S33 is a step of outputting Y.
  • The subroutine relating to X in step S32 consists of step S34, in which X / 5 is assigned to Y.
  • The matching rate of processing procedures is defined as the number of matching processing procedures relative to the total number of processing procedures.
  • The matching rate is calculated as follows.
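  • A minimal sketch of such a matching rate is given below. Each flowchart is flattened into an ordered list of step labels, the subroutine of flowchart B is expanded inline, and steps are matched in order using Python's `difflib.SequenceMatcher`. The step labels, the inline expansion, the placeholder label for step S25, and the choice of the longer flowchart's length as the "total number of processing procedures" are assumptions made for this example.

```python
from difflib import SequenceMatcher
from typing import Sequence


def matching_rate(steps_a: Sequence[str], steps_b: Sequence[str]) -> float:
    """Number of in-order matching steps divided by the length of the longer procedure."""
    if not steps_a and not steps_b:
        return 1.0
    matcher = SequenceMatcher(None, steps_a, steps_b, autojunk=False)
    # get_matching_blocks() ends with a zero-size sentinel, which adds nothing to the sum.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / max(len(steps_a), len(steps_b))


# Function flowchart A (steps S21 to S26), flattened to labels.
flow_a = [
    "input X",
    "Y = X / 5",
    "output Y",
    "input Z",
    "update Y from Y and Z",   # S25: placeholder label, exact operation not assumed
    "output Y",
]

# Function flowchart B (steps S31 to S33) with the subroutine S34 expanded inline.
flow_b = ["input X", "Y = X / 5", "output Y"]

rate = matching_rate(flow_a, flow_b)
```

  Because `SequenceMatcher` reports contiguous matching blocks, the same call could also be used to measure the continuity of matching steps, for example by taking the size of the longest block.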
  • The analysis purpose similarity Sp can be expressed by the following equation, for example.
  • When the analysis purpose is described in a hierarchical structure, the analysis purpose similarity may be calculated with Equation (6) for each of the superordinate concept, the intermediate concept, and the subordinate concept, and the average taken. Alternatively, ID numbers that reflect the similarity of the methods may be assigned in advance to the options of the superordinate concept, the intermediate concept, and the subordinate concept, and the analysis purpose similarity Sp may be obtained from the difference between the numbers formed by combining the ID numbers.
  • For example, the analysis purpose similarity Sp between an analysis purpose whose superordinate concept-intermediate concept-subordinate concept ID number is "1-0-01" and an analysis purpose whose ID number is "1-0-02" can be calculated as follows.
  • Likewise, the analysis purpose similarity Sp between the analysis purpose whose ID number is "1-0-01" and an analysis purpose whose ID number is "5-0-01" can be calculated as follows.
  • The formulas for calculating the analysis purpose similarity Sp described above are merely examples. Modifications are possible, such as weighting specific conditions, or applying a correction operation such as a gradient when differences in the calculation method bias the average value of the results.
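  • The ID-number approach can be sketched as follows. The three parts of an ID such as "1-0-01" are concatenated into a single number, and the similarity decreases linearly with the absolute difference between the two numbers. The linear mapping and the `max_difference` normalization constant are assumptions made for illustration; the source's actual calculation is not reproduced in this text.

```python
def combined_id(id_number: str) -> int:
    """Combine 'superordinate-intermediate-subordinate' parts, e.g. '1-0-02' -> 1002."""
    parts = id_number.split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unexpected ID format: {id_number!r}")
    return int("".join(parts))


def id_similarity(id_a: str, id_b: str, max_difference: int) -> float:
    """Assumed form: Sp = 1 - |a - b| / max_difference, clamped at 0."""
    if max_difference <= 0:
        raise ValueError("max_difference must be positive")
    difference = abs(combined_id(id_a) - combined_id(id_b))
    return max(0.0, 1.0 - difference / max_difference)


# Hypothetical normalization constant: the largest four-digit combined ID.
near = id_similarity("1-0-01", "1-0-02", max_difference=9999)
far = id_similarity("1-0-01", "5-0-01", max_difference=9999)
```

  In this sketch, two purposes that differ only in the subordinate concept score close to 1, while purposes that differ in the superordinate concept score much lower, because the superordinate part occupies the most significant digit.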
  • Next, the total similarity S between the analysis target data and the analyzed data is calculated (step S153).
  • The total similarity S is calculated by the following formula, for example.
  • Next, it is checked whether there is any other analyzed data for which the total similarity has not yet been calculated (step S154). If there is, the process returns to step S151, and steps S151 to S153 are executed for that analyzed data. When the total similarity has been calculated for all analyzed data, the process proceeds to step S155.
  • In step S155, the average similarity is calculated for each analysis method from the total similarities of all the analysis cases read in step S14 of FIG. 4.
  • Examples of analysis methods include "regression analysis", "k-means method", "behavior model base reasoning", "behavior model base reasoning and queuing simulation", and "neural network".
  • The average similarity Sav for "regression analysis" is calculated by the following equation, for example.

$$S_{av} = \frac{\sum S}{N}$$

  • Here, N is the number of cases that include "regression analysis" as the data analysis method, and ΣS is the sum of the total similarities of those cases.
  • The average similarity may also be calculated using other kinds of average, such as the geometric mean, the harmonic mean, or a weighted average.
  • When a case uses a plurality of analysis methods in combination, the average similarity may be calculated with the combination kept as it is.
  • Alternatively, the average similarity may be recalculated for the data analysis methods used in combination, only for methods that have a high average similarity.
  • Next, the analysis method candidates for the analysis target data are determined (step S156).
  • The analysis method with the highest average similarity may be taken as the analysis method candidate, or a plurality of analysis methods may be taken as candidates in descending order of average similarity.
  • The average similarity, the number of analysis cases that include the analysis method candidate, the frequency of appearance of analysis purposes that use the analysis method candidate, and the like may be output together with the candidates.
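  • Steps S155 and S156 can be sketched as follows, using the arithmetic mean. Each analysis case is represented as a pair of its analysis methods and its already-computed total similarity S. The function name, the tuple layout, and the sample similarity values are hypothetical, and a case that combines several methods is counted once for each method, which is only one of the options the text allows.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Sequence, Tuple


def rank_analysis_methods(
    cases: Iterable[Tuple[Sequence[str], float]],
    top_n: Optional[int] = None,
) -> List[Tuple[str, float, int]]:
    """Return (method, average similarity Sav, case count N), highest Sav first."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for methods, total_similarity in cases:
        for method in methods:
            sums[method] += total_similarity   # accumulates the sum of S per method
            counts[method] += 1                # accumulates N per method
    ranked = sorted(
        ((m, sums[m] / counts[m], counts[m]) for m in sums),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked if top_n is None else ranked[:top_n]


# Hypothetical total similarities for three past analysis cases.
sample_cases = [
    (["regression analysis"], 0.8),
    (["regression analysis"], 0.6),
    (["k-means method"], 0.9),
]
candidates = rank_analysis_methods(sample_cases)
```

  Returning the case count alongside the average makes it straightforward to present both values to the user, as the text suggests.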
  • As described above, the data analysis method candidate determination device 11 includes: the analysis case storage unit 3, which stores, as analysis cases, data in which data attributes and analysis methods are associated with each of a plurality of analyzed data items subjected to data analysis in the past; the analysis target data storage unit 2, which stores data attribute information for the analysis target data; and the analysis method candidate determination unit 4, which calculates the data attribute similarity, that is, the similarity between the data attributes of the analysis target data and the data attributes of the analyzed data, and determines, based on the data attribute similarity, at least one of the analysis methods of the analyzed data as an analysis method candidate for the analysis target data. Therefore, even without the source code of each analysis method, analysis method candidates can be determined by referring to analysis cases with similar data attributes.
  • The analysis case storage unit 3 stores analysis purpose information for each of the plurality of analyzed data items, and the analysis target data storage unit 2 stores analysis purpose information for the analysis target data. The analysis method candidate determination unit 4 calculates the similarity between the analysis purpose of the analysis target data and the analysis purpose of the analyzed data as the analysis purpose similarity, calculates the total similarity between the analysis target data and the analyzed data based on the analysis purpose similarity and the data attribute similarity, and determines, based on the total similarity, at least one of the analysis methods of the analyzed data as an analysis method candidate for the analysis target data. Therefore, even without the source code of each analysis method, analysis method candidates can be determined by referring to analysis cases with similar data attributes and analysis purposes.
  • The data attributes of the analyzed data and of the analysis target data include at least one of the data acquisition interval, the data acquisition method, and the distinction among actual values, predicted values, and processed values.
  • The analysis method candidate determination unit 4 calculates the analysis purpose similarity based on the analysis purpose character string of the analysis target data and the analysis purpose character string of the analyzed data. Because the analysis purpose similarity is calculated by comparing character strings and the analysis method candidates are determined based on it, the candidates can be determined without the source code of each analysis method.
  • The analysis method candidate determination unit 4 calculates the analysis purpose similarity based on the analysis purpose of the analysis target data and the analysis purpose of the analyzed data, both described in a hierarchical structure. Because the analysis purpose similarity is calculated by comparing the similarities between analysis purposes set in advance for each level of the hierarchy, and the analysis method candidates are determined based on it, the candidates can be determined without the source code of each analysis method.
  • The analysis method candidate determination unit 4 calculates, as the analysis purpose similarity, the similarity between the processing procedure indicated in the source code or intermediate code of the analysis purpose of the analysis target data and the processing procedure indicated in the source code or intermediate code of the analysis purpose of the analyzed data, based on the matching rate of the processing procedures or the continuity of the matching processing procedures. Because the analysis method candidates are determined based on this analysis purpose similarity, they can be determined even when the analysis purpose is described in source code or intermediate code.
  • The analysis method candidate determination unit 4 calculates, for each analysis method, the average value of the total similarity between the analysis target data and the analyzed data that used that analysis method, and determines as analysis method candidates the analysis methods selected based on that average value. Therefore, analysis method candidates can be determined without the source code of each analysis method.
  • FIG. 7 is a block diagram illustrating a configuration of the data analysis technique candidate determination device 12 according to the second embodiment.
  • In addition to the configuration of Embodiment 1, the data analysis method candidate determination device 12 includes an evaluation acquisition unit 7 and a recommended case storage unit 8.
  • The recommended case storage unit 8 is configured by a recording medium such as an HDD (Hard Disk Drive) or an SD card, and stores recommended case data.
  • Recommended case data is data in which analysis method candidates determined in the past by the analysis method candidate determination unit 4 are associated with the analysis target data and the analysis purpose.
  • The evaluation acquisition unit 7 acquires the evaluation information that the user inputs for an analysis method candidate via the input unit 5, and adds it to the corresponding recommended case stored in the recommended case storage unit 8. That is, the recommended case storage unit 8 stores recommended cases, each consisting of analysis target data, an analysis purpose, and analysis method candidates, in association with the evaluation information for those recommended cases.
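  • A minimal sketch of how recommended cases and their evaluation information could be kept together follows. The class names, the use of a list index as case identifier, the integer evaluation score, and the `without_evaluation` helper are assumptions made for this example; the source does not specify the storage layout or the form of the evaluation information.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RecommendedCase:
    """Analysis target data, analysis purpose, and the candidates recommended for them."""
    target_data: str
    analysis_purpose: str
    method_candidates: List[str]
    evaluations: Dict[str, int] = field(default_factory=dict)  # candidate -> user score


class RecommendedCaseStore:
    """Hypothetical in-memory stand-in for the recommended case storage unit 8."""

    def __init__(self) -> None:
        self._cases: List[RecommendedCase] = []

    def add(self, case: RecommendedCase) -> int:
        """Store a recommended case and return its identifier (list index)."""
        self._cases.append(case)
        return len(self._cases) - 1

    def add_evaluation(self, case_id: int, candidate: str, score: int) -> None:
        """Attach the user's evaluation to one candidate of a stored case."""
        case = self._cases[case_id]
        if candidate not in case.method_candidates:
            raise ValueError(f"{candidate!r} was not recommended in this case")
        case.evaluations[candidate] = score

    def without_evaluation(self) -> List[RecommendedCase]:
        """Recommended cases that have no evaluation information yet."""
        return [c for c in self._cases if not c.evaluations]


recommended = RecommendedCaseStore()
case_id = recommended.add(RecommendedCase(
    target_data="television viewing data",
    analysis_purpose="viewer viewing preference analysis",
    method_candidates=["regression analysis", "k-means method"],
))
```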
  • the evaluation acquisition unit 7 is realized as a function of the processor 20 by the processor 20 illustrated in FIG. 3 executing a software program stored in the memory 21.
  • FIG. 8 is a flowchart showing the operation of the data analysis technique candidate determination device 12. Steps S11 to S16 are the same as those in the first embodiment and have already been described with reference to FIG.
  • The analysis method candidate determination unit 4 determines the analysis method candidate (step S15), outputs the analysis method candidate to the output unit 6 (step S16), and stores data associating the analysis target data, the analysis purpose, and the analysis method candidate (a recommended case) in the recommended case storage unit 8 (step S17).
  • FIG. 9 is a flowchart showing the operation of the evaluation acquisition unit 7. This flow is performed only when a recommended case is stored in the recommended case storage unit 8.
  • the evaluation acquisition unit 7 determines a recommended case to which evaluation information is to be added (step S71). For example, a screen displaying a list of all recommended cases stored in the recommended case storage unit 8 may be displayed, and the user may select a recommended case from the screen.
  • the user may be allowed to input conditions such as analysis target data or analysis purpose, and the recommended cases may be specified or narrowed down based on the input conditions.
  • the recommended case to which evaluation information is not yet added may be extracted from the recommended case storage unit 8 and presented to the user, and the user may select it.
  • the analysis method candidate actually used by the user is specified (step S72).
  • the plurality of analysis technique candidates are specified.
  • a list screen of a plurality of analysis method candidates is displayed, and analysis method candidates actually used by the user are selected from the list screen.
  • user evaluation information is acquired for the analysis method candidate specified in step S72 (step S73).
  • the user evaluation information is obtained by causing the user to input from the input unit 5.
  • the evaluation information includes supplementary information such as analysis accuracy, user's personal feeling, execution time, and the like.
  • the analysis method candidate that provides the most desirable result may be selected by the user from a list screen of a plurality of analysis method candidates.
  • the ranks may be input to the analysis method candidates in the order in which desirable results are obtained.
  • Information on negative evaluations may also be acquired. For example, if there is an analysis method candidate that the user tried but ultimately did not adopt because of some problem, the problem with that analysis method candidate may be input. Analysis method candidates that the user did not actually use may also be input. Supplementary information such as problems may be selected from answers prepared in advance or entered freely.
  • the evaluation acquisition unit 7 assigns the evaluation information thus acquired to the recommended case and stores it in the recommended case storage unit 8 (step S74).
  • The evaluation acquisition unit 7 adds to the analysis case storage unit 3, from among the recommended cases to which evaluation information is assigned, the recommended cases related to analysis method candidates to which desirable evaluation information is assigned (step S75).
  • For example, when desirable evaluation information is acquired for “regression analysis” and undesirable evaluation information is acquired for the “k-means method”, a case associating the analysis target data “TV viewing data”, the analysis purpose “analysis of viewer viewing preferences”, and the analysis method “regression analysis” is added to the analysis case storage unit 3 as a new analysis case.
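The evaluation flow of steps S73 to S75 can be sketched as below. The `RecommendedCase` class, its field names, the boolean `desirable` flag, and the sample note text are hypothetical simplifications; the source only says that evaluation information is attached to a recommended case and that favorably evaluated cases are added to the analysis case storage unit 3.

```python
from dataclasses import dataclass, field


@dataclass
class RecommendedCase:
    # Hypothetical record layout for one recommended case.
    target_data: str
    purpose: str
    candidates: list
    evaluations: dict = field(default_factory=dict)  # method -> evaluation info


def add_evaluation(case, method, desirable, note=""):
    """Attach user evaluation information to one analysis method candidate (S73, S74)."""
    if method not in case.candidates:
        raise ValueError(f"{method!r} is not a candidate of this recommended case")
    case.evaluations[method] = {"desirable": desirable, "note": note}


def promote_desirable(case, analysis_case_store):
    """Copy favorably evaluated candidates into the analysis case store (S75)."""
    for method, evaluation in case.evaluations.items():
        if evaluation["desirable"]:
            analysis_case_store.append(
                {"target_data": case.target_data, "purpose": case.purpose, "method": method}
            )
    return analysis_case_store


case = RecommendedCase(
    target_data="TV viewing data",
    purpose="analysis of viewer viewing preferences",
    candidates=["regression analysis", "k-means method"],
)
add_evaluation(case, "regression analysis", desirable=True)
add_evaluation(case, "k-means method", desirable=False, note="illustrative reason for non-adoption")
store = promote_desirable(case, [])
```

Keeping the unfavorable evaluation and its note on the recommended case, rather than discarding it, leaves the reason for non-adoption available for later analysis.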
  • FIG. 10 is a block diagram illustrating a configuration of a data analysis technique candidate determination device 13 according to a modification of the second embodiment.
  • the data analysis technique candidate determination device 13 includes an attribute adding unit 9 in addition to the configuration of the data analysis technique candidate determination device 12.
  • the configuration of the data analysis method candidate determination device 13 other than the attribute adding unit 9 is the same as that of the data analysis method candidate determination device 12.
  • The attribute adding unit 9 analyzes the reason for non-adoption of the analysis technique candidate acquired by the evaluation acquiring unit 7, and adds a data attribute corresponding to the reason for non-adoption as a new data attribute item for all analysis target data stored in the analysis target data storage unit 2. At this time, the attribute adding unit 9 may notify a user such as a system administrator of the added data attribute item through the output unit 6 and prompt the user to input the data attribute for the added item. The user may likewise be prompted to input the distance evaluation axis used to calculate the data attribute similarity for the added data attribute item. The user can input these data attributes or distance evaluation axes to the data analysis technique candidate determination device 13 through the input unit 5.
  • the attribute adding unit 9 is realized as a function of the processor 20 when the processor 20 illustrated in FIG. 6 executes a software program stored in the memory 21.
  • FIG. 11 is a flowchart showing the operation of the attribute adding unit 9 in the data analysis technique candidate determination device 13. This flow is executed when the reason for not adopting the analysis method candidate is stored in the recommended case storage unit 8.
  • a recommended case to which evaluation information is given is extracted from the recommended case storage unit 8 (step S81).
  • Next, the reason for non-adoption is extracted for each analysis method candidate that was not used in the recommended case extracted in step S81 (step S82).
  • Then, the reason for non-adoption extracted in step S82 is analyzed (step S83).
  • For this analysis, frequency analysis by keyword extraction, simple statistics, or the like can be used.
  • The data attribute item corresponding to the analyzed reason for non-adoption is added as a data attribute item of the analysis target data stored in the analysis target data storage unit 2 (step S84). For example, if the analysis in step S83 shows that keywords such as “execution time is long” and “processing is heavy” appear frequently among the reasons for non-adoption, “calculation amount” and “execution time per unit amount” are added to the data attribute items.
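Steps S82 to S84 can be sketched as a keyword frequency count over free-text reasons. The keyword-to-attribute mapping, the `min_count` threshold, the existing item name, and the sample reason strings are assumptions made for illustration; the source names only the two keywords and the two resulting attribute items, without stating which keyword maps to which item.

```python
from collections import Counter

# Hypothetical mapping from a keyword found in a reason for non-adoption to the
# data attribute item it suggests. The pairing is an assumption.
KEYWORD_TO_ATTRIBUTE = {
    "execution time is long": "execution time per unit amount",
    "processing is heavy": "calculation amount",
}


def propose_attribute_items(reasons, existing_items, min_count=2):
    """Count keyword occurrences in the reasons and return new attribute items."""
    counts = Counter()
    for reason in reasons:
        text = reason.lower()
        for keyword in KEYWORD_TO_ATTRIBUTE:
            if keyword in text:
                counts[keyword] += 1
    new_items = []
    for keyword, n in counts.most_common():
        item = KEYWORD_TO_ATTRIBUTE[keyword]
        # Add an item only if the keyword is frequent and the item is not yet known.
        if n >= min_count and item not in existing_items and item not in new_items:
            new_items.append(item)
    return new_items


reasons = [
    "Execution time is long on our data",
    "Processing is heavy",
    "execution time is long",
    "processing is heavy for daily use",
    "results were hard to explain",
]
new_items = propose_attribute_items(reasons, existing_items=["data acquisition interval"])
```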
  • With data attribute items added in this way, the analysis method candidate determination unit 4 can judge the data attribute similarity more finely when determining the analysis method candidate. Therefore, the determination accuracy of the analysis method candidate can be improved.
  • The data analysis method candidate determination device 12 includes an evaluation acquisition unit 7 that acquires user evaluation information for the analysis method candidate, and a recommended case storage unit 8 that stores, as recommended cases, data associating the data attributes of the analysis target data, the analysis method candidates of the analysis target data, and the evaluation information for the analysis method candidates.
  • the determination accuracy of the analysis method candidate can be improved by using, for example, the recommended case that has obtained desirable evaluation information as the analysis case.
  • The data analysis method candidate determination apparatus 13 includes an attribute adding unit 9 that extracts the reason for non-adoption of analysis method candidates from the evaluation information acquired by the evaluation acquisition unit 7 and adds an item corresponding to the reason for non-adoption to the data attribute items. Therefore, when the analysis method candidate determination unit 4 determines the analysis method candidate, the data attribute similarity can be determined more finely, so that the determination accuracy of the analysis method candidate can be improved.
  • FIG. 12 is a block diagram illustrating a configuration of the data analysis technique candidate determination device 14 according to the third embodiment.
  • the data analysis method candidate determination device 14 includes a model change proposing unit 10 in addition to the configuration of the data analysis method candidate determination device 11 according to the first embodiment.
  • the model change proposing unit 10 proposes a physical model change such as correction or addition of a physical model when the analysis method candidate determined by the analysis method candidate determining unit 4 includes a physical model-based analysis method.
  • the physical model-based analysis method indicates all data analysis methods using a physical model based on data or design information, such as a device model, a failure model, a behavior model, a correlation model, or a user model.
  • The physical model may be described in a document format such as a parameter sheet; in a chart format such as an FTA (Fault Tree Analysis) diagram, a fault tree, or an electric circuit diagram; as an equation of motion, a bathtub curve, or the like; or in a machine language such as assembler, or in source code.
  • the model change proposing unit 10 is realized as a function of the processor 20 by the processor 20 illustrated in FIG. 3 executing a software program stored in the memory 21.
  • The analysis case storage unit 3 stores analysis target data, the analysis purpose and data attributes of the analysis target data, and an analysis technique as an analysis case. Furthermore, when the analysis method is a physical model-based analysis method, change information of the physical model is also stored as part of the analysis case. Specifically, when a user changes a physical model (by addition or modification) and then performs data analysis using the changed physical model, not only the changed physical model actually used for the data analysis but also the physical model before the change is stored in the analysis case storage unit 3 as change information.
  • the configuration of the data analysis technique candidate determination device 14 other than that described above is the same as the configuration of the data analysis method candidate determination device 11 according to the first embodiment.
  • FIG. 13 is a flowchart showing the operation of the data analysis technique candidate determination device 14. Steps S11 to S15 and S16 are the same as in the first embodiment, but differ from the first embodiment in that a new step S18 is added between steps S15 and S16.
  • When the analysis method candidate of the analysis target data is determined by the analysis method candidate determination unit 4 (step S15), the model change proposing unit 10 proposes a change of the physical model (step S18).
  • FIG. 14 is a flowchart showing the operation of the model change proposing unit 10 in step S18 of FIG. This flow is executed only when physical model change information is stored in the analysis case storage unit 3.
  • First, it is determined whether the analysis method candidate determined by the analysis method candidate determination unit 4 in step S15 in FIG. 13 includes a physical model-based analysis method (step S181). If no physical model-based analysis method is included, the process of the model change proposing unit 10 is terminated. If a physical model-based analysis method is included, the process proceeds to step S182.
  • In step S182, analysis cases that use the same analysis method as the physical model-based analysis method included in the analysis method candidates, and that describe change information of the physical model, are extracted from the analysis cases stored in the analysis case storage unit 3.
  • Next, it is determined whether or not the changed physical model data indicated by the change information is stored in the analysis case storage unit 3 (step S183). If the changed physical model data exists in the analysis case storage unit 3, the user is advised to use the changed physical model (step S184). For example, assume that an analysis method using passenger model A as a physical model was recommended as an analysis method candidate when a user previously analyzed the analysis target data “boarding history of public transportation”. If the user then performed the data analysis using passenger model B, obtained by modifying passenger model A or by adding a new passenger model, the analysis case storage unit 3 records the passenger model A before the change in addition to the analysis target data, the analysis purpose, and the analysis method actually used (passenger model B). Thereafter, in another data analysis, when the analysis method candidate determination unit 4 determines an analysis method using passenger model A as a physical model as the analysis method candidate, the use of passenger model B instead of passenger model A is proposed to the user.
  • If it is determined in step S183 that the changed physical model data does not exist in the analysis case storage unit 3, a method for changing (correcting or adding to) the physical model is proposed to the user. For example, if an analysis method using a customer model as a physical model is an analysis method candidate for the analysis purpose “product purchase situation analysis”, the proposal may be to correct the customer model to categories suited to the product genre to be analyzed, or to add a purchaser model in which “a parent buys on behalf of a child”.
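The branching of steps S181 to S184 and the fallback proposal can be sketched as follows. The dictionary keys, the method name `"passenger flow analysis"`, and the proposal strings are hypothetical. As a simplification, a case counts as having changed model data when its recorded pre-change model equals the candidate's model.

```python
def propose_model_change(candidates, analysis_cases):
    """Return textual proposals for physical model-based analysis method candidates.

    candidates:     list of {"method": str, "physical_model": str or None}
    analysis_cases: list of {"method": str, "physical_model": str,
                             "model_before_change": str (optional)}
    """
    proposals = []
    for candidate in candidates:
        model = candidate.get("physical_model")
        if model is None:  # S181: not a physical model-based method
            continue
        # S182: past cases using the same method that record a change from this model
        changed_cases = [
            c for c in analysis_cases
            if c["method"] == candidate["method"] and c.get("model_before_change") == model
        ]
        if changed_cases:  # S183 yes -> S184: suggest the changed model
            for c in changed_cases:
                proposals.append(f"use {c['physical_model']} instead of {model}")
        else:  # S183 no: suggest changing the model itself
            proposals.append(f"consider correcting or adding to {model}")
    return proposals


candidates = [
    {"method": "passenger flow analysis", "physical_model": "passenger model A"},
    {"method": "k-means method", "physical_model": None},
]
analysis_cases = [
    {
        "method": "passenger flow analysis",
        "physical_model": "passenger model B",
        "model_before_change": "passenger model A",
    }
]
proposals = propose_model_change(candidates, analysis_cases)
```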
  • For an analysis case in which data analysis was performed using a physical model obtained by changing a certain physical model, the analysis case data stored in the analysis case storage unit 3 includes information on the physical model before the change.
  • the data analysis technique candidate determination device 14 includes a model change proposing unit 10 in addition to the configuration of the data analysis technique candidate determination device 11 according to the first embodiment.
  • The model change proposing unit 10 proposes a change of the physical model when the analysis method candidate is an analysis method using a physical model and the physical model used in the analysis method candidate is the same as the physical model before the change in an analysis case. Therefore, the analysis accuracy of the physical model-based analysis method can be improved.
  • FIG. 17 is a block diagram illustrating a configuration of the data analysis technique candidate determination device 15 according to the fourth embodiment.
  • the data analysis technique candidate determination device 15 includes an existing data utilization proposal unit 101 in addition to the configuration of the data analysis technique candidate determination device 11 according to the first embodiment.
  • The existing data utilization proposal unit 101 extracts analysis target data (second analysis target data) having the necessary data attributes from the past analysis target data stored in the analysis target data storage unit 2, and proposes to the user that the second analysis target data be used.
  • the existing data utilization proposing unit 101 is realized as a function of the processor 20 by the processor 20 illustrated in FIG. 3 executing a software program stored in the memory 21.
  • the analysis case storage unit 3 stores the analysis target data initially selected by the user, the analysis purpose and data attributes of the analysis target data, and the analysis technique as analysis cases.
  • the analysis case storage unit 3 also stores analysis target data additionally selected by the user as proposed by the existing data utilization proposal unit 101 as an analysis case.
  • the analysis target data may be stored in the analysis case storage unit 3 with a flag for each selection timing.
  • the configuration of the data analysis method candidate determination device 15 other than that described above is the same as the configuration of the data analysis method candidate determination device 11 according to the first embodiment.
  • FIG. 18 is a flowchart showing the operation of the data analysis technique candidate determination device 15.
  • steps S11 to S15 and S16 are the same as in the first embodiment, but differ from the first embodiment in that a new step S19 is added between steps S15 and S16.
  • When the analysis method candidate of the analysis target data is determined by the analysis method candidate determination unit 4 (step S15), and the data attributes of the analysis target data acquired in step S13 are insufficient as the data attributes necessary for executing the analysis method candidate, the existing data utilization proposing unit 101 proposes adding analysis target data (step S19).
  • FIG. 19 is a flowchart showing the operation of the existing data utilization proposing unit 101 in step S19 of FIG.
  • The existing data utilization proposing unit 101 determines whether the analysis target data (first analysis target data) selected in step S11 of FIG. 18 has the data attributes necessary to execute the analysis method candidate determined in step S15 (step S191).
  • The following three cases are examples in which the analysis target data does not have the necessary data attributes.
  • the first is a case where the analysis target data itself is missing.
  • the second case is a case where the data acquisition interval of the analysis target data is rough with respect to the data acquisition interval specified as the necessary data attribute, and a sufficient analysis result cannot be obtained.
  • the third is a case where the data acquisition method specified as the necessary data attribute does not match the data acquisition method of the analysis target, and sufficient analysis results cannot be obtained.
  • The third case corresponds, for example, to a situation in which data directly measured by a sensor or the like is required but the analysis target data consists of processed values.
  • When the analysis target data (first analysis target data) has the data attributes necessary for executing the analysis method candidate, the existing data utilization proposing unit 101 ends the process. On the other hand, when the analysis target data (first analysis target data) does not have a data attribute necessary for executing the analysis method candidate, the existing data utilization proposal unit 101 proceeds to the process of step S192.
  • In step S192, the existing data utilization proposing unit 101 extracts, from the analysis cases stored in the analysis case storage unit 3, analysis cases that use an analysis method that is the same as or includes the analysis method candidate and whose analysis purpose is the same or similar.
  • Next, the existing data utilization proposing unit 101 compares the data attributes of the analyzed data in the extracted analysis cases with the data attributes of the analysis target data currently selected by the user, and extracts, from the data attributes of the analyzed data, the data attributes necessary for executing the analysis technique candidate (step S193).
  • At this time, data attributes may be excluded from extraction when they belong to data for which a data access authority is set as a data attribute and the user lacks that authority, or to data for which data utilization conditions are set as a data attribute and diversion is restricted by contract with the data source. In such cases, only the data attribute may instead be presented, annotated with the access authority or the restriction information on data diversion.
  • The existing data utilization proposing unit 101 proposes to the user that analysis target data having the extracted data attributes (second analysis target data) be added to the currently selected analysis target data (first analysis target data) for the analysis (step S194). For example, assume that when the user analyzes the analysis target data “the power consumption of general households in D-chome, C Town, B City, A Prefecture” with the added analysis target data “weekday / holiday classification of the analysis target period”, the “k-means method” is presented as an analysis method candidate, and the user decides to use that candidate.
  • Assume also that the analysis case storage unit 3 holds a case in which another user used the “k-means method” to analyze the analysis target data “building energy consumption” with the analysis target data “weekday / holiday classification of the analysis target period”, “meteorological observation data during the analysis period”, and “employee entry / exit history during the analysis period” added.
  • Assume further that the analysis target data “employee entry / exit history during the analysis period” has a data attribute indicating that secondary use of the data is not permitted.
  • In this case, the existing data utilization proposing unit 101 may propose in step S194 that the user additionally use the analysis target data “meteorological observation data during the analysis period”.
  • Alternatively, the existing data utilization proposing unit 101 may propose that the user additionally use both “meteorological observation data during the analysis period” and “employee entry / exit history during the analysis period”, while indicating to the user that the data attributes of “employee entry / exit history during the analysis period” show that secondary use of the data is not permitted.
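Steps S191 to S194 can be sketched as a set difference followed by a filtered lookup over past cases. The attribute labels (`"weather"`, `"occupancy"`, and so on), the `secondary_use_allowed` flag, and the dictionary layout are assumptions. The analysis purpose comparison from step S192 is omitted for brevity.

```python
def propose_additional_data(required_attrs, first_data_attrs, past_cases, method,
                            include_restricted=False):
    """Propose second analysis target data that supplies the missing data attributes."""
    missing = set(required_attrs) - set(first_data_attrs)  # S191
    if not missing:
        return []
    proposals = []
    for case in past_cases:  # S192: past cases that used the same analysis method
        if case["method"] != method:
            continue
        for data in case["datasets"]:
            provided = missing & set(data["attributes"])  # S193
            if not provided:
                continue
            allowed = data.get("secondary_use_allowed", True)
            if not allowed and not include_restricted:
                continue  # exclude data whose diversion is restricted
            proposals.append(  # S194
                {"name": data["name"], "provides": sorted(provided),
                 "secondary_use_allowed": allowed}
            )
    return proposals


past_cases = [{
    "method": "k-means method",
    "datasets": [
        {"name": "weekday / holiday classification of the analysis target period",
         "attributes": ["weekday/holiday"]},
        {"name": "meteorological observation data during the analysis period",
         "attributes": ["weather"]},
        {"name": "employee entry / exit history during the analysis period",
         "attributes": ["occupancy"], "secondary_use_allowed": False},
    ],
}]
required = ["power consumption", "weekday/holiday", "weather", "occupancy"]
selected = ["power consumption", "weekday/holiday"]
default_proposals = propose_additional_data(required, selected, past_cases, "k-means method")
all_proposals = propose_additional_data(required, selected, past_cases, "k-means method",
                                        include_restricted=True)
```

The `include_restricted` switch mirrors the two behaviors in the example: either only freely usable data is proposed, or restricted data is also listed with its restriction shown to the user.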
  • The data analysis method candidate determination device 15 includes an existing data utilization proposing unit 101 that, when the first analysis target data does not have the data attributes necessary for the analysis method determined for it by the analysis method candidate determination unit 4, proposes utilization of second analysis target data having the necessary data attributes. By proposing the addition of other analysis target data having the data attributes necessary for executing the analysis method candidate, the analysis accuracy when the analysis method candidate is executed can be improved.
  • The second analysis target data has a data attribute indicating whether or not the data can be diverted.
  • When the existing data utilization proposing unit 101 proposes utilization of the second analysis target data to the user, it also presents to the user information on whether that data can be diverted. Accordingly, when the second analysis target data proposed by the existing data utilization proposing unit 101 cannot be diverted, the user can consider obtaining alternative data that can be diverted, and adding the alternative data can improve the analysis accuracy when the analysis method candidate is executed.
  • FIG. 20 is a block diagram illustrating a configuration of the data analysis technique candidate determination device 16 according to the fifth embodiment.
  • the data analysis method candidate determination device 16 includes an analysis method review proposal unit 102 in addition to the configuration of the data analysis method candidate determination device 11 according to the first embodiment.
  • When a case with the same or a similar analysis purpose is added to the analysis cases stored in the analysis case storage unit 3, the analysis method review proposal unit 102 calculates the adoption rate for each analysis method and, if it detects an analysis method whose adoption rate satisfies a preset analysis method review condition, proposes a review of the analysis method to the user.
  • the analysis technique review proposal unit 102 is implemented as a function of the processor 20 by the processor 20 illustrated in FIG. 3 executing a software program stored in the memory 21.
  • Information on the user who registered or updated the analysis case, the person in charge of the analysis case, the developer or provider of the analysis method, the current utilization state of the analysis case, and the like is preferably also stored for each analysis case.
  • the current utilization status of analysis cases may include external cases, etc. in addition to the usage status of product application, trial, or cancellation.
  • the configuration of the data analysis method candidate determination device 16 other than that described above is the same as the configuration of the data analysis method candidate determination device 11 according to the first embodiment.
  • FIG. 21 is a flowchart showing the operation of the data analysis technique candidate determination device 16. Steps S11 to S16 are the same as in the first embodiment, but differ from the first embodiment in that a new step S20 is added after step S16.
  • After the analysis method candidate determination unit 4 determines the analysis method candidate of the analysis target data (step S15) and the analysis method candidate is presented to the user (step S16), the analysis purpose and the average similarity for each analysis method are passed to the analysis method review proposal unit 102.
  • the analysis method review proposal unit 102 determines whether or not the analysis method review proposal is necessary for the past analysis cases stored in the analysis case storage unit 3 (step S20).
  • FIG. 22 is a flowchart showing the operation of the analysis technique review proposing unit 102 in step S20 of FIG.
  • the analysis method review proposing unit 102 receives the analysis purpose and the average similarity for each analysis method calculated by the analysis method candidate determining unit 4 in step S15 of FIG. 21 (step S201). Subsequently, it is determined whether or not the analysis method has reached the review standard (step S202).
  • the review criteria are, for example, that the average similarity exceeds or is below the threshold.
  • Alternatively, the analysis method review proposal unit 102 may hold the reception history of the average similarity for each analysis method for a certain period or a certain number of receptions, and may determine that the review standard has been reached when the reception rate for an analysis method exceeds a threshold, or when the correlation between the reception date and the average similarity shows an increasing or decreasing tendency for a certain period or more. If the analysis technique has not reached the review criteria, the analysis technique review proposal unit 102 ends the process. On the other hand, if the analysis technique has reached the review criteria, the analysis technique review proposal unit 102 proceeds to the process of step S203.
  • In step S203, the analysis technique review proposing unit 102 extracts from the analysis case storage unit 3 past analysis cases whose analysis purpose is the same as or similar to the analysis purpose received in step S201.
  • Next, the adoption rate of each analysis technique used in the extracted analysis cases is calculated (step S204).
  • The adoption rate of a method X is N_x / N, where N is the number of extracted analysis cases and N_x is the number of those cases in which method X was adopted.
  • The analysis cases may be weighted according to their utilization state. For example, the weight is increased for an analysis case that has been applied to a product, and decreased for an analysis case whose use has been cancelled. Alternatively, the cases may be weighted according to their registration date or update date. That is, an analysis case with a newer registration or update date has a higher weight, and an analysis case with an older registration or update date has a lower weight.
  • The analysis method review proposal unit 102 proposes a review of the analysis cases (step S205). For example, if the adoption rate of the K-means method among clustering methods exceeds the threshold, it proposes to the users who registered or updated analysis cases not using the K-means method, the persons in charge, the developers or providers of the analysis methods, and the like (hereinafter simply referred to as “users and the like”) that the analysis method be revised to the K-means method. Alternatively, if the adoption rate of the K-means method among clustering methods falls below the threshold, it proposes to the users and the like of analysis cases using the K-means method that the analysis method be changed to a method other than the K-means method. In this case, a list showing the analysis methods together with their adoption rates may be presented to the users and the like in descending order of adoption rate.
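The adoption rate N_x / N and its weighted variant can be sketched as below. The state labels (`"product"`, `"trial"`, `"cancelled"`), the weight values, the `owner` field, the method name "hierarchical clustering", and the threshold are illustrative assumptions. With no weights, the function reduces to the plain ratio.

```python
def adoption_rates(cases, weights=None):
    """Compute the (optionally weighted) adoption rate N_x / N for each method.

    cases:   list of {"method": str, "state": str, "owner": str}
    weights: optional mapping from utilization state to weight (default weight 1.0)
    """
    weights = weights or {}
    total = 0.0
    per_method = {}
    for case in cases:
        w = weights.get(case["state"], 1.0)
        total += w
        per_method[case["method"]] = per_method.get(case["method"], 0.0) + w
    if total == 0:
        return {}
    return {method: w / total for method, w in per_method.items()}


def review_targets(cases, method, threshold, weights=None):
    """Return owners of cases NOT using `method` when its adoption rate exceeds the threshold."""
    rates = adoption_rates(cases, weights)
    if rates.get(method, 0.0) <= threshold:
        return []
    return [c["owner"] for c in cases if c["method"] != method]


cases = [
    {"method": "K-means method", "state": "product", "owner": "user A"},
    {"method": "K-means method", "state": "trial", "owner": "user B"},
    {"method": "K-means method", "state": "cancelled", "owner": "user C"},
    {"method": "hierarchical clustering", "state": "trial", "owner": "user D"},
]
plain = adoption_rates(cases)
weighted = adoption_rates(cases, weights={"product": 2.0, "cancelled": 0.5})
targets = review_targets(cases, "K-means method", threshold=0.7)
```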

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The purpose of the present invention is to automatically recommend an analysis algorithm for analysis target data regardless of whether a source code or an intermediate code is present. A device for determining a data analysis method candidate according to the present invention determines an analysis method candidate for analysis target data to be subjected to data analysis, and includes: an analysis example storing unit 3 that stores, as an analysis example, data in which a data attribute and an analysis method are associated with each other, for each of a plurality of previously analyzed data items; an analysis target data storing unit 2 that stores data attribute information about the analysis target data; and an analysis method candidate determining unit 4 that calculates a data attribute similarity degree as the degree of similarity between the data attribute of the analysis target data and the data attribute of each of the previously analyzed data items, and determines, as an analysis method candidate for the analysis target data, at least one analysis method among analysis methods used for the previously analyzed data items on the basis of data attribute similarity degree.

Description

データ分析手法候補決定装置Data analysis method candidate decision device

 この発明は、データ分析手法候補を決定する技術に関する。 This invention relates to a technique for determining data analysis method candidates.

 データを分析するためには、データの特徴や意味するところに応じて適切なデータ分析手法を選択する必要がある。現状では、データサイエンティストと呼ばれるデータ分析手法に詳しい専門の技術者が、データ分析手法を推薦している。近年、インターネットに接続される機器の増加により、インターネットを経由して収集されるデータが爆発的に増加しているため、これらのデータを分析するデータ分析技術者に対するニーズは高まっている。しかしながら、データ分析技術者の育成は進んでおらず、収集されたものの有効活用されていないデータが数多く存在する。 In order to analyze data, it is necessary to select an appropriate data analysis method according to the characteristics and meaning of the data. At present, engineers specializing in data analysis methods called data scientists recommend data analysis methods. In recent years, due to an increase in the number of devices connected to the Internet, data collected via the Internet has increased explosively. Therefore, there is an increasing need for data analysis engineers who analyze these data. However, the development of data analysis engineers has not progressed, and there are many data that have been collected but not effectively used.

 データ分析技術者の不足という課題を解決するためには、データ分析手法を機械的に推薦する仕組みが必要である。関連分野の技術として、特許文献1には、過去のソフトウェア製品の開発実績および変更実績に基づいて、派生製品の開発時に同時に再利用または変更すべきソフト部品を選択するソフトウェア分析装置が開示されている。特許文献1のソフトウェア分析装置では、ソースコード化されたあるソフト部品がユーザにより選択されると、当該ソフト部品と同時利用されていると考えられるソフト部品を、ソフト部品間距離に基づいて抽出し、提示する。 In order to solve the problem of lack of data analysis engineers, a mechanism to recommend data analysis methods mechanically is necessary. As a technology in the related field, Patent Document 1 discloses a software analysis device that selects software parts to be reused or changed at the same time when developing a derivative product based on past development results and change results of software products. Yes. In the software analysis apparatus of Patent Document 1, when a user selects a software component that is converted into a source code, a software component that is considered to be used simultaneously with the software component is extracted based on the distance between the software components. To present.

Patent Document 2 discloses an information processing device that recommends source code. The device converts the source code of a program under development into intermediate code, extracts similar intermediate code from the intermediate code stored in a database, and recommends the source code corresponding to the similar intermediate code.

Patent Document 1: JP 2010-113449 A
Patent Document 2: JP 2013-3664 A

However, the technique of Patent Document 1 cannot be used unless software components exist as source code. In addition, because the software components to be reused are selected using only the inter-component distance, reusable software components cannot be selected on the basis of other clues, such as the similarity of the analysis target data.

The technique of Patent Document 2 places no restriction on the language of the source code, but it cannot recommend source code unless intermediate code generated from a source-coded program is available.

In view of the above problems, an object of the present invention is to determine analysis method candidates for analysis target data regardless of whether source code or intermediate code exists.

A data analysis method candidate determination device according to the present invention determines analysis method candidates for analysis target data on which data analysis is to be performed. The device includes: an analysis case storage unit that stores, as analysis cases, data linking a data attribute and an analysis method for each of a plurality of pieces of analyzed data on which data analysis was performed in the past; an analysis target data storage unit that stores data attribute information for the analysis target data; and an analysis method candidate determination unit that calculates a data attribute similarity, which is the similarity between the data attribute of the analysis target data and the data attribute of the analyzed data, and determines, based on the data attribute similarity, at least one of the analysis methods of the analyzed data as an analysis method candidate for the analysis target data.

A data analysis method candidate determination device according to the present invention determines analysis method candidates for analysis target data on which data analysis is to be performed. The device includes: an analysis case storage unit that stores, as analysis cases, data linking a data attribute and an analysis method for each of a plurality of pieces of analyzed data on which data analysis was performed in the past; an analysis target data storage unit that stores data attribute information for the analysis target data; and an analysis method candidate determination unit that calculates a data attribute similarity, which is the similarity between the data attribute of the analysis target data and the data attribute of the analyzed data, and determines, based on the data attribute similarity, at least one of the analysis methods of the analyzed data as an analysis method candidate for the analysis target data. Because the analysis method candidates are determined based on the data attribute similarity, they can be determined even when no source code exists for the individual analysis methods.

The objects, features, aspects, and advantages of the present invention will become more apparent from the following detailed description and the accompanying drawings.

FIG. 1 is a block diagram showing the configuration of the data analysis method candidate determination device according to Embodiment 1.
FIG. 2 is a diagram illustrating data attributes.
FIG. 3 is a diagram showing the hardware configuration of the data analysis method candidate determination device according to Embodiment 1.
FIG. 4 is a flowchart showing the operation of the data analysis method candidate determination device according to Embodiment 1.
FIG. 5 is a flowchart showing the processing in step S15 of FIG. 4.
FIG. 6 is a diagram showing an example of setting distance evaluation axes.
FIG. 7 is a block diagram showing the configuration of the data analysis method candidate determination device according to Embodiment 2.
FIG. 8 is a flowchart showing the operation of the data analysis method candidate determination device according to Embodiment 2.
FIG. 9 is a flowchart showing the operation of the evaluation acquisition unit.
FIG. 10 is a block diagram showing the configuration of the data analysis method candidate determination device according to a modification of Embodiment 2.
FIG. 11 is a flowchart showing the operation of the data analysis method candidate determination device according to the modification of Embodiment 2.
FIG. 12 is a block diagram showing the configuration of the data analysis method candidate determination device according to Embodiment 3.
FIG. 13 is a flowchart showing the operation of the data analysis method candidate determination device according to Embodiment 3.
FIG. 14 is a flowchart showing the operation of the data analysis method candidate determination device according to Embodiment 3.
FIG. 15 is a diagram showing function flowchart A.
FIG. 16 is a diagram showing function flowchart B.
FIG. 17 is a block diagram showing the configuration of the data analysis method candidate determination device according to Embodiment 4.
FIG. 18 is a flowchart showing the operation of the data analysis method candidate determination device according to Embodiment 4.
FIG. 19 is a flowchart showing the operation of the existing data utilization proposal unit in step S19 of FIG. 18.
FIG. 20 is a block diagram showing the configuration of the data analysis method candidate determination device according to Embodiment 5.
FIG. 21 is a flowchart showing the operation of the data analysis method candidate determination device according to Embodiment 5.
FIG. 22 is a flowchart showing the operation of the analysis method review proposal unit in step S20 of FIG. 20.

<A. Embodiment 1>
<A-1. Configuration>
FIG. 1 is a block diagram showing the configuration of a data analysis method candidate determination device 11 according to Embodiment 1. The data analysis method candidate determination device 11 determines analysis method candidates for analysis target data on which data analysis is to be performed and recommends them to the user. It includes an analysis target data storage unit 2, an analysis case storage unit 3, and an analysis method candidate determination unit 4. These components need not all be provided in a single device. They may be distributed across a plurality of devices that are connected to one another by a network such as the Internet and that together constitute the data analysis method candidate determination device 11 as a single system.

The data analysis method candidate determination device 11 can use an input unit 5 and an output unit 6. The input unit 5 is an input interface through which commands, search conditions, and the like from the user are input to the data analysis method candidate determination device 11. The output unit 6 is an output interface that presents to the user the analysis method candidates determined by the analysis method candidate determination unit 4. Although FIG. 1 shows the input unit 5 and the output unit 6 as separate from the data analysis method candidate determination device 11, the device may include them.

The analysis target data storage unit 2 is configured from a recording medium such as an HDD (Hard Disk Drive) or an SD card, and stores the analysis target data on which data analysis is to be performed together with the data attributes of that data. The analysis target data handled by the data analysis method candidate determination device 11 includes: time-series data measured directly by sensors or the like, such as temperature, humidity, vibration, speed, acceleration, pressure, solar radiation, distance, weight, current, voltage, electric energy, rotation speed, or counts; discrete data such as device usage histories, access logs, GPS data of moving bodies, weather observations, or weather forecasts; document data such as reports, inspection records, work histories, forms, or plans; and statistical data such as demographic statistics or white papers. Analysis target data is data that has yet to be analyzed, but the analysis target data storage unit 2 may also store analyzed data on which data analysis was performed in the past and data analysis results newly created by data analysis, prediction, estimation, or the like. It may further contain data that has not been analyzed in the past but is available for use, together with the data attributes of that data. The analysis target data storage unit 2 only needs to store the data attributes of the analysis target data; the analysis target data itself does not necessarily have to be stored there. Examples of analysis target data that is not itself stored in the analysis target data storage unit 2 include open data provided by local governments and similar bodies, data posted to an SNS (Social Network System), and data stored in a distributed manner in a cloud environment or the like that is accessible from the data analysis method candidate determination device 11.

FIG. 2 illustrates data attributes, showing them for each of data A, data B, and data C. A data attribute represents a characteristic of the data. Examples include the data acquisition interval, the data acquisition method, whether the value is an actual, predicted, or processed value, the data type, related data, and related devices. Access rights to the data may also be used as a data attribute.

The analysis case storage unit 3 is configured from a recording medium such as an HDD (Hard Disk Drive) or an SD card. For analyzed data on which data analysis was performed in the past, it stores, as analysis cases, data linking the data attributes to the analysis method. The analysis cases need not have been created by the data analysis method candidate determination device 11. They desirably include existing analysis cases, publicly known cases from the literature, trial applications at the study stage, cases that were not adopted, cases in which the analysis method was changed, and so on. An analysis case may also include the user's evaluation of the analysis method. In each analysis case, the analysis method may be described as source code or as intermediate code executable by a program. Alternatively, it may be described by name, such as "regression analysis" or "k-means method"; as a hierarchy of superordinate, intermediate, and subordinate concepts, such as "statistical analysis → clustering → k-means method"; or as an ID.
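As a rough illustration of what one stored analysis case might look like, the following Python sketch models a case as a record that links data attributes to an analysis method. The class name `AnalysisCase`, its field names, and the helper `method_leaf` are hypothetical and are not taken from the source; the source only states that attributes, a method description, and optionally a user evaluation are stored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnalysisCase:
    """One analysis case: data attributes linked to an analysis method.

    All field names are illustrative assumptions.
    """
    name: str                           # e.g. "public transport ride history"
    purpose: str                        # analysis purpose as free text
    attributes: dict                    # data attribute item -> value
    method: str                         # method given as a name, hierarchy, or ID
    evaluation: Optional[float] = None  # optional user evaluation (scale assumed)


def method_leaf(method: str, sep: str = "→") -> str:
    """Return the most specific level of a hierarchical method description."""
    return method.split(sep)[-1].strip()


case = AnalysisCase(
    name="public transport ride history",
    purpose="analysis of places visited",
    attributes={"acquisition_method": "log", "value_kind": "actual"},
    method="statistical analysis → clustering → k-means method",
)
leaf = method_leaf(case.method)  # most specific method name
```

Keeping the method as a single string lets the same field hold a plain name, a hierarchy, or an ID, which matches the alternatives listed in the text.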

The analysis method candidate determination unit 4 selects, from the analysis methods used in past analysis cases, the analysis method to be used for analyzing the analysis target data, and determines it as an analysis method candidate. The determined candidate is output from the output unit 6, for example in text form, and recommended to the user. Alternatively, the candidates may be output as a list together with representative past cases, which makes it easier for the user to understand how each candidate has been applied and what its characteristics are.

FIG. 3 shows the hardware configuration of the data analysis method candidate determination device 11, which includes a processor 20, a memory 21, and a recording medium 22. The analysis method candidate determination unit 4 is realized as a function of the processor 20, such as a CPU (Central Processing Unit), when the processor executes a software program stored in the memory 21, such as a RAM (Random Access Memory). This function may also be realized by a plurality of processors working together. Alternatively, the analysis method candidate determination unit 4 may be realized by a signal processing circuit that performs the operation with hardware electric circuits. The term "processing circuit" may be used in place of "unit" as a concept covering both the software and the hardware implementations of the analysis method candidate determination unit 4.

<A-2. Operation>
FIG. 4 is a flowchart showing the operation of the data analysis method candidate determination device 11. First, the user selects the analysis target data and the analysis purpose via the input unit 5 (step S11). For the analysis target data, a list of the data already stored in the analysis target data storage unit 2 may be displayed for the user to choose from, or the user may be allowed to input new analysis target data as an electronic file or the like. Newly input analysis target data is stored in the analysis target data storage unit 2.

For the analysis purpose, a list such as a pull-down menu may be displayed for the user to choose from, or the user may be allowed to enter it as a character string. The analysis purpose selected by the user is stored in the analysis target data storage unit 2. The analysis purpose is not limited to one; there may be several. The description below uses "television viewing data" as the analysis target data and "analysis of viewers' viewing preferences" as the analysis purpose.

Next, the analysis target data is read from the analysis target data storage unit 2 into the analysis method candidate determination unit 4 (step S12). In this example, the television viewing data collected from each television terminal is read as the analysis target data.

Next, the data attributes and the analysis purpose of the analysis target data are read from the analysis target data storage unit 2 into the analysis method candidate determination unit 4 (step S13). In this example, the data attributes read for the "television viewing data" are, for instance, the data acquisition interval, the location of the data acquisition device, and the owner information of the data acquisition device, and the analysis purpose read is "analysis of viewers' viewing preferences".

Next, analysis cases whose data attributes are identical or similar to those of the analysis target data, or whose analysis purpose is identical or similar to that of the analysis target data, are read from the analysis case storage unit 3 into the analysis method candidate determination unit 4 (step S14). Examples of analysis cases with data attributes similar to those of the "television viewing data" include "regional television audience rating survey", "analysis of favorite celebrities by region", "survey of popular movie genres", "power usage survey", and "analysis of production efficiency in factories". Examples of analysis cases with a similar analysis purpose include "Internet browsing history analysis", "product purchase analysis", "analysis of stores visited", "analysis of point card holdings", "public transport ride history", and "analysis of facilities visited while traveling".

Next, the analysis method candidate determination unit 4 determines the analysis method candidates for the analysis target data (step S15). The details of step S15 are described later.

Finally, the analysis method candidates created in step S15 are output to the output unit 6 and recommended to the user (step S16), and the processing ends.
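The overall idea of steps S14 to S16 (score past cases against the target data, then recommend methods from the most similar cases) can be sketched as follows. This is a minimal sketch under stated assumptions: the `ScoredCase` record, the `recommend_methods` function, the `top_k` cutoff, and the similarity values are all hypothetical, and the source only requires that at least one method be chosen based on similarity.

```python
from dataclasses import dataclass


@dataclass
class ScoredCase:
    """A past analysis case with a precomputed similarity to the target data."""
    name: str
    method: str
    similarity: float  # assumed to lie in [0, 1]; how it is computed is out of scope here


def recommend_methods(cases, top_k=2):
    """Return up to top_k distinct methods, taken from the most similar cases first."""
    ranked = sorted(cases, key=lambda c: c.similarity, reverse=True)
    methods = []
    for c in ranked:
        if c.method not in methods:
            methods.append(c.method)
        if len(methods) == top_k:
            break
    return methods


# Illustrative values only; they are not results from the source.
cases = [
    ScoredCase("survey of popular movie genres", "k-means method", 0.8),
    ScoredCase("power usage survey", "regression analysis", 0.4),
    ScoredCase("regional audience rating survey", "k-means method", 0.7),
]
candidates = recommend_methods(cases)
```

Deduplicating by method name keeps the recommendation list short when several similar cases used the same method.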

FIG. 5 is a flowchart of the analysis method candidate determination processing performed by the analysis method candidate determination unit 4 in step S15 of FIG. 4. First, for each analysis case read in step S14 of FIG. 4, the data attribute similarity between the analysis target data and the analyzed data is calculated (step S151). The processing is described concretely using "public transport ride history" as an example of the analyzed data of an analysis case. The data attribute similarity Sz is calculated between the data attributes of the "television viewing data", which is the analysis target data specified by the user, and the data attributes of the data used in the "public transport ride history" analysis, such as "transit IC card ride history" data or "public transport routes estimated from GPS data". The data attribute similarity Sz is calculated, for example, by the following formula.

[Formula image JPOXMLDOC01-appb-M000001 (data attribute similarity Sz) is not reproduced in this text.]

Here, N is the number of items registered as data attributes, Lmaxi is the maximum distance for the i-th data attribute item, and Li is the distance for the i-th data attribute item. For example, a distance evaluation axis is set for each data attribute item and used to calculate the distance Li of the i-th data attribute item.
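Because the formula image for Sz is not available in this text, the following Python sketch uses an assumed form only: the average over the N attribute items of 1 - Li / Lmaxi, so that identical attributes give 1 and maximally distant attributes give 0. This normalization is a guess consistent with the symbol definitions above and is not confirmed as the formula in the source. The function name is hypothetical.

```python
def data_attribute_similarity(distances, max_distances):
    """Assumed form of Sz: mean over items of (1 - L_i / Lmax_i).

    distances[i]     -- L_i, distance of the i-th data attribute item
    max_distances[i] -- Lmax_i, maximum distance on that item's evaluation axis
    """
    if not distances or len(distances) != len(max_distances):
        raise ValueError("need one distance and one maximum per attribute item")
    n = len(distances)
    return sum(1.0 - l / lmax for l, lmax in zip(distances, max_distances)) / n


# Two attribute items: distance 5 of max 10, and distance 20 of max 100.
sz = data_attribute_similarity([5, 20], [10, 100])
```

Dividing each distance by its own maximum keeps items with large distance scales from dominating the result, which seems to be the role of Lmaxi in the definitions.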

FIG. 6 shows an example of setting distance evaluation axes. For the data acquisition interval, for example, the distance is 10 if the acquisition interval of at least one of the analysis target data and the analyzed data is irregular, 0 if the acquisition interval of the analyzed data is shorter than that of the analysis target data, and 5 if the acquisition interval of one is 100 times or more that of the other. For the data acquisition method, for example, the distance is 0 if the methods are the same, 2 if one is a log and the other is terminal input, and 1 if both are sensor logs but the sensor types differ. For the distinction between actual and predicted values, for example, the distance is 0 if both are actual values, 20 if one is an actual value and the other is a predicted value, and 100 if both are predicted values. In this way, a distance evaluation axis may be set for each data attribute item either as a set of rules or as a mathematical expression. The number of rules need not be limited, and a maximum distance may be set for each evaluation axis. On each distance evaluation axis set as in FIG. 6, the largest distance is taken as the maximum distance. Although FIG. 6 shows only positive distances, a distance may take a negative value, and it may be a value of two or more dimensions rather than a one-dimensional value.
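The rule-based distance evaluation axes described for FIG. 6 can be sketched in Python as below. The distance values 10, 0, 5, 20, and 100 come from the text; the function names, the order in which overlapping rules are checked, and the fallback value for cases the text does not cover are assumptions made for illustration.

```python
def interval_distance(target_s, analyzed_s):
    """Distance on the 'data acquisition interval' axis.

    Intervals are in seconds and assumed positive; None means irregular.
    """
    if target_s is None or analyzed_s is None:
        return 10  # at least one interval is irregular
    ratio = max(target_s, analyzed_s) / min(target_s, analyzed_s)
    if ratio >= 100:
        return 5   # one interval is 100 times the other or more (checked first: assumption)
    if analyzed_s < target_s:
        return 0   # analyzed data is sampled more finely than the target data
    return 1       # hypothetical fallback; this case is not specified in the text


def value_kind_distance(kind_a, kind_b):
    """Distance on the 'actual vs. predicted value' axis."""
    kinds = {kind_a, kind_b}
    if kinds == {"actual"}:
        return 0
    if kinds == {"actual", "predicted"}:
        return 20
    if kinds == {"predicted"}:
        return 100
    raise ValueError("combination not covered by the rules in the text")


d_irregular = interval_distance(None, 60)
d_finer = interval_distance(60, 1)
d_far = interval_distance(1, 3600)
```

With these rules the maximum distance on the interval axis is 10 and on the actual/predicted axis is 100, which is what a normalizing term such as Lmaxi would use.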

Next, for each analysis case whose data attribute similarity was calculated in step S151, the analysis purpose similarity Sp with respect to the analysis target data is calculated (step S152). For example, the analysis purpose of the analysis target data and that of the analyzed data are compared as character strings, and their similarity is calculated as the analysis purpose similarity Sp. Sp can be obtained using, for example, cosine similarity or the Levenshtein distance. When cosine similarity is used, the analysis purpose similarity Sp between character string A, the analysis purpose of the analysis target data, and character string B, the analysis purpose of the analyzed data, is calculated by the following formula.

$$S_p = \cos(A, B) = \frac{A \cdot B}{|A|\,|B|}$$

Here, A·B is the inner product of the vectors for character strings A and B, and |A| and |B| are the magnitudes (lengths) of the vectors for character strings A and B, respectively.

The calculation of the analysis purpose similarity Sp is explained using "analysis of viewers' viewing preferences" as character string A, the analysis purpose of the analysis target data, and "survey of popular movie genres" as character string B, the analysis purpose of the analyzed data. Breaking character string A down to the word level and extracting keywords yields "viewing, person, preference, analysis"; likewise, character string B yields "popularity, movie, genre, survey". At this point, synonyms may be linked, as in "preference = popularity" and "analysis = survey", so that the keywords of character string B become "preference, movie, genre, analysis". A synonym database defining such synonyms can be provided in the analysis target data storage unit 2 or the analysis case storage unit 3 and consulted to link the synonyms.

Expressed as vectors over the keywords (viewing, person, preference, analysis, movie, genre), the character strings become A = (2, 1, 1, 1, 0, 0) and B = (0, 0, 1, 1, 1, 1). The first component of A is 2 because "viewing" occurs twice in character string A.

The analysis purpose similarity Sp is then calculated as follows.

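$$S_p = \frac{A \cdot B}{|A|\,|B|} = \frac{2 \cdot 0 + 1 \cdot 0 + 1 \cdot 1 + 1 \cdot 1 + 0 \cdot 1 + 0 \cdot 1}{\sqrt{2^2 + 1^2 + 1^2 + 1^2}\,\sqrt{1^2 + 1^2 + 1^2 + 1^2}} = \frac{2}{2\sqrt{7}} \approx 0.378$$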
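The keyword-vector calculation above can be reproduced with a short Python sketch. The English keyword renderings, the `SYNONYMS` mapping, and the function names are illustrative assumptions; the vectors and the synonym pairs follow the example in the text.

```python
import math
from collections import Counter

# Synonym links taken from the example: popularity -> preference, survey -> analysis.
SYNONYMS = {"popularity": "preference", "survey": "analysis"}


def keyword_vector(keywords, vocabulary):
    """Count keyword occurrences (after synonym linking) over a fixed vocabulary."""
    counts = Counter(SYNONYMS.get(k, k) for k in keywords)
    return [counts[w] for w in vocabulary]


def cosine_similarity(a, b):
    """Inner product divided by the product of the vector magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


vocab = ["viewing", "person", "preference", "analysis", "movie", "genre"]
# "viewing" appears twice in character string A.
vec_a = keyword_vector(["viewing", "person", "viewing", "preference", "analysis"], vocab)
vec_b = keyword_vector(["popularity", "movie", "genre", "survey"], vocab)
sp = cosine_similarity(vec_a, vec_b)  # 2 / (sqrt(7) * 2), about 0.378
```

Without the synonym linking, the two keyword sets would share no terms and the cosine similarity would be 0, which is why the synonym database matters for short purpose strings.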

As another example, when the analysis purpose is described in source code or intermediate code, the processing procedure expressed by the source code or intermediate code may be organized using a technique such as UML (Unified Modeling Language) or a function flowchart, and the analysis purpose similarity Sp may be calculated from the similarity of the processing procedures. The calculation of Sp is explained below using function flowchart A shown in FIG. 15 and function flowchart B shown in FIG. 16.

Function flowchart A shows that steps S21 to S26 are executed in order. Step S21 inputs X, step S22 assigns X/5 to Y, step S23 outputs Y, step S24 inputs Z, step S25 assigns Y×Z to A, and step S26 outputs Y.

Function flowchart B shows that steps S31 to S33 are executed in order. Step S31 inputs X, step S32 is a subroutine on X, and step S33 outputs Y. The subroutine of step S32 consists of step S34, which assigns X/5 to Y.

Suppose that, for each of these two function flowcharts A and B, the matching rate of processing steps is defined as the number of matching processing steps relative to the total number of processing steps. When only input/output operations and arithmetic operations are counted as processing steps, the matching rate is calculated as follows.

[Formula image JPOXMLDOC01-appb-M000004 (matching rate of processing steps) is not reproduced in this text.]

When the length of the run of consecutive matching processing steps is also taken into account in addition to this matching rate, the analysis purpose similarity Sp can be expressed, for example, by the following formula.

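[Formula image JPOXMLDOC01-appb-M000005 (analysis purpose similarity Sp from the matching rate and the number of consecutive matching steps) is not reproduced in this text.]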
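The two ingredients named above, the number of matching processing steps and the length of the longest run of consecutive matches, can be computed with Python's standard `difflib`. This sketch only derives those ingredients for flowcharts A and B; how the source combines them into Sp is not reproduced here, so no combined formula is attempted. The step strings and the function name are illustrative, and the subroutine of flowchart B is assumed to be expanded into its single arithmetic step.

```python
from difflib import SequenceMatcher

# Input/output and arithmetic steps only, as in the text.
flow_a = ["input X", "Y = X / 5", "output Y", "input Z", "A = Y * Z", "output Y"]
flow_b = ["input X", "Y = X / 5", "output Y"]  # subroutine S32 expanded to S34


def matching_stats(a, b):
    """Return (number of matching steps, longest run of consecutive matching steps)."""
    blocks = SequenceMatcher(None, a, b, autojunk=False).get_matching_blocks()
    matched = sum(m.size for m in blocks)
    longest = max((m.size for m in blocks), default=0)
    return matched, longest


matched, longest_run = matching_stats(flow_a, flow_b)
rate_a = matched / len(flow_a)  # matching rate relative to flowchart A
rate_b = matched / len(flow_b)  # matching rate relative to flowchart B
```

Here the first three steps of A match all three steps of B in order, so the matching rate is 0.5 for A and 1.0 for B, and the longest consecutive run is 3.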

When the analysis purpose is described as a hierarchy of superordinate, intermediate, and subordinate concepts, the analysis purpose similarity may be calculated for each of the superordinate, intermediate, and subordinate concepts using equation (6) and then averaged. Alternatively, an ID number reflecting the similarity between methods may be assigned in advance to each option at the superordinate, intermediate, and subordinate levels, and the analysis purpose similarity Sp may be obtained from the difference between the numbers formed by combining the ID numbers.

For example, if the maximum ID number is "9-9-99", the analysis purpose similarity Sp between an analysis purpose whose superordinate-middle-subordinate ID number is "1-0-01" and an analysis purpose whose ID number is "1-0-02" can be calculated as follows.

Figure JPOXMLDOC01-appb-M000006

Likewise, the analysis purpose similarity Sp between the analysis purpose whose superordinate-middle-subordinate ID number is "1-0-01" and an analysis purpose whose ID number is "5-0-01" can be calculated as follows.

Figure JPOXMLDOC01-appb-M000007
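
The ID-based comparison can be sketched as below. The patent's actual formulas appear only in the equation images, so the normalization used here (one minus the absolute difference divided by the maximum combined ID) is an assumption chosen purely for illustration, and the function names are hypothetical.

```python
def id_to_number(hier_id: str) -> int:
    """Combine a 'superordinate-middle-subordinate' ID such as '1-0-01' into 1001."""
    return int(hier_id.replace("-", ""))


def purpose_similarity(id_a: str, id_b: str, max_id: str = "9-9-99") -> float:
    """Assumed form: Sp = 1 - |a - b| / max. Not taken from the patent's equations."""
    diff = abs(id_to_number(id_a) - id_to_number(id_b))
    return 1.0 - diff / id_to_number(max_id)


near = purpose_similarity("1-0-01", "1-0-02")  # differs only in the subordinate concept
far = purpose_similarity("1-0-01", "5-0-01")   # differs in the superordinate concept
print(near, far)
```

Because the superordinate concept occupies the most significant digit, a difference at that level lowers the similarity far more than a difference at the subordinate level, which is the behavior the two examples in the text are meant to contrast.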

The formulas for the analysis purpose similarity Sp described above are only examples. Variations are possible, such as weighting specific conditions, or applying a correction such as a slope adjustment when differences between the calculation methods bias the average of the calculated results.

When cases whose analysis purposes are described by different methods are mixed, cases that represent multiple cases may be extracted, and analysis purposes under every description method may be assigned to those representative cases only, so that analysis purposes can be compared indirectly.

Next, the overall similarity S between the analysis target data and the analyzed data is calculated from the data attribute similarity Sz and the analysis purpose similarity Sp (step S153). The overall similarity S is calculated, for example, by the following equation.

Figure JPOXMLDOC01-appb-M000008

Next, it is checked whether any analyzed data remains for which the overall similarity has not yet been calculated (step S154). If such analyzed data exists, the process returns to step S151, and steps S151 to S153 are executed for that analyzed data. Once the similarity has been calculated for all analyzed data, the process proceeds to step S155.

In step S155, an average similarity is calculated for each analysis method from the overall similarities of all analysis cases read in step S14 of FIG. 4. Suppose, for example, that the analysis cases read in step S14 of FIG. 4 used analysis methods such as "regression analysis", "k-means method", "behavior-model-based inference", "behavior-model-based inference and queuing simulation", and "neural network". In that case, the average similarity Sav for "regression analysis" is calculated, for example, by the following equation.

$$S_{av} = \frac{\sum S}{N}$$

Here, N is the number of cases that include "regression analysis" as a data analysis method, and ΣS is the sum of the overall similarities of those cases. The example above uses the arithmetic mean, but the average similarity may be calculated with other kinds of mean, such as the geometric mean, harmonic mean, or weighted mean.
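
The arithmetic-mean form of step S155 can be sketched as follows. The case list and its similarity values are invented for illustration, and the function name is hypothetical; a case that uses several methods is counted once for each method it includes, matching the definition of N above.

```python
from collections import defaultdict


def average_similarity_by_method(cases):
    """cases: iterable of (methods, overall_similarity) pairs.

    Returns (averages, counts): per-method arithmetic mean of the overall
    similarity S, and the number of cases N that include each method.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for methods, similarity in cases:
        for method in methods:
            totals[method] += similarity
            counts[method] += 1
    averages = {m: totals[m] / counts[m] for m in totals}
    return averages, dict(counts)


# Hypothetical analysis cases with made-up overall similarities.
cases = [
    (("regression analysis",), 0.8),
    (("regression analysis", "k-means method"), 0.6),
    (("k-means method",), 0.4),
]
averages, counts = average_similarity_by_method(cases)
print(averages, counts)
```

Swapping in a geometric, harmonic, or weighted mean would only change the final division step.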

When a single case uses multiple analysis methods, the average similarity may be calculated with the combination of methods kept intact. Alternatively, after the average similarity of each method has been calculated individually, the average similarity may be calculated again, only for methods with a high average similarity, for the data analysis methods used in combination with them.

Finally, the analysis method candidates for the analysis target data are determined (step S156). The analysis method with the highest average similarity may be chosen as the candidate, or several analysis methods may be chosen in descending order of average similarity. When the candidates are output in step S16 of FIG. 4, the output may also include, for each candidate, its average similarity, the number of analysis cases that include it, or the frequency of the analysis purposes for which it is used.
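
Step S156 amounts to ranking the methods by average similarity and returning the top entries together with the supplementary figures. A minimal sketch, assuming the per-method averages and case counts are already available as dictionaries (the function name, parameter names, and values are all hypothetical):

```python
def select_candidates(avg_by_method, count_by_method, top_k=1):
    """Return up to top_k (method, average_similarity, case_count) tuples,
    ordered from highest to lowest average similarity."""
    ranked = sorted(avg_by_method, key=lambda m: avg_by_method[m], reverse=True)
    return [(m, avg_by_method[m], count_by_method.get(m, 0)) for m in ranked[:top_k]]


# Made-up averages and case counts for illustration.
avg = {"regression analysis": 0.7, "k-means method": 0.5, "neural network": 0.6}
cnt = {"regression analysis": 2, "k-means method": 2, "neural network": 1}

best = select_candidates(avg, cnt)            # single best candidate
top2 = select_candidates(avg, cnt, top_k=2)   # several candidates in rank order
print(best, top2)
```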

<A-3. Effect>
The data analysis method candidate determination device 11 according to Embodiment 1 includes: an analysis case storage unit 3 that stores, as analysis cases, data in which a data attribute and an analysis method are associated with each of a plurality of analyzed data sets that were analyzed in the past; an analysis target data storage unit 2 that stores data attribute information for the analysis target data; and an analysis method candidate determination unit 4 that calculates a data attribute similarity, which is the similarity between the data attributes of the analysis target data and those of the analyzed data, and, based on the data attribute similarity, determines at least one of the analysis methods of the analyzed data as an analysis method candidate for the analysis target data. Accordingly, even without the source code of each analysis method, analysis method candidates can be determined by referring to analysis cases with similar data attributes.

In addition, the analysis case storage unit 3 stores analysis purpose information for each of the analyzed data sets, and the analysis target data storage unit 2 stores analysis purpose information for the analysis target data. The analysis method candidate determination unit 4 calculates the similarity between the analysis purpose of the analysis target data and that of the analyzed data as the analysis purpose similarity, calculates the overall similarity between the analysis target data and the analyzed data from the analysis purpose similarity and the data attribute similarity, and, based on the overall similarity, determines at least one of the analysis methods of the analyzed data as an analysis method candidate for the analysis target data. Accordingly, even without the source code of each analysis method, analysis method candidates can be determined by referring to analysis cases with similar data attributes and analysis purposes.

The data attributes of the analyzed data and the analysis target data include at least one of the following: the data acquisition interval, the data acquisition method, and whether the data is an actual value, a predicted value, or a processed value. By determining analysis method candidates based on the similarity of these data attributes, candidates can be determined even without the source code of each analysis method.

The analysis method candidate determination unit 4 also calculates the analysis purpose similarity from the character string describing the analysis purpose of the analysis target data and the character string describing the analysis purpose of the analyzed data. By comparing the character strings to calculate the analysis purpose similarity and determining analysis method candidates based on it, candidates can be determined even without the source code of each analysis method.

The analysis method candidate determination unit 4 also calculates the analysis purpose similarity from the hierarchically described analysis purpose of the analysis target data and the hierarchically described analysis purpose of the analyzed data. By comparing the similarity between analysis purposes set in advance for each level of the hierarchy to calculate the analysis purpose similarity, and determining analysis method candidates based on it, candidates can be determined even without the source code of each analysis method.

When the analysis purposes of the analysis target data and the analyzed data are described in source code or intermediate code, the analysis method candidate determination unit 4 calculates, as the analysis purpose similarity, the similarity between the processing procedure shown in the source code or intermediate code for the analysis target data and the processing procedure shown in the source code or intermediate code for the analyzed data, based on the matching rate or on the continuity of the matching processing procedures. By calculating the analysis purpose similarity in this way and determining analysis method candidates based on it, candidates can be determined even when the analysis purpose is described in source code or intermediate code.

For each analysis method, the analysis method candidate determination unit 4 also calculates the average of the overall similarities between the analysis target data and the analyzed data that used that method, and determines the analysis method selected on the basis of this average as an analysis method candidate. Accordingly, analysis method candidates can be determined even without the source code of each analysis method.

<B. Embodiment 2>
<B-1. Configuration>
FIG. 7 is a block diagram showing the configuration of the data analysis method candidate determination device 12 according to Embodiment 2. In addition to the configuration of the data analysis method candidate determination device 11 according to Embodiment 1, the data analysis method candidate determination device 12 includes an evaluation acquisition unit 7 and a recommended case storage unit 8.

The recommended case storage unit 8 is composed of a recording medium such as an HDD (Hard Disk Drive) or SD, and stores recommended case data. Recommended case data is data in which analysis method candidates previously determined by the analysis method candidate determination unit 4 are associated with the analysis target data and the analysis purpose.

The evaluation acquisition unit 7 acquires evaluation information on analysis method candidates that the user enters through the input unit 5, and adds that evaluation information to the corresponding recommended case stored in the recommended case storage unit 8. In other words, the recommended case storage unit 8 stores each recommended case, which consists of analysis target data, an analysis purpose, and analysis method candidates, in association with the evaluation information for that recommended case. The evaluation acquisition unit 7 is realized as a function of the processor 20 shown in FIG. 3 when the processor 20 executes a software program stored in the memory 21.

<B-2. Operation>
FIG. 8 is a flowchart showing the operation of the data analysis method candidate determination device 12. Steps S11 to S16 are the same as in Embodiment 1 and have already been described with reference to FIG. 4, so their description is omitted here. After the analysis method candidate determination unit 4 determines the analysis method candidates (step S15) and outputs them to the output unit 6 (step S16), it stores data associating the analysis target data, the analysis purpose, and the analysis method candidates (a recommended case) in the recommended case storage unit 8 (step S17).

FIG. 9 is a flowchart showing the operation of the evaluation acquisition unit 7. This flow is performed only when recommended cases are stored in the recommended case storage unit 8. First, the evaluation acquisition unit 7 determines the recommended case to which evaluation information is to be added (step S71). For example, a screen listing all recommended cases stored in the recommended case storage unit 8 may be displayed so that the user can select one. Alternatively, the user may enter conditions such as the analysis target data or the analysis purpose, and the recommended cases may be identified or narrowed down from those conditions. Recommended cases that do not yet have evaluation information may also be extracted from the recommended case storage unit 8 and presented to the user for selection.

Next, among the analysis method candidates recommended in the recommended case determined in step S71, the candidates that the user actually used are identified (step S72). If the user used several candidates, several are identified. Here, for example, a list screen of the analysis method candidates is displayed, and the user selects from it the candidates actually used.

Next, the user's evaluation information is acquired for the analysis method candidates identified in step S72 (step S73). The evaluation information is acquired by having the user enter it through the input unit 5. It includes supplementary information such as analysis accuracy, the user's personal impressions, and execution time. The user may also be asked to select, from the list screen of analysis method candidates, the candidate that gave the most desirable result. Alternatively, instead of selecting the single most desirable candidate, the user may rank the candidates in order of how desirable their results were.

In addition to information on favorable evaluations as described above, information on unfavorable evaluations may be acquired. For example, if there is an analysis method candidate that the user tried but ultimately did not adopt because of some problem, the user may be asked to enter the problem with that candidate. Problems can also be entered for analysis method candidates that the user did not actually use. Supplementary information such as problems may be selected from options prepared in advance or entered freely.

The evaluation acquisition unit 7 attaches the evaluation information acquired in this way to the recommended case and stores it in the recommended case storage unit 8 (step S74).

Furthermore, among the recommended cases to which evaluation information has been attached, the evaluation acquisition unit 7 adds those relating to analysis method candidates with desirable evaluation information to the analysis case storage unit 3 as new analysis cases (step S75). For example, suppose that for the analysis target data "TV viewing data" and the analysis purpose "analysis of viewers' viewing preferences", the analysis method candidates were "regression analysis" and "k-means method", and that desirable evaluation information was obtained for "regression analysis" while undesirable evaluation information was obtained for "k-means method". In that case, the analysis target data "TV viewing data", the analysis purpose "analysis of viewers' viewing preferences", and the analysis method "regression analysis" are added to the analysis case storage unit 3 as a new analysis case. If desirable evaluation information is obtained for several analysis methods, all of them are added to the analysis case storage unit 3 in the same way. Because analysis cases with desirable evaluation information are added and then used when determining analysis method candidates, the accuracy of candidate determination improves.
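
Step S75 can be sketched as a filter over the evaluated candidates of one recommended case. The data layout below (a dataclass with a per-method boolean meaning "desirable") is an assumption made for illustration; the patent does not specify how evaluation information is encoded, and all names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RecommendedCase:
    target_data: str
    purpose: str
    candidates: list
    # Hypothetical encoding: method -> True (desirable) / False (undesirable).
    evaluations: dict = field(default_factory=dict)


def promote_to_analysis_cases(rec, analysis_cases):
    """Append one analysis case per candidate that received a desirable evaluation."""
    for method in rec.candidates:
        if rec.evaluations.get(method) is True:
            analysis_cases.append((rec.target_data, rec.purpose, method))
    return analysis_cases


rec = RecommendedCase(
    target_data="TV viewing data",
    purpose="analysis of viewers' viewing preferences",
    candidates=["regression analysis", "k-means method"],
    evaluations={"regression analysis": True, "k-means method": False},
)
analysis_cases = promote_to_analysis_cases(rec, [])
print(analysis_cases)
```

Candidates without any evaluation are left out, so only methods the user explicitly rated as desirable feed back into later candidate determination.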

<B-3. Modification>
FIG. 10 is a block diagram showing the configuration of a data analysis method candidate determination device 13 according to a modification of Embodiment 2. The data analysis method candidate determination device 13 includes an attribute adding unit 9 in addition to the configuration of the data analysis method candidate determination device 12. Apart from the attribute adding unit 9, its configuration is the same as that of the data analysis method candidate determination device 12.

The attribute adding unit 9 analyzes the reasons why analysis method candidates acquired by the evaluation acquisition unit 7 were not adopted, and adds data attributes corresponding to those reasons as new data attribute items for all analysis target data whose data attributes are stored in the analysis target data storage unit 2. At this time, the attribute adding unit 9 may notify a user such as a system administrator of the added data attribute items through the output unit 6 and prompt the user to enter data attributes for them. The user may likewise be prompted to enter the distance evaluation axis used to calculate the data attribute similarity for the added items. The user can enter these data attributes or distance evaluation axes into the data analysis method candidate determination device 13 through the input unit 5. The attribute adding unit 9 is realized as a function of the processor 20 shown in FIG. 6 when the processor 20 executes a software program stored in the memory 21.

FIG. 11 is a flowchart showing the operation of the attribute adding unit 9 in the data analysis method candidate determination device 13. This flow is executed when reasons for not adopting analysis method candidates are stored in the recommended case storage unit 8.

First, recommended cases to which evaluation information has been attached are extracted from the recommended case storage unit 8 (step S81).

Next, for the analysis method candidates that were not adopted in the recommended cases extracted in step S81, the reasons for non-adoption are extracted (step S82).

Next, the reasons for non-adoption extracted in step S82 are analyzed (step S83). Frequency analysis based on keyword extraction, simple statistics, or the like can be used for this analysis.

Finally, data attribute items corresponding to the analyzed reasons for non-adoption are added as data attribute items of the analysis target data stored in the analysis target data storage unit 2 (step S84). For example, if the analysis in step S83 shows that keywords such as "execution time is long" and "processing is heavy" appear frequently among the reasons for non-adoption, items related to computational load, such as "calculation amount" and "execution time per unit amount", are added to the data attributes.
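
Steps S83 and S84 can be illustrated with a simple keyword-frequency count. This is a sketch under assumptions: the keyword-to-attribute mapping `KEYWORD_TO_ATTRIBUTES`, the substring matching, and the `min_count` threshold are all hypothetical choices, since the text only says that frequency analysis by keyword extraction or simple statistics can be used.

```python
from collections import Counter

# Hypothetical mapping from a rejection keyword to data attribute items to add.
KEYWORD_TO_ATTRIBUTES = {
    "execution time is long": ["execution time per unit amount"],
    "processing is heavy": ["calculation amount"],
}


def propose_attribute_items(reasons, min_count=2):
    """Count keyword occurrences in free-text rejection reasons and return the
    attribute items for keywords that appear at least min_count times."""
    counts = Counter()
    for reason in reasons:
        text = reason.lower()
        for keyword in KEYWORD_TO_ATTRIBUTES:
            if keyword in text:
                counts[keyword] += 1
    items = []
    for keyword, n in counts.most_common():
        if n >= min_count:
            for item in KEYWORD_TO_ATTRIBUTES[keyword]:
                if item not in items:
                    items.append(item)
    return items


# Made-up rejection reasons for illustration.
reasons = [
    "Execution time is long for large inputs",
    "execution time is long",
    "processing is heavy",
]
print(propose_attribute_items(reasons))
```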

In this way, the data analysis method candidate determination device 13 adds data attributes corresponding to the reasons why analysis method candidates were not adopted, which allows the analysis method candidate determination unit 4 to judge the data attribute similarity at a finer granularity when determining candidates. The accuracy of candidate determination can therefore be improved.

<B-4. Effect>
In addition to the configuration of the data analysis method candidate determination device 11 according to Embodiment 1, the data analysis method candidate determination device 12 according to Embodiment 2 includes an evaluation acquisition unit 7 that acquires the user's evaluation information on analysis method candidates, and a recommended case storage unit 8 that stores, as recommended cases, data associating the data attributes of the analysis target data, the analysis method candidates for the analysis target data, and the evaluation information on those candidates. Storing the results of candidate determination as recommended cases in this way makes it possible to improve the accuracy of candidate determination, for example by using recommended cases with desirable evaluation information as analysis cases.

In addition to the configuration of the data analysis method candidate determination device 12 according to Embodiment 2, the data analysis method candidate determination device 13 according to the modification of Embodiment 2 includes an attribute adding unit 9 that extracts the reasons for non-adoption of analysis method candidates from the evaluation information acquired by the evaluation acquisition unit 7 and adds items corresponding to those reasons to the data attribute items. This allows the analysis method candidate determination unit 4 to judge the data attribute similarity at a finer granularity when determining candidates, so the accuracy of candidate determination can be improved.

<C. Embodiment 3>
<C-1. Configuration>
FIG. 12 is a block diagram showing the configuration of the data analysis method candidate determination device 14 according to Embodiment 3. The data analysis method candidate determination device 14 includes a model change proposing unit 10 in addition to the configuration of the data analysis method candidate determination device 11 according to Embodiment 1.

When the analysis method candidates determined by the analysis method candidate determination unit 4 include a physical model-based analysis method, the model change proposing unit 10 proposes a change to the physical model, such as a modification or an addition. Here, a physical model-based analysis method refers to any data analysis method that uses a physical model based on data or design information, such as a device model, failure model, behavior model, correlation model, or user model. The physical model may be described in a document format such as a parameter sheet; in a diagram format such as an FTA (Fault Tree Analysis) diagram, a fault tree, or an electric circuit diagram; as a mathematical expression such as an equation of motion or a bathtub curve; or in a machine language such as assembler or source code. The model change proposing unit 10 is realized as a function of the processor 20 shown in FIG. 3 when the processor 20 executes a software program stored in the memory 21.

The analysis case storage unit 3 stores, as analysis cases, the analysis target data, the analysis purpose and data attributes of that data, and the analysis method. When the analysis method is a physical model-based analysis method, change information on the physical model is also stored as part of the analysis case. Specifically, when a user changes a physical model (by addition or modification) and then performs data analysis with the changed model, not only the changed physical model actually used for the analysis but also the physical model before the change is stored in the analysis case storage unit 3 as change information.

Apart from what is described above, the configuration of the data analysis method candidate determination device 14 is the same as that of the data analysis method candidate determination device 11 according to Embodiment 1.

<C-2. Operation>
FIG. 13 is a flowchart showing the operation of the data analysis method candidate determination device 14. Steps S11 to S15 and S16 are the same as in Embodiment 1; the difference is that a new step S18 is added between steps S15 and S16. After the analysis method candidate determination unit 4 determines the analysis method candidates for the analysis target data (step S15), the model change proposing unit 10 proposes a change to the physical model if the candidates include a physical model-based analysis method (step S18).

FIG. 14 is a flowchart showing the operation of the model change proposing unit 10 in step S18 of FIG. 13. This flow is executed only when physical model change information is stored in the analysis case storage unit 3.

First, the model change proposal unit 10 determines whether the analysis method candidates determined by the analysis method candidate determination unit 4 in step S15 of FIG. 13 include a physical model-based analysis method (step S181). If they do not, the model change proposal unit 10 ends its processing. If they do, the process proceeds to step S182.

In step S182, the model change proposal unit 10 extracts, from the analysis cases stored in the analysis case storage unit 3, those that use the same analysis method as the physical model-based analysis method included in the analysis method candidates and that contain change information on the physical model.

Next, the model change proposal unit 10 determines whether the changed physical model data indicated by the change information is stored in the analysis case storage unit 3 (step S183). If it is, the unit proposes that the user use the changed physical model (step S184). For example, suppose that when a user previously analyzed the analysis target data "boarding history of public transportation", an analysis method using passenger model A as the physical model was recommended as an analysis method candidate. If the user instead performed the analysis with passenger model B, obtained by modifying passenger model A or by adding a new passenger model, the analysis case storage unit 3 records passenger model A (the model before the change) in addition to the analysis target data, the analysis purpose, and the analysis method actually used (passenger model B). Thereafter, when the analysis method candidate determination unit 4 determines, in another data analysis, an analysis method using passenger model A as an analysis method candidate, the unit proposes that the user use passenger model B in place of passenger model A.

If the changed physical model data does not exist in the analysis case storage unit 3 in step S183, a method for changing (modifying or adding to) the physical model is proposed to the user. For example, when the analysis purpose is "analysis of product purchase status" and the analysis method candidate uses a purchaser model as the physical model, the unit proposes a method for revising the purchaser model into segments suited to the product genre to be analyzed, or for adding a purchaser model in which "a parent buys on behalf of a child".
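The flow of steps S181 to S184 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation described in the specification: the `Candidate` and `ChangeCase` records, the `propose_model_change` function, and the proposal strings are all hypothetical names introduced here, and a candidate is matched to a past case when both the analysis method and the pre-change physical model agree.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Candidate:
    method: str
    physical_model: Optional[str] = None  # None: not a physical model-based method


@dataclass
class ChangeCase:
    method: str
    model_before: str  # physical model before the user's change
    model_after: str   # changed physical model actually used in the analysis


def propose_model_change(candidates: List[Candidate],
                         cases: List[ChangeCase],
                         stored_models: Set[str]) -> List[str]:
    """Return one proposal per physical model-based candidate (hypothetical format)."""
    proposals = []
    for cand in candidates:
        # S181: skip candidates that do not use a physical model.
        if cand.physical_model is None:
            continue
        # S182: past cases with the same method and change information
        # whose pre-change model is the candidate's model.
        matching = [c for c in cases
                    if c.method == cand.method
                    and c.model_before == cand.physical_model]
        # S183: is the changed model data actually stored?
        available = [c.model_after for c in matching
                     if c.model_after in stored_models]
        if available:
            # S184: propose using the changed model.
            proposals.append(f"use {available[0]} instead of {cand.physical_model}")
        else:
            # Otherwise propose a way to modify or extend the model.
            proposals.append(f"modify or extend {cand.physical_model}")
    return proposals
```

With the passenger model example above, a candidate using passenger model A yields a proposal to use passenger model B once a past case records the change from A to B and model B is stored.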

<C-3. Effect>
In the data analysis method candidate determination device 14 according to the third embodiment, the analysis case data stored in the analysis case storage unit 3 includes, for each analysis case in which a user performed data analysis using a physical model obtained by changing another physical model, information on the physical model before the change. The data analysis method candidate determination device 14 includes a model change proposal unit 10 in addition to the configuration of the data analysis method candidate determination device 11 according to the first embodiment. The model change proposal unit 10 proposes a change to the physical model when the analysis method candidate is an analysis method that uses a physical model and that physical model is the same as the pre-change physical model of an analysis case. This makes it possible to improve the analysis accuracy of physical model-based analysis methods.

<D. Embodiment 4>
<D-1. Configuration>
FIG. 17 is a block diagram showing the configuration of the data analysis method candidate determination device 15 according to the fourth embodiment. The data analysis method candidate determination device 15 includes an existing data utilization proposal unit 101 in addition to the configuration of the data analysis method candidate determination device 11 according to the first embodiment.

When the analysis target data selected by the user (first analysis target data) lacks a data attribute required to execute the analysis method determined by the analysis method candidate determination unit 4, the existing data utilization proposal unit 101 extracts, from the past analysis target data stored in the analysis target data storage unit 2, analysis target data that has the required data attribute (second analysis target data) and proposes that the user use the second analysis target data. The existing data utilization proposal unit 101 is realized as a function of the processor 20 shown in FIG. 3 when the processor 20 executes a software program stored in the memory 21.

The analysis case storage unit 3 stores, as an analysis case, the analysis target data initially selected by the user, the analysis purpose and data attributes of that data, and the analysis method. The analysis case storage unit 3 also stores, as part of the analysis case, any analysis target data that the user additionally selected in response to a proposal from the existing data utilization proposal unit 101. The analysis target data may be stored in the analysis case storage unit 3 with a flag indicating the timing at which it was selected.

Except as described above, the configuration of the data analysis method candidate determination device 15 is the same as that of the data analysis method candidate determination device 11 according to the first embodiment.

<D-2. Operation>
FIG. 18 is a flowchart showing the operation of the data analysis method candidate determination device 15. In the flowchart of FIG. 18, steps S11 to S15 and S16 are the same as in the first embodiment; the difference is that a new step S19 is added between steps S15 and S16. Once the analysis method candidate determination unit 4 has determined the analysis method candidates for the analysis target data (step S15), the existing data utilization proposal unit 101 proposes adding analysis target data if the data attributes of the analysis target data acquired in step S13 are insufficient for executing those candidates (step S19).

FIG. 19 is a flowchart showing the operation of the existing data utilization proposal unit 101 in step S19 of FIG. 18.

First, the existing data utilization proposal unit 101 determines whether the analysis target data selected in step S11 of FIG. 18 (first analysis target data) has the data attributes required to execute the analysis method candidate determined in step S15 (step S191). Three cases are given here as examples in which the analysis target data lacks a required data attribute. The first is a case where the analysis target data itself is missing. The second is a case where the acquisition interval of the analysis target data is coarser than the acquisition interval specified as a required data attribute, so that a sufficient analysis result cannot be obtained. The third is a case where the acquisition method of the analysis target data does not conform to the acquisition method specified as a required data attribute, so that a sufficient analysis result cannot be obtained. For example, the third case applies when data measured directly by a sensor or the like is required but the analysis target data consists of processed values.

If the analysis target data (first analysis target data) has the data attributes required to execute the analysis method candidate, the existing data utilization proposal unit 101 ends its processing. If it does not, the existing data utilization proposal unit 101 proceeds to step S192.
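The three example cases checked in step S191 can be illustrated with a short sketch. The representation is an assumption made for illustration only: the `DataAttributes` and `Requirement` records, the interval in seconds, and the value-kind labels "measured", "predicted", and "processed" are hypothetical, and the specification does not prescribe how attributes are encoded.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataAttributes:
    interval_s: float  # data acquisition interval in seconds
    value_kind: str    # "measured", "predicted", or "processed" (assumed labels)


@dataclass
class Requirement:
    max_interval_s: float     # coarsest interval the method tolerates
    allowed_kinds: frozenset  # acquisition methods the method accepts


def missing_attribute_reasons(data: Optional[DataAttributes],
                              req: Requirement) -> List[str]:
    """Return the reasons the data is insufficient; an empty list means sufficient."""
    # Case 1: the analysis target data itself is missing.
    if data is None:
        return ["data missing"]
    reasons = []
    # Case 2: the acquisition interval is coarser than required.
    if data.interval_s > req.max_interval_s:
        reasons.append("interval too coarse")
    # Case 3: the acquisition method does not conform to the requirement.
    if data.value_kind not in req.allowed_kinds:
        reasons.append("acquisition method mismatch")
    return reasons
```

An empty result corresponds to ending the processing; a non-empty result corresponds to proceeding to step S192.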

In step S192, the existing data utilization proposal unit 101 extracts, from the analysis cases stored in the analysis case storage unit 3, those that use an analysis method identical to, or including, the analysis method candidate and whose analysis purpose is the same or similar.

Next, the existing data utilization proposal unit 101 compares the data attributes of the analyzed data in the extracted analysis cases with the data attributes of the analysis target data currently selected by the user, and extracts from the data attributes of the analyzed data those required to execute the analysis method candidate (step S193). At this point, data attributes of restricted data may be excluded from extraction: for example, data for which an access right is set as a data attribute and the user does not hold that right, or data for which usage conditions are set as a data attribute and reuse is restricted by a contract with the data source. Alternatively, in such cases only the data attribute may be presented, annotated with the restriction on access rights or data reuse.

Then, if analysis target data having the data attributes extracted in step S193 exists in the analysis target data storage unit 2, the existing data utilization proposal unit 101 proposes that the user use that data (second analysis target data), that is, perform the analysis after adding the second analysis target data to the currently selected analysis target data (first analysis target data) (step S194). For example, suppose that a user analyzes the analysis target data "power consumption of ordinary households in D-chome, C Town, B City, A Prefecture" together with the additional analysis target data "weekday/holiday classification for the analysis period", that the "k-means method" is presented as an analysis method candidate, and that the user decides to use it. Suppose further that the analysis case storage unit 3 contains a case in which another user applied the "k-means method" to the analysis target data "power consumption of a building" together with the additional analysis target data "weekday/holiday classification for the analysis period", "weather observation data for the analysis period", and "building entry/exit history of employees for the analysis period". The analysis target data "building entry/exit history of employees for the analysis period" is assumed to carry a data attribute indicating that secondary use of the data is not permitted. In that case, in step S194 the existing data utilization proposal unit 101 may propose that the user additionally use the analysis target data "weather observation data for the analysis period". The existing data utilization proposal unit 101 may also inform the user that additional use of both "weather observation data for the analysis period" and "building entry/exit history of employees for the analysis period" is desirable, but that the latter carries a data attribute indicating that secondary use is not permitted.
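The building power consumption example can be traced with a simplified sketch of steps S192 to S194. Several assumptions are made here and are not taken from the specification: past cases are plain dictionaries with hypothetical keys `method` and `added`, matching is done on dataset names rather than on data attributes, the analysis purpose similarity check of step S192 is omitted, and the data is assumed to exist in the analysis target data storage unit 2.

```python
from typing import Dict, List, Set, Tuple


def propose_additional_data(current: Set[str],
                            candidate_method: str,
                            cases: List[Dict],
                            restricted: Set[str]) -> Tuple[List[str], List[str]]:
    """Split additional data used in past cases into usable and restricted lists."""
    usable: List[str] = []
    blocked: List[str] = []
    for case in cases:
        # S192 (simplified): keep only cases that used the candidate method.
        if case["method"] != candidate_method:
            continue
        # S193: find data the past case added that the user has not selected.
        for name in case["added"]:
            if name in current or name in usable or name in blocked:
                continue
            # Restricted data is not proposed but is reported with its restriction.
            if name in restricted:
                blocked.append(name)
            else:
                usable.append(name)
    # S194: 'usable' is proposed; 'blocked' may be shown as desirable but unavailable.
    return usable, blocked
```

For the example above, the weather observation data ends up in the usable list and the entry/exit history in the restricted list, which matches the two kinds of presentation described for step S194.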

The above description gave three example cases in which the analysis target data lacks a data attribute required to apply the analysis method candidate, and explained that adding analysis target data is proposed in such cases. However, even when the analysis target data has the data attributes required to apply the analysis method candidate, adding analysis target data may be proposed in the following cases. The first is a case where the selected analysis target data has the required data attributes but is selected under conditions that do not yield the best result. The second is a case where analysis is possible with the currently selected analysis target data, but a more accurate analysis result can be obtained by adding new analysis target data.

<D-3. Effect>
The data analysis method candidate determination device 15 according to the fourth embodiment includes the existing data utilization proposal unit 101, which proposes the use of second analysis target data having the required data attributes when the first analysis target data lacks a data attribute required by the analysis method that the analysis method candidate determination unit 4 determined for it. By proposing the addition of other analysis target data that has the data attributes required to execute the analysis method candidate, the device makes it possible to improve the analysis accuracy obtained when the analysis method candidate is executed.

In addition, the second analysis target data has a data attribute indicating whether the data may be reused, and when proposing the use of the second analysis target data to the user, the existing data utilization proposal unit 101 provides the user with information on whether the analyzed data may be reused. Accordingly, when the second analysis target data proposed by the existing data utilization proposal unit 101 cannot be reused, the user can consider obtaining alternative data that can be reused; adding such alternative data makes it possible to improve the analysis accuracy obtained when the analysis method candidate is executed.

<E. Embodiment 5>
<E-1. Configuration>
FIG. 20 is a block diagram showing the configuration of the data analysis method candidate determination device 16 according to the fifth embodiment. The data analysis method candidate determination device 16 includes an analysis method review proposal unit 102 in addition to the configuration of the data analysis method candidate determination device 11 according to the first embodiment.

When a case with the same or a similar analysis purpose is added to the analysis cases stored in the analysis case storage unit 3, the analysis method review proposal unit 102 calculates the adoption rate of each analysis method and, if it detects an analysis method whose adoption rate satisfies a preset analysis method review condition, proposes a change of analysis method to the user. The analysis method review proposal unit 102 is realized as a function of the processor 20 shown in FIG. 3 when the processor 20 executes a software program stored in the memory 21.

It is desirable that the analysis case storage unit 3 store, together with each analysis case, information on the user who registered or updated the case, information on the contact person for inquiries about the case, information on the developer or provider of the analysis method, the current utilization status of the case, and the like. The current utilization status of an analysis case may include usage states such as applied to a product, under trial, or discontinued, as well as indications such as that the case is an external case.

Except as described above, the configuration of the data analysis method candidate determination device 16 is the same as that of the data analysis method candidate determination device 11 according to the first embodiment.

<E-2. Operation>
FIG. 21 is a flowchart showing the operation of the data analysis method candidate determination device 16. Steps S11 to S16 are the same as in the first embodiment; the difference is that a new step S20 is added after step S16. When the analysis method candidate determination unit 4 has determined the analysis method candidates for the analysis target data (step S15) and presented them to the user (step S16), the analysis purpose and the average similarity for each analysis method are passed to the analysis method review proposal unit 102, which determines whether a review of the analysis method should be proposed for the past analysis cases stored in the analysis case storage unit 3 (step S20).

FIG. 22 is a flowchart showing the operation of the analysis method review proposal unit 102 in step S20 of FIG. 21.

First, the analysis method review proposal unit 102 receives the analysis purpose and the average similarity for each analysis method calculated by the analysis method candidate determination unit 4 in step S15 of FIG. 21 (step S201). It then determines whether an analysis method has reached the review criterion (step S202). The review criterion is, for example, that the average similarity exceeds a threshold or is at or below a threshold. Alternatively, the analysis method review proposal unit 102 may retain the reception history of the average similarity for each analysis method for a fixed period, a fixed number of receptions, or the like, and judge that the review criterion has been reached when the reception rate of an analysis method exceeds a threshold, or when the relationship between reception date and average similarity shows an increasing or decreasing trend for a fixed period or longer. If no analysis method has reached the review criterion, the analysis method review proposal unit 102 ends its processing. If an analysis method has reached the review criterion, the analysis method review proposal unit 102 proceeds to step S203.
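As a rough illustration of the judgment in step S202, the sketch below combines the threshold criterion with a simple trend check. The function name, the parameters `upper`, `lower`, and `trend_len`, and the choice to treat a strictly monotonic run of the most recent values as a "trend" are all assumptions made here; the specification does not say how the trend is measured, and the reception rate criterion is not covered.

```python
from typing import List


def reached_review_criterion(history: List[float],
                             upper: float,
                             lower: float,
                             trend_len: int = 3) -> bool:
    """history: average similarities for one analysis method, oldest first.

    trend_len is the number of most recent values examined (assumed >= 2).
    """
    if not history:
        return False
    latest = history[-1]
    # Threshold criterion: above the upper threshold, or at or below the lower one.
    if latest > upper or latest <= lower:
        return True
    # Trend criterion (assumed form): the last trend_len values rise or fall strictly.
    recent = history[-trend_len:]
    if len(recent) < trend_len:
        return False
    diffs = [b - a for a, b in zip(recent, recent[1:])]
    return all(d > 0 for d in diffs) or all(d < 0 for d in diffs)
```

A production version would more likely fit a slope over timestamps than compare consecutive values; the consecutive comparison is used here only to keep the example short.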

In step S203, the analysis method review proposal unit 102 extracts from the analysis case storage unit 3 the past analysis cases whose analysis purpose is the same as or similar to the analysis purpose received in step S201. The number of extracted cases may be limited, for example by extracting the N most recently registered or updated cases (for example, N = 1000). The extraction period may also be limited, for example by extracting only the analysis cases registered or updated within the most recent N years (for example, N = 5).
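The two optional limits on extraction in step S203 can be sketched as below. The tuple layout `(case_id, updated_at)` and the function and parameter names are hypothetical, and a year is approximated as 365 days for simplicity.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Case = Tuple[int, datetime]  # (case_id, registration or update time); assumed layout


def limit_cases(cases: List[Case],
                now: datetime,
                max_count: Optional[int] = None,
                max_years: Optional[int] = None) -> List[Case]:
    """Return cases newest first, limited by age and/or count."""
    selected = sorted(cases, key=lambda c: c[1], reverse=True)
    if max_years is not None:
        # Keep only cases registered or updated within the last max_years years.
        cutoff = now - timedelta(days=365 * max_years)
        selected = [c for c in selected if c[1] >= cutoff]
    if max_count is not None:
        # Keep only the max_count most recent cases.
        selected = selected[:max_count]
    return selected
```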

Next, the adoption rate of each analysis method used in the extracted analysis cases is calculated (step S204). The adoption rate P can be calculated, for example, as P = Nx/N, where N is the number of extracted cases and Nx is the number of those cases that adopt method X. If the current utilization status of the analysis cases is stored in the analysis case storage unit 3, the analysis cases may be weighted according to that status: a larger weight is given to analysis cases that have been applied to a product, and a smaller weight to analysis cases whose commercialization has been discontinued. Alternatively, the analysis cases may be weighted according to their registration or update date: the more recent the registration or update date, the larger the weight, and the older the date, the smaller the weight.
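A minimal sketch of the adoption rate calculation in step S204 follows. With no weights it computes P = Nx/N directly; with weights it uses the weighted share of each method, which is one plausible reading of the weighting described above rather than a formula given in the specification. The status labels and weight values in the usage note are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def adoption_rates(cases: List[Tuple[str, str]],
                   status_weight: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """cases: (analysis method, utilization status) pairs for the extracted cases.

    Without status_weight every case counts as 1, so the result is Nx / N.
    With status_weight the result is the weighted share of each method (assumed form).
    """
    totals: Dict[str, float] = defaultdict(float)
    for method, status in cases:
        weight = 1.0 if status_weight is None else status_weight.get(status, 1.0)
        totals[method] += weight
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {method: total / grand_total for method, total in totals.items()}
```

For example, with two k-means cases and two cases of another method, the unweighted rates are both 0.5; giving "applied to product" a weight of 2.0 and "discontinued" a weight of 0.5 shifts the rate toward whichever method has the product-applied cases.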

Next, if there is an analysis method whose adoption rate meets the analysis method review condition, the analysis method review proposal unit 102 proposes a review of the relevant analysis cases (step S205). For example, when the adoption rate of the K-means method among clustering methods exceeds a threshold, the unit proposes to the registering or updating user, the contact person, and the developer or provider of the analysis method (hereinafter simply "users and the like") of analysis cases that do not use the K-means method that the analysis method be changed to the K-means method. Conversely, when the adoption rate of the K-means method among clustering methods falls below a reference value, the unit proposes to the users and the like of analysis cases that use the K-means method that the analysis method be changed to a method other than the K-means method. In this case, the users and the like may be presented with a list of analysis methods in descending order of adoption rate, together with the adoption rates.
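The first variant of step S205 (adoption rate above a threshold) can be sketched as follows, including the ranked list that may be presented alongside the proposal. The case tuple layout `(case_id, method, contact)`, the function name, and the `upper` parameter are assumptions made for illustration; the below-reference-value variant is not shown.

```python
from typing import Dict, List, Tuple

Case = Tuple[int, str, str]  # (case_id, analysis method used, contact); assumed layout


def review_proposals(rates: Dict[str, float],
                     cases: List[Case],
                     upper: float) -> Tuple[List[Tuple[str, float]],
                                            List[Tuple[str, int, str]]]:
    """Return (ranking, proposals).

    ranking: methods with adoption rates, highest rate first.
    proposals: (contact, case_id, suggested method) for each case that does not
    use a method whose adoption rate exceeds the threshold.
    """
    ranking = sorted(rates.items(), key=lambda item: item[1], reverse=True)
    proposals = []
    for method, rate in ranking:
        if rate <= upper:
            continue
        for case_id, used_method, contact in cases:
            if used_method != method:
                proposals.append((contact, case_id, method))
    return ranking, proposals
```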

<E-3. Effect>
The data analysis method candidate determination device 16 according to the fifth embodiment includes the analysis method review proposal unit 102, which proposes a review of the analysis method for analysis cases whose analysis purpose is the same as or similar to that of the analysis target data for which the analysis method candidate determination unit 4 determined an analysis method. By calculating the adoption rate of each analysis method in past analysis cases and proposing a review of the analysis method based on that rate, the device can propose new analysis method candidates and the like for past analysis cases as well, making it possible to improve the analysis accuracy obtained when an analysis method is executed.

Within the scope of the invention, the embodiments may be freely combined, and each embodiment may be modified or omitted as appropriate.

Although the present invention has been described in detail, the above description is illustrative in all aspects, and the present invention is not limited to it. It is understood that countless variations not illustrated here can be envisaged without departing from the scope of the present invention.

2 analysis target data storage unit, 3 analysis case storage unit, 4 analysis method candidate determination unit, 5 input unit, 6 output unit, 7 evaluation acquisition unit, 8 recommended case storage unit, 9 attribute addition unit, 10 model change proposal unit, 11, 12, 13, 14, 15, 16 data analysis method candidate determination device, 20 processor, 21 memory, 22 recording medium, 101 existing data utilization proposal unit, 102 analysis method review proposal unit.

Claims (13)

A data analysis method candidate determination device that determines analysis method candidates for analysis target data on which data analysis is to be performed, comprising:
an analysis case storage unit (3) that stores, as analysis cases, data associating a data attribute and an analysis method with each of a plurality of analyzed data on which data analysis was performed in the past;
an analysis target data storage unit (2) that stores data attribute information for the analysis target data; and
an analysis method candidate determination unit (4) that calculates a data attribute similarity, which is the similarity between the data attribute of the analysis target data and the data attribute of the analyzed data, and determines, based on the data attribute similarity, at least one of the analysis methods of the analyzed data as an analysis method candidate for the analysis target data.
The data analysis method candidate determination device according to claim 1, wherein
the analysis case storage unit (3) stores analysis purpose information for each of the plurality of analyzed data,
the analysis target data storage unit (2) stores analysis purpose information for the analysis target data, and
the analysis method candidate determination unit (4) calculates the similarity between the analysis purpose of the analysis target data and the analysis purpose of the analyzed data as an analysis purpose similarity, calculates a total similarity between the analysis target data and the analyzed data based on the analysis purpose similarity and the data attribute similarity, and determines, based on the total similarity, at least one of the analysis methods of the analyzed data as an analysis method candidate for the analysis target data.
The data analysis method candidate determination device according to claim 1 or 2, wherein
the data attributes of the analyzed data and of the analysis target data include at least one of a data acquisition interval, a data acquisition method, and a distinction among actual values, predicted values, and processed values.
The analysis method candidate determination unit (4) calculates the analysis purpose similarity based on a character string of the analysis purpose of the analysis target data and a character string of the analysis purpose of the analyzed data,
The data analysis method candidate determination device according to claim 2.
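A character-string comparison of this kind could, for example, use word overlap. The sketch below uses the Jaccard index on lower-cased word tokens; this particular measure and the function name `purpose_string_similarity` are assumptions, since the claim only requires that the similarity be based on the two character strings.

```python
def purpose_string_similarity(target_purpose: str, analyzed_purpose: str) -> float:
    """Jaccard index over word tokens (an assumed measure, not the claimed one)."""
    a = set(target_purpose.lower().split())
    b = set(analyzed_purpose.lower().split())
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Two of the four distinct words are shared.
print(purpose_string_similarity("predict power demand", "predict water demand"))
```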
The analysis method candidate determination unit (4) calculates the analysis purpose similarity based on the analysis purpose of the analysis target data described in a hierarchical structure and the analysis purpose of the analyzed data described in a hierarchical structure,
The data analysis method candidate determination device according to claim 2.
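One way to picture a hierarchical comparison is to treat each analysis purpose as a path from the root of the hierarchy to a leaf and to score how much of the path is shared. The path representation, the shared-prefix measure, and the name `hierarchical_purpose_similarity` are assumptions for illustration; the claim does not fix a scoring rule.

```python
def hierarchical_purpose_similarity(target_path, analyzed_path) -> float:
    """Length of the shared leading path divided by the longer path length."""
    longest = max(len(target_path), len(analyzed_path))
    if longest == 0:
        return 0.0
    shared = 0
    for t, a in zip(target_path, analyzed_path):
        if t != a:
            break
        shared += 1
    return shared / longest


# Hypothetical purpose hierarchies that diverge only at the lowest level.
target_path = ["maintenance", "failure prediction", "motor"]
analyzed_path = ["maintenance", "failure prediction", "pump"]
print(hierarchical_purpose_similarity(target_path, analyzed_path))
```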
When the analysis purpose of the analysis target data and the analysis purpose of the analyzed data are described in source code or intermediate code,
The analysis method candidate determination unit (4) calculates, as the analysis purpose similarity, the similarity between the processing procedure indicated in the source code or the intermediate code of the analysis purpose of the analysis target data and the processing procedure indicated in the source code or the intermediate code of the analysis purpose of the analyzed data, based on a match rate or on the continuity of matching processing procedures. The data analysis method candidate determination device according to claim 2.
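The two criteria named in this claim, match rate and continuity of matching processing procedures, could be sketched as follows. The sketch assumes each processing procedure has already been extracted from the code as an ordered list of step names; that extraction, the exact formulas, and the names `match_rate` and `continuity` are assumptions rather than the claimed method.

```python
from difflib import SequenceMatcher


def match_rate(target_steps, analyzed_steps) -> float:
    """Fraction of the target's steps that also occur in the analyzed case."""
    if not target_steps:
        return 0.0
    analyzed = set(analyzed_steps)
    return sum(1 for s in target_steps if s in analyzed) / len(target_steps)


def continuity(target_steps, analyzed_steps) -> float:
    """Longest run of consecutive matching steps, relative to the longer procedure."""
    longest = max(len(target_steps), len(analyzed_steps))
    if longest == 0:
        return 0.0
    matcher = SequenceMatcher(None, target_steps, analyzed_steps, autojunk=False)
    run = matcher.find_longest_match(0, len(target_steps), 0, len(analyzed_steps))
    return run.size / longest


# Hypothetical step lists: three steps match, but only two are consecutive.
target_steps = ["load", "filter", "fft", "threshold"]
analyzed_steps = ["load", "filter", "regress", "threshold"]
print(match_rate(target_steps, analyzed_steps), continuity(target_steps, analyzed_steps))
```

The match rate ignores order, while the continuity score rewards procedures that share an unbroken sequence of steps, so the two can rank the same pair of cases differently.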
The analysis method candidate determination unit (4) calculates, for each analysis method, the average value of the total similarity between the analysis target data and the analyzed data that used that analysis method, and determines an analysis method selected based on the average value of the total similarity as the analysis method candidate,
The data analysis method candidate determination device according to any one of claims 2 and 4 to 6.
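The per-method averaging can be sketched as below. The input format (pairs of analysis method name and total similarity), the descending ranking, and the name `rank_methods_by_average` are assumptions for this illustration; the claim only states that the candidate is selected based on the average.

```python
from collections import defaultdict


def rank_methods_by_average(cases):
    """Group total similarities by analysis method and rank by their mean.

    `cases` is an iterable of (method_name, total_similarity) pairs, one per
    analyzed data set. Returns (method_name, average) pairs, highest first.
    """
    grouped = defaultdict(list)
    for method, similarity in cases:
        grouped[method].append(similarity)
    averages = {m: sum(v) / len(v) for m, v in grouped.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical analysis cases.
cases = [("regression", 0.9), ("regression", 0.5), ("clustering", 0.8)]
ranking = rank_methods_by_average(cases)
print(ranking[0][0])  # method with the highest average total similarity
```

Averaging keeps a method with one very similar case and many dissimilar ones from outranking a method whose cases are consistently similar.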
An evaluation acquisition unit (7) that acquires a user's evaluation information on the analysis method candidate;
A recommended case storage unit (8) that stores, as a recommended case, data linking the data attributes of the analysis target data, the analysis method candidate for the analysis target data, and the evaluation information on the analysis method candidate;
Further comprising the above,
The data analysis method candidate determination device according to any one of claims 1 to 7.
Further comprising an attribute adding unit (9) that extracts a reason for non-adoption of the analysis method candidate from the evaluation information acquired by the evaluation acquisition unit (7) and adds an item corresponding to the reason for non-adoption to the items of the data attributes,
The data analysis method candidate determination device according to claim 8.
The analysis case storage unit (3) stores, for an analysis case in which a user performed data analysis using a physical model obtained by modifying a certain physical model, information on the physical model before the modification,
Further comprising a model change proposal unit (10) that proposes a change to the physical model when the analysis method candidate is an analysis method that uses a physical model and the physical model used in the analysis method candidate is identical to the physical model before the modification in the analysis case,
The data analysis method candidate determination device according to any one of claims 1 to 9.
Further comprising an existing data utilization proposal unit (101) that, when first analysis target data among the analysis target data does not have a data attribute required by the analysis method that the analysis method candidate determination unit (4) determined for the first analysis target data, proposes to the user the use of second analysis target data, among the analysis target data, that has the required data attribute,
The data analysis method candidate determination device according to any one of claims 1 to 10.
The second analysis target data has a data attribute indicating whether the data may be reused,
The existing data utilization proposal unit (101), when proposing the use of the second analysis target data to the user, provides the user with information on whether the second analysis target data may be reused,
The data analysis method candidate determination device according to claim 11.
Further comprising an analysis method review proposal unit (102) that proposes a review of the analysis method for an analysis case whose analysis purpose is the same as or similar to that of the analysis target data for which the analysis method candidate determination unit (4) has determined an analysis method candidate,
The data analysis method candidate determination device according to any one of claims 1 to 12.
PCT/JP2017/001371 2016-03-28 2017-01-17 Device for determining data analysis method candidate Ceased WO2017168967A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201780007854.4A CN108885628A (en) 2016-03-28 2017-01-17 Data analysis method candidate determination device
JP2018508418A JP6472573B2 (en) 2016-03-28 2017-01-17 Data analysis method candidate decision device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2016063215 2016-03-28
JP2016-063215 2016-03-28

Publications (1)

Publication Number Publication Date
WO2017168967A1 (en) 2017-10-05

Family

ID=59964054

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2017/001371 Ceased WO2017168967A1 (en) 2016-03-28 2017-01-17 Device for determining data analysis method candidate

Country Status (3)

Country Link
JP (1) JP6472573B2 (en)
CN (1) CN108885628A (en)
WO (1) WO2017168967A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019123732A1 (en) * 2017-12-18 2019-06-27 株式会社日立製作所 Analysis support method, analysis support server, and storage medium
JP6567229B1 (en) * 2018-03-30 2019-08-28 三菱電機株式会社 Learning processing device, data analysis device, analysis method selection method, and analysis method selection program
JP2021189522A (en) * 2020-05-26 2021-12-13 株式会社日立製作所 Analysis method retrieval device and analysis method retrieval method
JP2022028611A (en) * 2020-07-21 2022-02-16 日本電気株式会社 Method, apparatus, device, and storage medium used for information processing
JPWO2022176014A1 (en) * 2021-02-16 2022-08-25
JP7369320B1 (en) 2023-07-14 2023-10-25 コリニア株式会社 Information processing device, method, program, and system
JP7540808B1 (en) * 2024-06-28 2024-08-27 株式会社フェズ Analysis support system, analysis support method, and analysis support program
WO2025027711A1 (en) * 2023-07-28 2025-02-06 三菱電機株式会社 Data processing method distribution device, data processing method distribution method, and data processing method distribution program

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111080128A (en) * 2019-12-17 2020-04-28 内蒙古电力(集团)有限责任公司内蒙古电力科学研究院分公司 Big data analysis and reliability evaluation management system for thermal power station metal equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05204991A (en) * 1992-01-30 1993-08-13 Hitachi Ltd Time-series data search method and search system using the same
JPH07198789A (en) * 1993-12-28 1995-08-01 Mitsubishi Denki Semiconductor Software Kk Characteristic analyzing apparatus and characteristic analyzing method used in the characteristic analyzing apparatus
JPH11161498A (en) * 1997-11-26 1999-06-18 Hitachi Ltd Knowledge information analysis method, knowledge information processing system, and storage medium
JP2005157896A (en) * 2003-11-27 2005-06-16 Mitsubishi Electric Corp Data analysis support system
JP2010205218A (en) * 2009-03-06 2010-09-16 Dainippon Printing Co Ltd Data analysis support device, data analysis support system, data analysis support method, and program
JP2014202718A (en) * 2013-04-09 2014-10-27 株式会社日立ハイテクノロジーズ Chromatograph data processing apparatus, method using the same, liquid chromatograph apparatus, and program
JP2016029516A (en) * 2014-07-25 2016-03-03 株式会社日立製作所 Data analysis method and data analysis system

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014034557A1 (en) * 2012-08-31 2014-03-06 日本電気株式会社 Text mining device, text mining method, and computer-readable recording medium
JP6229665B2 (en) * 2013-01-11 2017-11-15 日本電気株式会社 Text mining device, text mining system, text mining method and program
US9576263B2 (en) * 2013-09-19 2017-02-21 Oracle International Corporation Contextualized report building
WO2015049797A1 (en) * 2013-10-04 2015-04-09 株式会社日立製作所 Data management method, data management device and storage medium
US20150170067A1 (en) * 2013-12-17 2015-06-18 International Business Machines Corporation Determining analysis recommendations based on data analysis context
CN106469202A (en) * 2016-08-31 2017-03-01 杭州探索文化传媒有限公司 A kind of data analysing method of video display big data platform

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HIDEYUKI UESUGI ET AL.: "Jinbun Kagaku to Computer Symposium Ronbunshu Tsunagaru Digital Archive - Bun'ya Soshiki-Chiiki o Koete", DEVELOPMENT OF THE DATABASE FOR PICTURES OF THE CHARACTERS ON THE MONUMENTS, vol. 2012, no. 7, 10 November 2012 (2012-11-10), pages 179 - 184 *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20200020932A (en) * 2017-12-18 2020-02-26 가부시키가이샤 히타치세이사쿠쇼 Analysis Support Methods, Analysis Support Servers and Storage Media
JP2019109676A (en) * 2017-12-18 2019-07-04 株式会社日立製作所 Analysis support method, analysis support server and storage media
WO2019123732A1 (en) * 2017-12-18 2019-06-27 株式会社日立製作所 Analysis support method, analysis support server, and storage medium
KR102309094B1 (en) 2017-12-18 2021-10-06 가부시키가이샤 히타치세이사쿠쇼 Analysis support method, analysis support server and storage medium
CN111971664B (en) * 2018-03-30 2022-03-15 三菱电机株式会社 Learning processing device, data analysis device, analysis pattern selection method, and analysis pattern selection program
CN111971664A (en) * 2018-03-30 2020-11-20 三菱电机株式会社 Learning processing device, data analysis device, analysis pattern selection method, and analysis pattern selection program
US11042786B2 (en) 2018-03-30 2021-06-22 Mitsubishi Electric Corporation Learning processing device, data analysis device, analytical procedure selection method, and recording medium
WO2019187012A1 (en) * 2018-03-30 2019-10-03 三菱電機株式会社 Learning device, data analysis device, analytical procedure selection method, and analytical procedure selection program
JP6567229B1 (en) * 2018-03-30 2019-08-28 三菱電機株式会社 Learning processing device, data analysis device, analysis method selection method, and analysis method selection program
JP2021189522A (en) * 2020-05-26 2021-12-13 株式会社日立製作所 Analysis method retrieval device and analysis method retrieval method
JP7502081B2 (en) 2020-05-26 2024-06-18 株式会社日立製作所 Analysis technique search device and analysis technique search method
JP7173234B2 (en) 2020-07-21 2022-11-16 日本電気株式会社 Methods, apparatus, devices and storage media used for information processing
JP2022028611A (en) * 2020-07-21 2022-02-16 日本電気株式会社 Method, apparatus, device, and storage medium used for information processing
WO2022176014A1 (en) * 2021-02-16 2022-08-25 日本電信電話株式会社 Data analysis method selection device, method, and program
JPWO2022176014A1 (en) * 2021-02-16 2022-08-25
JP7469730B2 (en) 2021-02-16 2024-04-17 日本電信電話株式会社 Data analysis method selection device, method and program
JP7369320B1 (en) 2023-07-14 2023-10-25 コリニア株式会社 Information processing device, method, program, and system
JP2025012800A (en) * 2023-07-14 2025-01-24 コリニア株式会社 Information processing device, method, program, and system
WO2025027711A1 (en) * 2023-07-28 2025-02-06 三菱電機株式会社 Data processing method distribution device, data processing method distribution method, and data processing method distribution program
JP7657382B1 (en) * 2023-07-28 2025-04-04 三菱電機株式会社 Data processing system distribution device, data processing system distribution method, and data processing system distribution program
JP7540808B1 (en) * 2024-06-28 2024-08-27 株式会社フェズ Analysis support system, analysis support method, and analysis support program

Also Published As

Publication number Publication date
JP6472573B2 (en) 2019-02-20
JPWO2017168967A1 (en) 2018-07-19
CN108885628A (en) 2018-11-23

Legal Events

Date Code Title Description
ENP Entry into the national phase. Ref document number: 2018508418; Country of ref document: JP; Kind code of ref document: A
NENP Non-entry into the national phase. Ref country code: DE
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 17773517; Country of ref document: EP; Kind code of ref document: A1
122 EP: PCT application non-entry in European phase. Ref document number: 17773517; Country of ref document: EP; Kind code of ref document: A1