WO2017168967A1

WO2017168967A1 - Device for determining data analysis method candidate

Info

Publication number: WO2017168967A1
Application number: PCT/JP2017/001371
Authority: WO
Inventors: 敦子青木; 坂上　聡子; 岩田　雅史
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2016-03-28
Filing date: 2017-01-17
Publication date: 2017-10-05
Anticipated expiration: 2018-09-28
Also published as: JP6472573B2; JPWO2017168967A1; CN108885628A

Abstract

The purpose of the present invention is to automatically recommend an analysis algorithm for analysis target data regardless of whether a source code or an intermediate code is present. A device for determining a data analysis method candidate according to the present invention determines an analysis method candidate for analysis target data to be subjected to data analysis, and includes: an analysis example storing unit 3 that stores, as an analysis example, data in which a data attribute and an analysis method are associated with each other, for each of a plurality of previously analyzed data items; an analysis target data storing unit 2 that stores data attribute information about the analysis target data; and an analysis method candidate determining unit 4 that calculates a data attribute similarity degree as the degree of similarity between the data attribute of the analysis target data and the data attribute of each of the previously analyzed data items, and determines, as an analysis method candidate for the analysis target data, at least one analysis method among analysis methods used for the previously analyzed data items on the basis of data attribute similarity degree.

Description

Data analysis method candidate determination device

The present invention relates to a technique for determining data analysis method candidates.

To analyze data, an appropriate data analysis method must be selected according to the characteristics and meaning of the data. At present, data analysis methods are recommended by data scientists, that is, engineers who specialize in such methods. In recent years, the growing number of devices connected to the Internet has caused explosive growth in the data collected via the Internet, and the need for data analysis engineers who can analyze these data is increasing. However, the training of data analysis engineers has not kept pace, and a large amount of data has been collected but not put to effective use.

Solving the shortage of data analysis engineers requires a mechanism that recommends data analysis methods mechanically. As a technique in a related field, Patent Document 1 discloses a software analysis device that, based on the development and change history of past software products, selects the software components that should be reused or changed together when a derivative product is developed. In the software analysis device of Patent Document 1, when the user selects a software component implemented as source code, the device extracts and presents the software components considered to be used together with it, based on the distance between software components.

Patent Document 2 discloses an information processing device that recommends source code. The information processing device of Patent Document 2 converts the source code of a program under development into intermediate code, extracts similar intermediate code from the intermediate code stored in a database, and recommends the source code of the similar intermediate code.

Patent Document 1: JP 2010-113449 A
Patent Document 2: JP 2013-3664 A

However, the technique of Patent Document 1 cannot be used unless software components implemented as source code exist. In addition, because the software components to be reused are selected using only the distance between software components, reusable software components cannot be selected using clues such as the similarity of the analysis target data.

In Patent Document 2, although the language of the source code does not matter, source code cannot be recommended unless intermediate code generated from a source-coded program exists.

In view of the above problems, an object of the present invention is to determine analysis method candidates for analysis target data regardless of whether source code or intermediate code exists.

A data analysis method candidate determination device according to the present invention determines an analysis method candidate for analysis target data to be subjected to data analysis, and includes: an analysis case storage unit that stores, as analysis cases, data in which a data attribute and an analysis method are associated with each other for each of a plurality of analyzed data items on which data analysis was performed in the past; an analysis target data storage unit that stores data attribute information about the analysis target data; and an analysis method candidate determination unit that calculates a data attribute similarity, which is the degree of similarity between the data attribute of the analysis target data and the data attribute of the analyzed data, and determines, based on the data attribute similarity, at least one of the analysis methods of the analyzed data as an analysis method candidate for the analysis target data. Because the analysis method candidate is determined based on the data attribute similarity, it can be determined even when no source code exists for the analysis methods.

The objects, features, aspects, and advantages of the present invention will become more apparent from the following detailed description and the accompanying drawings.

FIG. 1 is a block diagram showing the configuration of the data analysis method candidate determination device according to Embodiment 1.
FIG. 2 is a diagram illustrating data attributes.
FIG. 3 is a diagram showing the hardware configuration of the data analysis method candidate determination device according to Embodiment 1.
FIG. 4 is a flowchart showing the operation of the data analysis method candidate determination device according to Embodiment 1.
FIG. 5 is a flowchart showing the processing in step S15 of FIG. 4.
FIG. 6 is a diagram showing a setting example of distance evaluation axes.
FIG. 7 is a block diagram showing the configuration of the data analysis method candidate determination device according to Embodiment 2.
FIG. 8 is a flowchart showing the operation of the data analysis method candidate determination device according to Embodiment 2.
FIG. 9 is a flowchart showing the operation of the evaluation acquisition unit.
FIG. 10 is a block diagram showing the configuration of the data analysis method candidate determination device according to a modification of Embodiment 2.
FIG. 11 is a flowchart showing the operation of the data analysis method candidate determination device according to the modification of Embodiment 2.
FIG. 12 is a block diagram showing the configuration of the data analysis method candidate determination device according to Embodiment 3.
FIG. 13 is a flowchart showing the operation of the data analysis method candidate determination device according to Embodiment 3.
FIG. 14 is a flowchart showing the operation of the data analysis method candidate determination device according to Embodiment 3.
FIG. 15 is a diagram showing function flowchart A.
FIG. 16 is a diagram showing function flowchart B.
FIG. 17 is a block diagram showing the configuration of the data analysis method candidate determination device according to Embodiment 4.
FIG. 18 is a flowchart showing the operation of the data analysis method candidate determination device according to Embodiment 4.
FIG. 19 is a flowchart showing the operation of the existing data utilization proposal unit in step S19 of FIG. 18.
FIG. 20 is a block diagram showing the configuration of the data analysis method candidate determination device according to Embodiment 5.
FIG. 21 is a flowchart showing the operation of the data analysis method candidate determination device according to Embodiment 5.
FIG. 22 is a flowchart showing the operation of the analysis method review proposal unit in step S20 of FIG. 20.

<A. Embodiment 1>
<A-1. Configuration>
FIG. 1 is a block diagram showing the configuration of a data analysis method candidate determination device 11 according to Embodiment 1. The data analysis method candidate determination device 11 determines analysis method candidates for analysis target data to be subjected to data analysis and recommends them to the user. The device includes an analysis target data storage unit 2, an analysis case storage unit 3, and an analysis method candidate determination unit 4. These components need not be provided in a single device. They may be distributed over a plurality of devices connected to each other by a network such as the Internet, so that the plurality of devices as a whole constitute the data analysis method candidate determination device 11 as one system.

The data analysis method candidate determination device 11 can use an input unit 5 and an output unit 6. The input unit 5 is an input interface through which commands, search conditions, and the like from the user are input to the data analysis method candidate determination device 11. The output unit 6 is an output interface that outputs to the user the analysis method candidates determined by the analysis method candidate determination unit 4. In FIG. 1, the input unit 5 and the output unit 6 are shown as components separate from the data analysis method candidate determination device 11, but the device may include them.

The analysis target data storage unit 2 is configured by a recording medium such as an HDD (Hard Disk Drive) or an SD card, and stores the analysis target data to be analyzed and the data attributes of that data. The analysis target data of the data analysis method candidate determination device 11 includes: time-series data measured directly by sensors or the like, such as temperature, humidity, vibration, speed, acceleration, pressure, amount of solar radiation, distance, weight, current, voltage, electric energy, number of revolutions, or counts; discrete data such as device usage histories, access logs, GPS data of moving bodies, weather observations, or weather forecasts; document data such as reports, inspection records, work histories, forms, or plans; and statistical data such as demographic statistics or white papers. The analysis target data is data that is yet to be analyzed, but the analysis target data storage unit 2 may also store analyzed data on which data analysis was performed in the past and data analysis results newly created by data analysis, prediction, estimation, or the like. The analysis target data storage unit 2 may further contain data that has not been analyzed in the past but is available for use, together with its data attributes. The analysis target data storage unit 2 only needs to store the data attributes of the analysis target data; the analysis target data itself does not necessarily have to be stored. Examples of analysis target data that is not itself stored in the analysis target data storage unit 2 include open data provided by local governments and the like, data posted to an SNS (Social Network System), and data stored in a distributed manner in a cloud environment or the like that is accessible from the data analysis method candidate determination device 11.

FIG. 2 is a diagram illustrating data attributes. It shows the data attributes of each of data A, data B, and data C. A data attribute represents a characteristic of the data. Examples include the data acquisition interval, the data acquisition method, whether the data is an actual value, a predicted value, or a processed value, the data type, related data, and related devices. Access authority for the data may also be used as a data attribute.

The analysis case storage unit 3 is configured by a recording medium such as an HDD (Hard Disk Drive) or an SD card. For analyzed data on which data analysis was performed in the past, the analysis case storage unit 3 stores, as analysis cases, data in which the data attributes and the analysis method are associated with each other. The analysis cases need not have been created by the data analysis method candidate determination device 11. They desirably include existing analysis cases, publicly known cases from the literature, trial application cases from the examination stage, rejected cases, cases in which the analysis method was changed, and the like. An analysis case may also include the user's evaluation of the analysis method. In each analysis case, the analysis method may be described in source code or in intermediate code executable by a program. Alternatively, it may be described by name, such as "regression analysis" or "k-means method"; as a hierarchy of upper, middle, and lower concepts, such as "statistical analysis -> clustering -> k-means method"; or by an ID.
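One possible in-memory shape for such an analysis case is sketched below. The class name `AnalysisCase` and all field names are hypothetical and do not appear in the source; the sketch only illustrates associating data attributes with a hierarchically described analysis method and an optional user evaluation.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class AnalysisCase:
    """Hypothetical record linking data attributes to an analysis method."""

    data_name: str                      # name of the analyzed data
    attributes: Dict[str, str]          # data attribute item -> value
    purpose: str                        # analysis purpose as free text
    method_path: Tuple[str, ...]        # upper -> middle -> lower concept
    user_rating: Optional[float] = None  # optional user evaluation

    @property
    def method_name(self) -> str:
        # The lowest-level concept is treated as the method name.
        return self.method_path[-1]


case = AnalysisCase(
    data_name="public transportation ride history",
    attributes={"acquisition interval": "irregular", "value kind": "actual"},
    purpose="ride history analysis",
    method_path=("statistical analysis", "clustering", "k-means method"),
)
print(case.method_name)
```

Storing the method as a path keeps the name-only and hierarchical descriptions in one field; a source-code or ID-based description would need a different field.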

The analysis method candidate determination unit 4 selects, from among the analysis methods used in past analysis cases, the analysis method to be used for data analysis of the analysis target data, and determines it as an analysis method candidate. The determined analysis method candidate is output from the output unit 6, for example in text format, and recommended to the user. Alternatively, the analysis method candidates may be output in list format together with representative past cases and recommended to the user. In this case, the user can more easily understand application examples and characteristics of the analysis method candidates.

FIG. 3 is a diagram showing the hardware configuration of the data analysis method candidate determination device 11. The device includes a processor 20, a memory 21, and a recording medium 22. The analysis method candidate determination unit 4 is realized as a function of the processor 20 when the processor 20, such as a CPU (Central Processing Unit), executes a software program stored in the memory 21, such as a RAM (Random Access Memory). This function may also be realized by a plurality of processors working together. The analysis method candidate determination unit 4 may instead be realized by a signal processing circuit that implements the operation in hardware electric circuitry. As a concept covering both the software and the hardware implementations of the analysis method candidate determination unit 4, the term "processing circuit" may be used in place of the term "unit".

<A-2. Operation>
FIG. 4 is a flowchart showing the operation of the data analysis method candidate determination device 11. First, the user selects the analysis target data and the analysis purpose via the input unit 5 (step S11). For the analysis target data, for example, a list of the data already stored in the analysis target data storage unit 2 may be displayed for the user to choose from, or the user may be allowed to input new analysis target data as an electronic file or the like. Newly input analysis target data is stored in the analysis target data storage unit 2.

For the analysis purpose, for example, a list such as a pull-down menu may be displayed for the user to choose from, or the user may be allowed to enter it as a character string. The analysis purpose selected by the user is stored in the analysis target data storage unit 2. The analysis purpose is not limited to one; there may be several. The description below uses "television viewing data" as the example of analysis target data and "analysis of viewers' viewing preferences" as the example of analysis purpose.

Next, the analysis target data is read from the analysis target data storage unit 2 into the analysis method candidate determination unit 4 (step S12). That is, the television viewing data collected from each television terminal is read as the analysis target data.

Subsequently, the data attributes and the analysis purpose of the analysis target data are read from the analysis target data storage unit 2 into the analysis method candidate determination unit 4 (step S13). That is, the data attributes of the analysis target data "television viewing data", for example the data acquisition interval, the location of the data acquisition device, and the owner information of the data acquisition device, are read, and "analysis of viewers' viewing preferences" is read as the analysis purpose.

Subsequently, analysis cases whose data attributes are the same as or similar to those of the analysis target data, or whose analysis purpose is the same as or similar to that of the analysis target data, are read from the analysis case storage unit 3 into the analysis method candidate determination unit 4 (step S14). For example, analysis cases with data attributes similar to those of the analysis target data "television viewing data" include "regional television audience rating survey", "analysis of favorite celebrities by region", "popular movie genre survey", "power usage survey", and "production efficiency analysis in factories". Analysis cases with a similar analysis purpose include "Internet browsing history analysis", "product purchase status analysis", "visited store analysis", "point card holding status analysis", "public transportation ride history", and "analysis of facilities visited while traveling".

Subsequently, the analysis method candidate determination unit 4 determines the analysis method candidates for the analysis target data (step S15). The details of the processing in step S15 are described later.

Finally, the analysis method candidates determined in step S15 are output to the output unit 6 and recommended to the user (step S16), and the processing ends.
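The flow from reading the analysis cases to recommending candidates can be sketched roughly as follows. This is a minimal illustration only: the function names `recommend` and `toy_similarity`, the top-k selection rule, and the toy similarity measure are all assumptions introduced here, and the source describes its own similarity calculation separately.

```python
from typing import Callable, Dict, List, Tuple

# (analysis method name, data attributes of the analyzed data); hypothetical shape
Case = Tuple[str, Dict[str, str]]


def toy_similarity(target: Dict[str, str], other: Dict[str, str]) -> float:
    """Assumed stand-in: fraction of target attribute items with equal values."""
    if not target:
        return 0.0
    hits = sum(1 for key, value in target.items() if other.get(key) == value)
    return hits / len(target)


def recommend(
    target_attrs: Dict[str, str],
    cases: List[Case],
    similarity: Callable[[Dict[str, str], Dict[str, str]], float],
    top_k: int = 3,
) -> List[str]:
    """Rank past cases by similarity and return up to top_k distinct methods."""
    scored = sorted(
        ((similarity(target_attrs, attrs), method) for method, attrs in cases),
        key=lambda pair: pair[0],
        reverse=True,
    )
    candidates: List[str] = []
    for _, method in scored:
        if method not in candidates:
            candidates.append(method)
        if len(candidates) == top_k:
            break
    return candidates


cases = [
    ("k-means method", {"interval": "1 min", "source": "sensor log"}),
    ("regression analysis", {"interval": "1 day", "source": "terminal input"}),
    ("k-means method", {"interval": "1 min", "source": "terminal input"}),
]
target = {"interval": "1 min", "source": "sensor log"}
print(recommend(target, cases, toy_similarity, top_k=2))
```

Passing the similarity function as a parameter keeps the ranking step independent of how the similarity itself is defined.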

FIG. 5 is a flowchart showing the analysis method candidate determination processing performed by the analysis method candidate determination unit 4 in step S15 of FIG. 4. First, for the analysis cases read in step S14 of FIG. 4, the data attribute similarity between the analysis target data and the analyzed data is calculated (step S151). The processing is described concretely using "public transportation ride history" data as an example of the analyzed data of an analysis case. A data attribute similarity Sz is calculated between the data attributes of "television viewing data", the analysis target data specified by the user, and the data attributes of the data used to analyze the analyzed data "public transportation ride history", such as "transportation IC card ride history" data or "public transportation ride routes estimated from GPS data" data. The data attribute similarity Sz is calculated, for example, by the following equation.

Here, N is the number of items registered as data attributes, Lmaxi is the maximum distance of the i-th data attribute item, and Li is the distance of the i-th data attribute item. For example, a distance evaluation axis is set for each data attribute item, and the distance Li of the i-th data attribute item is calculated using that axis.
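The equation itself is not reproduced in this text. Purely as an assumption, one form consistent with the definitions of N, Lmaxi, and Li is the average of the normalized closeness of each item, Sz = (1/N) Σ (1 − Li / Lmaxi), which yields 1 when every distance is 0 and 0 when every distance is at its maximum. The sketch below implements that assumed form; the function name is hypothetical.

```python
from typing import Sequence


def data_attribute_similarity(
    distances: Sequence[float], max_distances: Sequence[float]
) -> float:
    """Assumed form: Sz = (1/N) * sum(1 - Li / Lmaxi) over N attribute items.

    This is NOT confirmed as the source's equation; it only uses the
    quantities N, Li, and Lmaxi that the source defines.
    """
    if not distances or len(distances) != len(max_distances):
        raise ValueError("need one maximum distance per attribute item")
    n = len(distances)
    return sum(1.0 - li / lmax for li, lmax in zip(distances, max_distances)) / n


# Three attribute items with distances 0, 2, 20 and maxima 10, 2, 100.
sz = data_attribute_similarity([0, 2, 20], [10, 2, 100])
print(round(sz, 3))
```

Normalizing each item by its own maximum keeps an axis with large distance values (for example 100) from dominating axes with small ones (for example 2).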

FIG. 6 shows a setting example of distance evaluation axes. For the data acquisition interval, for example, the distance is 10 if the data acquisition interval of at least one of the analysis target data and the analyzed data is irregular. The distance is 0 if the data acquisition interval of the analyzed data is shorter than that of the analysis target data. The distance is 5 if the acquisition interval of one of the analysis target data and the analyzed data is 100 times or more that of the other. For the data acquisition method, for example, the distance is 0 if the methods are the same, 2 if one is a log and the other is terminal input, and 1 if both are sensor logs but the sensor types differ. For the distinction between actual and predicted values, for example, the distance is 0 if both are actual values, 20 if one is an actual value and the other is a predicted value, and 100 if both are predicted values. In this way, the distance evaluation axis may be set on a rule basis for each data attribute item, or may be set by a mathematical expression. The number of rules need not be limited, and a maximum distance value may be set for each evaluation axis. Among the distances on a distance evaluation axis set as in FIG. 6, the largest is taken as the maximum distance. Although FIG. 6 describes the case in which distances take only positive values, a distance may take a negative value, and a distance may take a value of two or more dimensions rather than a one-dimensional value.
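A rule-based distance evaluation axis of this kind can be sketched as follows. The distance values 10, 0, 5 and 0, 20, 100 come from the example above; the function names, the use of `None` for an irregular interval, the rule precedence (rules tried in the order stated), and the `fallback` value returned when no stated rule applies are all assumptions, because the source does not specify them.

```python
from typing import Optional


def interval_distance(
    target_s: Optional[float], analyzed_s: Optional[float], fallback: int = 1
) -> int:
    """Distance on the data acquisition interval axis (None = irregular)."""
    if target_s is None or analyzed_s is None:
        return 10  # at least one interval is irregular
    if analyzed_s < target_s:
        return 0   # analyzed data is sampled more finely than the target
    if max(target_s, analyzed_s) >= 100 * min(target_s, analyzed_s):
        return 5   # one interval is 100 times the other or more
    return fallback  # hypothetical: no rule in the source covers this case


# Distance on the actual-value / predicted-value axis, keyed by the set of kinds.
_KIND_DISTANCE = {
    frozenset(["actual"]): 0,
    frozenset(["actual", "predicted"]): 20,
    frozenset(["predicted"]): 100,
}


def value_kind_distance(kind_a: str, kind_b: str) -> int:
    return _KIND_DISTANCE[frozenset((kind_a, kind_b))]


print(interval_distance(None, 60.0), value_kind_distance("actual", "predicted"))
```

The maximum distance of each axis is then the largest value its rules can return, here 10 for the interval axis and 100 for the value-kind axis.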

Subsequently, for the analysis cases whose data attribute similarity was calculated in step S151, an analysis purpose similarity Sp with the analysis target data is calculated (step S152). For example, the analysis purpose of the analysis target data and the analysis purpose of the analyzed data are compared as character strings, and their similarity is calculated as the analysis purpose similarity Sp. The analysis purpose similarity Sp can be obtained using, for example, cosine similarity or Levenshtein distance. For example, when the analysis purpose similarity Sp between the analysis purpose character string A of the analysis target data and the analysis purpose character string B of the analyzed data is obtained as a cosine similarity, it is calculated by the following equation.

$$S_p = \frac{A \cdot B}{|A|\,|B|}$$

Here, A · B is the inner product of the vectors representing character string A and character string B, |A| is the magnitude of the vector for character string A, and |B| is the magnitude of the vector for character string B.

The calculation of the analysis purpose similarity Sp is described for the case in which the analysis purpose character string A of the analysis target data is "analysis of viewers' viewing preferences" and the analysis purpose character string B of the analyzed data is "popular movie genre survey". Decomposing character string A into words and extracting keywords yields "viewing, viewer, preference, analysis", and character string B similarly yields "popular, movie, genre, survey". Here, similar words may be linked, as in "preference = popular" and "analysis = survey", so that the keywords of character string B become "preference, movie, genre, analysis". A similar word database that defines similar words can be provided in the analysis target data storage unit 2 or the analysis case storage unit 3, and similar words can be linked by referring to that database.

Expressed as vectors over the keywords (viewing, viewer, preference, analysis, movie, genre), the character strings are A = (2, 1, 1, 1, 0, 0) and B = (0, 0, 1, 1, 1, 1). The keyword "viewing" has the value 2 because it occurs twice in character string A.

The analysis purpose similarity Sp is then calculated as follows.
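
The worked example above can be checked with a short script. This is a minimal sketch, not the device's implementation: the English keyword labels, the `SYNONYMS` table, and the function names are illustrative assumptions. With the vectors A = (2, 1, 1, 1, 0, 0) and B = (0, 0, 1, 1, 1, 1), the inner product is 2, |A| = √7 and |B| = 2, so Sp = 1/√7, roughly 0.378.

```python
import math
from collections import Counter

# Hypothetical similar-word table; the text only states that such a
# database may be consulted, not what it contains.
SYNONYMS = {"popularity": "preference", "survey": "analysis"}


def keyword_vector(keywords, vocabulary, synonyms=None):
    """Count keyword occurrences over a fixed vocabulary after synonym mapping."""
    synonyms = synonyms or {}
    counts = Counter(synonyms.get(word, word) for word in keywords)
    return [counts[term] for term in vocabulary]


def cosine_similarity(a, b):
    """Return A.B / (|A||B|), or 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


vocabulary = ["viewing", "person", "preference", "analysis", "movie", "genre"]
# "viewing" appears twice, mirroring the word-level split of string A.
keywords_a = ["viewing", "person", "viewing", "preference", "analysis"]
keywords_b = ["popularity", "movie", "genre", "survey"]

vec_a = keyword_vector(keywords_a, vocabulary)
vec_b = keyword_vector(keywords_b, vocabulary, SYNONYMS)
sp = cosine_similarity(vec_a, vec_b)
print(vec_a, vec_b, round(sp, 3))
```

Without the synonym table, B would share no terms with A and the similarity would be 0, which is why the similar-word linking step matters for short purpose strings.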

As another example, when the analysis purpose is described in source code or intermediate code, the processing procedure expressed by the source code or intermediate code may be organized using a technique such as UML (Unified Modeling Language) or a function flowchart, and the analysis purpose similarity Sp may be calculated from the similarity of the processing procedures. The method of calculating Sp is described below, taking function flowchart A shown in FIG. 15 and function flowchart B shown in FIG. 16 as examples.

Function flowchart A indicates that steps S21 to S26 are executed in order. Step S21 inputs X, step S22 assigns X/5 to Y, step S23 outputs Y, step S24 inputs Z, step S25 assigns Y×Z to A, and step S26 outputs Y.

Function flowchart B indicates that steps S31 to S33 are executed in order. Step S31 inputs X, step S32 is a subroutine on X, and step S33 outputs Y. The subroutine of step S32 consists of step S34, which assigns X/5 to Y.

Suppose that, for each of the two function flowcharts A and B, the matching rate of processing steps is defined as the number of matching processing steps divided by the total number of processing steps. When only input/output steps and arithmetic steps are counted as processing steps, the matching rate is calculated as follows.

When the length of the run of consecutive matching processing steps is also taken into account in addition to this matching rate, the analysis purpose similarity Sp can be expressed, for example, by the following equation.
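
The two quantities described above, the matching rate per flowchart and the length of the longest run of consecutive matching steps, can be computed as in the following sketch. It is an illustration under assumptions: steps are encoded as plain strings chosen here, the subroutine of flowchart B is expanded into its single step, and Python's `difflib.SequenceMatcher` stands in for whatever matching procedure the device uses. How the two quantities are combined into Sp is deliberately left out.

```python
from difflib import SequenceMatcher

# Input/output and arithmetic steps only; the labels are illustrative.
flow_a = ["input X", "Y = X / 5", "output Y", "input Z", "A = Y * Z", "output Y"]
flow_b = ["input X", "Y = X / 5", "output Y"]  # subroutine S32 expanded to S34


def match_statistics(steps_a, steps_b):
    """Return (rate for A, rate for B, longest consecutive matching run)."""
    matcher = SequenceMatcher(None, steps_a, steps_b, autojunk=False)
    # Drop the zero-length terminator block that difflib always appends.
    blocks = [blk for blk in matcher.get_matching_blocks() if blk.size > 0]
    matched = sum(blk.size for blk in blocks)
    longest_run = max((blk.size for blk in blocks), default=0)
    rate_a = matched / len(steps_a) if steps_a else 0.0
    rate_b = matched / len(steps_b) if steps_b else 0.0
    return rate_a, rate_b, longest_run


rate_a, rate_b, longest_run = match_statistics(flow_a, flow_b)
print(rate_a, rate_b, longest_run)
```

For these two flowcharts, three steps match in one unbroken run, giving a matching rate of 3/6 for flowchart A and 3/3 for flowchart B.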

When the analysis purpose is described in a hierarchical structure consisting of a superordinate concept, a middle concept, and a subordinate concept, the analysis purpose similarity may be calculated with equation (6) for each of the three levels and the results averaged. Alternatively, each option at the superordinate, middle, and subordinate levels may be assigned in advance an ID number that reflects the similarity of the methods, and the analysis purpose similarity Sp may be obtained from the difference between the numbers formed by combining the ID numbers.

For example, if the maximum ID number is "9-9-99", the analysis purpose similarity Sp between an analysis purpose whose superordinate-middle-subordinate ID number is "1-0-01" and an analysis purpose whose ID number is "1-0-02" can be calculated as follows.

Similarly, the analysis purpose similarity Sp between an analysis purpose whose superordinate-middle-subordinate ID number is "1-0-01" and an analysis purpose whose ID number is "5-0-01" can be calculated as follows.
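
One way to turn the ID-number difference into a similarity is sketched below. The normalization is an assumption made for illustration only: the hyphen-separated fields are concatenated into one integer, and Sp is taken as 1 minus the absolute difference divided by the maximum ID value. The function names are hypothetical.

```python
def id_to_number(id_string):
    """Concatenate the hyphen-separated fields, e.g. "1-0-01" -> 1001."""
    return int(id_string.replace("-", ""))


def id_similarity(id_a, id_b, id_max="9-9-99"):
    """ASSUMED form: 1 - |difference| / maximum ID value."""
    diff = abs(id_to_number(id_a) - id_to_number(id_b))
    return 1.0 - diff / id_to_number(id_max)


near = id_similarity("1-0-01", "1-0-02")  # differs only at the subordinate level
far = id_similarity("1-0-01", "5-0-01")   # differs at the superordinate level
print(round(near, 4), round(far, 4))
```

Under this assumption, a difference at the subordinate level barely lowers the similarity (1 − 1/9999), while a difference at the superordinate level lowers it substantially (1 − 4000/9999), which is the behaviour the positional ID scheme is meant to produce.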

The formulas for the analysis purpose similarity Sp described above are merely examples. Variations are therefore possible, such as weighting specific conditions, or applying a correction such as a slope adjustment when the average values of the results are biased because different methods were used to calculate the analysis purpose similarity.

When cases with different analysis purpose description methods are mixed, cases that represent a plurality of cases may be extracted, and only those representative cases may be given analysis purposes in every description method, so that analysis purposes can be compared indirectly.

Next, the overall similarity S between the analysis target data and the analyzed data is calculated from the data attribute similarity Sz and the analysis purpose similarity Sp (step S153). The overall similarity S is calculated, for example, by the following equation.

Next, it is checked whether any other analyzed data remains for which the overall similarity has not been calculated (step S154). If such analyzed data exists, the process returns to step S151 and steps S151 to S153 are executed for that analyzed data. When the similarity has been calculated for all analyzed data, the process proceeds to step S155.
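
The loop over steps S151 to S154 can be sketched as follows. The way Sz and Sp are combined here, a weighted arithmetic mean with a hypothetical `weight_attr` parameter, is an assumption chosen only to make the sketch runnable. The two toy similarity functions and the dictionary layout of a case are likewise illustrative stand-ins.

```python
def overall_similarity(sz, sp, weight_attr=0.5):
    """ASSUMED combination: weighted mean of Sz and Sp, both in [0, 1]."""
    return weight_attr * sz + (1.0 - weight_attr) * sp


def score_analysis_cases(target, cases, attr_similarity, purpose_similarity):
    """Return (method, S) for every analysis case."""
    scored = []
    for case in cases:                       # S154: repeat until none remain
        sz = attr_similarity(target, case)   # S151: data attribute similarity
        sp = purpose_similarity(target, case)  # S152: analysis purpose similarity
        scored.append((case["method"], overall_similarity(sz, sp)))  # S153
    return scored


def toy_attr_similarity(target, case):
    """Fraction of the target's attributes that the case shares exactly."""
    attrs = target["attributes"]
    shared = sum(1 for k, v in attrs.items() if case["attributes"].get(k) == v)
    return shared / len(attrs)


def toy_purpose_similarity(target, case):
    """Jaccard overlap of the words in the two purpose strings."""
    a, b = set(target["purpose"].split()), set(case["purpose"].split())
    return len(a & b) / len(a | b)


target = {"attributes": {"interval": "1 min", "kind": "actual"},
          "purpose": "viewing preference analysis"}
cases = [
    {"method": "regression analysis",
     "attributes": {"interval": "1 min", "kind": "actual"},
     "purpose": "preference analysis"},
    {"method": "k-means",
     "attributes": {"interval": "1 day", "kind": "actual"},
     "purpose": "genre survey"},
]
scored = score_analysis_cases(target, cases,
                              toy_attr_similarity, toy_purpose_similarity)
print(scored)
```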

In step S155, the average similarity is calculated for each analysis method from the overall similarities of all analysis cases read in step S14 of FIG. 4. Suppose, for example, that the analysis cases read in step S14 of FIG. 4 used analysis methods such as "regression analysis", "k-means method", "behavior model-based inference", "behavior model-based inference and queuing simulation", and "neural network". The average similarity Sav for "regression analysis" is then calculated, for example, by the following equation.
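
As an arithmetic mean, the relation for "regression analysis" can be written as below, where the subscript "reg" is a label chosen here for the cases whose data analysis method includes regression analysis, N is the number of such cases, and the sum runs over their overall similarities S:

```latex
S_{\mathrm{av}} = \frac{\sum S_{\mathrm{reg}}}{N_{\mathrm{reg}}}
```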

Here, N_reg is the number of cases whose data analysis method includes "regression analysis", and ΣS_reg is the sum of the overall similarities of those cases. The arithmetic mean is used in this example, but the average similarity may be calculated using various other means, such as the geometric mean, the harmonic mean, or a weighted mean.

When a single case uses a plurality of analysis methods, the average similarity may be calculated with the combination of methods kept intact. Alternatively, the average similarity may first be calculated for each method on its own, and then recalculated, only for the methods with a high average similarity, for the data analysis methods that are used in combination with them.

Finally, the analysis method candidates for the analysis target data are determined (step S156). The analysis method with the highest average similarity may be taken as the analysis method candidate, or a plurality of analysis methods may be taken as candidates in descending order of average similarity. When the analysis method candidates are output in step S16 of FIG. 4, each candidate may be accompanied by its average similarity, the number of analysis cases that include it, the frequency of the analysis purposes for which it is used, or similar information.
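
Steps S155 and S156 amount to grouping the overall similarities by analysis method, averaging, and ranking. A minimal sketch follows. The function name, the `top_n` parameter, and the sample scores are illustrative, and the arithmetic mean is used as in the example in the text.

```python
from collections import defaultdict


def rank_methods(scored_cases, top_n=1):
    """Return up to top_n tuples (method, average similarity, case count)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for method, similarity in scored_cases:
        totals[method] += similarity
        counts[method] += 1
    averages = {m: totals[m] / counts[m] for m in totals}   # S155
    ranked = sorted(averages.items(), key=lambda kv: kv[1], reverse=True)
    return [(m, avg, counts[m]) for m, avg in ranked[:top_n]]  # S156


# Made-up overall similarities for five analysis cases.
scored = [
    ("regression analysis", 0.8),
    ("regression analysis", 0.6),
    ("k-means", 0.5),
    ("neural network", 0.9),
    ("neural network", 0.3),
]
candidates = rank_methods(scored, top_n=2)
for method, average, n_cases in candidates:
    print(f"{method}: Sav={average:.2f} over {n_cases} case(s)")
```

Returning the case count alongside the average mirrors the option of outputting supporting information with each candidate, since a high average over a single case is weaker evidence than the same average over many cases.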

<A-3. Effect>
The data analysis method candidate determination device 11 according to Embodiment 1 includes: an analysis case storage unit 3 that stores, as an analysis case, data linking a data attribute and an analysis method for each of a plurality of analyzed data items on which data analysis was performed in the past; an analysis target data storage unit 2 that stores data attribute information for the analysis target data; and an analysis method candidate determination unit 4 that calculates a data attribute similarity, which is the similarity between the data attribute of the analysis target data and the data attribute of the analyzed data, and determines, based on the data attribute similarity, at least one of the analysis methods of the analyzed data as an analysis method candidate for the analysis target data. Therefore, even without the source code of each analysis method, analysis method candidates can be determined by referring to analysis cases with similar data attributes.

Further, the analysis case storage unit 3 stores analysis purpose information for each of the plurality of analyzed data items, and the analysis target data storage unit 2 stores analysis purpose information for the analysis target data. The analysis method candidate determination unit 4 calculates the similarity between the analysis purpose of the analysis target data and the analysis purpose of the analyzed data as the analysis purpose similarity, calculates the overall similarity between the analysis target data and the analyzed data based on the analysis purpose similarity and the data attribute similarity, and determines, based on the overall similarity, at least one of the analysis methods of the analyzed data as an analysis method candidate for the analysis target data. Therefore, even without the source code of each analysis method, analysis method candidates can be determined by referring to analysis cases with similar data attributes and analysis purposes.

Further, the data attributes of the analyzed data and the analysis target data include at least one of the data acquisition interval, the data acquisition method, and the distinction among actual, predicted, and processed values. By determining analysis method candidates based on the similarity of these data attributes, the candidates can be determined even without the source code of each analysis method.

Further, the analysis method candidate determination unit 4 calculates the analysis purpose similarity based on the character string of the analysis purpose of the analysis target data and the character string of the analysis purpose of the analyzed data. By comparing the character strings to calculate the analysis purpose similarity and determining analysis method candidates based on it, the candidates can be determined even without the source code of each analysis method.

Further, the analysis method candidate determination unit 4 calculates the analysis purpose similarity based on the analysis purpose of the analysis target data described in a hierarchical structure and the analysis purpose of the analyzed data described in a hierarchical structure. By comparing the similarity between analysis purposes set in advance for each level to calculate the analysis purpose similarity, and determining analysis method candidates based on it, the candidates can be determined even without the source code of each analysis method.

Further, when the analysis purpose of the analysis target data and the analysis purpose of the analyzed data are described in source code or intermediate code, the analysis method candidate determination unit 4 calculates, as the analysis purpose similarity, the similarity between the processing procedure expressed by the source code or intermediate code of the analysis target data and the processing procedure expressed by the source code or intermediate code of the analyzed data, based on the matching rate or the continuity of the matching processing steps. By calculating the analysis purpose similarity in this way and determining analysis method candidates based on it, the candidates can be determined even when the analysis purpose is described in source code or intermediate code.

Further, the analysis method candidate determination unit 4 calculates, for each analysis method, the average of the overall similarities between the analysis target data and the analyzed data for which that method was used, and determines an analysis method selected on the basis of this average as an analysis method candidate. Therefore, analysis method candidates can be determined even without the source code of each analysis method.

<B. Embodiment 2>
<B-1. Configuration>
FIG. 7 is a block diagram showing the configuration of the data analysis method candidate determination device 12 according to Embodiment 2. In addition to the configuration of the data analysis method candidate determination device 11 according to Embodiment 1, the data analysis method candidate determination device 12 includes an evaluation acquisition unit 7 and a recommended case storage unit 8.

The recommended case storage unit 8 is constituted by a recording medium such as an HDD (Hard Disk Drive) or an SD card, and stores recommended case data. Recommended case data is data in which an analysis method candidate determined in the past by the analysis method candidate determination unit 4 is linked to the analysis target data and the analysis purpose.

The evaluation acquisition unit 7 acquires evaluation information on an analysis method candidate that the user has entered through the input unit 5, and adds that evaluation information to the corresponding recommended case stored in the recommended case storage unit 8. In other words, the recommended case storage unit 8 stores each recommended case, consisting of the analysis target data, the analysis purpose, and the analysis method candidate, linked to the evaluation information for that case. The evaluation acquisition unit 7 is realized as a function of the processor 20 shown in FIG. 3 when the processor 20 executes a software program stored in the memory 21.

<B-2. Operation>
FIG. 8 is a flowchart showing the operation of the data analysis method candidate determination device 12. Steps S11 to S16 are the same as in Embodiment 1 and have already been described with reference to FIG. 4, so their description is omitted here. After the analysis method candidate determination unit 4 determines the analysis method candidates (step S15) and outputs them to the output unit 6 (step S16), it stores data linking the analysis target data, the analysis purpose, and the analysis method candidates (a recommended case) in the recommended case storage unit 8 (step S17).

FIG. 9 is a flowchart showing the operation of the evaluation acquisition unit 7. This flow is performed only when a recommended case is stored in the recommended case storage unit 8. First, the evaluation acquisition unit 7 determines the recommended case to which evaluation information is to be added (step S71). For example, a screen listing all recommended cases stored in the recommended case storage unit 8 may be displayed so that the user can select a recommended case from it. Alternatively, the user may enter conditions such as the analysis target data or the analysis purpose, and the recommended case may be identified or narrowed down from those conditions. As a further alternative, recommended cases to which no evaluation information has yet been added may be extracted from the recommended case storage unit 8 and presented to the user for selection.

Next, among the plurality of analysis method candidates recommended in the recommended case determined in step S71, the candidates that the user actually used are identified (step S72). If the user used more than one candidate, all of them are identified. For example, a list screen of the analysis method candidates is displayed and the user selects from it the candidates actually used.

Next, the user's evaluation information is acquired for the analysis method candidates identified in step S72 (step S73). The evaluation information is acquired by having the user enter it through the input unit 5. It includes supplementary information such as the analysis accuracy, the user's personal impressions, and the execution time. The user may also be asked to select, from the list screen of analysis method candidates, the candidate that gave the most desirable result. Alternatively, instead of selecting the single most desirable candidate, the user may rank the candidates in order of how desirable their results were.

In addition to favorable evaluations such as those above, unfavorable evaluations may also be acquired. For example, if there is an analysis method candidate that the user tried but ultimately did not adopt because of some problem, the user may be asked to enter the problem with that candidate. Problems may also be entered for analysis method candidates that the user did not actually use. Supplementary information such as problems may be chosen from options prepared in advance or entered as free text.

The evaluation acquisition unit 7 attaches the evaluation information acquired in this way to the recommended case and stores it in the recommended case storage unit 8 (step S74).

Furthermore, among the recommended cases to which evaluation information has been attached, the evaluation acquisition unit 7 adds those concerning analysis method candidates with favorable evaluation information to the analysis case storage unit 3 as new analysis cases (step S75). For example, suppose that the analysis method candidates for the analysis target data "TV viewing data" and the analysis purpose "analysis of viewers' viewing preferences" were "regression analysis" and "k-means method", and that favorable evaluation information was acquired for "regression analysis" and unfavorable evaluation information for "k-means method". In that case, the analysis target data "TV viewing data", the analysis purpose "analysis of viewers' viewing preferences", and the analysis method "regression analysis" are added to the analysis case storage unit 3 as a new analysis case. If favorable evaluation information is obtained for a plurality of analysis methods, all of them are added to the analysis case storage unit 3 in the same way. Because analysis cases with favorable evaluations are added and then used when determining analysis method candidates, the accuracy of the determination improves.
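
The promotion of favorably evaluated recommended cases in step S75 can be sketched as follows. The record layout, the boolean encoding of "favorable", and the function name are assumptions for illustration. The text itself does not specify how evaluation information is represented.

```python
from dataclasses import dataclass, field


@dataclass
class RecommendedCase:
    """Hypothetical record for one entry of the recommended case store."""
    target_data: str
    purpose: str
    candidates: list
    # method name -> True (favorable) or False (unfavorable); absent = unrated
    evaluations: dict = field(default_factory=dict)


def promote_favorable(case, analysis_cases):
    """Append one new analysis case per favorably evaluated method (S75)."""
    for method in case.candidates:
        if case.evaluations.get(method) is True:
            analysis_cases.append({
                "target_data": case.target_data,
                "purpose": case.purpose,
                "method": method,
            })
    return analysis_cases


case = RecommendedCase(
    target_data="TV viewing data",
    purpose="analysis of viewers' viewing preferences",
    candidates=["regression analysis", "k-means method"],
    evaluations={"regression analysis": True, "k-means method": False},
)
analysis_cases = promote_favorable(case, [])
print(analysis_cases)
```

Only "regression analysis" is promoted in this example. Unrated and unfavorably rated candidates stay in the recommended case store without becoming analysis cases.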

<B-3. Modification>
FIG. 10 is a block diagram showing the configuration of the data analysis method candidate determination device 13 according to a modification of Embodiment 2. The data analysis method candidate determination device 13 includes an attribute adding unit 9 in addition to the configuration of the data analysis method candidate determination device 12. Apart from the attribute adding unit 9, its configuration is the same as that of the data analysis method candidate determination device 12.

The attribute adding unit 9 analyzes the reasons for non-adoption of analysis method candidates acquired by the evaluation acquisition unit 7, and adds a data attribute corresponding to those reasons as a new data attribute item for all analysis target data whose data attributes are stored in the analysis target data storage unit 2. At this time, the attribute adding unit 9 may notify a user such as a system administrator of the added data attribute item through the output unit 6 and prompt the user to enter the data attribute for that item. It may likewise prompt the user to enter the distance evaluation axis used to calculate the data attribute similarity for the added item. The user can enter these data attributes or distance evaluation axes into the data analysis method candidate determination device 13 through the input unit 5. The attribute adding unit 9 is realized as a function of the processor 20 shown in FIG. 6 when the processor 20 executes a software program stored in the memory 21.

FIG. 11 is a flowchart showing the operation of the attribute adding unit 9 in the data analysis method candidate determination device 13. This flow is executed when a reason for non-adoption of an analysis method candidate is stored in the recommended case storage unit 8.

First, recommended cases to which evaluation information has been attached are extracted from the recommended case storage unit 8 (step S81).

Next, for each analysis method candidate that was not adopted in the recommended cases extracted in step S81, the reason for non-adoption is extracted (step S82).

Next, the reasons for non-adoption extracted in step S82 are analyzed (step S83). Frequency analysis based on keyword extraction, simple statistics, or a similar technique can be used for this analysis.

Finally, a data attribute item corresponding to the analyzed reasons for non-adoption is added as a data attribute item of the analysis target data stored in the analysis target data storage unit 2 (step S84). For example, if the analysis in step S83 shows that keywords such as "execution time is long" and "processing is heavy" appear frequently among the reasons for non-adoption, items related to computational load, such as "amount of computation" and "execution time per unit amount", are added to the data attributes.
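
Steps S83 and S84 can be illustrated with a simple keyword frequency count. The sketch below is one possible reading, not the device's method: the keyword-to-attribute table, the `min_count` threshold, and the substring matching are all assumptions introduced here.

```python
from collections import Counter

# Hypothetical table linking reason keywords to data attribute items.
KEYWORD_TO_ATTRIBUTES = {
    "execution time is long": ["amount of computation",
                               "execution time per unit amount"],
    "processing is heavy": ["amount of computation",
                            "execution time per unit amount"],
}


def attribute_items_to_add(reasons, min_count=2):
    """Return attribute items for keywords seen in at least min_count reasons."""
    counts = Counter()
    for reason in reasons:                      # S83: frequency analysis
        text = reason.lower()
        for keyword in KEYWORD_TO_ATTRIBUTES:
            if keyword in text:
                counts[keyword] += 1
    items = []
    for keyword, n in counts.most_common():     # S84: map to attribute items
        if n >= min_count:
            for item in KEYWORD_TO_ATTRIBUTES[keyword]:
                if item not in items:
                    items.append(item)
    return items


reasons = [
    "Execution time is long on our data",
    "execution time is long",
    "processing is heavy",
    "results were hard to interpret",
]
new_items = attribute_items_to_add(reasons)
print(new_items)
```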

In this way, by adding data attributes corresponding to the reasons for non-adoption of analysis method candidates, the data analysis method candidate determination device 13 allows the analysis method candidate determination unit 4 to judge the data attribute similarity in finer detail when determining analysis method candidates. The accuracy of determining analysis method candidates can therefore be improved.

<B-4. Effect>
In addition to the configuration of the data analysis method candidate determination device 11 according to Embodiment 1, the data analysis method candidate determination device 12 according to Embodiment 2 includes an evaluation acquisition unit 7 that acquires the user's evaluation information on analysis method candidates, and a recommended case storage unit 8 that stores, as a recommended case, data linking the data attribute of the analysis target data, the analysis method candidates for the analysis target data, and the evaluation information on those candidates. When the results of determining analysis method candidates are stored as recommended cases in this way, the accuracy of determining analysis method candidates can be improved, for example by using recommended cases with favorable evaluation information as analysis cases.

Further, in addition to the configuration of the data analysis method candidate determination device 12 according to Embodiment 2, the data analysis method candidate determination device 13 according to the modification of Embodiment 2 includes an attribute adding unit 9 that extracts reasons for non-adoption of analysis method candidates from the evaluation information acquired by the evaluation acquisition unit 7 and adds items corresponding to those reasons to the data attribute items. The analysis method candidate determination unit 4 can therefore judge the data attribute similarity in finer detail when determining analysis method candidates, which improves the accuracy of the determination.

<C. Embodiment 3>
<C-1. Configuration>
FIG. 12 is a block diagram showing the configuration of the data analysis method candidate determination device 14 according to Embodiment 3. In addition to the configuration of the data analysis method candidate determination device 11 according to Embodiment 1, the data analysis method candidate determination device 14 includes a model change proposing unit 10.

When the analysis method candidates determined by the analysis method candidate determination unit 4 include a physical model-based analysis method, the model change proposing unit 10 proposes a change to the physical model, such as a modification or an addition. Here, a physical model-based analysis method refers to any data analysis method that makes use of a physical model based on data or design information, such as a device model, a failure model, a behavior model, a correlation model, or a user model. The physical model may be described in a document format such as a parameter sheet; in a diagram format such as an FTA (Fault Tree Analysis) diagram, a fault tree, or an electric circuit diagram; as a mathematical expression such as an equation of motion or a bathtub curve; or in a machine language such as assembler or source code. The model change proposing unit 10 is realized as a function of the processor 20 shown in FIG. 3 when the processor 20 executes a software program stored in the memory 21.

The analysis case storage unit 3 stores, as an analysis case, analysis target data, the analysis purpose and data attributes of that data, and the analysis method. When the analysis method is a physical model-based analysis method, change information on the physical model is also stored as part of the analysis case. Specifically, when a user changes a physical model (by addition or correction) and then performs data analysis using the changed model, not only the changed physical model actually used for the analysis but also the physical model before the change is stored in the analysis case storage unit 3 as change information.

Except as described above, the configuration of the data analysis method candidate determination device 14 is the same as that of the data analysis method candidate determination device 11 according to the first embodiment.

<C-2. Operation>
FIG. 13 is a flowchart showing the operation of the data analysis method candidate determination device 14. Steps S11 to S15 and S16 are the same as in the first embodiment; the difference is that a new step S18 is added between step S15 and step S16. Once the analysis method candidate determination unit 4 has determined the analysis method candidates for the analysis target data (step S15), the model change proposing unit 10 proposes a change to the physical model if the candidates include a physical model-based analysis method (step S18).

FIG. 14 is a flowchart showing the operation of the model change proposing unit 10 in step S18 of FIG. 13. This flow is executed only when physical model change information is stored in the analysis case storage unit 3.

First, it is determined whether the analysis method candidates determined by the analysis method candidate determination unit 4 in step S15 of FIG. 13 include a physical model-based analysis method (step S181). If they do not, the model change proposing unit 10 ends its processing. If they do, the process proceeds to step S182.

In step S182, analysis cases that use the same analysis method as the physical model-based analysis method included in the analysis method candidates, and that contain physical model change information, are extracted from the analysis cases stored in the analysis case storage unit 3.

Next, it is determined whether the changed physical model data indicated by the change information is stored in the analysis case storage unit 3 (step S183). If it is, use of the changed physical model is proposed to the user (step S184). For example, suppose that when a user previously analyzed the analysis target data “boarding history of public transportation”, an analysis method using passenger model A as the physical model was recommended as an analysis method candidate. If the user instead performed the analysis with passenger model B, obtained by changing passenger model A in some way, such as correcting it or adding a new passenger model, the analysis case storage unit 3 records passenger model A before the change in addition to the analysis target data, the analysis purpose, and the analysis method actually used (passenger model B). Later, in another data analysis, when the analysis method candidate determination unit 4 determines an analysis method using passenger model A as the physical model as an analysis method candidate, the user is advised to use passenger model B in place of passenger model A.

If it is determined in step S183 that the changed physical model data does not exist in the analysis case storage unit 3, a method for changing (correcting or adding to) the physical model is proposed to the user. For example, when the analysis purpose is “product purchase situation analysis” and the analysis method candidate uses a purchaser model as the physical model, the device proposes a method for revising the purchaser model into categories suited to the product genre to be analyzed, or for adding a purchaser model in which “a parent buys on behalf of a child”.
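
The flow of steps S181 to S184 can be illustrated with a minimal Python sketch. The class names `Candidate` and `AnalysisCase`, the function `propose_model_change`, and the representation of stored models as a set of names are all assumptions made for illustration; the specification does not prescribe any data structure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    method: str
    physical_model: Optional[str] = None  # None: not a physical model-based method


@dataclass(frozen=True)
class AnalysisCase:
    method: str
    model_before: Optional[str] = None  # change information: model before the change
    model_after: Optional[str] = None   # model actually used after the change


def propose_model_change(candidates, cases, stored_models):
    """Return (action, model) proposals following steps S181 to S184."""
    proposals = []
    for cand in candidates:
        # S181: skip candidates that do not use a physical model.
        if cand.physical_model is None:
            continue
        # S182: same method, with change information on the same pre-change model.
        matches = [c for c in cases
                   if c.method == cand.method
                   and c.model_before == cand.physical_model]
        for case in matches:
            # S183: is the changed model available in storage?
            if case.model_after in stored_models:
                proposals.append(("use", case.model_after))        # S184
            else:
                proposals.append(("revise", cand.physical_model))  # no stored model
    return proposals
```

With a stored case that replaced passenger model A by passenger model B, a candidate using passenger model A yields the proposal to use passenger model B; if model B is not in storage, the sketch falls back to proposing a revision of model A.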

<C-3. Effect>
In the data analysis method candidate determination device 14 according to the third embodiment, the analysis case data stored in the analysis case storage unit 3 includes, for each analysis case in which the user performed data analysis using a physical model obtained by changing an existing physical model, information on the physical model before the change. The data analysis method candidate determination device 14 includes the model change proposing unit 10 in addition to the configuration of the data analysis method candidate determination device 11 according to the first embodiment. The model change proposing unit 10 proposes a change to the physical model when an analysis method candidate is an analysis method that uses a physical model and the physical model used by that candidate is the same as the pre-change physical model in an analysis case. This makes it possible to improve the analysis accuracy of physical model-based analysis methods.

<D. Embodiment 4>
<D-1. Configuration>
FIG. 17 is a block diagram showing the configuration of the data analysis method candidate determination device 15 according to the fourth embodiment. The data analysis method candidate determination device 15 includes an existing data utilization proposing unit 101 in addition to the configuration of the data analysis method candidate determination device 11 according to the first embodiment.

When the analysis target data selected by the user (first analysis target data) lacks a data attribute needed to execute the analysis method determined by the analysis method candidate determination unit 4, the existing data utilization proposing unit 101 extracts analysis target data having the needed data attribute (second analysis target data) from the past analysis target data stored in the analysis target data storage unit 2 and proposes to the user that the second analysis target data be used. The existing data utilization proposing unit 101 is realized as a function of the processor 20 when the processor 20 shown in FIG. 3 executes a software program stored in the memory 21.

The analysis case storage unit 3 stores, as an analysis case, the analysis target data initially selected by the user, the analysis purpose and data attributes of that data, and the analysis method. The analysis case storage unit 3 also stores, as part of the analysis case, analysis target data that the user additionally selected following a proposal from the existing data utilization proposing unit 101. The analysis target data may be stored in the analysis case storage unit 3 with a flag indicating the timing at which it was selected.

Except as described above, the configuration of the data analysis method candidate determination device 15 is the same as that of the data analysis method candidate determination device 11 according to the first embodiment.

<D-2. Operation>
FIG. 18 is a flowchart showing the operation of the data analysis method candidate determination device 15. In the flowchart of FIG. 18, steps S11 to S15 and S16 are the same as in the first embodiment; the difference is that a new step S19 is added between step S15 and step S16. Once the analysis method candidate determination unit 4 has determined the analysis method candidates for the analysis target data (step S15), the existing data utilization proposing unit 101 proposes adding analysis target data if the data attributes of the analysis target data acquired in step S13 are insufficient for executing an analysis method candidate (step S19).

FIG. 19 is a flowchart showing the operation of the existing data utilization proposing unit 101 in step S19 of FIG. 18.

First, the existing data utilization proposing unit 101 determines whether the analysis target data selected in step S11 of FIG. 18 (the first analysis target data) has the data attributes needed to execute the analysis method candidate determined in step S15 (step S191). Three cases in which the analysis target data lacks a needed data attribute are given as examples. The first is that the analysis target data itself is missing. The second is that the acquisition interval of the analysis target data is coarser than the acquisition interval specified as a needed data attribute, so that a sufficient analysis result cannot be obtained. The third is that the acquisition method of the analysis target data does not match the acquisition method specified as a needed data attribute, so that a sufficient analysis result cannot be obtained. An example of the third case is when data measured directly by a sensor or the like is required but the analysis target data consists of processed values.
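
The three example cases checked in step S191 can be sketched as follows. This is an illustrative sketch only: the names `Requirement`, `DataItem`, and `missing_reasons`, the use of seconds for the acquisition interval, and the labels "measured" and "processed" are assumptions, not terms defined by the specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Requirement:
    name: str
    max_interval_s: Optional[float] = None  # coarsest acceptable acquisition interval
    acquisition: Optional[str] = None       # e.g. "measured"; None accepts any method


@dataclass(frozen=True)
class DataItem:
    name: str
    interval_s: float
    acquisition: str  # "measured" or "processed" (assumed labels)


def missing_reasons(required, selected):
    """Map each unmet requirement to the reason it is unmet (step S191)."""
    by_name = {item.name: item for item in selected}
    reasons = {}
    for req in required:
        item = by_name.get(req.name)
        if item is None:
            reasons[req.name] = "data missing"                      # case 1
        elif req.max_interval_s is not None and item.interval_s > req.max_interval_s:
            reasons[req.name] = "acquisition interval too coarse"   # case 2
        elif req.acquisition is not None and item.acquisition != req.acquisition:
            reasons[req.name] = "acquisition method mismatch"       # case 3
    return reasons
```

An empty result corresponds to ending the processing; a non-empty result corresponds to moving on to step S192.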

If the first analysis target data has the data attributes needed to execute the analysis method candidate, the existing data utilization proposing unit 101 ends its processing. If it does not, the existing data utilization proposing unit 101 proceeds to step S192.

In step S192, the existing data utilization proposing unit 101 extracts, from the analysis cases stored in the analysis case storage unit 3, analysis cases that use an analysis method identical to or including the analysis method candidate and whose analysis purpose is the same or similar.

Next, the existing data utilization proposing unit 101 compares the data attributes of the analyzed data in the extracted analysis cases with the data attributes of the analysis target data currently selected by the user, and extracts from the data attributes of the analyzed data those needed to execute the analysis method candidate (step S193). At this point, data attributes of data whose use is restricted may be excluded from the extraction; examples are data for which access authority is set as a data attribute and the user does not hold that authority, and data for which usage conditions are set as a data attribute and reuse is limited by a contract with the data source. Alternatively, in such a case only the data attribute may be presented, accompanied by the restriction information on access authority or data reuse.

If analysis target data having the data attributes extracted in step S193 exists in the analysis target data storage unit 2, the existing data utilization proposing unit 101 proposes to the user that this data (the second analysis target data) be used, that is, that the analysis be performed with the second analysis target data added to the currently selected analysis target data (the first analysis target data) (step S194). For example, suppose that a user analyzes the analysis target data “power consumption of ordinary households in D-chome, C Town, B City, A Prefecture” with the analysis target data “weekday/holiday classification for the analysis period” added, that the “k-means method” is presented as an analysis method candidate, and that the user decides to use it. Suppose further that the analysis case storage unit 3 contains a case in which another user applied the “k-means method” to the analysis target data “building power consumption” with the analysis target data “weekday/holiday classification for the analysis period”, “weather observation data for the analysis period”, and “employee building entry/exit history for the analysis period” added, and that the data attributes of “employee building entry/exit history for the analysis period” indicate that secondary use of the data is not permitted. In that case, in step S194 the existing data utilization proposing unit 101 may propose that the user additionally use the analysis target data “weather observation data for the analysis period”. It may also inform the user that additional use of both “weather observation data for the analysis period” and “employee building entry/exit history for the analysis period” is desirable, but that the data attributes of the latter indicate that secondary use is not permitted.
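
The household power consumption example can be traced with a short Python sketch of steps S193 and S194. The function name `propose_additional_data` and the modelling of supplementary data sets as plain names in lists and sets are assumptions for illustration; the specification leaves the representation open.

```python
def propose_additional_data(current_extra, past_case_extra, restricted, stored):
    """Return (usable, flagged) lists of supplementary data names.

    current_extra:   data already added to the first analysis target data
    past_case_extra: data added in the extracted past analysis case
    restricted:      data whose attributes forbid reuse or access
    stored:          data available in the analysis target data storage
    """
    # S193: data used in the past case that the current selection lacks.
    lacking = [name for name in past_case_extra if name not in current_extra]
    usable, flagged = [], []
    for name in lacking:
        if name in restricted:
            flagged.append(name)  # present only with its restriction information
        elif name in stored:
            usable.append(name)   # S194: propose adding this data
    return usable, flagged


usable, flagged = propose_additional_data(
    current_extra={"weekday/holiday classification"},
    past_case_extra=[
        "weekday/holiday classification",
        "weather observation data",
        "building entry/exit history",
    ],
    restricted={"building entry/exit history"},
    stored={"weather observation data", "building entry/exit history"},
)
print(usable)   # data proposed for additional use
print(flagged)  # data shown as desirable but not reusable
```

In this sketch the weather observation data is proposed for additional use, while the entry/exit history is reported separately so that the restriction on secondary use can be shown to the user.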

The above describes three example cases in which the analysis target data lacks a data attribute needed to apply the analysis method candidate, and explains that the addition of analysis target data is proposed in such cases. However, even when the analysis target data has the needed data attributes, the addition of analysis target data may be proposed in the following cases. The first is when the selected analysis target data has the needed data attributes but was acquired under conditions that do not yield the best result. The second is when analysis is possible with the currently selected analysis target data but adding new analysis target data would yield a more accurate result.

<D-3. Effect>
The data analysis method candidate determination device 15 according to the fourth embodiment includes the existing data utilization proposing unit 101, which, when the first analysis target data lacks a data attribute needed for the analysis method that the analysis method candidate determination unit 4 determined for the first analysis target data, proposes the use of second analysis target data having the needed data attribute. Proposing the addition of other analysis target data that has the data attributes needed to carry out the analysis method candidate makes it possible to improve the analysis accuracy obtained when the candidate is executed.

The second analysis target data has a data attribute indicating whether the data may be reused, and when proposing the use of the second analysis target data to the user, the existing data utilization proposing unit 101 provides the user with information on whether the analyzed data may be reused. If the second analysis target data proposed by the existing data utilization proposing unit 101 cannot be reused, the user can therefore consider obtaining alternative data that can be, and adding that alternative data makes it possible to improve the analysis accuracy obtained when the analysis method candidate is executed.

<E. Embodiment 5>
<E-1. Configuration>
FIG. 20 is a block diagram showing the configuration of the data analysis method candidate determination device 16 according to the fifth embodiment. The data analysis method candidate determination device 16 includes an analysis method review proposing unit 102 in addition to the configuration of the data analysis method candidate determination device 11 according to the first embodiment.

When a case with the same or a similar analysis purpose is added to the analysis cases stored in the analysis case storage unit 3, the analysis method review proposing unit 102 calculates the adoption rate of each analysis method and, if it detects an analysis method whose adoption rate satisfies a preset analysis method review condition, proposes a change of analysis method to the user. The analysis method review proposing unit 102 is realized as a function of the processor 20 when the processor 20 shown in FIG. 3 executes a software program stored in the memory 21.

In addition to the analysis cases, the analysis case storage unit 3 preferably stores information such as the user who registered or updated each analysis case, the contact person for inquiries about the case, the developer or provider of the analysis method, and the current utilization status of the case. The current utilization status may include usage states such as applied to a product, under trial, or discontinued, as well as indications such as external case.

Except as described above, the configuration of the data analysis method candidate determination device 16 is the same as that of the data analysis method candidate determination device 11 according to the first embodiment.

<E-2. Operation>
FIG. 21 is a flowchart showing the operation of the data analysis method candidate determination device 16. Steps S11 to S16 are the same as in the first embodiment; the difference is that a new step S20 is added after step S16. When the analysis method candidate determination unit 4 has determined the analysis method candidates for the analysis target data (step S15) and the candidates have been presented to the user (step S16), the analysis purpose and the average similarity of each analysis method are sent to the analysis method review proposing unit 102, which then determines whether a review of the analysis method needs to be proposed for the past analysis cases stored in the analysis case storage unit 3 (step S20).

FIG. 22 is a flowchart showing the operation of the analysis method review proposing unit 102 in step S20 of FIG. 21.

First, the analysis method review proposing unit 102 receives the analysis purpose and the average similarity of each analysis method calculated by the analysis method candidate determination unit 4 in step S15 of FIG. 21 (step S201). It then determines whether an analysis method has reached the review criterion (step S202). The review criterion is, for example, that the average similarity exceeds a threshold or is at or below a threshold. Alternatively, the analysis method review proposing unit 102 may keep a history of the average similarities received for each analysis method, covering a fixed period or a fixed number of receptions, and judge that the review criterion has been reached when, for example, the reception rate of an analysis method exceeds a threshold, or the average similarity shows a rising or falling trend against the reception date and time for at least a fixed period. If no analysis method has reached the review criterion, the analysis method review proposing unit 102 ends its processing. Otherwise, it proceeds to step S203.
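
One possible reading of the step S202 check is sketched below. The function name `reached_review_criterion`, the parameters `upper`, `lower`, and `trend_len`, and the choice to detect a trend as a strictly monotonic run of the most recent values are assumptions; the specification only states that thresholds or an increasing or decreasing tendency may be used.

```python
def reached_review_criterion(history, upper=None, lower=None, trend_len=3):
    """Decide whether one analysis method has reached the review criterion.

    history: average similarities received for the method, oldest first.
    """
    if not history:
        return False
    latest = history[-1]
    # Threshold criterion: above the upper bound or at or below the lower bound.
    if upper is not None and latest > upper:
        return True
    if lower is not None and latest <= lower:
        return True
    # Trend criterion (assumed form): the last trend_len values rise or fall steadily.
    recent = history[-trend_len:]
    if trend_len >= 2 and len(recent) == trend_len:
        diffs = [b - a for a, b in zip(recent, recent[1:])]
        if all(d > 0 for d in diffs) or all(d < 0 for d in diffs):
            return True
    return False
```

A correlation-based trend test over reception dates, as the text also allows, would replace the monotonic-run check; the simpler form is used here only to keep the sketch short.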

In step S203, the analysis method review proposing unit 102 extracts from the analysis case storage unit 3 past analysis cases whose analysis purpose is the same as or similar to the analysis purpose received in step S201. The number of extracted cases may be limited, for example to the N cases (for example, N = 1000) with the most recent registration or update date and time. The extraction period may also be limited, for example to analysis cases registered or updated within the last N years (for example, N = 5).
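
The two ways of limiting the extraction in step S203 can be sketched as follows. The function `extract_recent`, the tuple layout `(case_id, updated_at)`, and the approximation of one year as 365 days are assumptions made for this illustration.

```python
from datetime import datetime, timedelta


def extract_recent(cases, max_count=None, max_age_years=None, now=None):
    """Limit extracted cases by count and/or by age.

    cases: list of (case_id, updated_at) tuples, updated_at being a datetime.
    """
    now = now or datetime.now()
    # Newest registration or update first.
    selected = sorted(cases, key=lambda c: c[1], reverse=True)
    if max_age_years is not None:
        # Assumption: a year is approximated as 365 days.
        cutoff = now - timedelta(days=365 * max_age_years)
        selected = [c for c in selected if c[1] >= cutoff]
    if max_count is not None:
        selected = selected[:max_count]
    return selected
```

Calling `extract_recent(cases, max_count=1000)` corresponds to the N = 1000 example, and `extract_recent(cases, max_age_years=5)` to the N = 5 years example.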

Next, the adoption rate of each analysis method used in the extracted analysis cases is calculated (step S204). The adoption rate P can be calculated, for example, as P = N_x / N, where N is the number of extracted cases and N_x is the number of cases in which method X was adopted. If the analysis case storage unit 3 stores the current utilization status of the analysis cases, the cases may be weighted according to that status: a larger weight for cases already applied to a product and a smaller weight for cases whose commercialization was discontinued. Alternatively, the cases may be weighted according to their registration or update date and time: a larger weight for more recently registered or updated cases and a smaller weight for older ones.
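
A minimal sketch of the step S204 calculation follows. Without weights it computes P = N_x / N; with weights it uses the natural generalisation of dividing the summed weight of cases using method X by the summed weight of all cases, which is an assumption, as are the function name `adoption_rates`, the status labels, and the weight values in `STATUS_WEIGHT`.

```python
from collections import defaultdict

# Assumed example weights: larger for product-applied cases, smaller for discontinued ones.
STATUS_WEIGHT = {"applied to product": 2.0, "trial": 1.0, "discontinued": 0.5}


def adoption_rates(cases, status_weight=None):
    """Return {method: adoption rate} for a list of (method, status) tuples."""
    totals = defaultdict(float)
    for method, status in cases:
        # Unweighted: every case counts 1, giving P = N_x / N.
        weight = 1.0 if status_weight is None else status_weight.get(status, 1.0)
        totals[method] += weight
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {method: total / grand_total for method, total in totals.items()}
```

Weighting by registration or update date could be handled the same way by passing a weight derived from the date instead of the status.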

Next, if there is an analysis method whose adoption rate satisfies the analysis method review condition, the analysis method review proposing unit 102 proposes a review of the analysis cases (step S205). For example, when the adoption rate of the k-means method among clustering methods exceeds a threshold, it proposes switching the analysis method to the k-means method to the registering or updating users, contact persons, and analysis method developers or providers (hereinafter simply “users and the like”) of analysis cases that do not use the k-means method. Conversely, when the adoption rate of the k-means method among clustering methods falls below a reference value, it proposes switching to a method other than the k-means method to the users and the like of analysis cases that use the k-means method. In this case, a list of analysis methods in descending order of adoption rate, together with their adoption rates, may be presented to the users and the like.
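
The step S205 decision can be sketched in Python as below. The function `propose_review`, the `(case_id, method)` tuple layout, and the returned dictionary are hypothetical; who is actually notified (registering user, contact person, developer, or provider) is left out of the sketch.

```python
def propose_review(cases, rates, method, upper, lower):
    """Return a review proposal for `method`, or None if no condition is met.

    cases: list of (case_id, method_used); rates: {method: adoption rate}.
    """
    # Ranking of methods in descending order of adoption rate, shown with the proposal.
    ranking = sorted(rates.items(), key=lambda kv: kv[1], reverse=True)
    rate = rates.get(method, 0.0)
    if rate > upper:
        # Popular method: address cases that do not use it yet.
        notify = [cid for cid, used in cases if used != method]
        action = f"switch to {method}"
    elif rate < lower:
        # Unpopular method: address cases that still use it.
        notify = [cid for cid, used in cases if used == method]
        action = f"switch away from {method}"
    else:
        return None
    return {"action": action, "notify": notify, "ranking": ranking}
```

For instance, with rates of 0.8 for k-means and 0.2 for another method and an upper threshold of 0.7, the sketch targets only the cases that do not use k-means.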

<E-3. Effect>
The data analysis method candidate determination device 16 according to the fifth embodiment includes the analysis method review proposing unit 102, which proposes a review of the analysis method for analysis cases whose analysis purpose is the same as or similar to that of the analysis target data for which the analysis method candidate determination unit 4 determined an analysis method. By calculating the adoption rate of each analysis method in past analysis cases and proposing a review based on that rate, the device can propose new analysis method candidates for past analysis cases as well, which makes it possible to improve the analysis accuracy obtained when the analysis method is executed.

Within the scope of the invention, the embodiments may be freely combined, and each embodiment may be modified or omitted as appropriate.

Although the present invention has been described in detail, the above description is illustrative in all aspects, and the invention is not limited to it. It is understood that countless variations not illustrated here can be envisaged without departing from the scope of the invention.

2 analysis target data storage unit, 3 analysis case storage unit, 4 analysis method candidate determination unit, 5 input unit, 6 output unit, 7 evaluation acquisition unit, 8 recommended case storage unit, 9 attribute adding unit, 10 model change proposing unit, 11, 12, 13, 14, 15, 16 data analysis method candidate determination device, 20 processor, 21 memory, 22 recording medium, 101 existing data utilization proposing unit, 102 analysis method review proposing unit.

Claims

A data analysis method candidate determination device for determining an analysis method candidate for analysis target data to be subjected to data analysis, comprising:
an analysis case storage unit (3) that stores, as an analysis case, data in which a data attribute and an analysis method are associated with each other, for each of a plurality of analyzed data items that were subjected to data analysis in the past;
an analysis target data storage unit (2) that stores data attribute information on the analysis target data; and
an analysis method candidate determination unit (4) that calculates a data attribute similarity, which is the similarity between the data attribute of the analysis target data and the data attribute of each analyzed data item, and determines, based on the data attribute similarity, at least one analysis method from among the analysis methods of the analyzed data items as an analysis method candidate for the analysis target data.

The data analysis method candidate determination device according to claim 1, wherein
the analysis case storage unit (3) stores analysis purpose information for each of the plurality of analyzed data items,
the analysis target data storage unit (2) stores analysis purpose information for the analysis target data, and
the analysis method candidate determination unit (4) calculates, as an analysis purpose similarity, the similarity between the analysis purpose of the analysis target data and the analysis purpose of the analyzed data, calculates a total similarity between the analysis target data and the analyzed data based on the analysis purpose similarity and the data attribute similarity, and determines, based on the total similarity, at least one analysis method from among the analysis methods of the analyzed data as the analysis method candidate.

The data analysis method candidate determination device according to claim 1 or 2, wherein
the data attributes of the analyzed data and of the analysis target data include at least one of a data acquisition interval, a data acquisition method, a result value, a predicted value, or a processed value.

The data analysis method candidate determination device according to claim 2, wherein
the analysis method candidate determination unit (4) calculates the analysis purpose similarity based on a character string describing the analysis purpose of the analysis target data and a character string describing the analysis purpose of the analyzed data.

The data analysis method candidate determination device according to claim 2, wherein
the analysis method candidate determination unit (4) calculates the analysis purpose similarity based on the analysis purpose of the analysis target data described in a hierarchical structure and the analysis purpose of the analyzed data described in a hierarchical structure.

The data analysis method candidate determination device according to claim 2, wherein,
when the analysis purpose of the analysis target data and the analysis purpose of the analyzed data are described in source code or intermediate code,
the analysis method candidate determination unit (4) calculates, as the analysis purpose similarity, the similarity between the processing procedure indicated by the source code or intermediate code for analyzing the analysis target data and the processing procedure indicated by the source code or intermediate code for analyzing the analyzed data, based on a matching rate of the processing procedures or the continuity of matching processing procedures.

The data analysis method candidate determination device according to any one of claims 2 and 4 to 6, wherein
the analysis method candidate determination unit (4) calculates, for each analysis method, an average value of the total similarity between the analysis target data and the analyzed data for which that analysis method was used, and determines an analysis method selected based on the average value of the total similarity as the analysis method candidate.

The data analysis method candidate determination device according to any one of claims 1 to 7, further comprising:
an evaluation acquisition unit (7) that acquires user evaluation information for the analysis method candidate; and
a recommended case storage unit (8) that stores, as a recommended case, data associating the data attributes of the analysis target data, the analysis method candidate for the analysis target data, and the evaluation information for the analysis method candidate.

The data analysis method candidate determination device according to claim 8, further comprising
an attribute addition unit (9) that extracts a reason for non-adoption of the analysis method candidate from the evaluation information acquired by the evaluation acquisition unit (7) and adds an item corresponding to the reason for non-adoption to the items of the data attribute.

The data analysis method candidate determination device according to any one of claims 1 to 9, wherein
the analysis case storage unit (3) stores, for an analysis case in which data analysis was performed using a physical model changed by the user, information on the physical model before the change, and
the device further comprises a model change proposal unit (10) that proposes a change of the physical model when the analysis method candidate is an analysis method using a physical model and the physical model used in the analysis method candidate is the same as the physical model before the change in the analysis case.

The data analysis method candidate determination device according to any one of claims 1 to 10, further comprising
an existing data utilization proposal unit (101) that, when first analysis target data among the analysis target data does not have a data attribute necessary for the analysis method determined for the first analysis target data by the analysis method candidate determination unit (4), proposes to the user the utilization of second analysis target data, among the analysis target data, that has the necessary data attribute.

The data analysis method candidate determination device according to claim 11, wherein
the second analysis target data has a data attribute relating to whether or not the data can be used, and
the existing data utilization proposal unit (101), when proposing that the user utilize the second analysis target data, provides the user with information on whether or not the second analysis target data can be used.

The data analysis method candidate determination device according to any one of claims 1 to 12, further comprising
an analysis method review proposal unit (102) that proposes a review of the analysis method for an analysis case whose analysis purpose is the same as or similar to that of the analysis target data for which the analysis method candidate has been determined by the analysis method candidate determination unit (4).
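
Illustrative sketch (not part of the claims). The flow recited above, in which a data attribute similarity and an analysis purpose similarity are combined into a total similarity that is then averaged per analysis method, might be sketched in Python as follows. Everything specific here is an assumption made for illustration: the function names, the dictionary layout, the use of the fraction of matching attribute items as the data attribute similarity, the use of `difflib.SequenceMatcher` for character-string similarity, and the equal weighting of the two similarities. The claims do not fix any of these choices.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def attribute_similarity(target_attrs, case_attrs):
    """Fraction of attribute items with identical values (assumed metric)."""
    keys = set(target_attrs) | set(case_attrs)
    if not keys:
        return 0.0
    matches = sum(
        1 for k in keys
        if k in target_attrs and k in case_attrs and target_attrs[k] == case_attrs[k]
    )
    return matches / len(keys)


def purpose_similarity(target_purpose, case_purpose):
    """Character-string similarity of two analysis purposes (assumed metric)."""
    return SequenceMatcher(None, target_purpose, case_purpose).ratio()


def total_similarity(target, case, weight=0.5):
    """Weighted sum of both similarities; the weighting is an assumption."""
    attr = attribute_similarity(target["attributes"], case["attributes"])
    purpose = purpose_similarity(target["purpose"], case["purpose"])
    return weight * attr + (1 - weight) * purpose


def rank_methods(target, cases, weight=0.5):
    """Average the total similarity per analysis method, highest first."""
    scores = defaultdict(list)
    for case in cases:
        scores[case["method"]].append(total_similarity(target, case, weight))
    averages = {method: sum(v) / len(v) for method, v in scores.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical analysis cases and analysis target data.
cases = [
    {"attributes": {"interval": "1min", "acquisition": "sensor", "value_type": "actual"},
     "purpose": "failure prediction", "method": "regression"},
    {"attributes": {"interval": "1day", "acquisition": "manual", "value_type": "processed"},
     "purpose": "demand forecast", "method": "clustering"},
]
target = {
    "attributes": {"interval": "1min", "acquisition": "sensor", "value_type": "actual"},
    "purpose": "failure prediction",
}

ranking = rank_methods(target, cases)
print(ranking[0][0])  # regression
```

In this sketch the first entry of `ranking` plays the role of the analysis method candidate. Returning the top few entries instead would correspond to determining more than one candidate.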