WO2022137526A1 - Information processing program, information processing method, and information processing device - Google Patents

Info

Publication number
WO2022137526A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
information
evaluation
input
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2020/048809
Other languages
French (fr)
Japanese (ja)
Inventor
諒太 下山
直樹 梅田
信之 鷲尾
芳隆 末廣
主税 斎藤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd
Priority to PCT/JP2020/048809 (WO2022137526A1)
Priority to JP2022570966A (JPWO2022137526A1)
Publication of WO2022137526A1
Anticipated expiration
Legal status: Ceased

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data

Definitions

  • the present invention relates to an information processing program, an information processing method, and an information processing device.
  • a series of data processing may be realized by combining multiple software products.
  • a method of managing the flow of data processed in an information processing system by using data lineage information representing a relationship between the data has been proposed.
  • the relationship between a physical data element and a business data element is detected based on the first data lineage, which represents the relationships between physical data elements, and the second data lineage, which represents the relationships between business data elements.
  • there is also a control device that acquires data including a value indicating the strength of the received signal when the base station device receives the signal from the terminal device, or the degree of error, and that derives the quality index of the received signal of the base station device based on the acquired data.
  • the problem is whether or not the data output by the data processing is generated through the process intended by the user. For example, if the source of the data to be processed is not intended, the reliability of the result of data processing is lowered. However, a quality control mechanism for data output through a series of data processing by such a plurality of software products has not been established.
  • an information processing program causes a computer to execute processing that extracts, based on history information indicating the history of the input data and the output data for each of a plurality of software, the information of the first input data which is the starting point of data processing by the plurality of software, and that evaluates the quality of the first output data output by the data processing according to a comparison of whether or not the information of the first input data matches the information of predetermined data input by the user.
  • in one aspect, an information processing method is provided. Further, in one aspect, an information processing device is provided.
  • the quality of the data can be adequately evaluated.
  • FIG. 1 is a diagram illustrating an information processing apparatus according to the first embodiment.
  • the information processing device 10 has a storage unit 11 and a processing unit 12.
  • the storage unit 11 may be a volatile storage device such as a RAM (Random Access Memory) or a non-volatile storage device such as an HDD (Hard Disk Drive) or a flash memory.
  • the processing unit 12 may include a CPU (Central Processing Unit), a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), and the like.
  • the processing unit 12 may be a processor that executes a program.
  • the "processor" here may include a set of a plurality of processors (multiprocessor).
  • the storage unit 11 stores the history information 20 indicating the history of the input data and the output data for each of the plurality of software.
  • the history information 20 is obtained, for example, from an information processing system (not shown) that executes a plurality of software.
  • the information processing apparatus 10 may be connected to the information processing system via a network, acquire the history information 20 from the information processing system via the network, and store the history information 20 in the storage unit 11.
  • the history information 20 may be input to the information processing apparatus 10 by the user and stored in the storage unit 11.
  • the history information 20 may be data lineage information showing the history of data sources, data processing, data shaping, and the like.
  • the processing unit 12 extracts the information of the first input data which is the starting point of the data processing by the plurality of software based on the history information 20 stored in the storage unit 11.
  • the processing unit 12 evaluates the quality of the first output data output by the data processing according to a comparison of whether or not the information of the first input data matches the information of the predetermined data input by the user.
  • as the information of the first input data, identification information such as a data name of the first input data, identification information of the data storage unit 41 storing the first input data, and the like can be considered.
  • the identification information of the data storage unit 41 may be, for example, the DB name or directory path of the DB (Database) corresponding to the data storage unit 41, or the identification name of the storage device that provides the data storage unit 41.
  • the information of the predetermined data is information about the data assumed as the input data by the user. Examples include the identification information of the data assumed as the original input by the user and the identification information of the data storage unit that stores that data.
  • the history information 20 includes information related to data processing by software 31 and 32 executed in the information processing system.
  • the data used for the data processing are data d1, d2, d3.
  • the history information 20 indicates that the data d1 stored in the data storage unit 41 is input to the software 31, that the data d2 is generated by the processing p1 of the software 31 based on the data d1, and that the data d2 is stored in the data storage unit 42.
  • the history information 20 also indicates that the data d2 stored in the data storage unit 42 is input to the software 32, that the data d3 is generated by the processing p2 of the software 32 based on the data d2, and that the data d3 is stored in the data storage unit 43.
  • the data storage units 41, 42, and 43 may be realized by a storage device included in the information processing system, or may be realized by a storage device external to the information processing system.
  • the processing unit 12 evaluates the quality of data as follows based on the history information 20. First, the processing unit 12 identifies the data d3 output by the data processing by the software 31 and 32 based on the history information 20. Then, the processing unit 12 specifies that the generation source data of the data d3 is the data d2 based on the history information 20. Further, the processing unit 12 specifies that the generation source data of the data d2 is the data d1 based on the history information 20.
  • the processing unit 12 detects that there is no source data in the previous stage of the data d1 based on the history information 20. Therefore, the processing unit 12 specifies that the input data at the start point of the data processing that outputs the data d3 is the data d1. Therefore, the processing unit 12 extracts the information of the data d1 as the information of the input data of the starting point for the data d3 based on the history information 20 (step S1).
  • the data d1 is an example of the first input data.
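  • the trace-back of step S1 can be sketched as follows. This is only a minimal illustration, not the implementation of the embodiment: the record layout of `HISTORY`, the function name `extract_start_inputs`, and the field names `inputs`, `outputs`, `name`, and `store` are assumptions introduced here, while the data names d1 to d3 and the storage units 41 to 43 follow the example above.

```python
from collections import defaultdict

# Hypothetical, simplified history records (an assumption, not the format of
# the history information 20): one entry per software process, listing the
# data it read and the data it wrote.
HISTORY = [
    {"software": "software 31", "process": "p1",
     "inputs": [{"name": "d1", "store": "data storage unit 41"}],
     "outputs": [{"name": "d2", "store": "data storage unit 42"}]},
    {"software": "software 32", "process": "p2",
     "inputs": [{"name": "d2", "store": "data storage unit 42"}],
     "outputs": [{"name": "d3", "store": "data storage unit 43"}]},
]


def extract_start_inputs(history, output_name):
    """Trace back from output_name and return the data that have no source data."""
    sources = defaultdict(list)  # data name -> names of its generation-source data
    stores = {}                  # data name -> storage unit that holds it
    for record in history:
        for item in record["inputs"] + record["outputs"]:
            stores[item["name"]] = item["store"]
        for out in record["outputs"]:
            sources[out["name"]].extend(i["name"] for i in record["inputs"])

    start_points, seen, stack = [], set(), [output_name]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        if sources.get(name):
            # The data has generation-source data: keep tracing back.
            stack.extend(sources[name])
        else:
            # No source data in the previous stage: this is a starting point.
            start_points.append({"name": name, "store": stores.get(name)})
    return start_points


print(extract_start_inputs(HISTORY, "d3"))
```

  • because the sketch follows every generation source of each data item, it also returns several starting points when an output was generated from more than one original input.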
  • the processing unit 12 evaluates the quality of the data d3 according to the comparison of whether or not the information of the data d1 matches the information of the predetermined data (step S2). For example, the processing unit 12 may compare whether or not the identification information of the data d1 matches the identification information of the predetermined data assumed by the user. In addition to or instead of comparing the data identification information, the processing unit 12 may compare whether or not the identification information of the data storage unit 41 in which the data d1 is stored matches the identification information of the predetermined data storage unit assumed by the user.
  • when the information of the data d1 matches the information of the predetermined data, the processing unit 12 evaluates the quality of the data d3 higher than in the case where the information does not match. For example, the processing unit 12 may provide the data d3 with a quality index value indicating that the larger the value, the higher the quality. In this case, the processing unit 12 adds a predetermined value to the index value when the information of the data d1 matches the information of the predetermined data, and does not add to the index value when the information does not match.
  • when comparing a plurality of items included in the information of the data d1 with a plurality of items included in the information of the predetermined data, the processing unit 12 may add a predetermined value to the index value according to whether or not the plurality of items completely match, or may change the value to be added according to the number of matching items.
  • the processing unit 12 may extract other data in addition to the data d1 as the input data of the starting point for the data d3.
  • in that case, the quality of the data d3 is evaluated according to the comparison of whether or not the information of each of the input data of the plurality of start points matches the information of the predetermined data.
  • the information of the predetermined data may include information of a plurality of data assumed by the user.
  • the processing unit 12 may obtain the quality evaluation value by a point deduction method instead of a point addition method. Further, the quality index value may indicate that the smaller the value, the higher the quality. In that case, the above "addition" may be read as "subtraction".
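  • the point addition of step S2 might look like the following sketch. The function name `quality_index`, the item names `name` and `store`, and the point value 1.0 are assumptions chosen for illustration; the text above only states that a predetermined value is added on a match and that the added value may depend on the number of matching items.

```python
def quality_index(actual, expected, items=("name", "store"),
                  points=1.0, require_full_match=False):
    """Return the value added to the quality index for one start-point input.

    actual   -- information of the start-point input data (e.g. data d1)
    expected -- information of the predetermined data assumed by the user
    """
    # Count the items that are present on both sides and have equal values.
    matched = sum(
        1 for key in items
        if key in actual and key in expected and actual[key] == expected[key]
    )
    if require_full_match:
        # Add the predetermined value only when every item matches.
        return points if matched == len(items) else 0.0
    # Otherwise scale the added value by the number of matching items.
    return points * matched


d1 = {"name": "d1", "store": "data storage unit 41"}
assumed = {"name": "d1", "store": "data storage unit 41"}
moved = {"name": "d1", "store": "some other storage unit"}  # hypothetical mismatch

print(quality_index(d1, assumed))                          # both items match
print(quality_index(d1, moved))                            # only the data name matches
print(quality_index(d1, moved, require_full_match=True))   # nothing is added
```

  • a point deduction variant would start from an upper value and subtract the same amounts for each item that does not match.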
  • the information of the first input data which is the starting point of the data processing by the plurality of software is extracted based on the history information indicating the history of the input data and the output data for each of the plurality of software.
  • the quality of the first output data output by the data processing is evaluated according to the comparison of whether or not the information of the first input data matches the information of the predetermined data input by the user.
  • the quality of the data output by the data processing changes depending on whether or not the data intended by the user is processed. For example, when performing data processing such as analysis, if input data not intended by the user is processed, the input data may contain unnecessary or incorrect information. The result of the data processing is then more likely to be wrong, which reduces the reliability of the result.
  • the information processing apparatus 10 identifies the data d1 at the start point of data processing by the plurality of software based on the history information 20, and confirms whether or not the data d1 is the input intended by the user.
  • the quality of the data d3 output by the processing can be appropriately evaluated.
  • the processing unit 12 can perform the above quality evaluation on a plurality of output data including the data d3 output by the information processing system.
  • the processing unit 12 may display the result of quality evaluation for each of the plurality of output data including the data d3 on the display device.
  • the processing unit 12 may display a data flow diagram from the input data of the start point corresponding to each output data to the output data on the display device for each output data.
  • the processing unit 12 may highlight the data flow diagram relating to the output data whose quality is evaluated to be lower than the standard. In this way, it is possible to support the user in reviewing the data flow in the information processing system.
  • FIG. 2 is a diagram showing a system example of the second embodiment.
  • the system of the second embodiment includes an information processing system 50 and an information processing device 100.
  • the information processing system 50 and the information processing apparatus 100 are connected to the network 60.
  • the network 60 may be the Internet, a WAN (Wide Area Network), or a LAN (Local Area Network).
  • the information processing system 50 executes various software and executes various data processing in which a plurality of software are combined.
  • the software may be referred to as a product or a software product.
  • the information processing system 50 includes server devices 200, 300, .... Each of the server devices 200, 300, ... is a server computer that executes software.
  • each of the server devices 200, 300, ... is connected to the internal network of the information processing system 50.
  • Each of the server devices 200, 300, ... may execute a plurality of software.
  • software executed on one server device may be linked with other software executed on another server device.
  • the information processing system 50 may include a storage device that stores data and is used for exchanging data between server devices.
  • the information processing device 100 is a server computer that utilizes data lineage information in the information processing system 50 to perform data quality evaluation across a plurality of software in the information processing system 50.
  • the information processing device 100 is an example of the information processing device 10 of the first embodiment.
  • FIG. 3 is a diagram showing a hardware example of the information processing device.
  • the information processing device 100 includes a CPU 101, a RAM 102, an HDD 103, a GPU (Graphics Processing Unit) 104, an input interface 105, a medium reader 106, and a NIC (Network Interface Card) 107.
  • the CPU 101 is an example of the processing unit 12 of the first embodiment.
  • the RAM 102 or the HDD 103 is an example of the storage unit 11 of the first embodiment.
  • the CPU 101 is a processor that executes a program instruction.
  • the CPU 101 loads at least a part of the programs and data stored in the HDD 103 into the RAM 102 and executes the program.
  • the CPU 101 may include a plurality of processor cores.
  • the information processing device 100 may have a plurality of processors. The processes described below may be performed in parallel using multiple processors or processor cores. Also, a set of multiple processors may be referred to as a "multiprocessor" or simply a "processor".
  • the RAM 102 is a volatile semiconductor memory that temporarily stores a program executed by the CPU 101 and data used by the CPU 101 for calculation.
  • the information processing apparatus 100 may include a type of memory other than the RAM, or may include a plurality of memories.
  • the HDD 103 is a non-volatile storage device that stores software programs such as an OS (Operating System), middleware, and application software, and data.
  • the information processing device 100 may be provided with other types of storage devices such as a flash memory and an SSD (Solid State Drive), or may be provided with a plurality of non-volatile storage devices.
  • the GPU 104 outputs an image to the display 61 connected to the information processing apparatus 100 in accordance with a command from the CPU 101.
  • as the display 61, any kind of display such as a CRT (Cathode Ray Tube) display, a liquid crystal display (LCD: Liquid Crystal Display), a plasma display, or an organic EL (OEL: Organic Electro-Luminescence) display can be used.
  • the input interface 105 acquires an input signal from the input device 62 connected to the information processing device 100 and outputs the input signal to the CPU 101.
  • as the input device 62, a pointing device such as a mouse, a touch panel, a touch pad, or a trackball, a keyboard, a remote controller, a button switch, or the like can be used. Further, a plurality of types of input devices may be connected to the information processing apparatus 100.
  • the medium reader 106 is a reading device that reads programs and data recorded on the recording medium 63.
  • as the recording medium 63, for example, a magnetic disk, an optical disk, a magneto-optical disk (MO: Magneto-Optical disk), a semiconductor memory, or the like can be used.
  • the magnetic disk includes a flexible disk (FD: Flexible Disk) and an HDD.
  • Optical discs include CDs (Compact Discs) and DVDs (Digital Versatile Discs).
  • the medium reader 106 copies, for example, a program or data read from the recording medium 63 to another recording medium such as the RAM 102 or the HDD 103.
  • the read program is executed by, for example, the CPU 101.
  • the recording medium 63 may be a portable recording medium and may be used for distribution of programs and data. Further, the recording medium 63 and the HDD 103 may be referred to as a computer-readable recording medium.
  • the NIC 107 is an interface that is connected to the network 60 and communicates with other computers via the network 60.
  • the NIC 107 is connected to a communication device such as a switch or a router by a cable.
  • FIG. 4 is a diagram showing an example of software in an information processing system.
  • the information processing system 50 includes data acquisition software 51, data processing software 52, data storage unit 53, and data shaping software 54.
  • the software mentioned here is an example, and the information processing system 50 may have other software that performs other processing in place of or in addition to at least a part of these software.
  • the data storage unit 53 is realized by a storage device included in the information processing system 50.
  • the data acquisition software 51, the data processing software 52, and the data shaping software 54 acquire the data processed by the software in the previous stage in this order, and output the processing result data.
  • the data acquisition software 51 acquires the input data A1 from the input data storage unit 70 and provides it to the data processing software 52.
  • the data processing software 52 generates the accumulated data A2 from the input data A1 by the data processing s1.
  • the data processing software 52 stores the accumulated data A2 in the data storage unit 53.
  • the data acquisition software 51 acquires the input data B1 from the input data storage unit 71 and provides it to the data processing software 52.
  • the data processing software 52 generates the accumulated data B2 from the input data B1 by the data processing s2.
  • the data processing software 52 stores the accumulated data B2 in the data storage unit 53.
  • the number of input data used to generate a certain stored data may be plural.
  • the data shaping software 54 acquires the accumulated data A2 and B2 stored in the data storage unit 53.
  • the data shaping software 54 generates the utilization data A3 from the accumulated data A2 by the data shaping process s3, and stores the utilization data A3 in the utilization data storage unit 80.
  • the data shaping software 54 generates the utilization data AB from the accumulated data A2 and B2 by the data shaping process s4, and stores the utilization data AB in the utilization data storage unit 80.
  • the data shaping software 54 generates the utilization data AB from the accumulated data A2 and B2 by the data shaping process s5, and stores the utilization data AB in the utilization data storage unit 80. It can be said that the utilization data is output data that is finally output by a series of data processing.
  • At least a part of the input data storage units 70 and 71 and the utilization data storage unit 80 may be realized by the storage device included in the information processing system 50. Further, at least a part of the input data storage units 70 and 71 and the utilization data storage unit 80 may be realized by a storage device external to the information processing system 50 and accessible from the information processing system 50 via the network 60. Further, since the "accumulated data" is intermediate data for creating "utilization data" from "input data", it can also be called "intermediate data". Further, the "data" may be a unit of information called a file, a table, a record, or the like.
  • FIG. 5 is a diagram showing a functional example of the information processing apparatus.
  • the information processing apparatus 100 includes a storage unit 110, a history information analysis unit 130, an evaluation unit 140, and a display control unit 150.
  • the storage area of the RAM 102 or the HDD 103 is used for the storage unit 110.
  • the history information analysis unit 130, the evaluation unit 140, and the display control unit 150 are realized by executing the program stored in the RAM 102 by the CPU 101.
  • the storage unit 110 stores information used for processing of the history information analysis unit 130, the evaluation unit 140, and the display control unit 150.
  • the information stored in the storage unit 110 includes the history information.
  • the history information is acquired from the information processing system 50 by the user and stored in the storage unit 110.
  • the history information may be created by analyzing the query execution log acquired from the information processing system 50 by the history information analysis unit 130 and stored in the storage unit 110.
  • the history information analysis unit 130 analyzes the history information stored in the storage unit 110.
  • the history information analysis unit 130 has an input data extraction unit 131, an access authority prediction unit 132, and a delay time calculation unit 133.
  • the input data extraction unit 131 extracts the input data of the start point of the data processing realized by the plurality of software in the information processing system 50 based on the history information, and stores the information of the input data of the start point in the storage unit 110.
  • the input data extraction unit 131 extracts the input data of the starting point corresponding to the utilization data by tracing back the generation flow of the utilization data based on the history information.
  • based on the history information and the information of the input data of the start point, the access authority prediction unit 132 predicts the access authority of the accumulated data generated from the input data and of the utilization data generated from the accumulated data.
  • the access authority prediction unit 132 stores the predicted access authority information in the storage unit 110.
  • the access authority prediction unit 132 specifies processing contents such as data processing and data shaping based on the history information.
  • the access authority prediction unit 132 predicts the access authority of the data output by the data processing or the data shaping based on the access authority of the data which is the input of the data processing or the data shaping and the specified processing content.
  • the delay time calculation unit 133 calculates the delay time from input to output in a series of data processing based on the history information and the generation log of the data processed by the information processing system 50, and stores the calculated delay time information in the storage unit 110.
  • the evaluation unit 140 evaluates the quality of the data generated by the data processing using the plurality of software in the information processing system 50 based on the analysis result by the history information analysis unit 130.
  • the evaluation unit 140 evaluates the quality of data according to the following three evaluation types.
  • the first evaluation type is data history.
  • the evaluation unit 140 evaluates the data history based on the information of the input data of the starting point extracted by the input data extraction unit 131. Specifically, the evaluation unit 140 evaluates the data history based on whether or not the input data of the starting point matches the data intended by the user.
  • the second evaluation type is security.
  • the evaluation unit 140 evaluates the security of the data based on the prediction result of the access authority by the access authority prediction unit 132. Specifically, the evaluation unit 140 evaluates the security of the data based on whether or not the access authority of the data generated by the information processing system 50 is an appropriate access authority, as predicted from the access authority of the data from which the data is generated.
  • the third evaluation type is up-to-dateness.
  • the evaluation unit 140 evaluates the up-to-dateness of the data based on the data generation delay time calculated by the delay time calculation unit 133. Specifically, the evaluation unit 140 evaluates the up-to-dateness of the data based on whether or not the data generated by the information processing system 50 is generated within an allowable delay time from the generation of the generation source data.
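  • the up-to-dateness check can be illustrated with a short sketch. The function name `up_to_dateness_value`, the evaluation values 1.0 and 0.0, and the timestamps and 30-minute allowance in the example are assumptions made here; the text above only states that the evaluation depends on whether the data is generated within an allowable delay time from the generation of its generation source data.

```python
from datetime import datetime, timedelta


def up_to_dateness_value(source_generated_at, output_generated_at, allowable_delay):
    """Return 1.0 if the output was generated within the allowable delay, else 0.0."""
    # Delay from the generation of the source data to the generation of the output.
    delay = output_generated_at - source_generated_at
    return 1.0 if delay <= allowable_delay else 0.0


# Hypothetical generation times of a generation source and of the data made from it.
source_time = datetime(2020, 12, 25, 9, 0)
output_time = datetime(2020, 12, 25, 9, 20)

print(up_to_dateness_value(source_time, output_time, timedelta(minutes=30)))
print(up_to_dateness_value(source_time, output_time, timedelta(minutes=10)))
```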
  • the evaluation unit 140 gives evaluation values for each evaluation type of data history, security, and up-to-dateness.
  • the evaluation value is an index showing the degree of high quality. The larger the evaluation value, the higher the quality, and the smaller the evaluation value, the lower the quality.
  • a predetermined value such as "1" may be set as an upper limit for the evaluation value for each evaluation type.
  • the evaluation unit 140 stores the evaluation value assigned to each data for each evaluation type in the storage unit 110. Further, the evaluation unit 140 performs a comprehensive evaluation of the utilization data based on the evaluation value for each evaluation type for each data, and stores the result of the comprehensive evaluation in the storage unit 110. The result of the comprehensive evaluation of the utilization data is also used as the result of the evaluation for the data processing that generated the utilization data.
  • the display control unit 150 displays the evaluation result by the evaluation unit 140 on the display 61, or transmits the evaluation result to another computer via the network 60.
  • the display control unit 150 outputs a data flow diagram in the information processing system 50, and controls the display mode of the icon indicating the data in the data flow diagram according to the evaluation value of the data. For example, the display control unit 150 distinguishes and displays data having an evaluation value higher than the reference and data having an evaluation value lower than the reference. Further, the display control unit 150 controls to narrow down the display to the data flow corresponding to the icon designated by the user among the plurality of data flows included in the data flow diagram.
  • FIG. 6 is a diagram showing an example of history information.
  • the history information 111 is stored in the storage unit 110 in advance.
  • a JSON (JavaScript Object Notation) format is shown as the data format, but other data formats may be used.
  • JavaScript is a registered trademark.
  • the history information 111 records the history of data processed by a plurality of software in the information processing system 50.
  • FIG. 6 shows a portion of the history information 111 showing that the accumulated data “data-A4” is aggregated by the data aggregation software and the utilization data “data-A5” is output.
  • the variable "typeName" indicates that the processing type of the data aggregation is a script generated using a predetermined script language such as Python (registered trademark). "XXX" may represent the name of the script language or the like.
  • the variable name "createdBy" and the value "create-user" indicate that the name of the user who created the corresponding script is "create-user".
  • "Aggregation application" indicates that the processing content of the corresponding software is "Aggregation application", that is, data aggregation.
  • the value "sample_user" of the variable "attributes.run_user" indicates that the name of the execution user of the corresponding data aggregation is "sample_user".
  • the value "sample_server" of the variable "attributes.server" indicates that the name of the execution server that executes the corresponding data aggregation software is "sample_server".
  • the value "data-A4" of the variable "attributes.inputs.name" indicates that the name of the input data for the data aggregation is "data-A4".
  • the value "hdfs_path" of the variable "attributes.inputs.typeName" indicates that the input data type is "hdfs_path".
  • hdfs is an abbreviation for Hadoop (registered trademark) Distributed File System.
  • the information of the storage unit of the acquisition source of the accumulated data "data-A4" may be included in the value of the variable "attributes.inputs.name" or in the value of the variable "attributes.inputs.typeName".
  • the value "data-A5" of the variable "attributes.outputs.name" indicates that the name of the output data corresponding to the data aggregation for the accumulated data "data-A4" is "data-A5".
  • the value "hdfs_path" of the variable "attributes.outputs.typeName" indicates that the output data type is "hdfs_path".
  • the information of the storage unit of the output destination of the utilization data "data-A5" may be included in the value of the variable "attributes.outputs.name" or in the value of the variable "attributes.outputs.typeName".
  • in the history information 111, information indicating the history of the processed data is recorded in the same data structure for the processing of other software.
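  • as a rough illustration of the variables described above, the following sketch parses one such record. The nesting of the JSON and the use of arrays for `inputs` and `outputs` are assumptions, since FIG. 6 itself is not reproduced here; only the variable names and values come from the description. The helper `summarize` is hypothetical.

```python
import json

# Assumed shape of one record of the history information 111; only the
# variable names and values are taken from the description of FIG. 6.
RECORD = json.loads("""
{
  "typeName": "XXX",
  "createdBy": "create-user",
  "attributes": {
    "run_user": "sample_user",
    "server": "sample_server",
    "inputs": [{"typeName": "hdfs_path", "name": "data-A4"}],
    "outputs": [{"typeName": "hdfs_path", "name": "data-A5"}]
  }
}
""")


def summarize(record):
    """Collect the execution user, server, and input/output data of one process."""
    attrs = record["attributes"]
    return {
        "run_user": attrs["run_user"],
        "server": attrs["server"],
        "inputs": [(i["name"], i["typeName"]) for i in attrs["inputs"]],
        "outputs": [(o["name"], o["typeName"]) for o in attrs["outputs"]],
    }


print(summarize(RECORD))
```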
  • Next, a method of evaluating data based on the history information 111 will be described.
  • First, the evaluation of the data history, that is, the provenance evaluation, will be described.
  • In the provenance evaluation, the output data is evaluated based on information about the starting-point input data of the data processing.
  • FIG. 7 is a diagram showing an example of provenance evaluation.
  • the history information 111 includes information indicating the history from the input data A1 to the utilization data A3. Further, the history information 111 includes information indicating the history from the stored data By stored in the data storage unit 53a to the utilization data Bz stored in the utilization data storage unit 81. The history information 111 indicates that the utilization data Bz is generated by the data shaping process s6 of the data shaping software 54 for the stored data By.
  • the user-owned input data list 112 is input to the information processing apparatus 100 by the user and stored in the storage unit 110.
  • The user-owned input data list 112 shows information on the data that the user assumes to be the starting-point input data of the data processing.
  • the user-owned input data list 112 includes data names and acquisition source items.
  • the data name is the name of the data.
  • the acquisition source is the name of the storage unit of the acquisition source of the data.
  • "DB_A1" is the name of the input data storage unit 70.
  • Based on the history information 111, the input data extraction unit 131 generates an input data list 113 for the user corresponding to the user-owned input data list 112.
  • the input data list 113 shows information on the input data of the starting point in the data processing used by the user.
  • the input data extraction unit 131 acquires the identification information of the user corresponding to the user-owned input data list 112.
  • the input data extraction unit 131 identifies a process (for example, "description") in which the identification information of the corresponding user is recorded as an execution user (for example, "run_user") from the history information 111.
  • the history information 111 may be information including only the history of data related to data processing used by the user. In this case, the input data extraction unit 131 can omit the process of distinguishing the provenance between the corresponding user and another user.
  • The input data extraction unit 131 specifies, for example, the history from the input data A1 to the utilization data A3 by tracing the input data and the output data of the processes specified for the corresponding user in order from the input data to the output data. Similarly, the input data extraction unit 131 specifies the history from the accumulated data By to the utilization data Bz. Note that FIG. 7 omits the illustration of the history of other data for the corresponding user.
  • Here, the direction from the input data to the output data is called the forward direction, and the direction from the output data to the input data is called the reverse direction.
  • the input data extraction unit 131 extracts the input data of the starting point used for obtaining the output data based on the history of the data related to the corresponding user extracted from the history information 111, and generates the input data list 113.
  • the input data list 113 is stored in the storage unit 110. It can be said that the input data list 113 is a list of input data of the starting point specified by the input data extraction unit 131.
  • The input data extraction unit 131 specifies the starting-point input data A1 used to obtain the utilization data A3 by tracing the history from the utilization data A3 in the reverse direction.
  • Data at which the reverse trace ends, that is, data for which no corresponding input exists in the history information 111 and which therefore cannot be traced any further in the reverse direction, corresponds to the starting-point input data.
  • the input data extraction unit 131 specifies the input data By of the starting point used for obtaining the utilization data Bz by tracing the history from the utilization data Bz in the reverse direction. Since the input data By of the start point is stored in the data storage unit 53, it can also be called the stored data By.
  • the input data extraction unit 131 generates an input data list 113 including the input data A1 and By of the specified start point.
  • the input data list 113 includes the data name and the item of the acquisition source, similarly to the user-owned input data list 112.
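  • The reverse trace described above can be sketched as a simple graph walk. This is a minimal sketch assuming the history has already been reduced to (input, output) name pairs; the function name `starting_inputs` and the edge list are hypothetical.

```python
def starting_inputs(edges, output):
    """Trace from `output` in the reverse direction and return the data
    that have no recorded input, i.e. the starting-point input data."""
    parents = {}
    for src, dst in edges:
        parents.setdefault(dst, set()).add(src)

    starts, seen, stack = set(), set(), [output]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node in parents:
            stack.extend(parents[node])   # keep tracing backwards
        else:
            starts.add(node)              # cannot be traced any further
    return starts


# Histories illustrated in FIG. 7: A1 -> A2 -> A3 and By -> Bz.
history_edges = [("A1", "A2"), ("A2", "A3"), ("By", "Bz")]
print(starting_inputs(history_edges, "A3"))
print(starting_inputs(history_edges, "Bz"))
```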
  • The evaluation unit 140 generates a mismatch list 114 by comparing the user-owned input data list 112 with the input data list 113 extracted from the history information 111.
  • The mismatch list 114 is stored in the storage unit 110.
  • The mismatch list 114 includes list, history, and acquisition source items. In the list item, the data names of records of the user-owned input data list 112 that do not exist in the input data list 113 are registered. In the history item, the data names of records of the input data list 113 that do not exist in the user-owned input data list 112 are registered. The acquisition source of the corresponding data is registered in the acquisition source item.
  • For example, for the user-owned input data list 112 and the input data list 113, a record including the data name "Bx" and the acquisition source "DB_Bx" from the user-owned input data list 112 is registered in the mismatch list 114. Further, a record including the data name "By" and the acquisition source "DB_By" from the input data list 113 is registered in the mismatch list 114.
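  • The comparison that yields the mismatch list 114 amounts to two set differences. The following is a minimal sketch; the tuple representation, the key names "list", "history", and "source", and the assumption that the record for A1 (acquisition source "DB_A1") appears in both lists are illustrative choices.

```python
def mismatch_list(user_owned, extracted):
    """Compare (data name, acquisition source) records of the two lists."""
    user_set, hist_set = set(user_owned), set(extracted)
    rows = []
    # Records only in the user-owned input data list go to the "list" item.
    for name, source in sorted(user_set - hist_set):
        rows.append({"list": name, "history": None, "source": source})
    # Records only in the list extracted from the history go to the "history" item.
    for name, source in sorted(hist_set - user_set):
        rows.append({"list": None, "history": name, "source": source})
    return rows


user_owned_112 = [("A1", "DB_A1"), ("Bx", "DB_Bx")]
extracted_113 = [("A1", "DB_A1"), ("By", "DB_By")]
mismatches_114 = mismatch_list(user_owned_112, extracted_113)
print(mismatches_114)
```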
  • The evaluation unit 140 generates a provenance evaluation result 115 based on the mismatch list 114.
  • the provenance evaluation result 115 is stored in the storage unit 110.
  • Provenance evaluation result 115 includes items of classification and evaluation value.
  • In the classification item, information representing a section of data processing within a series of data processing is registered.
  • In the evaluation value item, the evaluation value of the data quality for each classification is registered.
  • For example, when the data changes in the order of starting-point input data, accumulated data, and utilization data, the span from the input data to the accumulated data is classified as the first section, and the span from the accumulated data to the utilization data is classified as the second section.
  • The data may also change in the order of starting-point input data, first accumulated data, second accumulated data, and utilization data. In that case, the span from the input data to the first accumulated data may be classified as the first section, the span from the first accumulated data to the second accumulated data as the second section, and the span from the second accumulated data to the utilization data as the third section.
  • Among the data histories related to the corresponding user in the history information 111, the evaluation unit 140 identifies each history whose starting-point input data corresponds to a data name in the history item of the mismatch list 114 and its acquisition source, and sets the evaluation value to "0" for the classifications that follow that starting-point input data.
  • The evaluation unit 140 also identifies, among the data histories of the corresponding user in the history information 111, each history whose starting-point input data corresponds to a data name and acquisition source in the user-owned input data list 112.
  • The evaluation unit 140 sets the evaluation value to "1" for the classifications that follow the specified starting-point input data.
  • For example, in the provenance evaluation result 115, a record with an evaluation value of "1" is registered for the classification "between A1 and A2". This indicates that the provenance evaluation rates the quality of the accumulated data A2 generated from the input data A1 as "1".
  • In this way, the evaluation unit 140 can rate the quality of the utilization data and the intermediate data higher as more of the actual starting-point data is included in the user-owned input data list 112.
  • FIG. 8 is a diagram showing an example of security evaluation.
  • The history information 111 includes information indicating the history from the input data Ax stored in the input data storage unit 70, via the accumulated data Ay stored in the data storage unit 53, to the utilization data Az and AB1 stored in the utilization data storage unit 80.
  • the history information 111 indicates that the accumulated data Ay was generated by the data acquisition of the data acquisition software 51 for the input data Ax and the data processing process s1 of the data processing software 52. Further, the history information 111 indicates that the utilization data Az is generated by the data shaping process s3 of the data shaping software 54 for the accumulated data Ay. Further, the history information 111 indicates that the utilization data AB1 is generated by the data shaping process s4 of the data shaping software 54 for the accumulated data Ay.
  • the access authority prediction unit 132 acquires the input data access authority information 111a indicating the access authority to the input data of the starting point specified at the time of the provenance evaluation.
  • the input data access authority information 111a may be provided by the user or may be acquired from the information processing system 50. Alternatively, when the history information 111 includes data access authority information, the input data access authority information 111a may be acquired from the history information 111.
  • the input data access authority information 111a is stored in the storage unit 110.
  • the access authority indicates the usage restrictions of the relevant data, and indicates the personnel who can access the relevant data.
  • the access authority prediction unit 132 may specify the personnel who can access the data according to the information of the purpose of using the data (for example, the purpose of disclosure or secret management within the organization).
  • the access authority prediction unit 132 acquires the processing shaping processing information 111b based on the history information 111.
  • the processing shaping processing information 111b indicates the processing content used to obtain the output data from the input data in the corresponding processing.
  • the access authority prediction unit 132 derives the processing content by analyzing the query for the software specified by the history information 111.
  • the processing and shaping processing information 111b is stored in the storage unit 110.
  • the access authority prediction unit 132 predicts the access authority based on the input data access authority information 111a and the processing and shaping processing information 111b, and generates the access authority prediction result 116. Details of access authority prediction will be described later.
  • the access authority prediction result 116 is stored in the storage unit 110.
  • the access authority prediction result 116 includes the data name and access authority items.
  • the name of the data is registered in the item of the data name.
  • In the access authority item, the access authority predicted from the preceding data is registered for the corresponding data.
  • For example, in the access authority prediction result 116, a record indicating that the predicted access authority is "only the person in charge" is registered for the data name "Ay". Further, a record indicating that the predicted access authority is "only the data administrator" is registered for the data name "Az", and a record indicating that the predicted access authority is "anyone" is registered for the data name "AB1".
  • the access authority prediction unit 132 acquires the access authority information 117 indicating the actual access authority of each data, in addition to the access authority prediction result 116.
  • the access authority information 117 is acquired from the information processing system 50 and stored in the storage unit 110.
  • For example, the access authority information 117 includes a record indicating that the actual access authority is "anyone" for the data name "Ay".
  • The access authority information 117 includes a record indicating that the actual access authority is "anyone" for the data name "Az".
  • The access authority information 117 includes a record indicating that the actual access authority is "anyone" for the data name "AB1".
  • the evaluation unit 140 generates the security evaluation result 118 by comparing the access authority prediction result 116 with the access authority information 117.
  • the security evaluation result 118 is stored in the storage unit 110.
  • Security evaluation result 118 includes classification and evaluation value items. The meanings of the items of classification and evaluation value are the same as the meanings of the items of the same name in the history evaluation result 115.
  • the evaluation unit 140 compares the access authority prediction result 116 and the access authority information 117, and determines whether or not the predicted access authority and the actual access authority match with respect to the data having the same data name. When the predicted access authority and the actual access authority match, the evaluation unit 140 sets the evaluation value to "1" for the classification for which the corresponding data is output. On the other hand, when the predicted access authority and the actual access authority do not match, the evaluation unit 140 sets the evaluation value to "0" for the classification for which the corresponding data is output.
  • a record with an evaluation value of "0" is registered for the classification "between Ax and Ay". This indicates that the evaluation value of the quality of the accumulated data Ay generated based on the input data Ax by the security evaluation is "0".
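  • The comparison performed in the security evaluation can be sketched as follows. This is a minimal sketch keyed by data name rather than by classification; the function name `security_evaluation` and the dictionary layout are assumptions, while the authority values are those of the FIG. 8 example.

```python
def security_evaluation(predicted, actual):
    """Return 1 where predicted and actual access authority match, else 0."""
    return {
        name: 1 if actual.get(name) == authority else 0
        for name, authority in predicted.items()
    }


prediction_116 = {
    "Ay": "only the person in charge",
    "Az": "only the data administrator",
    "AB1": "anyone",
}
actual_117 = {"Ay": "anyone", "Az": "anyone", "AB1": "anyone"}

result = security_evaluation(prediction_116, actual_117)
print(result)
```

  • In this sketch the value for Ay is 0, which corresponds to the evaluation value "0" registered for the classification "between Ax and Ay", the classification whose output is Ay.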
  • FIG. 9 is a diagram showing an example of access authority prediction.
  • The access authority prediction unit 132 can predict the access authority of other data generated from the starting-point input data based on the input data access authority information 111a for that starting-point input data. For example, the access authority prediction unit 132 may acquire the input data access authority information 111a from the data catalog information in the information processing system 50, or may acquire input data access authority information 111a entered by the user.
  • Input data access authority information 111a includes data name, column, secret classification, and access authority items.
  • the name of the data is registered in the item of the data name.
  • The name of a column included in the corresponding data is registered in the column item.
  • A column is a data item contained in the corresponding data.
  • In the secret classification item, the secret classification indicating the level of secret management for the corresponding column of the corresponding data is registered.
  • In the access authority item, the access authority for the corresponding column of the corresponding data is registered.
  • For example, in the input data access authority information 111a, a record having the data name "Ax", the column "column_a", the secret classification "confidential", and the access authority "anyone in the company" is registered.
  • This record indicates that the secret classification of the information in the column "column_a" of the input data Ax is "confidential" and that the access authority is "anyone in the company".
  • Here, "in the company" refers to all users belonging to the company to which the corresponding user belongs.
  • the input data access authority information 111a may include secret classification and access authority information regarding the input data of another starting point.
  • The access authority prediction unit 132 acquires the processing shaping processing information 111b. As described above, the access authority prediction unit 132 generates the processing shaping processing information 111b by analyzing the query of each process included in a series of data processing. The access authority prediction unit 132 can also generate the processing shaping processing information 111b from the relationship between the input data and the output data of a process included in the history information 111. The processing shaping processing information 111b includes items of processing, input data, output data, input columns, and output columns. Identification information of the processing content in each software is registered in the processing item. Input data for the processing content is registered in the input data item. Output data for the processing content is registered in the output data item. The name of the column in the input data (input column) is registered in the input column item. The name of the column in the output data (output column) is registered in the output column item.
  • For example, the processing shaping processing information 111b includes a record of processing "processing s1", input data "Ax", output data "Ay", input column "column_a", and output column "column_a1". This record indicates that in the data processing process s1, the column column_a1 of the accumulated data Ay is generated based on the column column_a of the input data Ax.
  • The processing shaping processing information 111b also includes a record of processing "processing s1", input data "Ax", output data "Ay", input columns "column_b, column_c", and output column "column_bc". This record indicates that in the data processing process s1, the column column_bc of the accumulated data Ay is generated based on the columns column_b and column_c of the input data Ax.
  • the access authority prediction unit 132 generates an access authority prediction result 116a for each output column based on the input data access authority information 111a and the processing shaping processing information 111b.
  • the access authority of the output column is predicted based on the access authority of the input column. For example, when there is one input column corresponding to a certain output column, the access authority of the input column is the expected access authority to the output column. Further, when there are a plurality of input columns for a certain output column, the most restrictive access authority among the access authority of the plurality of input columns is the expected access authority for the output column.
  • For example, the column column_a1 is generated based on the column column_a.
  • The access authority of the column column_a is "anyone in the company". Therefore, the access authority prediction unit 132 predicts that the access authority of the column column_a1 of the accumulated data Ay is "anyone in the company". The access authority prediction unit 132 adds the predicted access authority "anyone in the company" to the prediction result 116a in association with the identification information of the column column_a1.
  • The column column_bc is generated based on the columns column_b and column_c.
  • The access authority of the column column_b is "only the person in charge", and the access authority of the column column_c is "anyone in the company". Therefore, the access authority prediction unit 132 predicts that the access authority of the column column_bc of the accumulated data Ay is "only the person in charge", which is the more restrictive of "only the person in charge" and "anyone in the company".
  • The access authority prediction unit 132 adds the predicted access authority "only the person in charge" to the prediction result 116a in association with the identification information of the column column_bc.
  • The access authority prediction unit 132 predicts the access authority of the accumulated data Ay based on the access authority predicted for each of the columns column_a1 and column_bc included in the accumulated data Ay, and generates the access authority prediction result 116. For example, the access authority prediction unit 132 may take the most restrictive of the access authorities predicted for all columns of the corresponding data as the predicted access authority of that data. In the example of the accumulated data Ay, of the access authorities "anyone in the company" and "only the person in charge" predicted for the columns column_a1 and column_bc, the most restrictive is "only the person in charge". Therefore, the access authority prediction unit 132 predicts that the access authority of the accumulated data Ay is "only the person in charge". The access authority prediction unit 132 adds the access authority "only the person in charge" predicted for the accumulated data Ay to the access authority prediction result 116.
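  • The column-level and data-level prediction can be sketched as taking the most restrictive authority at each step. In this minimal sketch, the numeric ranking in `RANK`, the record layout, and the function names are assumptions; the text states only that "only the person in charge" is more restrictive than "anyone in the company".

```python
# Assumed ordering of restrictiveness (larger value = more restrictive).
RANK = {"anyone": 0, "anyone in the company": 1, "only the person in charge": 2}

# Input data access authority information 111a: (data, column) -> access authority.
authority_111a = {
    ("Ax", "column_a"): "anyone in the company",
    ("Ax", "column_b"): "only the person in charge",
    ("Ax", "column_c"): "anyone in the company",
}

# Processing shaping processing information 111b (layout is hypothetical).
processing_111b = [
    {"input": "Ax", "output": "Ay", "in_cols": ["column_a"], "out_col": "column_a1"},
    {"input": "Ax", "output": "Ay", "in_cols": ["column_b", "column_c"], "out_col": "column_bc"},
]


def most_restrictive(authorities):
    return max(authorities, key=RANK.__getitem__)


def predict(authority, processing):
    per_column, per_data = {}, {}
    for p in processing:
        # The output column inherits the most restrictive input-column authority.
        sources = [authority[(p["input"], c)] for c in p["in_cols"]]
        predicted = most_restrictive(sources)
        per_column[(p["output"], p["out_col"])] = predicted
        # The data as a whole takes the most restrictive of its columns.
        current = per_data.get(p["output"], predicted)
        per_data[p["output"]] = most_restrictive([current, predicted])
    return per_column, per_data


prediction_116a, prediction_116 = predict(authority_111a, processing_111b)
print(prediction_116a)
print(prediction_116)
```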
  • The evaluation unit 140 may generate the security evaluation result 118 by comparing the prediction result 116a predicted for each column of the data with the actual access authority for each column of the data. In that case, for example, the evaluation unit 140 may assign a higher evaluation value to the data, that is, rate its quality higher, as more of its columns have matching predicted and actual access authority.
  • FIG. 10 is a diagram showing an example of up-to-dateness evaluation.
  • The history information 111 includes information indicating the history of the data exemplified in FIG.
  • the delay time calculation unit 133 acquires the delay requirement information 119.
  • In the delay requirement information 119, the time that the user allows as the delay from the generation of the starting-point input data to the update of the utilization data is registered.
  • the delay requirement information 119 is input to the information processing apparatus 100 by the user and stored in the storage unit 110.
  • Delay requirement information 119 includes data names and delay requirement items.
  • the data name of the utilization data is registered in the data name item.
  • In the delay requirement item, the allowable delay time from the generation of the starting-point input data to the update of the utilization data is registered.
  • For example, the upper limit of the delay time allowed from the generation of the starting-point input data to the update of the utilization data may be registered.
  • the delay requirement information 119 includes a record indicating that the delay requirement of the utilization data A3 is "within 2 hours”. Further, the delay requirement information 119 includes a record indicating that the delay requirement of the utilization data AB is "within 5 minutes”. Further, the delay requirement information 119 includes a record indicating that the delay requirement of the utilization data B3 is "within 1 minute”. The delay requirement information 119 may include a record of delay requirements for utilization data generated by other data processing.
  • the delay time calculation unit 133 generates the actual delay time information 120 based on the history information 111.
  • the actual delay time information 120 is stored in the storage unit 110.
  • the delay time calculation unit 133 acquires the data update log recorded by the information processing system 50 from the information processing system 50 and stores it in the storage unit 110.
  • the data update log contains information indicating the data name and the time when the data of the data name was updated.
  • the delay time calculation unit 133 generates the actual delay time information 120 based on the history information 111 and the data update log.
  • the actual delay time information 120 includes items of data name, update time, and delay time.
  • the data name is registered in the data name item.
  • In the update time item, the update time of the data with the corresponding data name is registered. For simplicity, the example of FIG. 10 shows update times that fall on the same day, but the update time may include a date.
  • In the delay time item, the time elapsed since the starting-point input data was updated, that is, the delay time, is registered. For the starting-point input data, the delay time is "-" (not set).
  • The delay time calculation unit 133 can acquire the data name and update time information from the above-mentioned data update log. Further, by tracing the history of the data based on the history information 111, the delay time calculation unit 133 can identify the starting-point input data and the subsequent data generated from it, and can calculate the delay time of the subsequent data.
  • the actual delay time information 120 includes a record with an update time "02:30" and a delay time "-" with respect to the input data A1. Since the input data A1 is the "input data of the starting point" specified by the input data extraction unit 131, the delay time is "-".
  • the actual delay time information 120 includes a record of the update time "13:30" and the delay time "-" with respect to the input data B1. Since the input data B1 is the "input data of the starting point" specified by the input data extraction unit 131, the delay time is "-".
  • the actual delay time information 120 includes a record of the update time "02:32” and the delay time "2 minutes” with respect to the accumulated data A2.
  • the stored data A2 is stored in the data storage unit 53 after data acquisition and data processing for the input data A1. Therefore, the delay time of the accumulated data A2 is the difference "2 minutes” between the update time "02:30" of the input data A1 and the update time "02:32" of the accumulated data A2.
  • the actual delay time information 120 includes a record of the update time “13:33” and the delay time “3 minutes” with respect to the accumulated data B2.
  • the actual delay time information 120 includes a record of the update time “04:00” and the delay time “1 hour 30 minutes” with respect to the utilization data A3.
  • the actual delay time information 120 includes a record of the update time “13:33” and the delay time “3 minutes” with respect to the utilization data AB.
  • the actual delay time information 120 includes a record of the update time “13:33” and the delay time “3 minutes” with respect to the utilization data B3.
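  • The delay time calculation can be sketched as a time difference against the starting-point input data. This is a minimal sketch: the mapping `start_of` is assumed to have been obtained beforehand by tracing the history information 111, only data with a single starting point are shown, and all update times are assumed to fall on the same day as in FIG. 10.

```python
from datetime import datetime, timedelta

# Data update log: data name -> update time (values from the FIG. 10 example).
update_log = {
    "A1": "02:30", "A2": "02:32", "A3": "04:00",
    "B1": "13:30", "B2": "13:33", "B3": "13:33",
}

# Hypothetical result of tracing the history: data name -> starting-point input data.
start_of = {"A2": "A1", "A3": "A1", "B2": "B1", "B3": "B1"}


def parse(hhmm):
    return datetime.strptime(hhmm, "%H:%M")


def delay_times(log, starts):
    """Delay of each data item relative to the update of its starting-point input."""
    return {name: parse(log[name]) - parse(log[start]) for name, start in starts.items()}


delays = delay_times(update_log, start_of)
for name, delay in delays.items():
    print(name, delay)
```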
  • Based on the actual delay time information 120, the evaluation unit 140 evaluates the up-to-dateness of the data by determining whether or not the delay time calculated for the utilization data satisfies the delay requirement of the delay requirement information 119, and generates the up-to-dateness evaluation result 121.
  • the up-to-dateness evaluation result 121 is stored in the storage unit 110.
  • the up-to-dateness evaluation result 121 includes items of classification and evaluation value. The meaning of the classification and the evaluation value is the same as the meaning of the item of the same name in the provenance evaluation result 115.
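  • When the delay time of the utilization data satisfies the delay requirement, the evaluation unit 140 sets the evaluation value of each classification leading to the corresponding utilization data to "1".
  • When the delay time of the utilization data does not satisfy the delay requirement, the evaluation unit 140 sets the evaluation value of each classification leading to the utilization data to "0".
  • When a classification leads to a plurality of utilization data and the delay requirement is not satisfied for at least one of them, the evaluation value of the classification is set to "0".
  • In the example of FIG. 10, the evaluation value of each classification leading to the utilization data A3 and AB is "1".
  • The evaluation value of each classification leading to the utilization data B3 is "0".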
  • the classification "between B1 and B2" is a classification linked to the utilization data AB and B3, but the evaluation value is "0" because the delay requirement is not satisfied for the utilization data B3.
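  • The rule for the up-to-dateness evaluation can be sketched as follows. This is a minimal sketch: the `paths` mapping from each utilization data to the classifications leading to it is hypothetical (in particular the classification names "B2-AB" and "B2-B3"), and the delay requirement "within N" is read as "less than or equal to N".

```python
from datetime import timedelta

requirement_119 = {
    "A3": timedelta(hours=2),
    "AB": timedelta(minutes=5),
    "B3": timedelta(minutes=1),
}
actual_delay_120 = {
    "A3": timedelta(hours=1, minutes=30),
    "AB": timedelta(minutes=3),
    "B3": timedelta(minutes=3),
}
# Hypothetical: classifications on the way to each utilization data.
paths = {
    "A3": ["A1-A2", "A2-A3"],
    "AB": ["B1-B2", "B2-AB"],
    "B3": ["B1-B2", "B2-B3"],
}


def freshness_evaluation(paths, requirement, delay):
    values = {}
    for data, classifications in paths.items():
        satisfied = 1 if delay[data] <= requirement[data] else 0
        for c in classifications:
            # A classification shared by several utilization data becomes 0
            # as soon as one of them misses its delay requirement.
            values[c] = min(values.get(c, 1), satisfied)
    return values


evaluation_121 = freshness_evaluation(paths, requirement_119, actual_delay_120)
print(evaluation_121)
```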
  • the history information 111 may include information corresponding to the data update log.
  • the delay time calculation unit 133 can generate the actual delay time information 120 from the history information 111 without separately acquiring the data update log from the information processing system 50.
  • the evaluation unit 140 evaluates the data quality by the history evaluation, the security evaluation, and the up-to-dateness evaluation based on the analysis result by the history information analysis unit 130. Further, the evaluation unit 140 performs a comprehensive evaluation of data quality based on the evaluation results of the history evaluation, the security evaluation, and the up-to-dateness evaluation. Next, the comprehensive evaluation will be described.
  • FIG. 11 is a diagram showing an example of a comprehensive evaluation result table.
  • The comprehensive evaluation result table 122 is generated by the evaluation unit 140 based on the provenance evaluation result 115, the security evaluation result 118, and the up-to-dateness evaluation result 121, and is stored in the storage unit 110.
  • the comprehensive evaluation result table 122 includes items of classification, history evaluation value, security evaluation value, up-to-dateness evaluation value, and comprehensive evaluation value.
  • Classification is registered in the classification item.
  • the meaning of the classification is the same as the meaning of the classification in the history evaluation result 115.
  • In the history evaluation value item, the evaluation value in the provenance evaluation result 115 for the corresponding classification, that is, the history evaluation value, is registered.
  • In the security evaluation value item, the evaluation value in the security evaluation result 118 for the corresponding classification is registered.
  • In the up-to-dateness evaluation value item, the evaluation value in the up-to-dateness evaluation result 121 for the corresponding classification is registered.
  • In the comprehensive evaluation value item, the comprehensive evaluation value calculated based on the history evaluation value, the security evaluation value, and the up-to-dateness evaluation value is registered.
  • For example, the comprehensive evaluation value is the sum of the history evaluation value, the security evaluation value, and the up-to-dateness evaluation value.
  • For example, the comprehensive evaluation result table 122 includes a record of a history evaluation value "1", a security evaluation value "1", an up-to-dateness evaluation value "1", and a comprehensive evaluation value "3" for the classification "A1-A2". Further, the comprehensive evaluation result table 122 includes a record of a history evaluation value "0", a security evaluation value "0", an up-to-dateness evaluation value "0", and a comprehensive evaluation value "0" for the classification "B1-B2". Records for other classifications are also registered in the comprehensive evaluation result table 122.
  • In the above example, the evaluation unit 140 uses the sum (V1 + V2 + V3) of the history evaluation value (V1), the security evaluation value (V2), and the up-to-dateness evaluation value (V3) as the comprehensive evaluation value.
  • V1, V2, and V3 are positive real numbers.
  • the evaluation unit 140 may use a weighted sum (w1 * V1 + w2 * V2 + w3 * V3) weighted for each of the provenance evaluation value, the security evaluation value, and the up-to-dateness evaluation value as the comprehensive evaluation value.
  • w1, w2, and w3 are positive real numbers.
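  • Both the plain sum and the weighted sum can be sketched with one small function. The function name `comprehensive_value` and the example weights are assumptions; with the default weights of 1 the result equals V1 + V2 + V3.

```python
def comprehensive_value(v1, v2, v3, w1=1.0, w2=1.0, w3=1.0):
    """Weighted sum w1*V1 + w2*V2 + w3*V3 of the three evaluation values."""
    return w1 * v1 + w2 * v2 + w3 * v3


# Classification "A1-A2" in FIG. 11: all three evaluation values are 1.
print(comprehensive_value(1, 1, 1))
# Classification "B1-B2": all three evaluation values are 0.
print(comprehensive_value(0, 0, 0))
# Hypothetical weighting that counts the security evaluation twice as much.
print(comprehensive_value(1, 1, 0, w2=2.0))
```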
  • the display control unit 150 causes the display 61 to display an evaluation result screen showing the evaluation result based on the comprehensive evaluation result table 122. Next, an example of the evaluation result screen will be described.
  • FIG. 12 is a diagram showing a first example of the evaluation result screen.
  • the evaluation result screen 400 includes images of the data flow diagram 401 and the legend 402.
  • the data flow diagram 401 is a diagram showing a data flow in the information processing system 50.
  • the display control unit 150 accepts the input of the user's identification information, and displays the flow of data related to the user's identification information as the data flow diagram 401 based on the history information 111.
  • the data flow is represented by an arrow.
  • Each arrow corresponds to one classification in the comprehensive evaluation result table 122.
  • the display control unit 150 presents the evaluation value for each classification, that is, the evaluation result of the quality, to the user by coloring the arrow.
  • Legend 402 indicates the quality level corresponding to each arrow color.
  • the example of FIG. 12 shows a case where quality is distinguished by three colors.
  • The first color represents quality "high".
  • The second color represents quality "medium".
  • The third color represents quality "low".
  • the first color is, for example, green.
  • the second color is, for example, yellow.
  • the third color is, for example, red.
  • The display control unit 150 draws in the first color the arrow of each classification whose comprehensive evaluation value in the comprehensive evaluation result table 122 is "3". The display control unit 150 draws in the third color the arrow of each classification whose comprehensive evaluation value is "0". The display control unit 150 draws in the second color the arrow of each classification whose comprehensive evaluation value is larger than 0 and smaller than 3.
  • the color coding of the arrows according to the comprehensive evaluation value may be performed by using two colors or four or more colors.
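The three-color rule above can be sketched as a small lookup. The function name `arrow_color` and the parameter `max_value` are hypothetical; treating values at or above the maximum as "high" and values at or below 0 as "low" is an assumption that generalizes the stated cases of exactly 3 and exactly 0.

```python
def arrow_color(value, max_value=3):
    """Map a comprehensive evaluation value to an arrow color (three-color case)."""
    if value >= max_value:
        return "green"   # first color: quality "high"
    if value <= 0:
        return "red"     # third color: quality "low"
    return "yellow"      # second color: quality "medium"


# Color every classification in a (partly hypothetical) result table.
values = {"A1-A2": 3, "B1-B2": 0, "A2-AB": 2}
colors = {name: arrow_color(v) for name, v in values.items()}
```

A scheme with two colors or with four or more colors would replace the two thresholds with a different set of cut points.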
  • The display control unit 150 displays, for example, a cross mark "X" superimposed on utilization data obtained via an arrow having a low comprehensive evaluation value (for example, a comprehensive evaluation value of less than 3), showing the user that this part needs to be reviewed.
  • the user can select an icon representing data or processing in the data flow diagram 401 by operating the pointer P1 displayed on the evaluation result screen 400 by the input device 62.
  • FIG. 13 is a diagram showing a second example of the evaluation result screen.
  • When the display control unit 150 detects that the icon of the utilization data AB on the evaluation result screen 400 has been selected by the pointer P1, it updates the evaluation result screen 400 to the evaluation result screen 500.
  • the evaluation result screen 500 includes images of the data flow diagram 501 and the legend 502.
  • the legend 502 is the same as the legend 402.
  • In the data flow diagram 501, a reverse-direction arrow that goes back from the selected utilization data AB to the accumulated data B2 via the data shaping process s4 is highlighted. Further, in the data flow diagram 501, a forward arrow indicating that the input data B1 reaches the accumulated data B2 via the data acquisition process and the data processing process s2 is highlighted. Other arrows are displayed in an inconspicuous manner.
  • When the route traced back branches, the display control unit 150 preferentially selects the branch having the lower comprehensive evaluation value and displays the reverse-direction arrow for it. More specifically, on the evaluation result screen 500, tracing back from the utilization data AB through the data shaping process s4 leads to a branch into the accumulated data A2 and B2. The display control unit 150 therefore highlights the arrow connected to the classification "B1-B2", which has the lower comprehensive evaluation value of the classifications "A1-A2" and "B1-B2". This makes it easier for the user to find the part that causes deterioration of data quality.
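The trace-back with preferential branch selection can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the data flow is assumed to be acyclic, each data item carries the comprehensive evaluation value of the classification that outputs it, and the names `trace_back`, `predecessors`, and `data_value` are hypothetical.

```python
def trace_back(selected, predecessors, data_value):
    """Walk the data flow backwards from `selected`.

    predecessors: data name -> list of data it was directly derived from
    data_value:   data name -> comprehensive evaluation value of the
                  classification that outputs that data
    At each branch the predecessor with the lowest value is followed.
    Start-point input data have no producing classification, so they
    default to infinity (an assumption; they are never part of a branch here).
    """
    path = [selected]
    current = selected
    while predecessors.get(current):
        current = min(predecessors[current],
                      key=lambda name: data_value.get(name, float("inf")))
        path.append(current)
    return path


# Example from the text: AB is derived from A2 and B2; "A1-A2" scores 3, "B1-B2" scores 0.
predecessors = {"AB": ["A2", "B2"], "A2": ["A1"], "B2": ["B1"]}
data_value = {"A2": 3, "B2": 0}
highlighted = trace_back("AB", predecessors, data_value)
```

Here `highlighted` is `["AB", "B2", "B1"]`, which corresponds to the highlighted route through the accumulated data B2 back to the input data B1.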
  • The evaluation result screens 400 and 500 for the comprehensive evaluation value have been illustrated, but the display control unit 150 may also display, based on the comprehensive evaluation result table 122, an evaluation result screen for each of the history evaluation value, the security evaluation value, and the up-to-dateness evaluation value.
  • FIG. 14 is a flowchart showing a processing example of the information processing apparatus.
  • The information processing apparatus 100 starts the following procedure when, for example, it receives an input from the user for starting a data quality evaluation.
  • The history information analysis unit 130 generates an input data list 113 corresponding to the identification information of the corresponding user based on the history information 111.
  • the evaluation unit 140 evaluates the history of data based on the user-owned input data list 112 and the input data list 113, and generates the history evaluation result 115.
  • The evaluation unit 140 stores the generated history evaluation result 115 in the storage unit 110. The procedure for the history evaluation will be described later.
  • The history information analysis unit 130 generates the access authority prediction result 116 based on the history information 111.
  • the evaluation unit 140 performs security evaluation of data based on the access authority prediction result 116 and the actual access authority information 117, and generates the security evaluation result 118.
  • the evaluation unit 140 stores the generated security evaluation result 118 in the storage unit 110. The security evaluation procedure will be described later.
  • The history information analysis unit 130 generates the actual delay time information 120 based on the history information 111.
  • The evaluation unit 140 evaluates the up-to-dateness of the data based on the delay requirement information 119 and the actual delay time information 120, and generates the up-to-dateness evaluation result 121.
  • the evaluation unit 140 stores the generated up-to-dateness evaluation result 121 in the storage unit 110. The procedure for up-to-date evaluation will be described later.
  • The evaluation unit 140 generates a comprehensive evaluation result table 122 based on the history evaluation result 115, the security evaluation result 118, and the up-to-dateness evaluation result 121.
  • the evaluation unit 140 stores the generated comprehensive evaluation result table 122 in the storage unit 110.
  • The evaluation unit 140 registers, in the comprehensive evaluation result table 122, the history evaluation value, the security evaluation value, and the up-to-dateness evaluation value of each classification in the history evaluation result 115, the security evaluation result 118, and the up-to-dateness evaluation result 121.
  • the evaluation unit 140 calculates the sum of the history evaluation value, the security evaluation value, and the up-to-dateness evaluation value as the comprehensive evaluation value for each classification, and registers the sum in the comprehensive evaluation result table 122.
  • the comprehensive evaluation value may be calculated by another calculation method such as a weighted sum of the history evaluation value, the security evaluation value, and the up-to-dateness evaluation value.
  • the display control unit 150 executes the evaluation result display control based on the comprehensive evaluation result table 122. The procedure for controlling the evaluation result display will be described later.
  • When the display control unit 150 receives the user's input for ending the evaluation result display, it ends the evaluation result display control and ends the display of the evaluation result screen. Then, the processing of the information processing apparatus 100 is completed.
  • FIG. 15 is a flowchart showing a history evaluation example. The history evaluation corresponds to step S10.
  • (S20) The input data extraction unit 131 acquires the user-owned input data list 112 and stores it in the storage unit 110. For example, the user-owned input data list 112 is input to the information processing apparatus 100 by the user.
  • the input data extraction unit 131 generates a list of input data of the actual start point, that is, an input data list 113, based on the history information 111 stored in the storage unit 110, and stores the input data list 113 in the storage unit 110. At this time, the input data extraction unit 131 identifies the history of the data used in the process corresponding to the identification information of the corresponding user based on, for example, the history information 111, and generates the input data list 113 from the specified history.
  • the evaluation unit 140 specifies the utilization data to be evaluated. In the history evaluation process, it is assumed that the initial value of the history evaluation value of each data at the time when step S22 is first executed is 0.
  • The evaluation unit 140 determines, based on the user-owned input data list 112 and the input data list 113, whether or not the input data of the actual start point corresponding to the utilization data in question matches the user-owned input data. If they match, the evaluation unit 140 proceeds to step S24. If they do not match, the evaluation unit 140 proceeds to step S25.
  • The evaluation unit 140 adds points to the evaluation value of the history evaluation of the utilization data in question, that is, the history evaluation value. In addition, the evaluation unit 140 also adds points to the history evaluation value of the intermediate data (accumulated data) leading to that utilization data. In adding points, for example, a history evaluation value of "1" (the unit score) is added. However, the value to be added may be other than "1", and as described above, the history evaluation value for the data in question may be given so as not to exceed a predetermined upper limit value (for example, "1").
  • the quality evaluation value for a certain data corresponds to the evaluation value of the classification that outputs the data. For example, the evaluation value for the classification "between A1 and A2" in the comprehensive evaluation result table 122 can be said to be the evaluation value for the accumulated data A2.
  • The evaluation unit 140 determines whether or not all the utilization data to be evaluated have been evaluated. If all the utilization data to be evaluated have been evaluated, the evaluation unit 140 ends the history evaluation. If not all the utilization data to be evaluated have been evaluated, the evaluation unit 140 proceeds to step S22.
  • The evaluation unit 140 may add points to the history evaluation values of the utilization data and the intermediate data in proportion to the number of actual start-point input data items that are included in the user-owned input data list 112. For example, when two of the actual start-point input data items are included in the user-owned input data list 112, the evaluation unit 140 may add twice the unit score a of the evaluation value to the history evaluation value of each of the utilization data and the intermediate data.
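The history evaluation loop described above can be sketched as follows. This is a simplified illustration, not the patent's implementation. It assumes that a "match" means every actual start-point input of a utilization data item appears in the user-owned input data list; the function name, the argument names, and the data names "A3" and "B3" are hypothetical.

```python
def evaluate_history(start_points, intermediates, user_owned, unit=1, upper=1):
    """Return a dict mapping each data name to its history evaluation value.

    start_points:  utilization data -> set of its actual start-point input data
    intermediates: utilization data -> list of intermediate (accumulated) data
    user_owned:    set of input data names in the user-owned input data list
    """
    scores = {}
    for util, starts in start_points.items():
        targets = [util, *intermediates.get(util, [])]
        for name in targets:
            scores.setdefault(name, 0)  # the initial value is 0
        # Assumption: "match" means every actual start point is user-owned.
        if starts and starts <= user_owned:
            for name in targets:
                # Add the unit score without exceeding the upper limit.
                scores[name] = min(scores[name] + unit, upper)
    return scores


# Hypothetical flows: "A3" derives from "A1" via "A2", "B3" from "B1" via "B2".
start_points = {"A3": {"A1"}, "B3": {"B1"}}
intermediates = {"A3": ["A2"], "B3": ["B2"]}
history_scores = evaluate_history(start_points, intermediates, user_owned={"A1"})
```

With the user-owned list containing only "A1", the data "A3" and "A2" receive 1 while "B3" and "B2" stay at 0. The variant that scales the added points by the number of matching start points would replace `unit` with `unit * len(starts & user_owned)` and raise `upper` accordingly.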
  • FIG. 16 is a flowchart showing an example of security evaluation.
  • the security evaluation corresponds to step S11.
  • the access authority prediction unit 132 identifies the input data of the starting point corresponding to the identification information of the user to be processed based on the input data list 113 generated in step S21.
  • the access authority prediction unit 132 acquires the input data access authority information 111a related to the input data of the specified start point and stores it in the storage unit 110. Further, the access authority prediction unit 132 acquires the processing and shaping processing information 111b related to the processing from the input data of the start point to the utilization data based on the history information 111, and stores it in the storage unit 110.
  • the access authority prediction unit 132 predicts the access authority of other data obtained from the input data of the start point based on the history information 111, generates the access authority prediction result 116, and stores it in the storage unit 110.
  • Other data obtained from the input data of the start point includes accumulated data (intermediate data) and utilization data generated based on the input data of the start point.
  • the access authority prediction unit 132 acquires the access authority information 117 indicating the actual data access authority from the data catalog or the like of the information processing system 50 and stores it in the storage unit 110.
  • the evaluation unit 140 specifies the data to be evaluated. Candidates for the data to be evaluated are all accumulated data and all utilization data included in the access authority prediction result 116. The evaluation unit 140 specifies one data to be evaluated from the data candidates to be evaluated. In the security evaluation process, the initial value of the security evaluation value of each data at the time when step S32 is first executed is assumed to be 0.
  • the evaluation unit 140 determines whether or not the predicted access authority for the data to be evaluated in the access authority prediction result 116 matches the actual access authority in the access authority information 117. If they match, the evaluation unit 140 proceeds to step S34. If they do not match, the evaluation unit 140 proceeds to step S35.
  • the evaluation unit 140 adds points to the evaluation value of the security evaluation of the data to be evaluated, that is, the security evaluation value.
  • In adding points, for example, a security evaluation value of "1" is added.
  • However, the security evaluation value to be added may be other than "1", and as described above, the security evaluation value may be given to the data in question so as not to exceed a predetermined upper limit value (for example, "1").
  • The evaluation unit 140 determines whether or not all the data to be evaluated have been evaluated. If all the data to be evaluated have been evaluated, the evaluation unit 140 ends the security evaluation. If not all the data to be evaluated have been evaluated, the evaluation unit 140 proceeds to step S32.
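The comparison at the core of the security evaluation can be sketched as follows. This sketch takes the predicted access authority as given and does not model how it is predicted. Representing an access authority as a `frozenset` of user names, and the names `evaluate_security`, "alice", and "bob", are assumptions made for illustration.

```python
def evaluate_security(predicted, actual, unit=1):
    """Return a dict mapping each data name to its security evaluation value.

    predicted: data name -> predicted access authority
    actual:    data name -> actual access authority (e.g. from a data catalog)
    A data item scores `unit` when both authorities match, otherwise 0.
    Data missing from `actual` is treated as a mismatch (an assumption).
    """
    return {name: (unit if actual.get(name) == authority else 0)
            for name, authority in predicted.items()}


# Hypothetical authorities: B2 is readable by more users than predicted.
predicted = {"A2": frozenset({"alice"}), "B2": frozenset({"alice"})}
actual = {"A2": frozenset({"alice"}), "B2": frozenset({"alice", "bob"})}
security_scores = evaluate_security(predicted, actual)
```

Here "A2" scores 1 and "B2" scores 0, because the actual access authority of "B2" differs from the prediction.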
  • FIG. 17 is a flowchart showing an example of up-to-date evaluation.
  • the up-to-dateness evaluation corresponds to step S12.
  • the delay time calculation unit 133 acquires the delay requirement information 119 regarding the utilization data.
  • the delay requirement information 119 is input to the information processing apparatus 100 by the user.
  • the delay time calculation unit 133 calculates the delay time of the actual data update for the utilization data based on the history information 111. As described above, the delay time calculation unit 133 can use the data update log acquired from the information processing system 50 to calculate the delay time for data update. The delay time calculation unit 133 records the calculated delay time in the actual delay time information 120 stored in the storage unit 110.
  • the evaluation unit 140 specifies the utilization data to be evaluated. In the up-to-dateness evaluation process, it is assumed that the initial value of the up-to-dateness evaluation value of each data at the time when step S42 is first executed is 0.
  • The evaluation unit 140 determines, based on the delay requirement information 119 and the actual delay time information 120, whether or not the delay time calculated for the utilization data in question is within the permissible range given by the delay requirement information 119. If it is within the permissible range, the process proceeds to step S44. If it is not within the permissible range, the process proceeds to step S45.
  • The evaluation unit 140 adds points to the evaluation value of the up-to-dateness evaluation of the utilization data in question, that is, the up-to-dateness evaluation value. In addition, the evaluation unit 140 also adds points to the up-to-dateness evaluation value of the intermediate data (accumulated data) leading to that utilization data. In adding points, for example, an up-to-dateness evaluation value of "1" is added. However, the up-to-dateness evaluation value to be added may be other than "1", and as described above, the up-to-dateness evaluation value may be given to the data in question so as not to exceed a predetermined upper limit value (for example, "1").
  • The evaluation unit 140 determines whether or not all the utilization data to be evaluated have been evaluated. If all the utilization data to be evaluated have been evaluated, the evaluation unit 140 ends the up-to-dateness evaluation. If not all the utilization data to be evaluated have been evaluated, the evaluation unit 140 proceeds to step S42.
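The up-to-dateness evaluation can be sketched in the same style. This is an illustration under assumptions: delays are plain numbers in the same unit (for example, seconds), a delay equal to the allowed delay counts as within the permissible range, and the function and data names are hypothetical.

```python
def evaluate_up_to_dateness(allowed, actual, intermediates, unit=1, upper=1):
    """Return a dict mapping each data name to its up-to-dateness evaluation value.

    allowed:       utilization data -> allowed delay (delay requirement)
    actual:        utilization data -> delay calculated from update logs
    intermediates: utilization data -> list of intermediate (accumulated) data
    """
    scores = {}
    for util, limit in allowed.items():
        targets = [util, *intermediates.get(util, [])]
        for name in targets:
            scores.setdefault(name, 0)  # the initial value is 0
        # Assumption: a missing measurement counts as outside the range.
        if util in actual and actual[util] <= limit:
            for name in targets:
                scores[name] = min(scores[name] + unit, upper)
    return scores


# Hypothetical requirements and measurements, in seconds.
allowed = {"AB": 60, "C3": 10}
actual = {"AB": 45, "C3": 30}
intermediates = {"AB": ["A2", "B2"], "C3": ["C2"]}
freshness_scores = evaluate_up_to_dateness(allowed, actual, intermediates)
```

In this example "AB" meets its 60-second requirement, so "AB", "A2", and "B2" each receive 1, while "C3" and "C2" stay at 0.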
  • FIG. 18 is a flowchart showing an evaluation result display control example.
  • the evaluation result display control corresponds to step S14.
  • the display control unit 150 accepts the user's selection of the evaluation type to be displayed.
  • Evaluation types include history evaluation, security evaluation, up-to-date evaluation, and comprehensive evaluation. The user may select one of these evaluation types, or may select a combination of two of the history evaluation, the security evaluation, and the up-to-date evaluation. In the following, the case where the comprehensive evaluation is mainly selected is illustrated, but the procedure is the same for other evaluation types.
  • the display control unit 150 displays the evaluation result screen 400 showing the history of the data corresponding to the identification information of the corresponding user on the display 61.
  • The display control unit 150 may transmit the information of the evaluation result screen 400 to another device, such as a client device connected via the network 60, and cause that device to display the evaluation result screen 400 on a display connected to it.
  • The display control unit 150 displays, in the evaluation result screen 400, the data flow diagram 401 in which the arrows indicating classifications are color-coded using the evaluation value (here, the comprehensive evaluation value) corresponding to the evaluation type selected in step S50.
  • the evaluation result screen 400 may include the image of the legend 402. When two evaluation types are selected in step S50, the display control unit 150 may color-code the arrows according to the sum of the evaluation values of the two evaluation types for the corresponding classification.
  • the display control unit 150 determines whether or not any data is selected in the data flow diagram 401. If there is such a selection, the display control unit 150 proceeds to step S53. If there is no such selection, the display control unit 150 proceeds to step S54.
  • the user can operate the input device 62 to select the data displayed in the data flow diagram 401 by the pointer P1.
  • Alternatively, the user may operate the input device of the client device to input the selection of any data to the information processing apparatus 100.
  • The display control unit 150 displays, based on the comprehensive evaluation value, a data flow that traces back from the selected data. For example, when the utilization data AB is selected in the data flow diagram 401, the display control unit 150 displays an evaluation result screen 500 including a data flow that traces back from the utilization data AB to the accumulated data B2, which is the preceding data. As described above, when there is a branch in the route traced back, the display control unit 150 may preferentially select, among the series leading to the utilization data AB, the series that includes many classifications having a low comprehensive evaluation value, and highlight the classifications of the selected branch.
  • In this example, the display control unit 150 highlights the arrow connected to the classification "B1-B2", which has the lower comprehensive evaluation value of the classifications "A1-A2" and "B1-B2".
  • the display control unit 150 determines whether or not the input for ending the display has been accepted. When the display end input is received, the display control unit 150 ends the evaluation result display control. If the display control unit 150 does not accept the input for ending the display, the display control unit 150 proceeds to step S52.
  • According to the information processing apparatus 100, it is possible to realize a service for evaluating data quality in an information processing infrastructure.
  • the information processing apparatus 100 evaluates the flow of data across software products by utilizing the history information of the data. Further, the information processing apparatus 100 displays the problematic portion on the data flow diagram based on the evaluation result.
  • The information processing apparatus 100 appropriately evaluates the quality of data in the data processing by the plurality of software based on the history information 111. For example, in a series of data processing by a plurality of software, the quality of the data output by the data processing changes depending on whether or not the data intended by the user is processed. For example, when performing data processing such as analysis, if input data not intended by the user is processed, the input data may contain unnecessary or incorrect information. The result of the data processing is then more likely to be wrong, which reduces the reliability of the result.
  • the information processing apparatus 100 specifies the input data of the start point of the data processing by the plurality of software based on the history information 111. By confirming whether the input data of the start point is the input intended by the user, the quality of the utilization data and the accumulated data output by the data processing can be appropriately evaluated. In addition, the evaluation can be performed faster than the user can perform.
  • the information processing apparatus 100 evaluates the quality of data in a plurality of evaluation types such as security evaluation and up-to-dateness evaluation in addition to the history evaluation, and comprehensively evaluates the quality of the data from the evaluation results in the plurality of evaluation types. Therefore, the quality of the data can be evaluated more appropriately.
  • the information processing apparatus 100 performs the following processing.
  • the history information analysis unit 130 extracts information on the first input data, which is a starting point of data processing by the plurality of software, based on the history information 111 showing the history of input data and output data for each of the plurality of software.
  • The evaluation unit 140 evaluates the quality of the first output data output by the data processing according to a comparison of whether or not the information of the first input data matches the information of the predetermined data input by the user.
  • The evaluation unit 140 also evaluates, according to the comparison of whether or not the information of the first input data matches the information of the predetermined data input by the user, the quality of the intermediate data passed through from the first input data to the first output data.
  • The history information analysis unit 130 acquires the information of the first access authority to the first input data, and predicts the second access authority to the first output data based on the first access authority and the history information 111.
  • the evaluation unit 140 evaluates the quality of the first output data according to the comparison of whether or not the actual access authority of the first output data matches the predicted second access authority.
  • Depending on whether the access authority of the first output data predicted from the history information 111 matches the actual access authority, it is possible to evaluate whether or not the first output data was generated through an appropriate process. For example, if the access authorities do not match, inappropriate processing that the user did not anticipate may have been performed in the process of generating the first output data. Therefore, if the predicted access authority does not match the actual access authority, the quality of the corresponding data is judged to be low.
  • The history information analysis unit 130 predicts, based on the first access authority, the third access authority for the intermediate data passed through from the first input data to the first output data. The history information analysis unit 130 predicts the second access authority based on the predicted third access authority.
  • the second access authority can be appropriately predicted by predicting the access authority of the data in order in the forward direction of the data flow based on the prediction result of the access authority to the intermediate data.
  • the evaluation unit 140 may evaluate the quality of the intermediate data according to the comparison of whether or not the actual access authority of the intermediate data matches the predicted third access authority. This makes it possible to appropriately evaluate the quality of not only the first output data, which is the final output of the data processing, but also the intermediate data generated in the process of data processing.
  • the history information analysis unit 130 acquires information on the first delay time allowed from the generation of the first input data to the update of the first output data.
  • the information of the first delay time is input by the user, for example.
  • The history information analysis unit 130 calculates a second delay time from the generation of the first input data to the update of the first output data based on the data update history information and the history information 111.
  • the evaluation unit 140 evaluates the quality of the first output data according to the comparison of whether or not the second delay time is shorter than the first delay time.
  • Depending on whether the delay time from the generation of the first input data to the update of the first output data meets the user's delay requirement, it is possible to evaluate whether or not the first output data was generated through an appropriate process. For example, if the second delay time is longer than the first delay time, the delay requirement is not satisfied, and an abnormality or performance deterioration may have occurred in the process from the first input data to the first output data. Therefore, when the second delay time is longer than the first delay time, it is determined that the quality of the first output data is low.
  • According to the comparison between the second delay time and the first delay time, the evaluation unit 140 may evaluate, in addition to the quality of the first output data, the quality of the intermediate data passed through from the first input data to the first output data. In this case as well, if the second delay time is longer than the first delay time, the quality of the intermediate data is judged to be low.
  • the display control unit 150 causes the display device to display the data flow diagram 401 from the first input data to the first output data.
  • The display control unit 150 changes the display mode of an image element that is included in the data flow diagram 401 and shows the relationship between the first input data and the first output data, based on the result of the quality evaluation for the first output data.
  • The arrows indicating the data flow in the data flow diagram 401 are an example of image elements.
  • the display 61 is an example of a display device.
  • the display device may be a display device connected to another information processing device. In that case, the display control unit 150 performs display control by transmitting information on the display content to another information processing device via the network 60.
  • The display control unit 150 accepts the selection of the image showing the first output data included in the data flow diagram 401 displayed on the display device. Then, from among a plurality of image elements showing the relationships between the data leading to the first output data, the display control unit 150 selects the image element to be highlighted based on the evaluation values of the data leading to the first output data, and highlights the selected image element.
  • The data flow leading to the first output data may contain a plurality of series, each based on the input data of a different start point (for example, the series of the classification "A1-A2" and the series of the classification "B1-B2").
  • In that case, the display control unit 150 selects and highlights an image element belonging to a series containing many classifications having low evaluation values, or to a series having a low average value of the evaluation values of its classifications.
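The two selection criteria named above (the number of low-valued classifications, or the average evaluation value) can be sketched as follows. The function name `pick_series`, the threshold for "low", the series labels, and the second evaluation value in each series are all hypothetical.

```python
from statistics import mean


def pick_series(series, by="count", low_threshold=3):
    """Choose the series whose image elements should be highlighted.

    series: series name -> list of evaluation values of its classifications
    by:     "count"   selects the series with the most classifications whose
                      value is below `low_threshold`;
            "average" selects the series with the lowest mean value.
    Ties are resolved in favor of the series listed first.
    """
    if by == "count":
        return max(series,
                   key=lambda s: sum(v < low_threshold for v in series[s]))
    if by == "average":
        return min(series, key=lambda s: mean(series[s]))
    raise ValueError("by must be 'count' or 'average'")


# Two series leading to the same output; the first value in each list is the
# value of "A1-A2" or "B1-B2", the second is a hypothetical later stage.
series = {"A": [3, 2], "B": [0, 2]}
chosen = pick_series(series)
```

Both criteria select series "B" for this data: it contains two classifications below the threshold against one in series "A", and its mean of 1 is lower than 2.5.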
  • the information processing of the first embodiment can be realized by causing the processing unit 12 to execute the program. Further, the information processing of the second embodiment can be realized by causing the CPU 101 to execute the program.
  • the program can be recorded on a computer-readable recording medium 63.
  • the program can be distributed by distributing the recording medium 63 on which the program is recorded.
  • the program may be stored in another computer and distributed via the network.
  • The computer may store (install) a program recorded on the recording medium 63 or a program received from another computer in a storage device such as the RAM 102 or the HDD 103, read the program from the storage device, and execute it.
  • Information processing device 11 Storage unit 12 Processing unit 20 History information 31, 32 Software 41, 42, 43 Data storage unit d1, d2, d3 Data S1, S2 Step

Abstract

The present invention makes it possible to evaluate the quality of data appropriately. In the present invention, a storage unit stores history information indicating a history of input data and output data for each of a plurality of pieces of software. A processing unit extracts information of first input data, which serves as a start point of data processing executed by the plurality of pieces of software, on the basis of the history information stored in the storage unit. The processing unit evaluates the quality of first output data output through the data processing in accordance with a comparison of whether or not the information of the first input data matches the information of predetermined data input by a user.

Description

Information processing program, information processing method, and information processing device

The present invention relates to an information processing program, an information processing method, and an information processing device.

In an information processing system, large amounts of data are processed by various software products. A series of data processing operations may be realized by combining multiple software products.
For example, a method has been proposed for managing the flow of data processed in an information processing system by using data lineage information that represents relationships between data. The proposed method detects associations between physical data elements and business data elements based on a first data lineage, which represents relationships between physical data elements, and a second data lineage, which represents relationships between business data elements.

There is also a proposal for a control device that acquires data including a value indicating the received signal strength when a base station device receives a signal from a terminal device, or a value indicating the degree of error, and derives a quality index of the received signal of the base station device based on the acquired data.

International Publication No. 2018/089633
Japanese Unexamined Patent Publication No. 2019-110497

When a series of data processing operations is realized by combining a plurality of software products, the question arises of whether the data output by that processing was generated through the process the user intended. For example, if the source of the data being processed is not the intended one, the reliability of the processing result decreases. However, no mechanism has been established for managing the quality of data output through such a series of data processing operations by a plurality of software products.

In one aspect, an object of the present invention is to provide an information processing program, an information processing method, and an information processing device that appropriately evaluate the quality of data.

In one aspect, an information processing program is provided. The information processing program causes a computer to execute a process of: extracting, based on history information indicating a history of input data and output data for each of a plurality of pieces of software, information on first input data that is the starting point of data processing by the plurality of pieces of software; and evaluating the quality of first output data output by the data processing according to a comparison of whether the information on the first input data matches information on predetermined data input by a user.

Further, in one aspect, an information processing method is provided.
Further, in one aspect, an information processing device is provided.

In one aspect, the quality of data can be appropriately evaluated.
The above and other objects, features, and advantages of the present invention will become apparent from the following description, taken in conjunction with the accompanying drawings, which represent preferred embodiments as examples of the present invention.

A diagram illustrating the information processing device of the first embodiment.
A diagram showing a system example of the second embodiment.
A diagram showing a hardware example of the information processing device.
A diagram showing an example of software in the information processing system.
A diagram showing a functional example of the information processing device.
A diagram showing an example of history information.
A diagram showing an example of history evaluation.
A diagram showing an example of security evaluation.
A diagram showing an example of access authority prediction.
A diagram showing an example of up-to-dateness evaluation.
A diagram showing an example of the comprehensive evaluation result table.
A diagram showing a first example of the evaluation result screen.
A diagram showing a second example of the evaluation result screen.
A flowchart showing a processing example of the information processing device.
A flowchart showing an example of history evaluation.
A flowchart showing an example of security evaluation.
A flowchart showing an example of up-to-dateness evaluation.
A flowchart showing an example of evaluation result display control.

Hereinafter, the embodiments will be described with reference to the drawings.
[First Embodiment]
The first embodiment will be described.

FIG. 1 is a diagram illustrating the information processing device of the first embodiment.
The information processing device 10 has a storage unit 11 and a processing unit 12. The storage unit 11 may be a volatile storage device such as a RAM (Random Access Memory) or a non-volatile storage device such as an HDD (Hard Disk Drive) or a flash memory. The processing unit 12 may include a CPU (Central Processing Unit), a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), and the like. The processing unit 12 may be a processor that executes a program. The term "processor" here may also cover a set of a plurality of processors (a multiprocessor).

The storage unit 11 stores history information 20 indicating a history of input data and output data for each of a plurality of pieces of software. The history information 20 is acquired, for example, from an information processing system (not shown) that executes the plurality of pieces of software. For example, the information processing device 10 may be connected to the information processing system via a network, acquire the history information 20 from the information processing system via the network, and store it in the storage unit 11. Alternatively, the history information 20 may be input to the information processing device 10 by a user and stored in the storage unit 11. The history information 20 may be data lineage information indicating a history such as the source of data and the processing and shaping applied to the data.

Based on the history information 20 stored in the storage unit 11, the processing unit 12 extracts information on first input data, which is the starting point of data processing by the plurality of pieces of software. The processing unit 12 evaluates the quality of first output data output by the data processing according to a comparison of whether the information on the first input data matches information on predetermined data input by the user.

The information on the first input data may be, for example, identification information such as the data name of the first input data, or identification information of the data storage unit 41 that stores the first input data. The identification information of the data storage unit 41 may be, for example, the DB name or directory path of the DB (database) corresponding to the data storage unit 41, or the identification name of the storage device that provides the data storage unit 41. The information on the predetermined data is information about the data that the user expects as input, such as identification information of the data the user assumes to be the proper input, or identification information of the data storage unit that stores that data.

For example, the history information 20 includes information on data processing by software 31 and 32 executed in the information processing system. The data used in this data processing are data d1, d2, and d3. For example, the history information 20 indicates that the data d1 stored in the data storage unit 41 is input to the software 31, that the data d2 is generated from the data d1 by the process p1 of the software 31, and that the data d2 is stored in the data storage unit 42.

The history information 20 also indicates that the data d2 stored in the data storage unit 42 is input to the software 32, that the data d3 is generated from the data d2 by the process p2 of the software 32, and that the data d3 is stored in the data storage unit 43. The data storage units 41, 42, and 43 may be realized by a storage device included in the information processing system or by a storage device external to the information processing system.

For example, the processing unit 12 evaluates data quality based on the history information 20 as follows. First, based on the history information 20, the processing unit 12 identifies the data d3 output by the data processing by the software 31 and 32. The processing unit 12 then identifies, based on the history information 20, that the source data from which the data d3 was generated is the data d2. Further, the processing unit 12 identifies, based on the history information 20, that the source data from which the data d2 was generated is the data d1.

Based on the history information 20, the processing unit 12 detects that the data d1 has no preceding source data. The processing unit 12 therefore identifies the data d1 as the input data at the starting point of the data processing that outputs the data d3. Accordingly, the processing unit 12 extracts, based on the history information 20, the information on the data d1 as the information on the starting-point input data for the data d3 (step S1). The data d1 is an example of the first input data.
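The backward trace of step S1 can be illustrated with a minimal Python sketch. The record layout `HistoryRecord`, the identifiers such as `software31` and `store41`, and the function name `find_start_input` are assumptions introduced only for this illustration; the specification does not prescribe a concrete data format for the history information 20. The sketch also assumes each data item has a single source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HistoryRecord:
    """One history entry: `software` reads `source` and writes `output`."""
    software: str
    process: str
    source: str        # name of the input data
    source_store: str  # data storage unit holding the input data
    output: str        # name of the output data
    output_store: str  # data storage unit holding the output data


# Hypothetical encoding of the d1 -> d2 -> d3 example described above.
HISTORY = [
    HistoryRecord("software31", "p1", "d1", "store41", "d2", "store42"),
    HistoryRecord("software32", "p2", "d2", "store42", "d3", "store43"),
]


def find_start_input(history, output_name):
    """Walk back from `output_name` until data with no source data is found.

    Returns (data name, storage unit). The storage unit is None when
    `output_name` itself has no producing record.
    """
    producer = {rec.output: rec for rec in history}
    current, store = output_name, None
    visited = set()
    while current in producer:
        if current in visited:
            raise ValueError(f"cycle detected at {current}")
        visited.add(current)
        rec = producer[current]
        current, store = rec.source, rec.source_store
    return current, store


print(find_start_input(HISTORY, "d3"))  # ('d1', 'store41')
```

The returned pair corresponds to the two kinds of identification information mentioned above: the data name and the data storage unit that holds the data.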

The processing unit 12 evaluates the quality of the data d3 according to a comparison of whether the information on the data d1 matches the information on the predetermined data (step S2). For example, the processing unit 12 may compare whether the identification information of the data d1 matches the identification information of the predetermined data expected by the user. In addition to or instead of comparing the data identification information, the processing unit 12 may compare whether the identification information of the data storage unit 41 in which the data d1 is stored matches the identification information of the predetermined data storage unit expected by the user.

When the information on the data d1 matches the information on the predetermined data, the processing unit 12 evaluates the quality of the data d3 higher than when it does not match. For example, the processing unit 12 may assign to the data d3 a quality index value for which a larger value indicates higher quality. In this case, the processing unit 12 adds a predetermined value to the index value when the information on the data d1 matches the information on the predetermined data, and adds nothing to the index value when it does not match.

When comparing a plurality of items included in the information on the data d1 with a plurality of items included in the information on the predetermined data, the processing unit 12 may add a predetermined value to the index value depending on whether all of the items match, or may vary the value to be added according to the number of matching items.
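The two scoring variants described above (all items must match, or points proportional to the number of matching items) can be sketched as follows. The item names `data_name` and `storage`, the point values, and the function name `score_start_input` are hypothetical; the specification only states that a predetermined value is added.

```python
def score_start_input(actual, expected, points_per_item=1, require_full_match=False):
    """Return the value to add to the quality index of the output data.

    actual   -- items extracted for the starting-point input data
    expected -- items the user entered for the predetermined data
    """
    matched = sum(1 for key, value in expected.items() if actual.get(key) == value)
    if require_full_match:
        # Add the full value only when every expected item matches.
        return points_per_item * len(expected) if matched == len(expected) else 0
    # Otherwise scale the added value by the number of matching items.
    return points_per_item * matched


actual = {"data_name": "d1", "storage": "store41"}
expected = {"data_name": "d1", "storage": "store99"}

print(score_start_input(actual, expected))                           # 1
print(score_start_input(actual, expected, require_full_match=True))  # 0
```

Here the data name matches but the storage unit does not, so the per-item variant adds one point while the full-match variant adds nothing.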

The processing unit 12 may also extract other data in addition to the data d1 as starting-point input data for the data d3. In that case, the processing unit 12 evaluates the quality of the data d3 according to a comparison of whether the information on each of the plurality of starting-point input data matches the information on the predetermined data. The information on the predetermined data may include information on a plurality of data expected by the user.
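When an output has several starting-point inputs, each one can be checked against the set of data the user expects. The following is a minimal sketch under the assumption that data are compared by name only and that one point is added per expected input; `evaluate_output` and the data name `dX` are illustrative and not taken from the specification.

```python
def evaluate_output(start_inputs, expected_inputs, points=1):
    """Compare every starting-point input with the user's expected inputs.

    Returns (quality index, list of inputs the user did not expect).
    """
    expected = set(expected_inputs)
    index = 0
    unexpected = []
    for name in start_inputs:
        if name in expected:
            index += points          # expected input: add to the index
        else:
            unexpected.append(name)  # unintended input: no addition
    return index, unexpected


# d3 is assumed here to be derived from d1 and from an unintended input dX.
print(evaluate_output(["d1", "dX"], ["d1"]))  # (1, ['dX'])
```

Returning the unexpected inputs alongside the index makes it possible to show the user which source caused a low evaluation.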

The processing unit 12 may obtain the quality evaluation value by a point-deduction method instead of a point-addition method. The quality index value may also be one for which a smaller value indicates higher quality; in that case, "addition" above is to be read as "subtraction".

According to the information processing device 10, information on the first input data, which is the starting point of data processing by a plurality of pieces of software, is extracted based on history information indicating a history of input data and output data for each of the plurality of pieces of software. The quality of the first output data output by the data processing is evaluated according to a comparison of whether the information on the first input data matches the information on predetermined data input by the user.

This makes it possible to appropriately evaluate the quality of data.
In an information processing system, various pieces of software are executed, and in a series of data processing operations by a plurality of pieces of software, the quality of the data output by the processing varies depending on whether the data the user intended is being processed. For example, when performing data processing such as analysis, if input data that the user did not intend is processed, that input data may contain unnecessary or incorrect information, which makes it more likely that the processing result is wrong and therefore lowers the reliability of the result.

Therefore, the information processing device 10 identifies, based on the history information 20, the data d1 at the starting point of the data processing by the plurality of pieces of software and checks whether the data d1 is the input the user intended, and can thereby appropriately evaluate the quality of the data d3 output by the data processing.

The processing unit 12 can perform the above quality evaluation on a plurality of output data, including the data d3, output by the information processing system. The processing unit 12 may cause a display device to display the quality evaluation result for each of the plurality of output data including the data d3. For example, the processing unit 12 may cause the display device to display, for each output data, a data flow diagram from the starting-point input data corresponding to that output data to the output data. In this case, the processing unit 12 may highlight the data flow diagram for output data whose quality is evaluated to be lower than a standard. In this way, the processing unit 12 can also support the user in reviewing the data flows in the information processing system.
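As a rough illustration of highlighting low-quality flows, the sketch below renders each data flow as a line of text and marks the flows whose quality index falls below a threshold. The text rendering, the `!!` marker, the threshold, and the data names `e1` and `e2` are all assumptions made for this example; the specification describes a data flow diagram on a display device and does not define how the highlighting is drawn.

```python
def render_flows(flows, scores, threshold):
    """Return one text line per output; flows below `threshold` are marked."""
    lines = []
    for output, path in flows.items():
        marker = "!! " if scores[output] < threshold else "   "
        lines.append(f"{marker}{' -> '.join(path)} (quality={scores[output]})")
    return lines


flows = {"d3": ["d1", "d2", "d3"], "e2": ["e1", "e2"]}
scores = {"d3": 2, "e2": 0}

for line in render_flows(flows, scores, threshold=1):
    print(line)
```

With these sample values, only the flow ending in `e2` is marked, which is the flow a user would be prompted to review.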

In the following, the functions of the information processing device 10 are described in detail using a more specific example.
[Second Embodiment]
Next, the second embodiment will be described.

FIG. 2 is a diagram showing a system example of the second embodiment.
The system of the second embodiment includes an information processing system 50 and an information processing device 100. The information processing system 50 and the information processing device 100 are connected to a network 60. The network 60 may be the Internet, a WAN (Wide Area Network), or a LAN (Local Area Network).

The information processing system 50 executes various pieces of software and performs various kinds of data processing that combine a plurality of pieces of software. Software may also be referred to as a product or a software product. The information processing system 50 includes server devices 200, 300, .... Each of the server devices 200, 300, ... is a server computer that executes software and is connected to the internal network of the information processing system 50. Each of the server devices 200, 300, ... may execute a plurality of pieces of software. Software executed on one server device may also cooperate with other software executed on another server device. The information processing system 50 may include a storage device that accumulates data and is used for passing data between server devices.

The information processing device 100 is a server computer that uses data lineage information in the information processing system 50 to evaluate data quality across a plurality of pieces of software in the information processing system 50. The information processing device 100 is an example of the information processing device 10 of the first embodiment.

FIG. 3 is a diagram showing a hardware example of the information processing device.
The information processing device 100 includes a CPU 101, a RAM 102, an HDD 103, a GPU (Graphics Processing Unit) 104, an input interface 105, a medium reader 106, and a NIC (Network Interface Card) 107. The CPU 101 is an example of the processing unit 12 of the first embodiment. The RAM 102 or the HDD 103 is an example of the storage unit 11 of the first embodiment.

The CPU 101 is a processor that executes program instructions. The CPU 101 loads at least part of the programs and data stored in the HDD 103 into the RAM 102 and executes the programs. The CPU 101 may include a plurality of processor cores, and the information processing device 100 may have a plurality of processors. The processing described below may be executed in parallel using a plurality of processors or processor cores. A set of a plurality of processors may be referred to as a "multiprocessor" or simply a "processor".

The RAM 102 is a volatile semiconductor memory that temporarily stores programs executed by the CPU 101 and data used by the CPU 101 for computation. The information processing device 100 may include a type of memory other than RAM and may include a plurality of memories.

The HDD 103 is a non-volatile storage device that stores data and software programs such as an OS (Operating System), middleware, and application software. The information processing device 100 may include other types of storage devices such as a flash memory or an SSD (Solid State Drive) and may include a plurality of non-volatile storage devices.

The GPU 104 outputs images to a display 61 connected to the information processing device 100 in accordance with instructions from the CPU 101. Any type of display can be used as the display 61, such as a CRT (Cathode Ray Tube) display, a liquid crystal display (LCD), a plasma display, or an organic EL (OEL: Organic Electro-Luminescence) display.

The input interface 105 acquires an input signal from an input device 62 connected to the information processing device 100 and outputs it to the CPU 101. As the input device 62, a pointing device such as a mouse, touch panel, touch pad, or trackball, a keyboard, a remote controller, a button switch, or the like can be used. A plurality of types of input devices may be connected to the information processing device 100.

The medium reader 106 is a reading device that reads programs and data recorded on a recording medium 63. As the recording medium 63, for example, a magnetic disk, an optical disc, a magneto-optical disk (MO), or a semiconductor memory can be used. Magnetic disks include flexible disks (FD) and HDDs. Optical discs include CDs (Compact Discs) and DVDs (Digital Versatile Discs).

The medium reader 106 copies, for example, programs and data read from the recording medium 63 to another recording medium such as the RAM 102 or the HDD 103. The read program is executed by, for example, the CPU 101. The recording medium 63 may be a portable recording medium and may be used to distribute programs and data. The recording medium 63 and the HDD 103 may be referred to as computer-readable recording media.

The NIC 107 is an interface that is connected to the network 60 and communicates with other computers via the network 60. The NIC 107 is connected by cable to a communication device such as a switch or a router.

The server devices 200, 300, ... are realized by hardware similar to that of the information processing device 100.
FIG. 4 is a diagram showing an example of software in the information processing system.

The information processing system 50 includes data acquisition software 51, data processing software 52, a data storage unit 53, and data shaping software 54. The software listed here is an example; the information processing system 50 may have other software that performs other processing in place of at least part of this software or in addition to it. The data storage unit 53 is realized by a storage device included in the information processing system 50.

For example, the data acquisition software 51, the data processing software 52, and the data shaping software 54 each, in this order, acquire the data processed by the software in the preceding stage and output the resulting data.

The data acquisition software 51 acquires input data A1 from an input data storage unit 70 and provides it to the data processing software 52. The data processing software 52 generates accumulated data A2 from the input data A1 by a data processing process s1 and stores the accumulated data A2 in the data storage unit 53.

The data acquisition software 51 also acquires input data B1 from an input data storage unit 71 and provides it to the data processing software 52. The data processing software 52 generates accumulated data B2 from the input data B1 by a data processing process s2 and stores the accumulated data B2 in the data storage unit 53. A plurality of input data may be used to generate a given piece of accumulated data.

The data shaping software 54 acquires the accumulated data A2 and B2 stored in the data storage unit 53. The data shaping software 54 generates utilization data A3 from the accumulated data A2 by a data shaping process s3 and stores the utilization data A3 in a utilization data storage unit 80. The data shaping software 54 generates utilization data AB from the accumulated data A2 and B2 by a data shaping process s4 and stores the utilization data AB in the utilization data storage unit 80. The data shaping software 54 generates utilization data AB from the accumulated data A2 and B2 by a data shaping process s5 and stores the utilization data AB in the utilization data storage unit 80. The utilization data can be regarded as the output data finally output by the series of data processing operations.

At least some of the input data storage units 70 and 71 and the utilization data storage unit 80 may be realized by a storage device included in the information processing system 50. At least some of them may instead be realized by a storage device that is external to the information processing system 50 and accessible from the information processing system 50 via the network 60. Because "accumulated data" is intermediate data used to create "utilization data" from "input data", it can also be called "intermediate data". Further, "data" may be a unit of information called a file, a table, a record, or the like.

FIG. 5 is a diagram showing a functional example of the information processing device.
The information processing device 100 includes a storage unit 110, a history information analysis unit 130, an evaluation unit 140, and a display control unit 150. A storage area of the RAM 102 or the HDD 103 is used for the storage unit 110. The history information analysis unit 130, the evaluation unit 140, and the display control unit 150 are realized by the CPU 101 executing programs stored in the RAM 102.

The storage unit 110 stores information used for processing by the history information analysis unit 130, the evaluation unit 140, and the display control unit 150. The information stored in the storage unit 110 includes the history information. The history information is acquired from the information processing system 50 by the user and stored in the storage unit 110. Alternatively, the history information may be created by the history information analysis unit 130 analyzing a query execution log acquired from the information processing system 50, and then stored in the storage unit 110.

 来歴情報解析部130は、記憶部110に記憶された来歴情報の解析を行う。来歴情報解析部130は、入力データ抽出部131、アクセス権限予測部132および遅延時間算出部133を有する。 The history information analysis unit 130 analyzes the history information stored in the storage unit 110. The history information analysis unit 130 has an input data extraction unit 131, an access authority prediction unit 132, and a delay time calculation unit 133.

The input data extraction unit 131 extracts, based on the history information, the input data at the start point of the data processing implemented by the plurality of software in the information processing system 50, and stores information on that start-point input data in the storage unit 110. Based on the history information, the input data extraction unit 131 traces the generation flow of the utilization data backward and thereby extracts the start-point input data corresponding to that utilization data.

Based on the history information and the information on the start-point input data, the access authority prediction unit 132 predicts the access authority of the accumulated data generated from that input data and of the utilization data generated from the accumulated data. The access authority prediction unit 132 stores the predicted access authority information in the storage unit 110. The access authority prediction unit 132 identifies the processing contents, such as data processing or data shaping, based on the history information. It then predicts the access authority of the data output by the data processing or data shaping, based on the access authority of the data input to that processing and on the identified processing contents.

The delay time calculation unit 133 calculates the delay time from input to output in a series of data processing, based on the history information and the generation log of the data processed by the information processing system 50, and stores information on the calculated delay time in the storage unit 110.

 評価部140は、来歴情報解析部130による解析結果に基づいて、情報処理システム50における複数のソフトウェアを用いたデータ処理により生成されるデータの品質を評価する。評価部140は、次の3つの評価種別によって、データの品質を評価する。 The evaluation unit 140 evaluates the quality of the data generated by the data processing using the plurality of software in the information processing system 50 based on the analysis result by the history information analysis unit 130. The evaluation unit 140 evaluates the quality of data according to the following three evaluation types.

 第1の評価種別は、データ来歴である。評価部140は、入力データ抽出部131により抽出された始点の入力データの情報に基づいてデータ来歴の評価を行う。具体的には、評価部140は、始点の入力データが、ユーザが意図するデータに一致するか否かによりデータ来歴を評価する。 The first evaluation type is data history. The evaluation unit 140 evaluates the data history based on the information of the input data of the starting point extracted by the input data extraction unit 131. Specifically, the evaluation unit 140 evaluates the data history based on whether or not the input data of the starting point matches the data intended by the user.

The second evaluation type is security. The evaluation unit 140 evaluates the security of data based on the access authority predicted by the access authority prediction unit 132. Specifically, the evaluation unit 140 evaluates the security of data generated by the information processing system 50 according to whether the access authority of that data is the appropriate access authority predicted from the access authority of the data from which it was generated.

The third evaluation type is up-to-dateness. The evaluation unit 140 evaluates the up-to-dateness of data based on the data generation delay time calculated by the delay time calculation unit 133. Specifically, the evaluation unit 140 evaluates the up-to-dateness of data generated by the information processing system 50 according to whether that data was generated within an allowable delay time after the data from which it was generated came into existence.
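The up-to-dateness check described above can be pictured with a minimal sketch. The function name `evaluate_freshness`, the timestamps, the 30-minute limit, and the choice of 1 and 0 as evaluation values are all assumptions made for illustration; the text only states that the evaluation depends on whether the data was generated within an allowable delay time.

```python
from datetime import datetime, timedelta

def evaluate_freshness(source_created, output_created, allowed_delay):
    """Return 1 if the output appeared within allowed_delay of its source, else 0.

    The 1/0 values are an illustrative assumption, not taken from the text.
    """
    delay = output_created - source_created
    return 1 if delay <= allowed_delay else 0

# Hypothetical timestamps for a source data item and two generated outputs.
source_time = datetime(2020, 12, 25, 9, 0, 0)
limit = timedelta(minutes=30)
on_time = evaluate_freshness(source_time, datetime(2020, 12, 25, 9, 20, 0), limit)
late = evaluate_freshness(source_time, datetime(2020, 12, 25, 10, 0, 0), limit)
```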

The evaluation unit 140 assigns an evaluation value to each of the evaluation types: data history, security, and up-to-dateness. The evaluation value is an index of the degree of quality. A larger evaluation value indicates higher quality, and a smaller evaluation value indicates lower quality. A predetermined value such as "1" may be set as the upper limit of the evaluation value for each evaluation type. The evaluation unit 140 stores the evaluation value assigned to each data item for each evaluation type in the storage unit 110. The evaluation unit 140 also performs a comprehensive evaluation of the utilization data based on the evaluation values of each data item for each evaluation type, and stores the result of the comprehensive evaluation in the storage unit 110. The result of the comprehensive evaluation of the utilization data is also used as the evaluation result for the data processing that generated the utilization data.

 表示制御部150は、評価部140による評価結果を、ディスプレイ61に表示させたり、ネットワーク60を介して他のコンピュータに送信したりする。表示制御部150は、情報処理システム50におけるデータフロー図を出力し、データの評価値に応じて、データフロー図におけるデータを示すアイコンの表示態様を制御する。例えば、表示制御部150は、評価値が基準よりも高いデータと評価値が基準よりも低いデータとを区別して表示させる。また、表示制御部150は、データフロー図に含まれる複数のデータフローのうちユーザにより指定されたアイコンに対応するデータフローに表示を絞り込む制御を行う。 The display control unit 150 displays the evaluation result by the evaluation unit 140 on the display 61, or transmits the evaluation result to another computer via the network 60. The display control unit 150 outputs a data flow diagram in the information processing system 50, and controls the display mode of the icon indicating the data in the data flow diagram according to the evaluation value of the data. For example, the display control unit 150 distinguishes and displays data having an evaluation value higher than the reference and data having an evaluation value lower than the reference. Further, the display control unit 150 controls to narrow down the display to the data flow corresponding to the icon designated by the user among the plurality of data flows included in the data flow diagram.

FIG. 6 is a diagram showing an example of the history information.
The history information 111 is stored in the storage unit 110 in advance. In this example, the history information 111 is shown in JSON (JavaScript Object Notation) format, but other data formats may be used. JAVASCRIPT is a registered trademark.

 来歴情報111には、情報処理システム50における複数のソフトウェアにより処理されたデータの来歴が記録されている。一例として、図6では来歴情報111における、データ集計を行うソフトウェアにより蓄積データ「data-A4」が集計されて、活用データ「data-A5」が出力されたことを示す部分を表している。 The history information 111 records the history of data processed by a plurality of software in the information processing system 50. As an example, FIG. 6 shows a portion of the history information 111 showing that the accumulated data “data-A4” is aggregated by the data aggregation software and the utilization data “data-A5” is output.

 変数「typeName」の値「XXX_script1」は、当該データ集計の処理タイプがPython(登録商標)などの所定のスクリプト言語を用いて生成されたスクリプトであることを示す。「XXX」は、スクリプト言語の名称などを表してもよい。変数名「createdBy」および値「create-user」は、該当のスクリプトを作成したユーザの名称が「create-user」であることを示す。 The value "XXX_script1" of the variable "typeName" indicates that the processing type of the data aggregation is a script generated using a predetermined script language such as Python (registered trademark). "XXX" may represent the name of the script language or the like. The variable name "createdBy" and the value "create-user" indicate that the name of the user who created the corresponding script is "create-user".

The value "XXX_Aggregation application" of the variable "attributes.qualifiedName" indicates that the qualified name of the corresponding software is "XXX_Aggregation application".

The value "Aggregation application" of the variable "description" indicates that the processing content of the corresponding software is "Aggregation application", that is, data aggregation.

 変数「attributes.run_user」の値「sample_user」は、該当のデータ集計の実行ユーザの名称が「sample_user」であることを示す。 The value "sample_user" of the variable "attributes.run_user" indicates that the name of the execution user of the corresponding data aggregation is "sample_user".

 変数「attributes.server」の値「sample_server」は、該当のデータ集計のソフトウェアを実行する実行サーバの名称が「sample_server」であることを示す。 The value "sample_server" of the variable "attributes.server" indicates that the name of the execution server that executes the corresponding data aggregation software is "sample_server".

The value "data-A4" of the variable "attributes.inputs.name" indicates that the name of the input data for the data aggregation is "data-A4".
The value "hdfs_path" of the variable "attributes.inputs.typeName" indicates that the input data type is "hdfs_path". Note that hdfs is an abbreviation for Hadoop (registered trademark) Distributed File System.

For example, information on the storage unit from which the accumulated data "data-A4" is acquired may be included in the value of the variable "attributes.inputs.name" or in the value of the variable "attributes.inputs.typeName".

 変数「attributes.outputs.name」の値「data-A5」は、蓄積データ「data-A4」に対するデータ集計に応じた出力データの名称が「data-A5」であることを示す。 The value "data-A5" of the variable "attributes.outputs.name" indicates that the name of the output data corresponding to the data aggregation for the accumulated data "data-A4" is "data-A5".

The value "hdfs_path" of the variable "attributes.outputs.typeName" indicates that the output data type is "hdfs_path".
For example, information on the storage unit to which the utilization data "data-A5" is output may be included in the value of the variable "attributes.outputs.name" or in the value of the variable "attributes.outputs.typeName".
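As an illustration of the fields described above, the following sketch rebuilds one record from the variable names and values given in the text. The nesting (for instance, that "inputs" and "outputs" are lists under "attributes" and that "description" sits at the top level) is an assumption inferred from the dotted variable names, since FIG. 6 itself is not reproduced here. The helper `io_names` is hypothetical.

```python
import json

# Hypothetical reconstruction of one history record; the nesting is an
# assumption inferred from dotted names such as "attributes.inputs.name".
record_json = """
{
  "typeName": "XXX_script1",
  "createdBy": "create-user",
  "description": "Aggregation application",
  "attributes": {
    "qualifiedName": "XXX_Aggregation application",
    "run_user": "sample_user",
    "server": "sample_server",
    "inputs": [{"name": "data-A4", "typeName": "hdfs_path"}],
    "outputs": [{"name": "data-A5", "typeName": "hdfs_path"}]
  }
}
"""

record = json.loads(record_json)

def io_names(rec):
    """Return (input names, output names) of one processing record."""
    attrs = rec["attributes"]
    return ([d["name"] for d in attrs["inputs"]],
            [d["name"] for d in attrs["outputs"]])

inputs, outputs = io_names(record)
```

Reading each record as an edge from its input names to its output names is what makes it possible to follow the data flow in either direction.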

For the processing of other software as well, the history information 111 records information indicating the history of the processed data in the same data structure.
Next, a method of evaluating data based on the history information 111 will be described. First, the evaluation of the data history, that is, the provenance evaluation, will be described. In the provenance evaluation, the output data is evaluated based on information on the start-point input data of the data processing.

FIG. 7 is a diagram showing an example of provenance evaluation.
For example, the history information 111 includes information indicating the history from the input data A1 to the utilization data A3. The history information 111 also includes information indicating the history from the accumulated data By stored in the data storage unit 53a to the utilization data Bz stored in the utilization data storage unit 81. The history information 111 indicates that the utilization data Bz was generated from the accumulated data By by the data shaping process s6 of the data shaping software 54.

 ユーザ所持入力データリスト112は、ユーザによって情報処理装置100に入力され、記憶部110に格納される。ユーザ所持入力データリスト112は、データ処理の始点の入力データとしてユーザが想定しているデータの情報を示す。例えば、ユーザ所持入力データリスト112は、データ名および取得元の項目を含む。データ名は、データの名称である。取得元は、データの取得元の記憶部の名称である。例えば、「DB_A1」は、入力データ記憶部70の名称である。 The user-owned input data list 112 is input to the information processing apparatus 100 by the user and stored in the storage unit 110. The user-possessed input data list 112 shows information of data assumed by the user as input data of a start point of data processing. For example, the user-owned input data list 112 includes data names and acquisition source items. The data name is the name of the data. The acquisition source is the name of the storage unit of the acquisition source of the data. For example, "DB_A1" is the name of the input data storage unit 70.

 入力データ抽出部131は、来歴情報111に基づいて、ユーザ所持入力データリスト112に対応するユーザに関する入力データリスト113を生成する。入力データリスト113は、該当のユーザが利用するデータ処理における始点の入力データの情報を示す。入力データ抽出部131は、ユーザ所持入力データリスト112に対応するユーザの識別情報を取得する。入力データ抽出部131は、来歴情報111から、実行ユーザ(例えば、「run_user」)として、該当のユーザの識別情報が記録されている処理(例えば、「description」)を特定する。なお、来歴情報111は、該当のユーザが利用するデータ処理に関するデータの来歴だけを含む情報でもよい。この場合、入力データ抽出部131は、該当のユーザと他のユーザとで来歴を区別する処理を省略できる。 The input data extraction unit 131 generates an input data list 113 for the user corresponding to the user possessed input data list 112 based on the history information 111. The input data list 113 shows information on the input data of the starting point in the data processing used by the user. The input data extraction unit 131 acquires the identification information of the user corresponding to the user-owned input data list 112. The input data extraction unit 131 identifies a process (for example, "description") in which the identification information of the corresponding user is recorded as an execution user (for example, "run_user") from the history information 111. The history information 111 may be information including only the history of data related to data processing used by the user. In this case, the input data extraction unit 131 can omit the process of distinguishing the provenance between the corresponding user and another user.

The input data extraction unit 131 follows the input data and output data of the processes identified for the corresponding user, in order from input data to output data, and thereby identifies, for example, the history from the input data A1 to the utilization data A3. Similarly, the input data extraction unit 131 identifies the history from the accumulated data By to the utilization data Bz. Note that FIG. 7 omits the history of other data for the corresponding user. Here, for a given process, the direction from the input data to the output data is the forward direction, and the direction from the output data to the input data is the reverse direction.

 入力データ抽出部131は、来歴情報111から抽出した該当のユーザに関するデータの来歴に基づいて、出力データを得るために使用される始点の入力データを抽出し、入力データリスト113を生成する。入力データリスト113は、記憶部110に格納される。入力データリスト113は、入力データ抽出部131により特定された始点の入力データの一覧であると言える。 The input data extraction unit 131 extracts the input data of the starting point used for obtaining the output data based on the history of the data related to the corresponding user extracted from the history information 111, and generates the input data list 113. The input data list 113 is stored in the storage unit 110. It can be said that the input data list 113 is a list of input data of the starting point specified by the input data extraction unit 131.

For example, the input data extraction unit 131 traces the history backward from the utilization data A3 and thereby identifies the start-point input data A1 used to obtain the utilization data A3. The start-point input data is the data at which the backward trace terminates, that is, data for which the history information 111 contains no corresponding input and which therefore cannot be traced back any further. Likewise, the input data extraction unit 131 traces the history backward from the utilization data Bz and identifies the start-point input data By used to obtain the utilization data Bz. Because the start-point input data By is stored in the data storage unit 53, it can also be called the accumulated data By. The input data extraction unit 131 generates an input data list 113 that includes the identified start-point input data A1 and By. Like the user-owned input data list 112, the input data list 113 includes the data name and acquisition source items.
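The backward trace described above can be sketched as a graph walk over (input, output) pairs. This is a minimal illustration under stated assumptions: the function name `find_start_inputs` and the edge-list representation are hypothetical, and the edges below only mirror the A1, A2, A3 and By, Bz examples from the text.

```python
def find_start_inputs(edges, output_name):
    """Trace backward from output_name and return the start-point input names.

    edges: list of (input_name, output_name) pairs taken from history records.
    A data item with no recorded producer cannot be traced further back,
    so it is treated as a start point.
    """
    producers = {}
    for src, dst in edges:
        producers.setdefault(dst, []).append(src)

    starts, stack, seen = [], [output_name], set()
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        sources = producers.get(name)
        if not sources:
            starts.append(name)    # backward trace terminates here
        else:
            stack.extend(sources)  # keep walking in the reverse direction
    return sorted(starts)

# Edges mirroring the example: A1 -> A2 -> A3 and By -> Bz.
edges = [("A1", "A2"), ("A2", "A3"), ("By", "Bz")]
start_of_a3 = find_start_inputs(edges, "A3")
start_of_bz = find_start_inputs(edges, "Bz")
```

The same walk also handles utilization data that has several start points, since every branch is followed until it terminates.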

The evaluation unit 140 generates a mismatch list 114 by comparing the user-owned input data list 112 with the input data list 113 extracted from the history information 111. The mismatch list 114 is stored in the storage unit 110. The mismatch list 114 includes the items list, history, and acquisition source. The list item holds the data names of records in the user-owned input data list 112 that do not exist in the input data list 113. The history item holds the data names of records in the input data list 113 that do not exist in the user-owned input data list 112. The acquisition source item holds the acquisition source of the corresponding data. When a data name is registered in the list item, the history item of the same record is left unset. When a data name is registered in the history item, the list item of the same record is left unset. In the figure, an unset item is represented by a hyphen "-".

 ユーザ所持入力データリスト112および入力データリスト113に対して、不一致リスト114には、ユーザ所持入力データリスト112のデータ名「Bx」および取得元「DB_Bx」を含むレコードが登録される。また、不一致リスト114には、入力データリスト113のデータ名「By」および取得元「DB_By」を含むレコードが登録される。 A record including the data name "Bx" and the acquisition source "DB_Bx" of the user-owned input data list 112 is registered in the mismatch list 114 with respect to the user-owned input data list 112 and the input data list 113. Further, a record including the data name "By" and the acquisition source "DB_By" of the input data list 113 is registered in the mismatch list 114.
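The comparison that yields the mismatch list amounts to a two-way set difference. The sketch below is illustrative only: the function name `build_mismatch_list`, the dictionary keys, and the assumption that both lists contain the record ("A1", "DB_A1") are not taken from the figure, although they are consistent with the Bx and By records described above.

```python
def build_mismatch_list(user_list, traced_list):
    """Compare two lists of (data name, acquisition source) records.

    Records only in the user-owned list fill the "list" column; records only
    in the traced list fill the "history" column. "-" marks an unset item.
    """
    user_set, traced_set = set(user_list), set(traced_list)
    rows = []
    for name, source in sorted(user_set - traced_set):
        rows.append({"list": name, "history": "-", "source": source})
    for name, source in sorted(traced_set - user_set):
        rows.append({"list": "-", "history": name, "source": source})
    return rows

# Assumed contents of lists 112 and 113 for this example.
user_owned = [("A1", "DB_A1"), ("Bx", "DB_Bx")]
traced = [("A1", "DB_A1"), ("By", "DB_By")]
mismatch = build_mismatch_list(user_owned, traced)
```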

 評価部140は、不一致リスト114に基づいて、来歴評価結果115を生成する。来歴評価結果115は、記憶部110に格納される。来歴評価結果115は、分類および評価値の項目を含む。分類の項目には、一連のデータ処理に属する一部のデータ処理区間を表す情報が登録される。評価値の項目には、分類ごとのデータの品質の評価値が登録される。 The evaluation unit 140 generates a history evaluation result 115 based on the disagreement list 114. The provenance evaluation result 115 is stored in the storage unit 110. Provenance evaluation result 115 includes items of classification and evaluation value. In the classification item, information representing a part of the data processing section belonging to a series of data processing is registered. In the evaluation value item, the evaluation value of the quality of the data for each classification is registered.

Here, for example, when the data changes from start-point input data to accumulated data to utilization data, the span from the input data to the accumulated data is classified as the first section, and the span from the accumulated data to the utilization data is classified as the second section. The data may also change from start-point input data to first accumulated data to second accumulated data to utilization data. In that case, the span from the input data to the first accumulated data may be classified as the first section, the span from the first accumulated data to the second accumulated data as the second section, and the span from the second accumulated data to the utilization data as the third section.

Among the histories of data related to the corresponding user in the history information 111, the evaluation unit 140 identifies those whose start-point input data corresponds to the history and acquisition source information in the mismatch list 114, and sets the evaluation value to "0" for the classifications that follow that start-point input data. Conversely, among the histories of data related to the corresponding user in the history information 111, the evaluation unit 140 identifies those whose start-point input data corresponds to the data name and acquisition source information in the user-owned input data list 112. The evaluation unit 140 sets the evaluation value to "1" for the classifications that follow that start-point input data.

 例えば、来歴評価結果115には、分類「A1-A2間」に対して、評価値「1」というレコードが登録される。これは、入力データA1に基づいて生成される蓄積データA2の、来歴評価による品質の評価値が「1」であることを示す。 For example, in the history evaluation result 115, a record with an evaluation value of "1" is registered for the classification "between A1 and A2". This indicates that the evaluation value of the quality of the accumulated data A2 generated based on the input data A1 by the history evaluation is "1".

 また、来歴評価結果115には、分類「A2-A3間」に対して、評価値「1」というレコードが登録される。これは、蓄積データA2に基づいて生成される活用データA3の、来歴評価による品質の評価値が「1」であることを示す。 Further, in the history evaluation result 115, a record having an evaluation value of "1" is registered for the classification "A2-A3". This indicates that the evaluation value of the quality of the utilization data A3 generated based on the accumulated data A2 by the history evaluation is "1".

 また、来歴評価結果115には、分類「By-Bz間」に対して、評価値「0」というレコードが登録される。これは、蓄積データByに基づいて生成される活用データBzの、来歴評価による品質の評価値が「0」であることを示す。 Further, in the history evaluation result 115, a record having an evaluation value of "0" is registered for the classification "between By and Bz". This indicates that the evaluation value of the quality of the utilization data Bz generated based on the accumulated data By is "0" by the history evaluation.
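The three records above can be reproduced with a short sketch. It simplifies the described procedure by matching on data names only (the text also uses the acquisition source), and the function name `evaluate_provenance` and the chain representation are assumptions for illustration.

```python
def evaluate_provenance(chains, user_owned_names):
    """Assign an evaluation value to every section of every data chain.

    chains: list of data-name sequences, each beginning at a start-point input.
    Every section that follows a start point the user expects gets 1;
    every section that follows an unexpected start point gets 0.
    """
    result = {}
    for chain in chains:
        value = 1 if chain[0] in user_owned_names else 0
        for src, dst in zip(chain, chain[1:]):
            result[f"{src}-{dst}"] = value
    return result

# Chains from the example; the user expects A1 and Bx as start points.
chains = [["A1", "A2", "A3"], ["By", "Bz"]]
scores = evaluate_provenance(chains, {"A1", "Bx"})
```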

 なお、活用データに対応する実際の始点のデータが複数のこともある。その場合、評価部140は、実際の始点のデータのうち、ユーザ所持入力データリスト112に含まれる数が多い程、活用データや中間データの品質を高く評価するように制御できる。 In addition, there may be multiple data of the actual starting point corresponding to the utilization data. In that case, the evaluation unit 140 can control the quality of the utilization data and the intermediate data to be evaluated higher as the number of the data of the actual start point included in the user-owned input data list 112 is larger.

Next, the security evaluation will be described.
FIG. 8 is a diagram showing an example of security evaluation.
For example, the history information 111 includes information indicating the history from the input data Ax stored in the input data storage unit 70, through the accumulated data Ay stored in the data storage unit 53, to the utilization data Az and AB1 stored in the utilization data storage unit 80.

 来歴情報111は、入力データAxに対するデータ取得ソフトウェア51のデータ取得およびデータ加工ソフトウェア52のデータ加工処理s1により、蓄積データAyが生成されたことを示す。また、来歴情報111は、蓄積データAyに対するデータ整形ソフトウェア54のデータ整形処理s3により、活用データAzが生成されたことを示す。更に、来歴情報111は、蓄積データAyに対するデータ整形ソフトウェア54のデータ整形処理s4により、活用データAB1が生成されたことを示す。 The history information 111 indicates that the accumulated data Ay was generated by the data acquisition of the data acquisition software 51 for the input data Ax and the data processing process s1 of the data processing software 52. Further, the history information 111 indicates that the utilization data Az is generated by the data shaping process s3 of the data shaping software 54 for the accumulated data Ay. Further, the history information 111 indicates that the utilization data AB1 is generated by the data shaping process s4 of the data shaping software 54 for the accumulated data Ay.

 まず、アクセス権限予測部132は、来歴評価の際に特定した始点の入力データに対するアクセス権限を示す入力データアクセス権限情報111aを取得する。入力データアクセス権限情報111aは、ユーザにより提供されてもよいし、情報処理システム50から取得されてもよい。あるいは、来歴情報111にデータのアクセス権限の情報が含まれる場合、入力データアクセス権限情報111aは、来歴情報111から取得されてもよい。入力データアクセス権限情報111aは、記憶部110に格納される。 First, the access authority prediction unit 132 acquires the input data access authority information 111a indicating the access authority to the input data of the starting point specified at the time of the provenance evaluation. The input data access authority information 111a may be provided by the user or may be acquired from the information processing system 50. Alternatively, when the history information 111 includes data access authority information, the input data access authority information 111a may be acquired from the history information 111. The input data access authority information 111a is stored in the storage unit 110.

 例えば、アクセス権限は、該当のデータの利用制限事項を示し、当該データにアクセス可能な人員などを示す。アクセス権限予測部132は、当該データにアクセス可能な人員などを、該当のデータの利用目的(例えば、公開目的や組織内で秘密に管理するなど)の情報に応じて特定してもよい。 For example, the access authority indicates the usage restrictions of the relevant data, and indicates the personnel who can access the relevant data. The access authority prediction unit 132 may specify the personnel who can access the data according to the information of the purpose of using the data (for example, the purpose of disclosure or secret management within the organization).

 また、アクセス権限予測部132は、来歴情報111に基づいて、加工整形処理情報111bを取得する。加工整形処理情報111bは、該当の処理で入力データから出力データを得るために用いられた処理内容を示す。例えば、アクセス権限予測部132は、来歴情報111により特定したソフトウェアに対するクエリを解析することで、処理内容を導出する。加工整形処理情報111bは、記憶部110に格納される。 Further, the access authority prediction unit 132 acquires the processing shaping processing information 111b based on the history information 111. The processing shaping processing information 111b indicates the processing content used to obtain the output data from the input data in the corresponding processing. For example, the access authority prediction unit 132 derives the processing content by analyzing the query for the software specified by the history information 111. The processing and shaping processing information 111b is stored in the storage unit 110.

 アクセス権限予測部132は、入力データアクセス権限情報111aおよび加工整形処理情報111bに基づいて、アクセス権限予測を行い、アクセス権限予測結果116を生成する。アクセス権限予測の詳細は後述される。アクセス権限予測結果116は、記憶部110に格納される。 The access authority prediction unit 132 predicts the access authority based on the input data access authority information 111a and the processing and shaping processing information 111b, and generates the access authority prediction result 116. Details of access authority prediction will be described later. The access authority prediction result 116 is stored in the storage unit 110.

 アクセス権限予測結果116は、データ名およびアクセス権限の項目を含む。データ名の項目には、データの名称が登録される。アクセス権限の項目には、該当のデータに関して、先行のデータから予測されたアクセス権限が登録される。 The access authority prediction result 116 includes the data name and access authority items. The name of the data is registered in the item of the data name. In the access authority item, the access authority predicted from the preceding data is registered for the corresponding data.

 例えば、アクセス権限予測結果116には、データ名「Ay」に対して、予測されたアクセス権限が「担当者のみ」であることを示すレコードが登録される。また、アクセス権限予測結果116には、データ名「Az」に対して、予測されたアクセス権限が「データ管理者のみ」であることを示すレコードが登録される。また、アクセス権限予測結果116には、データ名「AB1」に対して、予測されたアクセス権限が「誰でも」であることを示すレコードが登録される。 For example, in the access authority prediction result 116, a record indicating that the predicted access authority is "only the person in charge" is registered for the data name "Ay". Further, in the access authority prediction result 116, a record indicating that the predicted access authority is "only the data administrator" is registered for the data name "Az". Further, in the access authority prediction result 116, a record indicating that the predicted access authority is "anyone" is registered for the data name "AB1".

Separately from the access authority prediction result 116, the access authority prediction unit 132 acquires access authority information 117 indicating the actual access authority of each data item. The access authority information 117 is acquired from the information processing system 50 and stored in the storage unit 110. For example, the access authority information 117 includes a record indicating that the actual access authority for the data name "Ay" is "anyone". The access authority information 117 also includes a record indicating that the actual access authority for the data name "Az" is "anyone". The access authority information 117 further includes a record indicating that the actual access authority for the data name "AB1" is "anyone".

 評価部140は、アクセス権限予測結果116とアクセス権限情報117とを比較することで、セキュリティ評価結果118を生成する。セキュリティ評価結果118は、記憶部110に格納される。セキュリティ評価結果118は、分類および評価値の項目を含む。分類および評価値の項目の意味は、来歴評価結果115の同名の項目の意味と同じである。 The evaluation unit 140 generates the security evaluation result 118 by comparing the access authority prediction result 116 with the access authority information 117. The security evaluation result 118 is stored in the storage unit 110. Security evaluation result 118 includes classification and evaluation value items. The meanings of the items of classification and evaluation value are the same as the meanings of the items of the same name in the history evaluation result 115.

 評価部140は、アクセス権限予測結果116およびアクセス権限情報117を比較して、同一のデータ名のデータに関して、予測されたアクセス権限と実際のアクセス権限とが一致するか否かを判定する。評価部140は、予測されたアクセス権限と実際のアクセス権限とが一致する場合、該当のデータが出力となる分類について評価値を「1」とする。一方、評価部140は、予測されたアクセス権限と実際のアクセス権限とが一致しない場合、該当のデータが出力となる分類について評価値を「0」とする。 The evaluation unit 140 compares the access authority prediction result 116 and the access authority information 117, and determines whether or not the predicted access authority and the actual access authority match with respect to the data having the same data name. When the predicted access authority and the actual access authority match, the evaluation unit 140 sets the evaluation value to "1" for the classification for which the corresponding data is output. On the other hand, when the predicted access authority and the actual access authority do not match, the evaluation unit 140 sets the evaluation value to "0" for the classification for which the corresponding data is output.

 例えば、セキュリティ評価結果118には、分類「Ax-Ay間」に対して、評価値「0」というレコードが登録される。これは、入力データAxに基づいて生成される蓄積データAyの、セキュリティ評価による品質の評価値が「0」であることを示す。 For example, in the security evaluation result 118, a record with an evaluation value of "0" is registered for the classification "between Ax and Ay". This indicates that the evaluation value of the quality of the accumulated data Ay generated based on the input data Ax by the security evaluation is "0".

 また、セキュリティ評価結果118には、分類「Ay-Az間」に対して、評価値「0」というレコードが登録される。これは、蓄積データAyに基づいて生成される活用データAzの、セキュリティ評価による品質の評価値が「0」であることを示す。 Further, in the security evaluation result 118, a record having an evaluation value of "0" is registered for the classification "between Ay and Az". This indicates that the evaluation value of the quality of the utilization data Az generated based on the accumulated data Ay by the security evaluation is "0".

 また、セキュリティ評価結果118には、分類「Ay-AB1間」に対して、評価値「1」というレコードが登録される。これは、蓄積データAyに基づいて生成される活用データAB1の、セキュリティ評価による品質の評価値が「1」であることを示す。 Further, in the security evaluation result 118, a record with an evaluation value of "1" is registered for the classification "between Ay and AB1". This indicates that the evaluation value of the quality of the utilization data AB1 generated based on the accumulated data Ay is "1" by the security evaluation.
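The comparison performed by the evaluation unit 140 can be sketched as follows. This is a minimal illustration rather than the implementation described in the specification: the function name `evaluate_security`, the dictionary layout, and the edge list are assumptions introduced here, while the data names and access authorities are taken from the example records above.

```python
# Minimal sketch (hypothetical names): score each classification (edge) by
# comparing the predicted and actual access authority of its output data.
def evaluate_security(predicted, actual, edges):
    """Return {classification: 1 or 0}; 1 when both authorities exist and match."""
    result = {}
    for src, dst in edges:
        matched = dst in predicted and dst in actual and predicted[dst] == actual[dst]
        result[f"{src}-{dst}"] = 1 if matched else 0
    return result

# Example records from the access authority prediction result 116 and
# the access authority information 117.
predicted = {"Ay": "only the person in charge",
             "Az": "only the data administrator",
             "AB1": "anyone"}
actual = {"Ay": "anyone", "Az": "anyone", "AB1": "anyone"}
# Each pair is (input data, output data) of one classification.
edges = [("Ax", "Ay"), ("Ay", "Az"), ("Ay", "AB1")]

security_result = evaluate_security(predicted, actual, edges)
print(security_result)
```

With these inputs, only the classification ending at AB1 receives the evaluation value 1, which agrees with the records of the security evaluation result 118 described above.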

Here, an example of access authority prediction will be described.
FIG. 9 is a diagram showing an example of access authority prediction.
The access authority prediction unit 132 can predict the access authority of other data generated from the starting-point input data, based on the input data access authority information 111a regarding that input data. For example, the access authority prediction unit 132 may acquire the input data access authority information 111a from the data catalog information in the information processing system 50, or may acquire the input data access authority information 111a input by the user.

 入力データアクセス権限情報111aは、データ名、カラム、秘密区分およびアクセス権限の項目を含む。データ名の項目には、データの名称が登録される。カラムの項目には、該当のデータに含まれるカラムの名称(カラム名)が登録される。カラムは、該当のデータに含まれるデータ項目である。秘密区分の項目には、該当のデータの該当のカラムに対する秘密管理の区分を示す秘密区分が登録される。アクセス権限の項目には、該当のデータの該当のカラムに対するアクセス権限が登録される。 Input data access authority information 111a includes data name, column, secret classification, and access authority items. The name of the data is registered in the item of the data name. The column name (column name) included in the corresponding data is registered in the column item. A column is a data item contained in the corresponding data. In the item of secret classification, a secret classification indicating the classification of secret management for the corresponding column of the corresponding data is registered. In the access authority item, the access authority for the corresponding column of the corresponding data is registered.

For example, in the input data access authority information 111a, a record having the data name "Ax", the column "column_a", the secret classification "confidential", and the access authority "anyone in the company" is registered. This record indicates that the information in the column "column_a" of the input data Ax has the secret classification "confidential" and is accessible to "anyone in the company". Here, "in the company" refers to all users belonging to the company to which the corresponding user belongs.

In the input data access authority information 111a, secret classifications such as "confidential to related parties" and "public information", and access authorities such as "only the person in charge" and "anyone in the company", are also registered for other columns such as "column_b" of the data name "Ax". Further, the input data access authority information 111a may include secret classification and access authority information regarding other starting-point input data.

The access authority prediction unit 132 acquires the processing shaping processing information 111b. As described above, the access authority prediction unit 132 generates the processing shaping processing information 111b by performing query analysis of each process included in a series of data processing. For example, the access authority prediction unit 132 can also generate the processing shaping processing information 111b from the relationship between the input data and the output data of a certain process included in the history information 111. The processing shaping processing information 111b includes the items of process, input data, output data, input column, and output column. In the process item, identification information of the processing content in each software is registered. In the input data item, the input data for the processing content is registered. In the output data item, the output data for the processing content is registered. In the input column item, the name of a column in the input data (input column) is registered. In the output column item, the name of a column in the output data (output column) is registered.

For example, the processing shaping processing information 111b includes a record with the process "process s1", the input data "Ax", the output data "Ay", the input column "column_a", and the output column "column_a1". This record indicates that, in the data processing process s1, the column column_a1 of the accumulated data Ay is generated based on the column column_a of the input data Ax.

Further, for example, the processing shaping processing information 111b includes a record with the process "process s1", the input data "Ax", the output data "Ay", the input columns "column_b,column_c", and the output column "column_bc". This record indicates that, in the data processing process s1, the column column_bc of the accumulated data Ay is generated based on the columns column_b and column_c of the input data Ax.

Records related to other processes, such as the data shaping process s3, are also registered in the processing shaping processing information 111b.
The access authority prediction unit 132 generates an access authority prediction result 116a for each output column based on the input data access authority information 111a and the processing shaping processing information 111b. The access authority of an output column is predicted based on the access authority of its input columns. For example, when one input column corresponds to a certain output column, the access authority of that input column is the access authority predicted for the output column. When a plurality of input columns correspond to a certain output column, the most restrictive access authority among those of the input columns is the access authority predicted for the output column.

For example, according to the processing shaping processing information 111b, the column column_a1 is generated based on the column column_a. According to the input data access authority information 111a, the access authority of the column column_a is "anyone in the company". Therefore, the access authority prediction unit 132 predicts that the access authority of the column column_a1 of the accumulated data Ay is "anyone in the company". The access authority prediction unit 132 adds the predicted access authority "anyone in the company" to the prediction result 116a in association with the identification information of the column column_a1.

Further, according to the processing shaping processing information 111b, the column "column_bc" is generated based on the columns "column_b" and "column_c". According to the input data access authority information 111a, the access authority of the column "column_b" is "only the person in charge", and the access authority of the column "column_c" is "anyone in the company". Therefore, the access authority prediction unit 132 predicts that the access authority of the column "column_bc" of the accumulated data Ay is "only the person in charge", which is the more restrictive of "only the person in charge" and "anyone in the company". The access authority prediction unit 132 adds the predicted access authority "only the person in charge" to the prediction result 116a in association with the identification information of the column column_bc.

The access authority prediction unit 132 predicts the access authority of the accumulated data Ay based on the access authorities predicted for the columns column_a1 and column_bc included in the accumulated data Ay, and generates the access authority prediction result 116. For example, the access authority prediction unit 132 may predict, as the access authority of the corresponding data, the most restrictive of the access authorities predicted for all columns of that data. In the example of the accumulated data Ay, of the access authorities "anyone in the company" and "only the person in charge" predicted for the columns column_a1 and column_bc, the most restrictive is "only the person in charge". Therefore, the access authority prediction unit 132 predicts that the access authority of the accumulated data Ay is "only the person in charge". The access authority prediction unit 132 adds the access authority "only the person in charge" predicted for the accumulated data Ay to the access authority prediction result 116.
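The column-level and data-level prediction described above can be sketched as follows. The record layout, the function names, and the numeric ranking in `RESTRICTIVENESS` are assumptions made for illustration; the specification states only that "only the person in charge" is more restrictive than "anyone in the company".

```python
# Assumed ranking: a larger number means a more restrictive access authority.
RESTRICTIVENESS = {"anyone in the company": 0, "only the person in charge": 1}

def most_restrictive(authorities):
    """Pick the most restrictive access authority from a non-empty list."""
    return max(authorities, key=RESTRICTIVENESS.__getitem__)

# Input data access authority information 111a (access authority item only).
input_authority = {
    ("Ax", "column_a"): "anyone in the company",
    ("Ax", "column_b"): "only the person in charge",
    ("Ax", "column_c"): "anyone in the company",
}

# Processing shaping processing information 111b (hypothetical record layout).
processing_info = [
    {"process": "s1", "input": "Ax", "output": "Ay",
     "in_cols": ["column_a"], "out_col": "column_a1"},
    {"process": "s1", "input": "Ax", "output": "Ay",
     "in_cols": ["column_b", "column_c"], "out_col": "column_bc"},
]

def predict_columns(input_authority, processing_info):
    """Predict the access authority of each output column (prediction result 116a)."""
    predicted = {}
    for rec in processing_info:
        sources = [input_authority[(rec["input"], c)] for c in rec["in_cols"]]
        predicted[(rec["output"], rec["out_col"])] = most_restrictive(sources)
    return predicted

def predict_data(column_prediction):
    """Predict the access authority of each data item from its column predictions."""
    by_data = {}
    for (data, _col), authority in column_prediction.items():
        by_data.setdefault(data, []).append(authority)
    return {data: most_restrictive(auths) for data, auths in by_data.items()}

column_prediction = predict_columns(input_authority, processing_info)
data_prediction = predict_data(column_prediction)
print(column_prediction)
print(data_prediction)
```

Using the same "most restrictive wins" rule at both levels keeps the sketch short; an actual system might order more authority levels or treat unknown authorities separately.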

The evaluation unit 140 may generate the security evaluation result 118 by comparing the prediction result 116a predicted for each column of data with the actual access authority of each column of that data. In that case, for example, the evaluation unit 140 may perform control so that the evaluation value of the corresponding data becomes higher as more columns have matching access authorities, that is, so that the quality is evaluated as higher as more columns match.
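One possible way to realize such a column-level evaluation is to use the fraction of matching columns as the evaluation value. The specification does not fix a formula, so the ratio below, the function name `column_match_ratio`, and the actual per-column authorities are all assumptions for illustration.

```python
def column_match_ratio(predicted_cols, actual_cols):
    """Fraction of columns whose predicted access authority equals the actual one."""
    if not predicted_cols:
        return 0.0
    matches = sum(1 for col, authority in predicted_cols.items()
                  if actual_cols.get(col) == authority)
    return matches / len(predicted_cols)

# Predicted values follow prediction result 116a for the accumulated data Ay.
predicted_cols = {"column_a1": "anyone in the company",
                  "column_bc": "only the person in charge"}
# Hypothetical actual per-column authorities (not given in the specification).
actual_cols = {"column_a1": "anyone in the company",
               "column_bc": "anyone in the company"}

ratio = column_match_ratio(predicted_cols, actual_cols)
print(ratio)
```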

Next, the up-to-dateness evaluation will be described.
FIG. 10 is a diagram showing an example of up-to-dateness evaluation.
The history information 111 includes information indicating the history of the data exemplified in FIG. 4. First, the delay time calculation unit 133 acquires the delay requirement information 119. In the delay requirement information 119, the time that the user allows as the delay from the generation of the starting-point input data until the utilization data is updated is registered. The delay requirement information 119 is input to the information processing apparatus 100 by the user and stored in the storage unit 110.

 遅延要件情報119は、データ名および遅延要件の項目を含む。データ名の項目には、活用データのデータ名が登録される。遅延要件の項目には、始点の入力データの発生から活用データが更新されるまでに許容される遅延時間が登録される。遅延要件の項目には、始点の入力データの発生から活用データが更新されるまでに許容される遅延時間の上限が登録されてもよい。 Delay requirement information 119 includes data names and delay requirement items. The data name of the utilization data is registered in the data name item. In the item of delay requirement, the allowable delay time from the generation of the input data of the starting point to the update of the utilization data is registered. In the item of delay requirement, the upper limit of the delay time allowed from the generation of the input data of the starting point to the update of the utilization data may be registered.

 例えば、遅延要件情報119は、活用データA3の遅延要件が「2時間以内」であることを示すレコードを含む。また、遅延要件情報119は、活用データABの遅延要件が「5分以内」であることを示すレコードを含む。更に、遅延要件情報119は、活用データB3の遅延要件が「1分以内」であることを示すレコードを含む。遅延要件情報119は、他のデータ処理により生成される活用データに対する遅延要件のレコードを含み得る。 For example, the delay requirement information 119 includes a record indicating that the delay requirement of the utilization data A3 is "within 2 hours". Further, the delay requirement information 119 includes a record indicating that the delay requirement of the utilization data AB is "within 5 minutes". Further, the delay requirement information 119 includes a record indicating that the delay requirement of the utilization data B3 is "within 1 minute". The delay requirement information 119 may include a record of delay requirements for utilization data generated by other data processing.

 遅延時間算出部133は、来歴情報111に基づいて、実遅延時間情報120を生成する。実遅延時間情報120は、記憶部110に格納される。例えば、遅延時間算出部133は、情報処理システム50で記録されるデータ更新ログを、情報処理システム50から取得し、記憶部110に格納する。データ更新ログは、データ名と、当該データ名のデータが更新された時刻とを示す情報を含む。遅延時間算出部133は、来歴情報111と、データ更新ログとに基づいて、実遅延時間情報120を生成する。 The delay time calculation unit 133 generates the actual delay time information 120 based on the history information 111. The actual delay time information 120 is stored in the storage unit 110. For example, the delay time calculation unit 133 acquires the data update log recorded by the information processing system 50 from the information processing system 50 and stores it in the storage unit 110. The data update log contains information indicating the data name and the time when the data of the data name was updated. The delay time calculation unit 133 generates the actual delay time information 120 based on the history information 111 and the data update log.

 実遅延時間情報120は、データ名、更新時刻および遅延時間の項目を含む。データ名の項目には、データ名が登録される。更新時刻の項目には、該当のデータ名のデータの更新時刻が登録される。図10の例では、簡単のために、更新時刻は同日のものである例を示すが、更新時刻は、日付を含んでもよい。遅延時間の項目には、始点の入力データが更新された時刻から経過した時間、すなわち、遅延時間が登録される。始点の入力データに対しては、遅延時間は「-」(設定なし)となる。 The actual delay time information 120 includes items of data name, update time, and delay time. The data name is registered in the data name item. In the update time item, the update time of the data with the corresponding data name is registered. In the example of FIG. 10, for the sake of simplicity, an example in which the update time is the same day is shown, but the update time may include a date. In the item of delay time, the time elapsed from the time when the input data of the start point is updated, that is, the delay time is registered. For the input data of the start point, the delay time is "-" (no setting).

 遅延時間算出部133は、データ名および更新時刻の情報を、前述のデータ更新ログから取得することができる。また、遅延時間算出部133は、来歴情報111に基づいてデータの来歴を辿ることで、始点の入力データと、当該始点の入力データを基に生成される後続のデータを特定し、当該後続のデータに対する遅延時間を算出することができる。 The delay time calculation unit 133 can acquire the data name and update time information from the above-mentioned data update log. Further, the delay time calculation unit 133 identifies the input data of the start point and the subsequent data generated based on the input data of the start point by tracing the history of the data based on the history information 111, and identifies the subsequent data generated based on the input data of the start point. The delay time for the data can be calculated.

 例えば、実遅延時間情報120は、入力データA1に対して更新時刻「02:30」、遅延時間「-」のレコードを含む。入力データA1は、入力データ抽出部131により特定される「始点の入力データ」であるため、遅延時間は「-」となる。 For example, the actual delay time information 120 includes a record with an update time "02:30" and a delay time "-" with respect to the input data A1. Since the input data A1 is the "input data of the starting point" specified by the input data extraction unit 131, the delay time is "-".

 また、実遅延時間情報120は、入力データB1に対して更新時刻「13:30」、遅延時間「-」のレコードを含む。入力データB1は、入力データ抽出部131により特定される「始点の入力データ」であるため、遅延時間は「-」となる。 Further, the actual delay time information 120 includes a record of the update time "13:30" and the delay time "-" with respect to the input data B1. Since the input data B1 is the "input data of the starting point" specified by the input data extraction unit 131, the delay time is "-".

 また、実遅延時間情報120は、蓄積データA2に対して更新時刻「02:32」、遅延時間「2分」のレコードを含む。来歴情報111によれば、蓄積データA2は、入力データA1に対するデータ取得およびデータ加工を経てデータ蓄積部53に格納される。このため、蓄積データA2の遅延時間は、入力データA1の更新時刻「02:30」と、蓄積データA2の更新時刻「02:32」との差「2分」となる。 Further, the actual delay time information 120 includes a record of the update time "02:32" and the delay time "2 minutes" with respect to the accumulated data A2. According to the history information 111, the stored data A2 is stored in the data storage unit 53 after data acquisition and data processing for the input data A1. Therefore, the delay time of the accumulated data A2 is the difference "2 minutes" between the update time "02:30" of the input data A1 and the update time "02:32" of the accumulated data A2.

 実遅延時間情報120は、蓄積データB2に対して更新時刻「13:33」、遅延時間「3分」のレコードを含む。実遅延時間情報120は、活用データA3に対して更新時刻「04:00」、遅延時間「1時間30分」のレコードを含む。実遅延時間情報120は、活用データABに対して更新時刻「13:33」、遅延時間「3分」のレコードを含む。実遅延時間情報120は、活用データB3に対して更新時刻「13:33」、遅延時間「3分」のレコードを含む。 The actual delay time information 120 includes a record of the update time "13:33" and the delay time "3 minutes" with respect to the accumulated data B2. The actual delay time information 120 includes a record of the update time "04:00" and the delay time "1 hour 30 minutes" with respect to the utilization data A3. The actual delay time information 120 includes a record of the update time “13:33” and the delay time “3 minutes” with respect to the utilization data AB. The actual delay time information 120 includes a record of the update time “13:33” and the delay time “3 minutes” with respect to the utilization data B3.
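The delay time calculation can be sketched as follows. The parent map below is an assumed reading of the lineage in the history information 111, and the rule for data with several starting points (measure from the most recently updated starting-point input data) is an assumption chosen because it reproduces the example value for the utilization data AB; the specification does not state this rule explicitly.

```python
from datetime import datetime, timedelta

# Data update log (same-day times, as in the example of FIG. 10).
update_log = {"A1": "02:30", "B1": "13:30", "A2": "02:32", "B2": "13:33",
              "A3": "04:00", "AB": "13:33", "B3": "13:33"}

# Assumed lineage: each data name maps to the data it is generated from.
parents = {"A2": ["A1"], "B2": ["B1"], "A3": ["A2"],
           "AB": ["A2", "B2"], "B3": ["B2"]}

def start_points(name, parents):
    """Trace the lineage back to the starting-point input data."""
    direct = parents.get(name, [])
    if not direct:
        return {name}
    found = set()
    for parent in direct:
        found |= start_points(parent, parents)
    return found

def delay_time(name, parents, update_log):
    """Return the delay as a timedelta, or None for starting-point input data."""
    starts = start_points(name, parents)
    if starts == {name}:
        return None  # corresponds to "-" (no setting)
    fmt = "%H:%M"
    latest_start = max(datetime.strptime(update_log[s], fmt) for s in starts)
    return datetime.strptime(update_log[name], fmt) - latest_start

for data_name in update_log:
    print(data_name, delay_time(data_name, parents, update_log))
```

Because the log in this sketch holds times without dates, it only works for updates on the same day; update times that include a date would need a different format string.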

The evaluation unit 140 evaluates the up-to-dateness of the data by determining, based on the actual delay time information 120, whether or not the delay time calculated for the utilization data satisfies the delay requirement in the delay requirement information 119, and generates the up-to-dateness evaluation result 121. The up-to-dateness evaluation result 121 is stored in the storage unit 110. The up-to-dateness evaluation result 121 includes the items of classification and evaluation value. The meanings of the classification and evaluation value items are the same as those of the items of the same name in the history evaluation result 115.

 評価部140は、活用データに対して計算された遅延時間が、遅延要件を満たす場合、該当の活用データに至る各分類の評価値を「1」とする。評価部140は、活用データに対して計算された遅延時間が、遅延要件を満たさない場合、該当の活用データに至る各分類の評価値を「0」とする。ある分類に対して、評価値「1」および「0」の両方が付与され得る場合、当該分類の評価値を「0」とする。 When the delay time calculated for the utilization data meets the delay requirement, the evaluation unit 140 sets the evaluation value of each classification up to the corresponding utilization data to "1". When the delay time calculated for the utilization data does not satisfy the delay requirement, the evaluation unit 140 sets the evaluation value of each classification leading to the utilization data to "0". When both the evaluation values "1" and "0" can be given to a certain classification, the evaluation value of the classification is set to "0".

 実遅延時間情報の例では、活用データA3,ABについては、遅延要件を満たすので、活用データA3,ABに至る各分類の評価値は「1」となる。一方、活用データB3については、遅延要件を満たさないので、活用データ「B3」に至る各分類の評価値は「0」となる。特に、分類「B1-B2間」は、活用データAB,B3に連なる分類であるが、活用データB3に関して遅延要件を満たさないので、評価値は「0」となる。 In the example of the actual delay time information, since the utilization data A3 and AB satisfy the delay requirement, the evaluation value of each classification up to the utilization data A3 and AB is "1". On the other hand, since the utilization data B3 does not satisfy the delay requirement, the evaluation value of each classification leading to the utilization data “B3” is “0”. In particular, the classification "between B1 and B2" is a classification linked to the utilization data AB and B3, but the evaluation value is "0" because the delay requirement is not satisfied for the utilization data B3.
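The rule above, including the handling of a classification shared by several utilization data, can be sketched as follows. The lineage map and function names are assumptions for illustration; the delay times and delay requirements are the example values, expressed in minutes.

```python
# Assumed lineage: each data name maps to the data it is generated from.
parents = {"A2": ["A1"], "B2": ["B1"], "A3": ["A2"],
           "AB": ["A2", "B2"], "B3": ["B2"]}

delay_min = {"A3": 90, "AB": 3, "B3": 3}   # actual delay time information 120
limit_min = {"A3": 120, "AB": 5, "B3": 1}  # delay requirement information 119

def upstream_edges(name, parents):
    """Collect every (input, output) classification leading to the given data."""
    edges = set()
    for parent in parents.get(name, []):
        edges.add((parent, name))
        edges |= upstream_edges(parent, parents)
    return edges

def evaluate_up_to_dateness(parents, delay_min, limit_min):
    """Return {classification: 1 or 0}; a 0 from any utilization data wins."""
    scores = {}
    for data, limit in limit_min.items():
        satisfied = 1 if delay_min[data] <= limit else 0
        for src, dst in upstream_edges(data, parents):
            key = f"{src}-{dst}"
            scores[key] = min(scores.get(key, 1), satisfied)
    return scores

up_to_dateness = evaluate_up_to_dateness(parents, delay_min, limit_min)
print(up_to_dateness)
```

Taking the minimum implements the rule that a classification receives "0" whenever both "1" and "0" could be assigned, as happens for the classification between B1 and B2.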

 なお、来歴情報111に、データ更新ログに相当する情報が含まれることもある。その場合、遅延時間算出部133は、情報処理システム50からデータ更新ログを別途取得しなくても、来歴情報111から実遅延時間情報120を生成できる。 Note that the history information 111 may include information corresponding to the data update log. In that case, the delay time calculation unit 133 can generate the actual delay time information 120 from the history information 111 without separately acquiring the data update log from the information processing system 50.

 このようにして、評価部140は、来歴情報解析部130による解析結果を基に、来歴評価、セキュリティ評価および最新性評価によるデータ品質の評価を行う。更に、評価部140は、来歴評価、セキュリティ評価および最新性評価の評価結果を基に、データ品質の総合評価を行う。次に、総合評価について説明する。 In this way, the evaluation unit 140 evaluates the data quality by the history evaluation, the security evaluation, and the up-to-dateness evaluation based on the analysis result by the history information analysis unit 130. Further, the evaluation unit 140 performs a comprehensive evaluation of data quality based on the evaluation results of the history evaluation, the security evaluation, and the up-to-dateness evaluation. Next, the comprehensive evaluation will be described.

FIG. 11 is a diagram showing an example of a comprehensive evaluation result table.
The comprehensive evaluation result table 122 is generated by the evaluation unit 140 based on the history evaluation result 115, the security evaluation result 118, and the up-to-dateness evaluation result 121, and is stored in the storage unit 110. The comprehensive evaluation result table 122 includes the items of classification, history evaluation value, security evaluation value, up-to-dateness evaluation value, and comprehensive evaluation value.

In the classification item, a classification is registered. The meaning of the classification is the same as that of the classification in the history evaluation result 115. In the history evaluation value item, the evaluation value in the history evaluation result 115 for the corresponding classification, that is, the history evaluation value, is registered. In the security evaluation value item, the evaluation value in the security evaluation result 118 for the corresponding classification, that is, the security evaluation value, is registered. In the up-to-dateness evaluation value item, the evaluation value in the up-to-dateness evaluation result 121 for the corresponding classification, that is, the up-to-dateness evaluation value, is registered. In the comprehensive evaluation value item, the comprehensive evaluation value calculated based on the history evaluation value, the security evaluation value, and the up-to-dateness evaluation value is registered. For example, the comprehensive evaluation value is the sum of the history evaluation value, the security evaluation value, and the up-to-dateness evaluation value.

For example, the comprehensive evaluation result table 122 includes, for the classification "between A1 and A2", a record with the history evaluation value "1", the security evaluation value "1", the up-to-dateness evaluation value "1", and the comprehensive evaluation value "3". Further, for example, the comprehensive evaluation result table 122 includes, for the classification "between B1 and B2", a record with the history evaluation value "0", the security evaluation value "0", the up-to-dateness evaluation value "0", and the comprehensive evaluation value "0". Records for other classifications are also registered in the comprehensive evaluation result table 122.

In the above example, the evaluation unit 140 uses the sum (V1+V2+V3) of the history evaluation value (V1), the security evaluation value (V2), and the up-to-dateness evaluation value (V3) as the comprehensive evaluation value. Here, V1, V2, and V3 are positive real numbers. Other methods of calculating the comprehensive evaluation value are also conceivable. For example, the evaluation unit 140 may use, as the comprehensive evaluation value, a weighted sum (w1*V1+w2*V2+w3*V3) in which the history evaluation value, the security evaluation value, and the up-to-dateness evaluation value are each weighted. Here, w1, w2, and w3 are positive real numbers.
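Both calculation methods can be expressed with one function, sketched below. The function name `comprehensive_value` and the example weights are assumptions for illustration; with all weights equal to 1, the weighted sum reduces to the plain sum V1+V2+V3.

```python
def comprehensive_value(v1, v2, v3, weights=(1.0, 1.0, 1.0)):
    """Weighted sum w1*V1 + w2*V2 + w3*V3 of the three evaluation values."""
    w1, w2, w3 = weights
    return w1 * v1 + w2 * v2 + w3 * v3

# Records of the comprehensive evaluation result table 122 (unweighted sum).
print(comprehensive_value(1, 1, 1))  # classification between A1 and A2
print(comprehensive_value(0, 0, 0))  # classification between B1 and B2

# Hypothetical weights that emphasize the security evaluation value.
print(comprehensive_value(1, 0, 1, weights=(1.0, 2.0, 1.0)))
```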

The display control unit 150 causes the display 61 to display an evaluation result screen showing the evaluation results, based on the comprehensive evaluation result table 122. Next, examples of the evaluation result screen will be described.
FIG. 12 is a diagram showing a first example of the evaluation result screen.

 評価結果画面400は、データフロー図401および凡例402の画像を含む。データフロー図401は、情報処理システム50におけるデータの流れを表す図である。表示制御部150は、ユーザの識別情報の入力を受け付け、来歴情報111に基づいて、当該ユーザの識別情報に関連するデータの流れを、データフロー図401として表示させる。 The evaluation result screen 400 includes images of the data flow diagram 401 and the legend 402. The data flow diagram 401 is a diagram showing a data flow in the information processing system 50. The display control unit 150 accepts the input of the user's identification information, and displays the flow of data related to the user's identification information as the data flow diagram 401 based on the history information 111.

In the data flow diagram 401, the flow of data is represented by arrows. Each arrow corresponds to a classification in the comprehensive evaluation result table 122. The display control unit 150 presents the evaluation value for each classification, that is, the quality evaluation result, to the user by coloring the arrows. The legend 402 indicates the quality level corresponding to each arrow color.

 図12の例では、品質を3色で区別する場合を示している。第1の色は、品質「高」を表す。第2の色は、品質「中」を表す。第3の色は、品質「低」を表す。第1の色は、例えば緑である。第2の色は、例えば黄色である。第3の色は、例えば赤である。 The example of FIG. 12 shows a case where quality is distinguished by three colors. The first color represents quality "high". The second color represents quality "medium". The third color represents quality "low". The first color is, for example, green. The second color is, for example, yellow. The third color is, for example, red.

For example, the display control unit 150 uses the first color for the arrow of a classification whose comprehensive evaluation value in the comprehensive evaluation result table 122 is the full score, that is, "3". Further, the display control unit 150 uses the third color for the arrow of a classification whose comprehensive evaluation value in the comprehensive evaluation result table 122 is "0". Further, the display control unit 150 uses the second color for the arrow of a classification whose comprehensive evaluation value in the comprehensive evaluation result table 122 is larger than 0 and smaller than 3. The color coding of the arrows according to the comprehensive evaluation value may also use two colors, or four or more colors.
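The three-color mapping can be sketched as follows. The function name `arrow_color` and the color strings are assumptions for illustration, using the example colors (green, yellow, red) and a full score of 3.

```python
def arrow_color(total, full_score=3):
    """Map a comprehensive evaluation value to an arrow color."""
    if total >= full_score:
        return "green"   # first color: quality "high"
    if total <= 0:
        return "red"     # third color: quality "low"
    return "yellow"      # second color: quality "medium"

for value in (3, 2, 1, 0):
    print(value, arrow_color(value))
```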

 表示制御部150は、総合評価値が低い(例えば、3未満の総合評価値である)矢印を経由して得られる活用データに対して、例えばクロスマーク「X」を重ねて表示させることで、見直しを要する箇所であることをユーザに提示する。 The display control unit 150 displays, for example, a cross mark "X" superimposed on the utilization data obtained via an arrow having a low overall evaluation value (for example, an overall evaluation value of less than 3). Show the user that it is a part that needs to be reviewed.

 ユーザは、評価結果画面400に重ねて表示されるポインタP1を、入力デバイス62により操作することで、データフロー図401におけるデータや処理などを表すアイコンを選択することができる。 The user can select an icon representing data or processing in the data flow diagram 401 by operating the pointer P1 displayed on the evaluation result screen 400 by the input device 62.

FIG. 13 is a diagram showing a second example of the evaluation result screen.
When the display control unit 150 detects that the icon of the utilization data AB on the evaluation result screen 400 has been selected with the pointer P1, the display control unit 150 updates the evaluation result screen 400 to the evaluation result screen 500. The evaluation result screen 500 includes images of the data flow diagram 501 and the legend 502. The legend 502 is the same as the legend 402.

In the data flow diagram 501, the reverse-direction arrow that traces back from the selected utilization data AB through the data shaping process s4 to the accumulated data B2 is highlighted. The forward-direction arrows showing that the input data B1 reaches the accumulated data B2 through the data acquisition process and the data processing process s2 are also highlighted. The other arrows are displayed inconspicuously.

When the input sources of the data shaping process s4 are traced back, the path branches, because there are two input sources: the accumulated data A2 and B2. In this case, the display control unit 150 may preferentially select the branch with the lower comprehensive evaluation value and display the reverse-direction arrow along it. More specifically, on the evaluation result screen 500, tracing back from the utilization data AB through the data shaping process s4 leads to a branch into the accumulated data A2 and B2. The display control unit 150 therefore highlights the arrows connected to the classification "between B1 and B2", which has the lower comprehensive evaluation value of the classifications "between A1 and A2" and "between B1 and B2". This makes it easier for the user to find the location that causes the deterioration in data quality.

Although FIGS. 12 and 13 illustrate the evaluation result screens 400 and 500 for the comprehensive evaluation value, the display control unit 150 may also display, based on the comprehensive evaluation result table 122, an evaluation result screen for each of the provenance evaluation value, the security evaluation value, and the up-to-dateness evaluation value.

Next, the processing procedure of the information processing apparatus 100 will be described.
FIG. 14 is a flowchart showing a processing example of the information processing apparatus.
The information processing apparatus 100 starts the following procedure when, for example, it receives an input from the user to start the data quality evaluation.

(S10) The provenance information analysis unit 130 generates, based on the provenance information 111, the input data list 113 corresponding to the identification information of the user in question. The evaluation unit 140 performs the provenance evaluation of the data based on the user-owned input data list 112 and the input data list 113, and generates the provenance evaluation result 115. The evaluation unit 140 stores the generated provenance evaluation result 115 in the storage unit 110. The provenance evaluation procedure is described later.

(S11) The provenance information analysis unit 130 generates the access authority prediction result 116 based on the provenance information 111. The evaluation unit 140 performs the security evaluation of the data based on the access authority prediction result 116 and the actual access authority information 117, and generates the security evaluation result 118. The evaluation unit 140 stores the generated security evaluation result 118 in the storage unit 110. The security evaluation procedure is described later.

(S12) The provenance information analysis unit 130 generates the actual delay time information 120 based on the provenance information 111. The evaluation unit 140 performs the up-to-dateness evaluation of the data based on the delay requirement information 119 and the actual delay time information 120, and generates the up-to-dateness evaluation result 121. The evaluation unit 140 stores the generated up-to-dateness evaluation result 121 in the storage unit 110. The up-to-dateness evaluation procedure is described later.

(S13) The evaluation unit 140 generates the comprehensive evaluation result table 122 based on the provenance evaluation result 115, the security evaluation result 118, and the up-to-dateness evaluation result 121, and stores the generated table in the storage unit 110. The evaluation unit 140 registers in the comprehensive evaluation result table 122 the provenance evaluation value, the security evaluation value, and the up-to-dateness evaluation value of each classification taken from the provenance evaluation result 115, the security evaluation result 118, and the up-to-dateness evaluation result 121. The evaluation unit 140 then calculates, for each classification, the sum of the provenance evaluation value, the security evaluation value, and the up-to-dateness evaluation value as the comprehensive evaluation value, and registers it in the comprehensive evaluation result table 122. The comprehensive evaluation value may be calculated by another method, such as a weighted sum of the three evaluation values.
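As a minimal sketch of step S13, the per-classification sum (or weighted sum) can be written as follows. The function name, the dictionary layout, and the classification labels such as "A1-A2" are illustrative assumptions, not the format of the comprehensive evaluation result table 122.

```python
def comprehensive_scores(provenance, security, freshness, weights=(1, 1, 1)):
    """Combine three per-classification score dicts into one.

    Each argument maps a classification label (hypothetical, e.g. "A1-A2")
    to its evaluation value. The default weights give the plain sum;
    other weights give a weighted sum.
    """
    w_p, w_s, w_f = weights
    classes = set(provenance) | set(security) | set(freshness)
    return {
        c: w_p * provenance.get(c, 0)
        + w_s * security.get(c, 0)
        + w_f * freshness.get(c, 0)
        for c in classes
    }
```

With values of 1, 1, and 1 for a classification, the plain sum gives the full score of 3 used in the color-coding example.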

(S14) The display control unit 150 executes the evaluation result display control based on the comprehensive evaluation result table 122. The evaluation result display control procedure is described later. When the display control unit 150 receives an input from the user to end the evaluation result display, it ends the evaluation result display control and stops displaying the evaluation result screen. The processing of the information processing apparatus 100 then ends.

FIG. 15 is a flowchart showing an example of the provenance evaluation.
The provenance evaluation corresponds to step S10.
(S20) The input data extraction unit 131 acquires the user-owned input data list 112 and stores it in the storage unit 110. For example, the user-owned input data list 112 is input to the information processing apparatus 100 by the user.

(S21) Based on the provenance information 111 stored in the storage unit 110, the input data extraction unit 131 generates a list of the actual start-point input data, that is, the input data list 113, and stores it in the storage unit 110. At this time, the input data extraction unit 131 identifies, for example from the provenance information 111, the provenance of the data used in the processing corresponding to the identification information of the user in question, and generates the input data list 113 from the identified provenance.

(S22) The evaluation unit 140 identifies the utilization data to be evaluated. In the provenance evaluation, the initial provenance evaluation value of each piece of data is assumed to be 0 when step S22 is first executed.

(S23) Based on the user-owned input data list 112 and the input data list 113, the evaluation unit 140 determines whether the actual start-point input data corresponding to the utilization data in question matches the user-owned input data. If they match, the evaluation unit 140 proceeds to step S24. If they do not match, the evaluation unit 140 proceeds to step S25.

(S24) The evaluation unit 140 adds points to the provenance evaluation value of the utilization data in question. The evaluation unit 140 also adds points to the provenance evaluation value of the intermediate data (accumulated data) leading to that utilization data. For example, a provenance evaluation value of "1" (the unit score) is added. However, the value added may be other than "1", and, as described above, the provenance evaluation value may be assigned so that it does not exceed a predetermined upper limit (for example, "1"). Here, the quality evaluation value for a given piece of data corresponds to the evaluation value of the classification whose output is that data. For example, the evaluation value for the classification "between A1 and A2" in the comprehensive evaluation result table 122 can be regarded as the evaluation value for the accumulated data A2.

(S25) The evaluation unit 140 determines whether all the utilization data to be evaluated have been evaluated. If so, the evaluation unit 140 ends the provenance evaluation. If not, the evaluation unit 140 returns to step S22.

In steps S23 and S24, there may be a plurality of actual start-point data corresponding to a piece of utilization data. In that case, the evaluation unit 140 adds points to the provenance evaluation values of the utilization data and the intermediate data in proportion to the number of actual start-point data that are included in the user-owned input data list 112. For example, when two of the actual start-point input data are included in the user-owned input data list 112, the evaluation unit 140 may add twice the unit score a to the provenance evaluation value of each of the utilization data and the intermediate data.
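The loop of steps S22 to S25, including the case of several start points, can be sketched as below. The data layout is an assumption made for illustration: `start_points` maps each utilization data name to its actual start-point input data, and `intermediates` maps it to the intermediate data leading to it. This sketch gives every listed intermediate the same points as the utilization data and does not apply an upper limit.

```python
def provenance_scores(user_inputs, start_points, intermediates, unit=1):
    """Score utilization and intermediate data by matched start points.

    user_inputs:   set of input data names the user says they own
    start_points:  {utilization data: [actual start-point input data]}
    intermediates: {utilization data: [intermediate data leading to it]}
    All names and the layout are hypothetical.
    """
    # Every evaluation value starts at 0 (see step S22).
    scores = {}
    for target in start_points:
        scores.setdefault(target, 0)
        for mid in intermediates.get(target, []):
            scores.setdefault(mid, 0)

    for target, sources in start_points.items():
        # S23: count the actual start points found in the user's list.
        matched = sum(1 for s in sources if s in user_inputs)
        if matched == 0:
            continue  # no match: go on to the next utilization data (S25)
        # S24: add unit * matched to the target and its intermediate data.
        gain = unit * matched
        scores[target] += gain
        for mid in intermediates.get(target, []):
            scores[mid] += gain
    return scores
```

If both start points A1 and B1 of the utilization data AB are in the user's list, AB and its intermediate data each receive twice the unit score.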

FIG. 16 is a flowchart showing an example of the security evaluation.
The security evaluation corresponds to step S11.
(S30) Based on the input data list 113 generated in step S21, the access authority prediction unit 132 identifies the start-point input data corresponding to the identification information of the user being processed. The access authority prediction unit 132 acquires the input data access authority information 111a for the identified start-point input data and stores it in the storage unit 110. The access authority prediction unit 132 also acquires, based on the provenance information 111, the processing and shaping process information 111b for the processing from the start-point input data to the utilization data, and stores it in the storage unit 110.

(S31) Based on the provenance information 111, the access authority prediction unit 132 predicts the access authority of the other data derived from the start-point input data, generates the access authority prediction result 116, and stores it in the storage unit 110. The other data derived from the start-point input data include the accumulated data (intermediate data) and the utilization data generated from the start-point input data. The access authority prediction unit 132 also acquires the access authority information 117, which indicates the actual access authority of the data, from a data catalog or the like of the information processing system 50, and stores it in the storage unit 110.

(S32) The evaluation unit 140 identifies the data to be evaluated. The candidates are all the accumulated data and all the utilization data included in the access authority prediction result 116, and the evaluation unit 140 selects one of them. In the security evaluation, the initial security evaluation value of each piece of data is assumed to be 0 when step S32 is first executed.

(S33) The evaluation unit 140 determines whether the access authority predicted for the data under evaluation in the access authority prediction result 116 matches the actual access authority in the access authority information 117. If they match, the evaluation unit 140 proceeds to step S34. If they do not match, the evaluation unit 140 proceeds to step S35.

(S34) The evaluation unit 140 adds points to the security evaluation value of the data under evaluation. For example, a security evaluation value of "1" is added. However, the value added may be other than "1", and, as described above, the security evaluation value may be assigned so that it does not exceed a predetermined upper limit (for example, "1").

(S35) The evaluation unit 140 determines whether all the data to be evaluated have been evaluated. If so, the evaluation unit 140 ends the security evaluation. If not, the evaluation unit 140 returns to step S32.
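Steps S32 to S35 amount to comparing a predicted access authority with the actual one for each piece of data. The sketch below assumes, purely for illustration, that an access authority can be represented by any value comparable with `==` (here a string label), and it treats data with no actual entry as a mismatch.

```python
def security_scores(predicted, actual, unit=1):
    """Give `unit` points to each data item whose predicted access
    authority equals its actual access authority, otherwise 0.

    predicted: {data name: predicted access authority}
    actual:    {data name: actual access authority}
    The representation of an access authority is hypothetical.
    """
    scores = {}
    for name, expected in predicted.items():
        # S33: a match requires an actual entry that equals the prediction.
        matches = name in actual and actual[name] == expected
        # S34: add points only on a match; the initial value is 0.
        scores[name] = unit if matches else 0
    return scores
```

Data whose score stays at 0 here are the ones whose generation process may have included processing the user did not expect.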

FIG. 17 is a flowchart showing an example of the up-to-dateness evaluation.
The up-to-dateness evaluation corresponds to step S12.
(S40) The delay time calculation unit 133 acquires the delay requirement information 119 for the utilization data. For example, the delay requirement information 119 is input to the information processing apparatus 100 by the user.

(S41) Based on the provenance information 111, the delay time calculation unit 133 calculates the actual data update delay time for the utilization data. As described above, the delay time calculation unit 133 can use the data update log acquired from the information processing system 50 to calculate the data update delay time. The delay time calculation unit 133 records the calculated delay time in the actual delay time information 120 stored in the storage unit 110.

(S42) The evaluation unit 140 identifies the utilization data to be evaluated. In the up-to-dateness evaluation, the initial up-to-dateness evaluation value of each piece of data is assumed to be 0 when step S42 is first executed.

(S43) Based on the delay requirement information 119 and the actual delay time information 120, the evaluation unit 140 determines whether the delay time calculated for the utilization data in question is within the permissible range given by the delay requirement information 119. If it is within the permissible range, the evaluation unit 140 proceeds to step S44. If it is not, the evaluation unit 140 proceeds to step S45.

(S44) The evaluation unit 140 adds points to the up-to-dateness evaluation value of the utilization data in question. The evaluation unit 140 also adds points to the up-to-dateness evaluation value of the intermediate data (accumulated data) leading to that utilization data. For example, an up-to-dateness evaluation value of "1" is added. However, the value added may be other than "1", and, as described above, the up-to-dateness evaluation value may be assigned so that it does not exceed a predetermined upper limit (for example, "1").

(S45) The evaluation unit 140 determines whether all the utilization data to be evaluated have been evaluated. If so, the evaluation unit 140 ends the up-to-dateness evaluation. If not, the evaluation unit 140 returns to step S42.
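Steps S42 to S45 can be sketched in the same style. The sketch assumes that the delay requirement is a maximum permissible delay per utilization data and that "within the permissible range" means the actual delay does not exceed it; both the layout and that interpretation are assumptions, as are all names.

```python
from datetime import timedelta

def freshness_scores(requirements, actual_delays, intermediates, unit=1):
    """Score data by whether the actual update delay meets the requirement.

    requirements:  {utilization data: maximum permissible delay}
    actual_delays: {utilization data: measured update delay}
    intermediates: {utilization data: [intermediate data leading to it]}
    """
    scores = {}
    for target, allowed in requirements.items():
        # S43: within range only if a delay was measured and it is small enough.
        ok = target in actual_delays and actual_delays[target] <= allowed
        gain = unit if ok else 0
        # S44: the utilization data and its intermediate data get the same points.
        scores[target] = scores.get(target, 0) + gain
        for mid in intermediates.get(target, []):
            scores[mid] = scores.get(mid, 0) + gain
    return scores

# Example: AB is updated within its one-hour requirement, C is not.
example = freshness_scores(
    {"AB": timedelta(hours=1), "C": timedelta(minutes=10)},
    {"AB": timedelta(minutes=30), "C": timedelta(minutes=45)},
    {"AB": ["A2", "B2"], "C": ["C2"]},
)
```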

FIG. 18 is a flowchart showing an example of the evaluation result display control.
The evaluation result display control corresponds to step S14.
(S50) The display control unit 150 accepts the user's selection of the evaluation type to be displayed. The evaluation types are the provenance evaluation, the security evaluation, the up-to-dateness evaluation, and the comprehensive evaluation. The user may select one of these evaluation types, or a combination of two of the provenance evaluation, the security evaluation, and the up-to-dateness evaluation. The following mainly illustrates the case where the comprehensive evaluation is selected, but the procedure is the same for the other evaluation types.

(S51) The display control unit 150 causes the display 61 to display the evaluation result screen 400, which shows the provenance of the data corresponding to the identification information of the user in question. The display control unit 150 may instead transmit the information of the evaluation result screen 400 to another device, such as a client device connected via the network 60, so that the other device displays the evaluation result screen 400 on a display connected to it. In the evaluation result screen 400, the display control unit 150 displays the data flow diagram 401, in which the arrows representing the "classifications" are color-coded using the evaluation value corresponding to the evaluation type selected in step S50 (here, the comprehensive evaluation value). The evaluation result screen 400 may include an image of the legend 402. When two evaluation types are selected in step S50, the display control unit 150 may color-code each arrow according to the sum of the evaluation values of the two evaluation types for the corresponding classification.

(S52) The display control unit 150 determines whether any data has been selected in the data flow diagram 401. If so, the display control unit 150 proceeds to step S53. If not, the display control unit 150 proceeds to step S54. For example, the user can operate the input device 62 to select, with the pointer P1, data displayed in the data flow diagram 401. Alternatively, when the evaluation result screen 400 is displayed by a client device used by the user, the user may operate an input device of the client device to input the selection of data to the information processing apparatus 100.

(S53) The display control unit 150 displays, based on the comprehensive evaluation value, the data flow traced back from the selected data. For example, when the utilization data AB is selected in the data flow diagram 401, the display control unit 150 displays the evaluation result screen 500, which includes the data flow traced back from the utilization data AB to the accumulated data B2, the data one step before it. As described above, when the path being traced back branches, the display control unit 150 may preferentially select, among the classifications leading to the utilization data AB, the branch that contains more classifications with low comprehensive evaluation values, and highlight the classifications on the selected branch. For example, on the evaluation result screen 500, tracing back from the utilization data AB through the data shaping process s4 leads to a branch into the accumulated data A2 and B2. The display control unit 150 therefore highlights the arrows connected to the classification "between B1 and B2", which has the lower comprehensive evaluation value of the classifications "between A1 and A2" and "between B1 and B2".

(S54) The display control unit 150 determines whether an input to end the display has been received. If so, the display control unit 150 ends the evaluation result display control. If not, the display control unit 150 returns to step S52.
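The trace-back of step S53 can be illustrated with a simplified, greedy sketch: at each branch it follows the upstream data whose comprehensive evaluation value is lowest. This is a simplification of the selection described above, which may also consider how many low-valued classifications a branch contains. The graph layout and names are assumptions; the score of a piece of data stands for the score of the classification that outputs it.

```python
def trace_back(selected, upstream, score):
    """Return the path traced back from `selected`, preferring at each
    branch the upstream data with the lowest comprehensive score.

    upstream: {data: [data it was derived from]}  (hypothetical layout)
    score:    {data: comprehensive evaluation value}
    Data without a score (e.g. start-point inputs) are treated as least
    preferred, which only matters when they compete at a branch.
    """
    path = [selected]
    seen = {selected}  # guards against revisiting data if the graph has a cycle
    current = selected
    while True:
        candidates = [n for n in upstream.get(current, []) if n not in seen]
        if not candidates:
            return path
        current = min(candidates, key=lambda n: score.get(n, float("inf")))
        path.append(current)
        seen.add(current)
```

For the example of FIG. 13, with A2 scored 3 and B2 scored 1, tracing back from AB follows B2 and then B1, which is the path that would be highlighted.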

As described above, the information processing apparatus 100 makes it possible to provide a service that evaluates data quality in an information processing infrastructure. In particular, the information processing apparatus 100 uses the provenance information of the data to evaluate the flow of data across software products. The information processing apparatus 100 also marks the problem locations on the data flow diagram based on the evaluation results.

This makes it possible to evaluate data that spans multiple pieces of software and helps the user identify problem locations within the information processing infrastructure as a whole.
Various kinds of software are executed in the information processing system 50. For example, the data itself may be evaluated for each software product in terms of missing values, duplicate data, data types, and the like. However, in an information processing infrastructure composed of multiple software products, it is desirable to evaluate not only each software product individually but also whether data quality is properly managed across the whole flow, from data collection to the output destination.

However, no mechanism has been devised for evaluating data quality across a flow that spans multiple software products. For example, the data quality of each software product could be evaluated manually by individuals in order to assess data quality across the flow, but such an evaluation takes time. In particular, requiring the user to work through the data flow to identify the cause of a deterioration in data quality places a heavy burden on the user and makes the evaluation slow.

The information processing apparatus 100 therefore appropriately evaluates, based on the provenance information 111, the quality of data in data processing performed by multiple pieces of software.
For example, in a series of data processing performed by multiple pieces of software, the quality of the output data depends on whether the data the user intended is being processed. When data processing such as analysis is performed on input data the user did not intend, the input data may contain unnecessary or incorrect information, so the result of the data processing is more likely to be wrong and is less reliable.

The information processing apparatus 100 therefore identifies, based on the provenance information 111, the start-point input data of the data processing performed by the multiple pieces of software. By checking whether the start-point input data is the input the user intended, the quality of the utilization data and the accumulated data output by the data processing can be evaluated appropriately. The evaluation can also be performed faster than if the user performed it.

In addition to the provenance evaluation, the information processing apparatus 100 evaluates data quality with multiple evaluation types, such as the security evaluation and the up-to-dateness evaluation, and comprehensively evaluates data quality from the results of those evaluation types. This allows data quality to be evaluated more appropriately.

For example, the information processing apparatus 100 performs the following processing.
The provenance information analysis unit 130 extracts information on first input data, which is the start point of data processing performed by multiple pieces of software, based on the provenance information 111, which indicates the history of the input data and output data of each piece of software. The evaluation unit 140 evaluates the quality of first output data output by the data processing according to a comparison of whether the information on the first input data matches information on predetermined data input by the user.
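One way to picture the extraction of the first input data is to walk the input/output history backwards from the output data until data with no recorded producer is reached. The sketch below assumes the provenance information can be reduced to a list of `(inputs, outputs)` pairs, one per software execution; that record format and the function name are illustrative assumptions.

```python
def start_points(records, target):
    """Find the start-point input data from which `target` was derived.

    records: list of (inputs, outputs) pairs, one per software execution
             (a hypothetical reduction of the provenance information).
    Returns the set of data that `target` depends on and that no record
    produces. If `target` itself has no producer, it is returned as is.
    """
    # Map each output to the inputs it was produced from.
    producers = {}
    for inputs, outputs in records:
        for out in outputs:
            producers.setdefault(out, set()).update(inputs)

    found, stack, seen = set(), [target], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        parents = producers.get(node)
        if not parents:
            found.add(node)       # nothing produced it: a start point
        else:
            stack.extend(parents)  # keep walking upstream
    return found
```

The resulting set can then be compared with the data the user entered, as in the provenance evaluation.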

This makes it possible to evaluate data quality appropriately.
The evaluation unit 140 also evaluates the quality of the intermediate data passed through on the way from the first input data to the first output data, according to the comparison of whether the information on the first input data matches the information on the predetermined data input by the user.

This makes it possible to appropriately evaluate the quality not only of the first output data, which is the final output of the data processing, but also of the intermediate data generated in the course of the data processing.
The provenance information analysis unit 130 also acquires information on a first access authority for the first input data and predicts a second access authority for the first output data based on the first access authority and the provenance information 111. The evaluation unit 140 evaluates the quality of the first output data according to a comparison of whether the actual access authority of the first output data matches the predicted second access authority.

In this way, by checking whether the access authority of the first output data predicted from the history information 111 matches the actual access authority, it is possible to evaluate whether the first output data was generated through an appropriate process. For example, if the access authorities do not match, inappropriate processing that the user did not anticipate may have been performed in the process of generating the first output data. Therefore, if the predicted access authority does not match the actual access authority, the quality of the data in question is judged to be low.

In predicting the second access authority, the history information analysis unit 130 predicts, based on the first access authority, a third access authority for the intermediate data passed through on the way from the first input data to the first output data. The history information analysis unit 130 then predicts the second access authority based on the predicted third access authority.

In this way, the second access authority can be appropriately predicted by predicting the access authority of each piece of data in order along the forward direction of the data flow, based on the prediction result for the access authority of the intermediate data.

The evaluation unit 140 may evaluate the quality of the intermediate data according to a comparison of whether the actual access authority of the intermediate data matches the predicted third access authority.
This makes it possible to appropriately evaluate the quality not only of the first output data, which is the final output of the data processing, but also of the intermediate data generated in the course of the data processing.
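The forward prediction of access authorities and the comparison with the actual authorities might look like the sketch below. The propagation rule used here (derived data is accessible only to users permitted on every one of its sources, i.e. a set intersection) is an assumption chosen for illustration; this passage does not state the actual rule. The function names, the representation of an access authority as a set of user names, and the "high"/"low" rating are likewise hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def predict_access(edges, start_access):
    """Predict the set of permitted users for every data item.

    edges: iterable of (source data, derived data) pairs from the history.
    start_access: {starting input data: set of permitted users}.
    Assumed rule: derived data is accessible only to users permitted on
    every one of its sources (set intersection).
    """
    sources = {}
    for src, dst in edges:
        sources.setdefault(src, set())
        sources.setdefault(dst, set()).add(src)

    predicted = {}
    # static_order() yields each data item after all of its sources,
    # so the prediction proceeds in the forward direction of the flow.
    for data in TopologicalSorter(sources).static_order():
        if sources[data]:
            predicted[data] = set.intersection(
                *(predicted[s] for s in sources[data])
            )
        else:
            predicted[data] = set(start_access[data])
    return predicted


def evaluate_access(predicted, actual):
    """Rate each data item in `actual` by whether it matches the prediction."""
    return {
        data: "high" if users == predicted.get(data) else "low"
        for data, users in actual.items()
    }
```

Because the predicted authority of each intermediate item is kept in the returned dictionary, the same comparison serves both the intermediate data (the third access authority) and the final output (the second access authority).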

Further, the history information analysis unit 130 acquires information on a first delay time, which is the delay allowed from the generation of the first input data until the first output data is updated. The information on the first delay time is input by the user, for example. The history information analysis unit 130 calculates, based on data update history information and the history information 111, a second delay time from the generation of the first input data until the first output data is updated. The evaluation unit 140 evaluates the quality of the first output data according to a comparison of whether the second delay time is shorter than the first delay time.

In this way, whether the first output data was generated through an appropriate process can be evaluated according to whether the delay time from the generation of the first input data to the update of the first output data satisfies the user's delay requirement. For example, if the second delay time is longer than the first delay time, the delay requirement is not satisfied, and an abnormality or performance degradation may have occurred in the process from the first input data to the first output data. Therefore, if the second delay time is longer than the first delay time, the quality of the first output data is judged to be low. According to the comparison between the second delay time and the first delay time, the evaluation unit 140 may evaluate the quality of the intermediate data passed through on the way from the first input data to the first output data, in addition to the first output data. In this case as well, if the second delay time is longer than the first delay time, the quality of the intermediate data is judged to be low.
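The delay comparison can be illustrated with a short sketch. The representation of the update history as a mapping from data name to timestamp, the function name `evaluate_delay`, and the "high"/"low" rating are assumptions for illustration. The text only says that a longer delay yields low quality; treating a delay exactly equal to the allowed delay as acceptable is a further assumption.

```python
from datetime import datetime, timedelta


def evaluate_delay(update_history, start_input, output, allowed_delay):
    """Compare the measured delay with the allowed delay.

    update_history: {data name: datetime of its latest generation/update}.
    Returns (rating, measured delay). A delay equal to the allowed delay
    is treated as acceptable here; the text does not settle that case.
    """
    measured = update_history[output] - update_history[start_input]
    rating = "low" if measured > allowed_delay else "high"
    return rating, measured
```

With the first input data generated at 09:00 and the first output data updated at 09:07, an allowed delay of 10 minutes gives "high", while an allowed delay of 5 minutes gives "low".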

The display control unit 150 causes a display device to display the data flow diagram 401 from the first input data to the first output data. The display control unit 150 changes the display mode of an image element that is included in the data flow diagram 401 and indicates the relationship between the first input data and the first output data, based on the result of the quality evaluation of the first output data.

This can help the user identify locations to review. The arrows indicating data flow in the data flow diagram 401 are an example of image elements. The display 61 is an example of the display device. The display device may be one connected to another information processing device. In that case, the display control unit 150 performs display control by transmitting information on the display content to the other information processing device via the network 60.

The display control unit 150 accepts a selection of the image representing the first output data included in the data flow diagram 401 displayed on the display device. The display control unit 150 then selects, based on the evaluation values of the data passed through on the way to the first output data, the image elements to be highlighted from among the plurality of image elements indicating the relationships between the data leading to the first output data, and highlights the selected image elements.

This can help the user identify locations to review. For example, as illustrated in FIG. 13, the data flow leading to the first output data may contain multiple series, each based on the input data of a different starting point (for example, the classifications "A1-A2" and "B1-B2"). In that case, the display control unit 150 may select and highlight the image elements belonging to a series that contains many classifications with low evaluation values, or to a series whose classifications have a low average evaluation value. This makes it possible to present the locations with a high review priority and helps the user identify the locations to review efficiently.
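The two selection criteria mentioned above (lowest average evaluation value, or most classifications with low evaluation values) can be sketched as follows. The function name `series_to_highlight`, the numeric score scale, and the `low_threshold` parameter are hypothetical; the text does not specify how "low" is defined or how ties are resolved.

```python
from statistics import mean


def series_to_highlight(series_scores, low_threshold=None):
    """Pick the series whose image elements should be highlighted.

    series_scores: {series name: evaluation values of its classifications}.
    With low_threshold=None, choose the series with the lowest mean value;
    otherwise choose the series with the most values below the threshold.
    Ties go to the series listed first.
    """
    if low_threshold is None:
        return min(series_scores, key=lambda s: mean(series_scores[s]))
    return max(
        series_scores,
        key=lambda s: sum(v < low_threshold for v in series_scores[s]),
    )
```

The two criteria can disagree: a series with one very poor classification may contain the most low values while another series has the lower average, so which criterion to apply is a design choice left to the implementation.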

The information processing of the first embodiment can be realized by causing the processing unit 12 to execute a program. The information processing of the second embodiment can be realized by causing the CPU 101 to execute a program. The program can be recorded on a computer-readable recording medium 63.

For example, the program can be distributed by distributing the recording medium 63 on which the program is recorded. Alternatively, the program may be stored on another computer and distributed via a network. The computer may, for example, store (install) the program recorded on the recording medium 63, or a program received from another computer, in a storage device such as the RAM 102 or the HDD 103, and then read the program from that storage device and execute it.

The above merely illustrates the principles of the present invention. Further, numerous modifications and changes are possible for those skilled in the art; the present invention is not limited to the exact configurations and applications shown and described above, and all corresponding modifications and equivalents are regarded as falling within the scope of the present invention as defined by the appended claims and their equivalents.

10 Information processing device
11 Storage unit
12 Processing unit
20 History information
31, 32 Software
41, 42, 43 Data storage unit
d1, d2, d3 Data
S1, S2 Step

Claims (10)

An information processing program that causes a computer to execute a process comprising:
extracting, based on history information indicating a history of input data and output data for each of a plurality of pieces of software, information on first input data that is a starting point of data processing by the plurality of pieces of software; and
evaluating the quality of first output data output by the data processing according to a comparison of whether the information on the first input data matches information on predetermined data input by a user.
The information processing program according to claim 1, further causing the computer to execute a process comprising:
evaluating the quality of intermediate data passed through on the way from the first input data to the first output data, according to the comparison of whether the information on the first input data matches the information on the predetermined data input by the user.
The information processing program according to claim 1 or 2, further causing the computer to execute a process comprising:
acquiring information on a first access authority for the first input data, and predicting a second access authority for the first output data based on the first access authority and the history information; and
evaluating the quality of the first output data according to a comparison of whether the actual access authority of the first output data matches the predicted second access authority.
The information processing program according to claim 3, wherein, in predicting the second access authority, a third access authority for intermediate data passed through on the way from the first input data to the first output data is predicted based on the first access authority, and the second access authority is predicted based on the predicted third access authority.
The information processing program according to claim 4, further causing the computer to execute a process comprising:
evaluating the quality of the intermediate data according to a comparison of whether the actual access authority of the intermediate data matches the third access authority.
The information processing program according to any one of claims 1 to 5, further causing the computer to execute a process comprising:
acquiring information on a first delay time allowed from the generation of the first input data until the first output data is updated; and
calculating, based on data update history information and the history information, a second delay time from the generation of the first input data until the first output data is updated, and evaluating the quality of the first output data according to a comparison of whether the second delay time is shorter than the first delay time.
The information processing program according to any one of claims 1 to 6, further causing the computer to execute a process comprising:
causing a display device to display a data flow diagram from the first input data to the first output data, and changing the display mode of an image element that is included in the data flow diagram and indicates the relationship between the first input data and the first output data, based on the result of the quality evaluation of the first output data.
The information processing program according to claim 7, further causing the computer to execute a process comprising:
when an image representing the first output data included in the data flow diagram displayed on the display device is selected, selecting, based on evaluation values of the data passed through on the way to the first output data, the image element to be highlighted from among a plurality of image elements indicating the relationships between the data leading to the first output data, and highlighting the selected image element.
An information processing method in which a computer executes a process comprising:
extracting, based on history information indicating a history of input data and output data for each of a plurality of pieces of software, information on first input data that is a starting point of data processing by the plurality of pieces of software; and
evaluating the quality of first output data output by the data processing according to a comparison of whether the information on the first input data matches information on predetermined data input by a user.
An information processing device comprising:
a storage unit that stores history information indicating a history of input data and output data for each of a plurality of pieces of software; and
a processing unit that extracts, based on the history information stored in the storage unit, information on first input data that is a starting point of data processing by the plurality of pieces of software, and evaluates the quality of first output data output by the data processing according to a comparison of whether the information on the first input data matches information on predetermined data input by a user.
PCT/JP2020/048809 2020-12-25 2020-12-25 Information processing program, information processing method, and information processing device Ceased WO2022137526A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2020/048809 WO2022137526A1 (en) 2020-12-25 2020-12-25 Information processing program, information processing method, and information processing device
JP2022570966A JPWO2022137526A1 (en) 2020-12-25 2020-12-25

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/048809 WO2022137526A1 (en) 2020-12-25 2020-12-25 Information processing program, information processing method, and information processing device

Publications (1)

Publication Number Publication Date
WO2022137526A1 true WO2022137526A1 (en) 2022-06-30

Family

ID=82157801

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/048809 Ceased WO2022137526A1 (en) 2020-12-25 2020-12-25 Information processing program, information processing method, and information processing device

Country Status (2)

Country Link
JP (1) JPWO2022137526A1 (en)
WO (1) WO2022137526A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005196728A (en) * 2003-12-11 2005-07-21 Nec Corp Security verification system, device, method, and program for security verification
JP2020500369A (en) * 2016-11-09 2020-01-09 アビニシオ テクノロジー エルエルシー System and method for determining relationships between data elements

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ITO TOSHIO: "Flowganizer Dataflow Management Technology to Maintain High Service Quality in Complicated Networked Systems", TOSHIBA REVIEW, vol. 71, no. 6, 1 January 2016 (2016-01-01), pages 40 - 43, XP055951427 *

Also Published As

Publication number Publication date
JPWO2022137526A1 (en) 2022-06-30

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20967012; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2022570966; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 20967012; Country of ref document: EP; Kind code of ref document: A1)