WO2013145512A1 - Management device and distributed processing management method - Google Patents
Management device and distributed processing management method
- Publication number
- WO2013145512A1 (PCT application no. PCT/JP2013/000305)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- processing
- information
- server
- vertex
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/145—Network analysis or design involving simulating, designing, planning or modelling of a network
Definitions
- The present invention relates to a technique for managing distributed processing of data in a distributed system in which data devices that store data and processing devices that process the data are arranged in a distributed manner.
- Patent Document 1 proposes a distributed system that determines a calculation server to process data stored in a plurality of computers, and that determines communication paths for all data by sequentially selecting the nearest available calculation server for the computer storing each piece of data.
- Japanese Patent Application Laid-Open No. 2004-228688 proposes a technique that keeps the file input/output speed on the storage side from degrading even when a plurality of file transfer requests arrive simultaneously.
- Japanese Patent Application Laid-Open Publication No. 2003-259259 proposes a distributed file system that provides an address space that can collectively manage a group of files stored on a plurality of disks.
- Patent Document 4 proposes, in order to reduce the network load in a distributed database system, moving a relay server arranged on one computer to another computer in consideration of the data transfer time when transferring database data to a client.
- Patent Document 5 proposes a method of dividing a file according to the line speed and load status of each transfer path through which the file is transferred, and transferring the divided file.
- Patent Document 6 proposes a stream processing apparatus that determines, in a short time, a resource allocation with high utilization efficiency in response to stream input/output requests for which various speeds are specified.
- Patent Document 7 proposes a method for efficiently using I/O resources by dynamically assigning an I/O node to each job, without stopping execution of the job, in a computer system in which computing nodes access a file system through I/O nodes.
- Patent Documents 3 and 7 mentioned above merely propose a method for centrally handling data stored in a plurality of data servers and a method for determining the I/O node occupancy necessary for accessing a file system.
- The present invention has been made in view of the circumstances described above, and provides a technique for reducing the overall data processing time of a distributed system in which data devices that store data and processing devices that process the data are arranged in a distributed manner.
- the first aspect relates to a management device.
- The management device according to the first aspect includes a model generation unit that generates model information from which a conceptual model can be constructed. The conceptual model includes: a plurality of first vertices indicating a plurality of data devices that store data; a plurality of second vertices indicating a plurality of processing devices that process data; first sides, each running from a first vertex to a second vertex, on which transfer amount constraint conditions are set whose upper limits are the amounts of data transferable per unit time from each data device to each processing device; and, from each second vertex, at least one second side reaching at least one third vertex at the subsequent stage, on which processing amount constraint conditions are set whose upper limits are the data processing capacities per unit time of the processing devices.
- The management device further includes a determination unit that, using for each route on the conceptual model the sum of the data processing amounts per unit time executable under the transfer amount constraint conditions and processing amount constraint conditions set on the first and second sides included in the route, determines the flow rate of each side on the conceptual model, selects routes on the conceptual model that satisfy the flow rates of the sides, and, according to the vertices included in each selected route, determines a plurality of combinations of a processing device and a data device storing data to be processed by that processing device.
- the second aspect relates to a distributed processing management method.
- In the distributed processing management method according to the second aspect, at least one computer generates model information from which a conceptual model can be constructed. The conceptual model includes a plurality of first vertices indicating a plurality of data devices that store data, a plurality of second vertices indicating a plurality of processing devices that process data, first sides on which a plurality of transfer amount constraint conditions are set, each having as its upper limit the amount of data transferable per unit time from each data device to each processing device, and, from each second vertex, at least one second side reaching at least one third vertex at the subsequent stage, on which processing amount constraint conditions are set, each having as its upper limit the data processing capacity per unit time of each processing device.
- The computer then determines the flow rate of each side on the conceptual model using, for each route on the model, the sum of the data processing amounts per unit time executable under the transfer amount constraint conditions and processing amount constraint conditions set on the first and second sides included in the route, selects routes on the conceptual model that satisfy the flow rates of the sides, and, according to the vertices included in each selected route, determines a plurality of combinations of a processing device and a data device storing data to be processed by that processing device.
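The modeling and determination steps above amount to a single-source, single-sink maximum-flow problem over the conceptual model. The sketch below (Python; the vertex names, the source vertex "S", and the graph topology are illustrative assumptions, since the patent prescribes no implementation) builds such a model and determines edge flow rates with the augmenting-path ("flow increase") method that Example 5 applies to the maximum flow problem:

```python
from collections import deque

def max_flow_assignments(transfer_bw, proc_capacity):
    """Build the conceptual model and determine edge flows by the
    augmenting-path (flow increase) method for the maximum flow problem.

    transfer_bw:   {(data_device, processing_device): MB/s transferable}
    proc_capacity: {processing_device: MB/s processable}
    Returns the chosen (data device, processing device) combinations with
    their per-unit-time data processing amounts."""
    cap = {}

    def add_edge(u, v, c):
        cap[(u, v)] = cap.get((u, v), 0) + c
        cap.setdefault((v, u), 0)  # residual edge

    for (d, p), bw in transfer_bw.items():
        add_edge(d, p, bw)    # first side: transfer amount constraint
        add_edge("S", d, bw)  # source feeds each first vertex (assumed)
    for p, c in proc_capacity.items():
        add_edge(p, "T", c)   # second side: processing amount constraint

    flow = {e: 0 for e in cap}
    while True:
        # breadth-first search for an augmenting path S -> T
        parent, queue = {"S": None}, deque(["S"])
        while queue and "T" not in parent:
            u = queue.popleft()
            for (a, b) in cap:
                if a == u and b not in parent and cap[(a, b)] - flow[(a, b)] > 0:
                    parent[b] = u
                    queue.append(b)
        if "T" not in parent:
            break
        # reconstruct the path, find its bottleneck, and augment
        path, v = [], "T"
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[e] - flow[e] for e in path)
        for (a, b) in path:
            flow[(a, b)] += bottleneck
            flow[(b, a)] -= bottleneck

    return {(d, p): f for (d, p), f in flow.items()
            if f > 0 and (d, p) in transfer_bw}

combos = max_flow_assignments({("D1", "P1"): 50, ("D2", "P2"): 100},
                              {"P1": 50, "P2": 150})
# combos -> {("D1", "P1"): 50, ("D2", "P2"): 100}
```

Positive flow on a (data device, processing device) edge corresponds to one of the combinations the determination unit outputs, with the flow value as its per-unit-time data processing amount.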
- Another aspect of the present invention may be a management program that causes at least one computer to implement each configuration in the first aspect, or a computer-readable recording medium storing such a program.
- This recording medium includes a non-transitory tangible medium.
- A flowchart showing the detailed operation of the master server of the first embodiment in step (S404-10).
- A flowchart showing the detailed operation of the master server of the first embodiment in step (S404-20).
- A flowchart showing the detailed operation of the master server in the first modified example of the second embodiment regarding step (S404-20).
- A diagram showing the information stored in the input/output communication path information storage unit in Example 1.
- A diagram showing the information stored in the data location storage unit in Example 1.
- A diagram showing the model information generated in Example 1.
- A diagram showing the conceptual model constructed from the model information in Example 1.
- A diagram showing the information stored in the job information storage unit in Example 2.
- A diagram showing the information stored in the server state storage unit in Example 2.
- A diagram showing the information stored in the data location storage unit in Example 2.
- A diagram showing the model information generated in Example 2.
- A diagram conceptually showing the data transmission and reception performed in Example 2.
- A diagram showing the information stored in the data location storage unit in Example 3.
- A diagram showing the model information generated in Example 3.
- A diagram showing the conceptual model constructed from the model information in Example 3.
- A diagram conceptually illustrating the configuration of the distributed system in the fourth embodiment.
- A diagram showing the information stored in the server state storage unit in Example 4.
- A diagram showing the information stored in the input/output communication path information storage unit in Example 4.
- A diagram showing the model information generated in Example 4.
- A diagram showing the conceptual model constructed from the model information shown in FIG. 61.
- A diagram showing the information stored in the job information storage unit in Example 5.
- A diagram showing the conceptual model constructed from the model information shown in FIG. 67.
- Diagrams conceptually showing the process of determining the flow function f by the flow increase method for the maximum flow problem in Example 5, and the process of determining the data flow information.
- A diagram conceptually showing the data transmission and reception performed in Example 5.
- A diagram showing the information stored in the server state storage unit in Example 6.
- A diagram showing the model information generated in Example 6.
- A diagram showing the conceptual model constructed from the model information in Example 6.
- FIG. 1A is a diagram conceptually illustrating a configuration example of a distributed system in the first embodiment.
- a configuration overview and an operation overview of the distributed system 350 in the first embodiment, and differences between the first embodiment and related technologies will be described with reference to FIG. 1A.
- the distributed system 350 includes a master server 300, a network switch 320, a plurality of processing servers 330 # 1 to 330 # n, a plurality of data servers 340 # 1 to 340 # n, and the like that are connected to each other by a network 370.
- the distributed system 350 may include a client 360, another server 399, and the like.
- the data servers 340 # 1 to 340 # n may be collectively referred to as the data server 340
- the processing servers 330 # 1 to 330 # n may be collectively referred to as the processing server 330.
- the data server 340 stores data that can be processed by the processing server 330.
- the processing server 330 receives the data from the data server 340 and processes the received data by executing a processing program.
- the client 360 transmits request information that is information for requesting the master server 300 to start data processing.
- the request information includes information indicating a processing program and data used by the processing program.
- The master server 300 determines, for each piece of data, a processing server 330 that processes one or more pieces of the data stored in the data servers 340. For each determined processing server 330, the master server 300 generates determination information that includes information indicating the data to be processed and the data server 340 storing that data, together with information indicating the data processing amount per unit time. The data server 340 and the processing server 330 transmit and receive data based on the determination information, and the processing server 330 processes the received data.
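The determination information generated for each processing server could be represented by a record like the following (a minimal sketch; the field names are illustrative assumptions, since the text specifies only what kinds of information the record carries):

```python
from dataclasses import dataclass

@dataclass
class DeterminationInfo:
    # One record of determination information the master server 300
    # generates for a processing server 330 (field names are assumptions).
    data_id: str                # information indicating the data to be processed
    data_server_id: str         # the data server 340 storing that data
    throughput_mb_per_s: float  # data processing amount per unit time

info = DeterminationInfo(data_id="block-7", data_server_id="dataserver-2",
                         throughput_mb_per_s=50.0)
```

The data server and processing server then transmit and receive data according to such records.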
- the master server 300, the processing server 330, the data server 340, and the client 360 may be individually realized by dedicated devices or may be realized by general-purpose computers.
- a plurality of the master server 300, the processing server 330, the data server 340, and the client 360 may be realized by one dedicated device or one computer.
- one dedicated device or one computer as hardware is collectively referred to as one computer device.
- the processing server 330 and the data server 340 are realized by the one computer device.
- the single computer device includes, for example, a CPU (Central Processing Unit), a memory, an input / output interface (I / F), and the like that are connected to each other via a bus.
- the memory is a RAM (Random Access Memory), a ROM (Read Only Memory), a hard disk, a portable storage medium, or the like.
- the input / output I / F is connected to a communication device or the like that communicates with other devices via the network 370.
- the input / output I / F may be connected to a user interface device such as a display device or an input device. Note that this embodiment does not limit the hardware configurations of the master server 300, the processing server 330, the data server 340, and the client 360.
- FIG. 1B, FIG. 2A, and FIG. 2B are diagrams conceptually illustrating each configuration example of the distributed system 350.
- the processing server 330 and the data server 340 are represented as computers, and the network 370 is represented as a data transmission / reception path via a switch.
- the master server 300 is not shown.
- the switch is a network device such as a hub or a router.
- the distributed system 350 includes, for example, a plurality of computers 111 and 112 and switches 101 to 103 that connect them to each other.
- a plurality of computers 111 and switches 102 are accommodated in a rack 121
- a plurality of computers 112 and switches 103 are accommodated in a rack 122.
- the racks 121 and 122 are accommodated in the data center 131, and the data centers 131 and 132 are connected by the inter-base communication network 141.
- FIG. 1B illustrates a distributed system 350 in which switches and computers are connected in a star configuration.
- FIGS. 2A and 2B illustrate a distributed system 350 configured by cascade-connected switches.
- FIGS. 2A and 2B also show examples of data transmission and reception between the data server 340 and the processing server 330.
- the computers 207 to 210 function as the data server 340, and the computers 207 and 209 also function as the processing server 330.
- the computer 210 functions as the master server 300.
- The computer 208, which is unavailable for processing, stores the processing target data 211 and 212 in the storage disk 204.
- The computer 210, which is likewise unavailable for processing, stores the processing target data 213 in the storage disk 206.
- The available computer 207 executes the processing process 214, and the available computer 209 executes the processing process 215.
- FIG. 3 shows the transferable amount per unit time between computers. Table 220 in FIG. 3 lists the transfer amount per unit time when the processing target data described above is transferred to another computer. In this example, it is assumed that each processing process can process the data allocated to it in parallel.
- The transferable amount per unit time between the computer 208 and the computer 207 is 50 megabytes per second (MB/s).
- The transferable amount between the computer 208 and the computer 209 is 50 MB/s.
- The transferable amount between the computer 210 and the computer 207 is 50 MB/s.
- The transferable amount between the computer 210 and the computer 209 is 100 MB/s.
- FIG. 4 shows the processable amount per unit time of each computer. According to table 221 in FIG. 4, the processable amount per unit time of the computer 207 is 50 MB/s, and that of the computer 209 is 150 MB/s.
- The throughput of the data processing is the smaller of the transferable amount per unit time of the path that transfers the processing target data and the processable amount per unit time of the computer that performs the processing.
- In FIG. 2A, the processing target data 211 is transmitted via the data transfer path 216 and processed by the available computer 207, and the processing target data 213 is transmitted via the data transfer path 217 and processed by the available computer 209.
- In FIG. 2B, the processing target data 211 is transmitted via the data transfer path 230 and processed by the available computer 207, the processing target data 212 is transmitted via the data transfer path 231 and processed by the available computer 209, and the processing target data 213 is transmitted via the data transfer path 232 and processed by the available computer 209.
- The total throughput of the data processing in FIG. 2A is 150 MB/s, the sum of the throughput for the processing target data 211 (50 MB/s) and the throughput for the processing target data 213 (100 MB/s).
- The throughput for the data 211 (50 MB/s) is the smaller of the transferable amount of the data transfer path 216 (50 MB/s) and the processable amount of the computer 207 (50 MB/s).
- The throughput for the data 213 (100 MB/s) is the smaller of the transferable amount of the data transfer path 217 (100 MB/s) and the processable amount of the computer 209 (150 MB/s).
- The total throughput of the data processing in FIG. 2B is 200 MB/s, the sum of the throughput for the processing target data 211 (50 MB/s), the throughput for the processing target data 212 (50 MB/s), and the throughput for the processing target data 213 (100 MB/s).
- The throughput for the data 211 (50 MB/s) is the smaller of the transferable amount of the data transfer path 230 (50 MB/s) and the processable amount of the computer 207 (50 MB/s).
- The throughput for the data 212 (50 MB/s) is the smaller of the transferable amount of the data transfer path 231 (50 MB/s) and the processable amount of the computer 209 (150 MB/s).
- The throughput for the data 213 (100 MB/s) is the smaller of the transferable amount of the data transfer path 232 (100 MB/s) and the processable amount of the computer 209 (150 MB/s).
- the data processing in FIG. 2B has a higher total throughput and is more efficient than the data processing in FIG. 2A.
- the distributed system 350 in the first embodiment performs efficient data allocation as shown in FIG. 2B in the situation illustrated in FIGS. 2A and 2B.
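The comparison above can be reproduced numerically. The sketch below (Python; the path and computer identifiers are taken from FIGS. 2A, 2B, 3, and 4, but the function itself is an illustrative assumption) computes the total throughput of an allocation as the sum, over all data items, of the smaller of the transfer path's bandwidth and the processing computer's remaining capacity:

```python
def allocation_throughput(assignments, path_bw, proc_capacity):
    """Total throughput (MB/s) of a data-to-processor allocation.

    Each data item's throughput is the smaller of its transfer path's
    bandwidth and the processing computer's remaining capacity."""
    remaining = dict(proc_capacity)
    total = 0
    for data, (path, computer) in assignments.items():
        rate = min(path_bw[path], remaining[computer])
        remaining[computer] -= rate
        total += rate
    return total

# FIG. 2A: data 211 via path 216 to computer 207, data 213 via path 217 to 209
fig2a = {"211": ("216", "207"), "213": ("217", "209")}
# FIG. 2B: computer 209 handles both data 212 and data 213
fig2b = {"211": ("230", "207"), "212": ("231", "209"), "213": ("232", "209")}
bw = {"216": 50, "217": 100, "230": 50, "231": 50, "232": 100}
cap = {"207": 50, "209": 150}
print(allocation_throughput(fig2a, bw, cap))  # 150
print(allocation_throughput(fig2b, bw, cap))  # 200
```

The 200 MB/s result for the FIG. 2B allocation matches the text: computer 209's 150 MB/s capacity accommodates both the 50 MB/s and 100 MB/s transfers.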
- Next, the details of the distributed system 350 in the first embodiment will be described.
- FIG. 5 is a diagram conceptually illustrating a processing configuration example of each device of the distributed system 350 in the first embodiment.
- Each of these processing units may be realized individually or in combination, as a hardware component, a software component, or a combination of the two.
- A hardware component is a hardware circuit such as a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), a gate array, a combination of logic gates, a signal processing circuit, or an analog circuit.
- A software component is realized by one or more processors (for example, a CPU (Central Processing Unit) or a DSP (Digital Signal Processor)) executing a program on one or more memories.
- the processing server 330 includes a processing server management unit 331, a processing execution unit 332, a processing program storage unit 333, a data transmission / reception unit 334, and the like.
- the processing server management unit 331 holds information regarding the execution state of the processing program used when the processing execution unit 332 processes data.
- the processing server management unit 331 updates the information regarding the execution state of the processing program according to the change in the execution state of the processing program.
- the execution state of the processing program includes, for example, a pre-execution state, a running state, and an execution completion state.
- The pre-execution state indicates that data has been assigned to the process execution unit 332 but the unit has not yet started processing it.
- The in-execution state indicates that the process execution unit 332 is processing the data.
- The execution completion state indicates that the process execution unit 332 has finished processing the data.
- As the execution state of the processing program, a state determined from the ratio of the amount of data already processed by the processing execution unit 332 to the total amount of data allocated to it may also be used.
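These states could be modeled as follows (a minimal sketch; the enum values and the byte-ratio rule are illustrative assumptions based on the ratio-based alternative described above):

```python
from enum import Enum

class ExecutionState(Enum):
    # The three states named in the text; identifiers are illustrative.
    PRE_EXECUTION = "pre-execution"
    RUNNING = "running"
    COMPLETED = "completed"

def state_from_progress(processed_bytes, total_bytes):
    """Derive a state from the processed/allocated ratio, as the text
    suggests may alternatively be used."""
    if processed_bytes <= 0:
        return ExecutionState.PRE_EXECUTION
    if processed_bytes < total_bytes:
        return ExecutionState.RUNNING
    return ExecutionState.COMPLETED
```

The processing server management unit 331 would update such a state as the execution of the processing program progresses.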
- the data transmission / reception unit 334 transmits / receives data to / from another processing server 330 or the data server 340.
- Data to be processed is sent from the data server 340 designated by the master server 300 through the data transmission/reception unit 343 of that data server 340 and the data transmission/reception unit 322 of the network switch 320, and is received by the data transmission/reception unit 334 of the processing server 330.
- the process execution unit 332 of the process server 330 processes the received data to be processed.
- the processing server 330 may directly acquire processing target data from the processing data storage unit 342.
- the data transmission / reception unit 343 of the data server 340 and the data transmission / reception unit 334 of the processing server 330 may directly communicate without passing through the data transmission / reception unit 322 of the network switch 320.
- the data server 340 includes a data server management unit 341, a processing data storage unit 342, a data transmission / reception unit 343, and the like.
- the data server management unit 341 transmits location information of data stored in the processing data storage unit 342 to the master server 300.
- the processing data storage unit 342 stores data uniquely identified in the distributed system 350.
- The processing data storage unit 342 is realized by, for example, a hard disk drive (HDD), a solid state drive (SSD), a USB (Universal Serial Bus) flash drive, a RAM (Random Access Memory) disk, or the like.
- the data stored in the processing data storage unit 342 may be data output by the processing server 330 or data being output.
- the data stored in the processing data storage unit 342 may be received by the processing data storage unit 342 from another server or the like, or may be read by the processing data storage unit 342 from a portable storage medium or the like.
- the data transmission / reception unit 343 transmits / receives data to / from another processing server 330 or another data server 340.
- the network switch 320 has a data transmission / reception unit 322.
- the data transmission / reception unit 322 relays data transmitted / received between the processing server 330 and the data server 340.
- The master server 300 includes a data location storage unit 3070, a server state storage unit 3060, an input/output communication path information storage unit 3080, a model generation unit 301, a determination unit 303, and the like.
- Data may be explicitly specified by an identification name in a structure program that defines the structure of a directory or data, or may be specified based on other processing results such as an output result of a specified processing program.
- the structure program is information that defines data to be processed by the processing program.
- the structure program receives information (name or identifier) indicating certain data as an input, and outputs a directory name in which data corresponding to the input is stored and a file name indicating a file constituting the data.
- the structure program may be a list of directory names or file names.
- When the unit of information received as an argument by the processing program is a file, the data is each distributed file.
- When the unit of information received as an argument by the processing program is a row or a record, the data is a plurality of rows or records in the distributed file.
- When the unit of information received as an argument by the processing program is a row of a table in a relational database, the data is a set of rows obtained by a predetermined search over a set of tables, or a set of rows obtained by a range search on a certain attribute of that set.
- The data may also be a container, such as a Map or Vector in a language such as C++ or JAVA (registered trademark), or an element of such a container. Furthermore, the data may be a matrix, or a row, column, or element of a matrix.
- the data to be processed is determined by registering one or more data identifiers in the data location storage unit 3070.
- the name of the data to be processed is stored in the data location storage unit 3070 in association with the identifier of the data and the identifier of the data storage device.
- Each data may be divided into a plurality of subsets (partial data), and the plurality of subsets may be distributed in a plurality of storage devices. Further, certain data may be multiplexed and arranged in two or more storage devices. In this case, data multiplexed from one data is also collectively referred to as distributed data.
- the processing server 330 may input any one of the distributed data as the processing data in order to process the multiplexed data.
- FIG. 6 is a diagram illustrating an example of information stored in the data location storage unit 3070.
- The data location storage unit 3070 stores a plurality of pieces of data location information, each of which associates a data name 3071 (or a partial data name 3077) with a distributed form 3073 and a data description 3074.
- The distributed form 3073 is information indicating the storage form of the data.
- For data stored in a single location (for example, MyDataSet1), "single" is set in the distributed form 3073 of the corresponding row (data location information).
- For data arranged in a distributed manner (for example, MyDataSet2), "distributed arrangement" is set in the distributed form 3073 of the corresponding row (data location information).
- For multiplexed data (for example, MyDataSet3), "n-duplication (1/n)" (where n is an integer of 2 or more) is set in the distributed form 3073 of the corresponding row (data location information).
- the data description 3074 includes a data identifier 3075, an identifier 3076 of a data storage device (data server 340 or its processing data storage unit 342), and a processing status 3078.
- the data identifier 3075 is an identifier that uniquely indicates the data in each data storage device.
- The information specified by the data identifier 3075 depends on the type of the target data. For example, when the data is a file, the data identifier 3075 specifies a file name; when the data is a database record, the data identifier 3075 may be information specifying an SQL (Structured Query Language) statement for extracting the record.
- the storage device identifier 3076 is an identifier of the data server 340 or the processing data storage unit 342 for storing each data.
- the identifier 3076 may be unique information in the distributed system 350, or may be an IP (Internet Protocol) address assigned to each device.
- the processing status 3078 is information indicating the processing status of the data specified by the data identifier 3075.
- In the processing status 3078, "unprocessed" is set to indicate that none of the data has been processed, "processing" to indicate that the data is being processed by the processing server 330, and "processed" to indicate that all of the data has been processed.
- the processing status 3078 may be information indicating the progress of processing the data (for example, unprocessed after the 50th MB). Further, in the case of multiplexing or the like, when the processing statuses of the data indicated by all the data identifiers are equal, they may be described together.
- the processing status 3078 is updated by the master server 300 or the like according to the progress status of processing by the processing server 330.
- the data location storage unit 3070 stores each of the partial data names 3077 as the data name 3071 in association with the distribution form 3073 and the data description 3074 (for example, the fifth line in FIG. 6).
- For such partial data (for example, SubSet1), the name 3071 of the data is stored in the data location storage unit 3070 in association with a distributed form 3073 and a data description 3074 for each multiplexed copy included in that data.
- the data description 3074 includes an identifier 3076 of a storage device that stores the multiplexed data and an identifier (data identifier 3075) that uniquely indicates the data in the storage device.
- The information of each row (each piece of data location information) in the data location storage unit 3070 is deleted by the master server 300, the processing server 330, or the data server 340 when processing of the corresponding data is completed. Alternatively, instead of deleting each row, information indicating whether processing of the data is complete or incomplete may be added to each row to record the completion of data processing.
- The data location storage unit 3070 may omit the distributed form 3073; in that case, the type of data distribution is fixed to one of the types described above.
- The master server 300, the data server 340, and the processing server 330 may switch among the processes described below based on the description of the distributed form 3073.
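The rows of the data location storage unit 3070 could be sketched as records like the following (field names and example values are illustrative assumptions; the numbered fields correspond to the reference signs in the text):

```python
from dataclasses import dataclass, field

@dataclass
class DataDescription:
    # One entry of the data description 3074.
    data_identifier: str    # data identifier 3075 (e.g. a file name or an SQL statement)
    storage_device_id: str  # identifier 3076 of the data server 340 / storage unit 342
    processing_status: str  # processing status 3078: "unprocessed" | "processing" | "processed"

@dataclass
class DataLocationInfo:
    # One row of the data location storage unit 3070.
    data_name: str         # data name 3071 (or partial data name 3077)
    distributed_form: str  # distributed form 3073: "single", "distributed", "2-duplication (1/2)", ...
    descriptions: list = field(default_factory=list)  # data description 3074 entries

row = DataLocationInfo(
    data_name="MyDataSet2",
    distributed_form="distributed",
    descriptions=[
        DataDescription("part-0", "dataserver-1", "unprocessed"),
        DataDescription("part-1", "dataserver-2", "processing"),
    ],
)
```

For multiplexed data, the same row would carry one description entry per multiplexed copy, as described above.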
- FIG. 7 is a diagram illustrating an example of information stored in the input / output communication path information storage unit 3080.
- The input/output communication path information storage unit 3080 stores, for each input/output communication path constituting the distributed system 350, input/output communication path information that associates a communication path ID 3081, a usable bandwidth 3082, an input source device ID 3083, and an output destination device ID 3084.
- the communication path ID 3081 is an identifier of an input / output communication path between devices in which input / output communication occurs.
- the available bandwidth 3082 is bandwidth information currently available on the input / output communication path. The available bandwidth generally indicates the amount of data that can be transferred per unit time.
- the band information may be an actual measurement value or an estimated value.
- the input source device ID 3083 is an identifier of a device that inputs data to the input / output communication path.
- The output destination device ID 3084 is an identifier of the device to which the input/output communication path outputs data.
- the device identifiers indicated by the input source device ID 3083 and the output destination device ID 3084 are unique identifiers in the distributed system 350 assigned to the data server 340, the processing server 330, the network switch 320, the processing data storage unit 342, and the like. It may be an IP address assigned to each device.
- The input/output communication path may be a communication path between the data transmission/reception unit 343 of the data server 340 and the data transmission/reception unit 334 of the processing server 330, a communication path between the processing data storage unit 342 and the data transmission/reception unit 343 within the data server 340, or a communication path between the data transmission/reception unit 343 of the data server 340 and the data transmission/reception unit 322 of the network switch 320.
- The input/output communication path may also be a communication path between the data transmission/reception unit 322 of the network switch 320 and the data transmission/reception unit 334 of the processing server 330, or a communication path between the data transmission/reception units 322 of network switches 320.
- An input/output communication path within a device is likewise included among the communication paths.
- Hereinafter, such an input/output communication path is also simply referred to as a communication path.
- FIG. 8 is a diagram illustrating an example of information stored in the server state storage unit 3060.
- The server state storage unit 3060 stores, for each processing server 330 and each data server 340 operating in the distributed system 350, server state information that associates the server ID 3061, the load information 3062, the configuration information 3063, the processing data storage unit information 3064, and the processable amount information 3065 with one another.
- the server ID 3061 is an identifier of the processing server 330 or the data server 340.
- the identifiers of the processing server 330 and the data server 340 may be unique identifiers in the distributed system 350, or may be IP addresses assigned to them.
- the load information 3062 includes information regarding the processing load of the processing server 330 or the data server 340.
- the load information 3062 is, for example, a CPU usage rate, a memory usage amount, a network usage band, and the like.
- Configuration information 3063 includes configuration status information of the processing server 330 or the data server 340.
- the configuration information 3063 is, for example, hardware specifications such as the CPU frequency, the number of cores, and the memory amount in the processing server 330, and software specifications such as an OS (Operating System).
- the processing data storage unit information 3064 includes an identifier of the processing data storage unit 342 included in the data server 340.
- the processable amount information 3065 indicates the amount of data that can be processed by the processing server 330 per unit time.
- Information stored in the server status storage unit 3060, the data location storage unit 3070, and the input / output communication path information storage unit 3080 may be updated by status notifications transmitted from the switch 320, the processing server 330, the data server 340, and the like.
- These pieces of information may also be updated with response information obtained by an inquiry from the master server 300.
- For example, the switch 320 generates information indicating the communication throughput of each of its ports and the identifier (MAC (Media Access Control) address, IP address, etc.) of the device connected to each port, and transmits the generated information to the master server 300 as the status notification.
- the server status storage unit 3060, the data location storage unit 3070, and the input / output communication path information storage unit 3080 update the stored information based on the information sent as the status notification.
- the processing server 330 generates information indicating the throughput of the network interface, information indicating the allocation status of the processing target data to the processing execution unit 332, and information indicating the usage status of the processing execution unit 332.
- the generated information may be transmitted to the master server 300 as the state notification.
- The data server 340 generates information indicating the throughput of its own processing data storage unit 342 (disk) or network interface, and information indicating a list of the data elements stored in the data server 340; the generated information may be transmitted to the master server 300 as the status notification.
- the master server 300 may receive the status notification as described above by transmitting information requesting the status notification as described above to the switch 320, the processing server 330, and the data server 340.
- Information stored in the server state storage unit 3060, the data location storage unit 3070, and the input / output communication path information storage unit 3080 may be given in advance by the administrator of the client 360 or the distributed system 350. Also, these pieces of information may be collected by a program such as a crawler that searches the distributed system 350. Further, the input / output communication path information storage unit 3080 and the data location storage unit 3070 may be provided in a distributed device by a technique such as a distributed hash table.
- The model information includes information indicating each communication path from each data server 340 to each processing server 330, a transfer amount constraint condition whose upper limit value is the amount of data transferable per unit time from each data server 340 to each processing server 330, and a processing amount constraint condition whose upper limit value is the data processing capacity per unit time of each processing server 330.
- FIG. 9A is a diagram showing an example of model information.
- Each line (each entry) of the model information 500 includes an identifier, a lower limit value of the flow rate, an upper limit value of the flow rate, and a pointer to the next element.
- the identifier is information for specifying a node included in the model.
- logical software elements may be set in the nodes included in the model.
- In this example, the data server 340 and the processing server 330 are assigned as nodes indicating hardware elements; however, a storage device (data device) included in the data server 340, or a processing device such as a CPU included in the processing server 330, may be assigned instead. The logical elements will be described later.
- In the pointer to the next element, an identifier indicating another node connected from the node indicated by the corresponding identifier is set.
- The pointer to the next element may be set with a line number that can identify each row, or with memory address information.
- In the flow rate lower limit value and the flow rate upper limit value, the transfer amount constraint condition or the processing amount constraint condition is set.
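- As an illustrative sketch (not part of the patent's disclosure), the rows of the model information 500 described above might be represented as follows; all identifiers and limit values here are hypothetical.

```python
# Each row of the model information 500 holds: the identifier of a node,
# a flow rate lower limit and upper limit (the transfer amount or
# processing amount constraint condition), and a pointer to the next element.
model_info = [
    {"id": "D1", "lower": 0, "upper": 30, "next": "P1"},
    {"id": "D1", "lower": 0, "upper": 50, "next": "P2"},
    {"id": "D2", "lower": 0, "upper": 40, "next": "P2"},
    {"id": "P1", "lower": 0, "upper": 60, "next": "tau"},
    {"id": "P2", "lower": 0, "upper": 70, "next": "tau"},
]

# Each row describes one edge of the conceptual model: the node named by
# "id" connects to the node named by "next", and the flow rate on that
# edge must stay within [lower, upper].
edge_count = len(model_info)
print(edge_count)  # 5
```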
- From the model information, the following conceptual model, represented by a plurality of vertices (nodes) and a plurality of edges connecting the vertices, can be constructed.
- This conceptual model is called a network model, or a directed graph in graph theory.
- Each vertex corresponds to each node in the model information of FIG. 9A.
- Each edge corresponds to an input/output communication path (communication path) that connects the hardware elements indicated by the vertices at its ends, or to the processing of the target data itself.
- For each edge indicating an input/output communication path, the available bandwidth of that communication path is set as the transfer amount constraint condition.
- For each edge indicating the processing of the target data itself, the processing amount constraint condition is set.
- In the conceptual model, an input/output communication path is indicated by a subgraph composed of edges and the nodes that are the end points of those edges.
- FIG. 9B is a diagram showing an example of a conceptual model constructed by model information.
- The edge connecting the data server D and the processing server P has the available bandwidth of the corresponding communication path as an attribute value (the transfer amount constraint condition).
- The edge connecting the processing server P and the logical vertex τ has the possible processing amount per unit time of the processing server P as an attribute value (the processing amount constraint condition).
- An edge for which there is no restriction on the usable bandwidth or the possible processing amount is treated as having no constraint condition, that is, as if its usable bandwidth or possible processing amount were infinite.
- The usable bandwidth and the possible processing amount of an edge with no such constraint condition may also be treated as special values other than infinity.
- A plurality of vertices and edges may exist in front of the vertex indicating the data server D, between the data server D and the processing server P, and between the processing server P and the vertex τ.
- the job executed in the distributed system 350 is a unit of program processing requested to be executed by the distributed system 350, for example.
- the form of the model information is not limited to the form of reference numeral 500 in FIG. 9A.
- the model information may be realized by a linked list in which data field groups storing vertex and edge information are linked by reference.
- the model generation unit 301 may change the generation method of the model information according to the operation state of the hardware element. For example, the model generation unit 301 may determine that a processing server 330 with a high CPU usage rate cannot be used, and exclude such processing server 330 from the model information target.
- For each route on the conceptual model that includes a first edge from a vertex indicating the data server 340 to a vertex indicating the processing server 330 and a second edge from that vertex to a third, logical vertex, the flow rate of each edge on the conceptual model is determined so that execution can be performed in accordance with the transfer amount constraint condition and the processing amount constraint condition set for the first edge and the second edge included in the route.
- Each path in the conceptual model indicates a data flow from when certain data to be processed is sent from the processing data storage unit 342 of the data server 340 toward the processing server 330 until it is processed by the processing server 330.
- The flow rate of each edge is determined so that, for example, the total amount of data processed per unit time over all routes on the conceptual model is maximized.
- Alternatively, the plurality of paths on the conceptual model may first be narrowed down to the paths that minimize the number of edges (the number of communication hops) included in each path, and the flow rate of each edge may then be determined so that the total data processing amount per unit time over the narrowed-down paths is maximized.
- The determination unit 303 selects each path on the conceptual model that satisfies the flow rates of the edges determined in this way, and, according to the vertices included in each selected path, determines a plurality of combinations of the processing server 330 and the data server 340 that stores the data to be processed by that processing server 330.
- Hereinafter, information including the routes that include the first edge and the second edge in the conceptual model, together with the flow rate of each route, may be referred to as a data flow Fi or data flow information.
- the determination unit 303 generates such data flow information.
- The flow rate of each edge can also be expressed as a flow rate function f(e) that satisfies the following constraint expression for all edges e on the conceptual model.
- Constraint expression: l(e) ≤ f(e) ≤ u(e), where u(e) represents an upper-limit capacity function that outputs the upper limit value (the flow rate upper limit value of the model information) of the transfer amount constraint condition or the processing amount constraint condition set for the edge e, and l(e) represents a lower-limit capacity function that outputs the lower limit value (the flow rate lower limit value of the model information) of the transfer amount constraint condition or the processing amount constraint condition set for the edge e.
- the determination unit 303 determines the flow function f that minimizes the processing time of the job executed in the distributed system 350.
- The flow function f can be determined, for example, by maximizing the objective function Σe∈E′ f(e) for a certain edge set E′. The maximization of the objective function can be realized by using a linear programming method, the flow augmenting method for the maximum flow problem, the preflow-push method, or the like.
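- As an illustrative sketch only, the flow augmenting method mentioned above can be implemented with the Edmonds-Karp variant (augmenting along shortest residual paths). The graph below is a small hypothetical conceptual model: s and t are the added logical start and end points, D1/D2 are data servers, P1/P2 are processing servers, and tau is the logical vertex; all capacities are invented for the example.

```python
from collections import deque, defaultdict

def max_flow(capacity, source, sink):
    """Edmonds-Karp: repeatedly augment along a shortest residual path."""
    flow = defaultdict(lambda: defaultdict(int))
    total = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in capacity[u]:
                if v not in parent and capacity[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:          # no augmenting path remains
            return total, flow
        # Collect the path edges and find the bottleneck residual capacity.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(capacity[u][w] - flow[u][w] for u, w in path)
        for u, w in path:
            flow[u][w] += bottleneck
            flow[w][u] -= bottleneck    # residual (reverse) edge
        total += bottleneck

# Hypothetical model: D->P capacities are usable bandwidths, P->tau
# capacities are possible processing amounts per unit time.
capacity = defaultdict(lambda: defaultdict(int))
for u, v, c in [("s", "D1", 100), ("s", "D2", 100),
                ("D1", "P1", 30), ("D1", "P2", 50), ("D2", "P2", 40),
                ("P1", "tau", 60), ("P2", "tau", 70), ("tau", "t", 1000)]:
    capacity[u][v] += c
    capacity[v][u] += 0  # make the reverse edge visible to the BFS

total, flow = max_flow(capacity, "s", "t")
print(total)  # 100: the maximum total processing amount per unit time
```

The per-edge flow dictionary returned here plays the role of the flow rates determined in the step (S405-1).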
- the determination unit 303 may add logical vertices and edges to the conceptual model constructed from the model information in order to determine the flow function f.
- In order to do so, the determination unit 303 may add to the conceptual model a logical start point together with a set of edges connecting the start point and the vertices indicating the data servers 340, and a logical end point together with a set of edges connecting the vertices indicating the processing servers 330 and the end point.
- Such logical vertices and edges may be included in the model information by the model generation unit 301.
- E represents the set of edges constituting the network model.
- V represents the set of vertices constituting the network model.
- f(e) represents the flow function of the edge e.
- s represents the start point of the network model.
- t represents the end point of the network model.
- δ⁻ indicates the set of edges that exit from a certain vertex.
- δ⁺ indicates the set of edges that enter a certain vertex.
- u(e) is an upper-limit-constraint capacity function that outputs the flow rate upper limit value of the edge e.
- l(e) is a lower-limit-constraint capacity function that outputs the flow rate lower limit value of the edge e.
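- Combining the symbols above (and using the document's convention that δ⁻(v) denotes the edges leaving vertex v and δ⁺(v) the edges entering it), the problem solved by the determination unit 303 can be written out as the following reconstruction; this is an editorial sketch, not a formula quoted verbatim from the source.

```latex
\begin{aligned}
\text{maximize}\quad & \sum_{e \in \delta^{+}(t)} f(e) \\
\text{subject to}\quad & l(e) \le f(e) \le u(e) && \forall e \in E, \\
& \sum_{e \in \delta^{-}(v)} f(e) = \sum_{e \in \delta^{+}(v)} f(e) && \forall v \in V \setminus \{s, t\}.
\end{aligned}
```

The second constraint is the usual flow conservation at every vertex other than the start point s and the end point t.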
- the determination unit 303 may add a constraint condition so that the determined data flow information can be executed by the distributed system 350.
- For example, a constraint condition may be added so that the data flow information does not include a flow in which the correspondence between the transferred data and the processed data is inconsistent; alternatively, the flow function f may be determined without adding such a constraint condition to the model information.
- FIG. 10 is a diagram showing an example of data flow information.
- the data flow information includes route information and information on the flow rate of the route.
- the processing amount per unit time (unit processing amount) is set as the flow rate of the route.
- The route information is indicated by information on each vertex (the data server D1, the processing server P1, and the logical vertex τ) included in the route.
- In the data flow information, an identifier (Flow1) for specifying the data flow Fi is set.
- The determination unit 303 generates, based on the data flow information generated as described above, determination information indicating a combination of the processing server 330 and the data server 340 from which the processing target data of that processing server 330 is acquired.
- the generated determination information is acquired by each processing server 330 included in the combination.
- FIG. 11 is a diagram showing an example of decision information.
- the determination information includes a data server ID, a processing data storage unit ID, a data ID, received data specifying information, and a data processing amount per unit time.
- The data server ID is the identifier of the data server 340 that stores the data to be processed by the processing server 330.
- The processing data storage unit ID is the identifier of the processing data storage unit 342 of the data server 340.
- The data ID is the identifier of the data to be processed.
- the data ID may not be included in the decision information.
- In this case, the processing server 330 may acquire, as the processing target data of the job, the data stored in the processing data storage unit 342 of the data server 340 specified by the data server ID and the processing data storage unit ID.
- The received data specifying information is set when the processing target data stored in a certain processing data storage unit 342 of a certain data server 340 is processed by a plurality of processing servers 330, or when processing target data stored in multiplexed form across a plurality of data servers 340 is processed by a plurality of processing servers 330.
- As the received data specifying information, for example, information specifying a predetermined section of the data (for example, the start position of the section and the processing amount) is set. In cases other than those described above, the received data specifying information need not be set in the determination information.
- the data processing amount per unit time that can be included in the decision information is set based on the unit processing amount included in the data flow information.
- The processing server 330 requests the data server 340 to transfer the data specified by the determination information at the data processing amount per unit time. If the data processing amount per unit time is not included in the determination information, the processing server 330 may request the data server 340 to transfer the data at an arbitrary processing amount.
- the determination information may be acquired by each data server 340 included in the combination.
- the determination information may include the processing server ID, which is the identifier of the processing server 330, instead of the data server ID.
- the determination unit 303 may distribute the processing program received from the client 360 to the processing server 330, for example.
- For example, the determination unit 303 inquires of the processing server 330 whether or not the processing program corresponding to the determination information is stored, and when it is determined that the processing program is not stored, the determination unit 303 may distribute the processing program received from the client 360 to that processing server 330.
- the information for designating the above-described conceptual model, constraint conditions, and objective function may be described in a structure program or the like, and the structure program or the like may be given from the client 360 to the master server 300. Further, information for designating the conceptual model, the constraint condition, and the objective function may be given from the client 360 to the master server 300 as an activation parameter or the like.
- the master server 300 may determine the conceptual model with reference to the data location storage unit 3070 and the like.
- The master server 300 may store the model information generated by the model generation unit 301 and the data flow information generated by the determination unit 303 in a memory or the like, and may give the stored model information and data flow information to the inputs of the model generation unit 301 and the determination unit 303. In this case, the model generation unit 301 and the determination unit 303 may use that model information and data flow information for model generation and for the optimal arrangement calculation. Further, the master server 300 may be realized so as to support every conceptual model, constraint condition, and objective function, or may be realized so as to support only a specific conceptual model.
- FIG. 12 is a flowchart showing an overall outline of an operation example of the distributed system 350.
- When the master server 300 receives, from the client 360, request information that is a request to execute a processing program, the master server 300 acquires the following pieces of information (S401).
- Specifically, the master server 300 acquires a set of input/output communication path information in the distributed system 350, a set of data location information in which each piece of processing target data is associated with the data server 340 storing it, and a set of identifiers of usable processing servers 330.
- the master server 300 determines whether or not unprocessed data remains in the acquired set of processing target data (S402). When the master server 300 determines that unprocessed data does not remain in the acquired set of processing target data (S402; No), the process ends.
- When the master server 300 determines that unprocessed data remains in the acquired set of processing target data (S402; Yes), the master server 300 further determines, based on the acquired identifiers of the usable processing servers 330, whether there is a processing server 330 that can additionally execute processing (S403).
- When the master server 300 determines that there is a processing server 330 that can additionally execute processing (S403; Yes), the master server 300 acquires the input/output communication path information and the processing server state information using the acquired set of identifiers of the processing servers 330 and the set of identifiers of the data servers 340 as keys, and generates model information based on these pieces of information (S404).
- the processing server 330 that can additionally execute processing is also referred to as an available processing server 330.
- the master server 300 determines each combination of the processing server 330 and the data server 340 that maximizes a predetermined objective function under predetermined constraint conditions based on the generated model information (S405).
- the master server 300 generates data flow information indicating each determined combination.
- Each processing server 330 and each data server 340 corresponding to the combinations determined in the step (S405) transmit and receive the processing target data, and each processing server 330 processes the received processing target data (S406). Thereafter, the processing of the distributed system 350 returns to the step (S401).
- FIG. 13 is a flowchart showing the detailed operation of the master server 300 of the first embodiment in the step (S401).
- the model generation unit 301 of the master server 300 acquires from the data location storage unit 3070 a set of identifiers of the data server 340 that stores the processing target data specified by the request information from the client 360 (S401-1).
- the model generation unit 301 acquires a set of identifiers of the data server 340 and a set of identifiers of the processing server 330 from the server state storage unit 3060 (S401-2). Note that the step (S401-2) may be executed before the step (S401-1).
- FIG. 14 is a flowchart showing a detailed operation of the master server 300 of the first embodiment in the step (S404).
- The model generation unit 301 of the master server 300 acquires, from the input/output communication path information storage unit 3080, input/output communication path information indicating the communication paths over which the processing servers 330 process the processing target data. Based on the acquired input/output communication path information, the model generation unit 301 adds information on the communication paths from the data servers 340 to the processing servers 330 to the model information (for example, reference numeral 500 in FIG. 9A) stored in the memory or the like (S404-10).
- the model generation unit 301 adds logical communication path information from the processing server 330 to the subsequent logical vertex to the model information (S404-20). Note that the step (S404-20) may be executed before the step (S404-10).
- FIG. 15 is a flowchart showing a detailed operation of the master server 300 of the first embodiment in the step (S404-10).
- The model generation unit 301 refers to the information acquired from the data location storage unit 3070 based on the request information, and executes the step (S404-12) for each data server Di storing the processing target data (S404-11).
- the model generation unit 301 executes steps (S404-13) to (S404-15) for each available processing server Pj (S404-12).
- the model generation unit 301 adds a line including the name (or identifier) of the data server Di to the model information 500 (S404-13).
- the model generation unit 301 sets the name (or identifier) of the processing server Pj as a pointer to the next element of the added row (S404-14).
- the “identifier” and the “pointer to the next element” in the model information 500 may be information that can identify a certain node in the conceptual model.
- The model generation unit 301 sets the usable bandwidth of the communication path between the data server Di and the processing server Pj as the flow rate upper limit value of the added row, and sets the flow rate lower limit value of the added row to a value that is 0 or more and not more than the flow rate upper limit value (S404-15). Note that the step (S404-15) may be executed before the step (S404-14).
- FIG. 16 is a flowchart showing a detailed operation of the master server 300 of the first embodiment in the step (S404-20).
- The model generation unit 301 executes the steps (S404-22) to (S404-26) for each available processing server Pj acquired from the server state storage unit 3060 based on the request information (S404-21).
- the model generation unit 301 adds a line including the name (or identifier) of the processing server Pj to the model information 500 (S404-22).
- the model generation unit 301 determines whether or not a vertex exists in the subsequent stage of the processing server (S404-23).
- the vertex at the subsequent stage of the processing server refers to an identifier of a line that can be reached by following a pointer to the next element of the line including the name (or identifier) of an arbitrary processing server in the model information 500.
- When the model generation unit 301 determines that there is no vertex in the subsequent stage of the processing server (S404-23; No), the model generation unit 301 sets an identifier τ, which is an arbitrary name that does not match any identifier included in the model information 500 (S404-24). Note that if the model generation unit 301 determines that there is a vertex in the subsequent stage of the processing server (S404-23; Yes), the model generation unit 301 does not execute the step (S404-24). Subsequently, the model generation unit 301 sets the identifier τ as the pointer to the next element of the added row (S404-25).
- The model generation unit 301 sets the possible processing amount per unit time of the processing server Pj as the flow rate upper limit value of the added row, and sets the flow rate lower limit value of the added row to a value that is 0 or more and not more than the flow rate upper limit value (S404-26). Note that the step (S404-25) may be performed after the step (S404-26). Further, the step (S404-26) may be executed at any time after the step (S404-22).
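- The loops of FIGS. 15 and 16 can be sketched as follows. This is an editorial illustration only: the function name, the row format, and the bandwidth and processing-amount figures are all hypothetical.

```python
def generate_model_info(data_servers, processing_servers, bandwidth, capacity):
    """Sketch of steps (S404-10) and (S404-20): one row per edge of the model.

    bandwidth[(Di, Pj)] is the usable bandwidth of the communication path
    between data server Di and processing server Pj; capacity[Pj] is the
    possible processing amount per unit time of Pj.
    """
    model_info = []
    # S404-10: a communication-path row for each data server / processing server pair.
    for di in data_servers:
        for pj in processing_servers:
            model_info.append(
                {"id": di, "lower": 0, "upper": bandwidth[(di, pj)], "next": pj})
    # S404-20: a logical row from each processing server to the logical vertex tau.
    for pj in processing_servers:
        model_info.append(
            {"id": pj, "lower": 0, "upper": capacity[pj], "next": "tau"})
    return model_info

rows = generate_model_info(
    ["D1", "D2"], ["P1", "P2"],
    bandwidth={("D1", "P1"): 30, ("D1", "P2"): 50,
               ("D2", "P1"): 20, ("D2", "P2"): 40},
    capacity={"P1": 60, "P2": 70})
print(len(rows))  # 2*2 communication-path rows + 2 logical rows = 6
```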
- FIG. 17 is a flowchart showing the detailed operation of the master server 300 of the first embodiment in the step (S405).
- the determination unit 303 of the master server 300 operates as follows using a conceptual model (herein referred to as a directed graph) that can be constructed based on the model information generated as described above.
- the determining unit 303 determines the flow rate (data flow Fi) of each side based on the directed graph so that the processing time of the job executed by the distributed system 350 is minimized (S405-1).
- the determination unit 303 generates the flow rate function f (e) so that the job processing time is minimized.
- Specifically, the determination unit 303 maximizes the objective function (Σe∈E′ f(e) for a certain edge set E′) based on the network model constructed from the model information.
- The determination unit 303 performs the processing for maximizing the objective function by using a linear programming method, the flow augmenting method for the maximum flow problem, or the like. A specific example of the operation using the flow augmenting method for the maximum flow problem will be described later as a first embodiment.
- the determining unit 303 sets the vertex indicating the starting point in the directed graph to the vertex variable i (S405-2). Next, the determination unit 303 secures an area for storing the path information array and the unit processing amount on the memory, and initializes the value of the unit processing amount to infinity (S405-3).
- the determination unit 303 determines whether the vertex indicated by the vertex variable i is the end point of the directed graph (S405-4). Hereinafter, the vertex indicated by the vertex variable i is simply expressed as the vertex variable i.
- When the determination unit 303 determines that the vertex variable i is not the end point of the directed graph (S405-4; No), the determination unit 303 determines whether there is a communication path with a non-zero flow rate among the communication paths exiting from the vertex variable i in the directed graph (S405-5). When there is no communication path with a non-zero flow rate (S405-5; No), the determination unit 303 ends the process.
- When there is a communication path with a non-zero flow rate (S405-5; Yes), the determination unit 303 selects that communication path (S405-6). Subsequently, the determination unit 303 adds the vertex variable i to the path information array secured on the memory (S405-7).
- The determination unit 303 determines whether the unit processing amount secured in the memory is smaller than or equal to the flow rate of the communication path selected in the step (S405-6) (S405-8). When the unit processing amount is larger than the flow rate of the communication path (S405-8; No), the determination unit 303 updates the unit processing amount secured in the memory with the flow rate of the communication path (S405-9). When the unit processing amount is smaller than or equal to the flow rate of the communication path (S405-8; Yes), the determination unit 303 does not execute the step (S405-9).
- the determination unit 303 sets the vertex serving as the other end point of the communication path selected in the step (S405-6) to the vertex variable i (S405-10), returns to the step (S405-4), and executes it.
- When the determination unit 303 determines that the vertex variable i is the end point of the directed graph (S405-4; Yes), the determination unit 303 generates data flow information from the path information stored in the path information array and the unit processing amount, and stores the data flow information in the memory (S405-11).
- In the path information of the data flow information generated here, at least a vertex indicating the data server 340 and a vertex indicating the processing server 330, included in one path from the start point to the end point in the directed graph, are set.
- In the unit processing amount of the data flow information, the data processing amount per unit time indicated by that one path from the start point to the end point in the directed graph is set.
- The determination unit 303 updates the flow rate of each edge connecting the vertices included in the route information with a value obtained by subtracting the unit processing amount from the original flow rate (S405-12). Thereafter, the determination unit 303 returns to the step (S405-2) and executes the steps again.
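- The loop of steps (S405-2) through (S405-12) amounts to decomposing the determined flow into start-to-end paths, each with a unit processing amount. A sketch under hypothetical names (not the patent's own code):

```python
def decompose_flow(flow, start, end):
    """Follow non-zero-flow edges from the start point (S405-2 to S405-10),
    record each path and its bottleneck as the unit processing amount
    (S405-11), then subtract that amount along the path (S405-12)."""
    data_flows = []
    while True:
        path, unit, v = [start], float("inf"), start  # S405-3: unit starts at infinity
        while v != end:
            nxt = next((w for w, f in flow.get(v, {}).items() if f > 0), None)
            if nxt is None:
                return data_flows            # no non-zero-flow edge (S405-5; No)
            unit = min(unit, flow[v][nxt])   # S405-8/S405-9: keep the smallest flow
            path.append(nxt)
            v = nxt
        for u, w in zip(path, path[1:]):     # S405-12: subtract the unit amount
            flow[u][w] -= unit
        data_flows.append({"path": path, "unit": unit})

# Hypothetical flow rates as determined in step (S405-1).
flow = {"s": {"D1": 30}, "D1": {"P1": 30}, "P1": {"tau": 30}, "tau": {"t": 30}}
result = decompose_flow(flow, "s", "t")
print(result)  # [{'path': ['s', 'D1', 'P1', 'tau', 't'], 'unit': 30}]
```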
- FIG. 18 is a flowchart showing the detailed operation of the master server 300 of the first embodiment in the step (S406).
- the determination unit 303 executes the process (S406-2) for each processing server Pj in the set of available processing servers 330 (S406-1).
- the determining unit 303 executes steps (S406-3) to (S406-4) for each piece of route information Fj in the set of route information including the processing server Pj (S406-2). Each path information Fj is included in the data flow information generated in the step (S405).
- the determination unit 303 extracts the identifier of the data server 340 storing the processing target data from the path information Fj (S406-3).
- the determination unit 303 transmits the processing program and the determination information to the processing server Pj (S406-4).
- the processing program is a processing program for instructing the data server 340 storing the processing target data to transfer the data.
- the data server 340 and the processing target data are specified by information included in the determination information.
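- Steps (S406-1) through (S406-4) can be sketched as grouping the generated route information by processing server and extracting, for each route, the data server from which that processing server should pull data. All names and the path layout here are hypothetical.

```python
def build_decision_info(data_flows):
    """For each route, extract the data server (S406-3) and build the
    determination information sent to the processing server (S406-4)."""
    decisions = {}
    for f in data_flows:
        # By the convention of this sketch, a path reads
        # [start, data server, processing server, tau, end].
        data_server, processing_server = f["path"][1], f["path"][2]
        decisions.setdefault(processing_server, []).append(
            {"data_server_id": data_server, "unit_amount": f["unit"]})
    return decisions

flows = [{"path": ["s", "D1", "P1", "tau", "t"], "unit": 30},
         {"path": ["s", "D2", "P2", "tau", "t"], "unit": 40}]
decisions = build_decision_info(flows)
print(decisions["P1"])  # [{'data_server_id': 'D1', 'unit_amount': 30}]
```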
- the communication bandwidth of each input / output communication path in the distributed system 350 and the processing capability of each processing server 330 are determined from the entire arbitrary combination of each data server 340 and each processing server 330.
- Considered model information is generated.
- a combination of the processing server 330 and the data server 340 that is the acquisition destination of data to be processed by the processing server 330 is determined, and the combination Accordingly, transmission / reception and processing of the processing target data constituting the job executed in the distributed system 350 are executed.
- therefore, the processing time of a job executed in the entire distributed system 350 including the plurality of data servers 340 and the plurality of processing servers 330 can be minimized while avoiding a decrease in efficiency due to a bottleneck in the communication bandwidth or the processing server capability.
- since the network model is generated in consideration of the communication bandwidth of each input / output communication path in the distributed system 350, it is possible to determine a combination of the processing server 330 and the data server 340 based on the data transfer paths that maximize the total amount of data processed by all the processing servers 330 per unit time in the distributed system 350.
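- the determination described above can be viewed as a maximum-flow computation on a network of the form source → data servers → processing servers → sink, where edge capacities encode disk bandwidth, link bandwidth, and processing capability. The sketch below uses the standard Edmonds-Karp algorithm as one possible realization; all server names and capacity numbers are illustrative assumptions, not values from the specification.

```python
from collections import deque, defaultdict

def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow; capacity is {u: {v: cap}}."""
    # ensure every edge has a reverse entry for the residual network
    for u in list(capacity):
        for v in list(capacity[u]):
            capacity[v].setdefault(u, 0)
    flow = defaultdict(lambda: defaultdict(int))
    total = 0
    while True:
        # breadth-first search for a shortest augmenting path
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in capacity[u].items():
                if v not in parent and cap - flow[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total, flow
        # find the bottleneck capacity and push flow along the path
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(capacity[u][v] - flow[u][v] for u, v in path)
        for u, v in path:
            flow[u][v] += push
            flow[v][u] -= push
        total += push

# illustrative capacities (per unit time): source -> data server edges are
# limited by disk bandwidth, data -> processing edges by link bandwidth,
# processing server -> sink edges by processing capability
capacity = defaultdict(dict)
capacity["S"].update({"D1": 8, "D2": 5})
capacity["D1"].update({"P1": 6, "P2": 6})
capacity["D2"].update({"P2": 4})
capacity["P1"].update({"T": 5})
capacity["P2"].update({"T": 7})
total, flow = max_flow(capacity, "S", "T")  # total amount processed per unit time
```

- reading the resulting flow on the data server → processing server edges yields the combinations of servers and the per-combination transfer amounts.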
- the first embodiment may be configured such that the master server 300 outputs the data flow information generated by the determination unit 303.
- for example, the determination unit 303 outputs the generated data flow information after executing the step (S405-11) of FIG. 17.
- the determination unit 303 outputs the information shown in the example of FIG.
- This output form is not limited.
- Data flow information may be output to a file, transmitted to another device, displayed on a display device, or sent to a printing device.
- the data flow information output in this way can be used for planning more detailed data processing.
- the distributed system 350 can dynamically determine the data transfer path according to the processing status.
- the distributed system 350 according to the second embodiment will be described focusing on the content different from the first embodiment. Description of the same contents as those in the first embodiment is omitted as appropriate.
- the master server 300 handles a plurality of program processes requested to be executed by the distributed system 350. A unit of program processing requested to be executed by the distributed system is expressed as a job.
- the second embodiment also supports a mode in which the processing amount per unit time changes depending on which portion of the processing target data is handled in the program processing requested to be executed by the distributed system 350.
- the job is handled by being replaced with a set of data having the same processing amount per unit time.
- This data set may be expressed as a logical data set.
- FIG. 19 is a diagram conceptually illustrating a processing configuration example of each device of the distributed system 350 in the second embodiment.
- the master server 300 in the second embodiment further includes a job information storage unit 3040 in addition to the configuration of the first embodiment.
- FIG. 20 is a diagram illustrating an example of information stored in the job information storage unit 3040.
- Each row (each entry) stored in the job information storage unit 3040 includes a job ID 3041, a data name 3042, a minimum unit processing amount 3043, and a maximum unit processing amount 3044.
- in the job ID 3041, a unique identifier in the distributed system 350 assigned to each job executed by the distributed system 350 is set.
- in the data name 3042, the name (identifier) of data handled by the job is set.
- the minimum unit processing amount 3043 is the minimum value of the processing amount per unit time specified for the logical data set that is data handled by the job.
- the maximum unit processing amount 3044 is the maximum value of the processing amount per unit time specified for the logical data set.
- the job information storage unit 3040 may store a plurality of rows having the same job ID, with a different data name 3042, minimum unit processing amount 3043, and maximum unit processing amount 3044 stored in each of these rows.
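- for illustration only, the rows of the job information storage unit 3040 described above could be represented as simple records like the following; the field names and job values are assumptions introduced for the sketch, not taken from FIG. 20.

```python
from dataclasses import dataclass

@dataclass
class JobInfoRow:
    """One row of the job information storage unit 3040 (illustrative)."""
    job_id: str           # 3041: identifier unique in the distributed system
    data_name: str        # 3042: name of data handled by the job
    min_unit_amount: int  # 3043: minimum processing amount per unit time
    max_unit_amount: int  # 3044: maximum processing amount per unit time

# a job may occupy several rows, one per data name it handles
job_table = [
    JobInfoRow("J1", "dataA", 1, 4),
    JobInfoRow("J1", "dataB", 2, 5),
    JobInfoRow("J2", "dataC", 0, 3),
]
rows_for_j1 = [r for r in job_table if r.job_id == "J1"]
```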
- the model generation unit 301 of the master server 300 further reflects the job configuration information stored in the job information storage unit 3040 in the model information 500.
- This reflection operation will be described in the following operation example section.
- specifically, a logical vertex indicating the job is placed in the stage preceding the vertices indicating the data servers 340, and the following are added: edges indicating the logical communication paths from the job vertex to the vertices indicating the data servers 340, a logical vertex preceding the job vertex, and an edge indicating the logical communication path from that preceding vertex to the job vertex.
- FIG. 21 is a flowchart showing a detailed operation of the master server 300 of the second embodiment in the step (S401).
- step (S401-0) is added to FIG. 13 showing the detailed operation of the first embodiment.
- in the added step (S401-0), the model generation unit 301 acquires a set of jobs being executed from the job information storage unit 3040.
- FIG. 22 is a flowchart showing a detailed operation of the master server 300 of the second embodiment in the step (S404).
- the step (S404-30) is added to FIG. 14 showing the detailed operation of the first embodiment.
- the model generation unit 301 adds, to the model information 500, logical communication path information to each job in the job set acquired from the job information storage unit 3040, and logical communication path information from each job to the data servers 340 storing the data to be processed by that job (S404-30). Note that the order of the steps (S404-30), (S404-10), and (S404-20) shown in FIG. 22 may be changed.
- FIG. 23 is a flowchart showing a detailed operation of the master server 300 of the second embodiment in the step (S404-30).
- the model generation unit 301 of the master server 300 executes the process from the step (S404-32) onward for each job Ji in the acquired job set (S404-31).
- the model generation unit 301 determines whether or not there is a vertex in the previous stage of the job Ji (S404-32).
- here, a vertex in the previous stage of the job Ji corresponds to the identifier of a line in the model information 500 in which information (the job name) indicating the job is set as the pointer to the next element.
- the model generation unit 301 sets a new identifier α when there is no vertex in the previous stage of the job (S404-32; No) (S404-34).
- the identifier α is an arbitrary name that does not match any identifier included in the model information 500.
- when there is a vertex in the previous stage of the job (S404-32; Yes), the model generation unit 301 acquires the identifier α of that previous-stage vertex (S404-33).
- the model generation unit 301 adds a line including α as the identifier to the model information 500 (S404-35).
- the model generation unit 301 sets the name of the job Ji as a pointer to the next element of the added row (S404-36).
- the model generation unit 301 sets the maximum unit processing amount and the minimum unit processing amount assigned to the job Ji to the upper limit flow rate and lower limit flow rate of the additional row (S404-37).
- the model generation unit 301 executes steps (S404-39) to (S404-3B) for each data server Dj that stores data handled by the job Ji (S404-38).
- the model generation unit 301 adds a line whose identifier indicates job Ji to the model information 500 (S404-39).
- the model generation unit 301 sets the name (or identifier) of the data server Dj as a pointer to the next element of the added row (S404-3A).
- the model generation unit 301 sets the transfer amount that can be allocated to the data server Dj by the job Ji as the flow rate upper limit value of the added row, and sets a value that is greater than or equal to 0 and less than or equal to that upper limit as the flow rate lower limit value of the added row (S404-3B).
- the transfer amount that can be allocated to the data server Dj by the job Ji indicates, for example, the requested processing amount specified for each piece of data handled by the job Ji, and may be given by the user or determined by the distributed system 350.
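- the row-addition steps (S404-31) to (S404-3B) above can be sketched as follows, representing each row of the model information 500 as [identifier, pointer to next element, flow rate lower limit, flow rate upper limit]. The data layout, the fresh-identifier naming, and the choice of 0 as the lower limit on the job → data server rows are assumptions made for illustration.

```python
def add_job_rows(model_info, job, prev_vertex, data_servers):
    """Sketch of steps (S404-31)-(S404-3B): add an edge into the job vertex
    and edges from the job vertex to each data server holding its data.

    model_info rows: [identifier, next_element, flow_lower, flow_upper]
    job: dict with a name, min/max unit amounts, and per-server transfer caps.
    """
    # edge into the job vertex, from the preceding vertex or a fresh one
    ident = prev_vertex if prev_vertex is not None else "alpha_" + job["name"]
    model_info.append([ident, job["name"], job["min_unit"], job["max_unit"]])
    # edges from the job vertex to each data server storing its data
    for ds in data_servers:
        cap = job["transfer_caps"][ds]  # transfer amount allocatable to ds
        model_info.append([job["name"], ds, 0, cap])
    return model_info

model = []
job = {"name": "J1", "min_unit": 1, "max_unit": 4,
       "transfer_caps": {"D1": 3, "D2": 2}}
add_job_rows(model, job, None, ["D1", "D2"])
```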
- as described above, in the second embodiment, a combination of the processing server 330 and the data server 340 that is the acquisition destination of data to be processed by that processing server 330 is determined based on a network model (conceptual model) that can be generated from model information to which the unit processing amount constraints specified for each job and for each piece of data handled in each job are added.
- thereby, transmission / reception and processing of the processing target data constituting the job are executed in consideration of the unit processing amount specified for the job executed in the distributed system 350, so that the processing time of the job can be minimized.
- further, when a priority is set for each job, each priority can be expressed as a ratio between jobs of the unit processing amounts specified for the jobs. Therefore, according to the second embodiment, even when a priority is set for each job, transmission / reception and processing of the processing target data can be executed so as to satisfy the set priority constraint and minimize the processing time as a whole.
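- as one hedged reading of the priority discussion above, per-job unit processing amounts could be derived from priorities as a simple proportional split of a total capacity; the function below and its integer-division rounding are illustrative only and not prescribed by the specification.

```python
def unit_amounts_from_priorities(priorities, total_capacity):
    """Split a total per-unit-time capacity between jobs in the ratio of
    their priorities (illustrative interpretation; integer shares)."""
    weight = sum(priorities.values())
    return {job: total_capacity * p // weight
            for job, p in priorities.items()}

# illustrative: job J1 has priority 3, job J2 priority 1, capacity 8
shares = unit_amounts_from_priorities({"J1": 3, "J2": 1}, 8)
```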
- the master server 300 sets a termination point for each job as a pointer to the next element in the line including the identifier indicating the processing server 330 of the model information 500.
- the number of rows including the identifier indicating the processing server 330 is equal to the number of job end points.
- FIG. 24 is a diagram illustrating an example of information stored in the server state storage unit 3060 in the first modification of the second embodiment. As illustrated in FIG. 24, the server state storage unit 3060 stores the processable amount for each job as the processable amount information 3065 of each processing server 330.
- FIG. 25 is a flowchart showing detailed operations of the master server 300 in the first modified example of the second embodiment regarding the step (S404-20) shown in FIG.
- the model generation unit 301 executes the process (S404-2B) for each available processing server Pi acquired from the server state storage unit 3060 based on the request information (S404-2A).
- the model generation unit 301 executes the steps (S404-2C) to (S404-2E) for each job Jj (S404-2B).
- the model generation unit 301 adds a line including the name (or identifier) of the processing server Pi to the model information 500 (S404-2C).
- the model generation unit 301 sets an identifier indicating the end point of the job Jj as a pointer to the next element of the additional row (S404-2D).
- the model generation unit 301 sets the processable amount per unit time of the processing server Pi regarding the job Jj as the flow rate upper limit value of the added row, and sets a value that is greater than or equal to 0 and less than or equal to that upper limit as the flow rate lower limit value of the added row (S404-2E). Note that the step (S404-2E) may be performed before the step (S404-2D).
- according to the first modification of the second embodiment, jobs having different processable amounts in the processing servers 330 can be handled, and the processable amount for each job in each processing server 330 can be taken into account when determining the combination of the processing server 330 and the data server 340. Therefore, it is possible to more accurately minimize the processing time of each job to be executed in the entire system.
- the master server 300 sets information (name) indicating a job in the pointer to the next element in the line including the identifier indicating the processing server 330 in the model information 500.
- the number of rows including the identifier indicating the processing server 330 is equal to the number of jobs.
- the pointer to the next element is set to the name (or identifier) of the job Jj in the above-described step of FIG. 25 (S404-2D).
- the information stored in the server state storage unit 3060 is the same as that in the first modification of the second embodiment. That is, also in the second modification of the second embodiment, the processable amount per unit time for each job of the processing server 330 is set as the flow rate lower limit value and the flow rate upper limit value of the line including the identifier indicating the processing server 330 in the model information 500.
- thereby, the process of determining the flow rate of each edge by the determination unit 303 (see step (S405-1) in FIG. 17) becomes simpler and faster than in the first modification.
- alternatively, the model information may be modeled as a circulation flow. In this case, an identifier indicating the data server 340 or an identifier indicating the data is set in the pointer to the next element of the line including the identifier indicating the processing server 330, so that an edge from the vertex indicating the processing server 330 back to the vertex indicating the data server 340 or to the logical vertex indicating the data is provided.
- the distributed system 350 according to the third embodiment will be described focusing on the content different from the first embodiment and the second embodiment. Description of the same contents as those in the first and second embodiments is omitted as appropriate.
- the master server 300 also handles multiplexed processing target data.
- the data location storage unit 3070 of the master server 300 further stores data size information.
- FIG. 26 is a flowchart showing a detailed operation of the master server 300 of the third embodiment in the step (S404).
- step (S404-40) is added to FIG. 14 showing the detailed operation of the first embodiment.
- the model generation unit 301 adds logical communication path information from the data to the data server to the model information 500. Note that the order of the steps (S404-40), (S404-10), and (S404-20) shown in FIG. 26 may be changed.
- FIG. 27 is a flowchart showing detailed operations of the master server 300 of the third embodiment in the step (S404-40).
- the model generation unit 301 executes the process (S404-42) for each data di in the set of processing target data specified based on the request information (S404-41).
- the model generation unit 301 executes steps (S404-43) to (S404-45) for each data server Dj that stores the multiplexed data di (S404-42).
- the multiplexed data di is stored in as many data servers 340 as the number of its duplicates.
- the model generation unit 301 adds a line including di as an identifier (S404-43).
- the model generation unit 301 sets the name (or identifier) of the data server Dj as a pointer to the next element of the added row (S404-44).
- the model generation unit 301 sets the maximum processing amount and the minimum processing amount of the data di specified for the data server Dj to the flow rate upper limit value and the flow rate lower limit value in the additional row (S404-45).
- the master server 300 may determine the upper limit flow rate and the lower limit flow rate. In this case, for example, the master server 300 sets infinity as the flow rate upper limit value and sets 0 as the flow rate lower limit value.
- a common identifier di is attached to the multiplexed data in the model information 500 generated by the model generation unit 301. That is, as many lines with the common identifier di are added as there are duplicates of the data di.
- FIG. 28 is a flowchart showing a detailed operation of the master server 300 of the third embodiment in the step (S406).
- in the third embodiment, a processing server 330 is assigned to each piece of data that is multiplexed and stored in different data servers 340. As a result, the same vertex indicating the same multiplexed data may be contained in several pieces of data flow information.
- the determination unit 303 of the master server 300 executes the step (S406-2-1) and the step (S406-3-1) for each data di in the set of processing target data (S406-1-1).
- the determination unit 303 identifies the data flow information whose path information includes the data di, and aggregates the unit processing amount set in each piece of identified data flow information for each processing server 330 included in the path information of that data flow information. The determination unit 303 then divides the data di in the ratio of the aggregated unit processing amounts of the processing servers 330, and associates each divided portion of the data di with the data server 340 that stores it (S406-2-1).
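- the division in the step (S406-2-1) amounts to a proportional split of each data item by the aggregated unit processing amounts. The sketch below uses integer shares and assigns any rounding remainder to the server with the largest unit processing amount; that remainder policy, like the names and numbers, is an assumption for illustration rather than something the specification prescribes.

```python
def split_by_unit_amounts(data_size, unit_amounts):
    """Divide data of data_size among processing servers in the ratio of
    their aggregated unit processing amounts (step S406-2-1, sketched)."""
    total = sum(unit_amounts.values())
    shares = {srv: data_size * amt // total
              for srv, amt in unit_amounts.items()}
    # give any integer-rounding remainder to the fastest server (assumption)
    remainder = data_size - sum(shares.values())
    fastest = max(unit_amounts, key=unit_amounts.get)
    shares[fastest] += remainder
    return shares

# illustrative: 100 units of data di, servers P1 and P2 with amounts 3 : 1
parts = split_by_unit_amounts(100, {"P1": 3, "P2": 1})
```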
- the determination unit 303 executes the step (S406-4-1) for each piece of route information fj in the set of route information including the data di (S406-3-1).
- the determination unit 303 sends the processing program and the determination information to the processing server Pk included in the route information fj (S406-4-1).
- the processing program is a processing program for instructing to transfer a divided portion of the data di from the data server 340 storing the data di.
- the data server 340 and the data are specified by information included in the determination information.
- as described above, in the third embodiment, a processing server 330 is assigned to each piece of multiplexed data, and model information is generated that considers the communication bandwidth for processing each piece of multiplexed data and the processing capability of the processing servers 330. Then, based on a conceptual model (network model) that can be constructed from the model information, a combination of the processing server 330 and the data server 340 that is the acquisition destination of data to be processed by that processing server 330 is determined.
- thereby, each piece of multiplexed data is not transferred and processed in duplicate; instead, each processing server 330 is allocated a portion of the data divided into an amount corresponding to the communication bandwidth and processing capacity, and the multiplexed data is controlled so as to be transferred and processed in the entire system.
- the distributed system 350 according to the fourth embodiment will be described focusing on the contents different from those of the first to third embodiments. The same contents as those in the first to third embodiments are omitted as appropriate.
- in the fourth embodiment, the master server 300 further generates model information in consideration of the intermediate devices and communication paths included between the data server 340 and the processing server 330, and of their detailed constraint information (available bandwidth). Therefore, in the fourth embodiment, the portion between the data server 340 and the processing server 330 is referred to as a data transfer path instead of a communication path (input / output communication path), and a path that forms the data transfer path is referred to as a communication path.
- FIG. 29 is a diagram conceptually illustrating a processing configuration example of each device of the distributed system 350 in the fourth embodiment.
- the network switch 320 in the fourth embodiment further includes a switch management unit 321 in addition to the configuration of the first embodiment.
- the processing server management unit 331 transmits status information such as the disk available bandwidth and the network available bandwidth of the processing server 330 to the master server 300.
- the data server management unit 341 transmits status information including the disk available bandwidth and the network available bandwidth of the data server 340 to the master server 300.
- the switch management unit 321 acquires information such as an available bandwidth of a communication path connected to the network switch 320 and transmits the information to the master server 300 via the data transmission / reception unit 322.
- the input / output communication path information storage unit 3080 stores information on communication paths included in the data transfer path from the data server 340 to the processing server 330.
- the communication path information includes an identifier of the connection source apparatus, an identifier of the connection destination apparatus, usable bandwidth information of the communication path, and the like.
- the model generation unit 301 further generates model information from which a conceptual model can be constructed that includes a vertex indicating an intermediate device (for example, the network switch 320) through which the data stored in the data server 340 passes before being received by the processing server 330, and at least one of the following edges: an edge from the vertex indicating the data server 340 to the vertex indicating the nearest intermediate device of the data server 340, whose transfer amount upper limit is set to the transferable amount per unit time from the data server 340 to that nearest intermediate device; an edge from the vertex indicating an intermediate device to the vertex indicating another intermediate device, whose transfer amount upper limit is set to the transferable amount per unit time between those intermediate devices; and an edge from the vertex indicating the nearest intermediate device of the processing server 330 to the vertex indicating the processing server 330, whose transfer amount upper limit is set to the transferable amount per unit time from that nearest intermediate device to the processing server 330.
- for example, when there is one intermediate device, the conceptual model includes the vertex indicating the intermediate device, an edge from the vertex indicating the data server 340 to the vertex indicating the intermediate device, and an edge from the vertex indicating the intermediate device to the vertex indicating the processing server 330.
- when there are two intermediate devices, the conceptual model includes two vertices indicating the first intermediate device and the second intermediate device, an edge from the vertex indicating the data server 340 to the vertex indicating the first intermediate device, an edge from the vertex indicating the first intermediate device to the vertex indicating the second intermediate device, and an edge from the vertex indicating the second intermediate device to the vertex indicating the processing server 330.
- further, the model generation unit 301 may represent the intermediate device by one or more vertices indicating one or more input units of the intermediate device, one or more vertices indicating one or more output units of the intermediate device, and one or more edges connecting each input unit and output unit between which data can be transferred.
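- one possible sketch of this port-level modeling follows: each switch input port connects to each switch output port by an internal edge with an infinite flow rate upper limit, while the rows for the external links carry the available bandwidth. For simplicity this sketch connects every input port to every output port, whereas the specification connects only ports on distinct paths; the row layout [identifier, pointer to next element, flow rate lower limit, flow rate upper limit] and all names are assumptions.

```python
INF = float("inf")

def add_switch_rows(model_info, switch, inputs, outputs):
    """Sketch of modeling an intermediate device by input/output port
    vertices: bandwidth limits sit on the external link rows, while the
    switch-internal port-to-port rows have an infinite upper limit.

    model_info rows: [identifier, next_element, flow_lower, flow_upper]
    inputs/outputs map neighbor device names to link bandwidths.
    """
    for src, bw in inputs.items():                 # device -> input port
        in_port = f"{switch}.in.{src}"
        model_info.append([src, in_port, 0, bw])
        for dst in outputs:                        # input port -> output port
            model_info.append([in_port, f"{switch}.out.{dst}", 0, INF])
    for dst, bw in outputs.items():                # output port -> device
        model_info.append([f"{switch}.out.{dst}", dst, 0, bw])
    return model_info

# illustrative: data server D1 feeds switch SW1, which feeds P1 and P2
rows = add_switch_rows([], "SW1", {"D1": 10}, {"P1": 8, "P2": 8})
```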
- FIG. 30 is a flowchart showing detailed operations of the master server 300 of the fourth embodiment in the step (S404).
- the model generation unit 301 of the master server 300 executes the process (S404-12-10) for each data server Di storing unprocessed processing target data (S404-11).
- the model generation unit 301 adds a line related to the data transfer path from the data server Di to the processing server 330 to the model information 500 (S404-12-10).
- FIGS. 31A, 31B, and 31C are flowcharts showing detailed operations of the master server 300 of the fourth embodiment in the step (S404-12-10).
- the name (or identifier) of the data server Di is set as the initial value in the device IDi.
- the model generation unit 301 extracts the information (input / output communication path information) of each row in which the device IDi is set as the input source device ID from the input / output communication path information storage unit 3080 (S404-12-11).
- the model generation unit 301 identifies a set of output destination device IDs included in the extracted input / output communication path information (S404-12-12).
- the model generation unit 301 determines whether or not the device IDi indicates a switch (S404-12-13). When the model generation unit 301 determines that the device IDi indicates a switch (S404-12-13; Yes), the model generation unit 301 performs the process of FIG. 31B. FIG. 31B will be described later.
- when the model generation unit 301 determines that the device IDi does not indicate a switch (S404-12-13; No), it determines whether or not a line including the device IDi as an identifier has already been set in the model information 500 (S404-12-14). If the model generation unit 301 determines that such a line has already been set (S404-12-14; Yes), it ends the process of FIG. 31A.
- when the model generation unit 301 determines that a line including the device IDi as an identifier has not yet been set in the model information 500 (S404-12-14; No), it executes the process (S404-12-16) for each output destination device IDj in the set of output destination device IDs specified in the step (S404-12-12) (S404-12-15).
- in the step (S404-12-16), the model generation unit 301 determines whether the output destination device IDj indicates a switch.
- when the model generation unit 301 determines that the output destination device IDj indicates a switch (S404-12-16; Yes), it performs the process of FIG. 31C, which will be described later.
- when the model generation unit 301 determines that the output destination device IDj does not indicate a switch (S404-12-16; No), it adds a line including the device IDi as an identifier to the model information 500 (S404-12-17).
- the model generation unit 301 sets the output destination device IDj as a pointer to the next element in the added row (S404-12-18).
- the model generation unit 301 sets the available bandwidth of the input / output communication path between the device indicated by the device IDi and the device indicated by the output destination device IDj as the flow rate upper limit value of the added row, and sets a value not less than 0 and not more than that upper limit as the flow rate lower limit value of the added row (S404-12-19). Note that the step (S404-12-19) may be executed before the step (S404-12-18).
- the model generation unit 301 determines whether or not the output destination device IDj indicates the processing server 330 (S404-12-1A).
- when the model generation unit 301 determines that the output destination device IDj does not indicate the processing server 330 (S404-12-1A; No), it recursively executes the processing of FIG. 31A (S404-12-1B). At this time, the output destination device IDj is set as the initial value of the device IDi, and as a result a line including the output destination device IDj as an identifier is added to the model information 500. If the model generation unit 301 determines that the output destination device IDj indicates the processing server 330 (S404-12-1A; Yes), it does not perform recursive execution of the processing of FIG. 31A.
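- the recursion of FIG. 31A can be pictured as a depth-first traversal from a data server toward the processing servers, adding one row per communication path and skipping devices whose rows are already set. The sketch below omits the switch-internal port rows of FIGS. 31B and 31C; the topology, bandwidth values, and names are illustrative assumptions.

```python
def add_transfer_path_rows(model_info, topology, bandwidth, device,
                           processing_servers, visited=None):
    """Sketch of FIG. 31A: starting from a data server, recursively add one
    row per communication path until a processing server is reached.

    topology maps a device to its output destination devices; bandwidth
    maps (src, dst) links to their available bandwidth.
    """
    if visited is None:
        visited = set()
    if device in visited:               # row already set (S404-12-14; Yes)
        return model_info
    visited.add(device)
    for nxt in topology.get(device, []):
        model_info.append([device, nxt, 0, bandwidth[(device, nxt)]])
        if nxt not in processing_servers:   # (S404-12-1A; No): recurse
            add_transfer_path_rows(model_info, topology, bandwidth, nxt,
                                   processing_servers, visited)
    return model_info

# illustrative topology: D1 -> SW1 -> {P1, P2}
topology = {"D1": ["SW1"], "SW1": ["P1", "P2"]}
bandwidth = {("D1", "SW1"): 10, ("SW1", "P1"): 8, ("SW1", "P2"): 8}
rows = add_transfer_path_rows([], topology, bandwidth, "D1", {"P1", "P2"})
```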
- if it is determined in the step (S404-12-13) in FIG. 31A that the device IDi indicates a switch (S404-12-13; Yes), the model generation unit 301 executes the process (S404-12-1D) for each output destination device IDj in the set of output destination device IDs specified in the step (S404-12-12) (S404-12-1C).
- the device ID i indicating the switch may be referred to as a switch i.
- the model generation unit 301 determines whether a line including the identifier of the output port for the output destination device IDj in the switch i exists in the model information 500 (S404-12-1D). When the model generation unit 301 determines that there is no line including the identifier of the output port in the model information (S404-12-1D; No), the model generation unit 301 stores the output port to the output destination device IDj in the switch i. A line including the identifier is added (S404-12-1E).
- the model generation unit 301 sets the output destination device IDj as the pointer to the next element of the added row (S404-12-1F). Further, the model generation unit 301 sets the available bandwidth of the input / output communication path between the device indicated by the device IDi and the device indicated by the output destination device IDj as the flow rate upper limit value of the added row, and sets a value not less than 0 and not more than that upper limit as the flow rate lower limit value of the added row (S404-12-1G). Note that the step (S404-12-1G) may be performed before the step (S404-12-1F).
- when the model generation unit 301 determines that a line including the identifier of the output port exists in the model information 500 (S404-12-1D; Yes), the above-described steps (S404-12-1E), (S404-12-1F), and (S404-12-1G) are not executed.
- next, the model generation unit 301 executes the step (S404-12-1I) for each input port identifier k in the switch i whose input source device ID is different from the output destination device IDj (S404-12-1H).
- the model generation unit 301 determines whether or not a line including the input port identifier k exists in the model information 500 (S404-12-1I). When the model generation unit 301 determines that there is no corresponding line in the model information 500 (S404-12-1I; No), it adds a line including k as an identifier to the model information 500 (S404-12-1J).
- the model generation unit 301 sets the identifier of the output port to the output destination device IDj in the switch i in the pointer to the next element in the additional row (S404-12-1K).
- the model generation unit 301 sets infinity as the flow rate upper limit value of the added row, and sets a value not less than 0 and not more than that upper limit as the flow rate lower limit value of the added row (S404-12-1L). Note that the step (S404-12-1L) may be executed before the step (S404-12-1K).
- the model generation unit 301 determines whether or not the output destination device IDj indicates the processing server 330 (S404-12-1M).
- when the model generation unit 301 determines that the output destination device IDj does not indicate the processing server 330 (S404-12-1M; No), it recursively executes the processing of FIG. 31A (S404-12-1N). At this time, the output destination device IDj is set as the initial value of the device IDi, and as a result a line including the output destination device IDj as an identifier is added to the model information 500.
- when the model generation unit 301 determines that the output destination device IDj indicates the processing server 330 (S404-12-1M; Yes), it does not perform recursive execution of the processing of FIG. 31A.
- the output destination device IDj indicating a switch may be referred to as a switch j.
- in FIG. 31C, the model generation unit 301 executes the process (S404-12-1P) for each output port identifier k in the switch j whose output destination device ID differs from the device IDi (S404-12-1O).
- the model generation unit 301 determines whether there is a line in the model information 500 in which the identifier k is set as the pointer to the next element (S404-12-1P). When the model generation unit 301 determines that the corresponding line does not exist (S404-12-1P; No), it adds a line including the identifier of the input port from the device IDi in the switch j to the model information 500 (S404-12-1Q).
- the model generation unit 301 sets the identifier k as the pointer to the next element of the added row (S404-12-1R). Further, the model generation unit 301 sets infinity as the flow rate upper limit value of the added row, and sets 0 as the flow rate lower limit value of the added row (S404-12-1S). Note that the step (S404-12-1S) may be executed before the step (S404-12-1R). On the other hand, when the model generation unit 301 determines that the corresponding line exists (S404-12-1P; Yes), the steps (S404-12-1Q), (S404-12-1R), and (S404-12-1S) are not executed.
- the model generation unit 301 adds a line including the device IDi as an identifier to the model information 500 (S404-12-1T).
- the model generation unit 301 sets the identifier of the input port from the switch j in the device IDi to the pointer to the next element in the additional row (S404-12-1U).
- the model generation unit 301 sets the available bandwidth of the input / output communication path between the device indicated by the device IDi and the device (switch j) indicated by the output destination device IDj to the flow rate upper limit value in the additional row. Then, a value not less than 0 and not more than the upper limit of the flow rate is set as the lower limit of the flow rate in the additional row (S404-12-1V). Note that the step (S404-12-1V) may be executed before the step (S404-12-1U).
- the model generation unit 301 recursively executes the process of FIG. 31A (S404-12-1W). At this time, the output destination device IDj is set as the initial value for the device IDi. As a result, a line including the output destination device IDj as an identifier is added to the model information 500.
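The recursive row-building of the steps above can be pictured with a small sketch. This is an illustrative reconstruction, not the patented implementation; the helper name `build_rows` and the row fields (`identifier`, `next`, `upper`, `lower`) are assumptions standing in for the rows of the model information 500.

```python
# Illustrative sketch of the recursive model-row generation through
# switches; names and row fields are assumptions.

def build_rows(device, topology, bandwidth, rows, seen):
    """Recursively add one row per link reachable from `device`.

    topology[x]        -> next-hop devices (switches or servers) from x
    bandwidth[(a, b)]  -> available bandwidth of the a -> b path
    Each row holds an identifier, a pointer to the next element, and the
    flow rate upper / lower limit values, as in S404-12-1Q to S404-12-1V.
    """
    for nxt in topology.get(device, []):
        key = (device, nxt)
        if key in seen:                 # the corresponding row already exists
            continue
        seen.add(key)
        rows.append({"identifier": device,
                     "next": nxt,       # pointer to the next element
                     "upper": bandwidth.get(key, float("inf")),
                     "lower": 0})
        build_rows(nxt, topology, bandwidth, rows, seen)   # S404-12-1W

# Toy topology: data server n2 reaches processing server n1 via switch sw.
rows, seen = [], set()
build_rows("n2", {"n2": ["sw"], "sw": ["n1"]},
           {("n2", "sw"): 100, ("sw", "n1"): 100}, rows, seen)
# Two rows result: one for n2 -> sw and one for sw -> n1.
```

The `seen` set plays the role of the existence check (S404-12-1P), so a shared switch port is never added twice.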
- the devices constituting the data transfer path between each data server 340 and each processing server 330, and the available bandwidth of each communication path, are thus additionally taken into account when determining the combination of the processing server 330 and the data server 340 from which that processing server 330 acquires its processing target data. Therefore, according to the fourth embodiment, the processing time of each job executed in the entire system can be minimized more accurately.
- the distributed system 350 according to the fifth embodiment will be described focusing on the content different from the above-described embodiments. Description of the same content as each of the above-described embodiments is omitted.
- the master server 300 generates model information in consideration of a decrease in processing capability when processing of a plurality of jobs is executed by the processing server 330 in parallel.
- FIG. 32 is a flowchart showing the detailed operation of the master server 300 of the fifth embodiment in the step (S404-20) (see FIG. 22).
- an edge indicating a decrease in processing capability of the processing server 330 due to processing of a plurality of jobs is added to the model information.
- the model generation unit 301 executes steps (S404-2G) to (S404-2J) for each available processing server Pi acquired from the server state storage unit 3060 based on the request information (S404-2F).
- the model generation unit 301 adds a line including the name (or identifier) of the processing server Pi to the model information 500 (S404-2G).
- the model generation unit 301 sets the second name (or identifier) indicating the processing server Pi to the pointer to the next element in the added row (S404-2H).
- the second name (or identifier) indicating the processing server Pi is a name (or identifier) corresponding to Pi that is unique in the model information 500.
- the model generation unit 301 sets the possible processing amount per unit time of the processing server Pi as the flow rate upper limit value in the additional row, and sets a value that is greater than or equal to 0 and less than or equal to that flow rate upper limit value as the flow rate lower limit value in the additional row (S404-2I). Note that the step (S404-2I) may be performed before the step (S404-2H).
- the model generation unit 301 executes the process (S404-2K) to the process (S404-2M) for each job Jj (S404-2J).
- the model generation unit 301 adds a line including the second name (or identifier) indicating the processing server Pi to the model information 500 (S404-2K).
- the model generation unit 301 sets an identifier indicating the end point of the job Jj as the pointer to the next element of the additional row (S404-2L). Further, the model generation unit 301 sets the possible processing amount per unit time of the processing server Pi for the job Jj as the flow rate upper limit value of the additional row, and sets a value that is greater than or equal to 0 and less than or equal to that upper limit value as the flow rate lower limit value of the additional row (S404-2M). Note that the step (S404-2M) may be performed before the step (S404-2L).
- each per-job processing capability of the jobs executed in parallel by the processing server 330 is thus further added to the model information as a constraint condition, and, by using the conceptual model (network model) that can be constructed from this model information, the combination of the processing server 330 and the data server 340 from which the processing server 330 acquires its processing target data is determined.
- according to the fifth embodiment, it is possible to realize data transmission / reception and data processing of the distributed system 350 in consideration of the decrease in processing capacity caused by processing a plurality of jobs, and thus to minimize the processing time in the entire system.
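The edge layout of the fifth embodiment, one aggregate edge per processing server plus one edge per (server, job) pair, can be sketched as follows. The row structure and helper names are illustrative assumptions, not the patent's data format; the point is that flow to any job end point must first pass the server's aggregate-capacity edge.

```python
# Illustrative sketch of the rows added in S404-2F to S404-2M; the field
# names and the "end:" convention are assumptions.

def add_server_rows(server, total_cap, per_job_caps, rows):
    """Split server Pi into Pi and a second node Pi' joined by an edge
    limited to Pi's total throughput, then add one edge per job from Pi'
    to that job's end point, limited to Pi's per-job throughput."""
    second = server + "'"                        # second identifier for Pi
    rows.append({"id": server, "next": second,
                 "upper": total_cap, "lower": 0})          # S404-2G..2I
    for job, cap in per_job_caps.items():                  # S404-2J..2M
        rows.append({"id": second, "next": "end:" + job,
                     "upper": cap, "lower": 0})

rows = []
# Server n1: 50 MB/s total, and up to 50 MB/s for each of two jobs.
add_server_rows("n1", 50, {"MyJob1": 50, "MyJob2": 50}, rows)
# Although each job alone could run at 50 MB/s, any flow must first pass
# the aggregate n1 -> n1' edge, so their combined rate is capped at 50.
```

This is how the model captures the drop in effective per-job capacity when several jobs run on the same processing server in parallel.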
- the distributed system 350 according to the sixth embodiment will be described focusing on the content different from the above-described embodiments. Description of the same content as each of the above-described embodiments is omitted.
- the master server 300 generates model information in consideration of a decrease in processing capability when processing of a plurality of jobs having different processing loads is performed in parallel by the processing server 330.
- FIG. 33 is a diagram illustrating an example of information stored in the server state storage unit 3060 according to the sixth embodiment.
- the server state storage unit 3060 stores the remaining resource information as the load information 3062, and further stores new processing load information 3066.
- the remaining resource information indicates the remaining amount of resources used by the processing server 330 when the processing is executed.
- the processing load information indicates the amount of resources of the processing server 330 used when executing the processing.
- the processing load information is represented by, for example, a resource usage amount per unit processing amount.
- the determining unit 303 of the master server 300 acquires the flow rate (flow rate function f (e)) of each side (step (S405-1 in FIG. 17)) and is acquired from the server state storage unit 3060.
- the remaining resource information (load information 3062) and processing load information 3066 are further considered.
- the processing load (resource amount) consumed per unit processing amount of each job and the remaining load (remaining resource amount) are managed, and routes in the conceptual model are selected using these as constraint conditions; as a result, the correspondence among the data server 340, the processing server 330, and the unit processing amount for exchanging data regarding each job is determined. That is, in the sixth embodiment, the processing capability of the processing server 330 when jobs with different processing loads are processed in parallel is further taken into account.
- according to the sixth embodiment, it is possible to realize data transmission / reception and data processing of the distributed system 350 in consideration of the decrease in processing capacity caused by processing a plurality of jobs having different processing loads, and thus to minimize the processing time in the entire system.
- in the above example, the processing load (resource amount) spent per unit processing amount for each job is stored in the processing load information of the server state storage unit 3060; however, the server state storage unit 3060 may store the processing load consumed per unit processing amount without distinguishing between jobs.
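The constraint handling of the sixth embodiment reduces to simple resource arithmetic: if each job consumes a known resource amount per unit processing amount, the remaining resource amount bounds the unit processing amounts that can be assigned. A minimal sketch with hypothetical numbers:

```python
# Hypothetical numbers illustrating the sixth-embodiment constraint:
# each job spends cost[J] resource units per MB/s of processing, and the
# remaining resource amount (load information 3062) bounds the total.

remaining = 100                              # remaining resource amount
cost = {"MyJob1": 1.0, "MyJob2": 2.0}        # per unit processing amount

# Each job alone is bounded by remaining / cost:
assert remaining / cost["MyJob1"] == 100.0   # up to 100 MB/s
assert remaining / cost["MyJob2"] == 50.0    # up to 50 MB/s

# In parallel, the joint constraint cost[J1]*x1 + cost[J2]*x2 <= remaining
# applies: assigning x1 = 40 MB/s to MyJob1 leaves 30 MB/s for MyJob2.
x1 = 40
x2 = (remaining - x1 * cost["MyJob1"]) / cost["MyJob2"]
assert x2 == 30.0
```

The heavier job (higher cost per unit processing amount) thus receives a proportionally smaller achievable unit processing amount, which is exactly the capacity decrease the sixth embodiment models.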
- Example 1 shows a specific example of the first embodiment described above.
- FIG. 34 is a diagram conceptually illustrating a configuration example of the distributed system 350 in the first embodiment.
- the distributed system 350 according to the first embodiment includes switches sw1 and sw2 and servers n1 to n4, and the servers n1 to n4 are connected to each other via the switches sw1 and sw2.
- the servers n1 to n4 function as the processing server 330 and the data server 340 depending on the situation.
- the servers n1 to n4 have disks D1 to D4 as the processing data storage unit 342.
- any one of the servers n1 to n4 functions as the master server 300.
- the server n1 has p1 as the usable process execution unit 332, and the server n3 has p3 as the usable process execution unit 332.
- FIG. 35 is a diagram illustrating information stored in the server state storage unit 3060 according to the first embodiment.
- the servers n1 and n3 are available processing servers 330.
- the processable amount of the server n1 is 50 MB / s
- the processable amount of the server n3 is 150 MB / s.
- FIG. 36 is a diagram illustrating information stored in the input / output communication path information storage unit 3080 according to the first embodiment.
- the available bandwidth at the time of data transmission from the server n2 to the server n1, the available bandwidth at the time of data transmission from the server n2 to the server n3, and the available bandwidth at the time of data transmission from the server n4 to the server n1 are 50 MB / s, respectively.
- the available bandwidth at the time of data transmission from the server n4 to the server n3 is 100 MB / s.
- FIG. 37 is a diagram illustrating information stored in the data location storage unit 3070 according to the first embodiment.
- the processing target data (MyDataSet1) is divided and stored in files da, db, and dc.
- the files da and db are stored in the disk D2 of the server n2, and the file dc is stored in the disk D4 of the server n4.
- the processing target data (MyDataSet1) is data that is distributed and not multiplexed.
- the model generation unit 301 of the master server 300 obtains ⁇ n2, n4 ⁇ as a set of identifiers of the data server 340 storing the processing target data from the data location storage unit 3070 in FIG.
- the model generation unit 301 obtains ⁇ n1, n3 ⁇ as a set of identifiers of the available processing servers 330 from the server state storage unit 3060 in FIG.
- the model generation unit 301 adds the information stored in the server state storage unit 3060 and the information stored in the input / output communication path information storage unit 3080 with respect to the acquired ⁇ n1, n3 ⁇ and ⁇ n2, n4 ⁇ . Based on this, model information is generated.
- FIG. 38 is a diagram illustrating model information generated in the first embodiment.
- FIG. 39 is a diagram showing a conceptual model (directed graph, network model) constructed by the model information shown in FIG.
- the value given to each side on the conceptual model shown in FIG. 39 indicates the maximum value of the current data transfer amount per unit time in the communication channel (the upper limit value of the transfer amount constraint condition) or the maximum value of the data processing amount per unit time in the processing server corresponding to the start point of the side (the upper limit value of the processing amount constraint condition).
- the determination unit 303 of the master server 300 determines the flow rate (flow rate function f of each side) in the conceptual model so that the processing time of the job corresponding to the request information is minimized in the entire system. ) Respectively.
- the flow function f is determined so that the data processing amount per unit time in the distributed system 350 is maximized.
- 40A to 40G are diagrams conceptually showing the flow function f determination process and the data flow information determination process by the flow increase method in the maximum flow problem in the first embodiment.
- the determination unit 303 constructs a network model as shown in FIG. 40A based on the model information of FIG. 38. Note that the construction of the network model does not necessarily mean that some data form is generated as a software element; the term is also used simply as a concept for explaining the processing up to the determination of the flow rate function f using the model information. In this network model, a start point s and an end point t are set.
- the determination unit 303 identifies the residual graph of the network shown in FIG. 40C.
- the flow with the flow rate 0 is not shown in the residual graph.
- the determination unit 303 identifies a flow increasing path from the residual graph shown in FIG. 40C and assigns a flow to the path.
- a flow of 50 MB / s is given to the route (s, n2, n3, t) as shown in FIG. 40D.
- the determination unit 303 specifies the network residual graph shown in FIG. 40E.
- the determining unit 303 identifies a flow increasing path from the residual graph shown in FIG. 40E and gives a flow to the path. Based on the residual graph shown in FIG. 40E, the determination unit 303 assigns a flow of 100 MB / s to the route (s, n4, n3, t) as shown in FIG. 40F. As a result, the determination unit 303 identifies the residual graph of the network illustrated in FIG. 40G.
- the determination unit 303 ends the process. As a result, the obtained combination information of each route and each data flow rate becomes data flow information.
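The augmenting-path computation walked through in FIGS. 40A to 40G can be reproduced with a compact max-flow routine. The sketch below is illustrative, not the patent's implementation; the sides from the start point s to the data servers are assumed to be effectively unbounded, which FIG. 36 does not state explicitly.

```python
from collections import defaultdict

def max_flow(cap, s, t):
    """Augmenting-path (Ford-Fulkerson) max flow; `cap` is mutated into
    the residual graph, mirroring FIGS. 40C, 40E and 40G."""
    flow = 0
    while True:
        parent, stack = {s: None}, [s]      # depth-first path search
        while stack and t not in parent:
            u = stack.pop()
            for v, c in list(cap[u].items()):
                if c > 0 and v not in parent:
                    parent[v] = u
                    stack.append(v)
        if t not in parent:                 # no flow increasing path left
            return flow
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        aug = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= aug
            cap[v][u] += aug                # residual (reverse) edge
        flow += aug

INF = float("inf")
cap = defaultdict(lambda: defaultdict(int))
# Sides of the FIG. 39 conceptual model (MB/s); the s -> data-server
# sides are assumed unbounded.
for u, v, c in [("s", "n2", INF), ("s", "n4", INF),
                ("n2", "n1", 50), ("n2", "n3", 50),
                ("n4", "n1", 50), ("n4", "n3", 100),
                ("n1", "t", 50), ("n3", "t", 150)]:
    cap[u][v] = c

print(max_flow(cap, "s", "t"))   # prints 200
```

The routine reaches the same total of 200 MB / s as the walkthrough: 50 MB / s processed by n1 and 150 MB / s by n3.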
- FIG. 41 is a diagram showing data flow information in the first embodiment.
- the determining unit 303 transmits the processing program to the servers n1 and n3 based on the data flow information determined as described above. Furthermore, the determination unit 303 instructs the data reception and processing execution by transmitting determination information corresponding to the processing program to the processing servers n1 and n3.
- FIG. 42 is a diagram conceptually illustrating data transmission / reception performed in the first embodiment.
- the processing server n1 that has received the decision information acquires data in the processing data storage unit 342 of the data server n2.
- the process execution unit p1 executes the process of the acquired data.
- the processing server n3 acquires data in the processing data storage unit 342 of the data server n2.
- the process execution unit p3 executes processing of the acquired data.
- the processing server n3 acquires data in the processing data storage unit 342 of the data server n4.
- the process execution unit p3 executes processing of the acquired data.
- Example 2 shows a specific example of the first modification of the second embodiment described above.
- the configuration of the distributed system 350 in the second embodiment is the same as that in the first embodiment (see FIG. 34).
- the state of the input / output communication path information storage unit 3080 in the second embodiment is also the same as that in the first embodiment (see FIG. 36).
- FIG. 43 is a diagram illustrating information stored in the job information storage unit 3040 according to the second embodiment.
- a job MyJob1 and a job MyJob2 are input as units for executing a program.
- the maximum unit processing amount of the job MyJob1 is set to 25 MB / s, and the minimum unit processing amount of the job MyJob1 is not set.
- the minimum unit processing amount of the job MyJob2 is set to 50 MB / s, and the maximum unit processing amount of the job MyJob2 is not set.
- FIG. 44 is a diagram illustrating information stored in the server state storage unit 3060 according to the second embodiment.
- the server n1 can process the jobs MyJob1 and MyJob2 at 50 MB / s, respectively, and the server n3 can process the jobs MyJob1 and MyJob2 at 150 MB / s, respectively.
- FIG. 45 is a diagram illustrating information stored in the data location storage unit 3070 according to the second embodiment.
- the data location storage unit 3070 stores information about the processing target data MyDataSet1 and MyDataSet2.
- MyDataSet1 is stored in the file da, and the file da is stored in the disk D2 of the server n2.
- MyDataSet2 is divided into files db, dc, and dd and stored.
- the files db and dc are stored in the disk D2 of the server n2, and the file dd is stored in the disk D4 of the server n4.
- MyDataSet1 and MyDataSet2 are data that are distributed and not multiplexed.
- the job information storage unit 3040, server state storage unit 3060, input / output communication path information storage unit 3080, and data location storage unit 3070 of the master server 300 are in the states shown in FIGS. 43, 44, 36, and 45.
- the request information, by which the client 360 requests execution of the job MyJob1 using the processing target data (MyDataSet1) and execution of the job MyJob2 using the processing target data (MyDataSet2), is received by the master server 300.
- the model generation unit 301 of the master server 300 obtains ⁇ MyJob1, MyJob2 ⁇ as a set of jobs instructed to be executed from the job information storage unit 3040 in FIG.
- the model generation unit 301 acquires the name of data used by the job, the minimum unit processing amount, and the maximum unit processing amount for each job.
- the model generation unit 301 obtains ⁇ D2, D4 ⁇ as a set of identifiers of the data server 340 storing the processing target data from the data location storage unit 3070 of FIG.
- the model generation unit 301 acquires {n2, n4} as a set of identifiers of the data server 340 from the server state storage unit 3060 in FIG. 44, and obtains {n1, n3} as a set of identifiers of the available processing servers 330. Further, the model generation unit 301 obtains the processable amount information of the available processing servers n1 and n3 from the server state storage unit 3060 in FIG. 44.
- the model generation unit 301 generates model information based on each set acquired in this way and information stored in the input / output channel information storage unit 3080 in FIG.
- FIG. 46 is a diagram illustrating model information generated in the second embodiment.
- FIG. 47 is a diagram showing a conceptual model constructed by the model information shown in FIG. The value given to each side on the conceptual model shown in FIG. 47 is the maximum value of the current data transfer amount per unit time in the communication channel (upper limit value of the transfer amount constraint condition) or the start point of the side The maximum value of the data processing amount per unit time in the processing server corresponding to (the upper limit value of the processing amount constraint condition) is shown.
- the determination unit 303 determines the flow function f based on the model information of FIG. 46 so that the job processing time is minimized.
- FIGS. 48A to 48F and FIGS. 49A to 49J are diagrams conceptually showing a flow function f determination process and a data flow information determination process by the flow increase method in the maximum flow problem in the second embodiment.
- 48A to 48F are diagrams showing an example of an initial flow calculation procedure that satisfies the lower limit flow rate restriction.
- the determination unit 303 constructs the network model shown in FIG. 48A based on the model information of FIG. In this network model, start points s1 and s2 are set, an end point t1 corresponding to the start point s1 is set, and an end point t2 corresponding to the start point s2 is set. Furthermore, the determination unit 303 sets a virtual start point s * and a virtual end point t * for the network model shown in FIG. 48A.
- the determination unit 303 sets a difference value between the flow rate upper limit value before the change and the flow rate lower limit value before the change in the new flow rate upper limit value on the side to which the flow rate restriction is given. Moreover, the determination part 303 sets 0 to the new flow volume lower limit value of the side. Such processing is performed on the network model shown in FIG. 48A, whereby the network model shown in FIG. 48B is constructed.
- the determination unit 303 connects the virtual start point s * to the end point (MyJob2) of the side to which the lower limit flow rate restriction is given, and connects s2 to the virtual end point t *. Specifically, sides with a predetermined flow rate upper limit value are added between the aforementioned vertices. This predetermined flow rate upper limit value is the pre-change flow rate lower limit value that was set on the side to which the lower limit flow rate restriction is given. Further, the determination unit 303 connects the end point t2 and the start point s2. Specifically, a side whose flow rate upper limit is infinite is added between the end point t2 and the start point s2. By performing such processing on the network model shown in FIG. 48B, the network model shown in FIG. 48C is constructed.
- the determination unit 303 obtains an s * -t * -flow in which every side exiting from s * and every side entering t * is saturated, for the network model shown in FIG. 48C. Note that the absence of such a flow indicates that no solution satisfying the lower limit flow rate restriction exists in the original network model.
- the path (s *, MyJob2, n4, n3, t2, s2, t *) shown in FIG. 48D corresponds to the s * -t * -flow.
- the determination unit 303 deletes the added vertices and edges from the network model, and returns the flow rate restriction values of the edges to which the restriction was given to their original values before the change. Then, the determination unit 303 gives a flow equal to the flow rate lower limit to each such edge.
- as illustrated in FIG. 48E, the determination unit 303 gives a flow of 50 MB / s to the actual path (s2, MyJob2, n4, n3, t2) from which the added vertices and edges have been deleted.
- the residual graph of the network shown in FIG. 48F is specified.
- This path (s2, MyJob2, n4, n3, t2) is an initial flow (FIG. 49A) that satisfies the lower limit flow rate restriction.
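The virtual start point s * / virtual end point t * construction of FIGS. 48A to 48F is the standard reduction from flows with lower bounds to a plain maximum-flow problem. The sketch below is an illustrative reconstruction under that reading; node names mirror Example 2, and the 100 MB / s capacities assumed on the MyJob2 branch are hypothetical.

```python
from collections import defaultdict

def max_flow(cap, s, t):
    """Plain augmenting-path (Ford-Fulkerson) max flow on a residual map."""
    total = 0
    while True:
        parent, stack = {s: None}, [s]
        while stack and t not in parent:
            u = stack.pop()
            for v, c in list(cap[u].items()):
                if c > 0 and v not in parent:
                    parent[v] = u
                    stack.append(v)
        if t not in parent:
            return total
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        aug = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= aug
            cap[v][u] += aug                 # residual (reverse) edge
        total += aug

def feasible_with_lower_bounds(edges, s, t):
    """edges: (u, v, lower, upper). True iff some flow meets every lower
    bound -- the s* / t* reduction sketched in FIGS. 48B and 48C."""
    cap = defaultdict(lambda: defaultdict(int))
    for u, v, low, up in edges:
        cap[u][v] += up - low                # reduced upper limit
        if low:
            cap["s*"][v] += low              # side from the virtual start point
            cap[u]["t*"] += low              # side to the virtual end point
    cap[t][s] = float("inf")                 # side with infinite upper limit
    need = sum(low for _, _, low, _ in edges)
    return max_flow(cap, "s*", "t*") == need # all s* / t* sides saturated?

# Example-2 sub-network: 50 MB/s lower bound on the side (s2, MyJob2);
# the 100 MB/s capacities on the MyJob2 branch are assumed values.
edges = [("s2", "MyJob2", 50, float("inf")),
         ("MyJob2", "n4", 0, 100),
         ("n4", "n3", 0, 100),
         ("n3", "t2", 0, 150)]
print(feasible_with_lower_bounds(edges, "s2", "t2"))   # prints True
```

Saturating the s * / t * sides corresponds to finding the initial flow of FIG. 48E; if they cannot be saturated, no assignment satisfies the minimum unit processing amount.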
- the determination unit 303 identifies a flow increasing path from the residual graph shown in FIG. 49B (similar to FIG. 48F) and gives a flow to the path. Specifically, the determination unit 303 gives a flow of 25 MB / s to the route (s1, MyJob1, n2, n1, t1) as shown in FIG. 49C based on the residual graph shown in FIG. 49B. As a result, the determination unit 303 specifies the network residual graph shown in FIG. 49D.
- the determining unit 303 identifies a flow increasing path from the residual graph shown in FIG. 49D and gives a flow to the path. Based on the residual graph shown in FIG. 49D, the determination unit 303 additionally gives a flow of 50 MB / s to the route (s2, MyJob2, n4, n3, t2) as shown in FIG. 49E. As a result, the determination unit 303 specifies the network residual graph shown in FIG. 49F.
- the determining unit 303 identifies a flow increasing path from the residual graph shown in FIG. 49F and gives a flow to the path. Accordingly, the determination unit 303 additionally gives a flow of 50 MB / s to the route (s2, MyJob2, n2, n1, t2) as illustrated in FIG. 49G based on the residual graph illustrated in FIG. 49F. As a result, the determination unit 303 identifies the network residual graph shown in FIG. 49H.
- the determining unit 303 identifies a flow increasing path from the residual graph shown in FIG. 49H and gives a flow to the path. Based on the residual graph shown in FIG. 49H, the determination unit 303 additionally gives a flow of 50 MB / s to the route (s2, MyJob2, n2, n3, t2) as shown in FIG. 49I. As a result, the determination unit 303 identifies the network residual graph shown in FIG. 49J.
- the determination unit 303 ends the process. As a result, the obtained combination information of each route and each data flow rate becomes data flow information.
- FIG. 50 is a diagram illustrating data flow information generated in the second embodiment.
- the determining unit 303 transmits the processing program to the servers n1 and n3 based on the data flow information determined as described above. Furthermore, the determination unit 303 instructs the data reception and processing execution by transmitting determination information corresponding to the processing program to the processing servers n1 and n3.
- FIG. 51 is a diagram conceptually illustrating data transmission / reception performed in the second embodiment.
- the processing server n1 that has received the decision information acquires data in the processing data storage unit 342 of the data server n2.
- the process execution unit p1 executes MyJob1 process on the acquired data.
- the processing server n1 acquires data in the processing data storage unit 342 of the data server n2.
- the process execution unit p1 executes MyJob2 process on the acquired data.
- the processing server n3 acquires data in the processing data storage unit 342 of the data server n2.
- the process execution unit p3 executes MyJob2 process on the acquired data.
- the processing server n3 acquires data in the processing data storage unit 342 of the data server n4.
- the process execution unit p3 executes MyJob2 process on the acquired data.
- Example 3 shows a specific example of the above-described third embodiment.
- the configuration of the distributed system 350 in the third embodiment is the same as that in the first embodiment (see FIG. 34). Further, the state of the server state storage unit 3060 in the third embodiment is the same as that in the first embodiment (see FIG. 35). The state of the input / output communication path information storage unit 3080 in the third embodiment is also the same as that in the first embodiment (see FIG. 36).
- FIG. 52 is a diagram illustrating information stored in the data location storage unit 3070 according to the third embodiment.
- the processing target data (MyDataSet1) is divided into the file da and the data db and stored.
- the file da is a single file that is not multiplexed and is stored in the disk D2 of the server n2.
- the data db is duplicated into a file db1 and a file db2, the file db1 is stored on the disk D2 of the server n2, and the file db2 is stored on the disk D4 of the server n4.
- request information for requesting execution of a processing program that uses the processing target data MyDataSet1 is transmitted by the client 360 to the master server 300.
- the model generation unit 301 acquires {n2, n4} as a set of identifiers of the data server 340 storing the processing target data from the data location storage unit 3070 in FIG. 52, and acquires {n1, n3} as a set of identifiers of the available processing servers 330 from the server state storage unit 3060 in FIG. 35.
- the model generation unit 301 includes each of the acquired sets, information stored in the server state storage unit 3060 in FIG. 35, information stored in the input / output communication path information storage unit 3080 in FIG. 36, and FIG. Model information is generated based on the information stored in the data location storage unit 3070.
- FIG. 53 is a diagram showing model information generated in the third embodiment.
- FIG. 54 is a diagram showing a conceptual model constructed from the model information shown in FIG. The value given to each side on the conceptual model shown in FIG. 54 is the maximum value of the current data transfer amount per unit time in the communication channel (upper limit value of the transfer amount constraint condition) or the start point of the side The maximum value of the data processing amount per unit time in the processing server corresponding to (the upper limit value of the processing amount constraint condition) is shown.
- the determining unit 303 determines the flow function f based on the model information shown in FIG. 53 so that the job processing time is minimized.
- FIGS. 55A to 55G are diagrams conceptually showing a flow function f determination process and a data flow information determination process by the flow increase method in the maximum flow problem in the third embodiment.
- the determination unit 303 constructs the network model shown in FIG. 55A based on the model information of FIG. In this network model, a start point s is set and an end point t is set.
- the determination unit 303 gives a flow of 50 MB / s to the route (s, db, n2, n1, t).
- the determination unit 303 specifies the residual graph of the network illustrated in FIG. 55C.
- the determining unit 303 identifies the flow increasing path from the residual graph shown in FIG. 55C and gives a flow of 50 MB / s to the path (s, da, n2, n3, t) as shown in FIG. 55D. As a result, the determination unit 303 identifies the network residual graph shown in FIG. 55E.
- the determination unit 303 identifies the flow increasing path from the residual graph shown in FIG. 55E, and gives a flow of 100 MB / s to the path (s, db, n4, n3, t) as shown in FIG. 55F. As a result, the determination unit 303 identifies the network residual graph shown in FIG. 55G.
- the determination unit 303 ends the process. As a result, the obtained combination information of each route and each data flow rate becomes data flow information.
- FIG. 56 is a diagram showing data flow information in the third embodiment.
- the determination unit 303 identifies the data to be processed by each processing server 330 for the duplicated data db. Specifically, the determination unit 303 identifies the data flow information entries (Flow 1 and Flow 3) whose path information includes the data db, and acquires the unit processing amounts (50 MB / s and 100 MB / s) set in the identified entries for the respective processing servers 330 (n1 and n3) included in that path information.
- the unit processing amounts for the processing servers n1 and n3 are 50 MB / s and 100 MB / s.
- the master server 300 determines that the processing server n1 processes the 0th byte to the 2nd gigabyte of the file db1 stored in the data server n2, and that the processing server n3 processes the 2nd to 6th gigabytes of the file db2 stored in the data server n4. This information is included in the decision information corresponding to the processing program.
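The byte ranges in the decision information follow directly from splitting the data db in the ratio of the unit processing amounts (50 : 100) from the data flow information; a one-line check, with the 6-gigabyte size of db inferred from the stated ranges:

```python
# Split of the duplicated 6-GB data db between the processing servers in
# the ratio of their unit processing amounts from the data flow information.
GB = 10**9
total = 6 * GB
rates = {"n1": 50, "n3": 100}                # MB/s, per the data flow information

boundary = total * rates["n1"] // sum(rates.values())
assert boundary == 2 * GB                    # n1: bytes [0, 2 GB) of db1
assert total - boundary == 4 * GB            # n3: bytes [2 GB, 6 GB) of db2
```

Splitting in proportion to the unit processing amounts means both servers finish their share of db at the same time, which is what minimizes the overall processing time.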
- FIG. 57 is a diagram conceptually illustrating data transmission / reception performed in the third embodiment.
- the processing server n1 that has received the decision information acquires data in the processing data storage unit 342 of the data server n2.
- the process execution unit p1 executes a process on the acquired data from the 0th byte to the 2nd gigabyte.
- the processing server n3 acquires data in the processing data storage unit 342 of the data server n2.
- the process execution unit p3 executes processing of the acquired data.
- the processing server n3 acquires data in the processing data storage unit 342 of the data server n4.
- the process execution unit p3 executes a process on the acquired data from the 2nd to 6th gigabytes.
- Example 4 shows a specific example of the above-described fourth embodiment.
- FIG. 58 is a diagram conceptually illustrating the configuration of the distributed system 350 in the fourth embodiment.
- the distributed system 350 according to the fourth embodiment includes a switch sw and servers n1 to n4.
- Servers n1 to n4 are connected to each other via a switch sw.
- the servers n1 to n4 function as the processing server 330 and the data server 340 depending on the situation.
- the servers n1 to n4 have disks D1 to D4 as processing data storage units 342, respectively.
- any one of the servers n1 to n4 functions as the master server 300.
- the servers n1 to n3 have p1 to p3 as available processing execution units 332.
- FIG. 59 is a diagram illustrating information stored in the server state storage unit 3060 according to the fourth embodiment.
- the servers n1 to n3 are available processing servers 330, and the processable amounts of the servers n1 to n3 are 50 MB / s, 50 MB / s, and 100 MB / s, respectively.
- FIG. 60 is a diagram illustrating information stored in the input / output communication path information storage unit 3080 according to the fourth embodiment.
- Each reading speed of the disk D2 of the server n2 and the disk D4 of the server n4 is 100 MB / s.
- the available bandwidth at the time of data transmission from the server n2 to the switch sw and the available bandwidth at the time of data transmission from the server n4 to the switch sw are 100 MB / s.
- the available bandwidth at the time of data transmission from the switch sw to the server n1 the available bandwidth at the time of data transmission from the switch sw to the server n2, and the available bandwidth at the time of data transmission from the switch sw to the server n3 are 100 MB / s.
- the state of the data location storage unit 3070 in the fourth embodiment is the same as that in the first embodiment (see FIG. 37).
- The model generation unit 301 of the master server 300 acquires {n2, n4} as the set of identifiers of the data servers 340 storing the processing target data from the data location storage unit 3070 in FIG. 37, and acquires {n1, n2, n3} as the set of identifiers of the available processing servers 330 from the server state storage unit 3060 in FIG. 59. Based on each of these acquired sets, the information stored in the server state storage unit 3060 in FIG. 59, and the information stored in the input/output communication path information storage unit 3080 in FIG. 60, the model generation unit 301 generates model information.
- FIG. 61 is a diagram illustrating model information generated in the fourth embodiment.
- FIG. 62 is a diagram showing a conceptual model constructed from the model information shown in FIG. 61. The value given to each side on the conceptual model shown in FIG. 62 indicates either the maximum current data transfer amount per unit time on the corresponding communication channel (the upper limit value of the transfer amount constraint condition) or the maximum data processing amount per unit time of the processing server corresponding to the start point of the side (the upper limit value of the processing amount constraint condition).
- the determination unit 303 of the master server 300 determines the flow rate function f based on the model information of FIG. 61 so that the job processing time is minimized.
- FIGS. 63A to 63G are diagrams conceptually showing a flow function f determination process and a data flow information determination process by the flow increase method in the maximum flow problem in the fourth embodiment.
- The determination unit 303 constructs the network model shown in FIG. 63A based on the model information of FIG. 61. In this network model, a start point s and an end point t are set. First, the determination unit 303 gives a flow of 50 MB/s to the path (s, D2, ON2, n2, t) as illustrated in FIG. 63B. As a result, the determination unit 303 identifies the residual graph of the network illustrated in FIG. 63C.
- Next, the determination unit 303 identifies a flow increasing path from the residual graph illustrated in FIG. 63C and, as illustrated in FIG. 63D, gives a flow of 50 MB/s to the path (s, D2, ON2, i2sw, o1sw, n1, t). As a result, the determination unit 303 identifies the residual graph of the network illustrated in FIG. 63E.
- Next, the determination unit 303 identifies a flow increasing path from the residual graph illustrated in FIG. 63E and, as illustrated in FIG. 63F, gives a flow of 50 MB/s to the path (s, D4, ON4, i4sw, o3sw, n3, t). As a result, the determination unit 303 identifies the residual graph of the network shown in FIG. 63G.
- Since no flow increasing path remains, the determination unit 303 ends the process. The obtained combinations of each route and its data flow rate constitute the data flow information.
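The flow increase method illustrated in FIGS. 63A to 63G is the standard augmenting-path approach to the maximum flow problem (Ford-Fulkerson with breadth-first search, i.e. Edmonds-Karp). The sketch below is illustrative, not the patented implementation: the switch is collapsed to a single node, and the source-side capacities (100 MB/s for D2's data, 50 MB/s for D4's data) are assumptions chosen so that the total matches the three 50 MB/s flows found above; the order in which augmenting paths are found may differ from the figures.

```python
from collections import deque

def max_flow(capacities, s, t):
    """Edmonds-Karp: repeatedly find a flow increasing path by BFS
    on the residual graph and push the bottleneck amount along it."""
    residual = {u: dict(nbrs) for u, nbrs in capacities.items()}
    for u, nbrs in capacities.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)  # reverse edges
    total = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:            # no flow increasing path remains
            return total
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:              # update the residual graph
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        total += bottleneck

# Simplified example-4 network; the source-side capacities are assumed.
capacities = {
    's':  {'D2': 100, 'D4': 50},           # assumed per-data limits
    'D2': {'n2': 100, 'sw': 100},          # disk read 100 MB/s; n2 reads locally
    'D4': {'sw': 100},
    'sw': {'n1': 100, 'n2': 100, 'n3': 100},
    'n1': {'t': 50}, 'n2': {'t': 50}, 'n3': {'t': 100},
}
print(max_flow(capacities, 's', 't'))  # -> 150
```

Under these assumed capacities the maximum total flow is 150 MB/s, matching the sum of the three 50 MB/s flows in the figures.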
- FIG. 64 is a diagram showing data flow information in the fourth embodiment.
- The determination unit 303 transmits the processing program to the processing servers n1 to n3 based on the data flow information determined as described above. Furthermore, the determination unit 303 instructs data reception and processing execution by transmitting determination information corresponding to the processing program to the processing servers n1 to n3.
- FIG. 65 is a diagram conceptually illustrating data transmission / reception performed in the fourth embodiment.
- the processing server n1 that has received the decision information acquires data in the processing data storage unit 342 of the data server n2.
- the process execution unit p1 executes the process of the acquired data.
- the processing server n2 acquires data in the processing data storage unit 342 of the data server n2.
- the process execution unit p2 executes the process of the acquired data.
- the processing server n3 acquires data in the processing data storage unit 342 of the data server n4.
- the process execution unit p3 executes the process of the acquired data.
- Example 5 shows a specific example of the fifth embodiment described above.
- the configuration of the distributed system 350 in the fifth embodiment is the same as that in the first embodiment (see FIG. 34).
- the state of the input / output communication path information storage unit 3080 in the fifth embodiment is also the same as that in the first embodiment (see FIG. 36).
- the state of the server state storage unit 3060 in the fifth embodiment is the same as that in the second embodiment (see FIG. 44).
- the state of the data location storage unit 3070 in the fifth embodiment is the same as that in the second embodiment (see FIG. 45).
- FIG. 66 is a diagram illustrating information stored in the job information storage unit 3040 according to the fifth embodiment.
- a job MyJob1 and a job MyJob2 are input as units for executing a program.
- the minimum unit processing amount and the maximum unit processing amount are not set for the jobs MyJob1 and MyJob2.
- the job information storage unit 3040, server status storage unit 3060, input / output communication path information storage unit 3080, and data location storage unit 3070 of the master server 300 are in the states shown in FIGS. 66, 44, 36, and 45.
- request information for requesting execution of the job MyJob1 using the processing target data (MyDataSet1) and the job MyJob2 using the processing target data (MyDataSet2) is transmitted to the master server 300.
- The model generation unit 301 acquires {MyJob1, MyJob2} as the set of jobs currently instructed to be executed from the job information storage unit 3040 in FIG. 66 and, for each job, acquires the name of the data used by the job, the minimum unit processing amount, and the maximum unit processing amount. Further, the model generation unit 301 acquires {n2, n4} as the set of identifiers of the data servers storing the processing target data from the data location storage unit 3070 in FIG. 45, and acquires {n1, n3} as the set of identifiers of the available processing servers 330 from the server state storage unit 3060 in FIG. 44. Further, the model generation unit 301 acquires the processable amount information of the servers n1 and n3 for each job from the server state storage unit 3060 in FIG. 44.
- The model generation unit 301 generates model information based on the acquired sets and the information stored in the input/output communication path information storage unit 3080 in FIG. 36.
- FIG. 67 is a diagram illustrating model information generated in the fifth embodiment.
- In the model information, the second identifiers (n1′ and n3′) of the processing servers n1 and n3 are set in the pointers to the next element in the rows containing the identifiers of the processing servers n1 and n3.
- Names (MyJob1′ and MyJob2′) indicating the end of each job (MyJob1 and MyJob2) are set in the pointers to the next element in the rows containing the second identifiers of the processing servers n1 and n3.
- FIG. 68 is a diagram showing a conceptual model constructed from the model information shown in FIG.
- The value given to each side on the conceptual model shown in FIG. 68 indicates either the maximum current data transfer amount per unit time on the corresponding communication path (the upper limit value of the transfer amount constraint condition) or the maximum data processing amount per unit time of the processing server corresponding to the start point of the side (the upper limit value of the processing amount constraint condition).
- the determination unit 303 of the master server 300 determines the flow rate function f based on the model information of FIG. 67 so that the job processing time is minimized.
- The determination unit 303 constructs the network model shown in FIG. 69A based on the model information of FIG. 67. In this network model, start points s1 and s2 are set, an end point t1 corresponding to the start point s1 (MyJob1) is set, and an end point t2 corresponding to the start point s2 (MyJob2) is set.
- First, the determination unit 303 gives a flow of 50 MB/s to the path (s1, MyJob1, n2, n1, n1′, t1) as shown in FIG. 69B. As a result, the determination unit 303 identifies the residual graph of the network shown in FIG. 69C.
- Next, the determination unit 303 identifies a flow increasing path from the residual graph illustrated in FIG. 69C and, as illustrated in FIG. 69D, additionally gives a flow of 50 MB/s to the path (s2, MyJob2, n2, n3, n3′, t2). As a result, the determination unit 303 identifies the residual graph of the network shown in FIG. 69E.
- Next, the determination unit 303 identifies a flow increasing path from the residual graph shown in FIG. 69E and, as shown in FIG. 69F, additionally gives a flow of 100 MB/s to the path (s2, MyJob2, n4, n3, n3′, t2). As a result, the determination unit 303 identifies the residual graph of the network illustrated in FIG. 69G.
- Since no flow increasing path remains, the determination unit 303 ends the process. The obtained combinations of each route and its data flow rate constitute the data flow information.
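With one start point and one end point per job (s1/t1, s2/t2), a single augmenting-path search can still be run over the combined network by adding a super source and super sink with effectively unbounded capacities. Whether the determination unit 303 uses exactly this reduction is not stated here, so the helper below, including its names, is an assumption:

```python
def add_super_terminals(capacities, sources, sinks, big=10**9):
    """Return a copy of the capacity graph with a single super source 'S'
    feeding every per-job start point, and every per-job end point
    draining into a single super sink 'T' (capacity effectively unbounded)."""
    g = {u: dict(nbrs) for u, nbrs in capacities.items()}
    g['S'] = {s: big for s in sources}
    for t in sinks:
        g.setdefault(t, {})['T'] = big
    return g

# Hypothetical fragment of the example-5 network, for illustration only.
g = add_super_terminals({'s1': {'MyJob1': 100}, 's2': {'MyJob2': 150}},
                        sources=['s1', 's2'], sinks=['t1', 't2'])
print(g['S'])        # -> {'s1': 1000000000, 's2': 1000000000}
print(g['t1']['T'])  # -> 1000000000
```

After this reduction, any single-source maximum flow routine can be applied between 'S' and 'T'.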
- FIG. 70 is a diagram illustrating data flow information in the fifth embodiment.
- the determining unit 303 transmits the processing program to the processing servers n1 and n3 based on the data flow information determined in this way.
- the determination unit 303 instructs the data reception and processing execution by transmitting determination information corresponding to the processing program to the processing servers n1 and n3.
- FIG. 71 is a diagram conceptually illustrating data transmission / reception performed in the fifth embodiment.
- the processing server n1 that has received the decision information acquires data in the processing data storage unit 342 of the data server n2.
- the process execution unit p1 executes the process of the acquired data.
- the processing server n3 acquires data in the processing data storage unit 342 of the data server n2 and data in the processing data storage unit 342 of the data server n4.
- the process execution unit p3 executes a process for each acquired data.
- Example 6 shows a specific example of the above-described sixth embodiment.
- the configuration of the distributed system 350 in the sixth embodiment is the same as that in the first embodiment (see FIG. 34). Further, the state of the input / output communication path information storage unit 3080 in the sixth embodiment is the same as that in the first embodiment (see FIG. 36).
- the state of the job information storage unit 3040 in the sixth embodiment is the same as that in the fifth embodiment (see FIG. 66).
- the state of the data location storage unit 3070 in the sixth embodiment is the same as that in the second embodiment (see FIG. 45).
- FIG. 72 is a diagram illustrating information stored in the server state storage unit 3060 according to the sixth embodiment.
- In the server n1, 1.0 unit of resource that can be used for processing remains, and resources of 0.01 and 0.02 per unit processing amount (1 MB/s) are used for the processing of the jobs MyJob1 and MyJob2, respectively.
- In the server n3, 0.5 units of resource that can be used for processing remain, and resources of 0.002 and 0.004 per unit processing amount (1 MB/s) are used for the processing of the jobs MyJob1 and MyJob2, respectively.
- the job information storage unit 3040, server state storage unit 3060, input / output communication path information storage unit 3080, and data location storage unit 3070 of the master server 300 are in the states shown in FIGS. 66, 72, 36, and 45.
- Assume that the client 360 transmits, to the master server 300, request information requesting execution of the job MyJob1 using the processing target data (MyDataSet1) and the job MyJob2 using the processing target data (MyDataSet2).
- the operation of the distributed system 350 in this situation will be described.
- The model generation unit 301 of the master server 300 acquires {MyJob1, MyJob2} as the set of jobs currently instructed to be executed from the job information storage unit 3040 in FIG. 66 and, for each job, acquires the name of the data used by the job, the minimum unit processing amount, and the maximum unit processing amount. Further, the model generation unit 301 acquires {n2, n4} as the set of identifiers of the data servers storing the processing target data from the data location storage unit 3070 in FIG. 45, and acquires {n1, n3} as the set of identifiers of the available processing servers 330 from the server state storage unit 3060 in FIG. 72. In addition, the model generation unit 301 acquires the remaining resource amount, the processable amount information, and the processing load information regarding the servers n1 and n3 from the server state storage unit 3060 in FIG. 72.
- The model generation unit 301 generates model information based on each set acquired in this way and the information stored in the input/output communication path information storage unit 3080 in FIG. 36.
- FIG. 73 is a diagram illustrating model information generated in the sixth embodiment.
- FIG. 74 is a diagram showing a conceptual model constructed from the model information shown in FIG. The value of each side on the conceptual model shown in FIG. 74 indicates the maximum value of the data transfer amount per unit time (the upper limit value of the transfer amount constraint condition).
- The determination unit 303 of the master server 300 determines the flow rate function f so that the job processing time is minimized, based on the model information of FIG. 73 and on the remaining resource amount, the processable amount information, and the processing load information regarding the servers n1 and n3 acquired from the server state storage unit 3060 of FIG. 72.
- 75A to 75I are diagrams conceptually showing a flow function f determination process and a data flow information determination process by the flow increase method in the maximum flow problem in the sixth embodiment.
- The determination unit 303 constructs the network model shown in FIG. 75A based on the model information shown in FIG. 73. In this network model, start points s1 and s2 are set, an end point t1 corresponding to the start point s1 is set, and an end point t2 corresponding to the start point s2 is set.
- the determination unit 303 gives a flow of 50 MB / s to the route (s1, MyJob1, n2, n1, t1) as shown in FIG. 75B.
- the determination unit 303 specifies the residual graph of the network illustrated in FIG. 75C.
- the resource remaining amount of the processing server n3 remains 0.5.
- Next, the determination unit 303 identifies a flow increasing path from the residual graph illustrated in FIG. 75C and, as illustrated in FIG. 75D, additionally gives a flow of 100 MB/s to the path (s2, MyJob2, n4, n3, t2).
- the determination unit 303 specifies the network residual graph shown in FIG. 75E.
- At this point, the remaining resource amount of the processing server n1 remains 0.5, unchanged.
- Next, the determination unit 303 identifies a flow increasing path from the residual graph shown in FIG. 75E and, as shown in FIG. 75F, additionally gives a flow of 50 MB/s to the path (s1, MyJob1, n2, n3, t1).
- the determination unit 303 specifies the network residual graph shown in FIG. 75G.
- At this point, the remaining resource amount of the processing server n1 remains 0.5, unchanged.
- Since no flow increasing path remains, the determination unit 303 ends the process. The obtained combinations of each route and its data flow rate constitute the data flow information.
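The allocation reached above can be checked against the resource figures of FIG. 72: each flow consumes its (server, job) per-unit load times its rate, and the total per server must not exceed the remaining resources. A verification sketch using the numbers from this example:

```python
# Per-unit resource load for each (server, job) pair, from FIG. 72.
loads = {('n1', 'MyJob1'): 0.01,  ('n1', 'MyJob2'): 0.02,
         ('n3', 'MyJob1'): 0.002, ('n3', 'MyJob2'): 0.004}
remaining = {'n1': 1.0, 'n3': 0.5}             # remaining resources
# Final flows from FIGS. 75B to 75G: (server, job, MB/s).
allocation = [('n1', 'MyJob1', 50), ('n3', 'MyJob2', 100), ('n3', 'MyJob1', 50)]

used = {}
for server, job, rate in allocation:
    used[server] = used.get(server, 0.0) + loads[(server, job)] * rate

print({s: round(v, 6) for s, v in used.items()})  # -> {'n1': 0.5, 'n3': 0.5}
assert all(used[s] <= remaining[s] + 1e-9 for s in used)
```

Both servers end exactly at their limits: n1 uses 0.5 of its 1.0, and n3 uses 0.4 + 0.1 of its 0.5, consistent with the residual amounts noted in the walkthrough.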
- FIG. 76 is a diagram showing data flow information in the sixth embodiment.
- the determining unit 303 transmits the processing program to the servers n1 and n3 based on the data flow information determined as described above. Furthermore, the determination unit 303 instructs the data reception and processing execution by transmitting determination information corresponding to the processing program to the processing servers n1 and n3.
- FIG. 77 is a diagram conceptually illustrating data transmission / reception performed in the sixth embodiment.
- the processing server n1 acquires data in each processing data storage unit 342 of the data servers n2 and n4.
- the process execution unit p1 executes processing on each piece of acquired data.
- the processing server n3 acquires data in the processing data storage units 342 of the data servers n2 and n4, respectively.
- the process execution unit p3 executes processing on each piece of acquired data.
- Appendix 1: A management apparatus comprising: a model generation unit that generates model information capable of constructing a conceptual model, the conceptual model including a plurality of first vertices indicating a plurality of data devices for storing data, a plurality of second vertices indicating a plurality of processing devices for processing the data, a plurality of first sides from each first vertex to each second vertex in each of which a transfer amount constraint condition is set with the transferable amount of data per unit time from each data device to each processing device as an upper limit value, and at least one second side from each second vertex to at least one third vertex subsequent to each second vertex in each of which a processing amount constraint condition is set with the processable amount per unit time as an upper limit value; and a determination unit that, for each path on the conceptual model including the first vertex, the second vertex, the first side, and the second side, determines the flow rate of each side on the conceptual model using the total amount of data processing per unit time executable under the transfer amount constraint condition and the processing amount constraint condition set for the first side and the second side included in the path, selects each path on the conceptual model satisfying the flow rate of each side, and determines a plurality of combinations of the processing device and the data device storing the data processed by the processing device in accordance with each vertex included in each selected path.
- The management apparatus according to Appendix 1, wherein the model generation unit generates, based on information stored in the job information storage unit, the model information capable of constructing the conceptual model further including a vertex indicating a job and a plurality of sides respectively extending from the vertex indicating the job to the first vertices indicating the plurality of data devices storing the plurality of data handled by the job, and wherein the determination unit determines a plurality of combinations of the job, the data device storing at least one data handled by the job, and the processing device processing the at least one data in accordance with each vertex included in each selected path, and generates, based on the information of the plurality of combinations and information stored in the data location storage unit, information indicating the correspondence relationship among the processing device, the identifier of the data processed by the processing device, and the data device storing the data processed by the processing device.
- The management apparatus according to Appendix 1 or 2, further comprising a data location storage unit that stores the identifier of original data multiplexed into a plurality of replicated data and the identifier of each data device storing the replicated data in association with each other, wherein the model generation unit generates the model information capable of constructing the conceptual model further including a vertex indicating the original data and a plurality of sides respectively extending from the vertex indicating the original data to the first vertices indicating the data devices storing the replicated data, and wherein the determination unit determines a plurality of combinations of the original data, the data device storing at least one of the plurality of replicated data, and the processing device processing the at least one of the plurality of replicated data in accordance with each vertex included in each selected path, and generates, based on the information of the plurality of combinations and the information stored in the data location storage unit, information indicating the correspondence relationship among the processing device, the identifier of the data processed by the processing device, and the data device storing the data processed by the processing device.
- The management apparatus according to Appendix 3, wherein the determination unit further includes the data processing amount per unit time of each selected route in the correspondence relationship corresponding to that route, and, when a plurality of correspondence relationships having a common data identifier are generated, determines that each processing device indicated by the plurality of correspondence relationships processes an amount of the original data indicated by the common identifier in a ratio corresponding to the data processing amount per unit time included in its correspondence relationship.
- The management apparatus according to any one of Appendices 1 to 4, wherein the model generation unit generates the model information capable of constructing the conceptual model further including one or more sides, each being a side from the second vertex indicating the processing device to the first vertex indicating the data device, to the vertex indicating data, or to the vertex indicating a job handling a plurality of data, in which the processing amount constraint condition is set.
- The management apparatus according to any one of Appendices 1 to 4, wherein the model generation unit generates the model information capable of constructing the conceptual model further including a start point vertex for each of a plurality of jobs each handling a plurality of data, an end point vertex for each job, a plurality of sides extending from each start point vertex to the vertex indicating the corresponding job, and a plurality of sides extending from the second vertex indicating the processing device to each end point vertex, in each of which the upper limit value of the processing amount constraint condition is set to the processing amount per unit time that the processing device can execute for the corresponding job.
- The management apparatus according to any one of Appendices 1 to 6, wherein the model generation unit generates the model information capable of constructing the conceptual model further including a vertex indicating an intermediate device through which data stored in the data device passes before being received by the processing device, and at least one of: a side from the first vertex indicating the data device to the vertex indicating the intermediate device nearest to the data device, in which the upper limit value of the transfer amount constraint condition is set to the transferable amount per unit time from the data device to that nearest intermediate device; a side from the vertex indicating the intermediate device to a vertex indicating another intermediate device, in which the upper limit value of the transfer amount constraint condition is set to the transferable amount per unit time from the intermediate device to the other intermediate device; and a side in which the upper limit value of the transfer amount constraint condition is set to a transferable amount per unit time.
- The management apparatus according to Appendix 7, wherein the vertex indicating the intermediate device is composed of one or more vertices indicating one or more input units to the intermediate device, one or more vertices indicating one or more output units of the intermediate device, and one or more sides connecting the input units and the output units between which data can be transferred.
- The management apparatus according to any one of Appendices 1 to 9, wherein the model generation unit further sets a lower limit value of the data transfer amount or the data processing amount in the transfer amount constraint condition or the processing amount constraint condition set in at least one side included in the conceptual model.
- The management apparatus according to any one of Appendices 1 to 10, further comprising: a communication path information storage unit that stores input/output communication path information between each data device and each processing device; a data location storage unit that stores an identifier of processing target data and the data device storing the data in association with each other; and a server state storage unit that stores the data processing capacity per unit time of each processing device.
- A distributed system comprising: the management apparatus according to any one of Appendices 1 to 10; the processing device that processes data acquired from the data device in accordance with each combination of the processing device and the data device determined by the determination unit of the management apparatus; and the data device that transmits data to the processing device in accordance with each combination of the processing device and the data device determined by the determination unit of the management apparatus.
- A distributed processing management method, wherein at least one computer: generates model information capable of constructing a conceptual model which includes a plurality of first vertices indicating a plurality of data devices for storing data, a plurality of second vertices indicating a plurality of processing devices for processing the data, a plurality of first sides from each first vertex to each second vertex in each of which a transfer amount constraint condition is set with the transferable amount of data per unit time from each data device to each processing device as an upper limit value, and at least one second side from each second vertex to at least one third vertex subsequent to each second vertex in each of which a processing amount constraint condition is set with the processable amount per unit time as an upper limit value; determines the flow rate of each side on the conceptual model using, for each path on the conceptual model including the first vertex, the second vertex, the first side, and the second side, the transfer amount constraint condition and the processing amount constraint condition set for the first side and the second side included in the path; selects each path on the conceptual model satisfying the flow rate of each side; and determines a plurality of combinations of the processing device and the data device storing the data processed by the processing device in accordance with each vertex included in each selected path.
- In the distributed processing management method, the at least one computer comprises a data location storage unit that associates an identifier of data with an identifier of the data device storing the data, and a job information storage unit that stores information on a plurality of data handled by a job. The generation of the model information generates, based on the information stored in the job information storage unit, the model information capable of constructing the conceptual model further including a vertex indicating the job and a plurality of sides respectively extending from the vertex indicating the job to the first vertices indicating the plurality of data devices storing the plurality of data handled by the job. The determination of the plurality of combinations determines a plurality of combinations of the job, the data device storing at least one data handled by the job, and the processing device processing the at least one data in accordance with each vertex included in each selected path. The at least one computer further generates, based on the information of the plurality of combinations and the information stored in the data location storage unit, information indicating the correspondence relationship among the processing device, the identifier of the data processed by the processing device, and the data device storing the data processed by the processing device.
- In the distributed processing management method, the at least one computer includes a data location storage unit that stores an identifier of original data multiplexed into a plurality of replicated data and an identifier of each data device storing the replicated data in association with each other. The generation of the model information generates the model information capable of constructing the conceptual model further including a vertex indicating the original data and a plurality of sides extending from the vertex indicating the original data to the first vertices indicating the data devices storing the replicated data. The determination of the plurality of combinations determines a plurality of combinations of the original data, the data device storing at least one of the plurality of replicated data, and the processing device processing the at least one of the plurality of replicated data. The at least one computer further generates, based on the information of the plurality of combinations and the information stored in the data location storage unit, information indicating the correspondence relationship among the processing device, the identifier of the data processed by the processing device, and the data device storing the data processed by the processing device.
- The distributed processing management method according to Appendix 14, wherein the data processing amount per unit time of each selected route is further included in the correspondence relationship corresponding to that route, and, when a plurality of correspondence relationships having a common data identifier are generated, each processing device indicated by the plurality of correspondence relationships is determined to process an amount of the original data indicated by the common identifier in a ratio corresponding to the data processing amount per unit time included in its correspondence relationship.
- The distributed processing management method according to any one of Appendices 12 to 15, wherein the generation of the model information generates the model information capable of constructing the conceptual model further including a start point vertex for each of the jobs each handling a plurality of data, an end point vertex for each job, a plurality of sides extending from each start point vertex to the vertex indicating the corresponding job, and a plurality of sides extending from the second vertex indicating the processing device to each end point vertex, in each of which the upper limit value of the processing amount constraint condition is set to the processing amount per unit time that the processing device can execute for the corresponding job.
- The distributed processing management method according to any one of Appendices 12 to 16, wherein the generation of the model information generates the model information capable of constructing the conceptual model further including a vertex indicating an intermediate device through which data stored in the data device passes before being received by the processing device, and at least one of: a side from the first vertex indicating the data device to the vertex indicating the intermediate device nearest to the data device, in which the upper limit value of the transfer amount constraint condition is set to the transferable amount per unit time from the data device to that nearest intermediate device; a side from the vertex indicating the intermediate device to a vertex indicating another intermediate device, in which the upper limit value of the transfer amount constraint condition is set to the transferable amount per unit time from the intermediate device to the other intermediate device; and a side from the vertex indicating the intermediate device nearest to the processing device to the second vertex indicating the processing device, in which the upper limit value of the transfer amount constraint condition is set to the transferable amount per unit time from that nearest intermediate device to the processing device.
- In the generation of the model information, the vertex indicating the intermediate device is composed of one or more vertices indicating one or more input units to the intermediate device, one or more vertices indicating one or more output units of the intermediate device, and one or more sides connecting the input units and the output units between which data can be transferred.
- Appendix 19 A program for causing at least one computer to execute the management method according to any one of appendices 12 to 18.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2012082665 | 2012-03-30 | ||
| JP2012-082665 | 2012-03-30 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2013145512A1 (fr) | 2013-10-03 |
Family
ID=49258840
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2013/000305 WO2013145512A1 (fr) (ceased) | 2012-03-30 | 2013-01-23 | Management device and distributed processing management method |
Country Status (2)
| Country | Link |
|---|---|
| JP (1) | JPWO2013145512A1 (fr) |
| WO (1) | WO2013145512A1 (fr) |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2004086246A1 (fr) * | 2003-03-24 | 2004-10-07 | Fujitsu Limited | Decentralized processing control device, decentralized processing control method, and decentralized processing control program |
2013
- 2013-01-23 JP JP2014507351A patent/JPWO2013145512A1/ja active Pending
- 2013-01-23 WO PCT/JP2013/000305 patent/WO2013145512A1/fr not_active Ceased
Cited By (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2018154799A1 (fr) * | 2017-02-24 | 2018-08-30 | 株式会社レクサー・リサーチ | Operational plan optimization device and method |
| JP2018139041A (ja) | 2017-02-24 | 2018-09-06 | 株式会社レクサー・リサーチ | Operational plan optimization device and operational plan optimization method |
| CN110337659A (zh) | 2017-02-24 | 2019-10-15 | 雷克萨研究有限公司 | Operational plan optimization device and operational plan optimization method |
| JP2022008955A (ja) | 2017-02-24 | 2022-01-14 | 株式会社レクサー・リサーチ | Operational plan optimization device and operational plan optimization method |
| US11314238B2 (en) | 2017-02-24 | 2022-04-26 | Lexer Research Inc. | Plant operational plan optimization discrete event simulator device and method |
| JP7244128B2 (ja) | 2017-02-24 | 2023-03-22 | 株式会社レクサー・リサーチ | Operational plan optimization device and operational plan optimization method |
| CN109254531A (zh) * | 2017-11-29 | 2019-01-22 | 辽宁石油化工大学 | Optimal cost control method for multi-stage batch processes with time delays and disturbances |
| CN109254531B (zh) | 2017-11-29 | 2021-10-22 | 辽宁石油化工大学 | Optimal cost control method for multi-stage batch processes with time delays and disturbances |
Also Published As
| Publication number | Publication date |
|---|---|
| JPWO2013145512A1 (ja) | 2015-12-10 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| JP5850054B2 (ja) | Distributed processing management server, distributed system, and distributed processing management method | |
| US20240256571A1 (en) | Resource management systems and methods | |
| JP5935889B2 (ja) | Data processing method, information processing apparatus, and program | |
| US9740706B2 (en) | Management of intermediate data spills during the shuffle phase of a map-reduce job | |
| JP6162194B2 (ja) | Chassis controller for converting universal flows | |
| US10545914B2 (en) | Distributed object storage | |
| JP4740897B2 (ja) | Virtual network configuration method and network system | |
| Nathan et al. | Comicon: A co-operative management system for docker container images | |
| US9092266B2 (en) | Scalable scheduling for distributed data processing | |
| US10127275B2 (en) | Mapping query operations in database systems to hardware based query accelerators | |
| US20150277955A1 (en) | System and method for controlling virtual-machine migrations based on processor usage rates and traffic amounts | |
| CN102947796A (zh) | Method and apparatus for moving virtual resources in a data center environment | |
| JP2005056077A (ja) | Database control method | |
| CN102165448A (zh) | Storage layer for a database server system | |
| US10990433B2 (en) | Efficient distributed arrangement of virtual machines on plural host machines | |
| WO2013145512A1 (fr) | Management device and distributed processing management method | |
| CN105760391B (zh) | Method, data node, name node, and system for dynamic data redistribution | |
| US12086125B2 (en) | Multiple volume placement based on resource usage and scoring functions | |
| US9146694B2 (en) | Distribution processing unit of shared storage | |
| KR100983479B1 (ko) | Method, system, and computer-readable recording medium for providing a distributed programming environment using a distributed space | |
| CN112714903B (zh) | Scalable cell-based packet processing service using client-provided decision metadata | |
| WO2016174739A1 (fr) | Multi-computer system, management computer, and data coupling management method | |
| ELomari et al. | New data placement strategy in the HADOOP framework | |
| JP5031538B2 (ja) | Data distribution method, data distribution program, and parallel database system | |
| US20230105531A1 (en) | Executable Objects in a Distributed Storage System |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 13769218 Country of ref document: EP Kind code of ref document: A1 |
|
| ENP | Entry into the national phase |
Ref document number: 2014507351 Country of ref document: JP Kind code of ref document: A |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| 122 | Ep: pct application non-entry in european phase |
Ref document number: 13769218 Country of ref document: EP Kind code of ref document: A1 |