WO2015071975A1 - Application and data arrangement management method, application and data arrangement management system, and storage medium - Google Patents
- Publication number
- WO2015071975A1 PCT/JP2013/080672
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- condition
- application
- data
- user
- slo
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3442—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for planning or managing the needed capacity
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3452—Performance evaluation by statistical analysis
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/50—Network service management, e.g. ensuring proper service fulfilment according to agreements
Definitions
- the present invention relates to a technique for managing the arrangement of applications and data.
- Cloud computing that provides information technology as a service is widespread.
- Many cloud computing service providers (hereinafter referred to as cloud providers) that provide various cloud computing services have emerged, and cloud providers have built data centers to provide cloud computing services over a wide area on a global scale.
- Non-Patent Document 1 discloses a technique for automatically arranging applications and application data in consideration of position information of end users who use application services.
- In Non-Patent Document 1, various kinds of constraint information are input: specifically, position information of the end user or of the terminal used by the end user, position information of the data centers, characteristic information of the applications and application data, service level objective (SLO: Service Level Objective) information set for each service and transaction configured by the applications and application data, data center characteristic information, data center cost information, and the like. The applications and application data must then be arranged in the distributed data centers so as to satisfy all of the input constraint information. This arrangement planning is an NP-hard (Non-deterministic Polynomial-time hard) problem and requires a huge amount of calculation.
- application indicates an application binary that provides a part of the service to the end user.
- Application data indicates data accessed by the application binary.
- KPI Key Performance Indicator
- An SLO is a conditional statement related to the performance of the computer system, such as “response time of 1 second or less”.
- The items targeted by KPIs and SLOs, such as “contract rate” and “response time”, are referred to as KPI items and SLO items, respectively, and the values (thresholds) that KPIs and SLOs should satisfy, such as “20% or more” and “1 second or less”, are referred to as the KPI standard and the SLO standard, respectively.
- the contract rate indicates, for example, the ratio of users who have purchased a product among users who have visited the WEB site.
- The biggest factor behind the enormous amount of calculation is that SLOs are, at best, set only for each transaction, so the number of combinations of patterns for arranging applications and application data so as to satisfy the SLOs is enormous. Specifically, if the number of data centers is 100 and the number of applications and application data is 1000, the number of calculation patterns is 100^1000 (100 to the 1000th power).
- the present invention has been made in view of the above problems, and an object of the present invention is to reduce the amount of calculation required for an application or application data arrangement plan that satisfies SLO.
- the present invention is an application and data arrangement management method in which a management computer having a processor and a memory and connected to a data center and a user terminal manages an application and data arrangement that can be divided into one or more data centers.
- An agent deployed in the data center monitors the data center, and an agent deployed in the user terminal monitors the user terminal, and the agent of the data center and the user terminal
- the third step of collecting from the agent of the terminal and the management computer have a correlation between the user trace information and the first condition, with a predetermined service level target of the application and data as a second condition
- According to the present invention, it is possible to greatly reduce the amount of calculation required for an application and data arrangement plan that satisfies the service level objective. For example, if the number of data centers is 100 and the number of applications and application data is 1000, the number of patterns to be calculated is about 100 × 1000.
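To make the scale of this reduction concrete, the two pattern counts quoted in the text can be compared directly. The following is a minimal sketch; the function names are hypothetical and only restate the arithmetic given above (100 data centers, 1000 applications and application data).

```python
def brute_force_patterns(num_data_centers: int, num_units: int) -> int:
    """Every unit may be placed in any data center independently."""
    return num_data_centers ** num_units


def reduced_patterns(num_data_centers: int, num_units: int) -> int:
    """Each unit is evaluated against each data center once."""
    return num_data_centers * num_units


full = brute_force_patterns(100, 1000)
reduced = reduced_patterns(100, 1000)

# The brute-force count is too large to read, so report its digit count.
print("brute force digits:", len(str(full)))
print("reduced patterns:", reduced)
```

The brute-force count is a number with thousands of decimal digits, while the reduced count is an ordinary six-digit integer.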
- FIG. 1A to 1C are block diagrams of a computer system according to an embodiment of the present invention.
- FIG. 1A is a block diagram illustrating an example of functions of a data center and a user terminal.
- FIG. 1B is a block diagram illustrating an example of a management computer that manages cloud computing.
- FIG. 1C is a block diagram illustrating an example of the configuration of the data center.
- The computer system of the present invention includes a plurality of user terminals 1-1 to 1-n, a plurality of data centers 2-1 to 2-n, and a management computer 100 that manages the data centers 2-1 to 2-n, which are connected to one another.
- the user terminal 1-1 to 1-n is generically referred to as a user terminal 1
- the data center 2-1 to 2-n is generically referred to as a data center 2 (hereinafter DC2).
- User terminal 1 uses services provided on cloud computing.
- a plurality of computing nodes are installed in DC2, and cloud computing is operated.
- DC2 here indicates a data center, but it may also indicate a rack in which a plurality of computing nodes are combined, or a cluster constituted by a plurality of racks.
- the user application 10 of the user terminal 1 is software that processes a part of the service provided by the service provider (or DC2), and is downloaded to the user terminal 1.
- the user application 10 is a service GUI that is executed on, for example, a browser application installed in the user terminal 1 and receives user operations.
- FIG. 1C is a block diagram illustrating an example of DC2.
- DC2-1 will be described, but DC2-2 to 2-n have the same configuration.
- The DC 2-1 includes a plurality of computing nodes (#1 to #n) 200-1 to 200-n, a storage apparatus 250, a gateway apparatus 270, and a network 260 that interconnects them.
- a generic name of the nodes 200-1 to 200-n is represented by a node 200.
- Each node 200 is connected to the user terminal 1 and the management computer 100 from the external network 50 via the internal network 260 and the gateway device 270.
- The node (#1) 200-1 includes a physical computer 201-1, a virtualization unit 202-1 that assigns the computer resources of the physical computer 201-1 to one or more virtual computers 210-1 to 210-n, and applications 14-1 to 14-n running on the OSs 211-1 to 211-n of the respective virtual computers 210-1 to 210-n.
- the OS 211-1 to 211 -n is generically referred to as OS 211
- the applications 14-1 to 14 -n are generically referred to as application 14.
- the physical computer 201-1 includes a processor 2011 and a memory 2012.
- the storage apparatus 250 stores the OS 211, the application 14, or data 140 used by the application 14.
- the DC 2-1 generates one or more virtual machines 210 according to a command from the management computer 100, executes the OS 211 and the application 14, and provides the service of the application 14 to the user terminal 1.
- the hardware and software shown in FIG. 1C are DC2 resources, and the management computer 100 manages the resources for each DC2.
- The DC 2-1 includes an application 14 that provides a service to the user terminal 1, a user trace acquisition unit 11 that acquires user trace information such as a history of use of the application 14 by the user terminal 1, a user trace storage unit 12 that stores the user trace information acquired by the user trace acquisition unit 11, a resource information acquisition unit 15 that acquires resource information of the DC 2-1, a resource information storage unit 16 that stores the resource information, and a transmission / reception unit 131 that accepts requests from the management computer 100 and transmits user trace information or resource information.
- the user trace acquisition unit 11 and the resource information acquisition unit 15 can be executed by a predetermined virtual computer 210 or physical computer 201, and the user trace storage unit 12 and the resource information storage unit 16 can be set in the storage apparatus 250. .
- the user trace acquisition unit 11 and the resource information acquisition unit 15 function as an agent of the management computer 100 in each DC2.
- the user terminal 1 provides an application 10 that executes a predetermined process such as a process of connecting to the application 14 of the DC 2 and receiving a user input, and a transmission / reception unit 13 that communicates with the outside.
- the user terminal 1 includes a user trace acquisition unit 11A that acquires performance information such as response time viewed from the user terminal, and a user trace storage unit 12A that stores user trace information acquired by the user trace acquisition unit 11A.
- The user terminal 1 includes a processor and a memory (not shown). Further, the user trace acquisition unit 11A and the user trace storage unit 12A function as an agent of the management computer 100 in each user terminal 1.
- The user trace acquisition unit 11 is implemented at least in the DC 2 and, if possible, is also implemented in the user terminal 1 as the user trace acquisition unit 11A. The actual service usage history (service action log) of the user using the user terminal 1 is acquired and stored in the user trace storage unit 12A.
- The user trace information is information that an agent in each DC 2 or each user terminal 1 gathers for each service transaction (or workload) by monitoring the communication and processing related to the requests and responses exchanged among the user of the user terminal 1, all applications 14, and the data 140, and by acquiring the related logs and performance information. Therefore, a set of user trace information related to one transaction (hereinafter referred to as a transaction trace) is composed of one or more pieces of user trace information. The user trace information collected by each agent is aggregated in the management computer 100.
- The user trace information can be acquired using a known or well-known method. For example, “Dapper, a Large-Scale Distributed Systems Tracing Infrastructure” (Sigelman, B. H., Barroso, L. A., Burrows, M., ... & Shanbhag, C., 2010, Google Research) may be applied.
- the information acquired by the user trace information includes information on the tables T1 and T2 shown in FIGS. 2A and 2B.
- FIG. 2A is a diagram illustrating an example of a table T1 that stores information in units of user traces.
- FIG. 2B is a diagram illustrating an example of a table T2 that stores information in transaction units.
- a table T1 includes a Trace ID 301 for storing an identifier of user trace information, a Parent ID 302 for storing an identifier having a parent relationship between user traces, and a Sibling ID 303 for storing an identifier having a sibling relationship between user traces.
- the Destination Location 312, the Response Time 313 that stores the response time for the trace, the Throughput 314 that stores the throughput of the transaction, and the TAT 315 that stores the turnaround time of the transaction constitute one entry.
- The location information (309, 312) may be expressed in latitude and longitude as shown in the table T1, or network location information such as an IP address may be held and converted by a GeoIP database service or the like (for example, Quova IP Geo-Location Database, <http://www.quova.com>).
- Part of the service is composed of application (APP in the figure) A and application B, and it is recorded that the service is established by two connections: from the user terminal 1 used by the user to application A, and from application A to application B.
- The SLO (Service Level Objective) here covers the performance of DC2 and network 50, such as response time, throughput, and TAT, as described above, and excludes security, availability, reliability, and the like.
- The table T2 constitutes one entry from a Transaction ID 321 storing an identifier for each transaction, a Parent ID 322 storing an identifier of a transaction having a parent relationship between transactions, a Sibling ID 323 storing an identifier of a transaction having a sibling relationship between transactions, a Trace Set 324 storing a set of Trace IDs included in the transaction, a Location 325 storing location information where the transaction has occurred, Conversion 1 (326) and Conversion 2 (327) storing the results of conversion 1 and conversion 2, respectively, and Sales 328 storing the sales amount.
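As an illustration of the entry layout described for table T2, one entry could be modeled as a small record type. This is only a sketch: the field names, the Python types, and the sample values are assumptions made for illustration and are not taken from the figures.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TransactionEntry:
    """One entry of a T2-style table (hypothetical field names and types)."""
    transaction_id: int                 # Transaction ID 321
    parent_id: Optional[int]            # Parent ID 322, None if no parent
    sibling_id: Optional[int]           # Sibling ID 323, None if no sibling
    trace_set: List[int]                # Trace Set 324: Trace IDs in the transaction
    location: Tuple[float, float]       # Location 325, assumed (latitude, longitude)
    conversion1: bool                   # Conversion 1 (326): KPI achieved or not
    conversion2: bool                   # Conversion 2 (327): KPI achieved or not
    sales: float                        # Sales 328: sales amount


# Sample values are invented for illustration only.
entry = TransactionEntry(
    transaction_id=100,
    parent_id=None,
    sibling_id=None,
    trace_set=[1, 2],
    location=(35.68, 139.76),
    conversion1=True,
    conversion2=False,
    sales=1200.0,
)
print(entry)
```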
- KPI is set for each transaction. Even if the KPI is set for each service, the same KPI may be set for all transactions belonging to the service.
- the location 325 is used when user trace information is divided and collected for each predetermined area and analyzed for each area.
- the KPI may be set for each area where the service is deployed.
- the user trace information is managed for each predetermined area, and the SLO is set for each area as described later.
- The above Conversion 1/2 (326, 327) indicates the achievement result of the KPI. Specifically, it indicates whether or not a KPI item such as “Has the user registered?”, “Is the service used at least twice a month?”, “Is a social activity performed at least once a month?”, or “Has the target billing amount been achieved?” has been achieved in the user trace information.
- When the KPI is achieved, “True” is stored in Conversion 1/2 (326, 327); otherwise, “False” is stored.
- The number of KPI items, that is, the items targeted by the KPI (in other words, the number of conversion items), can be changed according to the set KPI.
- The KPI covers items related to business income, such as sales and profits, and the user satisfaction related to them, and does not cover expenditure such as costs and expenses.
- However, indicators of “income per expenditure”, such as sales against expenses, or sales and profits against cost of sales (cost rate and profit ratio), are subject to the KPI.
- the application 14 executed by the virtual computer 210 of DC2 processes a part of the service provided by the service provider.
- For example, in the case of a Web service, it is an application that transmits a file constituting the Web service in response to a request from the user terminal 1, and in the case of an electronic commerce (EC) service, it is an application that searches for product data.
- the resource information acquisition unit 15 collects information regarding the operating status of the computing nodes in the DC 2 and stores the information in the resource information storage unit 16.
- the resource information indicates, for example, a processor usage rate, storage capacity, network usage amount, application log, and the like.
- acquisition of resource information can apply a well-known or well-known method. For example, “Nagios-The Industry Standard in IT Infrastructure Monitoring” (http://www.nagios.org/) may be applied.
- the user trace information stored in the user trace storage unit 12 and the resource information stored in the resource information storage unit 16 are transmitted to the transmission / reception unit 20 of the SLO management unit 3 of the management computer 100 by the transmission / reception unit 13.
- the transmission timing may be periodic, or may be event-driven triggered by acquisition of user trace information and resource information.
- the management computer 100 includes a processor 110, a memory 120, a storage 130, and a management console 5, and executes an SLO management unit 3, an arrangement plan unit 41, and an arrangement execution unit 42.
- the management console 5 includes an input device and an output device, and receives input from the service provider 4 or a system administrator.
- the SLO management unit 3 of the management computer 100 manages the arrangement of the application 14 and data 140 to the distributed DCs 2-1 to 2-n.
- The SLO management unit 3 is not limited to being implemented on the management computer 100 as in the present embodiment; it may be implemented on the physical computer 201 or the virtual computer 210 in the DC 2. Note that the SLO management unit 3 may be arranged in a distributed manner across the DCs 2, or may be centrally arranged in one DC 2.
- the user trace accumulation unit 21 accumulates user trace information transmitted from the DC 2 and the user terminal 1 by the transmission / reception unit 20.
- the resource information accumulation unit 40 receives and accumulates the resource information transmitted from the DC 2 by the transmission / reception unit 20.
- When the SLO management unit 3 is distributed across the DCs 2, the user trace accumulation unit 21 and the resource information accumulation unit 40 accumulate, respectively, only the user trace information transmitted from the DCs 2-1 to 2-n and the user terminals 1 that each SLO management unit 3 is in charge of, and only the resource information transmitted from the DCs 2-1 to 2-n that each SLO management unit 3 is in charge of.
- When the SLO management unit 3 is centrally arranged in one DC 2, the user trace accumulation unit 21 and the resource information accumulation unit 40 accumulate, respectively, the user trace information transmitted from all DCs 2 and all user terminals 1, and the resource information transmitted from all DCs 2.
- the important SLO selection unit 22 extracts the KPI actual measurement value and the SLO actual measurement value for each transaction from the user trace information acquired from the user trace accumulation unit 21, and selects an SLO item highly related to the KPI. Detailed processing of the SLO management unit 3 including the important SLO selection unit 22 will be described later with reference to FIG.
- the SLO standard derivation unit 26 uses the KPI and SLO stored in the KPI storage unit 24 and the SLO storage unit 25 for each SLO item highly relevant to the KPI selected by the important SLO selection unit 22, respectively.
- the “application unit” and the “data unit” indicate units that can be arranged by dividing the application 14 and the data 140 into one or more data centers 2 corresponding to the distributed DCs 2-1 to 2-n. .
- the KPI storage unit 24 stores the KPI designated by the service provider 4 from the management console 5 via the setting input unit 23.
- FIG. 5A is a diagram illustrating an example of the KPI storage unit 24.
- The table T6 constitutes one entry from Item 361 and Value 362.
- the table T6 includes KPIs having a click through rate (Click Through Rate) of 15% or more, a sales (Sales Performance) of 100 million yen or more, and a profit rate (Profit Rate) of 20% or more.
- FIG. 6A is a screen image for setting a KPI provided by the management computer 100.
- the setting input unit 23 of the SLO management unit 3 outputs the screen G1 to the management console 5.
- the management console 5 receives the KPI setting from the service provider 4.
- a target transaction G11, a KPI item G12, and a KPI standard G13 can be set.
- the KPI item G12 and the KPI standard G13 can be selected from values set in advance by a pull-down menu. Further, a new target transaction can be added by operating the “ADD” button.
- the set KPI information can be stored in the KPI storage unit 24 by operating the “Save” button.
- the SLO storage unit 25 stores the SLO as an initial value designated by the service provider 4 from the management console 5 via the setting input unit 23.
- An example of the SLO stored in the SLO storage unit 25 is shown in a table T7 in FIG. 5B.
- FIG. 5B is a diagram illustrating an example of the SLO storage unit 25.
- the table T7 constitutes one entry from Item 371 and Value 372.
- Table T7 includes SLOs with a response time of 500 ms or less, a throughput of 3 Mbps or more, and a TAT of 1 second or less.
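A table of Item/Value pairs such as T7 can be read as a set of conditions to test measured values against. The sketch below assumes a representation of each standard as a comparison operator plus a threshold; the dictionary layout, the function name `satisfies`, the units, and the measured values are illustrative assumptions, not part of the described system.

```python
import operator

# Comparison operators allowed in a standard (criterion).
OPS = {"<=": operator.le, ">=": operator.ge}

# SLO standards mirroring the example of table T7: item -> (comparison, threshold).
SLO_STANDARDS = {
    "Response Time": ("<=", 500.0),  # milliseconds
    "Throughput": (">=", 3.0),       # Mbps
    "TAT": ("<=", 1.0),              # seconds
}


def satisfies(standards, measured):
    """Return item -> True/False for every item that has a measured value."""
    return {
        item: OPS[op](measured[item], threshold)
        for item, (op, threshold) in standards.items()
        if item in measured
    }


# Hypothetical measured values for one transaction.
result = satisfies(
    SLO_STANDARDS,
    {"Response Time": 420.0, "Throughput": 2.5, "TAT": 0.8},
)
print(result)
```

The same representation would also fit the KPI example of table T6, since each KPI there is likewise an item with an "or more" threshold.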
- the SLO output unit 27 outputs the transaction unit or application and data unit SLO standard derived by the SLO standard deriving unit 26 to the management console 5 or the like.
- a known or publicly known technique may be applied to the output method and output format.
- the arrangement planning unit 41 plans the arrangement location of the application 14 and the data 140 on the DC 2 by using the SLO standard derived for each application and data by the SLO standard deriving unit 26 and the resource information of the resource information accumulation unit 40. .
- the arrangement execution unit 42 executes distribution of the application 14 and the data 140 to the DC 2 based on the arrangement plan output from the arrangement planning unit 41.
- the processor 110 operates as a functional unit that provides a predetermined function by performing processing according to a program of each functional unit.
- the processor 110 functions as the SLO management unit 3 by performing processing according to the SLO management program.
- the processor 110 also operates as a function unit that provides the functions of a plurality of processes executed by each program.
- a computer and a computer system are an apparatus and a system including these functional units.
- Information such as programs and tables for realizing each function of the SLO management unit 3 can be stored in the storage 130, in a storage device such as a nonvolatile semiconductor memory, a hard disk drive, or an SSD (Solid State Drive), or in a computer-readable non-transitory data storage medium such as an IC card, an SD card, or a DVD.
- FIG. 3 is a flowchart illustrating an example of processing performed in the SLO management unit 3.
- the user trace acquisition unit 11 deployed in the user terminal 1 and the DC 2 collects user trace information, respectively (F1).
- the SLO management unit 3 of the management computer 100 acquires user trace information from the DC2 and the user trace storage units 12 and 12A of the user terminal 1 via the transmission / reception unit 20, and stores them in the user trace accumulation unit 21 (F2). .
- the user trace accumulating unit 21 aggregates the user trace information collected by the user trace acquisition units 11 and 11A as agents of each DC 2 and user terminal 1 in the management computer 100.
- The user trace accumulating unit 21 need not store the raw data of the user trace information shown in the table T1 and the table T2 as they are; instead, it may store the user trace information of a plurality of users collected together, as in the tables T3 to T5 shown in FIGS. 2C to 2E.
- FIG. 2C is a diagram illustrating an example of a table T3 that stores user traces of a plurality of users for a certain period in a transaction unit and stores them as transaction traces for KPI items.
- FIG. 2D is a diagram illustrating an example of a table T4 that stores user traces of a plurality of users for a certain period as transaction traces in a transaction unit with respect to the SLO item.
- FIG. 2E is a diagram illustrating an example of a table T5 in which user trace information of a plurality of users for a certain period is collected in units of one trace regarding the SLO item.
- The table T3 in FIG. 2C constitutes one entry from a Transaction ID 331 storing a transaction identifier, a Period 332 storing the accumulation period, a Trace Set 333 storing the set of trace IDs included in the transaction, a Location 334 storing position information where the transaction has occurred, a Conversion 1 Rate (335) and a Conversion 2 Rate (336) storing the ratios of Conversion 1 and Conversion 2, respectively, and Sales 337 storing the sales amount.
- The table T4 in FIG. 2D constitutes one entry from a Transaction ID 341 storing a transaction identifier, a Location 342 storing position information where the transaction has occurred, a Response Time 343 storing the response time for the trace, a Throughput 344 storing the throughput of the transaction, and a TAT 345 storing the turnaround time of the transaction.
- The table T5 of FIG. 2E constitutes one entry from a Trace ID 351 storing the identifier of user trace information, a Transaction ID 352 storing a transaction identifier, a Period 353 storing the accumulation period, a Response Time 354 storing the response time for the trace, a Throughput 355 storing the throughput of the transaction, and a TAT 356 storing the turnaround time of the transaction.
- a period 332 indicating an accumulation period is added to the table T3 with respect to the table T2.
- KPI items relating to rates are itemized, such as Conversion Rate 1/2 (335, 336).
- the values to be collected are in accordance with the KPI items and criteria. When the average is adopted in KPI, these items may be averaged. Thereafter, the KPI item information in the table T3 is used as the KPI actual measurement value.
- there is an item indicating the total sum of user trace information to be accumulated such as a Sales item 337.
- the table T4 similarly does not limit the values collected for the SLO items, such as the average, mode, and median. Thereafter, the SLO item information in the table T4 is used as the actual SLO value.
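The aggregation described above (per-transaction results collapsed into rates, totals, and a representative SLO value) can be sketched as follows. The record layout, key names, and sample values are assumptions for illustration; the choice of reducer (average, median, and so on) is left open, as in the text.

```python
from statistics import mean, median


def aggregate_kpi(transactions):
    """Collapse per-transaction KPI results (T2 style) into rates and a total (T3 style)."""
    n = len(transactions)
    if n == 0:
        raise ValueError("no transactions to aggregate")
    return {
        # True counts as 1 and False as 0, so the sum is the number of achievements.
        "conversion1_rate": sum(t["conversion1"] for t in transactions) / n,
        "conversion2_rate": sum(t["conversion2"] for t in transactions) / n,
        "sales": sum(t["sales"] for t in transactions),
    }


def aggregate_slo(traces, reducer=mean):
    """Collapse per-trace SLO measurements into one value per SLO item (T4 style)."""
    items = ("response_time_ms", "throughput_mbps", "tat_ms")
    return {item: reducer([t[item] for t in traces]) for item in items}


# Hypothetical records for a single accumulation period.
transactions = [
    {"conversion1": True, "conversion2": False, "sales": 100},
    {"conversion1": True, "conversion2": True, "sales": 200},
    {"conversion1": False, "conversion2": False, "sales": 0},
    {"conversion1": False, "conversion2": False, "sales": 0},
]
traces = [
    {"response_time_ms": 200, "throughput_mbps": 4, "tat_ms": 800},
    {"response_time_ms": 400, "throughput_mbps": 2, "tat_ms": 1200},
    {"response_time_ms": 900, "throughput_mbps": 3, "tat_ms": 1600},
]

kpi_actual = aggregate_kpi(transactions)
slo_mean = aggregate_slo(traces)
slo_median = aggregate_slo(traces, reducer=median)
print(kpi_actual, slo_mean, slo_median)
```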
- The important SLO selection unit 22 of the SLO management unit 3 selects the SLO items that are important for the KPI item (F5).
- the important SLO item refers to an SLO item having a high correlation with the KPI item or having a high causal relationship.
- a known or well-known method can be used as a method for deriving an important SLO item among SLO items having a high correlation (or causal relationship) with a KPI item.
- the SLO management unit 3 may perform multiple regression analysis using the KPI actual value acquired from the table T3 as an objective variable and the SLO actual value acquired from the table T4 in FIG. 2D as an explanatory variable.
- explanatory variables it is desirable to use a plurality of measured SLO values with different types of SLO items.
- The KPI actual values and SLO actual values in the table T3 in FIG. 2C and the table T4 in FIG. 2D need not be normalized.
- the t value can be treated as an index indicating the height (or importance) of the correlation between the KPI item and the SLO item.
- K is a matrix whose elements k1, k2, and so on are KPI measured values.
- For example, when the KPI item is Conversion 1 Rate 335, the values of Conversion 1 Rate 335 for the entries of table T3 whose Transaction ID 331 is 100 and 101 in the figure are set in k1 and k2, respectively.
- Si is a matrix whose elements si1, si2, and so on are the SLO actual values of SLO item i; values from columns such as Response Time 343 or Throughput 344 of table T4 are entered in si1, si2, and so on.
- S is a matrix having Si as its elements.
- ⁇ is a partial regression coefficient matrix
- ⁇ is an intercept matrix.
- T indicates that the matrix is a transposed matrix.
- SLO items that satisfy a certain standard, or a certain number of top-ranked SLO items, are set as important SLO items.
- A predetermined value can be used for the above-mentioned “certain standard” and “number of top-ranked items to extract”, or a threshold value for the t value calculated by the above-described t-test can be used.
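The selection of important SLO items by multiple regression and t values can be sketched in plain Python. This is a minimal illustration under stated assumptions: ordinary least squares with an intercept, a hypothetical threshold of 2.0 on the absolute t value, and invented sample data. The function names are not from the source, and the source does not prescribe this particular solver.

```python
import math


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: explanatory variables are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def t_values(slo_columns, kpi):
    """Regress KPI actual values on SLO actual values; return the t value per SLO item."""
    names = list(slo_columns)
    n, p = len(kpi), len(names) + 1          # p counts the intercept as well
    if n <= p:
        raise ValueError("need more observations than regression coefficients")
    x = [[1.0] + [slo_columns[name][i] for name in names] for i in range(n)]
    xtx = [[sum(x[k][i] * x[k][j] for k in range(n)) for j in range(p)] for i in range(p)]
    xty = [sum(x[k][i] * kpi[k] for k in range(n)) for i in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    residuals = [kpi[k] - sum(x[k][j] * beta[j] for j in range(p)) for k in range(n)]
    sigma2 = sum(r * r for r in residuals) / (n - p)   # residual variance
    return {name: beta[j + 1] / math.sqrt(sigma2 * inv[j + 1][j + 1])
            for j, name in enumerate(names)}


def select_important_slo_items(slo_columns, kpi, threshold=2.0):
    """Keep SLO items whose |t| reaches the (assumed) threshold."""
    return [name for name, t in t_values(slo_columns, kpi).items() if abs(t) >= threshold]


# Invented sample: the KPI falls as response time grows; throughput is unrelated.
slo = {
    "response_time": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    "throughput": [3.0, 5.0, 4.0, 6.0, 3.0, 5.0, 4.0, 6.0],
}
kpi = [0.47, 0.41, 0.37, 0.35, 0.29, 0.27, 0.23, 0.17]
t = t_values(slo, kpi)
important = select_important_slo_items(slo, kpi)
print(t, important)
```

Ranking the items by absolute t value and keeping a fixed number from the top would be the other option the text mentions.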
- steps F7 and F8 are executed for all the important SLO items calculated in step F5 (F6).
- the processing returns to step F4.
- step F7 the transaction unit SLO standard deriving unit 30 of the SLO standard deriving unit 26 derives an SLO standard along the KPI.
- a flowchart shown in FIG. 4A is shown as an example of a method for deriving an SLO standard along the KPI.
- FIG. 4A is a flowchart showing an example of the SLO standard derivation process performed by the transaction unit SLO standard derivation unit 30 of the management computer 100.
- the transaction unit SLO standard derivation unit 30 refers to the table T3, acquires the actual measured value of the KPI, and refers to the table T4 to acquire the correlation of the important SLO actual measured value (F21).
- From the table T3 and the table T4, which are generated from the user trace information of a large number of users, a large number of KPI measured values and SLO measured values are obtained, and their distribution is calculated with the SLO item and the KPI item as axes.
- a distribution approximate curve is obtained from the distribution calculated in step F21, and the intersection of the distribution curve and the KPI line is obtained (F22).
- KPI items related to income, such as sales, profit ratio, and user satisfaction, are correlated with SLO items related to performance, such as response time, throughput, and TAT in this example, and the distribution is a power-law distribution.
- The first quadrant with the SLO item (Response Time) and the KPI item (Conversion 1 Rate) as the axes is divided into four areas by two straight lines, the KPI standard and the SLO standard. There are then transaction traces that are plotted in the region that satisfies the SLO standard but does not satisfy the KPI standard.
- Among these, for a transaction trace plotted within a predetermined error range from the distribution approximate curve, it is highly likely that the KPI has not been achieved because the SLO standard was set inappropriately.
- In other words, a transaction trace with a small error from the distribution approximate curve (within the predetermined range) can be interpreted as being plotted in the region that does not satisfy the KPI standard even though it satisfies the SLO standard.
- Therefore, the SLO standard is changed to a new SLO standard so that transaction traces plotted within the predetermined error range from the distribution approximate curve can be expected to be plotted in a region that satisfies the KPI standard.
- the value of the SLO item axis at the intersection obtained in step F22 is set as the SLO reference along the KPI (F23).
- the SLO management unit 3 can automatically derive the SLO standard along the KPI.
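Steps F21 to F23 can be illustrated with a minimal sketch. It assumes the distribution approximate curve has the power-law form `kpi = a * slo ** b`, fitted by least squares in log-log space; the fitting method, the function names, and the sample numbers are illustrative assumptions and are not taken from the patent.

```python
import math


def fit_power_curve(slo_values, kpi_values):
    """Least-squares fit of kpi = a * slo**b, done as a line fit in log-log space."""
    if len(set(slo_values)) < 2:
        # Identical SLO measurements give no slope information.
        raise ValueError("SLO measurements must vary to fit a curve")
    xs = [math.log(x) for x in slo_values]
    ys = [math.log(y) for y in kpi_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def slo_at_kpi(a, b, kpi_standard):
    """Solve a * slo**b = kpi_standard for slo (the intersection used in F22/F23)."""
    if b == 0:
        raise ValueError("a flat curve never crosses the KPI line at a single point")
    return (kpi_standard / a) ** (1.0 / b)


# Hypothetical measurements: response time in ms, conversion rate as a fraction.
response_times = [50.0, 100.0, 200.0, 400.0]
conversion_rates = [2.0 * t ** -0.5 for t in response_times]

a, b = fit_power_curve(response_times, conversion_rates)
new_slo_standard = slo_at_kpi(a, b, kpi_standard=0.2)
```

In this sketch the value returned by `slo_at_kpi` plays the role of the SLO-axis value at the intersection, which step F23 adopts as the new SLO standard.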
- FIG. 7 is a graph showing a correlation between the KPI actual measurement value described in Step F21 and the SLO actual measurement value acquired from the table T4.
- FIG. 7 shows an example of the present invention in which the horizontal axis is Response Time as the SLO item and the vertical axis is Conversion 1 Rate as the KPI item.
- a vertical line S1 indicates the SLO standard
- a horizontal line K1 indicates the KPI standard
- C1 indicates a distribution approximate curve.
- A1 in the figure indicates an area where transaction traces that satisfy both the SLO standard and the KPI standard are plotted (user plot points in the figure), A2 indicates an area where transaction traces that satisfy the SLO standard but do not satisfy the KPI standard are plotted, and A3 indicates an area where transaction traces that satisfy neither the SLO criterion nor the KPI criterion are plotted.
- the process of step F22 is performed for the purpose of moving the transaction trace plotted in the area A2 to the area A1 that satisfies the KPI.
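The areas A1 to A3 of FIG. 7 can be expressed as a small classification function. This is a sketch under stated assumptions: a response time is taken to satisfy the SLO standard when it is at or below S1, and a conversion rate is taken to satisfy the KPI standard when it is at or above K1. The function name and the label for the fourth quadrant, which the text does not name, are hypothetical.

```python
def classify_trace(response_time_ms, conversion_rate, slo_standard_ms, kpi_standard):
    """Return the FIG. 7 area label for one transaction trace."""
    meets_slo = response_time_ms <= slo_standard_ms  # assumption: lower is better
    meets_kpi = conversion_rate >= kpi_standard      # assumption: higher is better
    if meets_slo and meets_kpi:
        return "A1"
    if meets_slo:
        return "A2"  # SLO satisfied, KPI not satisfied: the target of step F22
    if not meets_kpi:
        return "A3"
    return "unlabeled"  # KPI satisfied without the SLO; not named in the text
```

Traces that fall in "A2" are the ones the new SLO standard is meant to move into "A1".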
- FIG. 8 is a graph illustrating the correlation between the measured KPI value shown in step F21 and the measured SLO value obtained from the table T4, as in FIG.
- I1 indicates an intersection between the distribution approximate curve C1 and the KPI standard K1
- S2 indicates a new SLO reference line passing through the intersection I1.
- a new SLO reference line S2 passing through the intersection I1 is set as the SLO reference along the KPI.
- by changing the SLO standard to the new SLO standard S2, which shortens the response time, it can be expected that the number of transaction traces satisfying the KPI standard will increase.
- the application / data unit SLO standard deriving unit 31 of the SLO standard deriving unit 26 expands the transaction unit SLO standard along the KPI into SLO standards in units of the application 14 and the data 140 (F8).
- FIG. 4B is a flowchart illustrating an example of processing for setting an SLO standard for each application 14 and data 140.
- the application / data unit SLO standard deriving unit 31 refers to the table T3 to acquire the KPI actual measurement value as the objective variable, and refers to the table T5 to acquire the SLO actual measurement value of the application 14 and the data 140 unit as the explanatory variable.
- the application / data unit SLO criterion derivation unit 31 generates a model of the above equation (1) from the acquired objective variable and explanatory variable, and calculates a regression line by multiple regression analysis.
- the partial regression coefficient of the regression line may be t-tested and the calculated t values may be ranked.
- Si is a matrix having SLO actual values of SLO items i as elements such as si1 and si2, and the same SLO items of different user trace information are entered.
- si1 and si2 are stored in si1 and si2 in table T5.
- S is a matrix having Si as an element, and S1, S2,... Have different transaction IDs.
- ⁇ is a partial regression coefficient matrix, and ⁇ is an intercept matrix.
- T indicates that the matrix is a transposed matrix.
- the application / data unit SLO criterion derivation unit 31 extracts the contribution rate (or the degree of influence) of the application 14 and the data 140 to the corresponding SLO item by the multiple regression analysis described above (F30).
- to obtain a large number of SLO actual measurement values, in other words to increase n in the above equation (1), the application 14 and the data 140 are distributed across a plurality of DCs 2-1 to 2-n within a range that satisfies the SLO. With this distributed arrangement, user trace information and transaction traces having different actual SLO values can be generated, so that a more accurate contribution rate can be derived.
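The multiple regression of step F30 can be sketched as follows. The sketch fits `kpi ≈ sum(coef_j * slo_j) + intercept` by solving the normal equations, then turns the partial regression coefficients into contribution rates by normalizing their absolute values. That normalization is an assumption made for illustration (the text does not define the contribution rate formula), and it is only meaningful when all explanatory variables share the same unit, for example response times in ms. All names and sample data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: the SLO measurements lack variation")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_multiple_regression(slo_rows, kpi_values):
    """Return (coefficients, intercept) of the least-squares regression plane."""
    design = [list(row) + [1.0] for row in slo_rows]  # last column is the intercept
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, kpi_values)) for i in range(k)]
    solution = solve_linear_system(xtx, xty)
    return solution[:-1], solution[-1]


def contribution_rates(coefficients):
    """Assumed definition: share of each coefficient's absolute value."""
    total = sum(abs(c) for c in coefficients)
    if total == 0:
        raise ValueError("all coefficients are zero")
    return [abs(c) / total for c in coefficients]


# Hypothetical traces: (application response time, data response time) in ms.
slo_rows = [(10, 20), (20, 10), (30, 40), (40, 30), (50, 60)]
kpi_values = [1.0 - 0.003 * app - 0.001 * data for app, data in slo_rows]

coefficients, intercept = fit_multiple_regression(slo_rows, kpi_values)
rates = contribution_rates(coefficients)
```

The `ValueError` raised for a singular system reflects the point made in the text: without varied SLO measurements the regression cannot separate the contributions.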
- the application / data unit SLO criterion deriving unit 31 sets the SLO criterion for each application 14 and data 140 from the new SLO criterion derived in step F23 of FIG. 4A, according to the contribution rate derived in step F30 (F31).
- in step F31, a known or well-known method can be employed. For example, when the SLO item is response time or TAT, the contribution rate is normalized, and its reciprocal is multiplied by the SLO criterion derived in step F23, thereby calculating the SLO criterion for each of the application 14 and the data 140.
- when the SLO item is throughput, the SLO standard for each of the application 14 and the data 140 is calculated by multiplying the normalized value of the contribution rate by the new SLO standard derived in step F23.
- a known or well-known technique can be adopted as a specific method for adjusting the SLO standard.
- a minimum required SLO standard may be selected. Specifically, when the response time standard for KPI-1 is 100 ms and the response time standard for KPI-2 is 50 ms, 50 ms is adopted in order to satisfy both KPI-1 and KPI-2.
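The selection in this example amounts to taking the strictest standard across the KPIs. A minimal sketch follows; the function name is hypothetical, and the `smaller_is_stricter` switch for items such as throughput, where a larger value is the stricter one, is an added assumption rather than something the text states.

```python
def strictest_slo(standards_by_kpi, smaller_is_stricter=True):
    """Pick the single SLO standard that satisfies every KPI at once."""
    values = standards_by_kpi.values()
    return min(values) if smaller_is_stricter else max(values)


# Response time standards from the example above, in ms.
response_time_standards = {"KPI-1": 100, "KPI-2": 50}
print(strictest_slo(response_time_standards))  # prints 50
```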
- an SLO standard that takes into account the costs of arranging the application 14 and the data 140 can also be set. Similar to the above example, if the response time standard is 100 ms for the profit rate with respect to the operation management cost (KPI-3) and the response time standard is 50 ms for the profit rate (KPI-4), KPI-
- the SLO management unit 3 performs the above processing for all transactions (F3 loop), and then ends.
- the execution timing of the processing in FIG. 3 is arbitrary.
- the SLO standard according to the present embodiment is determined based on the history of processing by the user terminal 1 up to the time when the processing of FIG. 3 is executed. For this reason, if the process of FIG. 3 is executed periodically at long intervals, the behavior of the user of the user terminal 1 is highly likely to deviate from the SLO standard. Therefore, it is preferable to execute the processing of FIG. 3 either periodically at short intervals or in an event-driven manner a certain period after a service update.
- FIG. 6B shows an example of the output to the management console 5 based on the important SLO items of the transaction unit or the application 14 and the data 140 unit and the SLO standard.
- FIG. 6B is a screen image in which the output unit 27 of the management computer 100 outputs SLO information to the management console 5.
- the screen G2 can display the target transaction G21, the target application / data G22, the SLO item G23, and the SLO standard G24.
- the SLO management unit 3 outputs a screen G2 to the management console 5.
- the arrangement planning unit 41 formulates an arrangement plan for the application 14 and the data 140 using the important SLO items and SLO criteria of the transaction unit or application and data unit derived by the processing of FIG.
- a known or well-known technique can be employed as the method of arrangement planning.
- the arrangement plan of the application 14 and the data 140 includes the location of the DC 2 that provides the application 14 and the data 140, the designation of the physical computer 201 and the virtual computer 210 that execute the application 14, and the like. Since the SLO standard is set for the application 14 and the data 140 unit, the arrangement of the application 14 and the data 140 between the DCs 2 or in the DC 2 is planned without considering the dependency relationship between the applications 14 and the data 140. It is possible.
- the SLO standard for the application 14 and the data 140 unit set by the present embodiment is determined in order from the transaction trace close to the user using the user terminal 1 in the transaction trace. Therefore, the arrangement may be planned in order from the application 14 or data 140 close to the user terminal 1 in the order of the application 14 directly accessed by the user terminal 1, the application 14 accessed by the application 14, or the data 140.
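The ordering described here, starting with what the user terminal accesses directly and moving outward, corresponds to a breadth-first traversal of the access relationships. The sketch below assumes those relationships are available as a dictionary mapping each component to the components it accesses; the graph representation, the function name, and the component names are all hypothetical.

```python
from collections import deque


def placement_order(access_graph, start="user_terminal"):
    """Return applications and data in order of closeness to the user terminal."""
    order = []
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target in access_graph.get(node, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


# Hypothetical access relationships for one transaction.
access_graph = {
    "user_terminal": ["web_app"],
    "web_app": ["order_app", "catalog_data"],
    "order_app": ["order_data"],
}
print(placement_order(access_graph))
# prints ['web_app', 'order_app', 'catalog_data', 'order_data']
```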
- when the placement planning unit 41 places the application 14 and the data 140 on the resources of the DCs 2, if a plurality of DCs 2 satisfy the derived SLO criterion, replicas of the application 14 and the data 140 are generated and arranged redundantly in the plurality of DCs 2 that satisfy the derived SLO criterion, in order to improve the accuracy of the processing in step F8.
- the method of deriving the SLO from the KPI proposed in the present embodiment is a method of adjusting to the SLO according to the KPI based on the actual end-user behavior. If the measured SLO values of all end users are the same, the user plot points in FIG. 7 have a distribution like a straight line parallel to the vertical axis, and it becomes difficult to adjust to the desired SLO for KPI. Basically, since performance is not guaranteed particularly in cloud computing, the performance related to applications and data arranged in one place fluctuates. However, by placing applications and data in a plurality of data centers, a user trace having a larger fluctuation can be obtained, and as a result, an SLO along the KPI can be derived more accurately.
- the number of calculation patterns is about 100 * 1000, so the amount of calculation for the arrangement plan can be greatly reduced compared to the conventional example.
- the service provider 4 simply sets the KPI; the SLO according to the KPI is then set automatically, and the application 14 and the data 140 are automatically placed at a location (DC 2) along the KPI according to the automatically set SLO.
- a KPI is set for each target transaction.
- a KPI may be set for each service provided by the application 14.
- in a computer system in which the service of the application 14 can be divided and provided by one or more data centers 2, the management computer 100 that manages the arrangement of the application 14 and the data 140 receives a KPI (first condition) including business conditions for each service or transaction, and automatically updates the reference values for items related to the service level target (second condition) of the application 14 and the data 140.
- in the data center 2 to which the computer (virtual computer 210 or physical computer 201) on which the application 14 operates belongs, the agent (user trace acquisition unit 11) is operated to monitor the user terminal 1 that uses the service of the application 14.
- the agent of each data center 2 acquires user trace information including a log of service (application 14) used by the user terminal 1 and performance information.
- the management computer 100 that manages the arrangement of the application 14 and the data 140 to the plurality of data centers 2 collects user trace information of the user terminal 1 using the application 14 and the data 140 from the agent of each data center 2.
- the management computer 100 calculates the importance (or contribution) for the SLO correlated with the KPI from the user trace information, and extracts the important SLO items. Then, the management computer 100 calculates a distribution approximation curve from the user trace information for the important SLO items correlated with the KPI, derives a new SLO criterion that satisfies the KPI criterion from the distribution approximation curve and the KPI criterion, and changes the current SLO standard to the new SLO standard.
- the management computer 100 calculates the contribution rate (or the degree of influence) of the application 14 and the data 140 to the SLO item for the SLO item for the KPI. Then, the management computer 100 calculates the SLO standard of the application 14 and the SLO standard of the data 140 from the contribution rate and the new SLO standard.
- since the SLO standard is set in units of the application 14 and the data 140, the arrangement of the application 14 and the data 140 can be planned without considering the dependency relationship between the application 14 and the data 140. And in this invention, the computational complexity concerning arrangement
- the configuration of the computer, the processing unit, and the processing unit described in the present invention may be partially or entirely realized by dedicated hardware.
- the various software exemplified in the present embodiment can be stored in various recording media (for example, non-transitory storage media) such as electromagnetic, electronic, and optical media, and can be downloaded to a computer through a communication network such as the Internet.
- the present invention is not limited to the above-described embodiments, and includes various modifications.
- the above-described embodiments have been described in detail for easy understanding of the present invention, and are not necessarily limited to those having all the configurations described.
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
Abstract
In the present invention, a management computer: accepts first conditions, for example business conditions, for each transaction used by a user terminal; collects, from data centers and from agents on user terminals that have used applications, user trace information for said user terminals; calculates, from said user trace information, degrees of importance of second conditions, more specifically service level targets of applications, correlated with the aforementioned first conditions; extracts important second-condition items comprising items corresponding to the second conditions whose degrees of importance are equal to or greater than a prescribed threshold; calculates an approximate distribution curve of said important second-condition items from the user trace information; calculates, from said approximate distribution curve and the first conditions, new second conditions that satisfy the first conditions; calculates a contribution rate of each application with respect to said new second conditions; and calculates, from said contribution rates and the new second conditions, criteria for the new second conditions with respect to the applications.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2013/080672 WO2015071975A1 (fr) | 2013-11-13 | 2013-11-13 | Procédé de gestion de distribution d'applications et de données, système de gestion de distribution d'applications et de données, et support de stockage |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2013/080672 WO2015071975A1 (fr) | 2013-11-13 | 2013-11-13 | Procédé de gestion de distribution d'applications et de données, système de gestion de distribution d'applications et de données, et support de stockage |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2015071975A1 (fr) | 2015-05-21 |
Family
ID=53056945
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2013/080672 WO2015071975A1 (fr) | 2013-11-13 | 2013-11-13 | Procédé de gestion de distribution d'applications et de données, système de gestion de distribution d'applications et de données, et support de stockage |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2015071975A1 (fr) |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2003242228A (ja) * | 2002-01-31 | 2003-08-29 | Currie & Brown Japan Ltd | 施設運営管理システム |
| JP2007529048A (ja) * | 2003-07-11 | 2007-10-18 | インターナショナル・ビジネス・マシーンズ・コーポレーション | ビジネス・レベル・サービス・レベル・アグリーメントを監視し、制御するシステムおよび方法 |
| JP2008027442A (ja) * | 2006-07-21 | 2008-02-07 | Sony Computer Entertainment Inc | サブタスク・プロセッサの分散スケジューリング |
2013
- 2013-11-13 WO PCT/JP2013/080672 patent/WO2015071975A1/fr active Application Filing
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US9716756B2 (en) | Cloud data storage location monitoring | |
| US9774654B2 (en) | Service call graphs for website performance | |
| US9544403B2 (en) | Estimating latency of an application | |
| US20150067171A1 (en) | Cloud service brokering systems and methods | |
| Trihinas et al. | Monitoring elastically adaptive multi-cloud services | |
| US9766993B2 (en) | Quality of information assessment in dynamic sensor networks | |
| US20160225042A1 (en) | Determining a cost of an application programming interface | |
| Abderrahim et al. | A holistic monitoring service for fog/edge infrastructures: a foresight study | |
| JP2016504687A (ja) | 情報技術サービスの管理 | |
| Zheng et al. | Probabilistic QoS aggregations for service composition | |
| US20180097705A1 (en) | Backend Resource Costs For Online Service Offerings | |
| Barve et al. | FECBench: A holistic interference-aware approach for application performance modeling | |
| US12438800B2 (en) | Cloud native observability migration and assessment | |
| Klaver et al. | Towards independent run-time cloud monitoring | |
| US12047839B2 (en) | Out of box user performance journey monitoring | |
| US12250135B2 (en) | Intuitive graphical network mapping based on collective intelligence | |
| US20160225043A1 (en) | Determining a cost of an application | |
| Efimov et al. | Integration data model for continuous service delivery in cloud computing system | |
| JP6467365B2 (ja) | 故障解析装置、故障解析プログラムおよび故障解析方法 | |
| US9755925B2 (en) | Event driven metric data collection optimization | |
| Rady | Formal definition of service availability in cloud computing using OWL | |
| WO2015071975A1 (fr) | Procédé de gestion de distribution d'applications et de données, système de gestion de distribution d'applications et de données, et support de stockage | |
| Zhang et al. | A cloud infrastructure service recommendation system for optimizing real-time QoS provisioning constraints | |
| Chhetri et al. | CL-SLAM: Cross-layer SLA monitoring framework for cloud service-based applications | |
| JP2018032245A (ja) | 計算機システム及びリソース制御方法 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 13897377; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 13897377; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: JP |