WO2012050224A1

WO2012050224A1 - Computer resource control system

Info

Publication number: WO2012050224A1
Application number: PCT/JP2011/073842
Authority: WO
Inventors: 英裕最首
Original assignee: EC One Inc
Current assignee: EC One Inc
Priority date: 2010-10-15
Filing date: 2011-10-17
Publication date: 2012-04-19
Anticipated expiration: 2013-04-15
Also published as: JP2012088770A; JP4811830B1

Abstract

Provided is a system that allows the state of a computer resource included in a monitored system to be identified and controlled in real time. The state of the control system itself can also be identified and controlled. This computer resource control system comprises a plurality of servers including: an administration server for determining whether or not an action is required for the computer resource on the basis of data collected from a monitoring agent; and an execution server for outputting a command to execute the action for the computer resource when it has been determined that an action is required for the computer resource. At least one of the servers in the computer resource control system includes the monitoring agent.

Description

Computer resource control system

The present invention relates to a service level management technique for stably operating computer resources that are growing ever more complex and large in scale, as typified by the term "cloud".

Conventionally, virtualization technology has made it possible to use a single physical computer as a plurality of virtual computers. In other words, virtualization technology makes it possible to turn hardware into software, so that the required number of servers can be secured by copying a server image.

Meanwhile, distributed systems, which improve performance by sharing processing among a plurality of small servers rather than relying on a single large server, have been put to practical use in various fields, for example as mechanisms for storing and retrieving large amounts of data at high speed and as mechanisms for improving performance by distributing a large-scale batch system. Such a distributed system operates as if it were a single computer, while distributing across a plurality of computers the functions that a conventional system performed in one place.

In recent years, services collectively referred to as the cloud have been provided over networks on the basis of such virtualization technology, large-scale distributed technology, and the like. Amazon Web Services (trademark), for example, is known as a major cloud service (see Non-Patent Document 1).

Non-Patent Document 1: "Amazon Web Services", [online], [retrieved October 15, 2010], Internet <URL: http://aws.amazon.com/jp/>

In a cloud environment, data centers and the like are virtualized and distributed. It is therefore expected that, by dynamically changing computer resources such as virtualized servers, a system can be built that responds flexibly to fluctuations in transactions and data volume. In a virtualized and distributed environment, however, it is difficult to grasp accurately what is happening where in the system. For example, a high CPU load may be caused by middleware garbage collection, by an increase in transactions due to a growing number of users, or by a large volume of communication with external systems.

In recent years, as systems are required to handle ever more transactions and data, a mechanism is needed that, across various virtualized and distributed environments, grasps the situation in real time, predicts what will happen, and exercises control without delay. Such a mechanism greatly improves the reliability and performance of a cloud environment. At the same time, however, if such a mechanism fails and control stalls, the functionality of the cloud may be significantly degraded. The mechanism that monitors and controls computer resources is therefore itself required to be scalable and fault tolerant.

In view of these circumstances, the present invention aims to provide a solution that can monitor and control the status of the computer resources of a monitored system in real time. It also aims to ensure the scalability and fault tolerance of such a solution itself.

A computer resource control system according to an aspect of the present invention monitors the status of computer resources and performs control according to that status. The computer resource control system comprises: a processing unit that determines, based on data collected from a monitoring agent, whether control of a computer resource is required and, when control is determined to be required, outputs an instruction to execute the control of the computer resource; a message queue for exchanging data between the monitoring agent and the processing unit; and a first monitoring agent for monitoring the status of the message queue. The processing unit determines whether control of the computer resource is required based on data collected from the first monitoring agent.

A computer resource control system according to an aspect of the present invention comprises a plurality of servers including: a management server that compares data collected from monitoring agents with predefined control rules to determine whether an action on a computer resource is required; and an execution server that outputs an instruction to execute the action on the computer resource when the management server determines that the action is required. At least one of the servers in the computer resource control system includes a monitoring agent. This makes it possible to provide a system that can grasp and control, in real time, the status of the computer resources included in a monitored system. At the same time, the status of the computer resources included in the control system itself can be monitored and controlled in real time.
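The split between the management server (compare collected data with control rules) and the execution server (output an instruction when an action is required) can be illustrated with a minimal Python sketch. The rule format, the metric name cpu_percent, the action names, and the functions decide_actions and to_instructions are all hypothetical and chosen only for illustration; the specification does not define them.

```python
from dataclasses import dataclass
import operator

# Hypothetical comparison operators that a control rule may use.
_OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


@dataclass(frozen=True)
class ControlRule:
    metric: str       # name of the measured value, e.g. "cpu_percent" (assumed)
    op: str           # one of the keys of _OPS
    threshold: float  # value the measurement is compared against
    action: str       # action to request when the rule matches (assumed name)


def decide_actions(measurements, rules):
    """Management-server role: compare collected data with the control rules."""
    actions = []
    for rule in rules:
        value = measurements.get(rule.metric)
        if value is not None and _OPS[rule.op](value, rule.threshold):
            actions.append(rule.action)
    return actions


def to_instructions(actions, target):
    """Execution-server role: turn each required action into an instruction."""
    return [{"target": target, "action": a} for a in actions]


rules = [
    ControlRule("cpu_percent", ">", 80.0, "add_server"),
    ControlRule("cpu_percent", "<", 20.0, "remove_server"),
]
needed = decide_actions({"cpu_percent": 91.5}, rules)
instructions = to_instructions(needed, "app-cluster")
```

Keeping the decision and the instruction output in separate functions mirrors the separate servers described above, so each role could be scaled independently.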

According to an aspect of the present invention, the action includes control of any computer resource. Preferably, the action includes a process of increasing or decreasing the number of servers included in the computer resource control system. This allows the amount of computer resources committed to be controlled dynamically. Also preferably, the action includes processing such as executing a job on a server, starting a program on a server, changing various settings, workflow control, control of a physical server, control of a network device, or coordination between computer resources.

Also preferably, the computer resource control system further comprises a message queue server including a message queue for asynchronously exchanging data between the monitoring agents and the servers in the computer resource control system. The message queue server preferably includes a monitoring agent that monitors the status of the data exchange. This allows computer resources to be controlled as appropriate according to the amount of data in the message queue.

Furthermore, the message queue server, the management server, and the execution server are preferably each composed of a plurality of servers, and the action preferably includes control of these servers. As an example, each server is configured as a virtual server, and the action preferably includes at least one of: a process of increasing or decreasing the number of servers constituting the message queue server; a process of increasing or decreasing the number of servers constituting the management server; and a process of increasing or decreasing the number of servers constituting the execution server. Because the system then has a distributed structure with no single point of failure, a system can be built that does not go down as a whole even if a failure occurs in any single function.

Also preferably, the message queue server comprises at least one of: a data management queue, into which data collected from a plurality of monitoring agents is sequentially input and from which the management server sequentially reads; an execution queue, into which action instructions from the management server are sequentially input and from which the execution server sequentially reads; and a management queue, into which processing data for executing actions on computer resources is sequentially input from the execution server and from which the corresponding monitoring agent sequentially reads. The monitoring agent included in the message queue server monitors the backlogs of the data management queue, the execution queue, and the management queue. Monitoring the backlog of each queue enables finer-grained control.
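The flow through the three queues, and the per-queue backlog that the queue-monitoring agent observes, can be sketched as follows. This is an in-process illustration under stated assumptions: the class MessageQueueServer, the queue keys, and the message fields are hypothetical, and a real deployment would use a distributed message queue rather than in-memory deques.

```python
from collections import deque


class MessageQueueServer:
    """Three FIFO queues plus the backlog view a monitoring agent would report."""

    def __init__(self):
        self.queues = {
            "data_management": deque(),  # monitoring agents -> management server
            "execution": deque(),        # management server -> execution server
            "management": deque(),       # execution server -> monitoring agents
        }

    def put(self, name, item):
        self.queues[name].append(item)

    def get(self, name):
        # popleft() yields first-in, first-out order; None if the queue is empty.
        q = self.queues[name]
        return q.popleft() if q else None

    def depths(self):
        """Backlog per queue, as the queue-monitoring agent would observe it."""
        return {name: len(q) for name, q in self.queues.items()}


mq = MessageQueueServer()
mq.put("data_management", {"agent": "a1", "cpu_percent": 91.5})
mq.put("data_management", {"agent": "a2", "cpu_percent": 12.0})
before = mq.depths()

sample = mq.get("data_management")             # management server reads data
mq.put("execution", {"action": "add_server"})  # and requests an action
order = mq.get("execution")                    # execution server reads the request
mq.put("management", {"agent": sample["agent"], "do": order["action"]})
after = mq.depths()
```

A growing backlog in one queue points at the server group that reads from it, which is what allows the number of servers to be adjusted per role.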

More preferably, each server in the computer resource control system includes a monitoring agent that monitors the operating status of that server, and the management server determines whether an action on a computer resource is required based on the operating status of each server. This makes it possible to control computer resources, for example by starting or stopping instances, according to the operating status of each server.

The computer resource control system preferably further comprises at least one of: a database server for storing data; a collection server that reads the data collected from the plurality of monitoring agents out of the message queue server and registers it in a database; and a dashboard server that reads and edits the data stored in the database and transmits it to a user terminal device. This makes it possible to provide users with a dashboard that displays the monitoring status in real time.

Preferably, the database server, the collection server, and the dashboard server are each composed of a plurality of servers, and the action includes control of these servers. As an example, each server is configured as a virtual server, and the action includes at least one of: a process of increasing or decreasing the number of servers constituting the database server; a process of increasing or decreasing the number of servers constituting the collection server; and a process of increasing or decreasing the number of servers constituting the dashboard server. This provides a fault-tolerant system with a distributed structure that has no single point of failure.

A computer resource control method according to an aspect of the present invention is a method performed by a processing device of a control system that controls computer resources. The method comprises the steps, performed by the processing device, of: determining, based on data collected from a monitoring agent, whether control of a computer resource is required; outputting, when control of the computer resource is determined to be required, an instruction to execute the control; and exchanging data between the monitoring agent and the control system via a message queue. In the determining step, whether control of the computer resource is required is determined based on data collected from a first monitoring agent that monitors the status of the message queue.

A computer resource control method according to another aspect of the present invention is a method performed by a processing device of a control system that monitors the status of computer resources and performs control according to that status. The method comprises the steps, performed by the processing device, of: comparing data collected from a plurality of monitoring agents with predefined control rules to determine whether an action on a computer resource is required; and outputting, when the action on the computer resource is determined to be required, an instruction to execute the action. The control system comprises a plurality of servers, at least one of which includes a monitoring agent.

In the present invention, the term "system" covers not only a system made up of physical computers but also a system constructed virtually on a computer. The term "computer resource" covers hardware and software at every level relating to a computer, whether physically or virtually configured. Unless otherwise specified, the term "server" may cover both real servers and virtual servers. Further, the term "queue" may cover any structure characterized in that data input first is output first.

According to the present invention, a solution can be provided that monitors and controls, in real time, the status of the computer resources included in a monitored system. In addition, the scalability and fault tolerance of such a solution itself can be ensured.

A diagram showing a schematic configuration of a cloud computing environment.
A diagram showing an overview of virtualization technology and distribution technology.
A block diagram showing an example of a schematic configuration of the computer resource control system.
A block diagram showing one embodiment of the computer resource control system.
An example of a control rule.
An example of a dashboard.
A block diagram showing another embodiment of the computer resource control system.
An example of a flowchart of processing in the computer resource control system.

Embodiments of the present invention are described in detail below with reference to the drawings. The same elements are given the same reference numerals, and redundant description is omitted. The following embodiments are examples for explaining the present invention and are not intended to limit the present invention to those embodiments. Furthermore, the present invention can be modified in various ways without departing from its gist.

FIG. 1 is a diagram showing a schematic configuration of the cloud computing environment (cloud environment) on which the computer resource control system according to the present invention is premised. As shown in the figure, in the cloud computing environment, a user terminal device 12 is connected to a cloud 10 via a network N.

The cloud 10 is a generic term for systems that provide users, as a service over the network N, with the use of computing resources such as software, hardware, and data storage areas. It generally includes large-scale data centers and the plurality of server devices operated in them. The cloud 10 can also be said to be a more comprehensive concept that encompasses ASP services, utility computing, grid computing, SaaS/PaaS, and the like. Seen from the user terminal device 12, the cloud 10 is a generic term for the computer resources that lie on the far side of the network N and provide some service to the user terminal device 12. The present invention is applicable to any cloud environment, including public, private, and hybrid clouds.

Preferably, the physical disks and physical servers distributed over the network within the cloud 10 are virtualized and managed logically. Furthermore, the computer resource control system registers in a resource pool those virtualized, managed resources that are not in operation, and dynamically takes resources out of the resource pool in response to fluctuating demand. The computer resource control system then assigns tasks to the resources taken out, thereby ensuring scalable service provision.

The user terminal device 12 is a terminal device with which a user uses the cloud 10; it includes a connection environment to the network N and a browser that runs on the user terminal device 12. Examples of the user terminal device 12 include a personal computer (PC), a personal digital assistant (PDA), a tablet terminal device, a mobile phone, and a smartphone.

The network N is a communication line for transmitting and receiving data and the like between the cloud 10 and the user terminal device 12. It may be, for example, the Internet, a LAN, a dedicated line, a packet communication network, a telephone line, a corporate network, another communication line, or any combination of these, and may be wired or wireless.

FIG. 2 is a diagram showing an overview of the virtualization technology and distribution technology on which the computer resource control system according to the present invention is premised. As shown in the figure, distribution technology allows a physical computer device group 20 to operate as if it were a single computer 22 while distributing functions and processing across the computer device group 20. For example, the computer device group 20 operates, via a network 221, as if an operating system (OS) 223 were running on a single virtual piece of hardware 222.

In addition, virtualization technology allows a user to use the computer device group 20, which operates like a single computer 22, as a plurality of virtual computers (including servers) 24. In other words, hardware is turned into software. With this virtualization technology, copying a virtualized server 24 creates a replica of the same server, so a user can secure the required number of servers by copying a server image. To reduce the number of servers, server images are simply deleted. As an example of virtualization, as shown in FIG. 2, an OS 225 runs on virtualization software 224 called KVM (Kernel-based Virtual Machine). On the OS 225 runs another virtualization environment 226 such as a Java Virtual Machine (JVM; Java is a registered trademark), on top of which sit middleware 227 and a running application 228.

Each physical computer constituting the computer device group 20 preferably includes a processing device such as a CPU for controlling the operation and processing of the computer, a memory and a storage device that store data and serve as a work area for processing, an input/output interface, a communication interface, and a bus connecting them. The computer device group 20 may consist of a single computer or of a plurality of computers distributed over a network. In each computer, the processing device executes a predetermined program stored in the memory, the storage device, or the like, thereby causing the computer to function as various function-implementing means.

FIG. 3 is a block diagram showing an example of a schematic configuration of the computer resource control system 1 according to the present invention. The computer resource control system 1 according to this embodiment includes a message network 32 and a processing unit 34 that processes data. As shown in the figure, the computer resource control system 1 collects (341) monitoring data, via the message network 32, from monitoring agents 30 incorporated at the monitoring points of the monitoring target. Based on the collected monitoring data 342, the computer resource control system 1 then monitors (343) the monitoring target, predicts (344) the demand for the computer resources that the monitoring target needs, and dynamically controls (345) the computer resources of the monitoring target. Specific examples of the computer resource control 345 include increasing or decreasing the number of virtual servers, executing jobs, starting programs, changing various settings, workflow control, control of physical servers including powering them on and off, control of network devices, and coordination between servers; it may include any control of any computer resource in the cloud 10. The monitoring agents 30, the message network 32, and the processing unit 34 are all preferably configured inside the cloud 10.
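The predict (344) and control (345) steps can be sketched in Python as below. The specification does not state how demand is predicted, so the forecast here (extending the most recent change by one step) is purely an assumption, as are the function names and the capacity_per_server parameter.

```python
import math


def predict_next(samples):
    """Naive forecast (assumption): extend the last observed change one step."""
    if len(samples) < 2:
        return samples[-1]
    return samples[-1] + (samples[-1] - samples[-2])


def servers_needed(predicted_load, capacity_per_server, minimum=1):
    """Smallest server count whose total capacity covers the forecast."""
    return max(minimum, math.ceil(predicted_load / capacity_per_server))


def control_step(samples, current_servers, capacity_per_server):
    """One pass: forecast demand, then return the change in server count."""
    target = servers_needed(predict_next(samples), capacity_per_server)
    return target - current_servers  # > 0: start servers, < 0: stop servers


# Load rising from 300 to 420 requests/s with 4 servers of 100 requests/s each.
delta_up = control_step([300, 420], current_servers=4, capacity_per_server=100)
# Load falling from 420 to 300 requests/s with the same 4 servers.
delta_down = control_step([420, 300], current_servers=4, capacity_per_server=100)
```

The returned difference would be handed to whatever mechanism starts or stops virtual servers; any forecasting method could replace predict_next without changing the rest of the loop.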

The cloud 10 on which the computer resource control system 1 according to the present invention is built may be any environment, whether a public cloud or a private cloud, as long as an API for managing server resources and the like is implemented, and the system can also be built on a combination of multiple environments. Amazon Web Services is one example of a cloud environment on which it can be built.

The monitoring agent 30 is a small software module that is incorporated at a monitoring point of the monitoring target and collects monitoring information. Examples of monitoring include monitoring of computer resources in the computer resource control system 1, applications, log files, processes, and jobs. It can also be applied to monitoring proprietary sensor networks and factory lines. The monitoring agent 30 transmits the collected monitoring information data to the message network 32. The contents of the monitoring agent 30 can also be replaced dynamically from the processing unit 34 side.

The message network 32 implements data exchange between the monitoring agents 30 embedded at the monitoring points and the processing unit 34. When the number of monitoring targets becomes large, data may be dropped, depending on the throughput of the processing unit 34. To avoid this, a distributed message network structure is preferably adopted for data transfer. This makes it possible to monitor and control a large number of monitoring targets efficiently.

The processing unit 34 collects data from the monitoring agents 30 and stores it in a database. It is also responsible for controlling the monitored computer resources in the cloud environment based on the data from the monitoring agents 30. As an example, the operation of the processing unit 34 is defined by a DSL (Domain Specific Language) written by the user. The data acquired by the monitoring agent 30 may include numerical data, character data, log files, and data on any other event that occurs in the monitoring target. In this specification, the term "measured value" means any data acquired by the monitoring agent 30. As in the embodiments described later, each element of the processing unit 34 is preferably distributed. The processing unit 34 then has a structure with no single point of failure, in which performance degradation can be compensated for by increasing the number of virtualized computer resources.

FIG. 4 is a block diagram showing an embodiment of the computer resource control system 1 according to the present invention. As shown in the figure, the computer resource control system 1 in this embodiment includes a message queue server 41, a collection server 42, a management server 43, an execution server 44, a database server 45, and a dashboard server 46. Each of these servers is preferably distributed across a plurality of virtual servers having the same server image.

The computer resource control system 1 receives monitoring data from the monitoring agents 30 incorporated in the monitoring target application servers 40 included in a monitoring target cluster 50 in the environment of the cloud 10. The computer resource control system 1 also provides the user terminal device 12 with a dashboard 48 that can be viewed in a browser. The message network 32 in FIG. 3 corresponds to the message queue server 41. The processing unit 34 in FIG. 3 corresponds to the collection server 42, the management server 43, the execution server 44, the database server 45, and the dashboard server 46. The computer resource control system 1 and the monitoring target application servers 40 run on the cloud 10.

　本実施例において、コンピュータリソース制御システム１は、複数の監視対象アプリケーションサーバ４０を監視することができる。また、各監視対象アプリケーションサーバ４０は、複数のレプリケーションにより分散化されていてもよい。つまり、各監視対象アプリケーションサーバ４０は、複数の仮想サーバによって構成され、各監視対象アプリケーションサーバ４０を構成する仮想サーバの台数は動的に変更できるようになっていてもよい。例えば、１，０００台の監視対象の実サーバがあると仮定して、実サーバ毎に１０台の仮想サーバを立ち上げると、仮想サーバは総計１０，０００台ということになる。監視ポイントが各仮想サーバにそれぞれ２０ポイントあるとすると、監視ポイントは全部で２００，０００箇所ということになる。また、監視対象は、単にアプリケーションに関するサービスを提供する狭義のアプリケーションサーバのみに限定されるものではない。各種サーバ、アプリケーション、プロセス、ジョブ、その他、クラウド１０の環境内に存在するあらゆるコンピュータリソースを監視対象にすることができることは言うまでもない。 In this embodiment, the computer resource control system 1 can monitor a plurality of monitoring target application servers 40. Each monitoring target application server 40 may be distributed by a plurality of replications. That is, each monitoring target application server 40 may be configured by a plurality of virtual servers, and the number of virtual servers constituting each monitoring target application server 40 may be dynamically changed. For example, assuming that there are 1,000 real servers to be monitored and 10 virtual servers are started up for each real server, the total number of virtual servers is 10,000. Assuming that there are 20 monitoring points for each virtual server, there are a total of 200,000 monitoring points. Further, the monitoring target is not limited to only an application server in a narrow sense that provides a service related to an application. It goes without saying that various servers, applications, processes, jobs, and other computer resources that exist in the environment of the cloud 10 can be monitored.
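The scale estimate in the example above can be checked with a few lines of arithmetic. The figures come from the paragraph itself; the variable names are illustrative only and do not appear in the specification.

```python
# Figures taken from the example above; the variable names are illustrative.
real_servers = 1_000
virtual_servers_per_real_server = 10
monitoring_points_per_virtual_server = 20

total_virtual_servers = real_servers * virtual_servers_per_real_server
total_monitoring_points = total_virtual_servers * monitoring_points_per_virtual_server

print(total_virtual_servers)    # 10000
print(total_monitoring_points)  # 200000
```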

　監視対象の監視対象アプリケーションサーバ４０の監視ポイントには、監視データを計測するための監視エージェント３０が組み込まれる。具体的には、例えば、利用者が予め、監視対象の監視対象アプリケーションサーバ４０のインスタンスに監視エージェントプログラムをインストールする。コンピュータリソース制御システム１は、監視エージェント３０を用いて、監視対象クラスタ５０と呼ばれる所定の論理的な単位で監視対象を管理する。本実施例では、監視エージェント３０は、システム・エージェント４０１と、ログファイル・エージェント４０２のうち、いずれかを含む。システム・エージェント４０１は、実行中のプロセスを管理するモジュールである。実行中のプロセスとは、ＯＳやミドルウェアの他、アプリケーションなども対象となる。システム・エージェント４０１は、プロセス内部で起きた変化や挙動を捉え、収集したデータを定期的に又は所定のトリガ等に応じて非定期的にコンピュータリソース制御システム１に通知するほか、プロセス内部の変数を変えたり、プログラム内部のメソッドを呼び出すなどの操作を行う。ログファイル・エージェント４０２は、監視対象内に書き込まれたファイルを監視するモジュールである。アプリケーションの状況監視のためにログファイルを活用しているアプリケーションは多く、そうしたログを監視対象にすることで、アプリケーション開発者の意図にあった監視が可能になる。ログファイル・エージェント４０２が収集した情報は、システム・エージェント４０１と同様に、定期的に又は所定のトリガ等に応じて非定期的にコンピュータリソース制御システム１に通知される。 A monitoring agent 30 for measuring monitoring data is incorporated at each monitoring point of the monitoring target application server 40. Specifically, for example, the user installs a monitoring agent program in advance on an instance of the monitoring target application server 40. The computer resource control system 1 uses the monitoring agents 30 to manage monitoring targets in predetermined logical units called monitoring target clusters 50. In this embodiment, the monitoring agent 30 includes either a system agent 401 or a log file agent 402. The system agent 401 is a module that manages running processes. Running processes include not only the OS and middleware but also applications. The system agent 401 captures changes and behavior occurring inside a process and notifies the computer resource control system 1 of the collected data, either periodically or non-periodically in response to a predetermined trigger or the like. It also performs operations such as changing variables inside the process and calling methods inside the program. The log file agent 402 is a module that monitors files written within the monitoring target. Many applications use log files to monitor their own status, and making such logs a monitoring target enables monitoring that matches the application developer's intent. As with the system agent 401, the information collected by the log file agent 402 is notified to the computer resource control system 1 periodically or non-periodically in response to a predetermined trigger or the like.

　監視対象としては、ＯＳレベルの状況から、ＪＶＭや監視対象アプリケーションなどのミドルウェア、アプリケーションまでを一括で監視することが好ましい。監視エージェント３０は、例えば、特定サービスの利用状況、ミドルウェアの混雑度、ＣＰＵ負荷、どのサーバにジョブが割り当てられているか、各ジョブの進捗がどのようになっているか、ブラックリスト入りしたサーバはどこか、といった内容を監視できる。また、監視ポイントを動的に変更することによって、監視すべき対象を監視対象の動作状況に応じて変更してもよい。これにより、キャンペーン中にキャンペーン商品の在庫量を監視対象に加えたり、キャンペーン商品が完売したらサービス内容を切り替えるといった処理が監視対象を停止させることなく、実施可能である。 As for the monitoring target, it is preferable to monitor everything collectively, from the OS-level status through middleware such as the JVM and monitoring target applications, up to the applications themselves. The monitoring agent 30 can monitor, for example, the usage status of a specific service, the degree of middleware congestion, the CPU load, which server each job is assigned to, how each job is progressing, and which servers have been blacklisted. Further, by dynamically changing the monitoring points, what is monitored may be changed according to the operating status of the monitoring target. As a result, processing such as adding the inventory of a campaign product to the monitoring targets during a campaign, or switching the service content once the campaign product has sold out, can be carried out without stopping the monitoring target.

　メッセージキューサーバ４１は、監視対象アプリケーションサーバ４０に組み込まれた監視エージェント３０とコンピュータリソース制御システム１との間のデータ交換、及び、コンピュータリソース制御システム１内のサービス間のデータ交換を、非同期的に行うためのメッセージキューを提供する。本実施例において、監視エージェント３０と収集サーバ４２、管理サーバ４３、実行サーバ４４、及びダッシュボード・サーバ４６との間のデータ交換、並びに、収集サーバ４２、管理サーバ４３、実行サーバ４４、及びダッシュボード・サーバ４６の間のデータ交換は、メッセージキューサーバ４１内のメッセージキューを介して非同期的に行われる。ここで、データ交換とは、データの交換のみならず、タスク等の交換も含む。 The message queue server 41 provides message queues for asynchronously performing data exchange between the monitoring agent 30 incorporated in the monitoring target application server 40 and the computer resource control system 1, and data exchange between services within the computer resource control system 1. In this embodiment, data exchange between the monitoring agent 30 and the collection server 42, the management server 43, the execution server 44, and the dashboard server 46, as well as data exchange among the collection server 42, the management server 43, the execution server 44, and the dashboard server 46, is performed asynchronously via the message queues in the message queue server 41. Here, data exchange includes not only the exchange of data but also the exchange of tasks and the like.

　本実施例では、メッセージキューサーバ４１内のメッセージキューとして、データ収集キュー４１１と、データ管理キュー４１２と、管理キュー４１３と、実行キュー４１４とを含む。各キューは、データを先入れ先出し（ＦＩＦＯ：Ｆｉｒｓｔ　Ｉｎ　Ｆｉｒｓｔ　Ｏｕｔ）のリスト構造で保持することが好ましい。また、各キューは、冗長構成が可能であり、キュー間通信を行うことで、メッセージキューデータの紛失を防ぐことができる。データ収集キュー４１１は、監視対象の監視エージェント３０から収集されたデータが順次入力され、収集サーバ４２によって順次読み出される。データ管理キュー４１２は、監視対象の監視エージェント３０から収集されたデータが順次入力され、管理サーバ４３によって順次読み出される。管理キュー４１３は、管理サーバ４３、実行サーバ４４、及びダッシュボード・サーバ４６から、監視対象のサーバを制御するためのデータ（タスク等を含む）が順次入力され、制御対象の監視エージェント３０によって順次読み出される。実行キュー４１４は、管理サーバ４３から、インスタンス起動制御や警告送信などのアクションの指示が順次入力され、実行サーバ４４によって順次読み出される。 In this embodiment, the message queues in the message queue server 41 include a data collection queue 411, a data management queue 412, a management queue 413, and an execution queue 414. Each queue preferably holds data in a first-in, first-out (FIFO) list structure. Each queue can also be configured redundantly, and inter-queue communication prevents message queue data from being lost. Data collected from the monitoring agents 30 of the monitoring targets is sequentially input to the data collection queue 411 and sequentially read out by the collection server 42. Data collected from the monitoring agents 30 of the monitoring targets is likewise sequentially input to the data management queue 412 and sequentially read out by the management server 43. Data (including tasks and the like) for controlling the monitored servers is sequentially input to the management queue 413 from the management server 43, the execution server 44, and the dashboard server 46, and is sequentially read out by the monitoring agent 30 of the control target. Instructions for actions such as instance activation control and warning transmission are sequentially input to the execution queue 414 from the management server 43 and sequentially read out by the execution server 44.
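The four-queue layout described above can be illustrated with a minimal in-memory sketch. This is not the patented implementation: the class name, method names, and queue keys are hypothetical, and the redundancy and inter-queue communication mentioned in the text are deliberately left out. It only shows FIFO ordering and the fact that agent data is entered into both the data collection queue 411 and the data management queue 412.

```python
from collections import deque


class MessageQueueServer:
    """In-memory stand-in for message queue server 41 (illustrative only)."""

    def __init__(self):
        self.queues = {
            "data_collection": deque(),  # queue 411, read by the collection server
            "data_management": deque(),  # queue 412, read by the management server
            "management": deque(),       # queue 413, read by the controlled agent
            "execution": deque(),        # queue 414, read by the execution server
        }

    def publish_measurement(self, measurement):
        # Agent data is entered into both queue 411 and queue 412.
        self.queues["data_collection"].append(measurement)
        self.queues["data_management"].append(measurement)

    def put(self, name, item):
        self.queues[name].append(item)

    def get(self, name):
        # FIFO read: the oldest item leaves first; None when the queue is empty.
        queue = self.queues[name]
        return queue.popleft() if queue else None


mq = MessageQueueServer()
mq.publish_measurement({"host": "app-1", "cpu": 0.42})
mq.publish_measurement({"host": "app-2", "cpu": 0.91})
first_for_collection = mq.get("data_collection")
first_for_management = mq.get("data_management")
```

Because each consumer reads from its own queue, the collection server and the management server each see every measurement in arrival order without waiting on one another.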

　収集サーバ４２は、監視エージェント３０から送信されたデータを分散データベースに登録する処理を実行する。収集サーバ４２は、データ収集キュー４１１に入力されたデータを順次取り出して、データベースサーバ４５に渡す。 The collection server 42 executes processing for registering data transmitted from the monitoring agent 30 in the distributed database. The collection server 42 sequentially retrieves data input to the data collection queue 411 and passes it to the database server 45.

　管理サーバ４３は、監視エージェント３０から送信されたデータをもとに、予め設定された制御ルールを参照して、インスタンス起動制御や警告送信などのアクションの実行要否を判断する処理を行う。管理サーバ４３は、データ管理キュー４１２に入力されたデータを順次取り出して、予め設定された制御ルールと比較する。ここで、制御ルールは、複数の制御ルールを含み得る。個々の制御ルールは、管理対象のサーバ群等の定義、監視エージェントが収集する情報の閾値の設定、及び、閾値を超えた場合の制御内容の定義を含むことが好ましい。また、個々の制御ルールは、監視エージェントの設定内容の変更などを含んでもよい。制御ルールは、コンピュータリソース制御システム１が予めデフォルトで定義したものを利用してもよいし、利用者が予め定義してもよい。好適には、利用者が制御ルールを定義するためのルールエディタが提供される。このルールエディタは、監視対象の単位であるクラスタの制御ルールを設定可能であり、計画的な変動に対する制御、データに応じた受動的な変動に対する制御、監視エージェントの設定変更、警告設定など、状況に応じた種々の制御ルールを設定できるようになっている。制御ルールは、例えばＲｕｂｙをベースとしたドメイン特化言語（ＤＳＬ）を用いて記述できるため、直感的で分かりやすいルールで記述できる。また、グラフィカル・エディタにより制御ルールを設定できるようにしてもよく、この場合は、ＤＳＬに馴染みのない利用者でも直感的にルールを記述できる。 The management server 43 refers to the control rules set in advance based on the data transmitted from the monitoring agent 30, and performs a process of determining whether or not an action such as instance activation control or warning transmission is necessary. The management server 43 sequentially takes out the data input to the data management queue 412 and compares it with a preset control rule. Here, the control rule may include a plurality of control rules. Each control rule preferably includes a definition of a server group to be managed, a setting of a threshold value of information collected by the monitoring agent, and a definition of control contents when the threshold value is exceeded. Each control rule may include a change in the setting contents of the monitoring agent. As the control rule, a rule defined by the computer resource control system 1 in advance as a default may be used, or a user may define it in advance. Preferably, a rule editor is provided for the user to define control rules. This rule editor can set the control rules of the cluster that is the unit to be monitored, such as control for planned fluctuation, control for passive fluctuation according to data, monitoring agent setting change, warning setting, etc. Various control rules can be set according to the conditions. 
Since the control rule can be described using, for example, a domain specific language (DSL) based on Ruby, it can be described with an intuitive and easy-to-understand rule. Further, the control rule may be set by a graphical editor. In this case, even a user who is not familiar with DSL can intuitively write the rule.

　図５は、制御ルールの一例である。同図の例は、「インスタンス内部で、ペンディング・スレッドが規定以上の状態を５秒以上続けていたら、同じサーバイメージからインスタンスを３台増やしなさい。」という条件と制御内容を規定している。他にも、例えば計画的な変動に対する制御ルールの一例として、「何月何日の何時何分になったら、ここのサーバを何台にしなさい。そして、時間がきたら、サーバを元の台数に戻しなさい。」といった内容を規定できる。また、収集されたデータに基づく受動的な変動に対する制御ルールとしては、例えば、処理するデータ量に基づいて、割り当てるサーバ台数を増減させるように規定できる。また、特定のサービスはスループットを低下させたくないような場合、アプリケーションごとに制御ルールのスケール基準を変えて設定する。商品が完売したらサービス内容を切り替えたい場合、アプリケーションを監視し、システム構成を変更するようなルールを設定する。 FIG. 5 is an example of a control rule. The example in the figure defines the following condition and control content: “If, inside an instance, the pending threads have remained at or above the specified level for 5 seconds or longer, add three instances from the same server image.” As another example, a control rule for planned fluctuations can specify content such as: “At a given time on a given date, set the number of servers here to a given number. Then, when the time comes, return the number of servers to the original number.” As a control rule for passive fluctuations based on collected data, it is possible, for example, to specify that the number of allocated servers be increased or decreased based on the amount of data to be processed. When the throughput of a specific service must not drop, the scaling criteria of the control rules are set differently for each application. When the service content should be switched once a product has sold out, a rule is set that monitors the application and changes the system configuration.
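The pending-thread rule quoted above can be sketched as follows. FIG. 5 itself is not reproduced here and the actual rules are written in a Ruby-based DSL, so this Python sketch is only an illustration of the condition and control content. The class name, field names, action dictionary, and threshold value of 100 are all assumptions; only the 5-second hold time and the three added instances come from the text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SustainedThresholdRule:
    """Fires an action once a metric has stayed above a threshold long enough."""
    metric: str
    threshold: float
    hold_seconds: float
    action: dict
    exceeded_since: Optional[float] = field(default=None, init=False)

    def evaluate(self, measurement: dict, now: float) -> Optional[dict]:
        value = measurement.get(self.metric)
        if value is None or value <= self.threshold:
            self.exceeded_since = None  # condition broken: restart the timer
            return None
        if self.exceeded_since is None:
            self.exceeded_since = now   # first sample above the threshold
        if now - self.exceeded_since >= self.hold_seconds:
            self.exceeded_since = None  # re-arm after firing
            return self.action
        return None


rule = SustainedThresholdRule(
    metric="pending_threads",
    threshold=100,   # hypothetical "specified level"
    hold_seconds=5,  # from the text: 5 seconds or longer
    action={"type": "add_instances", "count": 3, "image": "same_server_image"},
)
# Samples at t = 0 s, 2 s, and 5 s, all above the threshold.
results = [rule.evaluate({"pending_threads": 150}, now=t) for t in (0, 2, 5)]
```

The rule stays silent for the first two samples and returns the action on the third, once the condition has held for the full 5 seconds.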

　なお、好適には、管理サーバ４３は、リソースの増減がパフォーマンスにどのような影響があるのかを自律的に学習し、制御の最適解を求め、制御ルールを書き換える。 Note that preferably, the management server 43 autonomously learns how the increase or decrease in resources has an effect on performance, obtains an optimal control solution, and rewrites the control rule.

　図４に戻り、管理サーバ４３は、監視エージェント３０から収集されたデータと制御ルールとを比較して、監視対象のシステム内のサーバ等のコンピュータリソースに対するアクションの要否を判断する。すなわち、監視データが制御ルールに規定された条件を満たさない場合には、管理サーバ４３は、アクションは不要であると判断する。一方、監視データが制御ルールに規定された条件を満たす場合に、管理サーバ４３は、アクションが必要であると判断し、その制御ルールに規定された制御内容のアクションを、その制御ルールに定義された管理対象のサーバ等のコンピュータリソースに対して実行する旨の指示を出力する。そして、この指示は、実行キュー４１４に入力される。 Returning to FIG. 4, the management server 43 compares the data collected from the monitoring agent 30 with the control rules and determines whether an action is required for a computer resource, such as a server, in the monitored system. That is, when the monitoring data does not satisfy the condition defined in a control rule, the management server 43 determines that no action is required. On the other hand, when the monitoring data satisfies the condition defined in a control rule, the management server 43 determines that an action is required and outputs an instruction to execute the action with the control content specified in that control rule on the computer resource, such as a managed server, defined in that control rule. This instruction is then input to the execution queue 414.
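The decision step described above can be sketched as a function that checks one measurement against each control rule and enqueues an instruction only when a rule's condition holds. The rule representation (a dictionary with `target`, `condition`, and `action`), the function name, and the example thresholds are hypothetical and chosen only for illustration.

```python
from collections import deque


def decide_actions(measurement, control_rules, execution_queue):
    """Enqueue one instruction per rule whose condition the measurement satisfies."""
    issued = 0
    for rule in control_rules:
        if rule["condition"](measurement):
            # Action required: pass the rule's control content and target on
            # to the execution queue (queue 414 in the description).
            execution_queue.append({"target": rule["target"], "action": rule["action"]})
            issued += 1
    return issued  # 0 means no action was required


control_rules = [
    {"target": "web-cluster",
     "condition": lambda m: m.get("cpu_load", 0) > 0.8,
     "action": "start_instance"},
    {"target": "web-cluster",
     "condition": lambda m: m.get("error_count", 0) > 10,
     "action": "send_warning"},
]

execution_queue = deque()
quiet = decide_actions({"cpu_load": 0.3, "error_count": 0}, control_rules, execution_queue)
busy = decide_actions({"cpu_load": 0.95, "error_count": 0}, control_rules, execution_queue)
```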

　実行サーバ４４は、インスタンスの起動や停止といった具体的なアクションを実行する処理を行う。実行サーバ４４は実行キュー４１４から、アクションの指示を順次読み出し、指示に応じて、制御ルールで規定された所定のコンピュータリソース（監視対象アプリケーションサーバ４０を含む）に対して、各種の制御を実行する。制御の内容としては、システムレベルからクラウドレベルまでの幅広い制御レベルに対応していることが好ましい。システムレベルの制御の一例としては、アプリケーションの特定機能のメソッド呼び出しや、内部変数の変更などがある。クラウドレベルでの制御の一例としては、インスタンスの起動・複製・停止、割り当てリソースの変更、起動インスタンスの設定変更などがある。つまり、仮想サーバの起動・複製・消去や仮想サーバの設定変更などを行うことができる。 The execution server 44 performs processing for executing specific actions such as starting and stopping instances. The execution server 44 sequentially reads action instructions from the execution queue 414 and, in accordance with each instruction, executes various kinds of control on the predetermined computer resources (including the monitoring target application server 40) specified by the control rule. The control preferably covers a wide range of control levels, from the system level to the cloud level. Examples of system-level control include calling a method of a specific function of an application and changing internal variables. Examples of cloud-level control include starting, replicating, and stopping instances, changing allocated resources, and changing the settings of running instances. In other words, virtual servers can be started, replicated, and deleted, and their settings can be changed.

　データベースサーバ４５は、監視エージェント３０によって収集されたデータを含む種々のデータを格納するデータベースである。このデータベースは、単独のデータベースサーバを前提とした仕組みではなく、複数のサーバが協調しあって性能を上げていく分散データベース構造であることが好ましい。これにより、データ量が膨大になっても、参加するサーバ台数を増やすことでキャパシティの対応ができる。また、複数サーバにレプリケーションを持たせることで、単一障害点のないデータベースが構築される。また、データベース性能劣化が予想される場合、サーバ台数を追加することによって、データベースの性能を維持することができる。 The database server 45 is a database that stores various data including data collected by the monitoring agent 30. This database is preferably not a mechanism based on a single database server but a distributed database structure in which a plurality of servers cooperate to increase performance. Thereby, even if the amount of data becomes enormous, the capacity can be accommodated by increasing the number of participating servers. In addition, by providing replication to multiple servers, a database without a single point of failure is constructed. In addition, when database performance degradation is expected, the database performance can be maintained by adding the number of servers.

　なお、データベースサーバ４５の一例として、分散ＫＶＳサーバが適用可能である。分散ＫＶＳサーバとは、保存したいデータ（Ｖａｌｕｅ）に、任意のラベル（Ｋｅｙ）を付けて、(Ｋｅｙ，Ｖａｌｕｅ)のペアを保存し、保存したデータを取得する際は、ラベル（Ｋｅｙ）を指定して、対応するデータ（Ｖａｌｕｅ）を取得するものであり、ＫＶＳとはＫｅｙ－Ｖａｌｕｅ　Ｓｔｏｒｅの略である。複数サーバにデータを分散保存するスケールアウト型であり、サーバを追加することで、大量のデータを扱うことができる。ＫＶＳサーバの一例として、ｍｏｎｇｏＤＢがある。なお、データベースサーバ４５は、ＫＶＳ方式のデータベースサーバを利用することが好ましいが、ＫＶＳに限定されるものではなく、他の方式によるデータベースサーバを用いてもよい。 As an example of the database server 45, a distributed KVS server is applicable. A distributed KVS server adds an arbitrary label (Key) to the data (Value) to be saved, saves a (Key, Value) pair, and specifies the label (Key) when retrieving the saved data Thus, corresponding data (Value) is acquired, and KVS is an abbreviation for Key-Value Store. It is a scale-out type in which data is distributed and stored on multiple servers, and a large amount of data can be handled by adding servers. As an example of the KVS server, there is mongoDB. The database server 45 is preferably a KVS database server, but is not limited to KVS, and may use a database server of another method.
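The key-value access pattern and the scale-out idea described above can be sketched with a toy store that spreads keys over several nodes. This is a simplified illustration, not how any particular product works: the class name is hypothetical, each "node" is a plain dictionary, and the modulo-based placement used here would require rebalancing when nodes are added, which real distributed stores handle with more elaborate schemes.

```python
import hashlib


class ShardedKeyValueStore:
    """Toy key-value store that spreads keys over several in-memory nodes."""

    def __init__(self, node_count):
        self.nodes = [dict() for _ in range(node_count)]

    def _node_for(self, key):
        # A stable hash of the key decides which node holds the pair.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key, default=None):
        return self._node_for(key).get(key, default)


store = ShardedKeyValueStore(node_count=3)
store.put("app-1/cpu_load", 0.42)
store.put("app-2/cpu_load", 0.91)
value = store.get("app-1/cpu_load")
missing = store.get("app-9/cpu_load")
```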

　ダッシュボード・サーバ４６は、ユーザに対して各種の情報を表示してユーザからの操作を受け付けるためのダッシュボード４８をユーザ端末装置１２に提供するサーバである。ここで、ダッシュボード４８の画面は、所定の監視項目などの情報をクライアント装置で表示するためのものであり、見た目や機能が重要である。ダッシュボード４８の画面には、システムの監視状況やジョブの実行状況などのモニタリングの他、ＤＳＬによるバッチジョブのフローやリアルタイムシステムの監視及び制御などが含まれ、利用者とコンピュータリソース制御システム１とのインターフェースとして機能する。ダッシュボード・サーバ４６は、ユーザ端末装置１２からウェブサービス経由のアクセスを受け付け、データベースサーバ４５に格納されたデータを読み出してダッシュボード４８の画面を編集し、当該画面情報をユーザ端末装置１２に送信する。運用管理者は、ダッシュボード４８でシステムの監視、構成管理、制御設定を行う。また、ダッシュボード・サーバ４６は、監視情報が予め設定された閾値を超えた際に、ダッシュボード４８上に警告を表示すると共にメール通知を行う。これにより、ユーザは効率的に監視できる。 The dashboard server 46 is a server that provides the user terminal device 12 with a dashboard 48 for displaying various kinds of information to the user and accepting operations from the user. The screen of the dashboard 48 is used to display information such as predetermined monitoring items on the client device, so its appearance and functionality are important. The screen of the dashboard 48 includes monitoring of the system monitoring status and job execution status, as well as batch job flows written in the DSL and monitoring and control of real-time systems, and it functions as the interface between the user and the computer resource control system 1. The dashboard server 46 accepts access from the user terminal device 12 via a web service, reads the data stored in the database server 45, composes the screen of the dashboard 48, and transmits the screen information to the user terminal device 12. The operations administrator performs system monitoring, configuration management, and control settings on the dashboard 48. In addition, when monitoring information exceeds a preset threshold, the dashboard server 46 displays a warning on the dashboard 48 and sends an e-mail notification. This allows the user to monitor efficiently.

　本実施例は、ダッシュボード・サーバ４６を交換することで、ユーザの要求に合わせた表示や操作を提供することが可能である。これにより、コンピュータリソース制御システムを他のシステムの一部として販売したりＯＥＭ販売したりすることが容易に実現できるようになっている。 In the present embodiment, it is possible to provide display and operation in accordance with a user request by exchanging the dashboard server 46. This makes it easy to sell the computer resource control system as part of another system or to sell it as an OEM.

　図６は、ダッシュボード４８の一例である。表示される画面の種類としては、例えば、メトリクスビュー、システム構成ビュー、ジョブネット監視ビュー、ログ監視ビュー、お知らせ一覧などがある。メトリクスビューは、監視エージェント３０から送信されているデータをリアルタイムで監視するための画面である。計測項目に応じたグラフが表示され、当該グラフはリアルタイムで更新される。また、過去のデータが表示されてもよい。システム構成ビューは、監視対象のシステム構成を俯瞰的に監視する画面である。各サーバの稼働状況の表示、サーバ内で稼働するプロセスの稼働状況の表示、サーバ間でのプロセス依存関係の表示等を行う。サーバリソースが閾値を超えた場合や、アプリケーションでエラーが発生した場合には、システム構成ビューで検知できるように表示される。ジョブネット監視ビューは、コンピュータリソース制御システム１が管理しているバッチジョブネットの実行状況を監視する画面である。実行状況に応じてアイコンの色を変化させ、視覚的に実行状況を示す。 FIG. 6 is an example of the dashboard 48. Examples of the displayed screen include a metrics view, a system configuration view, a job net monitoring view, a log monitoring view, and a notification list. The metrics view is a screen for monitoring data transmitted from the monitoring agent 30 in real time. A graph corresponding to the measurement item is displayed, and the graph is updated in real time. In addition, past data may be displayed. The system configuration view is a screen for monitoring the system configuration to be monitored from a bird's-eye view. It displays the operating status of each server, displays the operating status of processes running in the server, displays process dependencies between servers, and so on. When the server resource exceeds the threshold or an error occurs in the application, it is displayed so that it can be detected in the system configuration view. The job net monitoring view is a screen for monitoring the execution status of the batch job net managed by the computer resource control system 1. The icon color is changed according to the execution status to visually indicate the execution status.

　ログ監視ビューは、ログファイルの出力内容を監視し、監視すべきものとしてヒットした箇所を閲覧する画面である。アプリケーションエラーの検知やバッチジョブの進行状況把握のために使用される。ダッシュボード４８で何をどのように表示させるかは、利用者が自由に設定可能である。これにより、例えば、経営者は経営者の視点からのコンソール機能を、システム管理者はシステム運用上の監視制御コンソールなど、利用者のニーズに合わせたユーザインターフェースを実現できる。 The log monitoring view is a screen that monitors the output contents of the log file and browses the hit locations that should be monitored. Used for application error detection and batch job progress monitoring. The user can freely set what and how the dashboard 48 displays. Thus, for example, the manager can realize a console function from the manager's viewpoint, and the system administrator can realize a user interface that meets the needs of the user, such as a monitoring control console for system operation.

　図７は、本発明によるコンピュータリソース制御システム１の他の実施例を示すブロック図である。同図に示す実施例の構成は、コンピュータリソース制御システム１内のサーバに監視エージェント３０が組み込まれている他は、図４とほぼ同じである。本実施例では、コンピュータリソース制御システム１内の各サーバは、同じサーバイメージを有する複数の仮想サーバによって分散化されている。つまり、コンピュータリソース制御システム１は、複数のメッセージキューサーバ４１と、複数の収集サーバ４２と、複数の管理サーバ４３と、複数の実行サーバ４４と、複数のデータベースサーバ４５と、複数のダッシュボード・サーバ４６とを含む。ただし、障害発生時などの場合に、同じサーバイメージを有するサーバの台数が一時的に１つになることはあり得る。 FIG. 7 is a block diagram showing another embodiment of the computer resource control system 1 according to the present invention. The configuration of the embodiment shown in the figure is almost the same as that of FIG. 4, except that monitoring agents 30 are incorporated in the servers within the computer resource control system 1. In this embodiment, each server in the computer resource control system 1 is distributed across a plurality of virtual servers having the same server image. That is, the computer resource control system 1 includes a plurality of message queue servers 41, a plurality of collection servers 42, a plurality of management servers 43, a plurality of execution servers 44, a plurality of database servers 45, and a plurality of dashboard servers 46. However, the number of servers having the same server image may temporarily drop to one, for example when a failure occurs.

　コンピュータリソース制御システム１内のサーバに組み込まれた監視エージェント３０は、監視対象アプリケーションサーバ４０に組み込まれた監視エージェント３０と同様に、監視対象サーバから収集した監視データをメッセージキューサーバ４１のデータ収集キュー４１１とデータ管理キュー４１２に入力する。以降の処理は、図４と同じである。すなわち、管理サーバ４３は、監視対象アプリケーションサーバ４０に組み込まれた監視エージェント３０から収集されたデータと同様に、コンピュータリソース制御システム１内のサーバに組み込まれた監視エージェント３０から収集されたデータに対して、予め定義された複数の制御ルールを参照して、コンピュータリソースに対するアクションの要否を判断する。そして、クラスタ毎のデータ量や処理量の変動に応じて、実行サーバ４４がコンピュータリソース制御システム１内のコンピュータリソースを制御する。一例として、実行サーバ４４がコンピュータリソース制御システム１内の各サーバの投入台数を増減させるなどの処理を実行することによって、最適なシステム構成が保持される。 Similar to the monitoring agent 30 incorporated in the monitoring target application server 40, the monitoring agent 30 incorporated in the server in the computer resource control system 1 collects the monitoring data collected from the monitoring target server in the data collection queue of the message queue server 41. 411 and the data management queue 412. The subsequent processing is the same as in FIG. In other words, the management server 43 performs the same processing on the data collected from the monitoring agent 30 incorporated in the server in the computer resource control system 1 as the data collected from the monitoring agent 30 incorporated in the monitoring target application server 40. Then, the necessity of action for the computer resource is determined with reference to a plurality of predefined control rules. Then, the execution server 44 controls computer resources in the computer resource control system 1 in accordance with fluctuations in the data amount and processing amount for each cluster. As an example, the execution server 44 executes processing such as increasing / decreasing the number of each server in the computer resource control system 1 to maintain an optimal system configuration.

　例えば、メッセージキューサーバ４１に組み込まれた監視エージェント３０は、メッセージキューサーバ４１内の各キュー、すなわち、データ収集キュー４１１、データ管理キュー４１２、管理キュー４１３、及び実行キュー４１４に投入されたデータ量や待ち行列の量を監視する。さらに、メッセージキューサーバ４１と、収集サーバ４２と、管理サーバ４３と、実行サーバ４４と、データベースサーバ４５と、ダッシュボード・サーバ４６のそれぞれに組み込まれた別の監視エージェント３０は、それぞれ各サーバの稼働状況を監視する。 For example, the monitoring agent 30 incorporated in the message queue server 41 monitors the amount of data input to, and the amount waiting in, each queue in the message queue server 41, namely the data collection queue 411, the data management queue 412, the management queue 413, and the execution queue 414. Further, separate monitoring agents 30 incorporated in each of the message queue server 41, the collection server 42, the management server 43, the execution server 44, the database server 45, and the dashboard server 46 monitor the operating status of their respective servers.

　一方、制御ルールには、データ収集キュー４１１の状態応じて収集サーバ４２のサーバ台数を増減させるためのルールが定義される。例えば、データ収集キュー４１１の待ち行列の量が所定の閾値を超えた場合には、インスタンス起動制御、すなわち、収集サーバ４２のレプリケーション（複製）を所定個数作成して、仮想サーバの数を増加させる、という条件と制御内容が定義される。待ち行列の量が所定の閾値以下になった場合には、インスタンス停止制御、すなわち、収集サーバ４２のレプリケーションを所定個数破棄（削除）して、仮想サーバの数を減らす、という条件と制御内容が定義される。他のキューに対して、同じような制御ルールが定義され、例えば、データ管理キュー４１２の状態に応じて管理サーバ４３のサーバ台数を増減させるためのルールが定義される。すなわち、データ管理キュー４１２の待ち行列の量が所定の閾値を超えた場合には、管理サーバ４３のレプリケーションを所定個数作成して、仮想サーバの数を増加させる一方、待ち行列の量が所定の閾値以下になった場合には、管理サーバ４３のレプリケーションを所定個数破棄して、仮想サーバの数を減らす、という条件と制御内容が定義される。また、実行キュー４１４の状態に応じて実行サーバ４４のサーバ台数を増減させるためのルールが定義される。すなわち、実行キュー４１４の待ち行列の量が所定の閾値を超えた場合には、実行サーバ４４のレプリケーションを所定個数作成して、仮想サーバの数を増加させる一方、待ち行列の量が所定の閾値以下になった場合には、実行サーバ４４のレプリケーションを所定個数破棄して、仮想サーバの数を減らす、という条件と制御内容が定義される。さらに、管理キュー４１３の状態に応じて任意のサーバのサーバ台数を増減させるためのルールが定義される。すなわち、管理キュー４１３の待ち行列のうち、ある特定のサーバに対するアクションの待ち行列の量が所定の閾値を超えた場合には、当該特定のサーバのレプリケーションを所定個数作成して、仮想サーバの数を増加させる一方、待ち行列の量が所定の閾値以下になった場合には、その特定のサーバのレプリケーションを所定個数破棄して、仮想サーバの数を減らす、という条件と制御内容が定義される。また、メッセージキューサーバ４１内のキュー全体の状況に応じて、メッセージキューサーバ４１のサーバ台数を増減させるためのルールが定義されてもよい。 On the other hand, in the control rule, a rule for increasing or decreasing the number of servers of the collection server 42 according to the state of the data collection queue 411 is defined. For example, when the queue amount of the data collection queue 411 exceeds a predetermined threshold value, the instance activation control, that is, the replication (replication) of the collection server 42 is created to increase the number of virtual servers. And the control content are defined. When the amount of the queue is equal to or less than the predetermined threshold, the instance stop control, that is, the condition and the control content that the predetermined number of replications of the collection server 42 are discarded (deleted) and the number of virtual servers is reduced. Defined. Similar control rules are defined for other queues. 
For example, a rule for increasing or decreasing the number of management servers 43 according to the state of the data management queue 412 is defined. That is, a condition and control content are defined such that, when the amount queued in the data management queue 412 exceeds a predetermined threshold, a predetermined number of replications of the management server 43 are created to increase the number of virtual servers, whereas when the amount queued falls to or below the predetermined threshold, a predetermined number of replications of the management server 43 are discarded to reduce the number of virtual servers. A rule for increasing or decreasing the number of execution servers 44 according to the state of the execution queue 414 is also defined. That is, a condition and control content are defined such that, when the amount queued in the execution queue 414 exceeds a predetermined threshold, a predetermined number of replications of the execution server 44 are created to increase the number of virtual servers, whereas when the amount queued falls to or below the predetermined threshold, a predetermined number of replications of the execution server 44 are discarded to reduce the number of virtual servers. Furthermore, a rule for increasing or decreasing the number of any given server according to the state of the management queue 413 is defined. That is, a condition and control content are defined such that, when the amount of queued actions directed at a particular server within the management queue 413 exceeds a predetermined threshold, a predetermined number of replications of that server are created to increase the number of virtual servers, whereas when the amount queued falls to or below the predetermined threshold, a predetermined number of replications of that server are discarded to reduce the number of virtual servers. 
A rule for increasing or decreasing the number of message queue servers 41 may be defined according to the status of the entire queue in the message queue server 41.
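The queue-driven scaling rules above share one shape: compare a queue's backlog with a threshold, then add or discard a fixed number of replications of the server that reads that queue. A minimal sketch of that shape follows. The function name, parameter names, and numeric values are hypothetical, and the floor of one replica is an assumption added here so that the sketch never scales a server type to zero. A literal reading of the rule can also oscillate around the threshold; a practical system might use separate scale-up and scale-down thresholds, which the text does not specify.

```python
def plan_replica_count(queue_depth, current_replicas, threshold, step=1, min_replicas=1):
    """Return the replica count implied by one evaluation of a queue-depth rule."""
    if queue_depth > threshold:
        # Backlog above the threshold: create `step` more replications.
        return current_replicas + step
    # Backlog at or below the threshold: discard `step` replications,
    # but never go below the assumed floor.
    return max(min_replicas, current_replicas - step)


# Hypothetical pairing of each queue with the server type that drains it.
queue_consumers = {
    "data_collection": "collection_server",
    "data_management": "management_server",
    "execution": "execution_server",
}

scaled_up = plan_replica_count(queue_depth=120, current_replicas=2, threshold=100)
scaled_down = plan_replica_count(queue_depth=40, current_replicas=2, threshold=100)
at_floor = plan_replica_count(queue_depth=40, current_replicas=1, threshold=100)
```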

　また、他の制御ルールとして、各サーバの稼働状況に応じて、サーバのレプリケーションや破棄を動的に制御するためのルールが定義されることが好ましい。すなわち、あるサーバの稼働状況が所定の閾値を超えている場合には、そのサーバのレプリケーションを作成し、稼働状況が所定の閾値以下の場合には、そのサーバを破棄する。また、サーバが異常な挙動を示している場合には、利用者に警告を通知する。異常が直らない場合には、そのサーバを破棄して、新たにサーバのレプリケーションを作成することによって、サーバを立ち上げ直してもよい。 Also, as another control rule, it is preferable that a rule for dynamically controlling server replication or destruction is defined according to the operating status of each server. That is, when the operating status of a server exceeds a predetermined threshold, replication of the server is created, and when the operating status is equal to or lower than the predetermined threshold, the server is discarded. Further, when the server shows an abnormal behavior, a warning is notified to the user. If the problem persists, the server may be restarted by discarding the server and creating a new server replication.

　このような構成を取ることにより、コンピュータリソース制御システム１は、クラウド環境内の監視対象システムを監視する仕組みと同じ仕組みによって、コンピュータリソース制御システム１自身に含まれるコンピュータリソースを動的に制御することが可能になる。こうして、コンピュータリソース制御システム１は、監視対象アプリケーションサーバ４０等の監視対象の状況をリアルタイムに把握し、遅滞なく制御するばかりでなく、コンピュータリソース制御システム１自身の状況をリアルタイムに把握し、遅滞なく制御することができるようになる。 By adopting such a configuration, the computer resource control system 1 dynamically controls the computer resources included in the computer resource control system 1 by the same mechanism as the mechanism for monitoring the monitoring target system in the cloud environment. Is possible. Thus, the computer resource control system 1 not only grasps the status of the monitoring target such as the monitored application server 40 in real time and controls it without delay, but also grasps the status of the computer resource control system 1 itself in real time without delay. Will be able to control.

　本実施例は単一障害点のない分散構造で構成されるため、どこか単一の機能に障害が発生しても、全体としてはダウンしない構造になっている。また、計画的ないし突発的な負荷の増加にも動的に対応できる構造になっており、利用者や監視対象の増加に対して、コンピュータリソース制御システム１を構成するサーバのサーバ台数を増加させるなど、コンピュータリソースを制御することによって、サービスレベルを維持するように構成されている。 Since this embodiment has a distributed structure with no single point of failure, the system as a whole does not go down even if a failure occurs in any single function. It is also structured to respond dynamically to planned or sudden increases in load, and is configured to maintain the service level by controlling computer resources, for example by increasing the number of servers constituting the computer resource control system 1 as users and monitoring targets increase.

　なお、コンピュータリソース制御システム１は、ＡＰＩ（Ａｐｐｌｉｃａｔｉｏｎ　Ｐｒｏｇｒａｍ　Ｉｎｔｅｒｆａｃｅ）として提供されることが好ましい。 The computer resource control system 1 is preferably provided as an API (Application Program Interface).

　次に、本実施例におけるコンピュータリソース制御システム１の動作について説明する。 Next, the operation of the computer resource control system 1 in this embodiment will be described.

　図８は、コンピュータリソース制御システム１における処理のフローチャートの一例である。 FIG. 8 is an example of a flowchart of processing in the computer resource control system 1.

　まず、クラウド環境内の監視ポイントに埋め込まれた監視エージェント３０が監視データを収集してメッセージキューサーバ４１に送信する（Ｓ８１）。なお、監視エージェント３０は、定期的にまたは非定期的に監視データをメッセージキューサーバ４１に送り続ける。メッセージキューサーバ４１は、受信したデータを、データ収集キュー４１１とデータ管理キュー４１２に入れる。収集サーバ４２は、データ収集キュー４１１から監視データを順次読み出して、データベースサーバ４５のデータストアに監視データを登録する（Ｓ８２）。収集サーバ４２は、監視データの登録を終えると、次の監視データをメッセージキューから読み出して、Ｓ８２の処理を繰り返す。 First, the monitoring agent 30 embedded in the monitoring point in the cloud environment collects monitoring data and transmits it to the message queue server 41 (S81). Note that the monitoring agent 30 continues to send monitoring data to the message queue server 41 periodically or irregularly. The message queue server 41 puts the received data into the data collection queue 411 and the data management queue 412. The collection server 42 sequentially reads the monitoring data from the data collection queue 411 and registers the monitoring data in the data store of the database server 45 (S82). After completing the registration of the monitoring data, the collection server 42 reads the next monitoring data from the message queue, and repeats the process of S82.

In response to a request from a user, the dashboard server 46 creates a dashboard 48 for viewing the status of the monitoring targets and transmits it to the user terminal device 12 via the network N (S83). The user terminal device 12 displays the received dashboard 48 in a browser (S84).

In parallel with the data registration process, the management server 43 reads monitoring data from the data management queue 412, compares it with the control rules (S85), and determines whether an action on a computer resource is required (S86). If the monitoring data does not satisfy a condition defined in the control rules, the management server 43 determines that no action is required (S86: No). If the monitoring data satisfies a condition defined in the control rules, the management server 43 determines that an action is required (S86: Yes) and transmits a specific action instruction to the execution queue 414 of the message queue server 41 (S87). The management server 43 then reads monitoring data from the data management queue 412 again and repeats the series of processes from S85 to S87.
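The rule comparison of S85 to S87 can be illustrated with a short sketch. The specification does not define the format of a control rule, so the `ControlRule` class, the 80 % CPU threshold, the `start_instance` action name, and the record fields below are all assumptions chosen for illustration. `queue.Queue` again stands in for the queues 412 and 414.

```python
import queue
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ControlRule:
    """Hypothetical control rule: a condition plus the action it triggers."""
    name: str
    condition: Callable[[dict], bool]
    action: str


def evaluate(record: dict, rules: list) -> Optional[dict]:
    """S85/S86: compare one record with the rules.

    Returns an action instruction for the first rule whose condition is
    met, or None when no action is required.
    """
    for rule in rules:
        if rule.condition(record):
            return {"action": rule.action, "target": record["host"], "rule": rule.name}
    return None


def manage_once(data_management_queue, execution_queue, rules) -> bool:
    """One pass of S85 to S87. Returns False when queue 412 is empty."""
    try:
        record = data_management_queue.get_nowait()
    except queue.Empty:
        return False
    instruction = evaluate(record, rules)
    if instruction is not None:           # S86: Yes
        execution_queue.put(instruction)  # S87
    return True


# Illustrative rule (assumed): request a new instance when CPU use exceeds 80 %.
rules = [
    ControlRule(
        name="cpu-high",
        condition=lambda r: r["metric"] == "cpu" and r["value"] > 0.8,
        action="start_instance",
    )
]

data_management_queue = queue.Queue()  # 412
execution_queue = queue.Queue()        # 414
data_management_queue.put({"host": "app-1", "metric": "cpu", "value": 0.42})
data_management_queue.put({"host": "app-1", "metric": "cpu", "value": 0.91})

# The management server 43 repeats S85 to S87 until queue 412 is drained.
while manage_once(data_management_queue, execution_queue, rules):
    pass
```

Only the second sample satisfies the rule, so a single instruction reaches the execution queue. Keeping the condition as a callable is one possible design; a declarative rule table would serve equally well.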

The execution server 44 reads the action instruction from the execution queue 414 and transmits processing data for executing a specific action on the computer resource, such as starting or stopping an instance, to the management queue 413 of the message queue server 41 (S88). The execution server 44 then reads the next action instruction from the execution queue 414 and repeats the process of S88.
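A minimal sketch of S88 follows, assuming that an action instruction is a dictionary with `action` and `target` keys and that the execution server translates it through a fixed command table. Both the instruction format and the table are hypothetical; the specification states only that processing data for a concrete action, such as starting or stopping an instance, is sent to the management queue 413.

```python
import queue


def build_processing_data(instruction: dict) -> dict:
    """Turn an action instruction into processing data for the target agent.

    The command table is an assumption made for illustration only.
    """
    commands = {
        "start_instance": {"operation": "start", "count": 1},
        "stop_instance": {"operation": "stop", "count": 1},
    }
    action = instruction["action"]
    if action not in commands:
        raise ValueError(f"unknown action: {action}")
    return {"target": instruction["target"], **commands[action]}


def execute_once(execution_queue, management_queue) -> bool:
    """S88: read one instruction from queue 414 and write to queue 413.

    Returns False when the execution queue is currently empty.
    """
    try:
        instruction = execution_queue.get_nowait()
    except queue.Empty:
        return False
    management_queue.put(build_processing_data(instruction))
    return True


execution_queue = queue.Queue()   # 414
management_queue = queue.Queue()  # 413
execution_queue.put({"action": "start_instance", "target": "app-1"})

# The execution server 44 repeats S88 until queue 414 is drained.
while execute_once(execution_queue, management_queue):
    pass
```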

The processing data placed in the management queue 413 is read sequentially by the monitoring agent 30 that is the target of the action, and an action such as replicating or discarding a server is executed (S89).
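S89 can be sketched as below. The specification says only that the targeted monitoring agent reads its processing data from the management queue 413. Routing by one FIFO per target, the `replicate` and `destroy` operation names, and the instance counter that models server replication and disposal are assumptions for illustration.

```python
import queue
from collections import defaultdict


class ManagementQueue:
    """Stand-in for the management queue 413, modeled as one FIFO per target.

    Per-target routing is an assumption, not taken from the specification.
    """

    def __init__(self):
        self._queues = defaultdict(queue.Queue)

    def put(self, processing_data: dict) -> None:
        self._queues[processing_data["target"]].put(processing_data)

    def get_for(self, target: str):
        """Return the next item addressed to `target`, or None if there is none."""
        try:
            return self._queues[target].get_nowait()
        except queue.Empty:
            return None


class MonitoringAgent:
    """Hypothetical agent that applies actions to the servers it watches."""

    def __init__(self, target: str, instances: int = 1):
        self.target = target
        self.instances = instances  # number of servers currently running

    def apply_pending(self, mq: ManagementQueue) -> int:
        """S89: read this agent's processing data in order and apply it."""
        applied = 0
        while (data := mq.get_for(self.target)) is not None:
            if data["operation"] == "replicate":
                self.instances += data["count"]
            elif data["operation"] == "destroy":
                self.instances = max(0, self.instances - data["count"])
            else:
                raise ValueError(f"unknown operation: {data['operation']}")
            applied += 1
        return applied


mq = ManagementQueue()
mq.put({"target": "app-1", "operation": "replicate", "count": 2})
mq.put({"target": "app-1", "operation": "destroy", "count": 1})
mq.put({"target": "db-1", "operation": "replicate", "count": 1})

agent = MonitoringAgent("app-1", instances=1)
applied = agent.apply_pending(mq)
```

The item addressed to `db-1` stays queued until the agent responsible for that target reads it.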

The present invention is not limited to the embodiment described above and can be implemented in various other forms without departing from the gist of the invention. The above embodiment is therefore merely illustrative in every respect and is not to be interpreted restrictively. For example, the processing steps described above may be reordered arbitrarily or executed in parallel, provided that no contradiction arises in the processing content.

In the embodiment described above, the collection server 42, the management server 43, and the execution server 44 are each configured as a separate server. Needless to say, the functions of any two or all of these servers may be combined in a single server. For example, a processing server that combines the collection server 42 and the management server 43 may be provided, and this processing server may be distributed across a plurality of virtual servers having the same server image. Alternatively, a processing server that combines the management server 43 and the execution server 44 may be provided, or a processing server that combines the collection server 42, the management server 43, and the execution server 44. The monitoring agent 30 may then be incorporated in these processing servers so that the computer resource control system 1 monitors and controls the processing servers themselves.

Furthermore, in the embodiment described above, the monitoring agent 30 includes either the system agent 401 or the log file agent 402. The monitoring agent 30 is not limited to these, however, and an agent that monitors any kind of event may be used.

1 computer resource control system, 10 cloud, 12 user terminal device, 20 computer device group, 30 monitoring agent, 32 message network, 34 processing unit, 40 monitored application server, 41 message queue server, 42 collection server, 43 management server, 44 execution server, 45 database server, 46 dashboard server, 48 dashboard, 50 monitored cluster, N network

Claims

A computer resource control system for controlling computer resources, the computer resource control system comprising:
a processing unit that determines, based on data collected from a monitoring agent, whether control of the computer resource is required and, when it determines that control of the computer resource is required, outputs an instruction to execute the control on the computer resource;
a message queue for exchanging data between the monitoring agent and the processing unit; and
a first monitoring agent that monitors the status of the message queue,
wherein the processing unit determines whether control of the computer resource is required based on data collected from the first monitoring agent.

The computer resource control system according to claim 1,
wherein the control of the computer resource includes starting, copying, stopping, or changing the settings of any computer resource in the cloud.

The computer resource control system according to claim 1,
wherein the control of the computer resource includes a process of increasing or decreasing the number of arbitrary servers in the cloud.

The computer resource control system according to any one of claims 1 to 3,
wherein the processing unit compares the data collected from the monitoring agent with a predefined control rule to determine whether control of the computer resource is required.

The computer resource control system according to claim 1, further comprising a message queue server,
wherein the message queue server comprises the message queue and the first monitoring agent.

A computer resource control system for controlling computer resources, the computer resource control system comprising:
a management server that compares data collected from a plurality of monitoring agents with a predefined control rule to determine whether an action on the computer resource is required;
an execution server that outputs an instruction to execute the action on the computer resource when the management server determines that an action on the computer resource is required; and
a message queue server that asynchronously exchanges data among the monitoring agents, the management server, and the execution server, the message queue server comprising a first monitoring agent that monitors the status of the message queue server,
wherein the action includes a process of increasing or decreasing the number of servers included in the computer resource control system based on data collected from the first monitoring agent.

The computer resource control system according to claim 6,
wherein at least one of the message queue server, the management server, and the execution server is configured as a virtual server, and
the action includes a process of increasing or decreasing the number of virtual servers included in the computer resource control system based on data collected from the first monitoring agent.

The computer resource control system according to claim 6 or 7,
wherein the first monitoring agent monitors the status of data exchange.

The computer resource control system according to claim 6,
wherein the message queue server comprises:
a data management queue into which the data collected from the plurality of monitoring agents is sequentially input and from which the data is sequentially read by the management server;
an execution queue into which action instructions from the management server are sequentially input and from which the action instructions are sequentially read by the execution server; and
a management queue into which processing data for executing an action on the computer resource is sequentially input from the execution server and from which the processing data is sequentially read by the corresponding monitoring agent, and
wherein the first monitoring agent monitors the data management queue, the execution queue, and the management queue.

The computer resource control system according to claim 6,
wherein each server in the computer resource control system includes a second monitoring agent that monitors the operating status of that server, and
the management server determines whether an action on the computer resource is required based on the operating status of each server.

The computer resource control system according to claim 6, further comprising:
a database server that stores the data;
a collection server that reads the data collected from the plurality of monitoring agents from the message queue server and registers the data in the database server; and
a dashboard server that reads and edits the data stored in the database server and transmits the data to a user terminal device.

A computer resource control method in which a processing device provided in a control system for controlling computer resources executes the steps of:
determining whether control of the computer resource is required based on data collected from a monitoring agent;
outputting an instruction to execute the control on the computer resource when it is determined that control of the computer resource is required; and
exchanging data between the monitoring agent and the control system via a message queue,
wherein a first monitoring agent monitors the status of the message queue, and
the determining step determines whether control of the computer resource is required based on data collected from the first monitoring agent.

The computer resource control method according to claim 12,
wherein the control of the computer resource includes starting, copying, stopping, or changing the settings of any computer resource in the cloud.

The computer resource control method according to claim 12,
wherein the control of the computer resource includes a process of increasing or decreasing the number of arbitrary servers in the cloud.

The computer resource control method according to claim 12,
wherein the determining step determines whether control of the computer resource is required by comparing the data collected from the monitoring agent with a predefined control rule.

A computer resource control method in which a processing device provided in a control system for controlling computer resources executes the steps of:
comparing data collected from a monitoring agent with a predefined control rule to determine whether an action on the computer resource is required;
outputting an instruction to execute the action on the computer resource when it is determined that an action on the computer resource is required; and
exchanging data asynchronously between the monitoring agent and the control system,
wherein a first monitoring agent monitors the status of the server that performs the exchanging step, and
the action includes a process of increasing or decreasing the number of servers included in the control system based on data collected from the first monitoring agent.

A computer-readable recording medium storing a program for causing a computer to execute the computer resource control method according to any one of claims 12 to 16.