
WO2002003522A1 - System and method for performing unattended system availability tests and maintenance for server service programs - Google Patents

Info

Publication number
WO2002003522A1
Authority
WO
WIPO (PCT)
Prior art keywords
program
server service
service program
server
properly running
Prior art date
Application number
PCT/US2001/020774
Other languages
English (en)
Inventor
Timothy Haggerty
Original Assignee
Ge Financial Assurance Holdings, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ge Financial Assurance Holdings, Inc.
Priority to AU2001271645A
Publication of WO2002003522A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0817 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0659 Management of faults, events, alarms or notifications using network fault recovery by isolating or reconfiguring faulty entities
    • H04L41/0661 Management of faults, events, alarms or notifications using network fault recovery by isolating or reconfiguring faulty entities by reconfiguring faulty entities
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/50 Network service management, e.g. ensuring proper service fulfilment according to agreements
    • H04L41/5003 Managing SLA; Interaction between SLA and QoS
    • H04L41/5009 Determining service level performance parameters or violations of service level contracts, e.g. violations of agreed response time or mean time between failures [MTBF]
    • H04L41/5012 Determining service level performance parameters or violations of service level contracts, e.g. violations of agreed response time or mean time between failures [MTBF] determining service availability, e.g. which services are available at a certain point in time
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/008 Reliability or availability analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/1438 Restarting or rejuvenating
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 Display of status information
    • G06F11/327 Alarm or error message display

Definitions

  • The present invention relates generally to a system and method for performing unattended system availability tests and maintenance on server service programs.
  • The invention relates to a system and method for providing a notification of a status of a server service program, its related components and sub-components, and for performing maintenance on the same.
  • Server service programs are server-based applications which are designed to be accessed by a plurality of users or "clients" on a networked system. Each server service program may provide and perform a wide variety of tasks for the use of the clients within a network. For instance, a specific server service program may provide a service of retrieving and analyzing business data collected from a plurality of databases throughout a network. Increasingly, server service programs are designed to take advantage of a wider availability of larger amounts of computing resources in order to provide more sophisticated services for clients. Today, a single server service program may have access to and rely on several other servers and server service programs. A server service program of a particular company may, for example, use several databases and servers to coordinate the data for all of the company's sales activities.
  • Server service programs have become increasingly difficult to monitor and maintain.
  • One particular reason for this is that any one server service program may rely on many different sub-components to function properly. In turn, each of these sub-components may lie within a different layer of network architecture and use a different set of network interfaces.
  • A status of a server service program may be measured solely by whether such server service program as a whole is continuing to operate (i.e., application level monitoring), although many of the server service program's sub-components may no longer be properly responding.
  • This type of application level monitoring has many drawbacks. First, using this type of application level monitoring, a network administrator is often unaware of a problem until a server service program unsuccessfully attempts to access a failed subcomponent. Secondly, using this type of application level monitoring, it is often impossible to determine the exact cause of a failure. Consequently, many times an entire server service program has to be restarted in order to restart operation of a single failed sub-component, thereby resulting in an increase in administrative time, network resources and other inefficiencies.
  • The system and method of the present invention are advantageous because they provide for monitoring of each component of a server service program and for one or more notifications of a status of the server service program.
  • The present invention relates to a method for performing an unattended system availability test and maintenance on a server service program incorporating at least one task and having access to at least one complimentary program via at least one link.
  • The method comprises the steps of determining whether the at least one link is active; determining whether the at least one complimentary program is properly running; determining whether the server service program is properly running; and determining whether the at least one task within the server service program is properly running.
  • The present invention relates to a system for performing an unattended system availability test and maintenance on a server service program incorporating at least one task and having access to at least one complimentary program via at least one link.
  • The system comprises a first testing element for determining whether the at least one link is active; a second testing element for determining whether the at least one complimentary program is properly running; a third testing element for determining whether the server service program is properly running; and a fourth testing element for determining whether the at least one task within the server service program is properly running.
  • Figure 1 is a simplified schematic representation illustrating one example of a computer network configuration for use with one embodiment of the present invention.
  • Figure 2 is a simplified flowchart of a method for performing a plurality of unattended system availability tests and maintenance for a server service program in accordance with one embodiment of the present invention.
  • FIG. 1 illustrates an example of a network arrangement 50 employing a system and method of the present invention in accordance with a preferred embodiment of the invention. It should be understood that the present invention operates independent of any particular arrangement or mix of network components and that network 50 depicted in Figure 1 is purely illustrative and simplified for the purpose of explanation.
  • Network 50 comprises a client 10, an application server 12, a database server 18, and a gateway server 22.
  • Application server 12 includes a processor module 16, a service control manager (SCM) 15, a server service program 14 and a maintenance program 40.
  • Gateway server 22 includes a gateway service program 21.
  • Database server 18 includes a database program 19.
  • Server service program 14 gains access to database 18 via a link 30 and to gateway server 22 via a link 34.
  • Client 10 gains access to the database 18 via a link 26.
  • Application server 12 is operated using the Windows NT™ operating system.
  • Server service program 14 may be any of a variety of server service programs which rely on a plurality of supporting connections to a plurality of other programs and databases in order to provide services to a plurality of clients.
  • Server service program 14 is preferably a sales program such as Siebel Sales Enterprise™ which relies on access to a plurality of sales information from across a business entity's computer network.
  • Server service program 14 may rely on a variety of programs, databases and sub-components (hereafter collectively referred to as complimentary programs) for providing service to client 10. These complimentary programs may include any programs or functions relied upon by the server service program 14 including, for example, those programs performed by a printer server, a web server, a mail server, a database server, a file server, a proxy server or an application server.
  • server service program 14 is shown in Figure 1 relying on database program 19 within database server 18, and gateway service program 21 within gateway server 22. Using these inputs, server service program 14 may provide client 10, for instance, with a plurality of sales analyses and other business services. However, if server service program 14 cannot access these complimentary programs 19 and 21, then server service program 14 will be unable to perform the sales analyses and other business services. Accordingly, maintenance program 40 is provided to test and maintain server service program 14 and its necessary complimentary programs 19 and 21.
  • Maintenance program 40 should be co-located with the server service program 14 and the SCM 15 so that maintenance program 40 can directly interrogate the SCM 15. Further, according to a preferred embodiment, maintenance program 40 should not have a need to interact with a user stationed at the client 10 via a graphical user interface (GUI) or another user interface means since the maintenance program 40 is intended to continually monitor the network 50 without a need to wait for the user at the client 10 to respond to a query or a message box. Accordingly, maintenance program 40 is preferably compiled as an unattended executable program so that all message classes that may require interaction via a GUI or other user interface are redirected to, for instance, a Windows NT™ event log.
  • Maintenance program 40 is preferably configured to run as a Windows NT™ service. This enables execution of the maintenance program 40 to be initiated when the network 50 starts up without a need for the maintenance program 40 to be launched by the user at the client 10. Additionally, this will allow maintenance program 40 to run in the same conditions and at the same time as any server service program it oversees.
  • Figure 2 is a flowchart illustrating the steps in the method of the present invention. As a first step 51, the maintenance program 40 is initiated and begins to run. Once initiated, in step 52, the maintenance program 40 checks the initialization file (INI file) of application server 12 for information on the configuration of the network 50 and makes a determination as to which complimentary programs are needed for proper operation of server service program 14. This determination is based upon the configuration of the network 50 and a plurality of preprogrammed user information saved within maintenance program 40.
  • In Step 54, once the necessary complimentary programs are determined, i.e., programs 19 and 21, maintenance program 40 begins conducting a plurality of maintenance checks by checking for an active link to each of these complimentary programs 19 and 21.
  • The necessary complimentary programs 19 and 21 include at least one database, and a link to this database is checked by maintenance program 40 using an Open Database Connectivity (ODBC) interface.
  • In Step 56, if the links to any of the necessary complimentary programs 19 and 21 are not active, maintenance program 40 clears all information regarding any active links and, in Step 58, checks to see whether any of the links to one of the complimentary programs 19 and 21 is unavailable due to a prescheduled downtime. If any of the links to one of the complimentary programs 19 and 21 is found to be unavailable due to a prescheduled downtime, then, in Step 60, maintenance program 40 waits a predetermined amount of time (corresponding to the prescheduled downtime) and, following the prescheduled downtime, in Step 52, maintenance program 40 proceeds to recheck the INI file without initiating any maintenance of the server service program 14 or a notification to the user.
  • the maintenance program 40 can determine that it was in the pre-scheduled downtime status for a system backup because it had previously checked for the link to complimentary program 19 and found it missing, but then in step 54 it found such link active. Moreover, the maintenance program 40 recognizes that it found the link to the complimentary program 19 missing at a time it expected to find such link missing (i.e., prescheduled downtime for backup over a weekend time period). In other words, maintenance program 40 recognizes that the loss of the link is a normal expected loss of connection within predefined time parameters and that the link is now restored. Maintenance program 40 then sends an email notification to warn that it is going to shut down the programs and the servers.
  • Maintenance program 40 then shuts down the tasks within the server service program 14 and the gateway service program 21 (if needed), and next shuts down programs 14 and 21 in order to clear a plurality of service log files. Maintenance program 40 then sends a special command to application server 12 to "restart" with a one-time startup delay. Maintenance program 40 starts at step 51 and proceeds to step 52 to check the INI file and then waits for the one-time startup delay. The server service program 14 and the gateway service program 21 are then started. Following the startup delay, the maintenance program 40 begins at step 54. (These steps are not shown in Figure 2.) It is important to note that maintenance program 40 may perform these shut down and restart steps unattended by the user. Prior to the present invention, these operations would have required user monitoring.
  • In Step 74, the maintenance program 40 initiates an email notification.
  • The email notification should be sent using an application such as SendSMTP.EXE™ produced by Greyware Automation Products, Inc. or a similar application which does not use a messaging application program interface (MAPI) for messaging.
  • The email notification application may be turned off if the user does not desire to be informed of the status of system 50.
  • In Step 62, maintenance program 40 proceeds to check each necessary complimentary program 19 and 21 for proper functioning. If any one of the necessary complimentary programs 19 and 21 is unresponsive, in Step 74, maintenance program 40 initiates an email notification to the user, goes back to Step 60 and waits a predetermined length of time, and then loops back to Step 52, and proceeds to recheck the INI file.
  • In Step 66, maintenance program 40 tests the server service program 14 and the gateway service program 21 to make sure each service program is active.
  • Both the server service program 14 and the gateway service program 21 are preferably tested using Windows NT™ service calls via the SCM 15. If either the server service program 14 or the gateway service program 21 is found to be unresponsive, in Step 74, maintenance program 40 then initiates an email notification to the user, goes back to Step 60 and waits a predetermined length of time, and then, following the predetermined length of time, loops back to Step 52 where maintenance program 40 proceeds to recheck the INI file.
  • In Step 68, maintenance program 40 proceeds to verify that each one of a plurality of tasks within the server service program 14 and the gateway service program 21 (if there are any tasks within the gateway service program 21) is properly running.
  • The checks performed in Step 68 are preferably accomplished using a disk operating system (DOS) interface.
  • The checks of server service program 14 and gateway service program 21 may include, for example, a plurality of checks of the status of any of the tasks being run by the server service program 14 and gateway service program 21, a plurality of checks for any changes or updates to server service program 14 and gateway service program 21, and a plurality of checks for any updates for tasks not currently being run by the server service program 14 and gateway service program 21. If any one of the necessary tasks of server service program 14 and gateway service program 21 is found to be running improperly, then, in Step 70, the maintenance program 40 may attempt to restart the improperly running task. In Step 72, a determination is made as to whether any one of the necessary tasks is unable to be restarted.
  • If any one of the necessary tasks cannot be restarted, then, in Step 74, maintenance program 40 initiates an email notification to the user, and then goes back to Step 60 and waits a predetermined length of time, and following the predetermined length of time, loops back to Step 52 and proceeds to recheck the INI file. If each of the failed tasks is successfully restarted, then maintenance program 40 goes to Step 60 and waits a predetermined length of time before looping back to Step 52 and proceeding to recheck the INI file.
  • The system and method of the present invention may be used in a variety of network configurations in which a server service program relies on additional complimentary program resources to serve a client.
  • The system and method of the invention are also highly flexible and can be easily modified and customized to fit specific situations.
  • The present invention may be used within network arrangements such as a local area network (LAN) including an Ethernet and a Token Ring access method, a metropolitan area network (MAN), and a wide area network (WAN).
  • Although the preferred embodiments are discussed with reference to the Windows NT™ environment, the present invention may also be used in a variety of other server platforms and operating environments such as, for example, Windows 95, 98 and 2000, Unix, OS/2 and NetWare.
  • The present invention may be used to test a variety of networking links including those based upon, for example, a Network File System (NFS); a Web NFS; a Server Message Block (SMB); Samba; a Netware Core Protocol (NCP); a Distributed File System (DFS), and a Common Internet File System (CIFS) architecture, as well as use such transport protocols as, for example, TCP/IP, IPX/SPX, HTTP and NetBEUI.
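
Step 52 above describes reading an INI file to decide which complimentary programs are needed. The description does not give the file's layout, so the following Python sketch is only an illustration under assumed names: the section prefix `complimentary.`, the keys `link` and `required`, and the function `required_programs` are all hypothetical.

```python
import configparser
from typing import Dict

# Hypothetical INI layout; the source does not specify section or key names.
SAMPLE_INI = """
[complimentary.database]
link = odbc
required = yes

[complimentary.gateway]
link = tcp
required = yes

[complimentary.print_server]
link = smb
required = no
"""


def required_programs(ini_text: str) -> Dict[str, str]:
    """Map each required complimentary program to the kind of link to test."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    prefix = "complimentary."
    needed: Dict[str, str] = {}
    for section in parser.sections():
        if not section.startswith(prefix):
            continue
        # getboolean understands yes/no, true/false, on/off and 1/0.
        if parser.getboolean(section, "required", fallback=False):
            needed[section[len(prefix):]] = parser.get(section, "link", fallback="")
    return needed


if __name__ == "__main__":
    print(required_programs(SAMPLE_INI))
```

Because the file is re-read on every pass through Step 52, a change to which programs are marked as required would take effect on the next cycle without restarting the monitor.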

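The ordering of the checks in Figure 2 (links in Step 54, complimentary programs in Step 62, service programs in Step 66, tasks in Steps 68 to 72) can be sketched as one pass of a loop. This is a minimal Python sketch, not the patented implementation: the probes are passed in as plain callables, and the names `run_cycle`, `CycleResult` and `failing` are assumptions introduced for illustration. A real monitor would back the callables with ODBC, service control manager and command-line checks, then wait (Step 60) and repeat.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Check = Callable[[str], bool]  # hypothetical probe: name -> True if healthy


@dataclass
class CycleResult:
    failed_stage: str = ""  # "" means every stage passed
    failed_items: List[str] = field(default_factory=list)
    restarted: List[str] = field(default_factory=list)
    notify: bool = False    # True when the Step 74 email would be sent


def failing(names: List[str], check: Check) -> List[str]:
    """Return the names for which the probe reports a problem."""
    return [name for name in names if not check(name)]


def run_cycle(links: List[str], programs: List[str], services: List[str],
              tasks: List[str], *, link_active: Check, program_ok: Check,
              service_ok: Check, task_ok: Check, restart_task: Check,
              in_scheduled_downtime: Check) -> CycleResult:
    # Step 54: check the link to each complimentary program.
    down = failing(links, link_active)
    if down:
        # Step 58: a loss during a prescheduled downtime is expected, so no email.
        expected = any(in_scheduled_downtime(link) for link in down)
        return CycleResult("links", down, notify=not expected)

    # Step 62: check each complimentary program for proper functioning.
    bad = failing(programs, program_ok)
    if bad:
        return CycleResult("programs", bad, notify=True)

    # Step 66: check that each service program is active.
    bad = failing(services, service_ok)
    if bad:
        return CycleResult("services", bad, notify=True)

    # Steps 68-72: check tasks, try to restart failures, report the rest.
    bad = failing(tasks, task_ok)
    restarted = [task for task in bad if restart_task(task)]
    stuck = [task for task in bad if task not in restarted]
    if stuck:
        return CycleResult("tasks", stuck, restarted, notify=True)
    return CycleResult(restarted=restarted)
```

Each stage returns early, so a later check never runs against a dependency that is already known to be down; this mirrors the way every failure branch in Figure 2 returns to Step 60 rather than continuing.
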
Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present invention relates to a system and method for performing an unattended availability test and maintenance on a server service program. The method of the invention comprises several operations, including determining whether each of the necessary links (54) is active, determining whether each of the necessary system programs (62) is running properly, and determining whether each of the necessary tasks within said server service program (68) is running properly.
PCT/US2001/020774 2000-06-29 2001-06-29 System and method for performing unattended system availability tests and maintenance for server service programs WO2002003522A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2001271645A AU2001271645A1 (en) 2000-06-29 2001-06-29 System and method for performing unattended system availability tests and maintenance for server service programs

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US60587700A 2000-06-29 2000-06-29
US09/605,877 2000-06-29

Publications (1)

Publication Number Publication Date
WO2002003522A1 (fr) 2002-01-10

Family

ID=24425566

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/020774 WO2002003522A1 (fr) System and method for performing unattended system availability tests and maintenance for server service programs 2000-06-29 2001-06-29

Country Status (2)

Country Link
AU (1) AU2001271645A1 (fr)
WO (1) WO2002003522A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8589535B2 (en) 2009-10-26 2013-11-19 Microsoft Corporation Maintaining service performance during a cloud upgrade

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5751941A (en) * 1996-04-04 1998-05-12 Hewlett-Packard Company Object oriented framework for testing software
US5854823A (en) * 1996-09-29 1998-12-29 Mci Communications Corporation System and method for providing resources to test platforms
US5987633A (en) * 1997-08-20 1999-11-16 Mci Communications Corporation System, method and article of manufacture for time point validation

Also Published As

Publication number Publication date
AU2001271645A1 (en) 2002-01-14

Similar Documents

Publication Publication Date Title
US7209963B2 (en) Apparatus and method for distributed monitoring of endpoints in a management region
US7234072B2 (en) Method and system for making an application highly available
US20040010716A1 (en) Apparatus and method for monitoring the health of systems management software components in an enterprise
US10474521B2 (en) Service directory and fault injection management systems and methods
KR100763326B1 (ko) Method and apparatus for root cause identification and problem determination in distributed systems
US20040153703A1 (en) Fault tolerant distributed computing applications
JP4426797B2 (ja) Method and apparatus for dependency-based impact simulation and vulnerability analysis
US7505872B2 (en) Methods and apparatus for impact analysis and problem determination
US20030196148A1 (en) System and method for peer-to-peer monitoring within a network
US7120684B2 (en) Method and system for central management of a computer network
US20040003266A1 (en) Non-invasive automatic offsite patch fingerprinting and updating system and method
CN101777020B (zh) Fault tolerance method and system for distributed programs
US7890616B2 (en) System and method for validation of middleware failover behavior
US7469287B1 (en) Apparatus and method for monitoring objects in a network and automatically validating events relating to the objects
US20030212788A1 (en) Generic control interface with multi-level status
US7934199B2 (en) Automated operation of IT resources with multiple choice configuration
US20090094477A1 (en) System and program product for detecting an operational risk of a node
KR20050007307A (ko) System and method for monitoring computer applications
CN116225607A (zh) Database management method and apparatus
US7206975B1 (en) Internal product fault monitoring apparatus and method
JP2003233512A (ja) Client monitoring system with maintenance function, monitoring server, program, and client monitoring and maintenance method
US6151686A (en) Managing an information retrieval problem
US8607328B1 (en) Methods and systems for automated system support
WO2002003522A1 (fr) System and method for performing unattended system availability tests and maintenance for server service programs
US9183068B1 (en) Various methods and apparatuses to restart a server

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase

Ref country code: JP