KR20030072103A

KR20030072103A - System and method for providing fault information

Info

Publication number: KR20030072103A
Application number: KR1020020011670A
Authority: KR
Inventors: 윤기헌
Original assignee: Samsung Electronics Co., Ltd.
Priority date: 2002-03-05
Filing date: 2002-03-05
Publication date: 2003-09-13
Anticipated expiration: 2022-03-05
Also published as: KR100443914B1

Abstract

A method according to the present invention for providing fault information generated in an exchange system comprises: diagnosing fault occurrence information delivered from the exchange system, classifying it by predetermined time slot, and temporarily storing it as fault statistics information; updating the fault statistics information at regular intervals; backing up the updated fault statistics information by time slot at regular intervals to generate fault history information; and, upon request, providing the information for the requested time slot from the updated fault statistics information or the fault history information. By managing real-time and cumulative statistical data on exchange system faults in this way, the current fault status of the system can be identified in real time, and the fault status for any desired time slot can be viewed through various GUI (Graphical User Interface) forms such as numerical values and graphs, so the system can be managed efficiently.

Description

Fault information providing system and method thereof

The present invention relates to a fault information providing system and method. More specifically, it relates to a fault information providing system and method for efficient management of diagnostic data, which strengthen the statistics functions that were weak in management tools such as the MAP (Maintenance Administration PC) and RMAP (Remote Maintenance Administration PC) for the Inforex System, so that a system administrator can identify and respond to system fault situations more conveniently.

The Inforex system has a highly reliable structure because it adopts a distributed-control, distributed-database architecture and duplicates the critical parts of the system. It can be expanded up to 15,360 lines in shelf and node units, and it accommodates a large memory, which gives flexibility for functional expansion. It is a forward-looking integrated information and communication system equipped with access functions for various data communication networks (voice, data, video, and so on) following the commercialization of ISDN.

FIG. 1 is a schematic block diagram of a system for fault handling in a conventional Inforex system. As shown, when a fault occurs in the system, the diagnostic task (1), which has been diagnosing the system, detects it and sends a message reporting the fault to the I/O process module (IO Process Module, hereinafter IPM) (2). The IPM is a card responsible for input and output. It stores the message received from the task (1) on the hard disk (3) as fault-related data, and the fault history is queried through a system management program such as the MAP (4).

Thus the conventional Inforex system provides a fault history for system faults, but it cannot provide statistical data because it has no module that efficiently processes the diagnostic data into statistics. In other words, there is no intermediate module that lets the user extract and view the desired information from the fault messages.

Consequently, the system administrator at an actual site cannot obtain the information needed to respond appropriately to system faults, which makes it difficult to manage those faults efficiently.

The present invention was devised to solve this problem of the prior art. Its object is to provide a fault information providing system and method that efficiently classify the system's fault diagnosis messages and store them in a database so that the fault history can be viewed by time slot.

FIG. 1 is a schematic block diagram of a conventional fault information providing system.

FIG. 2 is a block diagram of a fault management server equipped with the fault information providing system according to the present invention.

FIG. 3 is an internal block diagram and message flow diagram of the diagnostic manager module shown in FIG. 2.

* Description of reference numerals for the main parts of the drawings *

1: Diagnostic task
2: IPM (IO Process Module)
3: Hard disk drive (HDD)
4: MAP
10: IAP
11: Diagnostic task
20: I/O server
21: IPM
30: Fault management server
31: Data collection module
32: Diagnostic manager module
32a: Initialization processing module
32b: Message processing unit
32c: DB management processing unit
33: DB
33a: Log DB
33b: SUM DB
34: Working area (WA)
34a: Current working area (CURR_WA)
34b: Cumulative working area (SUM_WA)
35: Dynamic library module
36: Data reporter module
40: Web browser
41: User screen

To achieve this object, an apparatus according to the present invention for handling faults generated in an exchange system comprises: a first storage module that accumulates and stores fault history information classified by predetermined time slot according to fault occurrence information delivered from the exchange system; a second storage module that stores, for a certain period, fault statistics information classified by predetermined time slot according to the fault occurrence information delivered from the exchange system; a diagnostic manager module that diagnoses the fault occurrence information delivered from the exchange system, accumulates and stores it by time in the first storage module, and at regular intervals accumulates and stores the statistics information held in the second storage module into the first storage module as fault history information; a dynamic library module for calling the real-time statistics information stored in the second storage module; and a data reporter module that, when a query request for a desired time slot or a request for statistics is received from outside, reads through the dynamic library module the statistics information for that time slot stored in the second storage module, reads the fault history information stored by time slot in the first storage module, and provides them to a user in a web environment.

Further, a method according to the present invention for providing fault information generated in an exchange system comprises: diagnosing fault occurrence information delivered from the exchange system, classifying it by predetermined time slot, and temporarily storing it as fault statistics information; updating the fault statistics information at regular intervals; backing up the updated fault statistics information by time slot at regular intervals to generate fault history information; and, upon request, providing the information for the requested time slot from the updated fault statistics information or the fault history information.

Hereinafter, the present invention is described in detail with reference to the accompanying drawings.

FIG. 2 is a block diagram of a fault management server equipped with the fault information providing system according to the present invention. With reference to FIG. 2, the relationships among the IAP, the I/O server, and the modules within the fault management server (Admin Server) are described below.

The fault management server (30) is connected to the IAP (10), which hosts the diagnostic task (11) that diagnoses faults, to the I/O server (20), and to the user's web browser environment (40).

IAP (10) stands for Integrated Access Platform and denotes the exchange system. It includes the diagnostic task (11), which checks whether a fault has occurred in the exchange and, if so, reports it. The message reporting the fault uses the AFS message format, where AFS stands for Alarm Fault Status.

When the I/O server (20) receives a message from the diagnostic task (11) of the IAP (10) reporting that a fault has occurred in the network, it forwards the message to the fault management server (30) for fault handling. The I/O server (20) houses the IPM (21), a card responsible for input and output.

The fault management server (30) consists of: a data collection module (31) that collects fault data through the IPM (21) of the I/O server (20); a diagnostic manager module (32) that receives the fault data from the data collection module (31) and performs fault handling; a DB (33) that stores various fault data under the control of the diagnostic manager module (32); a working area (34) that stores real-time statistics on the faults that have occurred; a dynamic library module (35) for delivering the fault data stored in the working area to the outside in response to an external monitoring request; and a data reporter module (36) that reports the fault data stored in the working area (34) to the outside through the dynamic library module (35).

The data collection module (31) receives the fault data delivered through the I/O server (20) and passes it to the diagnostic manager module (32).

The diagnostic manager module (32) may consist of an initialization module (Initializer) (32a), a message processing module (Message Handler) (32b), and a DB management processing module (DM Handler) (32c).

FIG. 3 shows a block diagram of the diagnostic manager module of FIG. 2 and the general function of each block. The function of each block is described below with reference to FIG. 3.

The initialization module (32a) is a routine that starts the internal modules of the diagnostic manager module (32) and initializes their environment. It initializes the socket interface used to communicate with the data collection module (31), initializes the DB, initializes the SQL server interface (I/F), initializes the WAs (CURR_WA and SUM_WA), and initializes the message queue used for communication between the internal modules of the diagnostic manager module. It also requests configuration information messages describing the system, for example the state of the system's nodes, shelves, cards, and ports, and processes the responses.

The message processing module (32b) is a routine module that buffers fault messages received from the IAP system (10) in an internal queue, then reads and analyzes them. That is, it receives from the data collection module POD messages, which carry the fault message information, and configuration information messages (config messages). A POD message is a periodic message, that is, one that occurs periodically.

A POD message received from the data collection module (31) has the AFS message format. The record layout of an AFS message may differ from the layout of the log records handled by the DB management processing module (32c), so only the items that the DB management processing module (32c) needs must be extracted from the records received from the data collection module (31). To do this, the message processing module (32b) buffers each received message in its internal queue, reads the message, analyzes what kind of message it is and what it contains, and passes the result to the DB management processing module (32c).
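The buffering and extraction step can be sketched in Python as follows. This is only an illustration under stated assumptions: the source does not specify the AFS record layout, so the dict-shaped message and the field names `timestamp`, `kind`, `node`, and `shelf` are hypothetical, as are the `LogRecord` and `MessageHandler` names.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class LogRecord:
    """Hypothetical log record: only the fields the DB handler is assumed to need."""
    timestamp: int  # assumed: seconds since some epoch
    kind: str       # assumed: e.g. "ALARM" or "FAULT"
    node: int
    shelf: int


def to_log_record(afs: dict) -> LogRecord:
    """Extract the needed fields from an AFS message and drop everything else."""
    return LogRecord(afs["timestamp"], afs["kind"], afs["node"], afs["shelf"])


class MessageHandler:
    """Buffers incoming messages in an internal queue, then converts them in order."""

    def __init__(self) -> None:
        self.queue = deque()

    def receive(self, afs: dict) -> None:
        self.queue.append(afs)

    def drain(self) -> list:
        records = []
        while self.queue:
            records.append(to_log_record(self.queue.popleft()))
        return records
```

Fields that are present in the incoming message but not named in `LogRecord` are simply ignored, which mirrors the idea of keeping only what the DB management processing module needs.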

The DB management processing module (32c) is the routine that actually processes the fault messages analyzed by the message processing module (32b). Based on them, it changes the fault counters and the contents of the diagnostic DB.

The DB (33) consists of a log DB (LOG DB) (33a), which stores the history of faults, and a SUM DB (33b), which stores fault history information classified at 15-minute intervals.

The log DB (33a) is used to query the detailed fault history for each unit time slot. Fault messages (AFS messages) received from the system are classified, composed into log records, and then stored.

The SUM DB (33b) is used to count, for each unit time slot, the number of fault occurrences, the number of fault recoveries, and the number of remaining faults. It stores the contents of SUM_WA (34a) every 15 minutes. Only the Node, Shelf, and Status DBs are stored in the SUM DB (33b).
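A per-slot statistics table of this kind might look like the following Python sketch. It uses an in-memory SQLite database purely as a stand-in for the SQL server mentioned in the text, and the table name `sum_db`, its columns, and the helper names are assumptions made for illustration, not the schema of the patent.

```python
import sqlite3

# In-memory SQLite stands in for the SQL server (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE sum_db (
           slot_start INTEGER,  -- start of the 15-minute slot, in seconds
           node       INTEGER,
           shelf      INTEGER,
           occurred   INTEGER,  -- fault occurrences in the slot
           recovered  INTEGER,  -- fault recoveries in the slot
           remaining  INTEGER,  -- faults still outstanding at the end of the slot
           PRIMARY KEY (slot_start, node, shelf)
       )"""
)


def store_slot(conn, slot_start, rows):
    """Write one slot; rows are (node, shelf, occurred, recovered, remaining)."""
    conn.executemany(
        "INSERT INTO sum_db VALUES (?, ?, ?, ?, ?, ?)",
        [(slot_start, *row) for row in rows],
    )
    conn.commit()


def slot_totals(conn, slot_start):
    """Return system-wide (occurred, recovered, remaining) for one slot."""
    return conn.execute(
        "SELECT SUM(occurred), SUM(recovered), SUM(remaining) "
        "FROM sum_db WHERE slot_start = ?",
        (slot_start,),
    ).fetchone()
```

Keying rows by slot start plus node and shelf keeps one row per unit per 15-minute slot, so a query for a given time slot is a simple equality filter.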

The WA (34) is a region within the workspace in which data used by the database management system (DBMS) is stored. The user working area contains storage for every data item of the subschema invoked by the application program.

The WA (34) may consist of SUM_WA (34a) and CURR_WA (34b). A "fault recovery time" field may also be added to SUM_WA (34a) and CURR_WA (34b) so that recent fault recoveries can be distinguished during monitoring.

SUM_WA (34a) stores fault information per node and shelf. It is a WA kept temporarily in main memory in order to build the 15-minute SUM DB (33b) records, and it is written to the SUM DB (33b) periodically. During real-time monitoring it is used to show the cumulative statistics within the current period.

CURR_WA (34b) shows the current fault state per card and port and is used for monitoring the current status. This WA operates on a 15-minute cycle and is not stored in the DB.

The WA (working area) (34) managed by the diagnostic manager module (32) is implemented in shared memory. Because the data reporter module (36), which presents the diagnostic data to users and administrators, is written in ASP, the dynamic library module (35) is provided as a COM DLL so that the ASP module can access the diagnostic WA. COM DLL stands for Component Object Model Dynamic Link Library, a dynamic link library that provides a data access method serving as a data interface between different modules.

When a request arrives from outside, the data reporter module (36) accesses the WA (34) through the COM DLL, reads the relevant data, and displays the diagnostic data to the user or administrator. It may be written in ASP so that it can be reached over the Internet in a web environment.
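The choice between live statistics and stored history when answering a request can be sketched as below. This is a simplified Python illustration, not the ASP and COM DLL mechanism of the text: `sum_wa` and `sum_db` are plain dictionaries standing in for the shared-memory working area and the SUM DB, and the function names are hypothetical.

```python
SLOT_SECONDS = 15 * 60  # 15-minute slot length from the text


def slot_start(timestamp: int) -> int:
    """Start of the 15-minute slot containing the timestamp (in seconds)."""
    return timestamp - timestamp % SLOT_SECONDS


def report(requested_ts: int, now_ts: int, sum_wa: dict, sum_db: dict) -> dict:
    """Return statistics for the slot containing requested_ts.

    The slot still in progress is served from the working area;
    earlier slots are served from the stored history.
    """
    slot = slot_start(requested_ts)
    if slot == slot_start(now_ts):
        return dict(sum_wa)        # live statistics, copied for the caller
    return sum_db.get(slot, {})    # stored statistics, empty if none exist
```

Copying the working area before returning it keeps the caller from mutating the live counters, which matters if the reporter and the diagnostic manager share the same memory.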

The operation of the fault information providing system configured as described above is as follows.

When the diagnostic manager module (32) starts, the initialization processing module (32a) requests a configuration information message (Config Message) from the data collection module (31) and processes the response. In doing so it individually requests and receives the system's configuration information (node, shelf, card, and port information) and updates the WA (34) with that configuration.

When a fault occurs in the IAP system (10), the diagnostic task (11) of the IAP sends a diagnostic message about the fault in the AFS message format. The message processing module (32b) classifies the data as a configuration (config) item or a log item and calls the processing module for that message type (the initialization processing module or the DB management processing module), which changes the value of the corresponding fault counter.

Because a fault condition can persist across a 15-minute boundary, the list of such faults is managed in a queue. The current fault status is managed in 15-minute units (the WA and the SUM DB are both managed per 15 minutes), so the WA is cleared once 15 minutes have elapsed. Without further handling, information about faults still in progress would be lost after each 15-minute unit, and the current fault status could not be determined correctly. Therefore, each time the WA is saved at a 15-minute boundary, faults that are still in progress are kept in a linked-list queue with the status value "continuing".

For an alarm occurrence message, the alarm is stored in order of occurrence time in the alarm list (AlarmQlist), which is a linked list, and the AFS code is analyzed to update the related values in SUM_WA and CURR_WA.

For a fault occurrence message, the fault is stored in order of occurrence time in the fault list (FaultQlist), which is a linked list, and the AFS code is analyzed to update the related values in SUM_WA and CURR_WA.

For an alarm clear (Alarm Clear) message, the corresponding alarm is removed from the alarm list (AlarmQlist), and the AFS code is analyzed to update the related values in SUM_WA and CURR_WA.

For a status (Status) message that is a fault recovery (Fault Recovery) message, the corresponding fault is removed from the fault list (FaultQlist), and the AFS code is analyzed to update the related values in SUM_WA (34a) and CURR_WA (34b). For other status messages, the AFS code is analyzed and the values in SUM_WA (34a) and CURR_WA (34b) are updated.
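The four message cases above can be sketched as a small Python dispatcher. Everything specific here is an assumption for illustration: the message is a dict with hypothetical keys `id`, `kind`, `node`, `shelf`, `card`, and `port`, the kind strings are invented, and the counters kept in `sum_wa` are a guess at what "related values" might include. Plain dictionaries, which preserve insertion order, stand in for the linked lists, on the assumption that messages arrive in order of occurrence.

```python
from collections import defaultdict


class WorkingAreas:
    """Toy model of AlarmQlist, FaultQlist, SUM_WA and CURR_WA."""

    def __init__(self) -> None:
        self.alarm_qlist = {}   # pending alarms, keyed by message id
        self.fault_qlist = {}   # pending faults, keyed by message id
        # SUM_WA: per (node, shelf) counters for the current 15-minute slot
        self.sum_wa = defaultdict(lambda: {"occurred": 0, "recovered": 0})
        # CURR_WA: last reported state per (node, shelf, card, port)
        self.curr_wa = {}

    def handle(self, msg: dict) -> None:
        unit = (msg["node"], msg["shelf"])
        port = unit + (msg["card"], msg["port"])
        kind = msg["kind"]
        if kind == "ALARM":
            self.alarm_qlist[msg["id"]] = msg
            self.sum_wa[unit]["occurred"] += 1
        elif kind == "FAULT":
            self.fault_qlist[msg["id"]] = msg
            self.sum_wa[unit]["occurred"] += 1
        elif kind == "ALARM_CLEAR":
            self.alarm_qlist.pop(msg["id"], None)
            self.sum_wa[unit]["recovered"] += 1
        elif kind == "FAULT_RECOVERY":
            self.fault_qlist.pop(msg["id"], None)
            self.sum_wa[unit]["recovered"] += 1
        # Every message, including other status messages, updates CURR_WA.
        self.curr_wa[port] = kind
```

Whatever remains in the two pending lists at a slot boundary is exactly the set of alarms and faults that are still in progress.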

A timer with a 15-minute interval is created. Every 15 minutes the contents of SUM_WA (34a) are stored in the SUM DB (33b) and SUM_WA (34a) is reset. In addition, the alarms and faults remaining in the alarm list (AlarmQlist) and the fault list (FaultQlist) are stored again in the log DB (33a), with their state recorded as the type "continuing" (Alarm Continue / Fault Continue). The purpose is to maintain the history of an alarm or fault that is not cleared within its 15-minute interval and persists into the next one, so that the log history can be searched more simply regardless of when the alarm first occurred.
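The 15-minute flush can be sketched in Python as follows. The data shapes are assumptions for illustration: `sum_wa` maps a unit to a dict of counters, `sum_db` is a dict keyed by boundary time, `log_db` is a list of tuples, and stamping the "continue" records with the boundary time is a guess, since the source does not say which timestamp they carry.

```python
def flush_slot(boundary, sum_wa, alarm_qlist, fault_qlist, sum_db, log_db):
    """Run at each 15-minute boundary (boundary is the slot-change time)."""
    # 1. Persist a snapshot of SUM_WA, then reset it for the next slot.
    sum_db[boundary] = {unit: dict(counters) for unit, counters in sum_wa.items()}
    sum_wa.clear()
    # 2. Re-log everything still pending so each slot's log is self-contained.
    for alarm_id in alarm_qlist:
        log_db.append((boundary, alarm_id, "Alarm Continue"))
    for fault_id in fault_qlist:
        log_db.append((boundary, fault_id, "Fault Continue"))
```

Because each slot's log then contains a record for every fault that was active in it, a query for one time slot does not need to scan earlier slots to find faults that began before it.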

When the first diagnostic AFS message is received after the diagnostic manager module (32) starts, or when the first AFS message of a 15-minute slot is received after SUM_WA (34a) has been written to the SUM DB (33b), the time stamp of that message is compared and a (15-α)-minute timer is registered, so that a timeout message is generated at each 15-minute mark of the hour. This ensures that SUM_WA (34a) is written to the SUM DB (33b) every 15 minutes.
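One reading of the (15-α)-minute timer is that α is the time already elapsed in the current 15-minute slot when the first message arrives. Under that assumption, which the source does not state explicitly, the delay can be computed as in this Python sketch. Timestamps are taken to be whole seconds counted from an origin that falls on a quarter-hour boundary, and the function name is hypothetical.

```python
SLOT_SECONDS = 15 * 60  # 15-minute slot length from the text


def first_timer_delay(first_msg_timestamp: int) -> int:
    """Seconds to wait so the first timeout lands on the next 15-minute mark.

    alpha is the part of the current slot that has already elapsed
    (an interpretation of the alpha in the text, not a stated definition).
    """
    alpha = first_msg_timestamp % SLOT_SECONDS
    return SLOT_SECONDS - alpha
```

For example, a first message stamped 7 minutes into a slot yields an 8-minute delay, after which timeouts can simply recur every 15 minutes.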

According to the present invention, real-time and cumulative statistical data on IAP system faults are managed, so the current fault status of the system can be identified in real time, and the fault status for any desired time slot can be viewed through various GUI (Graphical User Interface) forms such as numerical values and graphs. The system can therefore be managed efficiently.

In addition, because the occurrence time of a persisting fault is immediately available, the system administrator can manage the fault history more conveniently when a major system fault (such as a system restart) occurs.

Claims

A system for providing fault information generated in an exchange system, comprising:

a first storage module that accumulates and stores fault history information classified by predetermined time slot according to fault occurrence information delivered from the exchange system;

a second storage module that stores, for a predetermined time, fault statistics information classified by predetermined time slot according to the fault occurrence information delivered from the exchange system;

a diagnostic manager module that diagnoses the fault occurrence information delivered from the exchange system, accumulates and stores it by time in the first storage module, and at regular intervals accumulates and stores the statistics information held in the second storage module into the first storage module as fault history information;

a dynamic library module for calling the real-time statistics information stored in the second storage module; and

a data reporter module that, when a query request for a desired time slot or a request for statistics is received, reads through the dynamic library module the statistics information for that time slot stored in the second storage module and the corresponding fault history information stored by time slot in the first storage module, and provides them to a user in a web environment.

The system of claim 1, wherein the first storage module comprises:

a first DB that sequentially accumulates the fault occurrence information delivered from the exchange system and stores it as a fault history; and

a second DB that stores the fault history with the fault occurrence information delivered from the exchange system classified by predetermined time slot.

The system of claim 2, wherein the first storage module further comprises:

a queue that holds, as a fault list, the faults still outstanding after a predetermined time has elapsed, the outstanding faults being stored again in the first DB at predetermined time intervals until they are recovered.

The system of claim 1, wherein the second storage module comprises:

a first working area that temporarily stores fault statistics information for the current time slot during real-time monitoring; and

a second working area that is temporarily maintained in main memory for a predetermined period in order to build fault history information for that period, and is then backed up to the first storage module.

The system of claim 1, wherein the diagnostic manager module comprises:

an initialization module that initializes the socket interface for receiving fault information from the exchange system, the internal DB, the second storage module, and the message queue for the interface between the internal modules of the diagnostic manager module, and that requests system configuration information from the exchange system;

a message processing module that interprets the system configuration information delivered from the exchange system in response to the request of the initialization module and the fault occurrence messages delivered from the exchange system; and

a DB management processing module that, according to the fault occurrence information interpreted by the message processing module, generates fault history information for each predetermined time and stores it in the first storage module, updates the fault statistics information of the second storage module for each predetermined time, and backs up the updated per-slot statistics information of the second storage module to the first storage module.

The system of claim 1, wherein the fault statistics information

comprises at least one of the number of fault occurrences, the number of fault recoveries, and the number of remaining faults for each unit time slot.

The system of claim 1, wherein the fault statistics information

distinguishes total faults, at the card level and above, from partial faults, at the port level and below, and is stored in a table from which it can be determined, by level, whether to display a fault status, so that a partial fault occurring at the port level is not displayed as a fault status of the upper levels.

A method for providing fault information generated in an exchange system, comprising:

diagnosing fault occurrence information delivered from the exchange system and temporarily storing it as fault statistics information classified by time slot according to the fault occurrence time;

updating the fault statistics information at regular intervals;

generating fault history information by backing up the updated fault statistics information by time slot at regular intervals; and

upon request, providing the information for the requested time slot from the updated fault statistics information or the fault history information.

The method of claim 8, wherein the information for the requested time slot is provided through a web environment.

The method of claim 9, wherein the web environment supports a graphical user interface that presents numerical values or graphs.