WO2016183967A1 - Fault alarm method and apparatus for a key component, and big data management system - Google Patents
Fault alarm method and apparatus for a key component, and big data management system
- Publication number
- WO2016183967A1 (application PCT/CN2015/089361)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- alarm information
- fault
- node
- level
- standby
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
- H04L41/0668—Management of faults, events, alarms or notifications using network fault recovery by dynamic selection of recovery network elements, e.g. replacement by the most appropriate element after failure
Definitions
- This document relates to, but is not limited to, the field of big data management systems, and in particular to a fault alarm method and device for key components, and a big data management system.
- Taobao.com (www.taobao.com): the amount of data generated per day is more than 50 TB (1 TB is equal to 1000 GB), and the storage capacity is 40 PB (1 PB is equal to 1000 TB).
- Baidu's current total data is close to 1000PB, and the number of stored web pages is close to 1 trillion pages. About 6 billion search requests and dozens of PB data are processed every day. According to monitoring, this speed will continue until 2020.
- Big data has 4V features: large data volume (Volume), multiple data categories (Variety), fast data processing speed (Velocity), and high data veracity (Veracity). Among them:
- the amount of data is large: large data sets are generally on the order of 10 TB, and data at the PB level or above is currently generally considered big data.
- the data processing speed is fast: even with a very large amount of data, real-time processing can be achieved, which requires very fast data processing and I/O.
- Big data allows us to obtain products and services of great value, or deep insights, in an unprecedented way, by analyzing massive amounts of data, and ultimately forming the power of change.
- Many industries have big data needs, such as the telecommunications industry, the Internet industry and other industries that are prone to generate large amounts of data, such as medicine, education, mining, electricity and other traditional industries. Data from different industries have their own characteristics, and they need to combine their own industry knowledge to convert big data into value.
- the embodiment of the invention provides a fault alarming method, device and big data management system for key components, which solves the problem that key component alarms cannot be processed in time.
- a fault alarming method for a key component is provided, which is applied to a big data management system, and the method includes:
- the first alarm information is generated according to the state information after the switching between the primary node and the standby node;
- the first alarm information and the first fault solution are output.
- the step of generating the first alarm information according to the state information after the switching between the master node and the standby node includes:
- the first level alarm information is generated when only one of the status information after the switching between the master node and the standby node is active.
- the second level alarm information is generated when the state information of the master node and the standby node are both active or not active.
- the level of the second level alarm information is higher than the level of the first level alarm information.
- the step of acquiring the first fault solution corresponding to the first alarm information according to the corresponding relationship between the alarm information and the fault solution includes:
- the fault solution for automatically recovering the fault is obtained;
- the fault solution that needs to manually recover the fault is obtained.
- the step of generating the first level alarm information when only one of the master node and the standby node is in the active state after the switchover includes:
- the first-level first sub-alarm information is generated when the primary node is switched from the active state to the standby state and the standby node is switched from the standby state to the active state;
- the first-level second sub-alarm information is generated when the primary node is down or out of service and the standby node is switched from the standby state to the active state.
- the step of generating the second level alarm information when the master node and the standby node are both active or both not active includes:
- the second-level first sub-alarm information is generated when the primary node is still in the active state and the standby node is switched from the standby state to the active state;
- the second-level second sub-alarm information is generated when the primary node is switched from the active state to the standby state and the standby node is still in the standby state;
- the second-level third sub-alarm information is generated when the primary node is down or out of service and the standby node is still in the standby state.
- the method further includes:
- the alarm log is generated according to the first alarm information and the first fault solution, and the alarm log is saved locally.
- the step of outputting the first alarm information and the first fault solution includes:
- the first alarm information and the first fault solution are output to a display device of the big data management system for display.
- a fault alarming device for a key component is provided, which is applied to a big data management system, including:
- the generating module is configured to generate the first alarm information according to the state information after the switching between the master node and the standby node, when the master node of the key component of the big data management system is faulty;
- the obtaining module is configured to acquire a first fault solution corresponding to the first alarm information according to the corresponding relationship between the alarm information and the fault solution;
- the output module is configured to output the first alarm information and the first fault solution.
- the generating module includes:
- the first generating unit is configured to generate first level alarm information when only one of the status information after the switching between the primary node and the standby node is in an active state;
- the second generating unit is configured to generate the second level alarm information when the master node and the standby node are both active or both not active after the switching, wherein the level of the second level alarm information is higher than the level of the first level alarm information.
- the obtaining module includes:
- the first obtaining unit is configured to obtain a fault solution for automatically recovering the fault when the first alarm information is the first level alarm information;
- the second obtaining unit is configured to obtain a fault solution that needs to manually recover the fault when the first alarm information is the second level alarm information.
- the first generating unit includes:
- the first generation sub-unit is configured to generate the first-level first sub-alarm information when the primary node is switched from the active state to the standby state, and the standby node is switched from the standby state to the active state;
- the second generation sub-unit is configured to generate the first-level second sub-alarm information when the primary node is down or out of service, and the standby node is switched from the standby state to the active state.
- the second generating unit includes:
- a third generation sub-unit configured to generate a second-level first sub-alarm information when the primary node is still in an active state, and the standby node is switched from the standby state to the active state;
- a fourth generation sub-unit configured to generate a second-level second sub-alarm information when the primary node is switched from the active state to the standby state, and the standby node is still in the standby state;
- the fifth generation sub-unit is configured to generate the second-level third sub-alarm information when the primary node is down or out of service, and the standby node is still in the standby state.
- the fault alarm device of the key component further includes:
- the storage module is configured to generate an alarm log according to the first alarm information and the first fault solution, and to save the alarm log locally.
- the output module includes:
- a first output unit configured to send the first alarm information and the first fault solution to the mobile terminal pre-bound with the server where the key component is located;
- the second output unit is configured to output the first alarm information and the first fault solution to the display device of the big data management system for display.
- a big data management system comprising a critical component fault alarm device as described above.
- the embodiment of the present invention provides a fault alarming method and device for a key component, and a big data management system, which generate corresponding alarm information when the master and standby nodes associated with the high availability of a key component are switched, obtain the solution corresponding to the alarm information, and output it together with the alarm information.
- the embodiment of the present invention can timely solve the problem of failure, and can timely discover and process the cluster problem to ensure high availability and reliability of the big data management system.
- FIG. 1 is a schematic flow chart showing a fault alarming method for a key component of an embodiment of the present invention
- FIG. 2 is a block diagram showing a fault alarm device of a key component of an embodiment of the present invention
- FIG. 3 is a schematic diagram showing the composition of a generating module in a fault alarm device of a key component according to an embodiment of the present invention
- FIG. 4 is a schematic diagram showing the composition of an acquisition module in a fault alarm device of a key component according to an embodiment of the present invention
- FIG. 5 is a schematic diagram showing the composition of an output module in a fault alarm device of a key component according to an embodiment of the present invention
- Fig. 6 is a block diagram showing the fault alarming device of another key component of the embodiment of the present invention.
- YARN (Yet Another Resource Negotiator)
- the YARN master node includes: a resource manager ResourceManager and a configuration file yarn-site.xml.
- the resource manager ResourceManager is responsible for resource management and scheduling of the entire system, and maintains the ApplicationMaster information of each application and the NodeManager information of each node.
- YARN's single point of failure refers to the resource manager ResourceManager single point problem.
- an embodiment of the present invention provides a fault alarm method for a key component, which is applied to a big data management system and includes the following steps:
- Step 10 When the primary node of the key component of the big data management system is faulty, the first alarm information is generated according to the state information after the switching between the primary node and the standby node. The state information after the node switching includes an active state or an inactive state; inactive states include the standby state, downtime, or outage.
- the primary node and the standby node automatically switch the service state.
- the first alarm information is generated according to the state information of the active and standby nodes after the switchover.
- the first alarm information includes one or more of the following: a time when the state switch occurs, a fault name, an alarm severity, an alarm code, an IP address of the server where the YARN is located, and a current service name.
- the alarm code corresponds to the alarm level and the fault name.
- the alarm code is different, and the corresponding alarm level and fault name are different.
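The relationship between an alarm record and its alarm code can be sketched as a small lookup. This is only an illustration: the class name `AlarmInfo`, the field names, and the fault-name strings are assumptions, while the codes 001 to 005 and their split into two levels follow the five cases described further on in this description.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical table: alarm code -> (alarm level, fault name).
# The codes and levels follow the cases in the description;
# the fault-name wording is invented for this sketch.
ALARM_CODES = {
    "001": (1, "primary and standby swapped roles"),
    "002": (1, "primary stopped, standby took over"),
    "003": (2, "both nodes active"),
    "004": (2, "both nodes standby"),
    "005": (2, "primary stopped, standby did not take over"),
}


@dataclass(frozen=True)
class AlarmInfo:
    """Fields named in the description: time, code, server IP, service name."""
    switch_time: datetime
    alarm_code: str
    server_ip: str
    service_name: str

    @property
    def level(self) -> int:
        # The level is derived from the code rather than stored twice.
        return ALARM_CODES[self.alarm_code][0]

    @property
    def fault_name(self) -> str:
        return ALARM_CODES[self.alarm_code][1]


alarm = AlarmInfo(datetime(2015, 5, 19, 8, 30), "003", "192.0.2.10", "YARN")
print(alarm.level, alarm.fault_name)
```

Deriving the level and fault name from the code keeps the three values from drifting apart, which matches the statement that each code corresponds to one level and one fault name.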
- Step 20 Acquire a first fault solution corresponding to the first alarm information according to the corresponding relationship between the alarm information and the fault solution.
- Step 30 Output the first alarm information and the first fault solution.
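Steps 10 to 30 can be read as a three-stage pipeline. The sketch below is a minimal illustration under that reading; `handle_failover`, its parameters, and the stub generator are hypothetical names, not part of the described system.

```python
from typing import Callable, Dict, List, Tuple

# Node name -> state after the switchover,
# e.g. {"primary": "standby", "standby": "active"}.
States = Dict[str, str]


def handle_failover(
    states: States,
    generate: Callable[[States], str],
    solutions: Dict[str, str],
    outputs: List[Callable[[str, str], None]],
) -> Tuple[str, str]:
    """Run the three steps in order and return (alarm, solution)."""
    alarm = generate(states)                                   # Step 10
    solution = solutions.get(alarm, "no solution registered")  # Step 20
    for emit in outputs:                                       # Step 30
        emit(alarm, solution)
    return alarm, solution


# Usage with placeholder pieces: a stub generator that always reports
# code "001" and an output that just collects what it is given.
shown: List[Tuple[str, str]] = []
result = handle_failover(
    {"primary": "standby", "standby": "active"},
    generate=lambda states: "001",
    solutions={"001": "fault recovered automatically"},
    outputs=[lambda alarm, solution: shown.append((alarm, solution))],
)
print(result)
```

Passing the generator, the solution table, and the outputs as arguments keeps each step replaceable, which mirrors the separate generating, obtaining, and output modules of the device.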
- because different faults produce different alarm information, step 10 may be divided into the following scenarios:
- Scenario 1: When only one of the master node and the standby node is in the active state after the switchover, the first level alarm information is generated.
- the initial state of the primary node is the active state
- the initial state of the standby node is the standby state.
- Scenario 2: When the master node and the standby node are both in the active state or both not in the active state after the switchover, the second level alarm information is generated.
- a state switch occurs between the master node and the standby node, but after the switchover the nodes are either both active or both not active. That is, after the master/standby switch, more than one node is in the active state or no node is in the active state. In this case, YARN cannot provide services externally, so the alarm level of the second level alarm information is higher than the level of the first level alarm information;
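The rule behind Scenario 1 and Scenario 2 reduces to counting active nodes after the switchover. The following sketch assumes a three-value state model (`NodeState`) and a function name (`alarm_level`) chosen for illustration.

```python
from enum import Enum


class NodeState(Enum):
    ACTIVE = "active"
    STANDBY = "standby"
    DOWN = "down"  # stands for downtime or outage


def alarm_level(primary: NodeState, standby: NodeState) -> int:
    """Return 1 when exactly one node is active after the switchover, else 2.

    Intended to be called only after a fault on the primary node has
    triggered a switchover; it does not detect the fault itself.
    """
    active_count = sum(state is NodeState.ACTIVE for state in (primary, standby))
    return 1 if active_count == 1 else 2


print(alarm_level(NodeState.STANDBY, NodeState.ACTIVE))   # one active node
print(alarm_level(NodeState.ACTIVE, NodeState.ACTIVE))    # two active nodes
print(alarm_level(NodeState.DOWN, NodeState.STANDBY))     # no active node
```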
- step 20 includes the following scenarios:
- Scenario 3 (corresponding to scenario 1): When the first alarm information is the first level alarm information, the fault solution for automatically recovering the fault is obtained.
- the YARN can still provide normal services after the active/standby node switches state, that is, the fault can be automatically restored in this case.
- Scenario 4 (corresponding to scenario 2): When the first alarm information is the second level alarm information, obtain a fault solution that needs to manually recover the fault.
- the first alarm information is the second-level alarm information
- the YARN cannot provide the service after the active/standby node switches the state, that is, the fault cannot be automatically restored in this case, and the operation and maintenance personnel need to repair the fault manually. It is therefore necessary to obtain a first fault solution corresponding to the first alarm information, so that the operation and maintenance personnel can repair the YARN according to the prompt of the first fault solution and restore normal service.
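Scenarios 3 and 4 amount to a lookup from alarm level to a kind of solution. A minimal sketch follows; the `FaultSolution` type, the table name, and the step wording are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FaultSolution:
    manual: bool              # True when operators must repair the fault
    steps: Tuple[str, ...]    # prompt text shown to the operators


# Hypothetical correspondence between alarm level and fault solution.
SOLUTIONS_BY_LEVEL = {
    1: FaultSolution(False, ("No operator action required; service continued after the switchover.",)),
    2: FaultSolution(True, (
        "Check the state of both nodes.",
        "Repair until exactly one node is active and one is standby.",
    )),
}


def get_solution(level: int) -> FaultSolution:
    """Return the solution for an alarm level, rejecting unknown levels."""
    try:
        return SOLUTIONS_BY_LEVEL[level]
    except KeyError:
        raise ValueError(f"unknown alarm level: {level}") from None


print(get_solution(2).steps)
```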
- Scenario 1 may include the following cases:
- Case 1 When the primary node is switched from the active state to the standby state, and the standby node is switched from the standby state to the active state, the first-level first sub-alarm information is generated.
- the active and standby nodes of the YARN switch normally, and the alarm information is reported once.
- the alarm level is slight.
- the content of the fault solution can be differentiated according to the alarm code. For example, the alarm code number is 001.
- YARN can operate normally; the alarm level is low and the alarm does not need to be handled immediately. That is, after the active and standby nodes switch successfully, the YARN service still has exactly one primary node and one standby node, and the fault is recovered automatically.
- the resource manager ResourceManager process of the master node exits or goes down, and the status of the master node is stopped.
- the standby node automatically switches to the master node and reports an alarm to the big data management system.
- the alarm level is slight.
- the alarm code number is 002, indicating that the YARN can run normally.
- the alarm level is slight and does not need to be processed immediately.
- the YARN has only one master node. In this case, the YARN can provide services to the outside, and the fault can be automatically restored.
- Scenario 2 may include the following cases:
- Case 3 When the primary node is still in the active state and the standby node is switched from the standby state to the active state, the second-level first sub-alarm information is generated.
- the master node of YARN fails to switch, while the standby node switches successfully.
- there are then two master nodes, that is, both the primary and standby nodes are in the active state.
- the fault cannot be automatically recovered.
- the alarm code is 003 in this case, indicating that YARN cannot operate normally and that the maintenance personnel need to follow the steps of the solution.
- the maintenance personnel can handle the fault according to the fault solution.
- the fault can be restored only when there is only one primary node and one standby node.
- Execute a script b.sh which forces the state of one of the nodes to be switched.
- the master node of the YARN is successfully switched, and the standby node is not successfully switched.
- there are then two standby nodes, that is, both the primary and standby nodes are in the standby state.
- the fault cannot be automatically recovered.
- the alarm code is 004 in this case, indicating that YARN cannot operate normally and that maintenance personnel need to follow the steps of the solution.
- the maintenance personnel can handle the fault according to the fault solution.
- the fault is recovered only when there is exactly one primary node and one standby node.
- for a specific fault solution, refer to the following: execute a script a.sh, which forces the state of one node to be switched.
- the status of the primary node is stopped, and the standby node is not successfully switched. That is, the YARN service has only one standby node.
- the alarm severity is severe and the fault cannot be automatically restored.
- the alarm code in this case is 005, indicating that YARN cannot operate normally and that maintenance personnel need to follow the steps of the solution. The maintenance personnel can handle the fault according to the fault solution. The fault is recovered only when there is exactly one primary node and one standby node.
- for the specific fault solution, refer to the following: first check whether the firewall of the previous master node is enabled. If the firewall is turned off, check whether the ZooKeeper service is running. If the ZooKeeper service is abnormal, restore the normal operation of the service. Then run the script that starts the ResourceManager on the stopped node to start that node.
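The troubleshooting order for alarm code 005 can be written as a short decision function. This is a sketch of the checklist only: the function name and return strings are hypothetical, and how an enabled firewall should be handled is an assumption, since the text states only that the firewall is checked first.

```python
def next_action(firewall_enabled: bool, zookeeper_running: bool) -> str:
    """Return the next operator action for the code-005 checklist."""
    if firewall_enabled:
        # Assumption: the description does not say what to do when the
        # firewall is enabled; this sketch asks the operator to review it.
        return "review firewall on former primary node"
    if not zookeeper_running:
        # The text says to restore the ZooKeeper service when it is abnormal.
        return "restore ZooKeeper service"
    # Last step in the text: start the ResourceManager on the stopped node.
    return "run ResourceManager start script on stopped node"


print(next_action(firewall_enabled=False, zookeeper_running=False))
print(next_action(firewall_enabled=False, zookeeper_running=True))
```

Returning one action at a time keeps the order of the checks explicit, so the start script is suggested only after the earlier checks pass.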
- YARN is a general resource management system
- it may run short or long jobs, including various long-running services (such as Storm, Thrift server, etc.). If every state switch between the primary and standby nodes causes all running tasks and jobs to be recalculated or restarted, the jobs already running on the YARN master node are re-run on the standby node that is active after the switchover. When a maximum number of switchovers is set in the YARN configuration file yarn-site.xml and the number of switchovers exceeds that value, the switchover still succeeds, but the impact on running jobs is large and the entire job needs to be resubmitted and run again from the client.
- after step 20, the method further includes:
- the alarm log is generated according to the first alarm information and the first fault solution, and the alarm log is saved locally.
- the alarm log records the first alarm information generated after the switch between the active and standby nodes, that is, the switch time of the active and standby nodes, the current service name, the IP address of the server where YARN is located, the alarm code, and information such as the fault solution. Generating and saving the alarm log makes it easy for the operation and maintenance personnel to fully grasp the alarm information of YARN, analyze the underlying causes of YARN faults at a macro level, discover hidden problems in YARN in time, and resolve them so that YARN does not fail repeatedly.
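One simple way to keep such a log locally is one JSON record per line. The sketch below assumes that format; the function names, the field names, and the temporary file location are illustrative choices, not something the description specifies.

```python
import json
import os
import tempfile
from typing import Dict, List


def append_alarm_log(path: str, alarm: Dict[str, str], solution: str) -> None:
    """Append one alarm and its fault solution as a JSON line."""
    record = dict(alarm, solution=solution)
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


def read_alarm_log(path: str) -> List[Dict[str, str]]:
    """Read all records back, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Usage: write one record to a throwaway directory and read it back.
log_path = os.path.join(tempfile.mkdtemp(), "alarm.log")
append_alarm_log(
    log_path,
    {
        "switch_time": "2015-05-19T08:30:00",
        "service_name": "YARN",
        "server_ip": "192.0.2.10",
        "alarm_code": "004",
    },
    "execute a.sh to force one node to switch state",
)
entries = read_alarm_log(log_path)
print(entries[0]["alarm_code"])
```

An append-only line format keeps earlier alarms intact, which suits the stated goal of reviewing the alarm history to find recurring problems.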
- Manner 1 The first alarm information and the first fault solution are sent to the mobile terminal pre-bound with the server where the key component is located.
- the corresponding relationship between the server where the YARN is located and the mobile phone of the corresponding operation and maintenance personnel may be bound in advance.
- the generated alarm information is sent to the mobile phone of the corresponding operation and maintenance personnel, for example, by short message.
- Manner 2 The first alarm information and the first fault solution are output to a display device of the big data management system for display.
- the alarm information and the corresponding fault solution are output to the fault display device of the system for display, so that the operation and maintenance personnel can conveniently recover the fault by viewing the prompt.
- Manner 1 and Manner 2 are not mutually exclusive. If necessary, both manners can be used for output.
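The two output manners can be modelled as interchangeable sinks that receive the same alarm. In this sketch the lists stand in for a real short-message gateway and a real display device; all names, the binding table, and the phone number are placeholders.

```python
from typing import Callable, Dict, List

# A sink receives (server_ip, alarm, solution).
Sink = Callable[[str, str, str], None]


def make_sms_sink(bindings: Dict[str, str], outbox: List[str]) -> Sink:
    """Manner 1: send to the phone pre-bound to the server, if any."""
    def sink(server_ip: str, alarm: str, solution: str) -> None:
        phone = bindings.get(server_ip)
        if phone is not None:
            outbox.append(f"to {phone}: {alarm} | {solution}")
    return sink


def make_display_sink(screen: List[str]) -> Sink:
    """Manner 2: show the alarm on the management system's display."""
    def sink(server_ip: str, alarm: str, solution: str) -> None:
        screen.append(f"[{server_ip}] {alarm} | {solution}")
    return sink


def output_alarm(sinks: List[Sink], server_ip: str, alarm: str, solution: str) -> None:
    for sink in sinks:
        sink(server_ip, alarm, solution)


outbox: List[str] = []
screen: List[str] = []
sinks = [make_sms_sink({"192.0.2.10": "555-0100"}, outbox), make_display_sink(screen)]
output_alarm(sinks, "192.0.2.10", "alarm 003", "execute b.sh")
print(outbox, screen)
```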
- the embodiment of the present invention generates corresponding alarm information after the switching between the active and standby nodes related to the high availability of the key component, obtains a solution corresponding to the alarm information, and outputs the solution together with the alarm information.
- This method can solve the problem of faults in time, and can discover and deal with cluster problems in time to ensure the high availability and reliability of the big data management system.
- the embodiment of the invention further provides a computer storage medium, wherein the computer storage medium stores computer executable instructions, and the computer executable instructions are used to execute the above method.
- a fault alarming device for a key component is applied to a big data management system, including:
- the generating module 101 is configured to generate the first alarm information according to the state information of the switching between the master node and the standby node when the master node of the key component of the big data management system is faulty, and the state information after the node switching includes: an activated state or an inactive state.
- the inactive state includes a standby state, downtime, or outage;
- the obtaining module 201 is configured to acquire a first fault solution corresponding to the first alarm information according to the corresponding relationship between the alarm information and the fault solution.
- the output module 301 is configured to output the first alarm information and the first fault solution.
- the generating module 101 includes:
- the first generating unit is configured to generate first level alarm information when only one of the status information after the switching between the primary node and the standby node is in an active state;
- the second generating unit is configured to generate the second level alarm information when the primary node and the standby node are both active or both not active after the switching, wherein the level of the second level alarm information is higher than the level of the first level alarm information.
- the first generating unit includes:
- the first generation sub-unit is configured to generate the first-level first sub-alarm information when the primary node is switched from the active state to the standby state, and the standby node is switched from the standby state to the active state;
- the second generation sub-unit is configured to generate the first-level second sub-alarm information when the primary node is down or out of service, and the standby node is switched from the standby state to the active state.
- the second generating unit includes:
- a third generation sub-unit configured to generate a second-level first sub-alarm information when the primary node is still in an active state, and the standby node is switched from the standby state to the active state;
- a fourth generation sub-unit configured to generate a second-level second sub-alarm information when the primary node is switched from the active state to the standby state, and the standby node is still in the standby state;
- the fifth generation sub-unit is configured to generate the second-level third sub-alarm information when the primary node is down or out of service, and the standby node is still in the standby state.
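The five generation sub-units correspond to five combinations of node states after the switchover. A table-driven sketch is shown below; the state strings and the function name `classify` are assumptions, while the alarm codes 001 to 005 are the ones given for the five cases in the method description.

```python
from typing import Dict, Optional, Tuple

# (primary state, standby state) after the switchover
#   -> (alarm level, sub-alarm index, alarm code)
SUB_ALARMS: Dict[Tuple[str, str], Tuple[int, int, str]] = {
    ("standby", "active"): (1, 1, "001"),   # roles swapped normally
    ("down", "active"): (1, 2, "002"),      # primary down, standby took over
    ("active", "active"): (2, 1, "003"),    # both nodes active
    ("standby", "standby"): (2, 2, "004"),  # both nodes standby
    ("down", "standby"): (2, 3, "005"),     # primary down, no takeover
}


def classify(primary: str, standby: str) -> Optional[Tuple[int, int, str]]:
    """Return (level, sub-alarm index, code), or None when no case matches."""
    return SUB_ALARMS.get((primary, standby))


for states in SUB_ALARMS:
    print(states, classify(*states))
```

Keeping the cases in one table makes it easy to see that every first-level entry has exactly one active node and every second-level entry has two or none.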
- the obtaining module 201 includes:
- the first obtaining unit is configured to obtain a fault solution for automatically recovering the fault when the first alarm information is the first level alarm information;
- the second obtaining unit is configured to obtain a fault solution that needs to manually recover the fault when the first alarm information is the second level alarm information.
- the output module 301 includes:
- a first output unit configured to send the first alarm information and the first fault solution to the mobile terminal pre-bound with the server where the key component is located;
- the second output unit is configured to output the first alarm information and the first fault solution to the display device of the big data management system for display.
- the fault alarm device of the key component further includes:
- a storage module configured to generate an alarm log according to the first alarm information and the first fault solution, and to save the alarm log locally.
- the device corresponds to the above-mentioned fault alarm method for a key component. All the implementation manners in the foregoing method embodiments are applicable to the embodiment of the device, and the same technical effects can be achieved.
- a big data management system comprising a critical component fault alarm device as described above.
- all or part of the steps of the above embodiments may also be implemented by using an integrated circuit. These steps may be separately fabricated into individual integrated circuit modules, or multiple modules or steps may be fabricated into a single integrated circuit module.
- the devices/function modules/functional units in the above embodiments may be implemented by a general-purpose computing device, which may be centralized on a single computing device or distributed over a network of multiple computing devices.
- when each device/function module/functional unit in the above embodiment is implemented in the form of a software function module and sold or used as a stand-alone product, it can be stored in a computer readable storage medium.
- the above mentioned computer readable storage medium may be a read only memory, a magnetic disk or an optical disk or the like.
- the above technical solution can timely solve the problem of failure, and can timely discover and deal with the cluster problem, and ensure the high availability and reliability of the big data management system.
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Telephonic Communication Services (AREA)
Abstract
The present invention relates to a fault alarm method and apparatus for a key component, and a big data management system. The method comprises the following steps: when a primary node of a key component of a big data management system fails, generating first alarm information according to state information about the primary node and a standby node after switching; acquiring, according to a correspondence between alarm information and fault solutions, a first fault solution corresponding to the first alarm information; and outputting the first alarm information and the first fault solution.
In the technical solution, corresponding alarm information is generated after the switching of primary and standby nodes related to the high availability of a key component, and a solution corresponding to the alarm information is acquired and output together with the alarm information.
By means of the method, a fault can be handled in a timely manner, and cluster problems can be discovered and handled in time, thereby ensuring the high availability and reliability of a big data management system.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201510253928.5A CN106301823B (zh) | 2015-05-19 | 2015-05-19 | Fault alarm method and device for a key component, and big data management system |
| CN201510253928.5 | 2015-05-19 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2016183967A1 (fr) | 2016-11-24 |
Family
ID=57319234
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2015/089361 Ceased WO2016183967A1 (fr) | Fault alarm method and apparatus for a key component, and big data management system | 2015-05-19 | 2015-09-10 |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN106301823B (fr) |
| WO (1) | WO2016183967A1 (fr) |
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108733511A (zh) * | 2018-03-23 | 2018-11-02 | 成都安信思远信息技术有限公司 | Electronic data processing method based on big data |
| CN111740868A (zh) * | 2020-07-07 | 2020-10-02 | 腾讯科技(深圳)有限公司 | Alarm data processing method and apparatus, and storage medium |
| CN113645650A (zh) * | 2021-07-09 | 2021-11-12 | 三维通信股份有限公司 | Processing method, system, electronic device, and storage medium for active/standby switchover |
| CN113760607A (zh) * | 2021-08-31 | 2021-12-07 | 云尖信息技术有限公司 | Dual-BMC active/standby and data synchronization method |
| CN115499295A (zh) * | 2022-07-29 | 2022-12-20 | 浪潮通信技术有限公司 | Server fault reporting method and apparatus, electronic device, and storage medium |
| CN115705259A (zh) * | 2021-08-06 | 2023-02-17 | 中移(苏州)软件技术有限公司 | Fault handling method, related device, and storage medium |
| WO2024066346A1 (fr) * | 2022-09-27 | 2024-04-04 | 中兴通讯股份有限公司 | Alarm processing method and apparatus, storage medium, and electronic apparatus |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
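| CN107087021B (zh) * | 2017-03-30 | 2020-10-16 | 聚好看科技股份有限公司 | Method and device for determining master and slave servers |
| CN111541753B (zh) * | 2020-04-16 | 2024-02-27 | 深圳市迅雷网络技术有限公司 | Distributed storage system and method for blockchain data, computer device, and medium |
| CN111693803A (zh) * | 2020-05-26 | 2020-09-22 | 日立楼宇技术(广州)有限公司 | High/low-temperature damp-heat test system, test control method, and fault protection method |
| CN111880934A (zh) * | 2020-07-29 | 2020-11-03 | 北京浪潮数据技术有限公司 | Resource management method, apparatus, device, and readable storage medium |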
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2008177796A (ja) * | 2007-01-17 | 2008-07-31 | Fuji Electric Fa Components & Systems Co Ltd | 省配線システム、そのマスタ通信装置、そのプログラム、表示制御方法 |
| CN101662387A (zh) * | 2009-10-14 | 2010-03-03 | 中国电信股份有限公司 | 一种检测网络中计算机接入状态的系统及方法 |
| CN101674195A (zh) * | 2009-10-13 | 2010-03-17 | 中兴通讯股份有限公司 | 主备倒换信号处理方法和装置 |
| CN101917283A (zh) * | 2010-07-22 | 2010-12-15 | 北京交通大学 | 双通道热备系统及实现双通道热备的方法 |
| CN103107904A (zh) * | 2011-11-15 | 2013-05-15 | 北京南车时代信息技术有限公司 | 一种ats系统控制中心应用服务器的双机切换方法 |
Family Cites Families (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101887387A (zh) * | 2010-04-07 | 2010-11-17 | 山东高效能服务器和存储研究院 | 一种远程智能监控与分析raid故障的方法 |
| CN102726000B (zh) * | 2011-07-22 | 2017-06-16 | 华为技术有限公司 | 故障通告方法、检测装置、转发装置及系统 |
| CN102752093B (zh) * | 2012-06-29 | 2016-02-10 | 中国联合网络通信集团有限公司 | 基于分布式文件系统的数据处理方法、设备和系统 |
| CN102882927B (zh) * | 2012-08-29 | 2016-12-21 | 华南理工大学 | 一种云存储数据同步框架及其实现方法 |
| TW201421232A (zh) * | 2012-11-19 | 2014-06-01 | Ibm | 在一冗餘群組中實施故障備援的方法、裝置與電腦程式產品 |
| CN103532753B (zh) * | 2013-10-11 | 2016-08-17 | 中国电子科技集团公司第二十八研究所 | 一种基于内存换页同步的双机热备方法 |
| CN103617231A (zh) * | 2013-11-26 | 2014-03-05 | 国家电网公司 | 大数据管理系统 |
2015
- 2015-05-19: CN application CN201510253928.5A, patent CN106301823B (zh), status Active
- 2015-09-10: WO application PCT/CN2015/089361, publication WO2016183967A1 (fr), status Ceased
Cited By (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108733511A (zh) * | 2018-03-23 | 2018-11-02 | 成都安信思远信息技术有限公司 | 一种基于大数据的电子数据处理方法 |
| CN111740868A (zh) * | 2020-07-07 | 2020-10-02 | 腾讯科技(深圳)有限公司 | 告警数据的处理方法和装置及存储介质 |
| CN111740868B (zh) * | 2020-07-07 | 2023-12-15 | 腾讯科技(深圳)有限公司 | 告警数据的处理方法和装置及存储介质 |
| CN113645650A (zh) * | 2021-07-09 | 2021-11-12 | 三维通信股份有限公司 | 主备切换的处理方法、系统、电子装置和存储介质 |
| CN113645650B (zh) * | 2021-07-09 | 2024-06-04 | 三维通信股份有限公司 | 主备切换的处理方法、系统、电子装置和存储介质 |
| CN115705259A (zh) * | 2021-08-06 | 2023-02-17 | 中移(苏州)软件技术有限公司 | 故障处理方法、相关设备及存储介质 |
| CN113760607A (zh) * | 2021-08-31 | 2021-12-07 | 云尖信息技术有限公司 | 一种双bmc主备和数据同步方法 |
| CN115499295A (zh) * | 2022-07-29 | 2022-12-20 | 浪潮通信技术有限公司 | 服务器故障上报方法、装置、电子设备及存储介质 |
| WO2024066346A1 (fr) * | 2022-09-27 | 2024-04-04 | 中兴通讯股份有限公司 | Alarm processing method and apparatus, storage medium, and electronic apparatus |
Also Published As
| Publication number | Publication date |
|---|---|
| CN106301823B (zh) | 2020-12-18 |
| CN106301823A (zh) | 2017-01-04 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| WO2016183967A1 (fr) | Failure alarm method and apparatus for a key component, and big data management system | |
| US10078564B2 (en) | Preventing split-brain scenario in a high-availability cluster | |
| US9450700B1 (en) | Efficient network fleet monitoring | |
| US9189348B2 (en) | High availability database management system and database management method using same | |
| CN111917846A (zh) | 一种Kafka集群切换方法、装置、系统、电子设备及可读存储介质 | |
| CN106156318B (zh) | 一种实现多节点数据库高可用的系统及方法 | |
| WO2020062211A1 (fr) | Method and system for a tamper-proof mimic-storage log integrated with blockchain technology | |
| CN104679604A (zh) | 一种主节点和备节点切换的方法和装置 | |
| CN104408071A (zh) | 一种基于集群管理器的分布式数据库高可用方法及系统 | |
| CN112217847B (zh) | 微服务平台及其实现方法、电子设备及存储介质 | |
| CN102394914A (zh) | 集群脑裂处理方法和装置 | |
| CN105243004A (zh) | 一种故障资源检测方法及装置 | |
| CN113190620B (zh) | Redis集群之间数据的同步方法、装置、设备及存储介质 | |
| CN111198662A (zh) | 一种数据存储方法、装置和计算机可读存储介质 | |
| CN104468537A (zh) | 实现安全审计的系统及方法 | |
| CN104360918B (zh) | 一种智能变电站系统自诊断与自恢复方法 | |
| CN106453504A (zh) | 一种基于nginx服务器集群的监控系统及方法 | |
| CN116302352A (zh) | 集群灾备处理方法、装置、电子设备和存储介质 | |
| CN107528703B (zh) | 一种用于管理分布式系统中节点设备的方法与设备 | |
| CN110704281A (zh) | 一种监控系统运行的方法 | |
| CN119105913B (zh) | 数据备份与恢复方法、装置、存储介质及计算机设备 | |
| CN112463514A (zh) | 分布式缓存集群的监测方法和装置 | |
| CN114301763B (zh) | 分布式集群故障的处理方法及系统、电子设备及存储介质 | |
| CN115499444B (zh) | 跨集群负载均衡的方法、装置、设备和存储介质 | |
| WO2019241199A1 (fr) | System and method for predictive maintenance of networked devices | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 15892345; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 15892345; Country of ref document: EP; Kind code of ref document: A1 |