Disclosure of Invention
The invention aims to provide a monitoring system, method, apparatus, device, storage medium and computer program product for a basic input/output system, which are used to improve the boot speed of a server.
In order to solve the technical problems, the invention provides a monitoring system of a basic input/output system, which comprises a first controller and a first change-over switch;
The first controller is configured to communicate with a first bios after power-on to obtain a start-up state of the first bios, and if it is identified that an actual start-up time in a start-up stage of the first bios exceeds a first start-up time corresponding to the start-up stage, determine that the first bios fails to start, and control the first switch to switch the first bios to a second bios;
The number of the starting-up stages is multiple, and the first starting time is determined according to equipment configuration parameters corresponding to the starting-up stages.
In one aspect, the boot-up stage includes at least two of a central processor boot-up stage, a memory initialization stage, a peripheral loading stage, an operating system loader loading stage, and an operating system running stage.
In another aspect, the first controller monitors the cpu boot phase of the first bios based on the boot state, including:
Detecting a signal for powering up a central processing unit, and determining that a starting stage of the central processing unit is started;
And if the CPU reset signal sent by the CPU is detected, determining that the CPU starting stage is finished.
In another aspect, the first controller monitors the memory initialization stage of the first bios according to the start-up state, including:
detecting a reset signal of the central processing unit, and determining that the memory initialization stage begins;
If the peripheral initialization signal is detected, determining that the memory initialization stage is finished.
In another aspect, the first controller monitors the peripheral loading phase of the first bios based on the start-up status, comprising:
Detecting a peripheral initialization signal, and determining that the peripheral loading stage begins;
If the loading signal of the operating system loader is detected, determining that the peripheral loading stage is finished.
In another aspect, the first controller monitors the operating system loader loading phase of the first bios based on the boot state, comprising:
detecting an operating system loader loading signal, and determining that the operating system loader loading stage begins;
if the operating system kernel initialization signal is detected, determining that the loading stage of the operating system loader is finished.
In another aspect, the first controller monitors an operating system running phase of the first bios according to the start-up state, including:
Detecting an operating system kernel initialization signal, and determining that the operating system operation stage starts;
If the operating system is detected to finish the preset number of operating cycles, determining that the operating system operating stage is finished.
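The stage boundaries described above, in which the signal that ends one stage also begins the next, can be sketched as a lookup table. This is only an illustrative Python sketch: the signal names (`cpu_power_up`, `cpu_reset`, and so on) and the `stage_events` helper are hypothetical labels, not identifiers from the invention.

```python
from enum import Enum


class Stage(Enum):
    CPU_BOOT = "central processor boot"
    MEMORY_INIT = "memory initialization"
    PERIPHERAL_LOAD = "peripheral loading"
    OS_LOADER_LOAD = "operating system loader loading"
    OS_RUNNING = "operating system running"


# Hypothetical signal names: (signal that begins the stage, signal that ends it).
STAGE_BOUNDARIES = {
    Stage.CPU_BOOT: ("cpu_power_up", "cpu_reset"),
    Stage.MEMORY_INIT: ("cpu_reset", "peripheral_init"),
    Stage.PERIPHERAL_LOAD: ("peripheral_init", "os_loader_load"),
    Stage.OS_LOADER_LOAD: ("os_loader_load", "os_kernel_init"),
    Stage.OS_RUNNING: ("os_kernel_init", "os_cycles_done"),
}


def stage_events(signal):
    """Return (finished_stage, started_stage) for one observed signal."""
    finished = next(
        (s for s, (_, end) in STAGE_BOUNDARIES.items() if end == signal), None
    )
    started = next(
        (s for s, (begin, _) in STAGE_BOUNDARIES.items() if begin == signal), None
    )
    return finished, started
```

Under these assumptions, observing `cpu_reset` both ends the central processor boot stage and begins the memory initialization stage, so a single signal can stop one timer and start the next.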
In another aspect, the first controller obtaining the start-up state and identifying that the actual start-up time exceeds the first start-up time includes:
After the first controller determines that the first basic input and output system enters a current starting-up starting stage, monitoring output information of the first basic input and output system;
if completion information for the current boot-up stage, sent by the first basic input/output system, is not detected within the corresponding first start-up time, the first controller determines that the actual start-up time corresponding to the current boot-up stage exceeds the corresponding first start-up time.
In another aspect, the first controller monitors output information of the first basic input and output system, including:
After the first controller detects the start information of a boot-up stage sent by the first basic input/output system, it configures and starts a first timer for that boot-up stage according to the first start-up time corresponding to that boot-up stage;
and after the first controller detects the completion information of that boot-up stage sent by the first basic input/output system, it closes the first timer.
In another aspect, the first controller obtaining the start-up state and identifying that the actual start-up time exceeds the first start-up time includes:
after determining that the first basic input/output system has entered a current boot-up stage, the first controller accesses the first basic input/output system within the first start-up time corresponding to that boot-up stage to acquire the start-up state;
and if the start-up state shows that the first basic input/output system has not completed the current boot-up stage when the first start-up time is reached, the first controller determines that the actual start-up time corresponding to the current boot-up stage exceeds the corresponding first start-up time.
In another aspect, the first controller determines that the first bios enters a current boot-up start-up phase, including:
The first controller accesses the first basic input/output system and, after learning that the first basic input/output system has completed the previous boot-up stage, determines that the first basic input/output system has entered the current boot-up stage.
In another aspect, after determining that the first basic input/output system has entered the current boot-up stage, the first controller accessing the first basic input/output system within the first start-up time corresponding to that boot-up stage to acquire the start-up state includes:
After determining that the first basic input/output system has entered the current boot-up stage, the first controller multiplies the first start-up time corresponding to the current boot-up stage by a preset proportionality coefficient to obtain a second start-up time;
The first controller configures a second timer according to the second start-up time, configures a third timer according to the first start-up time, and starts the second timer and the third timer;
After the second timer expires, the first controller accesses the first basic input/output system to acquire the start-up state; if the first basic input/output system has not completed the current boot-up stage, the first controller continues waiting; if the first basic input/output system has completed the current boot-up stage, the first controller closes the third timer and determines that the first basic input/output system has entered the next boot-up stage;
And after the third timer expires, the first controller accesses the first basic input/output system to acquire the start-up state; if the first basic input/output system has not completed the current boot-up stage, the first controller determines that the actual start-up time exceeds the first start-up time; if the first basic input/output system has completed the current boot-up stage, the first controller determines that the first basic input/output system has entered the next boot-up stage.
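The two-timer polling scheme can be illustrated with a small simulation. The Python sketch below is hypothetical: the function name `poll_stage`, the outcome labels, and the assumption that the proportionality coefficient lies between 0 and 1 (so that the second timer expires before the third) are illustrative choices, not details stated in the text.

```python
def poll_stage(first_start_time, coefficient, completed_at):
    """Simulate the second and third timers for one boot-up stage.

    first_start_time: allowed seconds for the stage (third timer).
    coefficient: preset proportionality coefficient, assumed 0 < c < 1.
    completed_at: seconds after stage entry at which the stage completes,
                  or None if the basic input/output system hangs.
    Returns (outcome, decision_time_in_seconds).
    """
    second_start_time = first_start_time * coefficient  # second timer

    # Second timer expires: early poll of the start-up state.
    if completed_at is not None and completed_at <= second_start_time:
        # Stage already complete: the third timer is closed here.
        return ("next_stage", second_start_time)

    # Otherwise keep waiting until the third timer expires, then poll again.
    if completed_at is not None and completed_at <= first_start_time:
        return ("next_stage", first_start_time)
    return ("timeout", first_start_time)
```

With a 30-second limit and a coefficient of 0.5, a stage that completes after 10 seconds is recognized at the 15-second poll, one that completes after 20 seconds is recognized at 30 seconds, and a hung stage is declared timed out at 30 seconds.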
In another aspect, the first controller is a baseboard management controller;
the first controller, after being powered on, obtains a start-up state of the first basic input output system by communicating with the first basic input output system, including:
and the baseboard management controller acquires the starting state through an intelligent platform management interface command.
In another aspect, the first controller is a complex programmable logic device;
the first controller, after being powered on, obtains a start-up state of the first basic input output system by communicating with the first basic input output system, including:
The complex programmable logic device receives the information of the starting state output by the first basic input and output system through an integrated circuit bus.
In another aspect, the first controller is a complex programmable logic device;
the first controller, after being powered on, obtains a start-up state of the first basic input output system by communicating with the first basic input output system, including:
The complex programmable logic device receives the information of the starting state sent by the baseboard management controller.
In order to solve the above technical problem, the present invention further provides a monitoring method of a basic input/output system, applied to a first controller, including:
After power-on, acquiring a starting state of the first basic input output system by communicating with the first basic input output system;
If the fact that the actual starting time exceeds the first starting time corresponding to the starting stage exists in the starting stage of the first basic input output system is identified, determining that the first basic input output system fails to start;
after determining that the first basic input/output system fails to start, controlling a first switching switch to switch the first basic input/output system to a second basic input/output system;
The number of the starting-up stages is multiple, and the first starting time is determined according to equipment configuration parameters corresponding to the starting-up stages.
In order to solve the above technical problem, the present invention further provides a monitoring device of a basic input/output system, which is applied to a first controller, and includes:
The monitoring unit is used for acquiring the starting state of the first basic input/output system by communicating with the first basic input/output system after power-on;
The identification unit is used for determining that the first basic input and output system is failed to start if the actual starting time exceeds the first starting time corresponding to the starting stage in the starting stage of the first basic input and output system;
The control unit is used for controlling the first switching switch to switch the first basic input/output system to the second basic input/output system after determining that the first basic input/output system fails to start;
The number of the starting-up stages is multiple, and the first starting time is determined according to equipment configuration parameters corresponding to the starting-up stages.
In order to solve the technical problem, the present invention further provides a monitoring device of a basic input/output system, including:
A memory for storing a computer program;
And a processor for executing the computer program, which when executed by the processor, implements the steps of the method for monitoring a basic input output system as described above.
To solve the above technical problem, the present invention further provides a non-volatile storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the monitoring method of the basic input output system described above.
To solve the above technical problem, the present invention further provides a computer program product, which includes a computer program, where the computer program when executed by a processor implements the steps of the monitoring method of the basic input/output system.
The monitoring system of the basic input/output system provided by the invention has the following beneficial effects. A first start-up time is determined for each of a plurality of boot-up stages of the first basic input/output system according to the equipment configuration parameters corresponding to that stage. After power-on, the first controller communicates with the first basic input/output system to acquire its start-up state. If the first controller identifies a boot-up stage in which the actual start-up time exceeds the first start-up time corresponding to that stage, it determines that the first basic input/output system has failed to start and controls the first switch to switch from the first basic input/output system to the second basic input/output system. As a result, when the first basic input/output system fails at any stage of the start-up process, the switch to the second basic input/output system can be made quickly, the timeout needed to judge that the first basic input/output system has failed to start is shortened, and the boot speed of the server is improved.
The monitoring method, apparatus, device, non-volatile storage medium and computer program product of the basic input/output system provided by the invention have the same beneficial effects, which are not repeated here.
Detailed Description
The invention provides a monitoring system, a method, a device, equipment and a medium of a basic input/output system, which are used for improving the starting speed of a server.
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In a server system, a basic input/output system memory (BIOS Flash) is used to store the firmware of the basic input/output system. In one type of server, the basic input/output system memory stores firmware of the Unified Extensible Firmware Interface (UEFI); after the server is powered on, the basic input/output system starts, initializes the hardware, and boots the operating system (OS).
To improve start-up reliability, two basic input/output systems are usually deployed on a server motherboard: one main and one standby. The main basic input/output system is the one started by default when the server is powered on. When the main basic input/output system is damaged, for example by a virus or a faulty manual upgrade, and cannot boot the operating system, that is, when it fails to start and the server goes down, the server can switch to the standby basic input/output system to start. This avoids the situation in which the server system cannot start and improves the reliability of the server.
In the traditional scheme, there is no automatic switching between the main and standby basic input/output systems; only after operation and maintenance personnel observe that the main basic input/output system has failed to start can they manually switch to the memory of the standby basic input/output system. This requires staff on duty around the clock, wastes labor, and can leave the server down for a long time.
Therefore, it is proposed by those skilled in the art to set a timer for the primary bios, and if the primary bios is not successfully started beyond a predetermined time, the primary bios is automatically switched to the standby bios for starting by a switch (switch), so that no manual watch is required, and the downtime of the server is shortened.
However, in the related-art scheme for automatically switching between the main and standby basic input/output systems, when the central processing unit fails to load the basic input/output system, or the basic input/output system cannot run after loading, the switch to the standby basic input/output system happens only after a timeout has elapsed. To avoid misjudging a normal start-up of the main basic input/output system as a failure, the timer duration must be set with the longest start-up time of the main basic input/output system in mind. Consequently, after the main basic input/output system fails, a long wait is needed before the automatic switch to the standby basic input/output system, which further lengthens the boot time of the server.
Therefore, in the monitoring system of the basic input/output system provided by the embodiment of the invention, a first start-up time is determined for each of a plurality of boot-up stages of the first basic input/output system according to the equipment configuration parameters corresponding to that stage. After power-on, the first controller communicates with the first basic input/output system to acquire its start-up state. If the first controller identifies a boot-up stage in which the actual start-up time exceeds the first start-up time corresponding to that stage, it determines that the first basic input/output system has failed to start and controls the first switch to switch from the first basic input/output system to the second basic input/output system. When the first basic input/output system fails at any stage of the start-up process, the switch to the second basic input/output system can therefore be made quickly, which shortens the timeout needed to judge that the first basic input/output system has failed to start and improves the boot speed of the server.
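The saving over a single whole-boot timer can be illustrated with a small calculation. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, and the single timer is assumed to equal the sum of all per-stage limits, which is a simplification of "the longest start-up time".

```python
def detect_with_single_timer(stage_limits):
    """Failure is noticed only when one timer covering the whole boot expires.

    Assumption: the single timer is set to the sum of all per-stage limits.
    """
    return sum(stage_limits)


def detect_with_stage_timers(stage_limits, actual_durations, failing_index):
    """Failure is noticed when the failing stage exceeds its own limit.

    actual_durations: seconds actually taken by the stages before the failure.
    """
    return sum(actual_durations[:failing_index]) + stage_limits[failing_index]
```

For example, with illustrative limits of 5, 60, 40, 20 and 30 seconds, and a failure in the third stage after the first two stages took 3 and 45 seconds, the single timer detects the failure after 155 seconds, while the per-stage timers detect it after 3 + 45 + 40 = 88 seconds.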
Fig. 1 is a schematic diagram of a monitoring system of a basic input/output system according to an embodiment of the present invention.
As shown in fig. 1, a monitoring system of a basic input output system provided by an embodiment of the present invention may include a first controller and a first switch 101.
The first controller is configured to communicate with the first bios after power-on to obtain a start-up state of the first bios, and if it is identified that an actual start-up time in a start-up stage of the first bios exceeds a first start-up time corresponding to the start-up stage, determine that the first bios fails to start up, and control the first switch 101 to switch the first bios to the second bios.
The number of the starting-up stages is multiple, and the first starting time is determined according to equipment configuration parameters corresponding to the starting-up stages.
In an embodiment of the present invention, the first controller may be a baseboard management controller (Baseboard Management Controller, BMC) or a complex programmable logic device (Complex Programmable Logic Device, CPLD).
The first switch 101 may be a physical switch on a server motherboard or on a baseboard management controller card. The first switch 101 is used for gating the connection between a platform controller of the server (such as a platform controller hub (Platform Controller Hub, PCH), also called an integrated south bridge) and a memory of a basic input/output system. That is, for a first basic input/output system and a second basic input/output system, the first switch 101 includes at least two gating channels: the first channel connects the platform controller to the memory where the first basic input/output system is located, and the second channel connects the platform controller to the memory where the second basic input/output system is located. The first switch 101 and the platform controller, the first switch 101 and the memory where the first basic input/output system is located, and the first switch 101 and the memory where the second basic input/output system is located may all be connected through a serial peripheral interface bus (Serial Peripheral Interface, SPI).
If the first controller is a complex programmable logic device, the complex programmable logic device may be connected to the first switch 101 through an integrated circuit bus to control the chip select of the first switch 101. Further, the complex programmable logic device may be connected to the platform controller through a first integrated circuit bus and to the controlled terminal of the first switch 101 through a second integrated circuit bus. The first integrated circuit bus and the second integrated circuit bus may each be a two-wire serial bus (Inter-Integrated Circuit, I2C) or an improved inter-integrated circuit bus (I3C).
If the first controller is a baseboard management controller, the baseboard management controller may be connected to the platform controller through a low pin count bus (Low Pin Count Bus, LPC Bus), and may control the chip select of the first switch 101 through an intelligent platform management interface (Intelligent Platform Management Interface, IPMI) command.
In an embodiment of the present invention, the first controller may communicate with the first bios and the second bios using the first bus to obtain the start-up status thereof.
If the first controller is a complex programmable logic device, the first bus may be an integrated circuit bus: the complex programmable logic device may be connected to the first basic input/output system through a third integrated circuit bus and to the second basic input/output system through a fourth integrated circuit bus, where the third integrated circuit bus and the fourth integrated circuit bus may each be a two-wire serial bus (Inter-Integrated Circuit, I2C) or an improved inter-integrated circuit bus (I3C).
If the first controller is a baseboard management controller, the baseboard management controller may communicate with the first basic input/output system and the second basic input/output system through intelligent platform management interface commands to acquire their states. For example, the first basic input/output system may output its start-up state by sending an intelligent platform management interface command to the baseboard management controller.
After the server is powered on, one basic input/output system is selected by default; the embodiment of the present invention defines it as the first basic input/output system, and it may be either the main or the standby basic input/output system as defined in the related art. In a specific implementation, the main basic input/output system may be the first basic input/output system by default. However, if the main basic input/output system failed before the last shutdown of the server, the standby basic input/output system may be set before the shutdown as the one to start by default next time, in which case the first basic input/output system refers to the standby basic input/output system.
After the first controller is powered on, it may determine the first basic input/output system by reading the slot number register (BIOS Flash Slot Number) of the basic input/output system memory. For example, the values 0 and 1 of the slot number register denote the two basic input/output systems. If the value read by the first controller is neither 0 nor 1, the basic input/output system numbered 0 may be adopted as the first basic input/output system by default, and this value is written into a register in the non-volatile memory (NVRAM). If the value read is 0 or 1, the chip select depends on whether the monitored start-up of the basic input/output system has timed out: if it has timed out, the value of the slot number register is inverted and the chip select is then set accordingly; if it has not timed out, the chip select of the basic input/output system is set according to the value read.
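The slot selection rule can be written as a short decision function. The Python sketch below is hypothetical: the function name `select_slot` and the return convention are illustrative, and writing the inverted value back to the register is an assumption about what "inverting the value of the slot number register" entails.

```python
def select_slot(register_value, boot_timed_out):
    """Decide which basic input/output system slot to chip-select.

    Returns (slot, value_to_write_back), where value_to_write_back is None
    when the register does not need to be updated.
    """
    if register_value not in (0, 1):
        # Invalid value: default to slot 0 and persist it.
        return 0, 0
    if boot_timed_out:
        # Timed out: invert the slot number, then chip-select the other slot.
        flipped = register_value ^ 1
        return flipped, flipped  # write-back is an assumption
    # No timeout: chip-select according to the value read.
    return register_value, None
```

For instance, an uninitialized register value such as 0xFF selects slot 0 and stores 0, while a timeout with the register at 1 selects slot 0.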
In the embodiment of the invention, the second basic input output system is defined as the basic input output system which is started in a secondary selection and is in a normal state, that is, the second basic input output system can be started normally after the first basic input output system fails to be started. In a specific implementation, the number of second bios may be one or more.
In the embodiment of the invention, the starting process of the basic input/output system is divided into a plurality of stages, and one stage of starting is called a starting stage.
In some optional implementations of the embodiments of the present invention, the start-up process of the basic input/output system may be divided into a plurality of boot-up stages by grouping at least one hardware start-up process or at least one software loading process into the same boot-up stage, according to the hardware start-up sequence and the software loading sequence of the server during the start-up of the basic input/output system. In a specific implementation, all or part of the hardware and software may serve as the basis for dividing the boot-up stages, and the order of the boot-up stages is determined by the start-up order of that hardware and software.
Depending on the device configuration parameters (which may include software configuration parameters and hardware configuration parameters in particular) of the server, the start-up time corresponding to each start-up phase may be different during normal start-up. In the embodiment of the invention, the starting time corresponding to the starting stage in normal starting is defined as the first starting time, and the first starting time which can cover the maximum time length of normal execution of the starting stage is determined according to the equipment configuration parameters corresponding to the starting stage.
The first controller may pre-store the first start-up time corresponding to each boot-up stage and use it to monitor whether the actual start-up time of that boot-up stage has timed out. In addition, the first start-up time corresponding to a boot-up stage may differ across different boots of the server; in that case, the first controller may pre-store the first start-up time of each boot-up stage for each of those boots, or the first basic input/output system may send the updated first start-up time to the first controller after start-up.
According to the Unified Extensible Firmware Interface specification, the start-up of the basic input/output system can be divided into a central processor boot stage, a memory initialization stage, a peripheral loading stage, an operating system loader loading stage and an operating system running stage. In some optional implementations of the embodiments of the present invention, the boot-up stages may include at least two of these five stages. In other optional implementations, several of these stages may be merged and monitored as one boot-up stage, and more or fewer boot-up stages may be provided.
After the first controller is powered on, it determines the first basic input/output system by reading the slot number register of the basic input/output system memory and acquires the start-up state of the first basic input/output system by communicating with it. After the first basic input/output system enters a boot-up stage, the first controller monitors the execution of that stage against the corresponding first start-up time. If the stage has not completed when the first start-up time is exceeded, the actual start-up time of the stage exceeds the first start-up time and the stage has failed. At this point the first controller can determine that the first basic input/output system has failed to start without waiting for any subsequent boot-up stage to time out, and it controls the first switch 101 to switch from the first basic input/output system to the second basic input/output system, which then starts and loads the operating system.
According to the monitoring system of the basic input/output system provided by the embodiment of the invention, a first start-up time is determined for each of a plurality of boot-up stages of the first basic input/output system according to the equipment configuration parameters, and the first controller acquires the start-up state of the first basic input/output system after power-on. If the first controller identifies a boot-up stage in which the actual start-up time exceeds the corresponding first start-up time, it determines that the first basic input/output system has failed to start and controls the first switch 101 to switch from the first basic input/output system to the second basic input/output system. The switch can thus be made promptly when the first basic input/output system fails, the start-up timeout of the basic input/output system is shortened, and the boot speed of the server is improved.
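A pre-stored table that the basic input/output system can later update might look like the following Python sketch. The class name `FirstStartTimeTable`, the notion of a named boot "scenario" used as a key, and all numeric values are hypothetical and serve only to illustrate the lookup and update paths.

```python
class FirstStartTimeTable:
    """Per-stage first start-up times, pre-stored and updatable."""

    def __init__(self, prestored):
        # prestored: {scenario: {stage: seconds}}; copied so callers' data is untouched.
        self._table = {sc: dict(stages) for sc, stages in prestored.items()}

    def update_from_bios(self, scenario, stage, seconds):
        # The basic input/output system reports an updated first start-up time.
        self._table.setdefault(scenario, {})[stage] = seconds

    def lookup(self, scenario, stage):
        # The first start-up time used to monitor this stage in this scenario.
        return self._table[scenario][stage]
```

A controller could call `lookup` when a stage begins and `update_from_bios` whenever the firmware reports a new limit, for example after a hardware configuration change.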
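The overall monitoring flow, in which the first stage to exceed its limit triggers the switch without waiting for later stages, can be sketched as follows. This Python sketch is a simplified simulation: `monitor_boot`, the stage names, and the `switch` callback are hypothetical, and real hardware would react to timers rather than to a table of recorded durations.

```python
def monitor_boot(stage_limits, actual_durations, switch):
    """Check each boot-up stage in order against its first start-up time.

    stage_limits: ordered list of (stage, first_start_time_seconds).
    actual_durations: {stage: seconds taken}, or None / missing if the
                      stage never completes.
    switch: callable that gates the second basic input/output system.
    Returns the failing stage, or None if every stage completed in time.
    """
    for stage, limit in stage_limits:
        actual = actual_durations.get(stage)
        if actual is None or actual > limit:
            # Fail fast: later stages are not waited for.
            switch()
            return stage
    return None
```

The function returns as soon as one stage overruns, which mirrors the point that a failure is declared at the failing stage rather than at the end of a whole-boot timeout.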
In the above embodiments, the first controller may be a complex programmable logic device or a baseboard management controller.
If the first controller is a baseboard management controller, the first controller may acquire the start-up state of the first basic input output system by communicating with the first basic input output system after powering on, which may include that the baseboard management controller acquires the start-up state through an intelligent platform management interface command.
If the first controller is a complex programmable logic device, the first controller acquiring the start-up state of the first basic input/output system by communicating with it after power-on may include the complex programmable logic device receiving, through an integrated circuit bus, the start-up state information output by the first basic input/output system.
Alternatively, if the first controller is a complex programmable logic device, the first controller acquiring the start-up state of the first basic input/output system after power-on may include the complex programmable logic device receiving the start-up state information sent by the baseboard management controller.
In other optional implementations of the embodiments of the present invention, the first controller may include both a complex programmable logic device and a baseboard management controller. Each of them may communicate with the first basic input/output system or the second basic input/output system to acquire its start-up state, and may monitor the start-up process of the first or second basic input/output system against the first start-up time. In this case the complex programmable logic device and the baseboard management controller back each other up: at any given time only one of them has the right to control the first switch 101, the two monitor each other's operating state, and when the one holding control fails, the other takes over the task of monitoring the basic input/output system. This further improves the reliability of the start-up process of the basic input/output system and, in turn, the start-up efficiency of the server.
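The mutual-backup arrangement can be sketched as a simple ownership hand-over. In the Python sketch below, the `Controller` class, its `alive` flag (standing in for whatever health check the two devices use on each other), and the `arbitrate` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Controller:
    name: str
    has_control: bool  # True if this controller may drive the first switch
    alive: bool = True  # result of the peer's health check (assumed mechanism)


def arbitrate(a, b):
    """Hand control to the healthy peer if the current owner has failed.

    Returns the controller that holds control of the switch afterwards.
    """
    for owner, peer in ((a, b), (b, a)):
        if owner.has_control and not owner.alive and peer.alive:
            owner.has_control = False
            peer.has_control = True
    return a if a.has_control else b
```

Only one controller holds control before and after each call, so the switch is never driven by both devices at once.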
Based on the above embodiments, the process of the first controller communicating with the first bios will be further described in the embodiments of the present invention.
In some alternative implementations of the embodiments of the present invention, a script may be deployed on the first basic input/output system and the second basic input/output system so that each outputs its start-up state to the first controller during start-up. In this case, the first controller obtaining the start-up state and identifying that the actual start-up time exceeds the first start-up time may include: after determining that the first basic input/output system has entered the current boot stage, the first controller monitors the output information of the first basic input/output system; if, within the corresponding first start-up time, it does not receive information from the first basic input/output system indicating that the current boot stage has finished executing, the first controller determines that the actual start-up time of the current boot stage exceeds the corresponding first start-up time.
The first controller monitoring the output information of the first basic input/output system may include: after receiving the stage-start information sent by the first basic input/output system, the first controller configures and starts a first timer for that boot stage according to the first start-up time corresponding to the stage; after receiving the stage-finished information sent by the first basic input/output system, the first controller closes the first timer.
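As a rough illustration of this passive mode, the first timer can be modelled as below. The class name `StageTimer`, its method names, and the use of seconds passed in by the caller are assumptions for the sketch; a real controller would use a hardware timer or watchdog.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StageTimer:
    """First timer for one boot stage; all times are in seconds."""
    first_start_time: float
    started_at: Optional[float] = None

    def on_stage_started(self, now: float) -> None:
        # Stage-start information received: configure and start the timer.
        self.started_at = now

    def on_stage_finished(self) -> None:
        # Stage-finished information received: close the timer.
        self.started_at = None

    def timed_out(self, now: float) -> bool:
        # True only while the timer is running and the stage has overrun.
        return (self.started_at is not None
                and now - self.started_at > self.first_start_time)
```

A stage that reports completion before the first start-up time elapses never triggers `timed_out`, because the timer is closed first.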
In other alternative implementations of the embodiments of the present invention, the first controller may actively obtain the start-up state of the first basic input/output system or the second basic input/output system, without modifying the basic input/output system. In this case, the first controller obtaining the start-up state and identifying that the actual start-up time exceeds the first start-up time may further include: after determining that the first basic input/output system has entered the current boot stage, the first controller accesses the first basic input/output system within the first start-up time corresponding to that stage to obtain the start-up state; if, once the first start-up time is reached, the start-up state shows that the first basic input/output system has not finished the current boot stage, the first controller determines that the actual start-up time of the current boot stage exceeds the corresponding first start-up time.
In this case, the first controller determining that the first basic input/output system has entered the current boot stage may include: the first controller accesses the first basic input/output system and learns that it has finished the previous boot stage and entered the current one. That is, when actively acquiring the start-up state, the first controller may determine entry into the current boot stage from the signal that the previous boot stage has ended. Because active acquisition may involve a delay, that is, the previous boot stage may already have been finished for some time when the first controller accesses the first basic input/output system, the first controller may obtain both the start-up state and the completion time of the previous boot stage during the access, and dynamically set the first start-up time for the current boot stage accordingly.
After determining that the first basic input/output system has entered the current boot stage, the first controller accessing the first basic input/output system within the first start-up time corresponding to that stage to obtain the start-up state may include: after determining that the first basic input/output system has entered the current boot stage, multiplying the first start-up time corresponding to the current boot stage by a preset scaling coefficient to obtain a second start-up time; configuring a second timer according to the second start-up time and a third timer according to the first start-up time, and starting both timers; when the second timer expires, accessing the first basic input/output system to obtain the start-up state; if the first basic input/output system has not finished the current boot stage, continuing to wait; if it has finished the current boot stage, closing the third timer and determining that the first basic input/output system has entered the next boot stage; when the third timer expires, accessing the first basic input/output system again to obtain the start-up state; and if the first basic input/output system still has not finished the current boot stage, determining that the actual start-up time of the current boot stage exceeds the first start-up time.
That is, since the first basic input/output system may already have finished a boot stage for some time before the first controller accesses it, the first controller may also multiply the pre-stored first start-up time by the preset scaling coefficient to obtain the second start-up time, monitor the boot stage according to the second start-up time, and access the first basic input/output system once the second start-up time is reached to obtain its start-up state. If the first basic input/output system has already finished the monitored boot stage at that point, the first controller ends the monitoring of that stage early and begins monitoring the following boot stage ahead of schedule.
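The two-timer polling scheme can be sketched as follows. The function name `poll_stage`, the simulated completion time `stage_done_at`, and the assumption that the scaling coefficient lies between 0 and 1 are illustrative choices, not details stated in the embodiment.

```python
from typing import Optional, Tuple

def poll_stage(first_start_time: float, ratio: float,
               stage_done_at: Optional[float]) -> Tuple[str, float]:
    """Simulate one boot stage monitored by the second and third timers.

    stage_done_at: time in seconds at which the BIOS finishes the stage,
    or None if it never finishes. The controller accesses the BIOS only
    when a timer expires.
    """
    second_start_time = first_start_time * ratio   # second timer period
    # Second timer expires first (assuming 0 < ratio < 1), then the third.
    for check_time in (second_start_time, first_start_time):
        if stage_done_at is not None and stage_done_at <= check_time:
            return ("next_stage", check_time)      # stage finished in time
    return ("timeout", first_start_time)           # third timer expired
```

With a first start-up time of 100 s and a coefficient of 0.5, a stage that finishes after 30 s is recognised at the 50 s poll rather than at 100 s, which is the early hand-over the text describes.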
Based on the above embodiments, in the embodiments of the present invention, a boot stage is further described by taking at least two of a central processing unit boot stage, a memory initialization stage, a peripheral loading stage, an operating system loader loading stage and an operating system running stage as an example.
Fig. 2 is a timing chart of a starting process of a basic input/output system according to an embodiment of the present invention.
In the embodiment of the invention, the first controller may start a timer after determining that a boot stage has been entered, with the timer period configured as the first start-up time corresponding to that stage. The first controller closes the timer after determining that the boot stage has finished executing; if the boot stage is not completed before the timer period elapses, the boot stage is determined to have timed out. The timer may be a watchdog timer.
In the embodiment of the invention, the first controller monitors the starting stage of the central processor of the first basic input/output system according to the starting state, and the first controller can comprise the steps of detecting a signal for powering on the central processor, determining that the starting stage of the central processor starts, and determining that the starting stage of the central processor is executed if a central processor reset signal sent by the central processor is detected.
As shown in Fig. 2, the CPU boot stage is the start-up process of the CPU firmware on the motherboard, including the Unified Extensible Firmware Interface Platform Initialization (UEFI PI) firmware and the Management Engine (ME) firmware. After the server motherboard is powered on, the first controller is powered on. Upon detecting that the central processor has been powered up, the first controller determines that the CPU boot stage has started and starts the watchdog corresponding to the CPU boot stage; the watchdog timeout may be set to 4 minutes (to be adapted to the platform and hardware design). The CPU boot stage is determined to have ended once the CPU reset signal (CPU reset) is detected. During the CPU boot stage, the boot loader (Bootloader) of the central processor loads the UEFI PI or ME firmware, which is usually stored in the memory of the basic input/output system; when this firmware has run to the point where the central processor issues the CPU reset signal, the firmware is considered to have started normally.
In the embodiment of the invention, the first controller monitors the memory initialization stage of the first basic input/output system according to the starting state, and can include detecting a reset signal of the central processing unit, determining that the memory initialization stage starts, and determining that the memory initialization stage is executed if the peripheral initialization signal is detected.
As shown in Fig. 2, the first controller starts the watchdog corresponding to the memory initialization stage after detecting the CPU reset signal. The basic input/output system applies different policies in this stage. For example, when the server is assembled and powered up for the first time, initialization training (memory training) must be performed on the memory, and the trained parameters are then stored in the nonvolatile memory (BIOS FLASH NVRAM) of the memory in which the basic input/output system resides; when the server is later restarted or power-cycled, the memory initialization training is skipped and the stored training parameters are used directly for quick memory initialization. In addition, when the memory configuration (position, capacity, model, etc.) changes, the previously stored training parameters cannot be used and memory initialization training must be performed again. The start-up time of the memory initialization stage therefore differs depending on whether the training parameters are obtained by memory initialization training or the stored parameters are used for quick initialization. The larger the memory capacity, the larger this time difference; and the newer the memory generation (for example DDR3, DDR4, DDR5, and even a future DDR6), the longer memory initialization training takes. In the embodiment of the invention, the first controller may determine, according to the identified memory generation and memory capacity of the device in which it is located, a first start-up time for the case where memory initialization training occurs in the memory initialization stage and a first start-up time for the case where quick initialization is used, and set different flags; the first controller then sets different first start-up times according to the different flags.
When the first basic input/output system is identified as having run to the peripheral initialization signal (for example, the start of PCIe OptionRom loading), the memory initialization stage is determined to have finished, and the watchdog corresponding to the memory initialization stage is closed.
In the embodiment of the invention, the first controller monitors the peripheral loading stage of the first basic input/output system according to the starting state, and the method comprises the steps of detecting a peripheral initialization signal, determining that the peripheral loading stage starts, and determining that the peripheral loading stage is executed if an operating system loader loading signal is detected.
As shown in Fig. 2, when the basic input/output system starts, it loads the required option read-only memory (Option ROM) according to the needs of the different peripherals, so as to support the functions of the various Peripheral Component Interconnect Express (PCIe, a high-speed serial computer expansion bus) peripherals during the basic input/output system start-up stage, such as the pre-boot execution environment (Preboot eXecution Environment, PXE) function of a network card, the RAID-building function of a disk array (Redundant Arrays of Independent Disks, RAID) card, and the encryption function of a Non-Volatile Memory Express (NVMe) hard disk. Because server hardware designs and requirements differ, it may be necessary to choose which expandable PCIe devices load an OptionRom; this choice can be made through an OptionRom loading whitelist. The basic input/output system calculates how many PCIe OptionRoms need to be loaded according to the device types and numbers in the OptionRom loading whitelist, and then passes the number (Number) of OptionRoms to the first controller. The first controller may use a default loading timeout of 20 s for a single OptionRom and multiply it by the number of OptionRoms to obtain the first start-up time corresponding to the peripheral loading stage. When the first basic input/output system is identified as having run to the peripheral initialization signal, the first controller starts the watchdog corresponding to the peripheral loading stage; when the first basic input/output system has started up to the operating system loader (OS loader) loading stage, the watchdog is closed.
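The timeout calculation for the peripheral loading stage is a simple product, sketched below. The function and constant names are hypothetical; only the 20 s default and the multiplication by the OptionRom count come from the description.

```python
SINGLE_OPTION_ROM_TIMEOUT_S = 20  # default timeout for one OptionRom load

def peripheral_load_timeout(option_rom_count: int) -> int:
    """First start-up time (seconds) for the peripheral loading stage."""
    if option_rom_count < 1:
        # Assumption of this sketch: the BIOS always reports at least one.
        raise ValueError("option_rom_count must be at least 1")
    return option_rom_count * SINGLE_OPTION_ROM_TIMEOUT_S
```

For instance, a whitelist that yields three OptionRoms gives a 60 s watchdog for this stage.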
In the embodiment of the invention, the first controller monitors the loading stage of the operating system loader of the first basic input/output system according to the starting state, and the first controller can comprise the steps of detecting the loading signal of the operating system loader, determining that the loading stage of the operating system loader starts, and determining that the loading stage of the operating system loader is executed if the initializing signal of the operating system kernel is detected.
As shown in Fig. 2, for the operating system loader loading stage, the first basic input/output system sets different watchdog coefficients or switches according to the boot order and the boot option device actually inserted, for example by identifying a universal serial bus (Universal Serial Bus, USB) boot disk, a pre-boot execution environment system installation, different operating system disks, and the like, and sets an operating system loader watchdog enable flag and a watchdog time coefficient according to the product policy. It should be noted that the operating system on some devices does not support the timing function; in that case the watchdog enable flag is used to skip monitoring of the operating system loader loading stage, and monitoring of the next boot stage begins once the start signal of that stage is identified.
The first controller (complex programmable logic device or baseboard management controller) multiplies the default watchdog time (a typical time) by the watchdog time coefficient to obtain a new watchdog time, which serves as the first start-up time, and then enables or disables the watchdog according to the watchdog enable flag. When the operating system has run to operating system kernel initialization (kernel init), the watchdog corresponding to the operating system loader loading stage is closed.
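A minimal sketch of how the enable flag and the time coefficient combine is given below. The function name and the convention of returning `None` for a disabled watchdog are assumptions; the embodiment does not give concrete default times or coefficients.

```python
from typing import Optional

def os_loader_watchdog_time(default_time_s: float, enable_flag: bool,
                            time_coefficient: float) -> Optional[float]:
    """Return the first start-up time for the OS loader loading stage.

    None means the watchdog is disabled and this stage is not monitored.
    """
    if not enable_flag:
        return None
    # New watchdog time = default (typical) time x coefficient from the BIOS.
    return default_time_s * time_coefficient
```

With a hypothetical default of 60 s and a coefficient of 2, the stage would be monitored with a 120 s watchdog.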
In the embodiment of the invention, the first controller monitors the operating system operation stage of the first basic input/output system according to the starting state, and the first controller can comprise the steps of detecting an operating system kernel initialization signal, determining that the operating system operation stage starts, and determining that the operating system operation stage is executed if the operating system is detected to complete a preset number of operation cycles.
As shown in Fig. 2, upon identifying the operating system kernel initialization signal, the first controller starts the watchdog corresponding to the operating system running stage, and closes the watchdog according to the operating cycles of the operating system, for example after one or more operating cycles.
Across the above five boot stages, the first controller conditionally enables, configures and feeds the watchdog corresponding to each boot stage. When any 1-level watchdog times out, the first controller may perform a power reset operation. When any 4-level watchdog times out, the first controller, in addition to performing the power reset operation, controls the first switch 101 to switch the first basic input/output system to the second basic input/output system. Specifically, the first controller may read the current slot number (BIOS Flash Slot Number) register stored in nvram, invert its value, perform the switch, and store the selected flash slot number back into nvram; at the next restart, the current slot number is preferably read from nvram to perform the chip selection.
The power reset operation is used for resetting all registers to an initial state when the power is turned on, so that the reliability and the safety of the system are improved.
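The slot-number handling can be sketched as a small pure function. The name `select_bios_slot` is hypothetical, and the fallback to slot 0 for an invalid register value follows step S101 described later in this document.

```python
def select_bios_slot(stored_slot: int, boot_timed_out: bool) -> int:
    """Return the flash slot to chip-select and to write back to nvram."""
    if stored_slot not in (0, 1):
        return 0                      # invalid register value: default slot 0
    if boot_timed_out:
        return stored_slot ^ 1        # invert 0 <-> 1 to switch BIOS
    return stored_slot                # keep the BIOS that started normally
```

Because the result is always written back, the next restart starts from the slot that was last selected.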
Taking the five startup phases described in the above embodiments as an example, the embodiment of the present invention further describes the monitoring timing of the first controller.
In the monitoring system of the basic input/output system provided by the embodiment of the present invention, the monitoring process executed by the first controller may include the following steps S101 to S112.
S101, after power-on, the first controller reads the slot number (BIOS Flash Slot Number) register in the nonvolatile memory (nvram). If the value of the slot number register is neither 0 nor 1, the basic input/output system numbered 0 is used as the first basic input/output system by default, and this number is written into the register in nvram. If the value read is 0 or 1, the first controller checks whether the monitored start-up of the basic input/output system timed out: if it timed out, the value of the slot number register is inverted and chip selection is then performed; if it did not time out, chip selection of the basic input/output system is performed according to the value read.
S102, the first controller polls whether the central processor is powered up; once it is, the first controller starts the watchdog corresponding to the CPU boot stage and sets the first start-up time (which may be 4 minutes).
S103, the first controller polls for the CPU reset signal (CPU reset). If the CPU reset signal is detected to be set, the first controller closes the watchdog corresponding to the CPU boot stage and proceeds to S104. Otherwise, when that watchdog times out, the first controller records a timeout flag in the nvram register, records a CPU firmware start-up timeout log, and performs a motherboard power reset (Power reset) operation.
S104, the first controller starts the watchdog for the basic input/output system start-up stage (memory initialization stage), with the default timeout set to 10 minutes.
S105, the first controller monitors whether the first basic input/output system has sent the memory initialization training flag (the time value coefficient Flag actually sent). If the flag is detected within the timeout, the watchdog time is reset to 10 minutes × Flag/10 (note: the time value coefficient transmitted by the first basic input/output system is in minutes) and execution jumps to S106; otherwise execution jumps directly to S106.
S106, the first controller polls for the peripheral initialization signal (PCIe OptionRom loading flag) within the first start-up time corresponding to the memory initialization stage. If the signal is detected, the process proceeds to S107; otherwise, once the memory initialization stage times out, the first controller records a timeout flag in the nvram register, records a timeout log for the basic input/output system start-up stage (memory initialization stage), and performs a motherboard power reset (Power reset) operation.
S107, the first controller starts the watchdog corresponding to the peripheral loading stage and sets its timeout to the corresponding first start-up time.
S108, the first controller polls for the operating system loader (OS loader) flag within the first start-up time corresponding to the peripheral loading stage. If the OS loader flag is detected within that time, the process jumps to S109; otherwise the watchdog corresponding to the peripheral loading stage times out, and the first controller records a timeout flag in the nvram register, records a peripheral loading stage timeout log, and performs a motherboard power reset (Power reset) operation.
S109, the first controller starts the watchdog corresponding to the operating system loader loading stage and sets its timeout to the corresponding first start-up time.
S110, the first controller polls for the operating system kernel initialization (kernel init) flag within the first start-up time corresponding to the operating system loader loading stage. If the flag is detected within that time, the process jumps to S111; otherwise the watchdog corresponding to the operating system loader loading stage times out, and the first controller records a timeout flag in the nvram register, records an operating system loader loading stage timeout log, and performs a motherboard power reset (Power reset) operation.
S111, the first controller starts the watchdog corresponding to the operating system running stage and sets its timeout to the corresponding first start-up time (which may be 60 seconds).
S112, the first controller polls for the periodic watchdog-feeding signal of the operating system within the first start-up time corresponding to the operating system running stage. If the feeding signal is detected within that time, the first controller resets the watchdog time; otherwise the watchdog corresponding to the operating system running stage times out, and the first controller records a timeout flag in the nvram register, records an operating system running stage timeout log, and performs a motherboard power reset (Power reset) operation.
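Steps S102 to S110 chain four watchdogs, each started by the signal that ends the previous stage. The sketch below replays a recorded sequence of signal times and reports the first stage whose watchdog would expire. The stage and signal identifiers, the `timeouts` dictionary and the function name are assumptions for illustration; the periodic feeding of S111 and S112 is not modelled.

```python
from typing import Dict, Optional

# (boot stage, signal that ends it); each end signal starts the next stage.
STAGES = [
    ("cpu_boot", "cpu_reset"),
    ("memory_init", "peripheral_init"),
    ("peripheral_load", "os_loader"),
    ("os_loader_load", "kernel_init"),
]

def first_timed_out_stage(signal_times: Dict[str, float],
                          timeouts: Dict[str, float],
                          cpu_power_on: float = 0.0) -> Optional[str]:
    """Return the first stage that exceeds its first start-up time.

    signal_times maps a signal name to the time (s) it was observed;
    a missing signal means it never arrived. None means no timeout.
    """
    stage_start = cpu_power_on
    for stage, end_signal in STAGES:
        end = signal_times.get(end_signal)
        if end is None or end - stage_start > timeouts[stage]:
            return stage          # record flag, log, and power reset here
        stage_start = end         # end signal opens the next stage
    return None
```

In a real controller the returned stage would drive the timeout flag written to nvram and the log entry named in the corresponding step.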
Taking the five startup phases described in the above embodiments as examples, the embodiments of the present invention further describe the timing sequence of the bios during startup.
In the monitoring system of the bios provided in the embodiment of the present invention, the process executed by the first bios (or the second bios) after the start may include the following steps S201 to S204.
S201, after the first basic input and output system is started, whether memory initialization training is needed or not is detected, if so, S202 is performed, and if not, S203 is performed.
S202, the first basic input/output system identifies the memory generation number (DDR3, DDR4, DDR5, DDR6, etc.), the capacity of a single memory module (typically 8G, 16G, 32G, 64G, etc.), and the number of memory modules, and calculates the time required for memory initialization training from the memory generation number, the ratio (single-module capacity/8G), and the number of memory modules (this time needs to be measured on the different platforms of the same central processor manufacturer). The unit may be minutes. The first basic input/output system sends the time to the first controller by writing it into a register of the first controller (the memory initialization training flag register).
S203, when the first basic input/output system has started up to the peripheral loading stage, it reads the PCIe OptionRom whitelist in the memory of the basic input/output system, calculates how many PCIe devices need to load an OptionRom, and sends this number to the first controller by writing it into the OptionRom number register of the first controller. If the whitelist is empty or contains abnormal data, the OptionRom number may default to 1 and be sent to the first controller in the same way.
S204, when the first basic input/output system has started up to the operating system loader loading stage, it identifies the type of the current boot item, such as a USB boot disk, a hard disk drive (HDD) system disk, a pre-boot execution environment (PXE) system, the UEFI shell, setup, and the like, and sets a different watchdog policy for each. The policies may include:
(1) USB boot disk: disable the operating system loader loading stage watchdog;
(2) HDD system disk: enable the operating system loader loading stage watchdog; different operating system types may also be identified and the corresponding watchdog time set;
(3) UEFI shell: disable the operating system loader loading stage watchdog;
(4) setup: disable the operating system loader loading stage watchdog;
(5) PXE: disable the operating system loader loading stage watchdog.
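The five policies above amount to a lookup from boot item type to a watchdog enable flag, sketched below. The dictionary keys and the function name are hypothetical labels chosen for the example.

```python
# Whether the OS loader loading stage watchdog is enabled per boot item.
OS_LOADER_WATCHDOG_POLICY = {
    "usb_boot_disk": False,
    "hdd_system_disk": True,   # time may further depend on the OS type
    "uefi_shell": False,
    "setup": False,
    "pxe": False,
}

def os_loader_watchdog_enabled(boot_item: str) -> bool:
    """Look up the watchdog enable flag for the current boot item."""
    return OS_LOADER_WATCHDOG_POLICY[boot_item]
```

Only the HDD system disk keeps the watchdog enabled, so an interactive or installation boot path is not interrupted by a timeout.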
It should be noted that, in the embodiments of the present invention, the first controller monitors the start-up state of the first bios, and after the first bios fails to start, the first controller switches to the second bios, and the process of the first controller monitoring the start-up state of the second bios is the same as the process of the first controller monitoring the start-up state of the first bios.
The monitoring method of the basic input/output system provided by the embodiment of the invention is described below with reference to the above-described monitoring system of the basic input/output system of the embodiment of the invention with reference to the accompanying drawings.
Fig. 3 is a flowchart of a monitoring method of a basic input/output system according to an embodiment of the present invention.
As shown in fig. 3, the monitoring method applied to the first controller for the basic input/output system provided in the embodiment of the present invention may include:
S301, after power-on, acquiring a starting state of a first basic input output system by communicating with the first basic input output system;
S302, if the fact that the actual starting time exceeds the first starting time corresponding to the starting stage exists in the starting stage of the first basic input output system is identified, determining that the starting of the first basic input output system fails;
S303, after determining that the first basic input/output system fails to start, controlling a first switching switch to switch the first basic input/output system to a second basic input/output system;
The number of the starting-up stages is multiple, and the first starting time is determined according to equipment configuration parameters corresponding to the starting-up stages.
In the embodiment of the present invention, the boot stage may include at least two of a central processing unit boot stage, a memory initialization stage, a peripheral loading stage, an operating system loader loading stage and an operating system running stage.
In the embodiment of the invention, the CPU starting stage of the first basic input/output system is monitored according to the starting state, and the method comprises the steps of detecting a signal for powering on the CPU, determining that the CPU starting stage starts, and determining that the CPU starting stage is executed if a CPU reset signal sent by the CPU is detected.
In the embodiment of the invention, the memory initialization stage of the first basic input/output system is monitored according to the starting state, and the method comprises the steps of detecting a reset signal of a central processing unit, determining that the memory initialization stage starts, and determining that the memory initialization stage is executed if a peripheral initialization signal is detected.
In the embodiment of the invention, the monitoring of the peripheral loading stage of the first basic input/output system according to the starting state can comprise the steps of detecting a peripheral initialization signal, determining that the peripheral loading stage starts, and determining that the peripheral loading stage is executed if the loading signal of the operating system loader is detected.
In the embodiment of the invention, the operating system loader loading stage of the first basic input/output system is monitored according to the starting state, and the method comprises the steps of detecting an operating system loader loading signal, determining that the operating system loader loading stage starts, and determining that the operating system loader loading stage is executed if an operating system kernel initializing signal is detected.
In the embodiment of the invention, the monitoring of the operating system operation stage of the first basic input/output system according to the starting state can comprise the steps of detecting an operating system kernel initialization signal, determining that the operating system operation stage starts, and determining that the operating system operation stage is executed if the operating system is detected to complete a preset number of operation cycles.
In the embodiment of the invention, acquiring the starting state and identifying that the actual starting time exceeds the first starting time can comprise monitoring the output information of the first basic input output system after determining that the first basic input output system enters the current starting stage, and determining that the actual starting time corresponding to the current starting stage exceeds the corresponding first starting time by the first controller if the information of the current starting stage, which is sent by the first basic input output system and is executed, is not monitored within the corresponding first starting time.
Monitoring the output information of the first basic input/output system may include: after the information indicating the start of a boot stage sent by the first basic input/output system is received, configuring and starting a first timer for that boot stage according to the first start-up time corresponding to the stage; and after the information indicating that the boot stage has finished executing is received from the first basic input/output system, closing the first timer.
In the embodiment of the invention, the starting state is obtained and the fact that the actual starting time exceeds the first starting time is identified, and the method further comprises the steps of accessing the first basic input output system in the first starting time corresponding to the starting stage to obtain the starting state after the first basic input output system is determined to enter the current starting stage, and determining that the actual starting time corresponding to the current starting stage exceeds the corresponding first starting time if the starting state is that the first basic input output system does not execute the current starting stage after the first starting time is reached.
The determining that the first bios enters the current boot stage may include accessing the first bios and determining that the first bios enters the current boot stage after the first bios completes the last boot stage.
After it is determined that the first basic input/output system has entered the current boot stage, accessing the first basic input/output system within the first start-up time corresponding to that stage to obtain the start-up state may include: multiplying the first start-up time corresponding to the current boot stage by a preset scaling coefficient to obtain a second start-up time; configuring a second timer according to the second start-up time and a third timer according to the first start-up time, and starting both timers; when the second timer expires, accessing the first basic input/output system to obtain the start-up state; if the first basic input/output system has not finished the current boot stage, continuing to wait; if it has finished the current boot stage, closing the third timer and determining that the first basic input/output system has entered the next boot stage; when the third timer expires, accessing the first basic input/output system again to obtain the start-up state; and if the first basic input/output system still has not finished the current boot stage, determining that the actual start-up time of the current boot stage exceeds the first start-up time.
In some optional implementations of the embodiment of the invention, the first controller may be a baseboard management controller, and the step in S301 of obtaining the start-up state of the first basic input/output system by communicating with the first basic input/output system after power-on may include the baseboard management controller obtaining the start-up state through an Intelligent Platform Management Interface (IPMI) command.
In other optional implementations of the embodiment of the invention, the first controller may be a complex programmable logic device, and the step in S301 of obtaining the start-up state of the first basic input/output system by communicating with the first basic input/output system after power-on may include the complex programmable logic device receiving information on the start-up state of the first basic input/output system over an inter-integrated circuit (I2C) bus.
In still other optional implementations of the embodiment of the invention, the first controller may be a complex programmable logic device, and the step in S301 of obtaining the start-up state of the first basic input/output system by communicating with the first basic input/output system after power-on may include the complex programmable logic device receiving the information on the start-up state sent by the baseboard management controller.
It should be noted that, in each embodiment of the monitoring method of the basic input/output system of the present invention, some of the steps or features may be omitted or not performed. The divided hardware or software functional modules are not the only form in which the monitoring method of the basic input/output system provided by the embodiment of the invention can be implemented.
Various embodiments of the monitoring method of the basic input/output system are detailed above. On the basis of these embodiments, the invention also discloses a monitoring apparatus, a monitoring device, a non-volatile storage medium and a computer program product of the basic input/output system corresponding to the method.
The monitoring apparatus of the basic input/output system provided by the embodiment of the invention, applied to the first controller, may comprise:
A monitoring unit, configured to obtain the start-up state of the first basic input/output system by communicating with the first basic input/output system after power-on;
An identification unit, configured to determine that the first basic input/output system has failed to start if, in a boot stage of the first basic input/output system, the actual start-up time exceeds the first start-up time corresponding to that boot stage;
A control unit, configured to control the first change-over switch to switch from the first basic input/output system to the second basic input/output system after it is determined that the first basic input/output system has failed to start;
There are multiple boot stages, and the first start-up time is determined according to the equipment configuration parameters corresponding to each boot stage.
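As an illustration only, the division into a monitoring unit, an identification unit and a control unit can be sketched in Python as follows. The class name BiosMonitor, its fields and the timeout values used in the example are hypothetical; how the first start-up time of each stage is derived from the equipment configuration parameters is not modelled, and the per-stage values are simply passed in.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class BiosMonitor:
    # First start-up time per boot stage, in boot order (assumed precomputed).
    stage_timeouts: Dict[str, float]
    # Monitoring unit: returns the actual start-up time of a stage.
    read_elapsed: Callable[[str], float]
    # Control unit: drives the first change-over switch to the second BIOS.
    switch_to_backup: Callable[[], None]

    def stage_failed(self, stage: str) -> bool:
        """Identification unit: actual time exceeds the stage's budget."""
        return self.read_elapsed(stage) > self.stage_timeouts[stage]

    def run(self) -> Optional[str]:
        """Check stages in order; switch on the first failure.

        Returns the name of the failed stage, or None if all stages passed.
        """
        for stage in self.stage_timeouts:
            if self.stage_failed(stage):
                self.switch_to_backup()
                return stage
        return None


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the description.
    timeouts = {"cpu_boot": 5.0, "memory_init": 60.0, "peripheral_load": 30.0}
    elapsed = {"cpu_boot": 1.0, "memory_init": 75.0, "peripheral_load": 2.0}
    monitor = BiosMonitor(timeouts, elapsed.__getitem__,
                          lambda: print("switching to second BIOS"))
    print("failed stage:", monitor.run())
```

Keeping the three roles behind separate callables mirrors the statement that the unit division is only a logical one: the same checks could equally be merged into one module or spread across several.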
It should be noted that, in each implementation of the monitoring apparatus of the basic input/output system provided by the embodiment of the present invention, the division into units is only a division by logical function, and other divisions may be adopted. The connection between different units may be electrical, mechanical or of another form. Separate units may be located in the same physical location or distributed across multiple network nodes. The units may be implemented in hardware or as software functional units. The aim of the scheme of the embodiment of the invention can be achieved by selecting some or all of the units provided by the embodiment of the invention according to actual needs and adopting a corresponding connection or integration mode.
Since the embodiments of the apparatus portion correspond to the embodiments of the method portion, reference may be made to the description of the method embodiments for the apparatus embodiments, which are not repeated herein.
Fig. 4 is a schematic structural diagram of a monitoring device of a basic input/output system according to an embodiment of the present invention.
As shown in fig. 4, the monitoring device for a basic input output system according to an embodiment of the present invention includes a memory 410 for storing a computer program 411, and a processor 420 for executing the computer program 411, where the computer program 411 is executed by the processor 420 to implement the steps of the monitoring method for a basic input output system according to any one of the embodiments.
The processor 420 may include one or more processing cores, such as a 3-core processor or an 8-core processor. The processor 420 may be implemented in at least one hardware form among a digital signal processor (Digital Signal Processor, DSP), a field-programmable gate array (Field-Programmable Gate Array, FPGA) and a programmable logic array (Programmable Logic Array, PLA). The processor 420 may also include a main processor, which is a processor for processing data in a wake-up state, also referred to as a central processing unit (Central Processing Unit, CPU), and a coprocessor, which is a low-power processor for processing data in a standby state. In some embodiments, the processor 420 may be integrated with a graphics processor (Graphics Processing Unit, GPU), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 420 may also include an artificial intelligence (Artificial Intelligence, AI) processor for processing computing operations related to machine learning.
Memory 410 may include one or more non-volatile storage media, which may be non-transitory. Memory 410 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In this embodiment, the memory 410 is at least used for storing a computer program 411, where the computer program 411 can implement relevant steps in the monitoring method of the basic input/output system disclosed in any one of the foregoing embodiments after being loaded and executed by the processor 420. In addition, the resources stored in the memory 410 may further include an operating system 412, data 413, and the like, where the storage manner may be transient storage or permanent storage. The operating system 412 may be Windows or other types of operating systems. The data 413 may include, but is not limited to, data related to the above-described method.
In some embodiments, the monitoring device of the basic input output system may further comprise a display screen 430, a power supply 440, a communication interface 450, an input output interface 460, a sensor 470 and a communication bus 480.
Those skilled in the art will appreciate that the configuration shown in Fig. 4 does not limit the monitoring device of the basic input/output system, which may include more or fewer components than those shown.
The monitoring device for the basic input/output system provided by the embodiment of the invention comprises a memory and a processor, wherein the processor can realize the steps of the monitoring method for the basic input/output system provided by the embodiment when executing the program stored in the memory.
An embodiment of the present invention provides a non-volatile storage medium having stored thereon a computer program which, when executed by a processor, can implement the steps of the method for monitoring a basic input output system provided in any of the above embodiments.
The non-volatile storage medium may include a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
For a description of the non-volatile storage medium provided by the embodiment of the present invention, please refer to the above method embodiments; its effect is the same as that of the monitoring method of the basic input/output system provided by the embodiment of the present invention and is not repeated herein.
An embodiment of the present invention provides a computer program product, including a computer program, which when executed by a processor implements the steps of the method for monitoring a basic input output system provided in any of the above embodiments.
For a description of the computer program product provided by the embodiment of the present invention, please refer to the above method embodiments; its effect is the same as that of the monitoring method of the basic input/output system provided by the embodiment of the present invention and is not repeated here.
The monitoring method, apparatus, device and non-volatile storage medium of the basic input/output system provided by the invention have been described in detail above. In the description, the embodiments are described in a progressive manner, each focusing on its differences from the others, so that identical or similar parts of the embodiments may be cross-referenced. Because the apparatus, device, non-volatile storage medium and computer program product disclosed in the embodiments correspond to the methods disclosed in the embodiments, they are described more briefly; for the relevant details, refer to the description of the method section. It should be noted that it will be apparent to those skilled in the art that the present invention may be modified and practiced without departing from the spirit of the present invention.
It should also be noted that in this specification, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.