Disclosure of Invention
The embodiments of the present application provide a method for processing a storage device path error and a related apparatus, which are used to improve the flexibility with which a processing apparatus handles errors.
In a first aspect, an embodiment of the present application provides a method for processing a path error of a storage device, including:
when the processing apparatus is ready to process an error, the processing apparatus may obtain a second error code decision table through an IO control interface, where the second error code decision table is obtained by a user modifying a first error code decision table, the second error code decision table includes a correspondence between at least one error and a rule for processing the error, and the error is an error that occurs while the processing apparatus sends a request to the storage device; and
after obtaining the second error code decision table through the IO control interface, the processing apparatus may process the error according to the second error code decision table.
In the embodiments of the present application, the error code decision table can be transferred between the user layer and the kernel layer of the processing apparatus through the IO control interface. Regardless of whether an error has occurred while the processing apparatus sends IO requests to the storage device, the processing apparatus can receive an instruction to modify the error code decision table at any time, receive the second error code decision table carried in the instruction through the IO control interface, and modify the first error code decision table according to the second error code decision table carried in the instruction. This increases the flexibility with which the multipath software handles errors.
According to the first aspect, in a first implementation manner of the first aspect of the embodiments of the present application, the second error code decision table includes: an operating system interface layer error, a small computer system interface (SCSI) command layer error, and a processing rule corresponding to the operating system interface layer error or the SCSI command layer error.
In this embodiment of the present application, the contents of the second error code decision table are specified, which improves the feasibility of the solution.
According to the first aspect, in a second implementation manner of the first aspect of the embodiments of the present application, processing the error according to the second error code decision table includes:
replacing the error code decision table of the kernel layer with the second error code decision table;
and processing the error according to the error code decision table of the kernel layer.
In this embodiment of the present application, the steps of processing the error according to the second error code decision table are specified, which improves the feasibility of the solution.
According to the second implementation manner of the first aspect, in a third implementation manner of the first aspect of the embodiments of the present application, processing the error according to the error code decision table of the kernel layer includes:
when an error prompt message is received, determining a target error corresponding to the error prompt message;
judging whether a processing rule corresponding to the target error exists in an error code decision table of the kernel layer;
and if the error code decision table of the kernel layer has a processing rule corresponding to the target error, processing the target error according to the processing rule.
In this embodiment of the present application, the steps of processing the error by using the error code decision table of the kernel layer are specified, which improves the feasibility of the solution.
According to the third implementation manner of the first aspect, in a fourth implementation manner of the first aspect of the embodiments of the present application, the method further includes:
if the error code decision table of the kernel layer has no processing rule corresponding to the target error, prompting a user to modify the second error code decision table.
In this embodiment of the present application, the action taken when the error code decision table of the kernel layer has no processing rule corresponding to the target error is specified, which enhances the implementation flexibility of the solution.
According to the first aspect, in a fifth implementation manner of the first aspect of the embodiments of the present application, the method further includes:
when an error occurs in an IO request sent to the storage device, acquiring an error path code, where the error path code indicates the location of the error;
determining whether the first error code decision table can handle the error, where handling the error includes updating parameters of the path; and
if the first error code decision table cannot handle the error, sending a request to modify the error code decision table.
In this embodiment of the present application, when an error occurs while an IO request is being sent to the storage device, the processing apparatus first determines whether the first error code decision table can handle the error and then performs the next operation according to the result, which enhances the implementation flexibility of the solution.
According to the fifth implementation manner of the first aspect, in a sixth implementation manner of the first aspect of the embodiments of the present application, determining whether the first error code decision table can handle the error includes:
acquiring the error path code;
querying the entries in the first error code decision table;
determining whether the first error code decision table has an entry corresponding to the error path code;
if an entry corresponding to the error path code exists in the first error code decision table, determining that the first error code decision table can handle the error; and
if no entry corresponding to the error path code exists in the first error code decision table, determining that the first error code decision table cannot handle the error.
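The fifth and sixth implementation manners amount to a table lookup followed by a fallback request. The following Python sketch illustrates that logic under stated assumptions: the error path code is modeled as a hypothetical (host_status, scsi_status) tuple, the table as a dictionary keyed by that code, and send_modify_request is a hypothetical hook standing in for whatever channel carries the modification request to the user layer.

```python
from typing import Callable, Dict, Optional, Tuple

# Assumption: an error path code is modeled as (host_status, scsi_status).
ErrorPathCode = Tuple[int, int]


def can_handle(first_table: Dict[ErrorPathCode, str], code: ErrorPathCode) -> bool:
    """Sixth implementation: the table can handle the error iff it has an entry for the code."""
    return code in first_table


def on_io_error(
    first_table: Dict[ErrorPathCode, str],
    code: ErrorPathCode,
    send_modify_request: Callable[[ErrorPathCode], None],
) -> Optional[str]:
    """Fifth implementation: use the matching rule, or request a table modification."""
    if can_handle(first_table, code):
        return first_table[code]
    send_modify_request(code)  # hypothetical hook toward the user layer
    return None


requests = []
table = {(0, 2): "retry_other"}
print(on_io_error(table, (0, 2), requests.append))  # retry_other
print(on_io_error(table, (0, 9), requests.append))  # None
print(requests)                                     # [(0, 9)]
```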
In this embodiment of the present application, the specific steps for determining whether the first error code decision table can handle the error are specified, which improves the feasibility of the solution.
In a second aspect, an embodiment of the present application provides a processing apparatus, which performs the method of the foregoing first aspect and includes:
an obtaining unit, configured to obtain a second error code decision table through an IO control interface, where the second error code decision table is obtained by a user modifying a first error code decision table, the second error code decision table includes a correspondence between at least one error and a rule for processing the error, and the error is an error that occurs while the processing apparatus sends a request to a storage device; and
a processing unit, configured to process the error according to the second error code decision table.
In the embodiments of the present application, the error code decision table can be transferred between the user layer and the kernel layer of the processing apparatus through the IO control interface. Regardless of whether an error has occurred while the processing apparatus sends IO requests to the storage device, the processing apparatus can receive an instruction to modify the error code decision table at any time, receive the second error code decision table carried in the instruction through the IO control interface, and modify the first error code decision table according to the second error code decision table carried in the instruction. This increases the flexibility with which the processing apparatus handles errors.
According to the second aspect, in a first implementation manner of the second aspect of the embodiments of the present application, the second error code decision table includes:
an operating system interface layer error, a small computer system interface (SCSI) command layer error, and a processing rule corresponding to the operating system interface layer error or the SCSI command layer error.
In this embodiment of the present application, the contents of the second error code decision table are specified, which improves the feasibility of the solution.
According to the second aspect, in a second implementation manner of the second aspect of the embodiments of the present application, the processing unit includes:
a replacement subunit, configured to replace the error code decision table of the kernel layer with the second error code decision table; and
a processing subunit, configured to process the error according to the error code decision table of the kernel layer.
In this embodiment of the present application, the steps of processing the error according to the second error code decision table are specified, which improves the feasibility of the solution.
According to the second implementation manner of the second aspect, in a third implementation manner of the second aspect of the embodiments of the present application, the processing subunit includes:
a determining module, configured to determine, when an error prompt message is received, a target error corresponding to the error prompt message;
a judging module, configured to determine whether a processing rule corresponding to the target error exists in the error code decision table of the kernel layer; and
a processing module, configured to process the target error according to the processing rule when the processing rule corresponding to the target error exists in the error code decision table of the kernel layer.
In this embodiment of the present application, the steps of processing the error by using the error code decision table of the kernel layer are specified, which improves the feasibility of the solution.
According to the third implementation manner of the second aspect, in a fourth implementation manner of the second aspect of the embodiments of the present application, the processing apparatus further includes:
a prompting unit, configured to prompt a user to modify the second error code decision table when no processing rule corresponding to the target error exists in the error code decision table of the kernel layer.
In this embodiment of the present application, the action taken when the error code decision table of the kernel layer has no processing rule corresponding to the target error is specified, which enhances the implementation flexibility of the solution.
In a third aspect, an embodiment of the present application provides a processing apparatus, which performs the method of the foregoing first aspect and includes:
a processor, a memory, a bus, and a communication interface;
the processor, the memory, and the communication interface are connected to the bus;
the processor controls the communication interface to obtain a second error code decision table and stores the second error code decision table in the memory, where the second error code decision table is obtained by a user modifying a first error code decision table, the second error code decision table includes a correspondence between at least one error and a rule for processing the error, and the error is an error that occurs while the processing apparatus sends a request to a storage device; and
the processor processes the error according to the second error code decision table.
It should be noted that the communication interface may be an IO control interface.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program, where the computer program includes program instructions that, when executed by a processor, cause the processor to perform the method according to the first aspect or any implementation manner of the first aspect.
In a fifth aspect, the present application provides a computer program product that, when run on a computer, causes the computer to perform the method according to the first aspect or any implementation manner of the first aspect.
In a sixth aspect, the present application provides a chip system including a processor, configured to support a network device in implementing the functions involved in the foregoing aspects, for example, sending or processing the data and/or information involved in the foregoing methods. In a possible design, the chip system further includes a memory, configured to store program instructions and data necessary for the network device. The chip system may consist of a chip, or may include a chip and other discrete components.
It can be seen from the foregoing technical solutions that the embodiments of the present application have the following advantages:
In the embodiments of the present application, the error code decision table can be transferred between the user layer and the kernel layer of the processing apparatus through the IO control interface. Regardless of whether an error has occurred while the processing apparatus sends IO requests to the storage device, the processing apparatus can receive an instruction to modify the error code decision table at any time, receive the second error code decision table carried in the instruction through the IO control interface, and modify the first error code decision table according to the second error code decision table carried in the instruction. This increases the flexibility with which the multipath software handles errors.
Detailed Description
The embodiments of the present application provide a method for processing a storage device path error and a related apparatus, which are used to improve the flexibility with which a processing apparatus handles errors.
Some terms used in the embodiments of the present application are described below.
Processing apparatus: an apparatus that contains an entity providing the functions of multipath driver software.
Error: an indication that a network error has occurred; more generally, any cause or event that prevents the system from working as the user intends. Common examples include 407, 405, 401, and 404 errors. In the embodiments of the present application, an error refers to an exception that occurs while the multipath driver sends an IO request to a storage device.
Error handling: during programming, a program may fail to run normally because of certain errors; handling these errors so that the program runs correctly is called error handling. Error handling capability is an important aspect of compiler performance and plays an important role in helping programmers correct programs as quickly as possible.
The following describes a system architecture to which the embodiments of the present application apply.
In an enterprise-level information system, an application host (Host) that processes service requests and a storage device (Storage) that stores data are connected through a storage area network (SAN). To improve redundancy and IO throughput, storage multipath connection is usually adopted, that is, the application host accesses the storage device through multiple physical paths at the same time. As shown in fig. 1, the application host has two initiator ports, IP0 and IP1, and the storage device has two target ports, TP0 and TP1. When the application host accesses a logical unit number (LUN) in the storage device through the SAN, there are four paths: Path0 (IP0, TP0), Path1 (IP0, TP1), Path2 (IP1, TP0), and Path3 (IP1, TP1).
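The four paths in fig. 1 are every pairing of an initiator port with a target port. The following Python sketch reproduces that enumeration, assuming every initiator port can reach every target port through the SAN (full-mesh zoning, an assumption for illustration); the helper name enumerate_paths is hypothetical.

```python
from itertools import product


def enumerate_paths(initiator_ports, target_ports):
    """Pair every initiator port with every target port (assumes full-mesh SAN zoning)."""
    return {
        f"Path{i}": pair
        for i, pair in enumerate(product(initiator_ports, target_ports))
    }


paths = enumerate_paths(["IP0", "IP1"], ["TP0", "TP1"])
for name, (ip, tp) in paths.items():
    print(f"{name}({ip}, {tp})")
# Path0(IP0, TP0)
# Path1(IP0, TP1)
# Path2(IP1, TP0)
# Path3(IP1, TP1)
```

Under this assumption the path count is simply the product of the port counts, so adding a third target port would yield six paths.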
Fig. 2 shows the structure of the multipath software (MPIO) in the operating system of the application host; this software is part of the processing apparatus in the embodiments of the present application. The processing apparatus includes at least the multipath software, which consists of a multipath driver in the kernel layer and a multipath management tool in the user layer. The multipath driver is a module of the operating system kernel and implements functions such as path identification, path aggregation, path selection (load balancing), and error handling. The multipath management tool runs in the user layer and provides path management and performance statistics functions. In the embodiments of the present application, a new IO control interface is defined between the multipath driver and the multipath management tool, an error code decision table configuration file is set in the user-layer multipath management tool, and an error code decision table is set in the kernel-layer multipath driver, so that the error code decision table can be transferred between the user layer and the kernel layer. The error code decision table configuration file contains the error code decision table and the other files a user needs in order to modify it. If an error occurs when the multipath driver sends an IO request to the storage device through the selected path, the error handling module of the multipath driver determines, according to the type of the error and the indication in the error code decision table, how to handle the IO request and the error.
It should be noted that the processing apparatus in this embodiment may be a server, a processor in a server, a chip in a server, or another device, which is not limited herein. In this embodiment and the following embodiments, the processing apparatus is used as an example for description.
For ease of understanding, a specific procedure in this embodiment is described below. As shown in fig. 3, a method for processing a storage device path error provided in this embodiment includes the following steps performed by the processing apparatus:
301. Obtain a second error code decision table through the IO control interface.
When the processing apparatus is ready to process an error, the processing apparatus may obtain a second error code decision table through the IO control interface, where the second error code decision table is obtained by a user modifying the first error code decision table, the second error code decision table includes a correspondence between at least one error and a rule for processing the error, and the error is an error that occurs while the processing apparatus sends a request to the storage device.
302. Process the error according to the second error code decision table.
After obtaining the second error code decision table through the IO control interface, the processing apparatus may process the error according to the second error code decision table.
In this embodiment, the error code decision table can be transferred between the user layer and the kernel layer of the processing apparatus through the IO control interface. Regardless of whether an error has occurred while the processing apparatus sends IO requests to the storage device, the processing apparatus can receive an instruction to modify the error code decision table at any time, receive the second error code decision table carried in the instruction through the IO control interface, and modify the first error code decision table according to the second error code decision table carried in the instruction. This increases the flexibility with which the multipath software handles errors.
The foregoing describes the method for processing a storage device path error in this embodiment. The following describes another embodiment of the method. As shown in fig. 4, another embodiment of the method for processing a storage device path error in this embodiment includes:
401. Obtain a second error code decision table through the input/output control (IO control) interface.
In this embodiment, after a user modifies the first error code decision table to obtain the second error code decision table, the processing apparatus obtains the second error code decision table through the IO control interface. The second error code decision table includes a correspondence between at least one error and a rule for processing the error, and the error is an error that occurs while the processing apparatus sends a request to the storage device.
In this embodiment, a set of IO control interfaces is defined to synchronize the error code decision table between the multipath management tool in the user layer and the processing apparatus in the kernel layer. When the multipath management tool reads the error code decision table of the processing apparatus by calling the read interface, the instruction Io_CTL(CTL_GET_ERROR_POLICY_TABLE, void) is used. When the multipath management tool updates the error code decision table of the processing apparatus by calling the update interface, the instruction Io_CTL(CTL_SET_ERROR_POLICY_TABLE, void TABLE) is used. Here, code refers to source files written by a programmer in a language supported by a development tool; it is a system of well-defined rules for representing information in discrete form using characters, symbols, or signal elements. Principles of code design include uniqueness, standardization and generality, extensibility and stability, ease of recognition and memorization, brevity and uniform format, and ease of modification. It should be noted that the instructions with which the multipath management tool reads or updates the table through the interface are not limited to those described above; this embodiment and the following embodiments use these two instructions as examples.
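The read/update exchange can be illustrated with a user-space simulation. The following Python sketch is not the driver implementation: the class MultipathDriver, its io_ctl method, the numeric command values, and the table entries (including the action "fail_path") are all hypothetical. In a real system the table would cross the user/kernel boundary through an ioctl() call on a driver device file; here a deep copy stands in for that boundary so that user-layer edits never alias the kernel copy.

```python
import copy

# Hypothetical command codes; the real values are not given in this application.
CTL_GET_ERROR_POLICY_TABLE = 1
CTL_SET_ERROR_POLICY_TABLE = 2


class MultipathDriver:
    """Stand-in for the kernel-layer multipath driver (hypothetical)."""

    def __init__(self, table):
        self._table = copy.deepcopy(table)

    def io_ctl(self, cmd, table=None):
        if cmd == CTL_GET_ERROR_POLICY_TABLE:
            # Hand back a copy so user-layer edits do not touch the kernel copy.
            return copy.deepcopy(self._table)
        if cmd == CTL_SET_ERROR_POLICY_TABLE:
            if table is None:
                raise ValueError("CTL_SET_ERROR_POLICY_TABLE requires a table")
            self._table = copy.deepcopy(table)
            return None
        raise ValueError(f"unsupported command: {cmd}")


# User-layer management tool: read the first table, modify it, write it back.
driver = MultipathDriver({(0, 0): "fail_path"})
first_table = driver.io_ctl(CTL_GET_ERROR_POLICY_TABLE)
second_table = dict(first_table)
second_table[(0, 2)] = "retry_other"
driver.io_ctl(CTL_SET_ERROR_POLICY_TABLE, second_table)
print(driver.io_ctl(CTL_GET_ERROR_POLICY_TABLE))
```

Copying on both directions keeps the user-layer first table unchanged after the update, which mirrors the separation between the configuration file in the user layer and the table held by the kernel-layer driver.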
In this embodiment, a decision table is a tabular tool suited to describing situations with many processing and decision conditions, where the conditions combine with one another and multiple decision outcomes are possible. The error code decision table maps multiple conditions to the actions to be performed once those conditions are met, describing complex logic accurately and concisely. Unlike control statements in conventional programming languages, the error code decision table clearly expresses the relationships between multiple independent conditions and multiple actions. As shown in Table 1, the error code decision table includes at least the following fields: application host status (host_status), SCSI command layer error (SCSI_status), error handling rule (action), number of retransmissions (paramcount), retransmission interval (interval), and upper limit of the number of retransmissions (count).
TABLE 1
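The fields listed for Table 1 can be modeled as one record per table row. The following Python sketch is illustrative only: the class name DecisionEntry and helper build_table are hypothetical, the single row shown is the retry_other example described for step 405, and indexing by (host_status, SCSI_status) assumes the one-rule-per-error case that this embodiment describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionEntry:
    host_status: int   # application host status code
    scsi_status: int   # SCSI command layer error code
    action: str        # error handling rule, e.g. "retry_other"
    paramcount: int    # number of retransmissions
    interval: int      # retransmission interval, in unit times
    count: int         # upper limit of the number of retransmissions


def build_table(entries):
    """Index entries by (host_status, scsi_status) for constant-time lookup."""
    return {(e.host_status, e.scsi_status): e for e in entries}


table = build_table([DecisionEntry(0, 2, "retry_other", 2, 10, 3)])
print(table[(0, 2)].action)  # retry_other
```

If one SCSI command layer error had to map to several rules, the dictionary value would become a list of entries instead of a single record.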
402. Replace the error code decision table of the kernel layer with the second error code decision table.
In this embodiment, the processing apparatus replaces the original error code decision table in the kernel layer with the second error code decision table, which was obtained from the multipath management tool in the user layer through the IO control interface. After the replacement, the second error code decision table serves as the error code decision table of the kernel layer, and it is the table in effect when the error code decision table is next updated.
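Step 402 replaces the kernel-layer table as a whole rather than merging entries. The following Python sketch illustrates that behavior; the class KernelErrorTable, its generation counter, and the use of a lock are assumptions added for illustration (a real driver would need some synchronization so that an error lookup never sees a half-updated table, but the mechanism is not specified in this application).

```python
import threading


class KernelErrorTable:
    """Hypothetical holder of the active kernel-layer error code decision table."""

    def __init__(self, table):
        self._lock = threading.Lock()
        self._table = dict(table)
        self.generation = 0  # counts completed replacements (illustrative)

    def replace(self, new_table):
        # Step 402: the received table becomes the kernel-layer table as a whole.
        with self._lock:
            self._table = dict(new_table)
            self.generation += 1

    def lookup(self, key):
        with self._lock:
            return self._table.get(key)


kernel = KernelErrorTable({(0, 0): "fail_path"})
kernel.replace({(0, 2): "retry_other"})
print(kernel.lookup((0, 2)))  # retry_other
print(kernel.lookup((0, 0)))  # None
```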
403. When an error prompt message is received, determine the target error corresponding to the error prompt message.
In this embodiment, when an error occurs while the multipath driver sends an IO request to the storage device, the processing apparatus receives an error prompt message; there may be one or more such messages, which is not limited herein. The error prompt message carries an error path code that indicates the cause of the error, including the location of the error, the time at which the error occurred, or the number of times the error has occurred. The processing apparatus determines the corresponding target error according to the error path code in the error prompt message. The error prompt message may also carry the number of error prompts, the error warning level, or the time for processing the error, which is not limited herein.
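Step 403 can be sketched as extracting a lookup key from each message. In the following Python sketch the message and code layouts (ErrorPromptMessage, ErrorPathCode) and the helper target_errors are hypothetical; the target error is assumed to be the (host_status, SCSI_status) pair used as the table key, and repeated messages for the same error are assumed to collapse into one target error.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ErrorPathCode:
    host_status: int      # application host status code
    scsi_status: int      # SCSI command layer error code
    location: str         # where the error occurred, e.g. a path name
    occurrences: int = 1  # number of times the error has occurred


@dataclass(frozen=True)
class ErrorPromptMessage:
    code: ErrorPathCode


def target_errors(messages: List[ErrorPromptMessage]) -> List[Tuple[int, int]]:
    """Map one or more prompt messages to distinct target errors, keeping arrival order."""
    keys = ((m.code.host_status, m.code.scsi_status) for m in messages)
    return list(dict.fromkeys(keys))


msgs = [
    ErrorPromptMessage(ErrorPathCode(0, 2, "Path0")),
    ErrorPromptMessage(ErrorPathCode(0, 2, "Path0", occurrences=2)),
    ErrorPromptMessage(ErrorPathCode(0, 0, "Path1")),
]
print(target_errors(msgs))  # [(0, 2), (0, 0)]
```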
404. Determine whether a processing rule corresponding to the target error exists in the error code decision table of the kernel layer.
In this embodiment, after receiving the error prompt message and determining the target error, the processing apparatus determines whether a processing rule corresponding to the target error exists in the error code decision table of the kernel layer. The error code decision table of the kernel layer has the same parameters as the second error code decision table described above, and a processing rule refers to an entry in the error code decision table of the kernel layer. As shown in Table 1, the error code decision table includes at least: application host status (host_status), SCSI command layer error (SCSI_status), error handling rule (action), number of retransmissions (paramcount), retransmission interval (interval), and upper limit of the number of retransmissions (count). At least one SCSI command layer error corresponds to at least one error handling rule: one SCSI command layer error in the table may correspond to multiple error handling rules, and one error handling rule may correspond to multiple SCSI command layer errors. This embodiment and the following embodiments describe only the case in which one SCSI command layer error corresponds to one error handling rule.
The error code decision table is explained below using the first and second rows of Table 1 as an example.
As shown in Table 1, when the application host status in the error path code corresponding to the target error is application host code 0 and the SCSI command layer error is SCSI error code 0, a corresponding processing rule (action) exists in the error code decision table.
If the error code decision table of the kernel layer has a processing rule corresponding to the target error, step 405 is performed;
if the error code decision table of the kernel layer has no processing rule corresponding to the target error, step 406 is performed.
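The branch in step 404 is a single lookup. The following Python sketch shows it under the same assumptions as elsewhere (a dictionary keyed by a hypothetical (host_status, scsi_status) tuple); the function name decide and its return convention, the step number plus the rule, are illustrative only.

```python
from typing import Dict, Optional, Tuple

TargetError = Tuple[int, int]  # (host_status, scsi_status), an assumed key


def decide(kernel_table: Dict[TargetError, str],
           target: TargetError) -> Tuple[str, Optional[str]]:
    """Step 404: choose step 405 when a rule exists, otherwise step 406."""
    rule = kernel_table.get(target)
    if rule is not None:
        return ("405", rule)
    return ("406", None)


kernel_table = {(0, 2): "retry_other"}
print(decide(kernel_table, (0, 2)))  # ('405', 'retry_other')
print(decide(kernel_table, (0, 7)))  # ('406', None)
```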
405. Process the target error according to the processing rule.
In this embodiment, when a processing rule corresponding to the target error exists in the error code decision table of the kernel layer, the processing apparatus processes the target error according to the processing rule. The error code decision table is explained below using the first and fourth rows of Table 1 as an example. As shown in Table 1, when the application host status in the error path code corresponding to the target error is application host code 0 and the SCSI command layer error is SCSI error code 2, the error code decision table has a corresponding processing rule (action): retransmit the IO request through another path (retry_other), where the current number of retransmissions (paramcount) is 2, the retransmission interval (interval) is 10 unit times, and the upper limit of the number of retransmissions (count) is 3. The unit time may be 0.001 ms, 0.002 ms, or 0.0001 ms, which is not specifically limited herein.
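Applying the retry_other rule means choosing another path and waiting interval unit times before retransmitting, as long as the retransmission limit has not been reached. The following Python sketch is one possible reading, not the patented procedure: it assumes paramcount counts retransmissions already made and count is the cap (so with paramcount 2 and count 3 one more retransmission is allowed), and the round-robin choice among the remaining paths and the function name next_retry are hypothetical.

```python
from typing import List, Optional, Tuple


def next_retry(rule: dict, paths: List[str], failed_path: str,
               unit_ms: float) -> Optional[Tuple[str, float]]:
    """Plan the next retry_other attempt as (path, delay_ms), or None if not allowed.

    Assumption: paramcount counts retransmissions already made; count is the cap.
    """
    if rule["action"] != "retry_other":
        return None
    if rule["paramcount"] >= rule["count"]:
        return None  # upper limit on retransmissions reached
    others = [p for p in paths if p != failed_path]
    if not others:
        return None  # no alternative path available
    # Round-robin over the remaining paths (an illustrative policy choice).
    target = others[rule["paramcount"] % len(others)]
    return target, rule["interval"] * unit_ms


rule = {"action": "retry_other", "paramcount": 2, "interval": 10, "count": 3}
paths = ["Path0", "Path1", "Path2", "Path3"]
print(next_retry(rule, paths, "Path0", unit_ms=0.001))
```

With a unit time of 0.001 ms, the 10-unit interval corresponds to a delay of about 0.01 ms before the IO request is resent.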
406. Prompt the user to modify the second error code decision table.
In this embodiment, when the error code decision table of the kernel layer has no processing rule corresponding to the target error, the processing apparatus prompts the user to modify the second error code decision table. When the user is prompted, the user interface displays the entries of the second error code decision table in the kernel layer, the error path code, or other information that can guide the user in modifying the second error code decision table, which is not limited herein. The user modifies the second error code decision table based on this information to obtain a third error code decision table, and step 401 is then performed to start the next cycle of modifying the error code decision table.
In this embodiment, the error code decision table can be transferred between the user layer and the kernel layer of the processing apparatus through the IO control interface. Regardless of whether an error has occurred while the processing apparatus sends IO requests to the storage device, the processing apparatus can receive an instruction to modify the error code decision table at any time, receive the second error code decision table carried in the instruction through the IO control interface, and modify the first error code decision table according to the second error code decision table carried in the instruction. This increases the flexibility with which the multipath software handles errors.
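The loop from step 406 back to step 401 can be sketched as follows. In this Python sketch, resolve, user_adds_rule, the max_rounds safeguard, and the "fail_path" entry are hypothetical; ask_user stands in for the user interface through which the modified (for example, third) error code decision table is returned, and each returned table replaces the kernel-layer table as in steps 401 and 402.

```python
from typing import Callable, Dict, Tuple

Key = Tuple[int, int]
Table = Dict[Key, str]


def resolve(kernel_table: Table, target: Key,
            ask_user: Callable[[Table, Key], Table],
            max_rounds: int = 3) -> Tuple[str, int]:
    """Repeat steps 406 -> 401 -> 402 until the kernel table holds a rule for target."""
    rounds = 0
    while target not in kernel_table:
        if rounds == max_rounds:
            raise RuntimeError(f"no rule for {target} after {rounds} updates")
        # Step 406: prompt the user, who returns a modified table.
        new_table = ask_user(dict(kernel_table), target)
        kernel_table = dict(new_table)  # steps 401-402: obtain and replace
        rounds += 1
    return kernel_table[target], rounds


def user_adds_rule(table: Table, target: Key) -> Table:
    table[target] = "retry_other"  # hypothetical user edit
    return table


print(resolve({(0, 0): "fail_path"}, (0, 2), user_adds_rule))  # ('retry_other', 1)
```

The round limit is only a guard for the sketch; the application itself does not bound the number of modification cycles.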
The foregoing describes the method for processing a storage device path error in this embodiment. The following describes a processing apparatus 500 in this embodiment. As shown in fig. 5, an embodiment of the processing apparatus 500 includes:
an obtaining unit 501, configured to obtain a second error code decision table through an IO control interface, where the second error code decision table is obtained by a user modifying a first error code decision table, the second error code decision table includes a correspondence between at least one error and a rule for processing the error, and the error is an error that occurs while the processing apparatus sends a request to a storage device; and
a processing unit 502, configured to process the error according to the second error code decision table.
The processing unit 502 includes:
a replacement subunit 5021, configured to replace the error code decision table of the kernel layer with the second error code decision table;
a processing subunit 5022, configured to process the error according to the error code decision table of the kernel layer.
The processing subunit 5022 includes:
a determining module 50221, configured to determine, when an error prompt message is received, a target error corresponding to the error prompt message;
a judging module 50222, configured to determine whether a processing rule corresponding to the target error exists in the error code decision table of the kernel layer; and
a processing module 50223, configured to process the target error according to the processing rule corresponding to the target error when that processing rule exists in the error code decision table of the kernel layer.
The processing apparatus 500 further includes:
a prompting unit 503, configured to prompt a user to modify the second error code decision table when the processing rule corresponding to the target error does not exist in the error code decision table of the kernel layer.
In this embodiment, the error code decision table can be transferred between the user layer and the kernel layer of the processing apparatus through the IO control interface. Regardless of whether an error has occurred while the processing apparatus sends IO requests to the storage device, the processing apparatus can receive an instruction to modify the error code decision table at any time, receive the second error code decision table carried in the instruction through the IO control interface, and modify the first error code decision table according to the second error code decision table carried in the instruction. This increases the flexibility with which the multipath software handles errors.
It can be clearly understood by persons skilled in the art that, for convenience and brevity of description, for the specific working processes of the foregoing systems, apparatuses, and units, reference may be made to the corresponding processes in the foregoing method embodiments; details are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the described apparatus embodiments are merely illustrative. The division into units is merely a logical function division, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the displayed or discussed mutual couplings, direct couplings, or communication connections may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the present application essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the methods in the embodiments of the present application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.