US20130275806A1 - Reconfigurable recovery modes in high availability processors - Google Patents
Reconfigurable recovery modes in high availability processors
- Publication number
- US20130275806A1 US13/785,103
- Authority
- US
- United States
- Prior art keywords
- resources
- processor
- error recovery
- recovery
- instruction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3234—Power saving characterised by the action undertaken
- G06F1/3287—Power saving characterised by the action undertaken by switching off individual functional units in the computer system
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0766—Error or fault reporting or storing
- G06F11/0772—Means for error signaling, e.g. using interrupts, exception flags, dedicated error registers
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0766—Error or fault reporting or storing
- G06F11/0781—Error filtering or prioritizing based on a policy defined by the user or on a policy defined by a hardware/software module, e.g. according to a severity level
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0793—Remedial or corrective actions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1405—Saving, restoring, recovering or retrying at machine instruction level
- G06F11/1407—Checkpointing the instruction stream
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1415—Saving, restoring, recovering or retrying at system level
- G06F11/142—Reconfiguring to eliminate the error
- G06F11/1425—Reconfiguring to eliminate the error by reconfiguration of node membership
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1415—Saving, restoring, recovering or retrying at system level
- G06F11/142—Reconfiguring to eliminate the error
- G06F11/1428—Reconfiguring to eliminate the error with loss of hardware functionality
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- the present invention relates to error recovery in high availability processors, and more specifically, exemplary embodiments of the present invention relate to error recovery systems and methods of instruction processing for high availability processors with both recoverable and non-recoverable modes of operation.
- a method for performing error recovery is provided.
- a software recovery checkpoint is created by a processor.
- the processor is dynamically switched into a non-recoverable processing mode of operation based on creating the software recovery checkpoint.
- the non-recoverable processing mode of operation is a mode in which a subset of hardware error recovery resources are powered-down or re-purposed for instruction processing. It is determined, during the non-recoverable processing mode of operation, that a new software recovery checkpoint is required. Based on the determining that a new software recovery checkpoint is required, the processor is dynamically switched into a recoverable processing mode of operation.
- the recoverable processing mode of operation is a mode in which hardware error recovery resources, including at least one of the hardware error recovery resources in the subset, are purposed for hardware error recovery operations.
- FIG. 1 illustrates a diagram of a high availability computer processor in recoverable mode, according to an exemplary embodiment of the present invention
- FIG. 2 illustrates a diagram of a high availability computer processor in non-recoverable mode, according to an exemplary embodiment of the present invention
- FIG. 3 illustrates a flow chart of a method of processing instructions in a high availability computer processor with non-recoverable mode support, according to an exemplary embodiment of the present invention
- FIG. 4 illustrates a computer-usable storage medium, according to an exemplary embodiment of the present invention.
- FIG. 5 illustrates a diagram of a computer apparatus, according to an exemplary embodiment of the present invention.
- a high availability computer processor and method of processing instructions on a high availability processor are provided which increase resources available for instruction execution through dynamic changes to fault recovery systems available to the processor.
- a high availability computer processor may dynamically switch from a fully recoverable mode into one form of non-recoverable mode which may free resources (e.g., registers, buffers, etc.) typically used for fault/error recovery for other operations.
- the freed resources may allow increased computational throughput as compared to other computer processors (i.e., processors with only recoverable modes) while still allowing for an acceptable level of fault recovery as enabled through software-created checkpoints rather than costly hardware-specific checkpoints.
- a hardware mechanism may be enabled by software such that the processor dynamically configures itself to execute in either of the two modes.
- While in a non-recoverable mode of operation, the software may provide some coarse-grained recovery mechanism, and can recover from faults using software-based checkpoint schemes, thereby allowing the same measure of fault recovery while also freeing the unused hardware recovery resources.
- an application or code section may instruct the processor to go into a non-recoverable mode, and only be notified if a fault is detected. Otherwise, the processor can remain in (or be switched back into) its recoverable mode.
- a non-recoverable processor state has multiple benefits.
- the extra resources required to save a previously known good checkpoint state can be allocated for productive work. Any transient buffering states can also be reduced. For example, if part of a physical register pool is allocated to hardware checkpointing, these registers can now be used for speculative processing instead. Furthermore, additional speculative processing may be allowed to complete ahead of time before all errors are collected and a checkpoint is taken. Results can be committed before checkpointing is performed by the hardware, which is no longer required in a non-recoverable mode. Thereafter, once software enabled error recovery operations require a new software checkpoint, the processor may be directed to switch back into a recoverable mode of operation such that error recovery is hardware-facilitated while the software creates a new checkpoint. Subsequently, the processor may return to a non-recoverable mode with newly freed resources.
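The cycle described above (software checkpoint, non-recoverable mode, return to recoverable mode for the next checkpoint, then non-recoverable mode again) can be sketched as a small state machine. This is a minimal illustration only: the class `ModeController`, its method names, and the register counts are assumptions made for the example and do not appear in the source.

```python
from enum import Enum


class Mode(Enum):
    RECOVERABLE = "recoverable"
    NON_RECOVERABLE = "non-recoverable"


class ModeController:
    """Toy model of the mode cycle; all names here are hypothetical."""

    def __init__(self, recovery_registers: int, general_registers: int):
        self.mode = Mode.RECOVERABLE
        self.recovery_registers = recovery_registers
        self.general_registers = general_registers
        self.checkpoints_taken = 0

    def usable_registers(self) -> int:
        # Assumption: in non-recoverable mode every checkpointing register
        # is repurposed for instruction processing.
        if self.mode is Mode.NON_RECOVERABLE:
            return self.general_registers + self.recovery_registers
        return self.general_registers

    def take_software_checkpoint(self) -> None:
        # The software creates its checkpoint while hardware recovery is
        # active, mirroring the switch back to recoverable mode.
        if self.mode is not Mode.RECOVERABLE:
            raise RuntimeError("switch to recoverable mode first")
        self.checkpoints_taken += 1

    def enter_non_recoverable(self) -> None:
        if self.checkpoints_taken == 0:
            raise RuntimeError("no software checkpoint to fall back on")
        self.mode = Mode.NON_RECOVERABLE

    def enter_recoverable(self) -> None:
        self.mode = Mode.RECOVERABLE


ctrl = ModeController(recovery_registers=16, general_registers=48)
ctrl.take_software_checkpoint()
ctrl.enter_non_recoverable()
freed_view = ctrl.usable_registers()   # registers usable with recovery freed
ctrl.enter_recoverable()               # a new software checkpoint is needed
recoverable_view = ctrl.usable_registers()
ctrl.take_software_checkpoint()
ctrl.enter_non_recoverable()
```

The guard in `enter_non_recoverable` reflects the ordering in the text, where the switch is based on a software checkpoint having been created first.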
- a high availability computer processor and method of processing instructions on a high availability processor are provided which decrease power consumption through dynamic changes to fault recovery systems available to the processor.
- a high availability computer processor may dynamically switch from a fully recoverable mode into one form of non-recoverable mode which may free and power-down resources (e.g., registers, buffers, etc.) typically used for fault/error recovery for other operations. Therefore, when operating in a non-recoverable mode, the powered-down resources do not contribute to overall power consumption while still allowing for an acceptable level of fault recovery as enabled through software-created checkpoints rather than costly hardware-specific checkpoints.
- exemplary embodiments noted above may be implemented in combination such that a portion of freed resources are powered-down while another portion of freed resources are repurposed. In this manner, a plurality of operational states may become apparent where multiple benefits in computer processing are realized in contrast to existing technologies.
- the processor 100 includes a cache 101 (e.g., data and instruction cache) which may be divided into a plurality of different cache levels or designations.
- the processor 100 further includes instruction fetch circuitry 102 configured to fetch instructions from the cache 101 .
- the processor 100 further includes instruction decode circuitry 103 configured to receive fetched instructions from the instruction fetch circuitry 102 .
- the processor 100 further includes instruction dispatch circuitry 104 configured to dispatch instructions decoded through circuitry 103 .
- the instructions are issued and executed through instruction execution portion 105 , which is further configured to fetch associated data from the cache 101 .
- The detailed handling and support of out-of-order instruction execution is assumed to occur mainly inside the instruction execution portion 105 .
- processor 100 includes relatively common and general portions which function in an anticipated manner. These portions may be configured for reduced or complex (e.g., RISC or CISC) instruction sets or entirely specialized instruction sets according to any desired implementation of exemplary embodiments. Therefore, the processor 100 should not be limited to any specific computer processor, but should be equally applicable to any computer processor including somewhat similar or equivalent componentry.
- the processor 100 further includes error recovery resources 106 in communication with the cache 101 and instruction execution portion 105 .
- the error recovery resources 106 may include fault recovery components comprising queues, buffers, thread processing units, registers, and any other suitable components.
- the error recovery resources 106 may process, create, and store instructions and results for error recovery facilitation.
- the error recovery resources 106 may detect hardware faults, capture checkpoints, and perform checkpoint retry upon detection of faults.
- the processor 100 further includes general purpose resources 107 in communication with the instruction execution portion 105 and the error recovery resources 106 through channel 108 .
- the general purpose resources may include general purpose registers, floating point registers, special purpose registers, or any other suitable components for processing and storing instruction results in a controlled fashion.
- Although error recovery resources 106 and general purpose resources 107 are illustrated as individual components, these resources can be distributed into other components 101, 102, 103, 104, 105 and other processor components not explicitly illustrated. Furthermore, a plurality of threads may be executed on processor 100 using both error recovery resources 106 and general purpose resources 107 .
- Although error recovery resources 106 and general purpose resources 107 are configured to provide different functionality, the associated components organized therein comprise at least a portion of generally the same or similar components.
- For example, registers and buffers may both be included in the error recovery resources 106 and the general purpose resources 107 . Therefore, according to exemplary embodiments of the present invention, a portion of the error recovery resources may be freed, powered-down, or re-purposed to function in unison with the general purpose resources in at least one new operating mode. Such is illustrated in FIG. 2 .
- a portion of error recovery resources 106 are freed and repurposed as resources 206 B to function in accordance with the general purpose resources 107 .
- resources 206 B may be powered-down to lower power consumption.
- a remaining portion of error recovery resources 206 A may remain dedicated to error recovery and further be configured to free, power-down, repurpose, or reintegrate the resources 206 B to function as either error recovery resources or general purpose resources within processor 100 . It follows then, that if additional resources are freed for use in instruction processing, the processor 100 may operate at a faster rate than if all error recovery resources 106 are dedicated to handle hardware fault recovery.
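The split of error recovery resources 106 into a retained portion 206 A and a freed portion 206 B, and the later reintegration of 206 B, can be illustrated with a pool of interchangeable units. This is a sketch under stated assumptions: the `ResourcePool` class, the `Role` names, and the unit counts (8 recovery units, 24 general purpose units) are invented for the example, and units are treated as identical so that any general purpose unit may be handed back to error recovery.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    ERROR_RECOVERY = "error recovery"
    GENERAL_PURPOSE = "general purpose"
    POWERED_DOWN = "powered down"


@dataclass
class ResourcePool:
    """Hypothetical pool of identical units (e.g., registers or buffers)."""
    roles: list = field(default_factory=list)

    def count(self, role: Role) -> int:
        return sum(1 for r in self.roles if r is role)

    def reassign(self, source: Role, target: Role, n: int) -> int:
        """Move up to n units from source to target; return how many moved."""
        moved = 0
        for i, r in enumerate(self.roles):
            if moved == n:
                break
            if r is source:
                self.roles[i] = target
                moved += 1
        return moved


def snapshot(pool: ResourcePool) -> tuple:
    """Return (error recovery, general purpose, powered down) counts."""
    return (pool.count(Role.ERROR_RECOVERY),
            pool.count(Role.GENERAL_PURPOSE),
            pool.count(Role.POWERED_DOWN))


# 8 units play the part of resources 106, 24 the part of resources 107.
pool = ResourcePool([Role.ERROR_RECOVERY] * 8 + [Role.GENERAL_PURPOSE] * 24)

# Non-recoverable mode: 2 units stay on fault detection (206 A), while
# 4 are repurposed and 2 are powered down (together 206 B).
pool.reassign(Role.ERROR_RECOVERY, Role.GENERAL_PURPOSE, 4)
pool.reassign(Role.ERROR_RECOVERY, Role.POWERED_DOWN, 2)
non_recoverable = snapshot(pool)

# Back to recoverable mode: reintegrate the 206 B units.
pool.reassign(Role.POWERED_DOWN, Role.ERROR_RECOVERY, 2)
pool.reassign(Role.GENERAL_PURPOSE, Role.ERROR_RECOVERY, 4)
recoverable = snapshot(pool)
```

Mixing both reassignments in one transition corresponds to the combined embodiment in which part of the freed resources is powered down and part is repurposed.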
- processor 100 still detects hardware faults and reports them to the operating system or application code. However, the checkpoint capturing and retry mechanisms are disabled in the hardware and, instead, are performed by software.
- the processor changes its basic operation and does not allocate resources for saving checkpoints nor does it postpone instruction execution or completion that might have to wait until the creation and validation of checkpoints.
- Such a configuration change may be accomplished by modifying the typical instruction dispatch/issue/execution rules of precedence, storage update ordering, and register mapping algorithms. Under such non-recoverable mode operation, many fine-grained instruction processing performance improvements and power savings can be obtained.
- a method of instruction processing is described which provides a template for instruction processing that both increases computational efficiency and retains a useful system-level error recovery mechanism, distributing fault recovery obligations across software and hardware to make more efficient use of system resources.
- the method 300 includes creating a software checkpoint at block 301 .
- Creating the software checkpoint may include creating a software-based error-recovery checkpoint that enables both software error recovery and hardware error recovery for instructions executed by a high availability processor.
- the method 300 includes directing the high availability computer processor to enter or begin processing in a non-recoverable mode at block 302 .
- the processor will switch itself into non-recoverable mode to gain extra performance or power savings.
- Directing the processor may include inserting an instruction such as “start non-recoverable mode” into the software program.
- the instruction may be embodied as a simple command, op-code, or instruction which, if fetched and decoded during normal processor operation, directs the processor to enter the non-recoverable mode of operation.
- Such an instruction may be embodied to have a data value stored in a particular portion of cache or memory which, upon access as part of processing the instruction, directs the error recovery resources 106 to free, power-down, or repurpose at least a portion of resources 206 B.
- the instruction may be embodied with a set of bit flags or other additional directives controllable by software to more directly control resource freeing.
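One way to picture an instruction carrying software-controllable bit flags is a word that packs an opcode with a flag field. The source does not define any encoding, so everything below is hypothetical: the opcode value `0xA0`, the 16-bit layout, and the flag names `REGISTERS`, `BUFFERS`, and `POWER_DOWN` are assumptions chosen only to illustrate the idea.

```python
from enum import IntFlag


class FreeFlags(IntFlag):
    """Hypothetical flag bits; the source does not define an encoding."""
    REGISTERS = 0x1    # release checkpoint registers
    BUFFERS = 0x2      # release recovery buffers
    POWER_DOWN = 0x4   # power released units down instead of repurposing


START_NON_RECOVERABLE = 0xA0  # made-up opcode value for illustration


def encode(flags: FreeFlags) -> int:
    """Pack the opcode (high byte) and flags (low byte) into a 16-bit word."""
    return (START_NON_RECOVERABLE << 8) | int(flags)


def decode(word: int) -> FreeFlags:
    """Recover the flags, rejecting words that carry a different opcode."""
    if word >> 8 != START_NON_RECOVERABLE:
        raise ValueError("not a start-non-recoverable-mode word")
    return FreeFlags(word & 0xFF)


word = encode(FreeFlags.REGISTERS | FreeFlags.POWER_DOWN)
flags = decode(word)
```

Here software asks for checkpoint registers to be released and powered down while recovery buffers are left untouched.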
- Upon executing the instruction, the processor will checkpoint all prior instructions and switch itself into a new operating policy to operate in a performance-focused or power-saving mode, and will no longer support hardware checkpoint recovery. Thereafter, the processor 100 may process instructions using the newly freed resources 206 B and/or general purpose resources 107 at block 303 .
- the processor may process instructions with general purpose resources 107 while at least a portion of resources 206 B are powered-down. If a fault is detected by error recovery resources 206 A at block 304 , a flag, value, or other means for notification may be set at block 307 , and instructions may be retried by the software at block 308 using its software checkpoint. The notification may be done by interrupting the current instruction stream, and the processor can post a special interrupt into the software code. If interrupt handling is not applicable or desirable, the processor may alternatively jump into a pre-specified instruction address. Such an instruction address may be a fixed location in storage, or can be specified as an operand address of the “start non-recoverable mode” instruction, for example. Otherwise, processing may continue in the non-recoverable mode until a new software checkpoint is necessary. This determination is done for software-based error recovery (e.g., see block 305 ).
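The software side of blocks 301, 303, 304, 307, and 308 can be sketched as a checkpoint-and-retry loop. This is a minimal sketch, not the patented mechanism: the `HardwareFault` exception stands in for the fault notification (interrupt, flag, or jump to a pre-specified address), and the function name, the `max_retries` limit, and the dictionary state are assumptions made for the example.

```python
import copy


class HardwareFault(Exception):
    """Stands in for the hardware fault notification to software."""


def run_with_software_checkpoint(state, steps, max_retries=3):
    """Run steps against state; on a fault, restore the checkpoint and retry.

    Returns (final_state, retries_used).
    """
    checkpoint = copy.deepcopy(state)          # block 301: software checkpoint
    for attempt in range(max_retries + 1):
        working = copy.deepcopy(checkpoint)    # always start from known-good state
        try:
            for step in steps:                 # block 303: process instructions
                step(working)
            return working, attempt
        except HardwareFault:                  # blocks 304 and 307: fault reported
            continue                           # block 308: retry from checkpoint
    raise RuntimeError("fault persisted after retries")


faults_left = [1]  # the first execution of flaky_double reports a fault


def add_one(s):
    s["total"] += 1


def flaky_double(s):
    if faults_left[0] > 0:
        faults_left[0] -= 1
        raise HardwareFault("transient fault")
    s["total"] *= 2


final, retries = run_with_software_checkpoint({"total": 3}, [add_one, flaky_double])
```

Because each attempt works on a fresh copy of the checkpoint, the partial update made by `add_one` before the fault does not leak into the retried run.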
- any instructions whose dispatch/issue rules typically require waiting for checkpointing may now execute at an earlier time.
- physical register pool (general registers/GPRs, floating point registers/FPRs, condition code registers/CCRs, etc.)
- cache updates due to storage-updating instructions can be performed without waiting for instruction checkpointing.
- increased processing efficiency can be realized.
- Another special instruction may be issued to direct the processor to enter or begin processing in recoverable mode at block 306 .
- the processor will switch itself into recoverable mode to again begin the support for potential hardware-specific fault recovery.
- Directing the processor may include issuing a special instruction such as “end non-recoverable mode.”
- the instruction may be embodied as a simple command, op-code, or instruction which, if fetched and decoded during normal processor operation, directs the processor to enter recoverable mode of operation.
- Such an instruction may be embodied to have a data value stored in a particular portion of cache or memory which, upon access as part of processing the instruction, directs the repurposed and/or powered-down resources 206 B to be purposed as error recovery resources.
- the instruction may be embodied with a set of bit flags or other additional directives controllable by software to more directly control resource freeing.
- the processor may check that all prior instructions are completed, and then switch itself back into the recoverable mode; and again support hardware checkpoint recovery as described above.
- aspects of the present invention may be embodied as a system, method or computer program product (e.g., as illustrated in FIG. 4 ). Furthermore, aspects of the present invention may take the form of a computer program product 400 embodied in one or more computer readable medium(s) 402 having computer readable program code 404 embodied thereon.
- the computer readable medium may be a computer readable signal medium or a computer readable storage medium.
- a computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.
- a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
- a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof.
- a computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
- Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
- Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages.
- the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
- the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
- These computer program instructions may also be stored in a computer readable medium that can direct a computer apparatus (e.g., as illustrated in FIG. 5 ), other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
- the computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
- FIG. 5 illustrates a computer apparatus, according to an exemplary embodiment. Therefore, portions or the entirety of the methodologies described herein may be executed as instructions in a processor 502 of the computer system 500 .
- the computer system 500 includes memory 501 for storage of instructions and information, input device(s) 503 for computer communication, and display device 504 .
- the present invention may be implemented, in software, for example, as any suitable computer program on a computer system somewhat similar to computer system 500 .
- each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s).
- the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- Retry When Errors Occur (AREA)
- Test And Diagnosis Of Digital Computers (AREA)
- Debugging And Monitoring (AREA)
Abstract
Description
- This application is a continuation of U.S. patent application Ser. No. 13/447,554, filed Apr. 16, 2012, the disclosure of which is incorporated by reference herein in its entirety.
- The present invention relates to error recovery in high availability processors, and more specifically, exemplary embodiments of the present invention relate to error recovery systems and methods of instruction processing for high availability processors with both recoverable and non-recoverable modes of operation.
- High availability computer systems necessitate both detection of hardware faults and methods to recover from the detected faults and prevent any incorrect results. In a conventional microprocessor supporting fault recovery, most fault recovery operations are hardware-specific and integrated within the microprocessor itself. Therefore, software executed on the microprocessor may run uninterrupted, without disruption or signaling resulting from transient hardware faults within the microprocessor hardware.
- These conventional recovery mechanisms implemented in microprocessors usually discard instructions that are processed, or potentially processed, through faulty circuits, while keeping results from any chronologically older instructions that are processed prior to detecting a fault. In order to differentiate as to whether or not results of instructions are potentially faulty, results need to be buffered and/or held until associated results are checked against any potential faulty conditions before these instructions (and their results) are committed as non-faulty. If a faulty condition is detected, these potentially faulty results will need to be rolled-back, and any affected instruction will be discarded and later reissued.
- In order to achieve the capabilities described above, extra pipeline resources are necessary to buffer instruction results until no faults are detected. In addition, because instructions need to be retried from a good architectural state, appropriate states (architectural and sometimes non-architectural) need to be maintained (e.g., through check-points). Such buffering, maintenance, and check-pointing increase the overall circuitry required or reduce the net resources available for instruction processing in conventional microprocessors supporting fault-recovery.
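The conventional behaviour described in the two preceding paragraphs (hold each result in a buffer, commit it only after it passes a fault check, otherwise discard and reissue) can be modelled in a few lines. This is an illustrative sketch only: the function `run_recoverable`, the `faulty_once` set used to inject faults, and the single running value standing in for architectural state are assumptions of the example.

```python
def run_recoverable(instructions, faulty_once):
    """Commit results only after they pass a fault check.

    instructions: list of (name, function) pairs applied to a running value.
    faulty_once: names whose first execution is flagged as faulty.
    Returns (committed_log, reissue_count).
    """
    committed = []                 # results accepted as non-faulty
    value = 0                      # last known-good value (the checkpoint)
    reissues = 0
    pending_faults = set(faulty_once)
    for name, fn in instructions:
        while True:
            buffered = fn(value)             # result is held, not yet committed
            if name in pending_faults:       # checker flags this execution
                pending_faults.discard(name)
                reissues += 1
                continue                     # discard buffered result and reissue
            value = buffered                 # commit as non-faulty
            committed.append((name, value))
            break
    return committed, reissues


log, reissues = run_recoverable(
    [("inc", lambda v: v + 1), ("dbl", lambda v: v * 2), ("inc2", lambda v: v + 1)],
    faulty_once={"dbl"},
)
```

The `buffered` variable is the extra resource the text refers to: it exists only so that a potentially faulty result can be dropped without touching the committed value from older instructions.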
- According to exemplary embodiments of the present invention, a method for performing error recovery is provided. A software recovery checkpoint is created by a processor. The processor is dynamically switched into a non-recoverable processing mode of operation based on creating the software recovery checkpoint. The non-recoverable processing mode of operation is a mode in which a subset of hardware error recovery resources are powered-down or re-purposed for instruction processing. It is determined, during the non-recoverable processing mode of operation, that a new software recovery checkpoint is required. Based on the determining that a new software recovery checkpoint is required, the processor is dynamically switched into a recoverable processing mode of operation. The recoverable processing mode of operation is a mode in which hardware error recovery resources, including at least one of the hardware error recovery resources in the subset, are purposed for hardware error recovery operations.
- Additional features and advantages are realized through the techniques of the present invention. Other embodiments and aspects of the invention are described in detail herein and are considered a part of the claimed invention. For a better understanding of the invention with the advantages and the features, refer to the description and to the drawings.
- The subject matter which is regarded as the invention is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other features and advantages of the invention are apparent from the following detailed description taken in conjunction with the accompanying drawings in which:
- FIG. 1 illustrates a diagram of a high availability computer processor in recoverable mode, according to an exemplary embodiment of the present invention;
- FIG. 2 illustrates a diagram of a high availability computer processor in non-recoverable mode, according to an exemplary embodiment of the present invention;
- FIG. 3 illustrates a flow chart of a method of processing instructions in a high availability computer processor with non-recoverable mode support, according to an exemplary embodiment of the present invention;
- FIG. 4 illustrates a computer-usable storage medium, according to an exemplary embodiment of the present invention; and
- FIG. 5 illustrates a diagram of a computer apparatus, according to an exemplary embodiment of the present invention.
- According to exemplary embodiments of the present invention, a high availability computer processor and method of processing instructions on a high availability processor are provided which increase resources available for instruction execution through dynamic changes to fault recovery systems available to the processor. For example, according to at least one exemplary embodiment, a high availability computer processor may dynamically switch from a fully recoverable mode into one form of non-recoverable mode which may free resources (e.g., registers, buffers, etc.) typically used for fault/error recovery for other operations. Therefore, when operating in a non-recoverable mode, the freed resources may allow increased computational throughput as compared to other computer processors (i.e., processors with only recoverable modes) while still allowing for an acceptable level of fault recovery as enabled through software-created checkpoints rather than costly hardware-specific checkpoints.
- For example, a hardware mechanism may be enabled by software such that the processor dynamically configures itself to execute in either of the two modes. While in a non-recoverable mode of operation, the software may provide a coarse-grained recovery mechanism and can recover from faults using software-based checkpoint schemes, thereby allowing the same measure of fault recovery while also freeing the unused hardware recovery resources. Thus, an application (or code section) may instruct the processor to go into a non-recoverable mode, and only be notified if a fault is detected. Otherwise, the processor can remain in (or be switched back into) its recoverable mode.
- A non-recoverable processor state has multiple benefits. The extra resources required to save a previously known good checkpoint state can be allocated for productive work. Any transient buffering states can also be reduced. For example, if part of a physical register pool is allocated to hardware checkpointing, these registers can now be used for speculative processing instead. Furthermore, additional speculative processing may be allowed to complete ahead of time before all errors are collected and a checkpoint is taken. Results can be committed before checkpointing is performed by the hardware, which is no longer required in a non-recoverable mode. Thereafter, once software enabled error recovery operations require a new software checkpoint, the processor may be directed to switch back into a recoverable mode of operation such that error recovery is hardware-facilitated while the software creates a new checkpoint. Subsequently, the processor may return to a non-recoverable mode with newly freed resources.
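- The register-pool benefit described above can be illustrated with a small model. This is a minimal sketch under stated assumptions, not the hardware of the embodiments: the pool size, the number of registers reserved for checkpointing, and the names `Mode`, `RegisterPool`, and `speculative_capacity` are all hypothetical, chosen only to show how the mode changes the number of physical registers left for speculative work.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    RECOVERABLE = "recoverable"
    NON_RECOVERABLE = "non-recoverable"


@dataclass
class RegisterPool:
    """Toy model of a physical register pool (all sizes are assumptions)."""
    total: int                # physical registers in the pool
    architected: int          # registers holding committed architected state
    checkpoint_reserved: int  # registers held back for hardware checkpoints

    def speculative_capacity(self, mode: Mode) -> int:
        """Registers left over for speculative (not yet committed) results."""
        reserved = self.architected
        if mode is Mode.RECOVERABLE:
            # Hardware checkpointing keeps a known-good copy of the state.
            reserved += self.checkpoint_reserved
        return self.total - reserved


# Hypothetical sizes, for illustration only.
pool = RegisterPool(total=128, architected=32, checkpoint_reserved=32)
recoverable = pool.speculative_capacity(Mode.RECOVERABLE)
non_recoverable = pool.speculative_capacity(Mode.NON_RECOVERABLE)
print(recoverable, non_recoverable)
```

With these assumed sizes, the registers otherwise reserved for the hardware checkpoint become available for speculation in the non-recoverable mode; real proportions depend entirely on the processor design.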
- According to additional exemplary embodiments of the present invention, a high availability computer processor and method of processing instructions on a high availability processor are provided which decrease power consumption through dynamic changes to fault recovery systems available to the processor. For example, according to at least one exemplary embodiment, a high availability computer processor may dynamically switch from a fully recoverable mode into one form of non-recoverable mode which may free and power-down resources (e.g., registers, buffers, etc.) typically used for fault/error recovery for other operations. Therefore, when operating in a non-recoverable mode, the powered-down resources do not contribute to overall power consumption while still allowing for an acceptable level of fault recovery as enabled through software-created checkpoints rather than costly hardware-specific checkpoints.
- Furthermore, the exemplary embodiments noted above may be implemented in combination such that a portion of freed resources are powered-down while another portion of freed resources are repurposed. In this manner, a plurality of operational states may become apparent where multiple benefits in computer processing are realized in contrast to existing technologies.
- Turning now to FIG. 1, a high availability computer processor according to an exemplary embodiment is illustrated. As shown, the processor 100 includes a cache 101 (e.g., data and instruction cache) which may be divided into a plurality of different cache levels or designations. The processor 100 further includes instruction fetch circuitry 102 configured to fetch instructions from the cache 101. The processor 100 further includes instruction decode circuitry 103 configured to receive fetched instructions from the instruction fetch circuitry 102. The processor 100 further includes instruction dispatch circuitry 104 configured to dispatch instructions decoded through circuitry 103. Upon dispatch, the instructions are issued and executed through instruction execution portion 105, which is further configured to fetch associated data from the cache 101. Detailed handling of out-of-order instruction execution and support is assumed to be performed mainly inside circuitry 105.
- Thus, as described above, processor 100 includes relatively common and general portions which function in an anticipated manner. These portions may be configured for reduced or complex (e.g., RISC or CISC) instruction sets or entirely specialized instruction sets according to any desired implementation of exemplary embodiments. Therefore, the processor 100 should not be limited to any specific computer processor, but should be equally applicable to any computer processor including somewhat similar or equivalent componentry.
- Turning back to FIG. 1, the processor 100 further includes error recovery resources 106 in communication with the cache 101 and instruction execution portion 105. The error recovery resources 106 may include fault recovery components comprising queues, buffers, thread processing units, registers, and any other suitable components. The error recovery resources 106 may process, create, and store instructions and results for error recovery facilitation. Furthermore, the error recovery resources 106 may detect hardware faults, capture checkpoints, and perform checkpoint retry upon detection of faults.
- Turning back to FIG. 1, the processor 100 further includes general purpose resources 107 in communication with the instruction execution portion 105 and the error recovery resources 106 through channel 108. The general purpose resources may include general purpose registers, floating point registers, special purpose registers, or any other suitable components for processing and storing instruction results in a controlled fashion.
- Although error recovery resources 106 and general purpose resources 107 are illustrated as individual components, these resources can be distributed into other components 101, 102, 103, 104, 105 and other processor components not explicitly illustrated. Furthermore, a plurality of threads may be executed on processor 100 using both error recovery resources 106 and general purpose resources 107.
- It is noted that although error recovery resources 106 and general purpose resources 107 are configured to provide different functionality, the associated components organized therein comprise at least a portion of generally the same or similar components. For example, registers and buffers may both be included in the error recovery resources 106 and the general purpose resources 107. Therefore, according to exemplary embodiments of the present invention, a portion of the error recovery resources may be freed, powered-down, or re-purposed to function in unison with the general purpose resources in at least one new operating mode. Such is illustrated in FIG. 2.
- As shown in FIG. 2, a portion of error recovery resources 106 are freed and repurposed as resources 206B to function in accordance with the general purpose resources 107. Alternatively, a portion or all of resources 206B may be powered-down to lower power consumption. Further, a remaining portion of error recovery resources 206A may remain dedicated to error recovery and further be configured to free, power-down, repurpose, or reintegrate the resources 206B to function as either error recovery resources or general purpose resources within processor 100. It follows, then, that if additional resources are freed for use in instruction processing, the processor 100 may operate at a faster rate than if all error recovery resources 106 are dedicated to handle hardware fault recovery. Moreover, if a portion of the additional resources are powered-down, power savings may be realized. It is noted that the processor 100 still detects hardware faults and reports them to the operating system or application code. However, the checkpoint capturing and retry mechanisms are disabled in the hardware and, instead, are performed by software.
- In this configuration, the processor changes its basic operation and does not allocate resources for saving checkpoints, nor does it postpone instruction execution or completion that might otherwise have to wait for the creation and validation of checkpoints. Such a configuration change may be accomplished by modifying the typical instruction dispatch/issue/execution rules of precedence, storage update ordering, and register mapping algorithms. Under such non-recoverable mode operation, many fine-grained instruction processing performance improvements and power savings can be obtained.
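- The partitioning of the error recovery resources described for FIG. 2 can be sketched as a small state model. This is only an illustration under stated assumptions: the class and method names (`UnitState`, `RecoveryResources`, `enter_non_recoverable`, `enter_recoverable`), the unit counts, and the rule that at least one unit stays dedicated to error recovery are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from enum import Enum


class UnitState(Enum):
    ERROR_RECOVERY = "error recovery"    # dedicated to fault handling (206A)
    GENERAL_PURPOSE = "general purpose"  # repurposed for instruction processing
    POWERED_DOWN = "powered down"        # switched off to save power


@dataclass
class RecoveryResources:
    """Toy model of error recovery resources 106 as a list of equal units."""
    units: list

    @classmethod
    def recoverable(cls, count: int) -> "RecoveryResources":
        # In recoverable mode every unit serves hardware error recovery.
        return cls([UnitState.ERROR_RECOVERY] * count)

    def enter_non_recoverable(self, keep: int, power_down: int) -> None:
        """Keep `keep` units for fault detection (206A) and split the freed
        units (206B) between powered-down and repurposed units."""
        freed = len(self.units) - keep
        # Assumption: at least one unit must remain to detect faults.
        if keep < 1 or power_down < 0 or power_down > freed:
            raise ValueError("invalid partition")
        self.units = (
            [UnitState.ERROR_RECOVERY] * keep
            + [UnitState.POWERED_DOWN] * power_down
            + [UnitState.GENERAL_PURPOSE] * (freed - power_down)
        )

    def enter_recoverable(self) -> None:
        """Reintegrate every unit as an error recovery resource."""
        self.units = [UnitState.ERROR_RECOVERY] * len(self.units)

    def count(self, state: UnitState) -> int:
        return self.units.count(state)


res = RecoveryResources.recoverable(8)
res.enter_non_recoverable(keep=2, power_down=3)
summary = {state.name: res.count(state) for state in UnitState}
print(summary)
res.enter_recoverable()
```

The `keep` and `power_down` arguments stand in for whatever policy the software or hardware would apply; the combined embodiment, in which part of resources 206B is powered-down and part is repurposed, corresponds to any call where both the powered-down and repurposed counts are non-zero.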
- Furthermore, according to exemplary embodiments of the present invention, a method of instruction processing is provided that serves as a template for instruction processing, increasing computational efficiency while retaining a useful system-level error recovery mechanism that distributes fault recovery obligations across software and hardware to make more efficient use of system resources.
- Turning to FIG. 3, a method of instruction processing in a high availability computer processor is illustrated. The method 300 includes creating a software checkpoint at block 301. Creating the software checkpoint may include creating a software-based error-recovery checkpoint that enables both software error recovery and hardware error recovery for instructions executed by a high availability processor.
- Upon creation of the checkpoint, the method 300 includes directing the high availability computer processor to enter or begin processing in a non-recoverable mode at block 302. Thus, the processor will switch itself into non-recoverable mode to gain extra performance or power savings. Directing the processor may include inserting an instruction such as “start non-recoverable mode” into the software program. The instruction may be embodied as a simple command, op-code, or instruction which, if fetched and decoded during normal processor operation, directs the processor to enter the non-recoverable mode of operation. Such an instruction may be embodied to have a data value stored in a particular portion of cache or memory which, upon access as part of processing the instruction, directs the error recovery resources 106 to free, power-down, or repurpose at least a portion of resources 206B. Alternatively, the instruction may be embodied with a set of bit flags or other additional directives controllable by software to more directly control resource freeing. Upon executing the instruction, the processor will checkpoint all prior instructions and switch itself into a new operating policy to operate in a performance-focused or power-saving mode, and will no longer support hardware checkpoint recovery. Thereafter, the processor 100 may process instructions using the newly freed resources 206B and/or general purpose resources 107 at block 303. Alternatively or in combination, the processor may process instructions with general purpose resources 107 while at least a portion of resources 206B are powered-down. If a fault is detected by error recovery resources 206A at block 304, a flag, value, or other means for notification may be set at block 307, and instructions may be retried by the software at block 308 using its software checkpoint. The notification may be done by interrupting the current instruction stream, and the processor can post a special interrupt into the software code. If interrupt handling is not applicable or desirable, the processor may alternatively jump to a pre-specified instruction address. Such an instruction address may be a fixed location in storage, or can be specified as an operand address of the “start non-recoverable mode” instruction, for example. Otherwise, processing may continue in the non-recoverable mode until a new software checkpoint is necessary. This determination is done for software-based error recovery (e.g., see block 305).
- It should be appreciated that while operating in the non-recoverable mode, any instructions whose dispatch/issue rules typically require them (e.g., if executed in recoverable mode) to be next to be checkpointed may now execute at an earlier time. Furthermore, physical register pool (general registers/GPRs, floating point registers/FPRs, condition code registers/CCRs, etc.) resources that would have been needed and reserved for hardware-based checkpoint retry can now be allocated for instruction processing. Furthermore, cache updates due to storage updating instructions (e.g., as in a simple store instruction) can be performed without waiting for instruction checkpointing. Thus, increased processing efficiency can be realized.
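- The software side of method 300 can be sketched as a checkpoint-and-retry loop. This is a hedged simulation, not an implementation of the disclosed processor: `ToyProcessor`, `FreeFlags`, `HardwareFault`, `run`, the fault-injection parameter `faults_at`, and the checkpoint `interval` are all hypothetical names and values, and a Python exception merely stands in for the special interrupt or jump described above. The block numbers in the comments refer to FIG. 3.

```python
import copy
from enum import Flag, auto


class FreeFlags(Flag):
    """Hypothetical bit flags carried by the mode-switch instruction."""
    REPURPOSE = auto()
    POWER_DOWN = auto()


class HardwareFault(Exception):
    """Stands in for the notification posted when a fault is detected."""


class ToyProcessor:
    def __init__(self, faults_at=()):
        self.recoverable = True
        self.flags = FreeFlags(0)
        self._faults = set(faults_at)  # execute-call numbers that fault
        self._calls = 0

    def start_non_recoverable(self, flags):  # block 302
        self.recoverable, self.flags = False, flags

    def end_non_recoverable(self):           # block 306
        self.recoverable, self.flags = True, FreeFlags(0)

    def execute(self, state, item):          # blocks 303 and 304
        self._calls += 1
        state["sum"] += item  # result is committed without a hardware checkpoint
        if self._calls in self._faults:
            raise HardwareFault(f"fault on call {self._calls}")


def run(items, cpu, interval=2):
    state, log, i = {"sum": 0}, [], 0
    while i < len(items):
        checkpoint = (copy.deepcopy(state), i)          # block 301
        cpu.start_non_recoverable(FreeFlags.REPURPOSE)  # block 302
        try:
            for _ in range(min(interval, len(items) - i)):
                cpu.execute(state, items[i])
                i += 1
        except HardwareFault as fault:                  # blocks 304 and 307
            log.append(str(fault))
            # Block 308: roll back to the software checkpoint and retry.
            state, i = copy.deepcopy(checkpoint[0]), checkpoint[1]
        cpu.end_non_recoverable()                       # blocks 305 and 306
    return state["sum"], log


total, log = run([1, 2, 3, 4, 5], ToyProcessor(faults_at={3}))
print(total, log)
```

In this sketch the injected fault leaves a possibly corrupt result in `state`, and the rollback to the last software checkpoint discards it before the work is retried. The processor is returned to recoverable mode before each new checkpoint is taken, mirroring the ordering described for blocks 305 and 306.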
- Thereafter, if software is ready to take another checkpoint as determined at block 305, another special instruction may be issued to direct the processor to enter or begin processing in recoverable mode at block 306. Thus, the processor will switch itself into recoverable mode to again begin the support for potential hardware-specific fault recovery. Directing the processor may include issuing a special instruction such as “end non-recoverable mode.” The instruction may be embodied as a simple command, op-code, or instruction which, if fetched and decoded during normal processor operation, directs the processor to enter recoverable mode of operation. Such an instruction may be embodied to have a data value stored in a particular portion of cache or memory which, upon access as part of processing the instruction, directs the repurposed and/or powered-down resources 206B to be purposed as error recovery resources. Alternatively, the instruction may be embodied with a set of bit flags or other additional directives controllable by software to more directly control resource freeing. Upon executing the special instruction, the processor may check that all prior instructions are completed, and then switch itself back into the recoverable mode and again support hardware checkpoint recovery as described above.
- As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product (e.g., as illustrated in FIG. 4). Furthermore, aspects of the present invention may take the form of a computer program product 400 embodied in one or more computer readable medium(s) 402 having computer readable program code 404 embodied thereon.
- Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
- A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
- Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
- Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
- Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
- These computer program instructions may also be stored in a computer readable medium that can direct a computer apparatus (e.g., as illustrated in FIG. 5), other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
- The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
- As noted above, the methodologies described hereinbefore may be implemented by a computer system or apparatus. For example, FIG. 5 illustrates a computer apparatus, according to an exemplary embodiment. Therefore, portions or the entirety of the methodologies described herein may be executed as instructions in a processor 502 of the computer system 500. The computer system 500 includes memory 501 for storage of instructions and information, input device(s) 503 for computer communication, and display device 504. Thus, the present invention may be implemented, in software, for example, as any suitable computer program on a computer system somewhat similar to computer system 500.
- The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
- The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
- The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
- The flow diagrams depicted herein are just one example. There may be many variations to this diagram or the steps (or operations) described therein without departing from the spirit of the invention. For instance, the steps may be performed in a differing order or steps may be added, deleted or modified. All of these variations are considered a part of the claimed invention.
- While the preferred embodiment of the invention has been described, it will be understood that those skilled in the art, both now and in the future, may make various improvements and enhancements which fall within the scope of the claims which follow. These claims should be construed to maintain the proper protection for the invention first described.
Claims (7)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US13/785,103 US9043641B2 (en) | 2012-04-16 | 2013-03-05 | Reconfigurable recovery modes in high availability processors |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US13/447,554 US8954797B2 (en) | 2012-04-16 | 2012-04-16 | Reconfigurable recovery modes in high availability processors |
| US13/785,103 US9043641B2 (en) | 2012-04-16 | 2013-03-05 | Reconfigurable recovery modes in high availability processors |
Related Parent Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US13/447,554 Continuation US8954797B2 (en) | 2012-04-16 | 2012-04-16 | Reconfigurable recovery modes in high availability processors |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20130275806A1 (en) | 2013-10-17 |
| US9043641B2 (en) | 2015-05-26 |
Family
ID=47843295
Family Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US13/447,554 Expired - Fee Related US8954797B2 (en) | 2012-04-16 | 2012-04-16 | Reconfigurable recovery modes in high availability processors |
| US13/785,103 Expired - Fee Related US9043641B2 (en) | 2012-04-16 | 2013-03-05 | Reconfigurable recovery modes in high availability processors |
Country Status (5)
| Country | Link |
|---|---|
| US (2) | US8954797B2 (en) |
| CN (1) | CN104246710B (en) |
| DE (1) | DE112013002054T5 (en) |
| GB (1) | GB2514700B (en) |
| WO (1) | WO2013156201A1 (en) |
Families Citing this family (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106032619B (en) * | 2015-03-20 | 2018-05-01 | 无锡飞翎电子有限公司 | Machine communicating with washing method |
| CN108270832B (en) * | 2016-12-30 | 2020-11-06 | 华为技术有限公司 | A fault replay method and device |
| US10922203B1 (en) * | 2018-09-21 | 2021-02-16 | Nvidia Corporation | Fault injection architecture for resilient GPU computing |
| US10884845B2 (en) * | 2018-11-08 | 2021-01-05 | International Business Machines Corporation | Increasing processing capacity of processor cores during initial program load processing |
| US10884818B2 (en) | 2018-11-08 | 2021-01-05 | International Business Machines Corporation | Increasing processing capacity of virtual machines |
| US10990434B2 (en) | 2019-02-05 | 2021-04-27 | International Business Machines Corporation | Increasing processing capacity of virtual machines for an abnormal event |
| US11327767B2 (en) | 2019-04-05 | 2022-05-10 | International Business Machines Corporation | Increasing resources for partition to compensate for input/output (I/O) recovery event |
Family Cites Families (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5692121A (en) | 1995-04-14 | 1997-11-25 | International Business Machines Corporation | Recovery unit for mirrored processors |
| JP3357777B2 (en) * | 1996-01-26 | 2002-12-16 | 株式会社東芝 | Program control system |
| US6058491A (en) | 1997-09-15 | 2000-05-02 | International Business Machines Corporation | Method and system for fault-handling to improve reliability of a data-processing system |
| US6247118B1 (en) | 1998-06-05 | 2001-06-12 | Mcdonnell Douglas Corporation | Systems and methods for transient error recovery in reduced instruction set computer processors via instruction retry |
| US7467325B2 (en) | 2005-02-10 | 2008-12-16 | International Business Machines Corporation | Processor instruction retry recovery |
| US7478276B2 (en) | 2005-02-10 | 2009-01-13 | International Business Machines Corporation | Method for checkpointing instruction groups with out-of-order floating point instructions in a multi-threaded processor |
| US20080263324A1 (en) | 2006-08-10 | 2008-10-23 | Sehat Sutardja | Dynamic core switching |
| CN101196847A (en) * | 2006-12-08 | 2008-06-11 | 深圳艾科创新微电子有限公司 | Method for automatic maintenance of CPU program memory and hardware cell structure |
| US8108714B2 (en) | 2007-03-12 | 2012-01-31 | International Business Machines Corporation | Method and system for soft error recovery during processor execution |
| JP5161696B2 (en) * | 2008-08-07 | 2013-03-13 | 株式会社日立製作所 | Virtual computer system, error recovery method in virtual computer system, and virtual computer control program |
| US8078851B2 (en) | 2008-12-18 | 2011-12-13 | Faraday Technology Corp. | Processor and method for recovering global history shift register and return address stack thereof by determining a removal range of a branch recovery table |
| US8140905B2 (en) | 2010-02-05 | 2012-03-20 | International Business Machines Corporation | Incremental problem determination and resolution in cloud environments |
| US20120221884A1 (en) * | 2011-02-28 | 2012-08-30 | Carter Nicholas P | Error management across hardware and software layers |
-
2012
- 2012-04-16 US US13/447,554 patent/US8954797B2/en not_active Expired - Fee Related
-
2013
- 2013-03-05 US US13/785,103 patent/US9043641B2/en not_active Expired - Fee Related
- 2013-03-08 GB GB1414521.3A patent/GB2514700B/en active Active
- 2013-03-08 CN CN201380020281.0A patent/CN104246710B/en not_active Expired - Fee Related
- 2013-03-08 DE DE112013002054.8T patent/DE112013002054T5/en not_active Withdrawn
- 2013-03-08 WO PCT/EP2013/054696 patent/WO2013156201A1/en active Application Filing
Patent Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20060179290A1 (en) * | 2005-02-10 | 2006-08-10 | International Business Machines Corporation | System and method for creating precise exceptions |
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20140208169A1 (en) * | 2013-01-18 | 2014-07-24 | Unisys Corporation | Domain scripting language framework for service and system integration |
| US9384020B2 (en) | 2013-01-18 | 2016-07-05 | Unisys Corporation | Domain scripting language framework for service and system integration |
| US20150227429A1 (en) * | 2014-02-10 | 2015-08-13 | Via Technologies, Inc. | Processor that recovers from excessive approximate computing error |
| US9588845B2 (en) * | 2014-02-10 | 2017-03-07 | Via Alliance Semiconductor Co., Ltd. | Processor that recovers from excessive approximate computing error |
| US10235232B2 (en) | 2014-02-10 | 2019-03-19 | Via Alliance Semiconductor Co., Ltd | Processor with approximate computing execution unit that includes an approximation control register having an approximation mode flag, an approximation amount, and an error threshold, where the approximation control register is writable by an instruction set instruction |
| US10997027B2 (en) * | 2017-12-21 | 2021-05-04 | Arizona Board Of Regents On Behalf Of Arizona State University | Lightweight checkpoint technique for resilience against soft errors |
| US11449380B2 (en) | 2018-06-06 | 2022-09-20 | Arizona Board Of Regents On Behalf Of Arizona State University | Method for detecting and recovery from soft errors in a computing device |
Also Published As
| Publication number | Publication date |
|---|---|
| CN104246710B (en) | 2017-10-20 |
| GB201414521D0 (en) | 2014-10-01 |
| DE112013002054T5 (en) | 2015-03-05 |
| US20130275801A1 (en) | 2013-10-17 |
| WO2013156201A1 (en) | 2013-10-24 |
| US8954797B2 (en) | 2015-02-10 |
| CN104246710A (en) | 2014-12-24 |
| GB2514700A (en) | 2014-12-03 |
| US9043641B2 (en) | 2015-05-26 |
| GB2514700B (en) | 2015-04-08 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US9043641B2 (en) | | Reconfigurable recovery modes in high availability processors |
| US7721076B2 (en) | | Tracking an oldest processor event using information stored in a register and queue entry |
| US10289415B2 (en) | | Method and apparatus for execution of threads on processing slices using a history buffer for recording architected register data |
| US9361111B2 (en) | | Tracking speculative execution of instructions for a register renaming data store |
| US20070043934A1 (en) | | Early misprediction recovery through periodic checkpoints |
| US10073699B2 (en) | | Processing instructions in parallel with waw hazards and via a distributed history buffer in a microprocessor having a multi-execution slice architecture |
| US10013255B2 (en) | | Hardware-based run-time mitigation of conditional branches |
| US20170109093A1 (en) | | Method and apparatus for writing a portion of a register in a microprocessor |
| US20170109167A1 (en) | | Method and apparatus for restoring data to a register file of a processing unit |
| US10977038B2 (en) | | Checkpointing speculative register mappings |
| CN108920190B (en) | | Apparatus and method for determining a recovery point from which recovery instructions are executed |
| US20140019734A1 (en) | | Data processing apparatus and method using checkpointing |
| US9268575B2 (en) | | Flush operations in a processor |
| US20220027162A1 (en) | | Retire queue compression |
| US10545765B2 (en) | | Multi-level history buffer for transaction memory in a microprocessor |
| US20050138333A1 (en) | | Thread switching mechanism |
| US10379867B2 (en) | | Asynchronous flush and restore of distributed history buffer |
| WO2017098344A1 (en) | | Run-time code parallelization with independent speculative committing of instructions per segment |
| EP2717156A1 (en) | | Speculative privilege elevation |
| US10909034B2 (en) | | Issue queue snooping for asynchronous flush and restore of distributed history buffer |
| US10996995B2 (en) | | Saving and restoring a transaction memory state |
| WO2017072615A1 (en) | | Hardware-based run-time mitigation of conditional branches |
| US10127121B2 (en) | | Operation of a multi-slice processor implementing adaptive failure state capture |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BUSABA, FADI Y.;CARLOUGH, STEVEN R.;KRYGOWSKI, CHRISTOPHER A.;AND OTHERS;SIGNING DATES FROM 20130204 TO 20130205;REEL/FRAME:029922/0513 |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| | FEPP | Fee payment procedure | Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | LAPS | Lapse for failure to pay maintenance fees | Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | STCH | Information on status: patent discontinuation | Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
| | FP | Lapsed due to failure to pay maintenance fee | Effective date: 20190526 |