US20080186672A1 - Method Of Latent Fault Checking A Cooling Module - Google Patents

Info

Publication number
US20080186672A1
Authority
US
United States
Prior art keywords
fan
signal
speed
fan speed
latent fault
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US12/099,358
Inventor
Mark S. Lanus
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Smart Embedded Computing Inc
Original Assignee
Emerson Network Power Embedded Computing Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Emerson Network Power Embedded Computing Inc
Priority to US12/099,358
Assigned to EMERSON NETWORK POWER - EMBEDDED COMPUTING, INC. Assignment of assignors interest (see document for details). Assignors: MOTOROLA, INC.
Publication of US20080186672A1
Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3058 Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3044 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is the mechanical casing of the computing system

Definitions

  • Embedded computer chassis systems generally include numerous rack-mounted computer cards connected to a backplane.
  • The computer cards may include payload cards and switch module cards that communicate using a bus or switched fabric topology over the backplane.
  • The payload cards and switch cards may be chosen so as to provide the computer chassis with the functionality and features desired by a user.
  • Each embedded computer chassis generally includes cooling modules mounted in the chassis to cool the computer cards.
  • Most cooling modules in computer equipment implement variable speed fan control and fan tachometer monitoring to detect fan failures or imminent fan failures.
  • However, the fan tachometer or fan controller may fail in such a way as to give a false reading indicating that the fan is all right. This is a latent fault: a fault that has occurred but does not yet compromise the cooling subsystem. If the fan or fan control then fails, the latent fault is activated, and the fan tachometer provides a reading indicating that the fan is working properly when in fact the fan has failed.
  • The prior art does not currently provide a method to detect latent faults in cooling subsystems of embedded computer systems.
  • FIG. 1 representatively illustrates a computer system in accordance with an exemplary embodiment
  • FIG. 2 representatively illustrates a computer system in accordance with another exemplary embodiment
  • FIG. 3 representatively illustrates a flow diagram in accordance with an exemplary embodiment.
  • The terms “a” or “an”, as used herein, are defined as one, or more than one.
  • The term “plurality,” as used herein, is defined as two, or more than two.
  • The term “another,” as used herein, is defined as at least a second or more.
  • The terms “including” and/or “having,” as used herein, are defined as comprising (i.e., open language).
  • The term “coupled,” as used herein, is defined as connected, although not necessarily directly, and not necessarily mechanically.
  • The terms “program,” “software application,” and the like as used herein, are defined as a sequence of instructions designed for execution on a computer system.
  • A program, computer program, or software application may include a subroutine, a function, a procedure, an object method, an object implementation, an executable application, an applet, a servlet, an object code, a shared library/dynamic load library and/or other sequence of instructions designed for execution on a computer system.
  • A component may include a computer program, software application, or one or more lines of computer readable processing instructions.
  • Software blocks that perform various embodiments can be part of computer program modules comprising computer instructions, such as control algorithms, that are stored in a computer-readable medium such as memory.
  • Computer instructions can instruct processors to perform any methods described below. In other embodiments, additional modules could be provided as needed.
  • FIG. 1 representatively illustrates a computer system 100 in accordance with various exemplary embodiments.
  • Computer system 100 may include an embedded computer chassis 101 having a front side 102 and a rear side 104.
  • In some embodiments, computer system 100 and embedded computer chassis 101 may comply with the Advanced Telecom and Computing Architecture (ATCA™) standard as defined in the PICMG 3.0 AdvancedTCA specification.
  • In other embodiments, computer system 100 and embedded computer chassis 101 may comply with CompactPCI standard.
  • embedded computer chassis 101 may comply with MicroTCA standard as defined in PICMG® MicroTCA Draft 0.6—Micro Telecommunications Computing Architecture Base Specification (and subsequent revisions).
  • The various embodiments are not limited to a computer system complying with any of these standards, and computer systems complying with other standards are within the scope of the present teachings.
  • Embedded computer chassis 101 may include a plurality of slots for inserting computing modules 118, for example payload modules and switch modules.
  • Computing modules 118 may couple to backplane (not shown for clarity) to facilitate power distribution and/or communication using a bus topology, switch fabric topology, and the like.
  • In some embodiments, backplane may comprise, for example and without limitation, 100-ohm differential signaling pairs.
  • When in operation, computing modules 118 generate heat that must be removed from embedded computer chassis 101.
  • Computing modules 118 may include at least one switch module coupled to any number of payload modules via the backplane, which may accommodate any combination of a packet switched backplane including a distributed switched fabric, or a multi-drop bus type backplane.
  • Backplane architectures may include CompactPCI, Advanced Telecom Computing Architecture (AdvancedTCA), MicroTCA, and the like.
  • Payload modules may add functionality to computer system 100 through the addition of processors, memory, storage devices, I/O elements, and the like.
  • In other words, payload module may include any combination of processors, memory, storage devices, I/O elements, and the like, to give computer system 100 any functionality desired by a user.
  • In some embodiments, computer system 100 can use a switch module as a central switching hub with any number of payload modules coupled to one or more switch modules.
  • Computer system 100 may support a point-to-point, switched input/output (I/O) fabric.
  • Computer system 100 may be implemented by using one or more of a plurality of switched fabric network standards, for example and without limitation, InfiniBand™, Serial RapidIO™, Ethernet™, AdvancedTCA™, PCI Express™, Gigabit Ethernet, and the like.
  • Computer system 100 is not limited to the use of these switched fabric network standards and the use of any switched fabric network standard is within the scope of the present teachings.
  • In some embodiments, embedded computer chassis 101 may include a cooling subsystem comprising any number of cooling modules 108 for dissipating heat generated by computing modules 118, temperature sensors and other hardware and software modules to detect and react to temperature changes in embedded computer chassis 101.
  • In some embodiments, by way of non-limiting example, cooling module 108 may be disposed adjacent to computing modules 118.
  • Embedded computer chassis 101 may include a plurality of fan module bays 106, each disposed to accept a cooling module 108 for drawing cooling air 120 through embedded computer chassis 101.
  • In some embodiments, each cooling module 108 may include one or more fans or blowers, power and control circuitry, and the like (as discussed more fully below).
  • Cooling module 108 may plug into each fan module bay 106 and receive power from a central or dedicated power supply for embedded computer chassis 101.
  • In some embodiments, embedded computer chassis 101 may include a cooling module cover 110 to provide access to cooling module 108 for maintenance and system diagnostics.
  • In the following discussion of embodiments, the term “fan” or “fans” will be understood to include “blowers,” “fans,” or any combination of “blowers” and “fans.”
  • FIG. 2 representatively illustrates a computer system 200 in accordance with various embodiments.
  • In some embodiments, computer system 200 may include cooling module 208 coupled to at least one bus master module 230.
  • Cooling module 208 may be a modular cooling fan tray coupled for insertion into fan module bays 106, and include one or more fans 236, and a fan controller module 232 coupled to issue commands to the fan such as increase speed, decrease speed, on/off signals, and the like.
  • Cooling module 208 may also include a fan tachometer 234 coupled to read the fan speed 239 in rpm, and the like, and report the fan speed 239 to fan controller module 232, which may then report fan speed 239 to bus master module 230.
  • Coupled to cooling module 208 is a bus master module 230, which may function to control a maintenance bus 231.
  • In various embodiments, maintenance bus 231 may communicate management data between bus master module 230 and cooling module 208.
  • Management data may include data pertaining to, for example and without limitation, temperature, voltage, amperage, bus traffic, status indications, and the like.
  • Management data may also include instructions, for example and without limitation, instructions for cooling fans, adjustment of power supplies, and the like.
  • Management data communicated over maintenance bus 231 may function to monitor and maintain cooling module 208.
  • Management data differs from other data transmitted on a data bus (not shown for clarity) in that management data is used for monitoring and maintaining, among other things, cooling module 208, while a traditional data bus functions to communicate data transmitted to/from and processed by computing modules 118.
  • In various embodiments, maintenance bus 231 may be an Intelligent Platform Management Bus (IPMB) as specified in an Intelligent Platform Management Interface Specification.
  • The Intelligent Platform Management Bus may be an I2C-based bus that provides a standardized interconnection between different boards within a chassis.
  • The IPMB can also serve as a standardized interface for auxiliary or emergency management add-in cards.
  • In various embodiments, bus master module 230 may be a Shelf Management Controller (ShMC) as is known in the AdvancedTCA computer platform.
  • Under normal operation, fan controller module 232 controls the fan speed 239 based on cooling requirements of embedded computer chassis 101. For example, if bus master module 230 detects a temperature increase in embedded computer chassis 101, it may signal cooling module 208, particularly fan controller module 232, to increase the fan speed 239 to increase cooling air flow. If the system is functioning correctly, fan controller module 232 may issue a command to fan 236 to increase fan speed 239. This increase in fan speed is detected by fan controller module 232 via fan tachometer 234, which may measure and report the rpm of fan 236 to bus master module 230 via fan controller module 232 on maintenance bus 231. The same process may work in reverse if bus master module 230 detects that the temperature of embedded computer chassis 101 is too low. In this instance, a decrease in fan speed may be commanded with the corresponding feedback of fan speed via fan tachometer 234.
  • Because cooling module 208 is critical to reliable operation of computer system 200, full-speed fan control circuit 238 is included such that bus master module 230 may order fan 236 to increase to full speed while bypassing maintenance bus 231 and fan controller module 232.
  • Bus master module 230 thus has an alternative path to order an increase in fan speed 239 when a commanded increase is not carried out, which may indicate a failure of fan controller module 232.
  • In that case, bus master module 230 may directly command fan 236 to increase to full speed by issuing full-speed signal 246, thereby causing fan 236 to run at full speed and provide maximum cooling. This feature adds a layer of fault tolerance to cooling module 208 and hence increases reliability.
  • A latent fault is a fault that has occurred but is not visible or has not manifested itself. This is contrasted with an active fault, which is visible and has manifested itself.
  • If fan tachometer 234 or fan controller module 232 fails such that fan speed 239 is indicated as sufficient regardless of the actual fan speed 239 or the condition of fan 236 (voltage or current draw, and the like), there may be no indication to bus master module 230 that a problem exists.
  • This is referred to as a latent fault, as it is a failure of the cooling module 208 that does not trigger an indication of failure until a second fault occurs (such as insufficient cooling of embedded computer chassis 101).
  • In other words, a latent fault is a fault that is present but not visible or active.
  • A latent fault within the cooling module 208 needs to be detected before a second fault occurs and turns the latent fault into an active fault.
  • This may be the function of fan controller latent fault checking algorithm 242 and full-speed latent fault checking algorithm 240, which may be any combination of software or hardware functioning to detect a latent fault in cooling module 208 prior to that latent fault manifesting itself as an active fault.
  • Fan controller latent fault checking algorithm 242 may function to test fan controller module 232, full-speed fan control circuit 238 and fan tachometer 234 prior to an active fault in cooling module 208. Prior to an active fault in cooling module 208, or detection of an active fault in cooling module 208, fan controller latent fault checking algorithm 242 may be utilized periodically to increase the reliability of cooling module 208 and cooling subsystem.
  • Fan controller latent fault checking algorithm 242 attempts to modify fan speed 239 via fan controller module 232 and detect a change in fan speed 245 at bus master module 230 to determine if fan controller module 232 and fan tachometer 234 are functioning properly.
  • For example, an increase fan speed signal 243 may be communicated from bus master module 230 via fan controller module 232 to increase fan speed 239. It is then determined whether an increase in fan speed 241 is detected as measured via fan tachometer 234.
  • A decrease fan speed signal 244 may then be communicated from bus master module 230 via fan controller module 232 to decrease fan speed 239. It is determined whether a decrease in fan speed 242 is detected as measured via fan tachometer 234. If either the increase in fan speed 241 or the decrease in fan speed 242 is not detected, a latent fault may be indicated in the cooling module 208.
  • In some embodiments, an alarm signal 250 may be generated to notify a system administrator of the latent fault.
  • Full-speed latent fault checking algorithm 240 may be employed. Full-speed latent fault checking algorithm 240 attempts to modify fan speed 239 via full-speed fan control circuit 238, bypassing fan controller module 232, and detect a change in fan speed 245 at bus master module 230 to determine if full-speed fan control circuit 238, fan controller module 232 and fan tachometer 234 are functioning properly. For example, full-speed signal 246 is communicated to fan 236 via full-speed fan control circuit 238, bypassing fan controller module 232. It is determined whether an increase in fan speed 241 is detected as measured via fan tachometer 234.
  • Removal of full-speed signal 246 while bypassing the fan controller module 232 may then allow a decrease in fan speed 239, for example back to the fan speed prior to implementing the above algorithm. It is determined whether a decrease in fan speed 242 is detected as measured via fan tachometer 234. If either the increase in fan speed 241 or the decrease in fan speed 242 is not detected, a latent fault may be indicated in the cooling module 208. In some embodiments, an alarm signal 250 may be generated to notify a system administrator of the latent fault.
  • FIG. 3 representatively illustrates a flow diagram 300 in accordance with various exemplary embodiments.
  • An increase fan speed signal is communicated via fan controller module.
  • In step 304, it is determined whether fan speed has increased. If not, a latent fault is indicated per step 318. If fan speed has increased, a decrease fan speed signal is communicated via fan controller module in step 306. In step 308, it is determined whether fan speed has decreased. If not, a latent fault is indicated per step 318.
  • If fan speed has decreased in step 308, a full-speed signal is communicated, bypassing fan controller module, per step 310.
  • In step 312, it is determined whether fan speed has increased. If not, a latent fault is indicated per step 318. If fan speed has increased, full-speed signal is removed while bypassing fan controller module per step 314.
  • In step 316, it is determined whether fan speed has decreased. If not, a latent fault is indicated per step 318. If fan speed has decreased per step 316, no latent fault is detected per step 322. If at any point in the flow diagram a latent fault is detected per step 318, an alarm signal may be generated per step 320 to notify a system administrator of the latent fault.
  • The steps recited in any method or process claims may be executed in any order and are not limited to the specific order presented in the claims.
  • The components and/or elements recited in any apparatus claims may be assembled or otherwise operationally configured in a variety of permutations to produce substantially the same result and are accordingly not limited to the specific configuration recited in the claims.
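
The sequence of checks in flow diagram 300 can be sketched in Python as follows. This is only an illustrative sketch, not the applicant's implementation. The function name, the three callables, the `min_delta` threshold, the initial move to a known low speed, and the `SimulatedModule` stand-in are all assumptions introduced for the example; real hardware would also need a settling delay between issuing a command and reading the tachometer.

```python
from typing import Callable, Optional


def check_latent_fault(
    command_speed: Callable[[int], None],    # hypothetical: command path through the fan controller
    set_full_speed: Callable[[bool], None],  # hypothetical: full-speed signal, bypassing the controller
    read_rpm: Callable[[], int],             # hypothetical: tachometer reading seen by the bus master
    low_rpm: int,
    high_rpm: int,
    min_delta: int = 100,                    # assumed minimum rpm change that counts as "detected"
) -> Optional[int]:
    """Return None if every speed change is observed (step 322), otherwise
    the number of the check that failed (304, 308, 312 or 316)."""
    command_speed(low_rpm)           # assumption: start from a known low speed
    before = read_rpm()

    command_speed(high_rpm)          # increase signal via the fan controller
    after = read_rpm()
    if after - before < min_delta:   # step 304: no increase seen
        return 304

    before = after
    command_speed(low_rpm)           # step 306: decrease signal via the fan controller
    after = read_rpm()
    if before - after < min_delta:   # step 308: no decrease seen
        return 308

    before = after
    set_full_speed(True)             # step 310: full-speed signal, bypassing the controller
    after = read_rpm()
    if after - before < min_delta:   # step 312: no increase seen
        return 312

    before = after
    set_full_speed(False)            # step 314: remove the full-speed signal
    after = read_rpm()
    if before - after < min_delta:   # step 316: no decrease seen
        return 316
    return None                      # step 322: no latent fault detected


class SimulatedModule:
    """Toy stand-in for a cooling module, used only to exercise the sketch."""

    FULL_RPM = 6000

    def __init__(self, stuck_tach: bool = False, bypass_broken: bool = False):
        self.commanded = 3000
        self.full_speed = False
        self.stuck_tach = stuck_tach        # tachometer always reports a plausible value
        self.bypass_broken = bypass_broken  # full-speed circuit ignores its signal

    def command_speed(self, rpm: int) -> None:
        self.commanded = rpm

    def set_full_speed(self, on: bool) -> None:
        self.full_speed = on and not self.bypass_broken

    def read_rpm(self) -> int:
        if self.stuck_tach:
            return 3000
        return self.FULL_RPM if self.full_speed else self.commanded
```

A caller standing in for the bus master would treat any non-None result as the latent fault indication of step 318 and could raise the alarm of step 320. With the simulated module, a healthy instance returns None, a stuck tachometer fails the first check (304), and a broken full-speed circuit fails the third (312).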

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Cooling Or The Like Of Semiconductors Or Solid State Devices (AREA)
  • Cooling Or The Like Of Electrical Apparatus (AREA)
  • Devices That Are Associated With Refrigeration Equipment (AREA)

Abstract

A computer system includes a cooling module that cools an embedded computer chassis. The cooling module includes a fan and a fan controller that controls the fan speed based on a first signal that represents a desired speed of the fan. A bus master module generates the first signal, generates a second signal that bypasses the fan controller and selectively switches the fan to a full-speed, receives a third signal that indicates an actual speed of the fan, communicates the second signal to switch the fan to full-speed, monitors the third signal to determine if the fan speed changed due to the second signal, and indicates a latent fault if the change in the fan speed is not detected.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application is a continuation of U.S. patent application Ser. No. 11/336,230 filed on Jan. 20, 2006. The disclosure of the above application is incorporated herein by reference in its entirety.
  • BACKGROUND OF INVENTION
  • Embedded computer chassis systems generally include numerous rack-mounted computer cards connected to a backplane. The computer cards may include payload cards and switch module cards that communicate using a bus or switched fabric topology over the backplane. The payload cards and switch cards may be chosen so as to provide the computer chassis with the functionality and features desired by a user.
  • Each embedded computer chassis generally includes cooling modules mounted in the chassis to cool the computer cards. Most cooling modules in computer equipment implement variable speed fan control and fan tachometer monitoring to detect fan failures or imminent fan failures. However, the fan tachometer or fan controller may fail in such a way as to give a false reading indicating that the fan is all right. This is a latent fault: a fault that has occurred but does not yet compromise the cooling subsystem. If the fan or fan control then fails, the latent fault is activated, and the fan tachometer provides a reading indicating that the fan is working properly when in fact the fan has failed. The prior art does not currently provide a method to detect latent faults in cooling subsystems of embedded computer systems.
  • There is a need, not met in the prior art, for an apparatus and method for latent fault checking a cooling module. Accordingly, there is a significant need for an apparatus that overcomes the deficiencies of the prior art outlined above.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • Representative elements, operational features, applications and/or advantages of the present invention reside inter alia in the details of construction and operation as more fully hereafter depicted, described and claimed—reference being made to the accompanying drawings forming a part hereof, wherein like numerals refer to like parts throughout. Other elements, operational features, applications and/or advantages will become apparent in light of certain exemplary embodiments recited in the Detailed Description, wherein:
  • FIG. 1 representatively illustrates a computer system in accordance with an exemplary embodiment;
  • FIG. 2 representatively illustrates a computer system in accordance with another exemplary embodiment; and
  • FIG. 3 representatively illustrates a flow diagram in accordance with an exemplary embodiment.
  • Elements in the Figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions of some of the elements in the Figures may be exaggerated relative to other elements to help improve understanding of various embodiments of the present invention. Furthermore, the terms “first”, “second”, and the like herein, if any, are used inter alia for distinguishing between similar elements and not necessarily for describing a sequential or chronological order. Moreover, the terms “front”, “back”, “top”, “bottom”, “over”, “under”, and the like in the Description and/or in the Claims, if any, are generally employed for descriptive purposes and not necessarily for comprehensively describing exclusive relative position. Any of the preceding terms so used may be interchanged under appropriate circumstances such that various embodiments described herein may be capable of operation in other configurations and/or orientations than those explicitly illustrated or otherwise described.
  • DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS
  • The following representative descriptions generally relate to exemplary embodiments and the inventor's conception of the best mode, and are not intended to limit the applicability or configuration of the present teachings in any way. Rather, the following description is intended to provide convenient illustrations for implementing various embodiments of the invention. As will become apparent, changes may be made in the function and/or arrangement of any of the elements described in the disclosed exemplary embodiments without departing from the spirit and scope of the present disclosure.
  • For clarity of explanation, various embodiments are presented, in part, as comprising individual functional blocks. The functions represented by these blocks may be provided through the use of either shared or dedicated hardware, including, but not limited to, hardware capable of executing software. The various embodiments are not limited to implementation by any particular set of elements, and the description herein is merely representational of various embodiments.
  • The terms “a” or “an”, as used herein, are defined as one, or more than one. The term “plurality,” as used herein, is defined as two, or more than two. The term “another,” as used herein, is defined as at least a second or more. The terms “including” and/or “having,” as used herein, are defined as comprising (i.e., open language). The term “coupled,” as used herein, is defined as connected, although not necessarily directly, and not necessarily mechanically. The terms “program,” “software application,” and the like as used herein, are defined as a sequence of instructions designed for execution on a computer system. A program, computer program, or software application may include a subroutine, a function, a procedure, an object method, an object implementation, an executable application, an applet, a servlet, an object code, a shared library/dynamic load library and/or other sequence of instructions designed for execution on a computer system. A component may include a computer program, software application, or one or more lines of computer readable processing instructions.
  • Software blocks that perform various embodiments can be part of computer program modules comprising computer instructions, such as control algorithms, that are stored in a computer-readable medium such as memory. Computer instructions can instruct processors to perform any methods described below. In other embodiments, additional modules could be provided as needed.
  • A detailed description of an exemplary application is provided as a specific enabling disclosure that may be generalized to any application of the disclosed system, device and method for latent fault checking a cooling module in accordance with the various embodiments.
  • FIG. 1 representatively illustrates a computer system 100 in accordance with various exemplary embodiments. Computer system 100 may include an embedded computer chassis 101 having a front side 102 and a rear side 104. In some embodiments, computer system 100 and embedded computer chassis 101 may comply with the Advanced Telecom and Computing Architecture (ATCA™) standard as defined in the PICMG 3.0 AdvancedTCA specification. In other embodiments, computer system 100 and embedded computer chassis 101 may comply with CompactPCI standard. In yet other embodiments, embedded computer chassis 101 may comply with MicroTCA standard as defined in PICMG® MicroTCA Draft 0.6—Micro Telecommunications Computing Architecture Base Specification (and subsequent revisions). The various embodiments are not limited to a computer system complying with any of these standards, and computer systems complying with other standards are within the scope of the present teachings.
  • Embedded computer chassis 101 may include a plurality of slots for inserting computing modules 118, for example payload modules and switch modules. Computing modules 118 may couple to backplane (not shown for clarity) to facilitate power distribution and/or communication using a bus topology, switch fabric topology, and the like. In some embodiments, backplane may comprise for example and without limitation, 100-ohm differential signaling pairs. When in operation, computing modules 118 generate heat that must be removed from embedded computer chassis 101.
  • Computing modules 118 may include at least one switch module coupled to any number of payload modules via the backplane, which may accommodate any combination of a packet switched backplane including a distributed switched fabric, or a multi-drop bus type backplane. Backplane architectures may include CompactPCI, Advanced Telecom Computing Architecture (AdvancedTCA), MicroTCA, and the like.
  • Payload modules may add functionality to computer system 100 through the addition of processors, memory, storage devices, I/O elements, and the like. In other words, payload module may include any combination of processors, memory, storage devices, I/O elements, and the like, to give computer system 100 any functionality desired by a user.
  • In some embodiments, computer system 100 can use a switch module as a central switching hub with any number of payload modules coupled to one or more switch modules. Computer system 100 may support a point-to-point, switched input/output (I/O) fabric. Computer system 100 may be implemented by using one or more of a plurality of switched fabric network standards, for example and without limitation, InfiniBand™, Serial RapidIO™, Ethernet™, AdvancedTCA™, PCI Express™, Gigabit Ethernet, and the like. Computer system 100 is not limited to the use of these switched fabric network standards and the use of any switched fabric network standard is within the scope of the present teachings.
  • In some embodiments, embedded computer chassis 101 may include a cooling subsystem comprising any number of cooling modules 108 for dissipating heat generated by computing modules 118, temperature sensors and other hardware and software modules to detect and react to temperature changes in embedded computer chassis 101. In some embodiments, by way of non-limiting example, cooling module 108 may be disposed adjacent to computing modules 118. Embedded computer chassis 101 may include a plurality of fan module bays 106, each disposed to accept a cooling module 108 for drawing cooling air 120 through embedded computer chassis 101. In some embodiments, each cooling module 108 may include one or more fans or blowers, power and control circuitry, and the like (as discussed more fully below). Cooling module 108 may plug into each fan module bay 106 and receive power from a central or dedicated power supply for embedded computer chassis 101. In some embodiments, embedded computer chassis 101 may include a cooling module cover 110 to provide access to cooling module 108 for maintenance and system diagnostics. In the following discussion of embodiments, the term “fan” or “fans” will be understood to include “blowers,” “fans,” or any combination of “blowers” and “fans.”
  • FIG. 2 representatively illustrates a computer system 200 in accordance with various embodiments. In some embodiments, computer system 200 may include cooling module 208 coupled to at least one bus master module 230. Cooling module 208 may be a modular cooling fan tray coupled for insertion into fan module bays 106, and include one or more fans 236, and a fan controller module 232 coupled to issue commands to the fan such as increase speed, decrease speed, on/off signals, and the like. Cooling module 208 may also include a fan tachometer 234 coupled to read the fan speed 239 in rpm, and the like, and report the fan speed 239 to fan controller module 232, which may then report fan speed 239 to bus master module 230.
  • Coupled to cooling module 208 is a bus master module 230, which may function to control a maintenance bus 231. In various embodiments, maintenance bus 231 may communicate management data between bus master module 230 and cooling module 208. Management data may include data pertaining to, for example and without limitation, temperature, voltage, amperage, bus traffic, status indications, and the like. Management data may also include instructions, for example and without limitation, instructions for cooling fans, adjustment of power supplies, and the like. Management data communicated over maintenance bus 231 may function to monitor and maintain cooling module 208. Management data differs from other data transmitted on a data bus (not shown for clarity) in that management data is used for monitoring and maintaining, among other things, cooling module 208, while a traditional data bus functions to communicate data transmitted to/from and processed by computing modules 118.
  • In various embodiments, maintenance bus 231 may be an Intelligent Platform Management Bus (IPMB) as specified in an Intelligent Platform Management Interface Specification. The Intelligent Platform Management Bus may be an I2C-based bus that provides a standardized interconnection between different boards within a chassis. The IPMB can also serve as a standardized interface for auxiliary or emergency management add-in cards. In various embodiments, bus master module 230 may be a Shelf Management Controller (ShMC) as is known in the AdvancedTCA computer platform.
  • Under normal operation, fan controller module 232 controls the fan speed 239 based on cooling requirements of embedded computer chassis 101. For example, if bus master module 230 detects a temperature increase in embedded computer chassis 101, it may signal cooling module 208, particularly fan controller module 232, that it needs to increase the fan speed 239 to increase cooling air flow. If the system is functioning correctly, fan controller module 232 may issue a command to fan 236 to increase fan speed 239. This increase in fan speed is detected by fan controller module 232 via fan tachometer 234, which may measure the rpm of fan 236 and report it to bus master module 230 via fan controller module 232 on maintenance bus 231. The same process may work in reverse if bus master module 230 detects that the temperature of embedded computer chassis 101 is too low. In this instance a decrease in fan speed may be commanded with the corresponding feedback of fan speed via fan tachometer 234.
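  • By way of non-limiting illustration only, the normal-operation feedback loop described above might be sketched in Python as follows. The class names, the `regulate` function, the 500 rpm step, and the temperature thresholds are hypothetical assumptions introduced for this sketch and are not part of the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Fan:
    """Simulated fan 236; rpm is what a healthy fan tachometer 234 would read."""
    rpm: int = 3000


class FanController:
    """Simulated fan controller module 232 (hypothetical interface)."""

    def __init__(self, fan: Fan, step_rpm: int = 500):
        self.fan = fan
        self.step_rpm = step_rpm  # assumed size of one speed adjustment

    def increase_speed(self) -> None:
        self.fan.rpm += self.step_rpm

    def decrease_speed(self) -> None:
        self.fan.rpm = max(0, self.fan.rpm - self.step_rpm)

    def read_tachometer(self) -> int:
        return self.fan.rpm


def regulate(controller, chassis_temp_c, low_c=25.0, high_c=45.0):
    """Bus master logic: command a speed change, return (before, after) rpm."""
    before = controller.read_tachometer()
    if chassis_temp_c > high_c:      # chassis too hot: request more air flow
        controller.increase_speed()
    elif chassis_temp_c < low_c:     # chassis too cool: request less air flow
        controller.decrease_speed()
    after = controller.read_tachometer()
    return before, after


controller = FanController(Fan())
print(regulate(controller, chassis_temp_c=50.0))  # (3000, 3500)
print(regulate(controller, chassis_temp_c=20.0))  # (3500, 3000)
```

  • In this sketch the tachometer reading taken after each command plays the role of the fan speed feedback reported to bus master module 230.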
  • Since cooling module 208 is critical to reliable operation of computer system 200, full-speed fan control circuit 238 is included such that bus master module 230 may order fan 236 to increase to full-speed, while bypassing maintenance bus 231 and fan controller module 232. For example, if an increase in cooling air is required and bus master module 230 orders an increase in fan speed 239, but no indication of the increased fan speed is given via the feedback mechanism described above, this may indicate a failure of fan controller module 232, and bus master module 230 has an alternative path to order an increase in fan speed 239. In this instance, bus master module 230 may directly command fan 236 to increase to full speed by issuing full-speed signal 246, thereby causing fan 236 to increase to full-speed and provide maximum cooling. This feature adds an additional layer of fault tolerance to cooling module 208 and hence increases reliability.
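  • By way of non-limiting illustration only, the fallback from the normal control path to the full-speed bypass path might be sketched in Python as follows. The `CoolingModule` class, the `increase_cooling` function, and the rpm values are hypothetical assumptions made for this sketch.

```python
class CoolingModule:
    """Simulated cooling module 208; controller_ok=False models a failed
    fan controller module 232."""

    FULL_SPEED_RPM = 6000  # assumed maximum speed, for illustration only

    def __init__(self, rpm=3000, controller_ok=True):
        self.rpm = rpm
        self.controller_ok = controller_ok

    def request_increase(self, step_rpm=500):
        # Normal path: maintenance bus 231 -> fan controller module 232.
        if self.controller_ok:
            self.rpm = min(self.FULL_SPEED_RPM, self.rpm + step_rpm)

    def assert_full_speed(self):
        # Bypass path: full-speed signal 246 -> full-speed fan control circuit 238.
        self.rpm = self.FULL_SPEED_RPM

    def read_tachometer(self):
        return self.rpm


def increase_cooling(module):
    """Try the normal path first; fall back to the full-speed bypass."""
    before = module.read_tachometer()
    module.request_increase()
    if module.read_tachometer() > before:
        return "normal"
    # No increase seen in the feedback: command full speed directly.
    module.assert_full_speed()
    return "full-speed bypass"


print(increase_cooling(CoolingModule(controller_ok=True)))   # normal
print(increase_cooling(CoolingModule(controller_ok=False)))  # full-speed bypass
```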
  • Despite the above features, the prior art does not currently provide a method or apparatus to detect a latent fault in cooling module 208. A latent fault is a fault that has occurred but is not visible or has not manifested itself. This is contrasted with an active fault that is visible and has manifested itself. In the prior art, if fan tachometer 234 or fan controller module 232 fails such that fan speed 239 is indicated as sufficient regardless of the actual fan speed 239 or the actual condition of fan 236 (voltage or current draw, and the like), there may be no indication to bus master module 230 that a problem exists. This is referred to as a latent fault as it is a failure of the cooling module 208 but does not trigger an indication of failure until a second fault occurs (such as insufficient cooling of embedded computer chassis 101).
  • In other words, a latent fault is a fault that is present but not visible or active. In order to maintain a highly reliable, highly available system, a latent fault within the cooling module 208 needs to be detected before a second fault occurs and activates the latent fault to the status of active fault. This may be the function of fan controller latent fault checking algorithm 242 and full-speed latent fault checking algorithm 240, which may be any combination of software or hardware functioning to detect a latent fault in cooling module 208 prior to that latent fault manifesting itself as an active fault.
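  • By way of non-limiting illustration only, the following Python sketch shows why a failed tachometer that always reports a sufficient speed can hide a fault from threshold-only monitoring, while a check that looks for a response to a commanded change can reveal it. All names and rpm values are hypothetical assumptions made for this sketch.

```python
class HealthyTachometer:
    """Fan tachometer 234 that reports the actual fan speed."""

    def read(self, actual_rpm):
        return actual_rpm


class StuckTachometer:
    """Failed fan tachometer 234 that always reports the same rpm."""

    def __init__(self, stuck_rpm=4000):
        self.stuck_rpm = stuck_rpm

    def read(self, actual_rpm):
        return self.stuck_rpm  # ignores the real fan speed


def passive_check(tach, actual_rpm, min_rpm=2000):
    """Threshold-only monitoring: is the reported speed sufficient?"""
    return tach.read(actual_rpm) >= min_rpm


def active_check(tach, rpm_before, rpm_after):
    """Perturbation check: did the reported speed follow a commanded change?"""
    return tach.read(rpm_before) != tach.read(rpm_after)


# The fan has actually stopped, yet the stuck tachometer still looks fine.
print(passive_check(StuckTachometer(), actual_rpm=0))                    # True
# A commanded change from 3000 to 3500 rpm is not reflected in the reading.
print(active_check(StuckTachometer(), rpm_before=3000, rpm_after=3500))  # False
```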
  • Fan controller latent fault checking algorithm 242 may function to test fan controller module 232, full-speed fan control circuit 238 and fan tachometer 234 prior to an active fault in cooling module 208. Prior to an active fault in cooling module 208, or detection of an active fault in cooling module 208, fan controller latent fault checking algorithm 242 may be utilized periodically to increase the reliability of cooling module 208 and cooling subsystem.
  • Fan controller latent fault checking algorithm 242 attempts to modify fan speed 239 via fan controller module 232 and detect a change in fan speed 245 at bus master module 230 to determine if fan controller module 232 and fan tachometer 234 are functioning properly. For example, an increase fan speed signal 243 may be communicated from bus master module 230 via fan controller module 232 to increase fan speed 239. It is determined if an increase in fan speed 241 is detected as measured via fan tachometer 234. Also, a decrease fan speed signal 244 may be communicated from bus master module 230 via fan controller module 232 to decrease fan speed 239. It is determined if a decrease in fan speed 242 is detected as measured via fan tachometer 234. If either the increase in fan speed 241 or the decrease in fan speed 242 is not detected, a latent fault may be indicated in the cooling module 208. In some embodiments, an alarm signal 250 may be generated to notify a system administrator of the latent fault.
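  • By way of non-limiting illustration only, the fan controller latent fault check described above might be sketched in Python as follows. The `SimulatedCoolingModule` class and its method names are hypothetical assumptions; a real system would likely also wait for the fan speed to settle before each reading, which this sketch omits.

```python
class SimulatedCoolingModule:
    """Stand-in for cooling module 208; responsive=False models a fan
    controller module 232 or fan tachometer 234 that no longer reacts."""

    def __init__(self, rpm=3000, responsive=True):
        self.rpm = rpm
        self.responsive = responsive

    def send_increase(self):          # increase fan speed signal 243
        if self.responsive:
            self.rpm += 500

    def send_decrease(self):          # decrease fan speed signal 244
        if self.responsive:
            self.rpm -= 500

    def read_rpm(self):               # fan speed reported via the tachometer
        return self.rpm


def fan_controller_latent_fault_check(module):
    """Return True if a latent fault is indicated, False otherwise."""
    baseline = module.read_rpm()
    module.send_increase()
    raised = module.read_rpm()
    if raised <= baseline:            # no increase in fan speed detected
        return True
    module.send_decrease()
    return module.read_rpm() >= raised  # True when no decrease is detected


print(fan_controller_latent_fault_check(SimulatedCoolingModule()))                  # False
print(fan_controller_latent_fault_check(SimulatedCoolingModule(responsive=False)))  # True
```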
  • To further test for latent faults in cooling module 208, full-speed latent fault checking algorithm 240 may be employed. Full-speed latent fault checking algorithm 240 attempts to modify fan speed 239 via full-speed fan control circuit 238, bypassing fan controller module 232, and detect a change in fan speed 245 at bus master module 230 to determine if full-speed fan control circuit 238, fan controller module 232 and fan tachometer 234 are functioning properly. For example, full-speed signal 246 is communicated to fan 236 via full-speed fan control circuit 238, bypassing fan controller module 232. It is determined if an increase in fan speed 241 is detected as measured via fan tachometer 234. Removal of full-speed signal 246 while bypassing the fan controller module 232 may then allow a decrease in fan speed 239, for example back to the fan speed prior to implementing the above algorithm. It is determined if a decrease in fan speed 242 is detected as measured via fan tachometer 234. If either the increase in fan speed 241 or the decrease in fan speed 242 is not detected, a latent fault may be indicated in the cooling module 208. In some embodiments, an alarm signal 250 may be generated to notify a system administrator of the latent fault.
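  • By way of non-limiting illustration only, the full-speed latent fault check described above might be sketched in Python as follows. The `BypassFan` class, its method names, and the rpm values are hypothetical assumptions made for this sketch.

```python
class BypassFan:
    """Simulated fan 236 with full-speed fan control circuit 238;
    bypass_ok=False models a failed bypass circuit."""

    FULL_SPEED_RPM = 6000  # assumed maximum speed, for illustration only

    def __init__(self, commanded_rpm=3000, bypass_ok=True):
        self.commanded_rpm = commanded_rpm
        self.bypass_ok = bypass_ok
        self.full_speed_asserted = False

    def set_full_speed_signal(self, asserted):   # full-speed signal 246
        self.full_speed_asserted = asserted

    def read_rpm(self):
        if self.full_speed_asserted and self.bypass_ok:
            return self.FULL_SPEED_RPM
        return self.commanded_rpm


def full_speed_latent_fault_check(fan):
    """Return True if a latent fault is indicated, False otherwise."""
    baseline = fan.read_rpm()
    fan.set_full_speed_signal(True)
    try:
        at_full = fan.read_rpm()
        increased = at_full > baseline
    finally:
        fan.set_full_speed_signal(False)  # always remove the signal again
    decreased = fan.read_rpm() < at_full
    return not (increased and decreased)


print(full_speed_latent_fault_check(BypassFan()))                 # False
print(full_speed_latent_fault_check(BypassFan(bypass_ok=False)))  # True
```

  • This sketch assumes the fan is running below full speed when the check begins; a fan already at full speed would show no increase and would be reported as a fault.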
  • The above algorithms may be performed in any order and be within the scope of the various embodiments. Further, the test of increased and decreased fan speed may be performed in any order in both algorithms and be within the scope of the various embodiments.
  • FIG. 3 representatively illustrates a flow diagram 300 in accordance with various exemplary embodiments. In step 302, an increase fan speed signal is communicated via fan controller module. In step 304 it is determined if fan speed has increased. If not, a latent fault is indicated per step 318. If fan speed has increased, a decrease fan speed signal is communicated via fan controller module in step 306. In step 308 it is determined if fan speed has decreased. If not, a latent fault is indicated per step 318.
  • If fan speed has decreased in step 308, a full-speed signal is communicated, bypassing fan controller module per step 310. In step 312 it is determined if fan speed has increased. If not, a latent fault is indicated per step 318. If fan speed has increased, full-speed signal is removed while bypassing fan controller module per step 314. In step 316 it is determined if fan speed has decreased. If not, a latent fault is indicated per step 318. If fan speed has decreased per step 316, no latent fault is detected per step 322. If at any point in the flow diagram a latent fault is detected per step 318, an alarm signal may be generated per step 320 to notify a system administrator of the latent fault.
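  • By way of non-limiting illustration only, flow diagram 300 might be walked in Python as a table of action and check steps, as sketched below. The `SimModule` class, the `run_flow_300` function, and the `alarm` callback are hypothetical assumptions; the step numbers are taken from the description of FIG. 3.

```python
class SimModule:
    """Simulated cooling module with two injectable faults."""

    FULL_RPM = 6000  # assumed maximum speed, for illustration only

    def __init__(self, controller_ok=True, bypass_ok=True):
        self.commanded = 3000
        self.full = False
        self.controller_ok = controller_ok
        self.bypass_ok = bypass_ok

    def increase(self):
        if self.controller_ok:
            self.commanded += 500

    def decrease(self):
        if self.controller_ok:
            self.commanded -= 500

    def assert_full_speed(self):
        self.full = self.bypass_ok

    def remove_full_speed(self):
        self.full = False

    def read_rpm(self):
        return self.FULL_RPM if self.full else self.commanded


def run_flow_300(module, alarm):
    """Walk steps 302-322; return 322 (no fault) or 318 (latent fault)."""
    steps = [
        # (action step, check step, action, expected sign of the rpm change)
        (302, 304, module.increase, +1),
        (306, 308, module.decrease, -1),
        (310, 312, module.assert_full_speed, +1),
        (314, 316, module.remove_full_speed, -1),
    ]
    for _action_step, check_step, action, direction in steps:
        before = module.read_rpm()
        action()
        change = module.read_rpm() - before
        if change * direction <= 0:   # expected change was not detected
            alarm(check_step)         # step 320: generate an alarm signal
            return 318
    return 322


failed_checks = []
print(run_flow_300(SimModule(bypass_ok=False), failed_checks.append))  # 318
print(failed_checks)                                                   # [312]
```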
  • In the foregoing specification, various embodiments have been described. However, it will be appreciated that various modifications and changes may be made without departing from the scope of the present teachings as set forth in the claims below. The specification and figures are to be regarded in an illustrative manner, rather than a restrictive one and all such modifications are intended to be included within the scope of the present teachings. Accordingly, the scope of the present teachings should be determined by the claims appended hereto and their legal equivalents rather than by merely the examples described above.
  • For example, the steps recited in any method or process claims may be executed in any order and are not limited to the specific order presented in the claims. Additionally, the components and/or elements recited in any apparatus claims may be assembled or otherwise operationally configured in a variety of permutations to produce substantially the same result and are accordingly not limited to the specific configuration recited in the claims.
  • Benefits, other advantages and solutions to problems have been described above with regard to various embodiments; however, any benefit, advantage, solution to problem or any element that may cause any particular benefit, advantage or solution to occur or to become more pronounced are not to be construed as critical, required or essential features or components of any or all the claims.
  • Other combinations and/or modifications of the above-described structures, arrangements, applications, proportions, elements, materials or components used in the practice of the present teachings, in addition to those not specifically recited, may be varied or otherwise particularly adapted to specific environments, manufacturing specifications, design parameters or other operating requirements without departing from the general principles of the same.

Claims (12)

1. A computer system, comprising:
a cooling module that cools an embedded computer chassis, comprising:
a fan; and
a fan controller that controls the fan speed based on a first signal that represents a desired speed of the fan; and
a bus master module that generates the first signal, that generates a second signal that bypasses the fan controller and selectively switches the fan to a full-speed, that receives a third signal that indicates an actual speed of the fan, that communicates the second signal to switch the fan to full-speed, that monitors the third signal to determine if the fan speed changed due to the second signal, and that indicates a latent fault if the change in the fan speed is not detected.
2. The computer system of claim 1 wherein the bus master module selectively generates the first signal to change the fan speed, monitors the third signal to determine if the fan speed changed due to the first signal, and indicates the latent fault if the change in the fan speed is not detected.
3. The computer system of claim 1 wherein the bus master module demands an increase of fan speed via the first signal, monitors the third signal to determine if the fan speed increased due to the first signal, demands a decrease of the fan speed via the first signal, monitors the third signal to determine if the fan speed decreased due to the first signal, and indicates the latent fault if at least one of the increase and decrease of the fan speeds is not detected.
4. The computer system of claim 1 further comprising an Intelligent Platform Management Bus (IPMB) that carries the second and third signals.
5. The embedded computer chassis of claim 1 wherein the bus master module monitors the third signal to determine if the fan speed increased due to the second signal and then relinquishes the second signal, monitors the third signal to determine if the fan speed decreased due to relinquishing the second signal, and indicates the latent fault if at least one of the increase and decrease in the fan speeds is not detected.
6. The embedded computer chassis of claim 1 further comprising a fan tachometer that generates the third signal based on a rotational speed of the fan.
7. The embedded computer chassis of claim 1, wherein the bus module indicates the latent fault via an alarm signal.
8. A method of performing a full-speed latent fault check on a fan controller, comprising:
a full-speed latent fault checking algorithm, comprising:
communicating a first signal to modify a fan speed via a full speed fan control circuit that bypasses a fan controller module; and
determining if the change in the fan speed is detected due to the first signal; and
if the change in the fan speed is not detected in the full-speed latent fault checking algorithm, then indicating a latent fault.
9. The method of claim 8, further comprising a fan controller latent fault checking algorithm, comprising:
communicating a second signal to modify the fan speed via the fan controller module;
determining if a change in the fan speed is detected due to the second signal; and
if the change in the fan speed is not detected in the fan controller latent fault checking algorithm, then indicating the latent fault.
10. The method of claim 8, the fan controller latent fault checking algorithm further comprising:
requesting an increased fan speed via the second signal;
determining if an increase in the fan speed is detected;
requesting a decreased fan speed via the second signal;
determining if a decrease in the fan speed is detected; and
if at least one of the increase in the fan speed and the decrease in the fan speed are not detected, then indicating the latent fault.
11. The method of claim 8, the full-speed latent fault checking algorithm further comprising:
communicating a full-speed signal, bypassing the fan controller module;
determining if an increase in the fan speed is detected;
removing the full-speed signal, bypassing the fan controller module;
determining if a decrease in the fan speed is detected; and
if at least one of the increase in the fan speed and the decrease in the fan speed are not detected, indicating the latent fault in the cooling module of the embedded computer chassis.
12. The method of claim 8, wherein indicating a latent fault comprises generating an alarm signal in a computer.
US12/099,358 2006-01-20 2008-04-08 Method Of Latent Fault Checking A Cooling Module Abandoned US20080186672A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/099,358 US20080186672A1 (en) 2006-01-20 2008-04-08 Method Of Latent Fault Checking A Cooling Module

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US11/336,230 US7373278B2 (en) 2006-01-20 2006-01-20 Method of latent fault checking a cooling module
US12/099,358 US20080186672A1 (en) 2006-01-20 2008-04-08 Method Of Latent Fault Checking A Cooling Module

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US11/336,230 Continuation US7373278B2 (en) 2006-01-20 2006-01-20 Method of latent fault checking a cooling module

Publications (1)

Publication Number Publication Date
US20080186672A1 true US20080186672A1 (en) 2008-08-07

Family

ID=38286576

Family Applications (2)

Application Number Title Priority Date Filing Date
US11/336,230 Active 2026-05-12 US7373278B2 (en) 2006-01-20 2006-01-20 Method of latent fault checking a cooling module
US12/099,358 Abandoned US20080186672A1 (en) 2006-01-20 2008-04-08 Method Of Latent Fault Checking A Cooling Module

Country Status (4)

Country Link
US (2) US7373278B2 (en)
EP (1) EP1974275A2 (en)
CN (1) CN101379470B (en)
WO (1) WO2007084812A2 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070180329A1 (en) * 2006-01-31 2007-08-02 Lanus Mark S Method of latent fault checking a management network
US8437881B2 (en) * 2008-02-15 2013-05-07 The Pnc Financial Services Group, Inc. Systems and methods for computer equipment management
CN102478861B (en) * 2010-11-29 2014-03-26 英业达股份有限公司 Fan control system for memory
DE102012210760A1 (en) * 2012-06-25 2014-01-02 Kaco New Energy Gmbh Method for checking the function of a cooling system of an inverter and inverter
US10036396B2 (en) * 2013-03-08 2018-07-31 Coriant Operations, Inc. Field configurable fan operational profiles
WO2016151779A1 (en) * 2015-03-24 2016-09-29 富士通株式会社 Information processing device and management device
CN106567846B (en) * 2015-10-12 2018-12-18 大唐移动通信设备有限公司 A kind of blower regulation method and apparatus of Advanced telecom computing architecture ATCA subrack
EP4004679A4 (en) * 2019-07-24 2023-04-05 Hewlett-Packard Development Company, L.P. Pulse width modulation and voltage test signals for fan type detection
CN113821091B (en) * 2020-06-19 2024-02-13 戴尔产品有限公司 Fan fault compensation

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5727928A (en) * 1995-12-14 1998-03-17 Dell Usa L.P. Fan speed monitoring system for determining the speed of a PWM fan
US6163266A (en) * 1998-12-08 2000-12-19 Lucent Technologies Inc. Fan operation detection circuit for a DC fan and method of operation thereof
US20030084358A1 (en) * 2001-10-31 2003-05-01 Bresniker Kirk M. System and method for intelligent control of power consumption of distributed services during periods of reduced load
US20030236594A1 (en) * 2002-06-20 2003-12-25 Scott Frankel Intelligent cooling fan
US20050165577A1 (en) * 2004-01-28 2005-07-28 Valere Power, Inc. Method and apparatus for predicting fan failure

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1372375A (en) * 2001-02-28 2002-10-02 协禧电机股份有限公司 Fixed speed control circuit of small fan
CN100362448C (en) * 2003-02-26 2008-01-16 华为技术有限公司 A method for monitoring fan operation

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104778110A (en) * 2015-04-16 2015-07-15 浪潮电子信息产业股份有限公司 Air duct abnormality detection method based on nios II soft core in server system
US9622388B1 (en) * 2016-02-10 2017-04-11 Ciena Corporation Multi-directional fans in an electronic chassis supporting extended range temperature operation
US20180137747A1 (en) * 2016-11-17 2018-05-17 Cisco Technology, Inc. Method and apparatus for exchanging information through a tachometer signal
US10467892B2 (en) * 2016-11-17 2019-11-05 Cisco Technology, Inc. Method and apparatus for exchanging information through a tachometer signal

Also Published As

Publication number Publication date
WO2007084812A2 (en) 2007-07-26
WO2007084812A3 (en) 2008-08-28
EP1974275A2 (en) 2008-10-01
US20070174020A1 (en) 2007-07-26
CN101379470B (en) 2011-04-13
US7373278B2 (en) 2008-05-13
CN101379470A (en) 2009-03-04

Legal Events

Date Code Title Description
AS Assignment

Owner name: EMERSON NETWORK POWER - EMBEDDED COMPUTING, INC.,

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MOTOROLA, INC.;REEL/FRAME:020771/0126

Effective date: 20071231

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION