WO1996041249A2 - Intelligent disk-cache memory - Google Patents
- Publication number
- WO1996041249A2 / WO9641249A2 (PCT/US1996/006520, US9606520W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- memory
- data
- disk
- cache
- memory bank
- Prior art date
Links
Classifications
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C29/00—Checking stores for correct operation ; Subsequent repair; Testing stores during standby or offline operation
- G11C29/70—Masking faults in memories by using spares or by reconfiguring
- G11C29/74—Masking faults in memories by using spares or by reconfiguring using duplex memories, i.e. using dual copies
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/08—Error detection or correction by redundancy in data representation, e.g. by using checking codes
- G06F11/10—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
- G06F11/1076—Parity data used in redundant arrays of independent storages, e.g. in RAID systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/2015—Redundant power supplies
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0804—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with main memory updating
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0866—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2211/00—Indexing scheme relating to details of data-processing equipment not covered by groups G06F3/00 - G06F13/00
- G06F2211/10—Indexing scheme relating to G06F11/10
- G06F2211/1002—Indexing scheme relating to G06F11/1076
- G06F2211/1009—Cache, i.e. caches used in RAID system with parity
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2211/00—Indexing scheme relating to details of data-processing equipment not covered by groups G06F3/00 - G06F13/00
- G06F2211/10—Indexing scheme relating to G06F11/10
- G06F2211/1002—Indexing scheme relating to G06F11/1076
- G06F2211/109—Sector level checksum or ECC, i.e. sector or stripe level checksum or ECC in addition to the RAID parity calculation
Definitions
- the present invention relates to methods and apparatus for computer disk subsystems and more specifically to cache memories suitable for computer disk subsystems.
- Computer disk subsystems generally include the media on which data are stored, plus one or more disk-subsystem controllers, one or more memories for caching and/or buffering data being transferred between the system and the disk devices, and an interface to the main computer or computers.
- the media holding data within a computer disk subsystem can include magnetic, optical, or other disk-storage technologies, and often also include tape or other types of removable data storage, particularly for backup and interchange of the data stored on the disk-storage technologies.
- the memories in a computer can include static random access memories (SRAMs), dynamic random access memories (DRAMs), or dual-ported static random-access memories (DPSRAMs).
- “Caching” in these memories is defined as storing data which, it is anticipated, the main computer will request in the near future (this is also called “read caching”). Read caching can involve holding read data which was requested in anticipation that it will be requested again in the future, or reading ahead of what was requested and holding the read-ahead data in anticipation that sequential data will be requested in the future.
- “Write caching” is defined as storing data, which has been sent from the main computer for storage on the disk devices, into the memories.
- “Read buffering” in these memories is defined as storing data which has already been requested by the main computer, but which comes from the disk devices at a different rate than the rate at which the data are transferred to the main computer system; the buffering accommodates the different transfer rates (typically, storing data as they are relatively slowly read and gathered from the disk devices, then quickly transferring blocks of data to the main computer).
- “Write buffering” accommodates data which are being transferred from the system to the disk devices, and which come from the system at a different rate (typically a quicker rate) than the rate of transfer to the disk devices. Read buffering and write buffering are often performed in the disk devices themselves.
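The rate-matching role of such a buffer can be pictured as a small ring buffer: the slower side deposits data at its own pace while the faster side drains it in larger bursts. The sketch below is a generic illustration only, assuming a fixed 4 KB buffer; the `ring_buffer` type and function names are not taken from the patent.

```c
#include <stddef.h>

/* A fixed-size ring buffer: the disk side deposits bytes at its own pace,
 * and the host side drains them in larger bursts (or vice versa for writes). */
typedef struct {
    unsigned char data[4096];
    size_t head;   /* next byte to write   */
    size_t tail;   /* next byte to read    */
    size_t count;  /* bytes currently held */
} ring_buffer;

/* Producer side: store up to 'len' bytes, return how many actually fit. */
size_t ring_put(ring_buffer *rb, const unsigned char *src, size_t len)
{
    size_t stored = 0;
    while (stored < len && rb->count < sizeof rb->data) {
        rb->data[rb->head] = src[stored++];
        rb->head = (rb->head + 1) % sizeof rb->data;
        rb->count++;
    }
    return stored;
}

/* Consumer side: drain up to 'len' bytes, return how many were available. */
size_t ring_get(ring_buffer *rb, unsigned char *dst, size_t len)
{
    size_t taken = 0;
    while (taken < len && rb->count > 0) {
        dst[taken++] = rb->data[rb->tail];
        rb->tail = (rb->tail + 1) % sizeof rb->data;
        rb->count--;
    }
    return taken;
}
```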
- the main computer must wait until the disk subsystem has completed a write operation and has also indicated this completion to the main computer before the main computer can proceed with other operations.
- the completion indication can be sent to the main computer once the data are successfully written into a data cache, and before the data have actually been written to the disk surface. There is a danger of data loss if the data are transferred into the cache and the corresponding completion indication is sent to the main computer, but later some subsequent error (such as a parity error in the data, a disk-subsystem-controller failure, or a loss of power to the subsystem) prevents the data from ever being written to the disk.
- the improved cache should improve performance and be able to recover data in case of a detected parity error in the data, in the case of a controller failure, and in the case of a power failure.
- the present invention teaches a method and apparatus for intelligently caching data in an intelligent disk subsystem connected to a main computer having a main memory.
- the disk subsystem includes a disk-cache memory having a first and a second memory bank.
- a first copy and a second copy of data are held in the first and second memory bank, respectively, wherein the first memory bank is coupled to a first battery and the second memory bank is coupled to a second battery.
- a detected failure occurring in either memory bank or either battery causes either the first copy or the second copy of the data to be read, based on where the detected failure occurred (i.e., the copy not associated with the error is read).
- the detected failures include error-correction- code (ECC) errors.
- successive read operations are routed to alternating memory banks.
- each memory bank is periodically scrubbed to purge the memory of correctable errors.
- the disk-cache memory is packaged on a removable cache module.
- read operations going to disk devices and returning data to the disk-cache memory are given a higher priority than write operations.
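The priority given to disk reads that return data to the disk-cache memory over write operations can be sketched as a simple two-queue dispatcher. The `disk_op` and `op_queues` structures and the `next_op` function below are illustrative assumptions, not the patent's firmware.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct disk_op {
    struct disk_op *next;
    bool is_read;        /* read from a disk device into the cache */
    unsigned long lba;
    unsigned count;
} disk_op;

typedef struct {
    disk_op *read_q;     /* higher priority: reads returning data to the cache */
    disk_op *write_q;    /* lower priority: cached writes being flushed to disk */
} op_queues;

/* Always dispatch a pending read first; fall back to writes only when
 * no read is waiting. */
disk_op *next_op(op_queues *q)
{
    if (q->read_q) {
        disk_op *op = q->read_q;
        q->read_q = op->next;
        return op;
    }
    if (q->write_q) {
        disk_op *op = q->write_q;
        q->write_q = op->next;
        return op;
    }
    return NULL;  /* nothing pending */
}
```

A strict priority like this can starve queued writes under a heavy read load; a production scheduler would presumably bound how long a write may wait, a detail the patent text does not specify.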
- a method describes caching data originating from write operations but not caching data originating from read operations in the cache, wherein write operations are defined as transferring data from the main computer to the disk subsystem and read operations are defined as transferring data from the disk subsystem to the main computer.
- later write operations are examined to determine whether data from the later write operation are intended to be written to the intended address of an earlier write operation, and if so, the data from the earlier write operation are overwritten in the cache with the data from the later one.
- FIG. 1 is a block diagram illustrating a computer system 100 including an intelligent SCSI subsystem 200 according to the present invention.
- FIG. 2 is a block diagram illustrating an intelligent SCSI subsystem 200 including an intelligent disk-cache memory 300 according to the present invention.
- FIG. 3 is a block diagram illustrating details of one embodiment of an intelligent disk-cache memory 300 according to the present invention.
- FIG. 4 is a diagram illustrating an embodiment of removable cache module 350.
- Figure 1 is a block diagram illustrating a computer system 100 according to the present invention, including main computer 102 and intelligent SCSI subsystem (ISS) 200 connected by high-performance bus 198.
- Main computer 102 includes system memory 104.
- a plurality of disk-connection busses 239 are provided, each capable of connecting to a plurality of disk devices 240.
- disk-connection busses 239 are standard 16-bit-wide/fast differential-drive SCSI (Small Computer System Interface) busses with differential SCSI terminators, and disk devices 240 are SCSI disk devices.
- disk-connection busses 239 are standard 8-bit-wide/fast differential-drive SCSI busses, 8-bit-wide/slow differential-drive SCSI busses, 8-bit-wide/fast single-ended-drive SCSI busses, or 8-bit-wide/slow single-ended-drive SCSI busses, depending on the interface chips used in ISS 200.
- four SCSI disk-connection busses 239 are provided.
- up to seven SCSI disk devices 240 can connect to each disk-connection bus 239.
- up to fifteen SCSI disk devices 240 can connect to each disk-connection bus 239.
- high-performance bus 198 is a 64-bit-wide bus capable of 267 MB/second transfer rates.
- multiple ISS 200 subsystems can be connected to one main computer 102, providing scalability to the needs of the user.
- portions of disk-processing tasks are offloaded from main computer 102 into each ISS 200, allowing scalable system performance improvements by adding ISS 200 subsystems.
- multiple main computers 102 are also connected to high-performance bus 198, providing additional system performance or redundancy (for additional reliability) or both.
- FIG. 2 is a block diagram illustrating one embodiment of intelligent SCSI subsystem (ISS) 200, including intelligent disk-cache memory 300, processor 202, cache buffer 204, dual-port memory 206, FLASH ROM (read-only memory) 208, NVRAM (non-volatile random-access memory) 210, system registers 212, local registers 214, SRAM 216, and a plurality of disk-connection bus processors 220.1 through 220.N.
- disk-connection bus processors 220.1 through 220.N are NCR53C720-type SCSI processors by NCR Corporation, and are four in number.
- processor 202 is a 33-MHz 486SX processor by Intel Corporation, and is used to control the overall flow of data and status information within ISS 200.
- a socket is provided to allow a processor 202 upgrade to an Intel Overdrive processor.
- system registers 212 and local registers 214 are implemented in a fifteen-nanosecond (15-ns) MACH230 chip.
- dual-port memory 206 is implemented using two 30-ns 4K-by-16-bit dual-port SRAMs, with one port coupled to high-performance bus 198 and the other port coupled to local bus 199.
- writing to a particular location (called an 'ISS mailbox') in dual-port memory 206 from high-performance bus 198 causes an interrupt (called an 'ISS mailbox interrupt') to be issued to processor 202 (in one embodiment, by an Intel 8259A interrupt controller chip) in order to indicate that information has arrived in the ISS mailbox.
- writing to a particular location (called a 'system mailbox') in dual-port memory 206 from local bus 199 causes a system interrupt (called a 'system mailbox interrupt') to be issued to main computer 102 (in one embodiment, via a slot-specific attention signal, i.e., one identifying a specific ISS 200) in order to indicate that information has arrived in the system mailbox.
- each dual-port memory 206 on each ISS 200 in a single system 100 is given a different address to which to respond, in order that main computer 102 can communicate to each ISS 200 separately.
- SRAM 216 is implemented as two sections of four chips each, wherein each chip is a 25-ns 128K-by-8-bit SRAM, and contains the executable firmware which controls operation of ISS 200.
- FLASH ROM 208 is an 80-ns 256K-by-8-bit electronically-erasable-and-rewritable ROM comprising a 28F200BXT chip, and containing at least part of the firmware which controls the operation of processor 202.
- the subsystem firmware in FLASH ROM 208 can be updated, using a DOS-based utility operating on main computer 102, without turning off or disabling system 100; subsystem firmware in FLASH ROM 208 is copied into SRAM 216 for faster execution by processor 202 and bus processors 220.1-220.N.
- NVRAM 210 is a 100-ns 32K-by-8-bit nonvolatile SRAM chip and is used to store RAID (redundant arrays of inexpensive disks) configuration information which is used to automatically restore the RAID-level configuration after a power failure, and the status of in-progress write operations in order that disk data integrity can be reconstructed after a failure.
- ISS 200 provides RAID data protection, and supports various levels of RAID protection including data striping, disk mirroring, and disk-fault recovery.
- RAID level 0 provides higher potential performance by placing part of a data file on one disk (e.g., on the first disk device 240 connected to disk-connection bus 239 from bus processor 220.1) and another part of the same data file on another disk (e.g., on the first disk device 240 connected to disk-connection bus 239 from bus processor 220.2), such that both disk devices 240 can be accessed in parallel, and data can thus be transferred at a higher rate, up to N times faster if N disk devices 240 are being used in parallel.
- a RAID-level-0 subsystem appears to main computer 102 as a single, large logical disk device having the capacity of all disk devices 240 combined.
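The striping idea can be expressed as a simple address mapping: a logical block number is split into a disk index and a block offset on that disk. This is a generic RAID-0 sketch assuming one block per stripe unit; it is not the patent's firmware.

```c
#include <stdio.h>

/* Map a logical block address onto N striped disks. */
typedef struct {
    unsigned disk;          /* which disk device holds the block */
    unsigned long offset;   /* block offset within that disk     */
} stripe_loc;

stripe_loc raid0_map(unsigned long lba, unsigned num_disks)
{
    stripe_loc loc;
    loc.disk = (unsigned)(lba % num_disks);
    loc.offset = lba / num_disks;
    return loc;
}

int main(void)
{
    /* With 4 disks, consecutive logical blocks land on different spindles,
     * so they can be transferred in parallel. */
    for (unsigned long lba = 0; lba < 8; lba++) {
        stripe_loc loc = raid0_map(lba, 4);
        printf("lba %lu -> disk %u, offset %lu\n", lba, loc.disk, loc.offset);
    }
    return 0;
}
```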
- RAID level 1 provides higher potential performance by placing a 'mirrored' separate copy of a data file on each of two or more disk devices 240 (e.g., one copy on the first disk device 240 connected to disk-connection bus 239 from bus processor 220.1, and one copy on the first disk device 240 connected to bus 239 from bus processor 220.2, and if more than two copies are to be made, one copy to each other mirror disk device 240), such that all disk devices 240 are written roughly in parallel, and data are read from any disk, thus freeing the other disks for other work, which gives better overall performance for system 100.
- the disk device 240 finishing last provides the completion status signal to main computer 102, thus ensuring that all devices have received the new data before the data are released by the main computer 102.
- the disk device 240 finishing first provides the completion status signal to main computer 102, thus providing faster average write times, since the fastest completion time is as fast or faster than the average completion time.
- RAID level 1 also provides data protection, in that if an error is detected in the data read from one disk, or even if one disk-connection bus 239 fails entirely, the data can still be retrieved from one or more other disks on other disk-connection busses 239.
- RAID-level-1 mirroring is performed across two or more ISS 200 subsystems (a configuration called 'disk controller duplexing') in order that a failure of an ISS 200 will not cause the loss of data availability to main computer 102 (for example, in one embodiment one complete copy of all system data are stored on disk devices 240 connected to one ISS 200, and another complete copy of all system data are stored on disk devices 240 connected to another ISS 200, so that if either ISS 200 or any component connected to them fails, all of the data are available through the other, surviving ISS 200 and associated devices).
- RAID level 10 provides even higher potential performance as well as providing data protection, by combining RAID-level-1 mirroring with RAID-level-0 striping.
- RAID level 4 provides data protection but uses fewer disk drives than RAID-level-1 mirroring, by combining RAID-level-0 striping with exclusive-or checksum-type parity data redundancy.
- data are striped across two to twenty-seven disk devices 240, while a separate disk device 240 holds the parity data; and all disk devices 240 are configured into a single, large logical device.
- data are striped across two to fifty-nine disk devices 240, while a separate disk device 240 holds the parity data; and all disk devices 240 are configured into a single, large logical device.
- RAID level 5 provides data protection but uses fewer disk drives than RAID-level-1 mirroring, by combining RAID-level-0 striping with exclusive-or checksum-type parity data redundancy.
- data are striped across three to twenty-eight disk devices 240, while the parity data are interleaved among each disk device 240.
- data are striped across three to sixty disk devices 240, while the parity data are interleaved among each disk device 240.
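The exclusive-or parity used by RAID levels 4 and 5 works because XOR-ing all data blocks in a stripe yields a parity block from which any single missing block can be rebuilt. The sketch below is a minimal illustration of that property, not the patent's implementation; the 8-byte block size is an arbitrary assumption.

```c
#include <stdio.h>
#include <string.h>

#define BLOCK 8  /* illustrative block size in bytes */

/* parity = XOR of all data blocks in the stripe */
void compute_parity(unsigned char data[][BLOCK], int ndisks,
                    unsigned char parity[BLOCK])
{
    memset(parity, 0, BLOCK);
    for (int d = 0; d < ndisks; d++)
        for (int i = 0; i < BLOCK; i++)
            parity[i] ^= data[d][i];
}

/* Rebuild one failed disk's block by XOR-ing parity with the surviving blocks. */
void rebuild_block(unsigned char data[][BLOCK], int ndisks, int failed,
                   const unsigned char parity[BLOCK], unsigned char out[BLOCK])
{
    memcpy(out, parity, BLOCK);
    for (int d = 0; d < ndisks; d++)
        if (d != failed)
            for (int i = 0; i < BLOCK; i++)
                out[i] ^= data[d][i];
}

int main(void)
{
    unsigned char data[3][BLOCK] = { "block0", "block1", "block2" };
    unsigned char parity[BLOCK], rebuilt[BLOCK];

    compute_parity(data, 3, parity);
    rebuild_block(data, 3, 1, parity, rebuilt);   /* pretend disk 1 failed */
    printf("rebuilt: %s\n", (char *)rebuilt);     /* prints "block1" */
    return 0;
}
```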
- one or more hot-spare disk devices 240 are connected to one or more of the disk-connection busses 239 and kept powered-on and ready, but are unused relative to storage of data until a failure is detected (for RAID-levels-1, -4, -5, or -10) in one of the other disk devices 240.
- the mirrored data from the disk device (or devices) 240 which mirror the failed device is reconstructed onto the hot-spare disk device 240.
- data for write operations are written into both memory banks 308A and 308B in intelligent disk-cache memory 300, and a write-completion status signal is sent to main computer 102 after these writes are complete, but before data are actually written to disk devices 240.
- These write operations are called “delayed writes,” since they are actually completed after the completion is indicated to main computer 102. This allows main computer 102 to proceed to other tasks without having to wait until the data are actually written to disk devices 240, thus enhancing performance in certain circumstances. Because two redundant copies of the data exist in battery-backed-up memory banks 308A and 308B, there is little chance that the data will get irretrievably destroyed before they are actually written to the disk devices 240.
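A delayed write can be outlined as: copy the data into both battery-backed banks, acknowledge completion to the host immediately, and queue the real disk write for later. The sketch below is a loose illustration of that sequence; the `cache_line` structure, the 512-byte sector size, and the `signal_write_complete`/`queue_disk_write` hooks are all assumptions, not the patent's interfaces.

```c
#include <string.h>
#include <stdbool.h>

#define SECTOR 512

typedef struct {
    unsigned char bank_a[SECTOR];   /* copy held in memory bank A      */
    unsigned char bank_b[SECTOR];   /* redundant copy held in bank B   */
    unsigned long lba;
    bool dirty;                     /* still needs to reach the disk   */
} cache_line;

/* Hypothetical hooks: stand-ins for the real firmware/hardware interfaces. */
void signal_write_complete(unsigned long lba);
void queue_disk_write(cache_line *line);

/* Delayed write: both copies are written and completion is reported to the
 * host before the data ever reach a disk surface. */
void delayed_write(cache_line *line, unsigned long lba,
                   const unsigned char data[SECTOR])
{
    memcpy(line->bank_a, data, SECTOR);   /* first copy  */
    memcpy(line->bank_b, data, SECTOR);   /* second copy */
    line->lba = lba;
    line->dirty = true;

    signal_write_complete(lba);   /* host may proceed with other work  */
    queue_disk_write(line);       /* flushed to the disk device later  */
}
```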
- an elevator-seek algorithm operating in the firmware which is controlling processor 202 optimizes the sequence of operations sent to any one disk device 240 (although certain operations which must be performed in a determined order, such as a write to a particular sector followed by a read to the same sector, are left in that order).
- some operations are re-ordered in order to shorten the seek time needed (e.g., a first read operation to a sector which required a long seek time might be re-ordered to take place after a second read operation which came into ISS 200 after the first operation but which required a shorter seek time: the second operation would be performed by the disk arm while the arm was "on its way" to the first).
- operations are reordered so that disk addresses are accessed in an alternating ascending-and-descending sequence, and the disk arm is scanned first in an ascending sequence, and then in a descending sequence (hence the name 'elevator-seek').
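A minimal elevator-seek sketch: pending requests are sorted by disk address and serviced in an ascending sweep from the current arm position, then a descending sweep over the remainder. This is a generic illustration of the classic algorithm, not the patent's firmware; the cylinder numbers are made up.

```c
#include <stdio.h>
#include <stdlib.h>

static int cmp_ulong(const void *a, const void *b)
{
    unsigned long x = *(const unsigned long *)a;
    unsigned long y = *(const unsigned long *)b;
    return (x > y) - (x < y);
}

/* Service pending cylinder addresses with an ascending sweep starting at the
 * current arm position, then a descending sweep over the remaining addresses. */
void elevator_order(unsigned long *req, int n, unsigned long arm_pos)
{
    qsort(req, n, sizeof *req, cmp_ulong);

    int start = 0;
    while (start < n && req[start] < arm_pos)
        start++;                          /* first request at or above the arm */

    for (int i = start; i < n; i++)       /* ascending sweep */
        printf("service %lu\n", req[i]);
    for (int i = start - 1; i >= 0; i--)  /* descending sweep */
        printf("service %lu\n", req[i]);
}

int main(void)
{
    unsigned long pending[] = { 95, 180, 34, 119, 11, 123, 62, 64 };
    elevator_order(pending, 8, 50);       /* arm currently at cylinder 50 */
    return 0;
}
```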
- by performing the elevator-seek algorithm in firmware rather than in the operating system running on main computer 102, bus traffic on high-performance bus 198 is reduced, thus enhancing overall system performance.
- the firmware which is controlling processor 202 optimizes operations by detecting multiple SCSI requests to adjacent locations and combining such operations into a single, larger SCSI operation, thus reducing the number of SCSI commands passed, and saving "missed revolutions" (i.e., if a second operation is sent immediately following the completion of the first, the disk head will already have moved across the sector needed ('missing it') by the time the second command is recognized, and the head must therefore wait nearly an entire revolution before the desired sector is again reached).
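Combining requests to adjacent locations amounts to merging contiguous LBA ranges before they are issued; the small sketch below shows one way that check could look. The `io_request` structure and `try_coalesce` name are illustrative assumptions.

```c
#include <stdbool.h>

typedef struct {
    unsigned long lba;     /* starting logical block address */
    unsigned long count;   /* number of blocks               */
} io_request;

/* If 'next' begins exactly where 'cur' ends, fold it into 'cur' so that a
 * single, larger SCSI command can be issued instead of two. */
bool try_coalesce(io_request *cur, const io_request *next)
{
    if (next->lba == cur->lba + cur->count) {
        cur->count += next->count;
        return true;
    }
    return false;
}
```

A queue scanner would call `try_coalesce` on each newly arrived request against the request already queued for the same disk, issuing the merged command only when no further adjacent request arrives.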
- the firmware which is controlling processor 202 optimizes write operations by detecting multiple SCSI writes to the same location and combining such operations into a single write operation reflecting only the data of the last write to arrive at ISS 200 (since the data of the earlier operations would have been overwritten anyway, those data are no longer needed, and only the last operation need be performed).
- write-completion status is then signaled to main computer 102 for every collapsed write operation. Performance is improved, since the unneeded earlier write operations are never actually performed to the disk devices 240. These combined write operations are called "collapsed writes.”
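Collapsed writes can be sketched as a lookup by target address: if a cached, not-yet-flushed write to the same sector is found, the newer data simply replaces it, so only one disk write is ever performed. The cache structure, slot count, and sector size below are assumptions for illustration.

```c
#include <string.h>
#include <stdbool.h>

#define SECTOR      512
#define CACHE_SLOTS 64

typedef struct {
    bool          valid;
    unsigned long lba;
    unsigned char data[SECTOR];
} pending_write;

static pending_write cache[CACHE_SLOTS];

/* Cache a host write. If an earlier, not-yet-flushed write targets the same
 * sector, overwrite it in place (a "collapsed write"); otherwise take a slot. */
bool cache_write(unsigned long lba, const unsigned char data[SECTOR])
{
    int free_slot = -1;
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].valid && cache[i].lba == lba) {
            memcpy(cache[i].data, data, SECTOR);  /* collapse onto older write */
            return true;
        }
        if (!cache[i].valid && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return false;                 /* cache full; caller must flush first */
    cache[free_slot].valid = true;
    cache[free_slot].lba = lba;
    memcpy(cache[free_slot].data, data, SECTOR);
    return true;
}
```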
- operating system performance is also improved by allowing main computer 102 to perform read-ahead caching into system memory 104.
- Figure 3 is a block diagram illustrating details of one embodiment of intelligent disk-cache memory 300.
- address bus 301 is twenty-three bits wide, and data bus 303 is 32 bits wide.
- Address bus 301 is coupled to two redundant address buffers, 302A and 302B.
- Data bus 303 is coupled to two redundant bi-directional data buffers, 304A and 304B.
- Address buffer 302A is twenty-three bits wide in the embodiment shown, is comprised of three FCT2244 chips (available from Quality Semiconductor, 851 Martin Ave., Santa Clara, CA 95050), and drives an address to all of the memory chips in memory bank 308A.
- Address buffer 302B is also twenty-three bits wide in the embodiment shown, is also comprised of three FCT2244 chips, and drives all of the memory chips in memory bank 308B.
- Data buffer 304A is thirty-two bits wide in the embodiment shown, is comprised of two ABT16245 chips (available from Texas Instruments, P.O. Box 655303, Dallas, TX 75265), and drives data to all of the memory chips in memory bank 308A.
- Data buffer 304B is also thirty-two bits wide in the embodiment shown, is also comprised of two ABT16245 chips, and drives data to all of the memory chips in memory bank 308B.
- memory bank 308A includes twenty pseudo-SRAM (static random-access memory) chips, each having 512K eight-bit bytes, each having an eighty-nanosecond (80-ns) access time, arranged as a memory array which is four sections by five bytes by 512K, or eight megabytes of data plus error-correction code (ECC).
- the five bytes to memory bank 308A are coupled to the four bytes of the DATA0 bus and the one byte of the ECC0 bus.
- memory bank 308B also includes the same number and type of chips, coupled to the four bytes of the DATA1 bus and the one byte of the ECC1 bus.
- ECC chip 306A is a nine-nanosecond (9-ns) AM29C660 chip (available from AMD (Advanced Micro Devices), 901 Thomson Place, P.O. Box 3453, Sunnyvale, CA 94088) that provides detection of all double- and single-bit errors and correction of all single-bit errors (DBED/SBEC) on the DATA0 bus.
- ECC chip 306A generates ECC0 bits as DATA0 data are being written into memory bank 308A, and uses the read-out ECC0 bits to provide DBED/SBEC on the DATA0 bus as data are being read.
- ECC chip 306B is also an AM29C660 chip that provides DBED/SBEC on the DATA1 bus.
- ECC chip 306B also generates ECC1 bits as DATA1 data are being written into memory bank 308B, and uses the read-out ECC1 bits to provide DBED/SBEC on the DATA1 bus as data are being read from memory bank 308B.
- battery 311A provides power for memory bank 308A, and is monitored by gas-gauge (GG) chip 310A and BVCC (battery Vcc) control chip 313A and controlled via PNP transistor 314A.
- GG chip 310A is a BQ2010 chip (Benchmark
- BVCC control chip 313A is an LTC1235 chip (Linear Technology Corp., 1630 McCarthy Boulevard, Milpitas, CA 95035) which controls the supply of power, from main voltage supply Vcc or from battery 311A, via PNP transistor 314A, depending on which provides the "best" supply voltage (i.e., if Vcc fails or falls below a specified voltage, power is instead supplied from battery 311A).
- gas gauge control 312 is a fifteen-nanosecond (15-ns) MACH220 37047200 chip (available from AMD (Advanced Micro Devices), 901 Thomson Place, P.O. Box 3453, Sunnyvale, CA 94088) which provides monitoring of GG chips 310A and 310B, and provides output gas-gauge values onto the low-order bits of bus DATA0 when requested.
- cache control 322 is a fifteen-nanosecond (15-ns) MACH445 37047100 chip (available from AMD (Advanced Micro Devices), 901 Thomson Place, P.O. Box 3453, Sunnyvale, CA 94088) which provides overall monitoring and control of the functions of intelligent disk-cache memory 300.
- Input signals which are monitored by cache control 322 include a single-bit and a multi-bit error-detection signal from each of the ECC chips 306A and 306B.
- input signals from other subsystems such as BVCC controls 313A and 313B, gas gauges 310A and 310B, gas gauge control 312, etc. are also monitored by cache control 322.
- Control signals generated by cache control 322 include output-enable signals and read/write signals to memory banks 308A and 308B, and a battery-off signal to BVCC controllers 313A and 313B.
- cache control 322 under normal conditions (i.e., when no errors have been detected), in response to a write operation causes each written datum to be simultaneously written twice, one copy each to memory banks 308A and 308B, in order to have a redundant copy available if one or the other memory bank (308A or 308B), or the respective associated circuitry for that memory bank, happens to fail.
- in response to a read operation, cache control 322 causes either memory bank 308A or 308B to read its respective copy of the data into its respective ECC chip 306A or 306B (i.e., alternating banks are used on successive read operations), and if no error is detected, cache control 322 causes the data from memory bank 308A (or 308B as applicable) to be passed to data bus 303 by data buffer 304A (or 304B as applicable); if an ECC error is detected by ECC chip 306A (or 306B as applicable), data will be read from the opposite bank, and if no error is detected by ECC chip 306B (or 306A as applicable), then cache control 322 causes the data from the opposite memory bank (i.e., memory bank 308B if 308A detected an error on its side) to be passed to data bus 303 by data buffer 304B (or 304A as applicable).
- in the case of a single-bit error, the ECC chip 306A/306B of the corresponding bank will correct the single-bit error it detects, and cache control 322 causes the corrected data from the affected ECC chip 306A/306B to be passed to data bus 303 by the corresponding data buffer 304A/304B, and to be rewritten to the affected memory bank 308A/308B (and the other bank is not read from).
- ECC chip 306B will correct the single-bit error it detects, and cache control 322 causes the corrected data from ECC chip 306B to be passed to data bus 303 by data buffer 304B.
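The read-path policy, at the logic level only: alternate banks on successive reads, use corrected data and scrub it back on a single-bit error, and fall back to the redundant copy in the other bank on an uncorrectable error. The status codes and the `bank_read`/`bank_rewrite` helpers below are hypothetical stand-ins for the ECC-chip and bank interfaces, not the patent's hardware.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { ECC_OK, ECC_CORRECTED, ECC_UNCORRECTABLE } ecc_status;

/* Hypothetical hardware hooks: read a word from one bank through its ECC
 * chip, and rewrite a corrected word back into that bank. */
ecc_status bank_read(int bank, uint32_t addr, uint32_t *word);
void       bank_rewrite(int bank, uint32_t addr, uint32_t word);

/* Alternate banks on successive reads; on a single-bit error use the
 * corrected data and write it back; on an uncorrectable error read the
 * redundant copy from the opposite bank. */
bool cache_read(uint32_t addr, uint32_t *word)
{
    static int next_bank = 0;
    int bank = next_bank;
    next_bank ^= 1;                       /* alternate 0,1,0,1,... */

    ecc_status st = bank_read(bank, addr, word);
    if (st == ECC_OK)
        return true;
    if (st == ECC_CORRECTED) {
        bank_rewrite(bank, addr, *word);  /* purge the soft error */
        return true;
    }
    /* Uncorrectable in this bank: try the redundant copy. */
    st = bank_read(bank ^ 1, addr, word);
    if (st == ECC_CORRECTED)
        bank_rewrite(bank ^ 1, addr, *word);
    return st != ECC_UNCORRECTABLE;
}
```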
- cache control 322 causes memory bank 308A and 308B to be sequentially read, corrected, and rewritten during otherwise unused cycles, in order that single-bit errors (which may "spontaneously" appear from time-to-time) are detected and corrected.
- a pointer is maintained, and sequenced through each successive location of memory banks 308A and 308B, in order to correct all single-bit soft errors (a soft error is one which can be corrected by overwriting the location with the correct data).
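A sketch of the scrub pointer: during otherwise unused cycles, step through each bank one location at a time and rewrite any word whose single-bit error the ECC logic corrected. The bank size assumes the eight-megabyte banks described above, and the `bank_read`/`bank_rewrite` hooks are the same hypothetical interface used in the previous sketch.

```c
#include <stdint.h>

typedef enum { ECC_OK, ECC_CORRECTED, ECC_UNCORRECTABLE } ecc_status;

/* Hypothetical hooks into the bank/ECC hardware (same assumptions as above). */
ecc_status bank_read(int bank, uint32_t addr, uint32_t *word);
void       bank_rewrite(int bank, uint32_t addr, uint32_t word);

#define BANK_WORDS (2u * 1024u * 1024u)   /* 8 MB per bank / 4 bytes per word */

/* Called during otherwise unused cycles: scrub one location per call,
 * stepping a pointer through both banks so every word is periodically
 * read, corrected if needed, and rewritten. */
void scrub_step(void)
{
    static uint32_t addr = 0;
    uint32_t word;

    for (int bank = 0; bank < 2; bank++)
        if (bank_read(bank, addr, &word) == ECC_CORRECTED)
            bank_rewrite(bank, addr, word);   /* fix the soft error in place */

    addr = (addr + 1) % BANK_WORDS;           /* advance the scrub pointer */
}
```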
- cache control 322 also detects certain failures (such as insufficient voltage or charge) in batteries 311A and 311B and in their respective associated circuitry (i.e., PNP transistors 314A and 314B, GG chips 310A and 310B, and BVCC controls 313A and 313B). These failures are collectively called 'battery failures', even though some may actually be caused by failures in the other associated components. Under normal conditions (i.e., when no battery failures have been detected), writes go to both memory banks 308A and 308B, and reads come alternately from memory banks 308A and 308B, as described above. In this embodiment, if a battery failure associated with memory bank 308A is detected, no new data is written to memory bank 308A, and that memory bank is cleared of data, in order that the battery failure does not allow undetected errors to pass from memory bank 308A.
- if cache control 322 detects any failures in either memory bank or in their respective support circuitry, then no new data are written into intelligent disk-cache memory 300, but instead read and write operations thereafter bypass the cache until a repair of the failed device is effected.
- data already in memory bank 308A and/or 308B are "flushed.” This flushing operation involves writing the write data to the disk devices 240 as specified by the respective write operations which were previously cached, and all cache resources are released as the corresponding write operations complete to disk devices 240.
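The flush-on-failure behaviour can be sketched as: stop accepting new cache entries, write every still-valid cached write out to its disk address, and release each slot as its disk write completes. The structures reuse the illustrative pending-write cache from the collapsed-write sketch, and the `disk_write` hook is hypothetical.

```c
#include <stdbool.h>

#define SECTOR      512
#define CACHE_SLOTS 64

typedef struct {
    bool          valid;
    unsigned long lba;
    unsigned char data[SECTOR];
} pending_write;

extern pending_write cache[CACHE_SLOTS];  /* the illustrative write cache */
static bool cache_bypassed;               /* set once a failure is detected */

/* Hypothetical hook: synchronously write one sector to the disk devices. */
void disk_write(unsigned long lba, const unsigned char data[SECTOR]);

/* On a detected failure: bypass the cache for new operations and flush every
 * previously cached write to the disks, releasing each slot as it completes. */
void flush_cache_on_failure(void)
{
    cache_bypassed = true;                 /* later reads/writes skip the cache */
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].valid) {
            disk_write(cache[i].lba, cache[i].data);
            cache[i].valid = false;        /* release the cache resource */
        }
    }
}
```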
- only write data from write operations from main computer 102 to ISS 200 are cached into intelligent disk-cache memory 300, while read data from read operations bypass intelligent disk-cache memory 300.
- these write data are then eventually written to disk devices 240 (as specified by the RAID level currently running on ISS 200) as described above (a write cache); the cached write data are also available for read operations (a read cache); however, no other read data (e.g., from read operations, read-ahead operations or read-buffering operations) are placed or held in intelligent disk-cache memory 300.
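The write-only caching policy, in outline: host reads are served from the cache only when they hit previously cached write data; all other reads go straight to disk and their data are never stored in the cache. The sketch below reuses the same illustrative pending-write cache; the `disk_read` hook is an assumption.

```c
#include <stdbool.h>
#include <string.h>

#define SECTOR      512
#define CACHE_SLOTS 64

typedef struct {
    bool          valid;
    unsigned long lba;
    unsigned char data[SECTOR];
} pending_write;

extern pending_write cache[CACHE_SLOTS];  /* the illustrative write cache */

/* Hypothetical hook: read one sector directly from the disk devices. */
void disk_read(unsigned long lba, unsigned char data[SECTOR]);

/* Reads are satisfied from the cache only if they hit data placed there by an
 * earlier write; otherwise they go straight to disk and are NOT cached. */
void host_read(unsigned long lba, unsigned char out[SECTOR])
{
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].valid && cache[i].lba == lba) {
            memcpy(out, cache[i].data, SECTOR);   /* hit on cached write data */
            return;
        }
    }
    disk_read(lba, out);   /* miss: bypass the cache, no read data is stored */
}
```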
- intelligent disk-cache memory 300 is fabricated as a removable cache module 350.
- a removable cache module 350 can be removed, with its data intact, from the failed ISS 200, and plugged into a replacement ISS 200, which in turn is replaced into system 100. Data stored in the removable plug-in module are then used to complete operations which were in process, but not yet completed, at the moment of failure.
- FIG. 4 is a diagram illustrating an embodiment of removable cache module 350 having a signal connector 399.
- Removable cache module 350 of one embodiment is designed such that it can be unplugged or plugged without losing any data in memory banks 308A and 308B, due, for example, to voltage spikes caused by the unplugging or plugging processes.
- protection from data loss due to electro-static discharge (ESD) is also provided.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
- Techniques For Improving Reliability Of Storages (AREA)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| AU57905/96A AU5790596A (en) | 1995-06-07 | 1996-05-20 | Intelligent disk-cache memory |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US47953495A | 1995-06-07 | 1995-06-07 | |
| US08/479,534 | 1995-06-07 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO1996041249A2 true WO1996041249A2 (en) | 1996-12-19 |
| WO1996041249A3 WO1996041249A3 (en) | 1997-08-21 |
Family
ID=23904417
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US1996/006520 WO1996041249A2 (en) | 1995-06-07 | 1996-05-20 | Intelligent disk-cache memory |
Country Status (2)
| Country | Link |
|---|---|
| AU (1) | AU5790596A (en) |
| WO (1) | WO1996041249A2 (en) |
Family Cites Families (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US4942575A (en) * | 1988-06-17 | 1990-07-17 | Modular Computer Systems, Inc. | Error connection device for parity protected memory systems |
| GB8815239D0 (en) * | 1988-06-27 | 1988-08-03 | Wisdom Systems Ltd | Memory error protection system |
| CA2072728A1 (en) * | 1991-11-20 | 1993-05-21 | Michael Howard Hartung | Dual data buffering in separately powered memory modules |
| WO1993018461A1 (en) * | 1992-03-09 | 1993-09-16 | Auspex Systems, Inc. | High-performance non-volatile ram protected write cache accelerator system |
| US5408644A (en) * | 1992-06-05 | 1995-04-18 | Compaq Computer Corporation | Method and apparatus for improving the performance of partial stripe operations in a disk array subsystem |
| US5448719A (en) * | 1992-06-05 | 1995-09-05 | Compaq Computer Corp. | Method and apparatus for maintaining and retrieving live data in a posted write cache in case of power failure |
| US5437022A (en) * | 1992-12-17 | 1995-07-25 | International Business Machines Corporation | Storage controller having additional cache memory and a means for recovering from failure and reconfiguring a control unit thereof in response thereto |
-
1996
- 1996-05-20 AU AU57905/96A patent/AU5790596A/en not_active Abandoned
- 1996-05-20 WO PCT/US1996/006520 patent/WO1996041249A2/en active Application Filing
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115114200A (en) * | 2022-06-29 | 2022-09-27 | 海光信息技术股份有限公司 | Multi-chip system and starting method based on same |
| CN115114200B (en) * | 2022-06-29 | 2023-11-17 | 海光信息技术股份有限公司 | Multi-chip system and starting method based on same |
Also Published As
| Publication number | Publication date |
|---|---|
| WO1996041249A3 (en) | 1997-08-21 |
| AU5790596A (en) | 1996-12-30 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US5708668A (en) | Method and apparatus for operating an array of storage devices | |
| US5758054A (en) | Non-volatile memory storage of write operation identifier in data storage device | |
| JP2831072B2 (en) | Disk drive memory | |
| US5487160A (en) | Concurrent image backup for disk storage system | |
| JP3164499B2 (en) | A method for maintaining consistency of parity data in a disk array. | |
| CN1038710C (en) | Method and apparatus for recovering parity-protected data | |
| US5566316A (en) | Method and apparatus for hierarchical management of data storage elements in an array storage device | |
| US5548711A (en) | Method and apparatus for fault tolerant fast writes through buffer dumping | |
| US5233618A (en) | Data correcting applicable to redundant arrays of independent disks | |
| JP3129732B2 (en) | Storage array with copy-back cache | |
| US5488701A (en) | In log sparing for log structured arrays | |
| US7228381B2 (en) | Storage system using fast storage device for storing redundant data | |
| US5379417A (en) | System and method for ensuring write data integrity in a redundant array data storage system | |
| US7464322B2 (en) | System and method for detecting write errors in a storage device | |
| JP3230645B2 (en) | Data processing method, system and device | |
| US7130973B1 (en) | Method and apparatus to restore data redundancy and utilize spare storage spaces | |
| JP3270959B2 (en) | Parity storage method in disk array device and disk array device | |
| JP2857288B2 (en) | Disk array device | |
| WO1996041249A2 (en) | Intelligent disk-cache memory | |
| JP3845239B2 (en) | Disk array device and failure recovery method in disk array device | |
| GB2402803A (en) | Arrangement and method for detection of write errors in a storage system | |
| JPH0962461A (en) | Automatic data restoring method for disk array device | |
| JP2857289B2 (en) | Disk array device | |
| KR100205289B1 (en) | How to prevent loss of recorded data | |
| JPH08137627A (en) | Disk array device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| AK | Designated states |
Kind code of ref document: A2 Designated state(s): AL AM AT AU AZ BB BG BR BY CA CH CN CZ DE DK EE ES FI GB GE HU IS JP KE KG KP KR KZ LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK TJ TM TR TT UA UG UZ VN AM AZ BY KG KZ MD RU TJ TM |
|
| AL | Designated countries for regional patents |
Kind code of ref document: A2 Designated state(s): KE LS MW SD SZ UG AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN ML |
|
| DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
| AK | Designated states |
Kind code of ref document: A3 Designated state(s): AL AM AT AU AZ BB BG BR BY CA CH CN CZ DE DK EE ES FI GB GE HU IS JP KE KG KP KR KZ LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK TJ TM TR TT UA UG UZ VN AM AZ BY KG KZ MD RU TJ TM |
|
| AL | Designated countries for regional patents |
Kind code of ref document: A3 Designated state(s): KE LS MW SD SZ UG AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN ML |
|
| REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |
|
| 122 | Ep: pct application non-entry in european phase | ||
| NENP | Non-entry into the national phase |
Ref country code: CA |