
HK1090451B - System and method for direct deposit using locking cache - Google Patents


Info

Publication number
HK1090451B
HK1090451B
Authority
HK
Hong Kong
Prior art keywords
cache
data
lock
subsystem
processor
Prior art date
Application number
HK06112000.7A
Other languages
Chinese (zh)
Other versions
HK1090451A1 (en)
Inventor
Michael Norman Day
Charles Johns
Thuong Truong
Original Assignee
International Business Machines Corporation
Priority date
Filing date
Publication date
Priority claimed from US10/976,263 (US7290107B2)
Application filed by International Business Machines Corporation
Publication of HK1090451A1
Publication of HK1090451B


Description

System and method for direct deposit using a locked cache
Cross Reference to Related Applications
This application is related to a co-pending U.S. patent application entitled "Method for a Processor to Use a Locked Cache as Part of System Memory" (attorney docket number: AUS920040324US1), filed concurrently herewith in the names of Michael Norman Day, Charles Ray Johns, and Thuong Quang Truong.
Technical Field
The present invention relates generally to memory management and, more particularly, to cache memory usage.
Background
Memory access latency, the time spent waiting for writes to and reads from memory, is a common problem for software programs. In current computers, processor cycles are much shorter than memory access times, and the problem is becoming more severe: processor speed increases exponentially, while memory access times improve only gradually.
One partial remedy for memory access latency is a memory hierarchy. Main memory has the largest capacity but is the slowest. Above the main memory sit multiple layers of memory, or cache memory, that become successively smaller and faster.
Current uses of cache memory present a number of problems. When the cache does not contain the required data, a read from the cache fails and the data must be fetched from the slower main memory. It may also be impossible to write data exclusively to the cache: data transferred from I/O devices, networks, or storage disks may have to be written exclusively to main memory, or to main memory and local or cache memory at the same time. In either case there is a delay while the slower main memory is written, and there may be a further delay in accessing the data. In the first case, the processor must fetch the data from main memory for processing, which incurs an access delay. In the second case, data written to the cache may be replaced by other data before it is accessed; when this occurs, the replaced data is written back to main memory, and the processor must then fetch it from main memory in order to use it.
Therefore, there is a need for a way to store data from a processor, I/O device, network, or storage disk into a cache or other fast memory without also storing it in main memory, and for a guarantee that the data remains in the cache or other fast memory until it is used.
Disclosure of Invention
According to a first aspect of the present invention, there is provided a computer system for transferring data directly to a lock cache and retaining the data therein until the data is accessed for use, the system comprising:
a processor;
a cache memory coupled to the processor, wherein the cache memory is divided into a locked cache memory and an unlocked cache memory;
a cache controller coupled to the processor and the cache memory;
a system bus coupled to the cache memory;
an I/O subsystem coupled to the system bus;
one or two address range registers storing a particular address range for accessing the lock cache, such that the lock cache appears to be additional system memory;
wherein the I/O subsystem is configured to issue a store instruction to transfer data to the lock cache without also transferring data to system memory;
wherein the cache controller is configured to determine whether the store instruction is within a particular address range for the lock cache and, if the store instruction is within the particular address range, store data from the I/O subsystem to the lock cache; and
wherein the processor is configured to issue a signal to indicate that data stored in the lock cache from the I/O subsystem may be overwritten in response to loading the data stored in the lock cache from the I/O subsystem for use.
According to a second aspect of the present invention, there is provided a method for transferring data directly to a lock cache and retaining the data therein until the data is accessed for use, the method comprising:
dividing a cache into a locked cache and a non-locked cache;
receiving, by a cache controller, a store instruction from an I/O subsystem;
determining, by the cache controller, whether the store instruction is within a particular address range for accessing the locked cache, wherein one or two address range registers store the particular address range, such that the locked cache appears to be additional system memory;
if the store instruction is within the particular address range, transferring data from the I/O subsystem to the lock cache; and
in response to loading the data stored in the lock cache from the I/O subsystem for use, a signal is issued to indicate that the data stored in the lock cache from the I/O subsystem may be overwritten.
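The dividing, range-checking, storing, and signaling steps above can be sketched as a toy software model. This is illustrative only, assuming invented constants and names (the patent does not prescribe an implementation); the lock-cache range values echo the example in FIG. 4.

```python
# Toy model of the claimed direct-deposit flow. All names and constants are
# illustrative assumptions, not the patented hardware design.

LOCK_CACHE_BASE = 0x60001   # example lock-cache address range (cf. FIG. 4)
LOCK_CACHE_TOP = 0x60FFF

lock_cache = {}       # locked portion of the cache
system_memory = {}    # slower main memory
overwritable = set()  # addresses whose lock-cache data may be overwritten

def store(addr, data):
    """Route a store by address range: the lock-cache range bypasses main memory."""
    if LOCK_CACHE_BASE <= addr <= LOCK_CACHE_TOP:
        lock_cache[addr] = data       # direct deposit: not also written to memory
    else:
        system_memory[addr] = data

def load(addr):
    """Load data; a lock-cache load signals that the slot may be overwritten."""
    if LOCK_CACHE_BASE <= addr <= LOCK_CACHE_TOP:
        data = lock_cache[addr]
        overwritable.add(addr)        # consumer signals: space free for reuse
        return data
    return system_memory[addr]
```

Note how a store inside the range never touches `system_memory`, matching the "without also transferring data to system memory" limitation.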
Drawings
For a more complete understanding of the present invention and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
FIG. 1 illustrates a block diagram of a system for storing data transferred from an I/O subsystem to a lock cache;
FIG. 2 shows a flowchart illustrating an I/O subsystem storing data in a lock cache;
FIG. 3 shows a flowchart illustrating address range checking when a processor stores data;
FIG. 4 is a schematic diagram showing a memory arrangement from the perspective of an I/O subsystem;
FIG. 5 illustrates a Replacement Management Table; and
FIG. 6 illustrates locking a partition of a cache.
Detailed Description
In the following discussion, numerous specific details are set forth to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known elements have been illustrated in schematic or block diagram form in order not to obscure the present invention in unnecessary detail.
It should also be noted that, unless indicated otherwise, all functions described herein may be performed in either hardware or software, or some combination thereof. However, in a preferred embodiment, the functions are performed by a processor, such as a computer or an electronic data processor, in accordance with code, such as computer program code, software, and/or integrated circuits that are coded to perform such functions, unless indicated otherwise.
FIG. 1 illustrates a block diagram of a system for storing data transferred from an I/O subsystem to a lock cache. The processor 102 is coupled to the cache memory 110, the cache controller 106, and the first address range register set 104. A Replacement Management Table (RMT) 108 is coupled to the address range register set 104 and the cache controller 106. The cache controller 106 and the cache memory 110 are coupled to a system bus 112. A second set of address range registers 116 is coupled to the system bus 112 and the RMT 108. The system bus 112 is also coupled to an input/output (I/O) subsystem 114. In one embodiment of the invention, the lock cache comprises one or more sets, but not all sets, of a set-associative cache 110. The remaining sets are used as conventional cache memory. Dividing the cache memory 110 into disjoint sets for the conventional cache and the locked cache prevents data written to the locked cache from being overwritten by data written to the cache in the conventional manner.
The space in the lock cache may be regarded, by both the processor 102 and the I/O subsystem 114, as additional system memory with a higher address range than the actual main system memory. In one embodiment of the invention, two register sets, a first address range register set 104 for the processor 102 and a second address range register set 116 for the I/O devices, determine accesses to the lock cache. Each set includes two address range registers and one mask register. The access address of a load or store instruction/bus command is compared with the contents of the address range registers. A class_id is then provided as an index into the Replacement Management Table (RMT) 108. The RMT 108 indicates which sets of the cache memory 110 are available to the load or store instruction/bus command. Transactions whose access addresses fall within the particular range may access the lock cache; other transactions are written to the other sets or ways of the cache memory 110.
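The range check that produces a class_id can be sketched as follows. The register values, the mask behavior, and the class_id encodings are invented for illustration; the patent specifies only that the comparison yields an RMT index.

```python
# Sketch of the address range check: the access address (optionally masked)
# is compared with the address range register contents, selecting a class_id
# that will index the RMT. Values below are illustrative assumptions.

RANGE_START = 0x60001   # address range registers for the lock cache
RANGE_END = 0x60FFF
MASK = 0xFFFFF          # mask register applied before the comparison

LOCK_CLASS = 0b01       # class_id for lock-cache transactions
NORMAL_CLASS = 0b00     # class_id for all other transactions

def classify(access_addr):
    """Return the class_id that indexes the RMT for this access."""
    addr = access_addr & MASK
    if RANGE_START <= addr <= RANGE_END:
        return LOCK_CLASS
    return NORMAL_CLASS
```

Masking before the compare lets one register pair cover aliased address windows, one plausible reading of why a mask register accompanies the two range registers.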
Access to the lock cache is under software control. When processor 102 or I/O subsystem 114 completes writing data to the locked portion of cache memory 110, it may issue a signal indicating that the data is available. Once notified, the processor 102 or I/O subsystem 114 using the data obtains the data from the locked portion of the cache memory 110 and issues a signal indicating that the data has been read. The space where the data is stored is then available for additional writes. To ensure the validity of data, a region of the lock cache to which data is being written by one device cannot be read or written by another device at the same time.
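The software-controlled handshake just described (write, signal "available", read, signal "read") can be modeled as a two-state protocol per lock-cache region. This is a minimal sketch under that assumption; the class name and state labels are invented.

```python
# Minimal sketch of the software handshake over a lock-cache region:
# the producer signals that data is available, and the consumer's read
# signals that the space may be written again. Names are illustrative.

class LockCacheRegion:
    FREE, FILLED = "free", "filled"

    def __init__(self):
        self.state = LockCacheRegion.FREE
        self.data = None

    def produce(self, data):
        """Writer deposits data and signals that it is available."""
        if self.state != LockCacheRegion.FREE:
            raise RuntimeError("region still holds unread data")
        self.data = data
        self.state = LockCacheRegion.FILLED

    def consume(self):
        """Reader takes the data and signals that it has been read."""
        if self.state != LockCacheRegion.FILLED:
            raise RuntimeError("no data available")
        data, self.data = self.data, None
        self.state = LockCacheRegion.FREE
        return data
```

The errors raised on an out-of-order access correspond to the rule that a region being written by one device cannot be read or written by another at the same time.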
As a result of the FIG. 1 system, both the processor 102 and the I/O subsystem 114 may write newly generated data to the locked cache, rather than to much slower main memory. Moreover, both processor 102 and I/O subsystem 114 may load data from the lock cache, avoiding the delay that would result from loading the data from main memory. Initially, data is written to the locked cache and locked in the cache 110 until it is accessed.
In the locked portion of the cache, the data is marked as valid and modified. When an I/O controller or other device accesses this data, it issues a load-without-intent-to-modify request. The cache snoops this request and, given this data state, the cache controller 106 intervenes to satisfy it. When data within the lock cache address range is loaded by the processor, the cache controller 106 returns the data to the processor as a cache hit.
When data is stored from the system bus 112, the cache controller 106 detects the transaction by using the address range information. If the address is within the range used to access the lock cache, the cache controller 106 updates the locked set of the cache 110 with the new data without changing the cache state. When the processor 102 stores data in this address range, no bus transactions are required because of the "valid and modified" cache state.
FIG. 2 shows a flowchart illustrating the I/O subsystem 114 storing data in the lock cache. In step 202, the I/O subsystem 114 sends a store request to the system bus 112. In step 204, the cache controller 106 checks the address range of the request using a pair of address range registers in the address range register set 116. In one embodiment of the invention, the cache controller may also use a mask register. In step 205, it is determined whether the address of the request is within range. If the address is within the range for the lock cache, then in step 206 the data is written to the lock cache. If the address is not within the range, then in step 208 the data is written to the non-locked portion of the cache or to system memory. In one embodiment of the invention, the data in step 208 is written to system memory. In another embodiment, the data is written to system memory and to the cache memory 110, but not to the portion of the cache used for the lock cache. In yet another embodiment, the data is written to the cache memory 110, but not to the portion used for the lock cache.
FIG. 3 shows a flowchart illustrating address range checking when a processor stores data. In step 302, the processor 102 issues a store request. In step 304, a pair of address range registers in the address range register set 104 checks the address range of the request. In one embodiment of the invention, the address range register set 104 may also contain a mask register. In step 305, it is determined whether the address of the request is within range. If the address is within the range for the lock cache, then in step 306 the data is written to the lock cache. If the address is not within the range, then in step 308 the data is written to the non-locked portion of the cache or to system memory. In three different embodiments of the present invention, the data in step 308 is written, respectively: to system memory; to system memory and the cache 110, but not to the portion used for the lock cache; or to the cache 110, but not to the portion used for the lock cache.
FIG. 4 is a schematic diagram showing the memory arrangement from the perspective of an I/O subsystem. The lock cache appears to be additional system memory with an address range above the main system memory. In FIG. 4, main memory ends at address 0X60000 (hex), while the lock cache includes addresses 0X60001 (hex) through 0X60FFF (hex). The lock cache shown in FIG. 4 has a capacity of 4 KB; the capacity is implementation dependent. Although the main memory and lock cache address spaces are contiguous in FIG. 4, in other embodiments the address spaces need not be contiguous.
FIG. 5 shows a Replacement Management Table (RMT) 500 having four rows of entries, 502, 504, 506, and 508, indexed by the binary numbers 00, 01, 10, and 11, respectively. The entries in each row of the RMT 500 indicate which sets in the cache are available to a transaction. Each column corresponds to a way or set of the cache memory 110. A 1 bit in a column indicates that the corresponding way is available to the transaction, while a 0 bit indicates that it is unavailable. A transaction involving the lock cache is given a class_id that indexes a row in which the bit for the set comprising the lock cache is 1 and the other bits are 0. A transaction not involving the lock cache is given a class_id that indexes a row in which the bit for the lock cache set is 0 and at least one bit for another set is 1. The cache memory corresponding to the RMT in FIG. 5 has eight sets or ways; the first set is used as the lock cache and the remaining sets are used as a regular cache. The RMT has four rows. The index 01, corresponding to the second row 504, is used for transactions that access the lock cache. A "1" in the first column of row 504 indicates that the first set, used for the lock cache, is available to the transaction, and a "0" in each remaining column of row 504 indicates that the other sets in the cache are not. The other rows 502, 506, and 508 indicate that the set used for the lock cache is unavailable but the sets comprising the normal cache are available.
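The RMT of FIG. 5 can be rendered as a small lookup table: four rows indexed by a 2-bit class_id, eight columns for the eight sets. The text fixes row 01 exactly (lock-cache set only); the precise contents of rows 00, 10, and 11 beyond "lock set off, normal sets on" are an assumption here.

```python
# The FIG. 5 RMT as a table: 1 = the transaction may use that cache set.
# Row 0b01 is the lock-cache row; the other rows are illustrative beyond
# "lock-cache set unavailable, normal sets available".

RMT = {
    0b00: [0, 1, 1, 1, 1, 1, 1, 1],
    0b01: [1, 0, 0, 0, 0, 0, 0, 0],   # lock-cache transactions: first set only
    0b10: [0, 1, 1, 1, 1, 1, 1, 1],
    0b11: [0, 1, 1, 1, 1, 1, 1, 1],
}

def available_sets(class_id):
    """List the sets (ways) of the cache that this transaction may use."""
    return [i for i, bit in enumerate(RMT[class_id]) if bit == 1]
```

Because row 01 and the other rows are disjoint in the first column, replacement for normal traffic can never evict a lock-cache line, which is the property the disjoint-sets division relies on.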
In other embodiments, there may be multiple sets for the lock cache. In those embodiments, software selects the set in which particular data is stored. The software may begin writing to the first set of the lock cache; when that set is full, the software may begin writing to the second set.
FIG. 6 illustrates the partitioning of the lock cache into four partitions or segments. Processor 102 may store data in the first two segments while I/O subsystem 114 may store data in the remaining two segments. Thus, while the processor is waiting for the I/O subsystem 114 to access data that has been written to the first segment 602, the data may be written to the second segment 604. Similarly, while I/O subsystem 114 is waiting for processor 102 to access data that has been written to fourth segment 608, data may be written to third segment 606. Thus, both the processor 102 and the I/O subsystem 114 may avoid delays waiting for data to be accessed before storing other data.
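The four-segment partitioning of FIG. 6 amounts to a pair of two-slot ping-pong buffers, one owned by the processor and one by the I/O subsystem. The sketch below assumes that reading, as in the original: the segment numbering, owner labels, and class are illustrative.

```python
# Sketch of FIG. 6: four lock-cache segments, two owned by the processor
# and two by the I/O subsystem, so each writer can fill its second segment
# while its first awaits the consumer. Names are illustrative assumptions.

SEGMENT_OWNER = {0: "processor", 1: "processor", 2: "io", 3: "io"}

class PartitionedLockCache:
    def __init__(self):
        self.segments = {seg: None for seg in SEGMENT_OWNER}  # None = free

    def write(self, writer, data):
        """Deposit data into the writer's first free segment."""
        for seg, owner in SEGMENT_OWNER.items():
            if owner == writer and self.segments[seg] is None:
                self.segments[seg] = data
                return seg
        raise RuntimeError(f"both segments owned by {writer!r} are full")

    def read(self, seg):
        """Consume a segment, freeing it for the next write."""
        data = self.segments[seg]
        self.segments[seg] = None
        return data
```

With two segments per writer, a producer only stalls when the consumer is two transfers behind, which is the delay-avoidance property the paragraph above describes.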
Having thus described the present invention by reference to the preferred embodiments thereof, it is noted that the embodiments disclosed are illustrative rather than limiting in nature and that a wide range of variations, modifications, changes, and substitutions are possible in the foregoing disclosure and, in some instances, some features of the present invention may be employed without a corresponding use of the other features. Many variations and modifications will become apparent to those skilled in the art upon review of the foregoing description of the preferred embodiment. Accordingly, it is appropriate that the appended claims be construed broadly and in a manner consistent with the scope of the invention.

Claims (13)

1. A computer system for transferring data directly to a lock cache and retaining the data therein until the data is accessed for use, the system comprising:
a processor;
a cache memory coupled to the processor, wherein the cache memory is divided into a locked cache memory and an unlocked cache memory;
a cache controller coupled to the processor and the cache memory;
a system bus coupled to the cache memory;
an I/O subsystem coupled to the system bus;
one or two address range registers storing a particular address range for accessing the lock cache, such that the lock cache appears to be additional system memory;
wherein the I/O subsystem is configured to issue a store instruction to transfer data to the lock cache without also transferring data to system memory;
wherein the cache controller is configured to determine whether the store instruction is within a particular address range for the lock cache and, if the store instruction is within the particular address range, store data from the I/O subsystem to the lock cache; and
wherein the processor is configured to issue a signal to indicate that data stored in the lock cache from the I/O subsystem may be overwritten in response to loading the data stored in the lock cache from the I/O subsystem for use.
2. The computer system of claim 1,
wherein the processor is configured to issue a request to write data to the locked cache, without also writing data to the system memory,
wherein the cache controller is configured to determine whether the request is within a particular address range for the lock cache and, if the request is within the particular address range, store data from the processor to the lock cache; and
wherein the I/O subsystem is configured to issue a signal to indicate that the data stored in the lock cache from the processor may be overwritten in response to loading the data stored in the lock cache from the processor for use.
3. The computer system of claim 1, wherein the cache memory includes a plurality of sets or ways, and wherein the lock cache memory includes one or more sets or ways, but not all sets or ways, in the cache memory.
4. The computer system of claim 3, further comprising a Replacement Management Table (RMT), wherein an entry in the RMT indicates which set or sets in the cache are available for the transaction.
5. A method for transferring data directly to a lock cache and retaining the data therein until the data is accessed for use, the method comprising:
dividing a cache into a locked cache and a non-locked cache;
receiving, by a cache controller, a store instruction from an I/O subsystem;
determining, by the cache controller, whether the store instruction is within a particular address range for accessing the locked cache, wherein one or two address range registers store the particular address range, such that the locked cache appears to be additional system memory;
if the store instruction is within the particular address range, transferring data from the I/O subsystem to the lock cache; and
in response to loading the data stored in the lock cache from the I/O subsystem for use, a signal is issued to indicate that the data stored in the lock cache from the I/O subsystem may be overwritten.
6. The method of claim 5, further comprising the step of issuing a signal when data is transferred from the I/O subsystem to the lock cache.
7. The method of claim 5, further comprising the step of:
receiving, by a cache controller, a store request from a processor;
determining, by the cache controller, whether the store request is within a particular address range for accessing the locked cache, and if the store request is within the particular address range, transferring data from the processor to the locked cache; and
in response to loading the data stored in the lock cache from the processor for use, a signal is issued to indicate that the data stored in the lock cache from the processor may be overwritten.
8. The method of claim 7, further comprising the step of:
partitioning the lock cache;
allocating one or more, but not all, partitions of the lock cache to the I/O subsystem; and
allocating one or more partitions not allocated to the I/O subsystem to a processor of the computer system;
wherein data transferred from the I/O subsystem to the lock cache is written to only the partition of the lock cache allocated to the I/O subsystem and data written from the processor to the lock cache is written to only the partition of the lock cache allocated to the processor.
9. The method of claim 8, wherein the plurality of partitions are allocated to the I/O subsystem, and the data is transferred from the I/O subsystem to the second partition after the transfer of the data from the I/O subsystem to the first partition is completed.
10. The method of claim 5, wherein the cache memory comprises a plurality of sets or ways, and wherein the lock cache memory comprises one or more sets or ways, but not all sets or ways, in the cache memory.
11. The method of claim 10, wherein the RMT is used to indicate which set or sets in the cache are available for transactions.
12. The method of claim 5, wherein data in the lock cache is marked as "valid and modified".
13. The method of claim 12, wherein loading data for use comprises:
issuing a load-without-intent-to-modify request;
snooping the request; and
transferring the data from the cache.
HK06112000.7A 2004-10-28 2006-11-01 System and method for direct deposit using locking cache HK1090451B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US10/976,263 US7290107B2 (en) 2004-10-28 2004-10-28 Direct deposit using locking cache
US10/976,263 2004-10-28

Publications (2)

Publication Number Publication Date
HK1090451A1 HK1090451A1 (en) 2006-12-22
HK1090451B true HK1090451B (en) 2009-03-06

