WO2000026741A2 - A method for delivering data to an instruction processing unit - Google Patents
A method for delivering data to an instruction processing unit
- Publication number
- WO2000026741A2 (PCT/SE1999/001962)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- read request
- data
- read
- request
- memory
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3824—Operand accessing
Definitions
- This disclosure relates to the field of computer systems, and more particularly to a method of delivering data to an instruction processing unit in such computer systems.
- a data storage controller controls access to data storage units (DSUs) that comprise system memory with addressable memory locations, which are generally within a contiguous range of predefined logical addresses.
- DSC data storage controller
- the DSC processes read and write requests generated by an instruction processing unit (IPU) executing a program that requests data to be read from or written into a particular memory location.
- IPU instruction processing unit
- the DSC initiates corresponding read or write cycles over a memory bus, for accessing the addressed memory locations.
- the rate by which data is transferred, i.e., the data throughput, during each memory cycle is dependent on the bus speed as well as the width of the system's data bus and the length of a memory location, which is defined in terms of data bits, for example, 8-bit, 16-bit, or 32-bit, etc.
- Various techniques have been devised to increase the data throughput by minimizing the time required to access the system memory. For example, under a scheme known as interleaved memory access, each DSU is addressable over a corresponding internal bus having an assigned range of physical addresses. In this way, a memory access over one internal bus can start ahead of completion of a prior access over another internal bus, provided that the memory bus bandwidth supports the execution of parallel memory accesses over the internal buses.
- a separate memory management unit maps the logical memory addresses into the physical addresses.
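The interleaved mapping described above can be sketched as follows. The bank count and the low-order modulo mapping are illustrative assumptions, not details taken from this disclosure:

```python
# Illustrative sketch of interleaved memory addressing: consecutive logical
# addresses map to alternating banks (DSUs), so back-to-back accesses can
# overlap on separate internal buses. NUM_BANKS and the modulo mapping are
# assumptions made for this example.

NUM_BANKS = 4

def bank_for_address(addr: int) -> int:
    """Low-order interleaving: the bank index is the address modulo the bank count."""
    return addr % NUM_BANKS

def banks_touched(addresses):
    """Return the distinct banks a sequence of accesses would use."""
    return {bank_for_address(a) for a in addresses}

# Four sequential addresses land in four different banks,
# so all four accesses can be in flight at once.
parallel = banks_touched([0, 1, 2, 3])
```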
- two back-to-back read requests from a two-word (32 bits) memory location may request data contained in a first nibble and a successive second nibble.
- Memory caching makes use of a principle known as locality based upon which a program executed by the IPU is written to use a certain portion of the system memory more frequently, either on a temporal or spatial basis.
- Temporal locality refers to the situation in which a recently accessed memory location has a high probability of soon being accessed again.
- Spatial locality refers to the situation in which memory locations adjacent to a recently accessed location have a high probability of soon being accessed.
- a cache memory is made of a significantly faster memory device, such as Static Random Access Memory (SRAM), that is positioned in close physical proximity to the IPU. By caching the frequently accessed data in the faster cache memory, the overall access time of the system memory is significantly reduced.
- SRAM Static Random Access Memory
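A minimal sketch of the caching idea described above, assuming a toy fully-associative lookup; the class name and sizes are illustrative, not drawn from the disclosure:

```python
# Toy cache illustrating the locality principle: the first access to a
# location misses and is fetched from backing memory; a repeat access
# (temporal locality) hits and is served at cache speed.

class SimpleCache:
    def __init__(self):
        self.lines = {}   # address -> data

    def read(self, addr, backing):
        if addr in self.lines:
            return self.lines[addr], True    # hit: served from fast SRAM
        self.lines[addr] = backing[addr]     # miss: fetch from the DSU
        return self.lines[addr], False

memory = {0x10: "var_a"}
cache = SimpleCache()
_, hit1 = cache.read(0x10, memory)   # first access misses
_, hit2 = cache.read(0x10, memory)   # repeat access hits
```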
- Some computers use separate queues for queuing read and write requests pending execution.
- a dependency on a pending write request is resolved by forcing the execution of the queued write requests up to and including the pending write request.
- a write dependency in conventional systems could delay the servicing of a subsequently queued non-dependent read request to another physical location.
- As illustrated in Fig. 1, when a first read request (READ #1) requests a first sub-variable in a first memory location at a time t1, the entire length of the memory location is cached at time t2, which is within a predefined number of clock cycles from t1.
- the content of the cache memory is checked to determine whether the data variable requested by the first read request has been cached. If so, the requested second sub-variable is returned during the next clock cycle, since, as explained above, a previously cached data variable at the same memory location caches the second sub-variable as well.
- the second read request must wait until the requested data is delivered to the cache memory.
- the second read request prevents the execution of a subsequent third read request (READ #3) until time t3, even though the third read request may be directed to a separate physical address, thereby creating a long queue of non-dependent read requests. Therefore, there exists a need for a faster method of delivering data to the IPU.
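The prior-art blocking behavior above can be modeled as a strictly in-order queue; the miss penalty of 10 cycles is an arbitrary illustrative figure:

```python
# Sketch of the prior-art behavior: a single in-order read queue stalls
# every later request behind a miss, even requests to unrelated addresses.
# Timings, names, and addresses are illustrative assumptions.

def serve_in_order(requests, cached, miss_penalty=10):
    """Return completion times when a miss blocks all later requests."""
    t, done = 0, {}
    for name, addr in requests:
        t += 1 if addr in cached else miss_penalty
        done[name] = t
    return done

# READ #3 targets a separate, already-cached address but still completes
# only after READ #2's miss penalty has elapsed.
times = serve_in_order(
    [("READ1", 0xA0), ("READ2", 0xB0), ("READ3", 0xC0)],
    cached={0xA0, 0xC0})
```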
- one or more data storage units store data variables at a plurality of addressable memory locations.
- a cache memory caches data variables when they are accessed from the DSUs.
- a read access queue queues successive read requests for data variables.
- a data storage controller determines whether there is a cache hit for data requested by a read request, with the cache hit, for example, being based on a read request to a different variable at the same memory location. Upon a cache hit, if the requested data variable is not yet cached in the cache memory, the DSC temporarily stores the read request in a register to allow a subsequent read request to be executed.
- the cache hit condition is indicated when accessing successive variables in the same memory location.
- the computer system may also be a read priority system having a separate write access queue for queuing write requests modifying data variables stored in the DSU. Under this arrangement, a dependent read request on a pending write request in the write access queue would cause the execution of the pending write request.
- the dependent read request is temporarily queued in the register, while the pending write request is executed.
- the DSUs are accessed on a memory interleaved basis.
- data stored in the DSUs are cached in a cache memory.
- successive read requests for the data variables are queued in a read access queue.
- a determination is made as to whether there is a cache hit condition for a data variable requested by a read request.
- the read request is temporarily stored in a register, and a subsequent read request in the read access queue is executed, while the read request is temporarily stored.
- a method for delivering data variables stored in one or more DSUs queues successive read requests for data variables and caches the data variables requested by the read requests in a cache memory. Then, a first read request is executed. A determination is made as to whether a cache hit condition exists for a subsequent second read request as a result of executing the first read request. If so, the second read request is temporarily stored in a register, if the data variable requested by the first read request is not cached. While the second read request is temporarily stored in the register, a third subsequent read request is executed.
- the step of executing the first read request generates a read cycle and indicates a "no data" condition in the cache, while the read cycle has not completed.
- the second read request is temporarily stored in the register upon determination of the cache hit condition and "no data" condition.
- the first read and second read requests may request data for different variables within the same memory location.
- FIG. 1 is a timing diagram for delivering data according to the prior art.
- FIG. 2 is a block diagram of a computer system that advantageously incorporates the present invention.
- FIG. 3 is a timing diagram for delivering data according to the present invention.
- FIG. 4 is a block diagram of a DSC that is incorporated in the computer system of FIG. 2.
- the computer system 10 is a telephony computer system providing switching control for a public switched telephone network (PSTN) 12.
- PSTN public switched telephone network
- the system 10 operates under the control of an Instruction Processor Unit (IPU) 14 that exchanges data stored in a plurality of interleaved Data Storage Units (DSU) 16 by executing a program that generates memory access requests, including read requests and write requests.
- IPU Instruction Processor Unit
- DSU interleaved Data Storage Units
- a read request requests data variables or sub- variables from a specified memory location, and a write request modifies data variables or sub-variables in the same memory location.
- each memory location stores 32 bits of data that are addressable by a 32-bit address.
- the interleaved arrangement of the DSUs 16 allows for data access to one DSU to start, while an access to another DSU is continuing.
- An Instruction Queue Controller (IQC) 18 within the IPU is responsible for sequencing the requests and providing them to a Data Storage Controller (DSC) 20.
- the DSC 20 is responsible for generating memory cycles over a memory bus 22.
- the DSC 20 includes a cache memory that uses the locality principle to speed up the delivery of data to the IPU 14.
- a subsequent second read request that produces a cache hit based on the pendency of the first read request, for example, a read request to a different data variable at the same memory location, is temporarily stored to allow a third subsequent read request to be processed, as illustrated by Fig. 3.
- the second read request is temporarily stored in a pending cache hit (PCH) register, to allow the third read request to be executed.
- PCH pending cache hit
- the DSC 20 saves the information regarding which ongoing read request would deliver the requested data and from which cache memory position the data can be retrieved.
- when the PCH register recognizes that data for a pending hit has been cached, it autonomously asserts that the data is ready for delivery, and the second read request can deliver the requested data. Therefore, once the data corresponding to the first read request is cached, the data requested by the second read request, which may be a part of the cached data, is returned to the IPU 14 during a subsequent clock cycle.
- the computer system 10 is designed as a read priority system, where the read requests have priority over the write requests.
- an exception to the read priority arrangement arises when a read request becomes dependent on a pending write request. In this situation, the pending write request is given priority over the pending read requests.
- the system 10 employs well known pipelining and prefetching techniques for executing the IPU instructions. Under the pre-fetching technique, newly arrived instructions are fetched prior to the execution of a previous instruction, thereby increasing execution efficiency. Under the pipeline execution technique, the IPU instructions are subdivided into smaller sub-tasks, with each sub-task being performed by a corresponding register.
- For executing an ADD instruction, for example, the instruction is fetched from an Instruction Storage Unit (not shown), decoded by an instruction decoder (not shown), and processed in an Arithmetic Logic Unit (ALU) (not shown).
- ALU Arithmetic Logic Unit
- corresponding registers separately perform the pre-fetching function, decoding function and ALU function, thereby performing multiple ADD functions substantially simultaneously. Accordingly, the computer system 10 uses separate read and write queues in the DSC 20, to implement these techniques.
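The benefit of overlapping sub-tasks can be sketched with a simple cycle count, assuming a three-stage fetch/decode/execute pipeline with no stalls (the stage count and stall-free assumption are illustrative):

```python
# Cycle-count sketch of pipelined execution: with fetch, decode and
# execute overlapped, n instructions finish in n + stages - 1 cycles
# instead of n * stages. Purely illustrative arithmetic; real pipelines
# also incur stalls and hazards.

def cycles_unpipelined(n: int, stages: int = 3) -> int:
    """Each instruction runs all stages to completion before the next starts."""
    return n * stages

def cycles_pipelined(n: int, stages: int = 3) -> int:
    """After the pipeline fills, one instruction completes per cycle."""
    return n + stages - 1
```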
- the DSC 20 includes a multiple-element Read Access Queue (RAQ) 26 that stores IQC-generated read requests for reading data variables from specified DSU memory locations. Also, the DSC 20 includes a Write Access Queue (WAQ) 38 that queues the generated write accesses.
- the IQC 18 may also flush the RAQ 26, clearing some or all of its elements.
- the IQC 18 has an internal buffer (not shown) of equal length to the RAQ 26, to prevent the RAQ 26 from becoming full.
- the RAQ 26 is an 8-element queue with each element having 46 bits as defined by Table 1 below.
- PV is a position valid flag that indicates whether a RAQ element is valid or not. For example, when a read request is flushed, PV is reset.
- Tag is an access sequence number assigned by the IQC 18 to each read request. Address specifies the memory location from which a data variable or sub-variable is to be read. As mentioned above, in the computer system 10, the read requests have the highest priority and usually 'pass through' the RAQ 26 unimpeded, unless one becomes dependent on an un-executed write request.
- MTAG is a match tag assigned to each write request and is returned to a corresponding RAQ element, when a read request becomes dependent on a pending write request.
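A hypothetical sketch of a RAQ element carrying the fields described above (PV, Tag, Address, MTAG); the actual 46-bit packing of Table 1 is not reproduced, and field widths are omitted:

```python
# Illustrative model of one RAQ element. Field names follow the text;
# the dataclass layout is an assumption, not the patent's bit format.

from dataclasses import dataclass
from typing import Optional

@dataclass
class RaqElement:
    pv: bool                    # position valid flag; reset when the request is flushed
    tag: int                    # access sequence number assigned by the IQC 18
    address: int                # memory location to be read
    mtag: Optional[int] = None  # match tag returned when dependent on a pending write

    @property
    def dependent(self) -> bool:
        """A read is dependent when it carries a pending write's match tag."""
        return self.mtag is not None

independent = RaqElement(pv=True, tag=7, address=0x1F00)
blocked = RaqElement(pv=True, tag=8, address=0x2000, mtag=3)
```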
- a forced write cycle is started to write the data associated with the pending write request into the DSU 16 and the cache memory 24, which in the exemplary embodiment of the present invention, has 16 positions.
- the cache memory 24 is fully associative, using the hit logic 28 for addressing purposes.
- the cache uses a FIFO algorithm for replacing an earliest requested data with a latest requested data. Each cache position is 71 bits wide, as shown in Table 2.
- Each cache position contains an address and a tag for mapping the address with the delivered data from the DSUs 16.
- all cached data are designated as valid, which is indicated by a DV flag.
- the DV flag for a corresponding cache position, where the requested memory location's data is to be delivered, indicates a "no data" condition.
- the DV flag is reset to indicate that the cache memory 24 position does not contain any data, but its corresponding read request is under execution.
- when a subsequent read request from the RAQ 26 gets a hit in a "no data" position, the read request is transferred to one of the positions of a Pending Cache Hit (PCH) register 30, which temporarily stores it until the pending read cycle caches the requested data variable. While the read request is stored in the PCH register 30, a subsequent read request may proceed to retrieve data from the cache memory 24. In this way, the present invention prevents a queued read request in the RAQ 26 from being blocked while a previous request is waiting for completion of a pending read cycle, especially when reading successive data variables within the same memory location.
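The pending-cache-hit mechanism can be modeled roughly as follows. The structure names (DV flag, PCH register) follow the text, while the behavior is a simplification under assumed semantics:

```python
# Simplified model: a cache position with DV (data valid) reset marks an
# in-flight read cycle; a later read that hits such a "no data" position
# is parked in the PCH register instead of blocking the queue, and a
# subsequent unrelated read proceeds. Method names are illustrative.

class Dsc:
    def __init__(self):
        self.cache = {}   # address -> {"dv": bool, "data": ...}
        self.pch = []     # parked (request, address) pairs

    def start_read_cycle(self, addr):
        """First read to a location: reserve a cache position with DV reset."""
        self.cache[addr] = {"dv": False, "data": None}

    def issue_read(self, req, addr):
        pos = self.cache.get(addr)
        if pos is not None and not pos["dv"]:
            self.pch.append((req, addr))   # hit on "no data": park in PCH
            return "parked"
        if pos is not None:
            return pos["data"]             # ordinary cache hit
        self.start_read_cycle(addr)        # miss: start a new read cycle
        return "miss"

    def data_arrives(self, addr, data):
        """Read cycle completes: set DV and release parked requests."""
        self.cache[addr] = {"dv": True, "data": data}
        released = [r for r, a in self.pch if a == addr]
        self.pch = [(r, a) for r, a in self.pch if a != addr]
        return released

dsc = Dsc()
dsc.start_read_cycle(0x40)                  # READ #1 in flight
r2 = dsc.issue_read("READ2", 0x40)          # same location: parked in PCH
r3 = dsc.issue_read("READ3", 0x80)          # different address: not blocked
released = dsc.data_arrives(0x40, "value")  # READ #2 can now deliver
```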
- PCH Pending Cache Hit
- a position in the cache memory 24 is reserved by storing the position to which the modified data of the pending write request is to be returned.
- the DV flag for the position is reset.
- an NR (NO READ) flag is used to indicate whether cached data is dirty or not, that is, whether the data stored in a corresponding cache position is about to be invalidated by a write request. If a subsequent read request is directed at a dirty cache position, a cache hit is not indicated. In this way, read accesses stored in the PCH register make use of the dirty data, while subsequent read requests use clean data.
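A minimal sketch of the NR flag's effect on hit detection, assuming a boolean model of the flags (the function name is illustrative):

```python
# Hit logic sketch: a position marked dirty (NR set) never signals a hit
# to later reads, forcing them to wait for the clean, updated data.

def is_cache_hit(position_valid: bool, nr_flag: bool) -> bool:
    """A valid position hits only when its NR (dirty) flag is clear."""
    return position_valid and not nr_flag

clean_hit = is_cache_hit(position_valid=True, nr_flag=False)   # normal hit
dirty_hit = is_cache_hit(position_valid=True, nr_flag=True)    # suppressed
```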
- the PCH register 30 is an 8 element register, with each element having a bit content as shown below in Table 3.
- a 17-element Data Store Queue (DSQ) 42 keeps track of executing memory accesses and memory refresh functions. Also, the DSQ detects collisions by concurrently matching addresses from the RAQ 26, the WAQ 38, and the refresh function. Preferably, the length of the DSQ is selected to accommodate the longest cycle time in the DSU. Mapping of the cached data and returned data with a read request is handled by "Tag", CAPTAG, and RATAG. CAPTAG is used for mapping a read request with data delivered to the cache from the DSU. CAPTAG is identical to the "Tag" shown in Table 4, as it corresponds to an ongoing read request.
- the CAPTAG is used to address the cache to associate a returned data with RATAG, which is provided from the DSQ 42.
- RATAG is returned along with the returned data to the IQC to associate a read request with the delivered data.
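A hypothetical sketch of the CAPTAG/RATAG bookkeeping described above; the dictionary stands in for the DSQ's internal state, and all method names are assumptions:

```python
# Tag plumbing sketch: CAPTAG maps data delivered from a DSU to the cache
# position of its ongoing read request, and RATAG is returned with the
# data so the IQC can match the delivery to the original request.

class TagMapper:
    def __init__(self):
        self.by_captag = {}   # CAPTAG -> (cache position, RATAG)

    def start_read(self, captag, cache_pos, ratag):
        """Record where an ongoing read's data should land and its IQC tag."""
        self.by_captag[captag] = (cache_pos, ratag)

    def deliver(self, captag, data):
        """DSU returns data: place it by CAPTAG, hand RATAG back to the IQC."""
        cache_pos, ratag = self.by_captag.pop(captag)
        return cache_pos, ratag, data

m = TagMapper()
m.start_read(captag=5, cache_pos=2, ratag=9)
pos, ratag, data = m.deliver(5, "word")
```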
- the present invention reduces both the number of main memory accesses needed and the number of stalled cycles for subsequent read accesses.
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
Description
Claims
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| AU14326/00A AU1432600A (en) | 1998-10-30 | 1999-11-01 | A method for delivering data to an instruction processing unit |
| EP99971528A EP1145094A3 (en) | 1998-10-30 | 1999-11-01 | A method for delivering data to an instruction processing unit |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18286598A | 1998-10-30 | 1998-10-30 | |
| US09/182,865 | 1998-10-30 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2000026741A2 true WO2000026741A2 (en) | 2000-05-11 |
| WO2000026741A3 WO2000026741A3 (en) | 2002-01-10 |
Family
ID=22670389
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/SE1999/001962 Ceased WO2000026741A2 (en) | 1998-10-30 | 1999-11-01 | A method for delivering data to an instruction processing unit |
Country Status (3)
| Country | Link |
|---|---|
| EP (1) | EP1145094A3 (en) |
| AU (1) | AU1432600A (en) |
| WO (1) | WO2000026741A2 (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8255669B2 (en) | 2008-01-30 | 2012-08-28 | International Business Machines Corporation | Method and apparatus for thread priority control in a multi-threaded processor based upon branch issue information including branch confidence information |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5787465A (en) * | 1994-07-01 | 1998-07-28 | Digital Equipment Corporation | Destination indexed miss status holding registers |
| US5758178A (en) * | 1996-03-01 | 1998-05-26 | Hewlett-Packard Company | Miss tracking system and method |
- 1999
- 1999-11-01 WO PCT/SE1999/001962 patent/WO2000026741A2/en not_active Ceased
- 1999-11-01 EP EP99971528A patent/EP1145094A3/en not_active Withdrawn
- 1999-11-01 AU AU14326/00A patent/AU1432600A/en not_active Abandoned
Also Published As
| Publication number | Publication date |
|---|---|
| EP1145094A2 (en) | 2001-10-17 |
| EP1145094A3 (en) | 2002-03-27 |
| AU1432600A (en) | 2000-05-22 |
| WO2000026741A3 (en) | 2002-01-10 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US6185660B1 (en) | Pending access queue for providing data to a target register during an intermediate pipeline phase after a computer cache miss | |
| EP0795820B1 (en) | Combined prefetch buffer and instructions cache memory system and method for providing instructions to a central processing unit utilizing said system. | |
| US5353426A (en) | Cache miss buffer adapted to satisfy read requests to portions of a cache fill in progress without waiting for the cache fill to complete | |
| CA1325285C (en) | Method and apparatus for ordering and queueing multiple memory requests | |
| JP4486750B2 (en) | Shared cache structure for temporary and non-temporary instructions | |
| JP3549079B2 (en) | Instruction prefetch method of cache control | |
| EP0457403A2 (en) | Multilevel instruction cache, method for using said cache, method for compiling instructions for said cache and micro computer system using such a cache | |
| EP0097790A2 (en) | Apparatus for controlling storage access in a multilevel storage system | |
| JP2010532057A (en) | Cache and method for multi-threaded and multi-core systems | |
| WO2004049171A2 (en) | Microprocessor including cache memory supporting multiple accesses per cycle | |
| US6715035B1 (en) | Cache for processing data in a memory controller and a method of use thereof to reduce first transfer latency | |
| KR100234647B1 (en) | Data processing system with instruction prefetch | |
| JP3763579B2 (en) | Apparatus and method for reducing read miss latency by predicting instruction read first | |
| US7461205B2 (en) | Performing useful computations while waiting for a line in a system with a software implemented cache | |
| US20180095893A1 (en) | Queuing memory access requests | |
| US5938761A (en) | Method and apparatus for branch target prediction | |
| US20070180158A1 (en) | Method for command list ordering after multiple cache misses | |
| US5860150A (en) | Instruction pre-fetching of a cache line within a processor | |
| US20070180156A1 (en) | Method for completing IO commands after an IO translation miss | |
| JP2003519836A (en) | A method for managing a set associative cache using parallel and serial reads initiated while a processor is in a wait state | |
| US7111127B2 (en) | System for supporting unlimited consecutive data stores into a cache memory | |
| US20020188805A1 (en) | Mechanism for implementing cache line fills | |
| US20040024976A1 (en) | System and method for reducing access latency to shared program memory | |
| US20090063773A1 (en) | Technique to enable store forwarding during long latency instruction execution | |
| EP1145094A2 (en) | A method for delivering data to an instruction processing unit |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| ENP | Entry into the national phase |
Ref country code: AU Ref document number: 2000 14326 Kind code of ref document: A Format of ref document f/p: F |
|
| AK | Designated states |
Kind code of ref document: A2 Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW |
|
| AL | Designated countries for regional patents |
Kind code of ref document: A2 Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
| DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
| WWE | Wipo information: entry into national phase |
Ref document number: 1999971528 Country of ref document: EP |
|
| REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |
|
| WWP | Wipo information: published in national office |
Ref document number: 1999971528 Country of ref document: EP |
|
| AK | Designated states |
Kind code of ref document: A3 Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW |
|
| AL | Designated countries for regional patents |
Kind code of ref document: A3 Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |