
WO2000026741A2 - A method for delivering data to an instruction processing unit - Google Patents


Info

Publication number
WO2000026741A2
WO2000026741A2 PCT/SE1999/001962 SE9901962W
Authority
WO
WIPO (PCT)
Prior art keywords
read request
data
read
request
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/SE1999/001962
Other languages
French (fr)
Other versions
WO2000026741A3 (en)
Inventor
Matiss Jonas Zervens
Orvar Per Dahl
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Telefonaktiebolaget LM Ericsson AB
Original Assignee
Telefonaktiebolaget LM Ericsson AB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Telefonaktiebolaget LM Ericsson AB filed Critical Telefonaktiebolaget LM Ericsson AB
Priority to AU14326/00A priority Critical patent/AU1432600A/en
Priority to EP99971528A priority patent/EP1145094A3/en
Publication of WO2000026741A2 publication Critical patent/WO2000026741A2/en
Anticipated expiration legal-status Critical
Publication of WO2000026741A3 publication Critical patent/WO2000026741A3/en
Ceased legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3824Operand accessing

Definitions

  • Each cache position contains an address and a tag for mapping the address to the data delivered from the DSUs 16.
  • All cached data are designated as valid, as indicated by a data valid (DV) flag. While a read cycle is pending, the DV flag for the cache position to which the requested memory location data is to be delivered indicates a "no data" condition: the flag is reset to show that the position does not yet contain any data, but that its corresponding read request is under execution.
  • When a subsequent read request from the RAQ 26 gets a hit in a "no data" position, that read request is transferred to one of the positions of a Pending Cache Hit (PCH) register 30, which stores it temporarily until the pending read cycle caches the requested data variable. While the read request is held in the PCH register 30, a subsequent read request may proceed to retrieve data from the cache memory 24. In this way, a queued read request in the RAQ 26 is not blocked while a previous request waits for completion of a pending read cycle, especially when reading successive data variables within the same memory location.
  • When a read request becomes dependent on a pending write request, a position in the cache memory 24 is reserved by storing the position to which the modified data of the pending write request is to be returned, and the DV flag for that position is reset.
  • The NR (NO READ) flag indicates whether cached data is dirty, that is, whether the data stored in a corresponding cache position is about to be invalidated by a write request. If a subsequent read request is directed at a dirty cache position, a cache hit is not indicated. In this way, read accesses stored in the PCH register make use of the dirty data, while subsequent read requests use clean data.
  • The PCH register 30 is an 8-element register, with each element having the bit content shown in Table 3.
  • A 17-element Data Store Queue (DSQ) 42 keeps track of executing memory accesses and memory refresh functions. The DSQ also detects collisions by concurrently matching addresses from the RAQ 26, the WAQ 38, and the refresh function. Preferably, the length of the DSQ is selected to accommodate the longest cycle time in the DSU. Mapping of cached and returned data to a read request is handled by "Tag", CAPTAG and RATAG. CAPTAG maps a read request to the data delivered to the cache from the DSU; it is identical to the "Tag" shown in Table 4, as it corresponds to an ongoing read request. The CAPTAG is used to address the cache so as to associate returned data with RATAG, which is provided from the DSQ 42. RATAG is returned along with the data to the IQC to associate the read request with the delivered data.
  • The present invention thus reduces both the number of main memory accesses needed and the number of stalled cycles for subsequent read accesses.
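The NR mechanism above can be sketched as follows (an illustrative model with assumed names and values; only the hit-suppression behaviour is shown, not the actual hit logic):

```python
# Illustrative model (assumed structure) of the NR (NO READ) flag: a dirty
# position stops producing hits for new reads, while a read already parked
# in the PCH register still consumes the old data.
class CachePosition:
    def __init__(self, addr: int, data: int):
        self.addr, self.data = addr, data
        self.nr = False          # NO READ: data about to be invalidated

def hit(pos: CachePosition, addr: int) -> bool:
    """Hit logic for newly arriving reads: dirty positions never hit."""
    return pos.addr == addr and not pos.nr

pos = CachePosition(0x100, 0xABCD)
pch_data = pos.data          # a parked PCH read keeps using this position
pos.nr = True                # a queued write is about to invalidate it
assert not hit(pos, 0x100)   # new reads miss and will fetch clean data
assert pch_data == 0xABCD    # the parked read still sees the dirty data
```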

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A computer system temporarily stores a read request that results in a cache hit based on a previous read request that has not yet cached the requested data in a cache memory. While it is temporarily stored, a subsequent read request may be executed without being stalled by the pendency of an ongoing memory cycle. After the requested data from the previous read request is cached, the stored read request returns its corresponding requested data.

Description

A METHOD FOR DELIVERING DATA TO AN INSTRUCTION PROCESSING UNIT
TECHNICAL FIELD:
This disclosure relates to the field of computer systems and more particularly to a method of delivering data to an instruction processing unit in such computer systems.
BACKGROUND:
Under a typical computer system architecture, during read and write cycles, a data storage controller (DSC) controls access to data storage units (DSUs) that comprise system memory with addressable memory locations, generally within a continuous range of predefined logical addresses. For accessing the system memory, the DSC processes read and write requests generated by an instruction processing unit (IPU) executing a program that requests data to be read from or written into a particular memory location. Upon receipt of the requests, the DSC initiates corresponding read or write cycles over a memory bus for accessing the addressed memory locations. The rate at which data is transferred, i.e., the data throughput, during each memory cycle depends on the bus speed as well as the width of the system's data bus and the length of a memory location, which is defined in terms of data bits, for example, 8, 16, or 32 bits.
Each memory cycle, read or write, expends a certain number of clock cycles. Because the performance of a computer system is highly dependent on the data throughput, it is necessary to maximize the data transfer rate over the memory bus, ideally making it reach the full system clock speed. Various techniques have been devised to increase the data throughput by minimizing the time required to access the system memory. For example, under a scheme known as interleaved memory access, each DSU is addressable over a corresponding internal bus having an assigned range of physical addresses. In this way, a memory access over one internal bus can start ahead of completion of a prior access over another internal bus, provided that the memory bus bandwidth supports the execution of parallel memory accesses over the internal busses. Usually, under the interleaved memory access arrangement, a separate memory management unit (MMU) maps the logical memory addresses into the physical addresses.
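As an illustration of interleaved access (a sketch with an assumed bank count and cycle time, not taken from the patent), low-order address interleaving lets accesses to different DSUs overlap while accesses to the same DSU serialize:

```python
# Illustrative sketch (not from the patent): logical addresses are striped
# across interleaved DSU banks so that back-to-back accesses to different
# banks can overlap, while accesses to the same bank serialize.
NUM_BANKS = 4          # hypothetical number of interleaved DSUs
BANK_BUSY_CYCLES = 3   # hypothetical DSU cycle time

def bank_of(addr: int) -> int:
    """Low-order interleaving: consecutive addresses map to consecutive banks."""
    return addr % NUM_BANKS

def schedule(addresses):
    """Return the start cycle of each access; a bank accepts a new access
    only after its previous one completes."""
    bank_free_at = [0] * NUM_BANKS
    starts = []
    for t, addr in enumerate(addresses):
        b = bank_of(addr)
        start = max(t, bank_free_at[b])       # wait only if the bank is busy
        bank_free_at[b] = start + BANK_BUSY_CYCLES
        starts.append(start)
    return starts

print(schedule([0, 1, 2, 3]))  # distinct banks: starts [0, 1, 2, 3]
print(schedule([0, 4]))        # same bank: starts [0, 3]
```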
Although the interleaved memory access improves data throughput, sometimes the execution of the program requires reading data from different portions of the same memory location. For example, two back-to-back read requests from a two-word (32 bits) memory location may request data contained in a first nibble and a successive second nibble.
Memory caching makes use of a principle known as locality, based upon which a program executed by the IPU is written to use a certain portion of the system memory more frequently, on either a temporal or a spatial basis. Temporal locality means that a recently accessed memory location has a high probability of soon being accessed again. Spatial locality means that locations adjacent to a recently accessed location have a high probability of soon being accessed. Although substantially smaller in size than the system memory, a cache memory is made of a significantly faster memory device, such as Static Random Access Memory (SRAM), that is positioned in close physical proximity to the IPU. By caching the frequently accessed data in the faster cache memory, the overall access time of the system memory is significantly reduced.
In a computer system that uses data variables shorter than the full length of a memory location, the spatial and temporal locality become effectively the same, since data variables are generally cached in full length (for example, 32 bits). This is true even if the read requests are for data variables that are less than the full length, for example, sub-variables comprised of two bits. In this case, a read access to a two-bit sub-variable results in caching the entire data at the specified memory location. As discussed above, under the principle of locality, there is a high probability that a subsequent read request would read an adjacently stored sub-variable of the same size. By executing a single read cycle that caches the entire length of a memory location, multiple read requests for sub-variables at the same memory location may be serviced directly from the cache memory, without initiating additional read cycles over the memory bus.
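The effect described above can be sketched as follows (a minimal model with a hypothetical address and bit layout): the first sub-variable read fetches and caches the entire 32-bit word, so a second sub-variable read at the same location costs no additional bus cycle:

```python
# Sketch (hypothetical address and bit offsets): reading any sub-variable
# caches the full 32-bit word, so later reads of neighbouring sub-variables
# are served from the cache without further bus cycles.
memory = {0x100: 0b1011_0110}   # one 32-bit memory location (low bits shown)
cache = {}                      # address -> full cached word
bus_cycles = 0                  # read cycles issued over the memory bus

def read_sub(addr: int, bit_offset: int, width: int = 2) -> int:
    """Return a `width`-bit sub-variable, caching the whole word on a miss."""
    global bus_cycles
    if addr not in cache:
        bus_cycles += 1              # one read cycle over the memory bus
        cache[addr] = memory[addr]   # the entire word is cached
    return (cache[addr] >> bit_offset) & ((1 << width) - 1)

first = read_sub(0x100, 0)    # miss: fetches and caches the whole word
second = read_sub(0x100, 2)   # hit: no additional bus cycle
assert bus_cycles == 1        # both sub-variables cost a single read cycle
```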
Some computers use separate queues for queuing read and write requests pending execution. In systems that provide for read priority, where read requests have priority over write requests, a situation arises when a read request becomes dependent on a pending write request that is queued in a path separate from the path that holds read requests. For servicing a dependent read, the dependency is resolved by forcing the execution of the queued write requests up to and including the pending write request. A write dependency in conventional systems, however, could delay the servicing of a subsequently queued non-dependent read request to another physical location. As illustrated in Fig. 1, in response to a first read request (READ #1) that requests a first sub-variable in a first memory location at a time t1, the entire length of the memory location is cached at time t2, which is within a predefined number of clock cycles from t1. Before starting a subsequent second read request (READ #2) requesting a second sub-variable at the same memory location, the content of the cache memory is checked to determine whether the data variable requested by the first read request has been cached. If so, the requested second sub-variable is returned during the next clock cycle, since, as explained above, caching a data variable from the same memory location caches the second sub-variable as well. If, however, the first read request is still pending, the second read request must wait until the requested data is delivered to the cache memory. As a result, the second read request prevents the execution of a subsequent third read request (READ #3) until time t3, even though the third read request may be directed to a separate physical address, thereby creating a long queue of non-dependent read requests. Therefore, there exists a need for a faster method of delivering data to the IPU.
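The head-of-line blocking of Fig. 1 can be modeled as below (a sketch; the miss latency and the addresses are assumed for illustration): READ #2 hits the location READ #1 is still fetching and stalls, so READ #3 to an unrelated address is delayed behind it:

```python
# Sketch of the prior-art timeline of Fig. 1 (the four-cycle miss latency
# and the addresses are assumed for illustration).
MISS_LATENCY = 4   # cycles until a read cycle delivers data to the cache

def inorder_start_times(requests):
    """In-order queue: a request waiting for in-flight data blocks every
    request queued behind it."""
    cache, pending = set(), {}   # pending: addr -> cycle when data arrives
    t, starts = 0, []
    for addr in requests:
        if addr in pending:               # hit on data still in flight: stall
            t = max(t, pending.pop(addr))
        starts.append(t)
        if addr not in cache:             # miss: start a read cycle
            pending[addr] = t + MISS_LATENCY
            cache.add(addr)
        t += 1
    return starts

# READ #2 hits the location READ #1 is still fetching, so READ #3 to an
# unrelated address is delayed behind it:
print(inorder_start_times([0x100, 0x100, 0x200]))  # [0, 4, 5]
```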
Summary of the Invention
Briefly, according to one aspect of the present invention embodied in a computer system, one or more data storage units (DSU) store data variables at a plurality of addressable memory locations. A cache memory caches data variables when they are accessed from the DSUs. A read access queue queues successive read requests for data variables. A data storage controller (DSC) determines whether there is a cache hit for a data requested by a read request, with the cache hit, for example, being based on a read request to a different variable at the same memory location. Upon a cache hit, if the requested data variable is not cached in the cache memory, the DSC temporarily stores the read request in a register to allow a subsequent read request to be executed.
According to some of the more detailed features of the present invention, the cache hit condition is indicated when accessing successive variables in the same memory location. The computer system may also be a read priority system having a separate write access queue for queuing write requests modifying data variables stored in the DSU. Under this arrangement, a dependent read request on a pending write request in the write access queue would cause the execution of the pending write request. According to one of the features of the present invention, the dependent read request is temporarily queued in the register, while the pending write request is executed. According to another feature of the invention, the DSUs are accessed on a memory interleaved basis. According to another aspect of the present invention embodied in a method for delivering data variables stored in one or more DSUs, data stored in the DSUs are cached in a cache memory. According to this method, successive read requests for the data variables are queued in a read access queue. A determination is made as to whether there is a cache hit condition for a data variable requested by a read request. Upon a cache hit condition, if the requested data variable is not cached, the read request is temporarily stored in a register, and a subsequent read request in the read access queue is executed, while the read request is temporarily stored. According to yet another aspect of the present invention, a method for delivering data variable stored in one or more DSUs queues successive read requests for data variables and caches the data variables requested by the read requests in a cache memory. Then, a first read request is executed. A determination is made as to whether a cache hit condition exists for a subsequent second read request as a result of executing the first read request. 
If so, the second read request is temporarily stored in a register, if the data variable requested by the first read request is not yet cached. While the second read request is temporarily stored in the register, a third, subsequent read request is executed. In an exemplary embodiment, the step of executing the first read request generates a read cycle and indicates a "no data" condition in the cache while the read cycle has not completed. According to this embodiment, the second read request is temporarily stored in the register upon determination of the cache hit condition and the "no data" condition. Under this aspect, the first and second read requests may request data for different variables within the same memory location.
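A sketch of this aspect (with assumed cycle counts; only the parking behaviour of the register is modeled, not the claimed hardware) shows the third read request proceeding while the second waits for the in-flight data:

```python
# Sketch of the claimed behaviour (cycle counts assumed; only the parking
# role of the PCH register is modeled): a read that hits a "no data"
# position is parked instead of blocking the queue behind it.
MISS_LATENCY = 4

def pch_start_times(requests):
    cache, pending = set(), {}   # pending: addr -> arrival cycle of its data
    parked = []                  # PCH register: (addr, arrival cycle)
    t, starts = 0, []
    for addr in requests:
        if addr in pending:                  # cache hit but "no data": park
            parked.append((addr, pending[addr]))
            starts.append(pending[addr])     # completes when the data lands
            continue                         # the queue is NOT stalled
        starts.append(t)
        if addr not in cache:
            pending[addr] = t + MISS_LATENCY
            cache.add(addr)
        t += 1
    return starts

# READ #2 is parked in the PCH register, so READ #3 starts on the very
# next cycle instead of waiting for READ #1's data:
print(pch_start_times([0x100, 0x100, 0x200]))  # [0, 4, 1]
```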
Brief Description of the Drawings
FIG. 1 is a timing diagram for delivering data according to the prior art. FIG. 2 is a block diagram of a computer system that advantageously incorporates the present invention.
FIG. 3 is a timing diagram for delivering data according to the present invention. FIG. 4 is a block diagram of a DSC that is incorporated in the computer system of FIG. 2.
Detailed Description of the Preferred Embodiment
Referring to FIG. 2, a block diagram of a computer system 10 that advantageously incorporates the present invention is shown. In the exemplary embodiment, the computer system 10 is a telephony computer system providing switching control for a public switched telephone network (PSTN) 12. The system 10 operates under the control of an Instruction Processor Unit (IPU) 14 that exchanges data stored in a plurality of interleaved Data Storage Units (DSU) 16 by executing a program that generates memory access requests, including read requests and write requests. A read request requests data variables or sub-variables from a specified memory location, and a write request modifies data variables or sub-variables in the same memory location. In the exemplary embodiment of the invention, each memory location stores 32 bits of data that are addressable by a 32-bit address. As described above, the interleaved arrangement of the DSUs 16 allows a data access to one DSU to start while an access to another DSU is continuing.
An Instruction Queue Controller (IQC) 18 within the IPU is responsible for sequencing the requests and providing them to the Data Storage Controller (DSC) 20. The DSC 20 is responsible for generating memory cycles over a memory bus 22. As described later in detail, the DSC 20 includes a cache memory that uses the locality principle to speed up the delivery of data to the IPU 14. According to one aspect of the present invention, while data requested from a memory location in response to a pending first read request has not yet returned to the cache memory 24, a subsequent second read request that produces a cache hit based on the pendency of the first read request, for example, a read request to a different data variable at the same memory location, is temporarily stored to allow a third, subsequent read request to be processed, as illustrated by Fig. 3. More specifically, the second read request is temporarily stored in a pending cache hit (PCH) register, to allow the third read request to be executed. In this way, the third read request and those subsequent thereto do not have to wait for the execution of the first and second read requests to complete. The DSC 20 saves the information regarding which ongoing read request will deliver the requested data and from which cache memory position the data can be retrieved. When the PCH register recognizes that data for a pending hit has been cached, it autonomously asserts that data is ready for delivery and the second read request can deliver the requested data. Therefore, once the data corresponding to the first read request is cached, the data requested by the second read request, which may be a part of the cached data, is returned to the IPU 14 during a subsequent clock cycle.
In order to provide the required telephony services, the computer system 10 is designed as a read priority system, where the read requests have priority over the write requests. Under the read priority arrangement, a situation arises when a read request becomes dependent on a pending write request. In this situation, the pending write request is given priority over the pending read requests. Preferably, the system 10 employs well-known pipelining and prefetching techniques for executing the IPU instructions. Under the prefetching technique, newly arrived instructions are fetched prior to the execution of a previous instruction, thereby increasing execution efficiency. Under the pipelined execution technique, the IPU instructions are subdivided into smaller sub-tasks, with each sub-task being performed by a corresponding register. For executing an ADD instruction, for example, the ADD instruction is fetched from an Instruction Storage Unit (not shown), decoded by an instruction decoder (not shown), and processed in an Arithmetic Logic Unit (ALU) (not shown). In order to execute multiple ADD instructions in a pipelined manner, corresponding registers separately perform the prefetching, decoding and ALU functions, thereby performing multiple ADD operations substantially simultaneously. Accordingly, the computer system 10 uses separate read and write queues in the DSC 20 to implement these techniques.
Referring to FIG. 4, a block diagram of the DSC 20 of the present invention is shown. For queuing the read requests, the DSC 20 includes a multiple-element Read Access Queue (RAQ) 26 that stores IQC-generated read requests for reading data variables from specified DSU memory locations. Also, the DSC 20 includes a Write Access Queue (WAQ) 38 that queues the generated write accesses. The IQC 18 may also flush the RAQ 26, clearing some or all of its elements. Preferably, the IQC 18 has an internal buffer (not shown) that is of equal length to the RAQ 26, to prevent the RAQ 26 from becoming full. In the exemplary embodiment, the RAQ 26 is an 8-element queue with each element having 46 bits as defined by Table 1 below.
Table 1 Content of RAQ
Figure imgf000010_0001
PV is a position valid flag that indicates whether a RAQ element is valid or not. For example, when a read request is flushed, PV is reset. Tag is an access sequence number assigned by the IQC 18 to each read request. Address specifies the memory location from which a data variable or sub-variable is to be read. As mentioned above, in the computer system 10, the read requests have the highest priority and usually 'pass through' the RAQ 26 unimpeded, unless one becomes dependent on an un-executed write request. MTAG is a match tag assigned to each write request and is returned to a corresponding RAQ element when a read request becomes dependent on a pending write request. Under this condition, which is indicated by the MTAG and the MTAG valid (MV) flag, a forced write cycle is started to write the data associated with the pending write request into the DSU 16 and the cache memory 24, which, in the exemplary embodiment of the present invention, has 16 positions.
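The RAQ element fields just described can be modeled as a simple record. The field widths and the `RAQElement` name are illustrative assumptions; only the field meanings come from the description above.

```python
# Hypothetical model of one RAQ element (Table 1): PV, Tag, Address, MTAG, MV.
from dataclasses import dataclass

@dataclass
class RAQElement:
    pv: bool      # position valid flag; reset when the request is flushed
    tag: int      # access sequence number assigned by the IQC
    address: int  # DSU memory location from which to read
    mtag: int     # match tag of the write request this read depends on
    mv: bool      # MTAG valid: set when the read depends on a pending write

    def flush(self):
        # Flushing a read request simply invalidates the element.
        self.pv = False
```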
For each one of the read requests arriving from the RAQ 26, a determination is made as to whether the requested data is cached or not. If cached, hit logic 28 in the cache memory 24 indicates a hit condition for the requested data. If not, a read cycle is initiated over the memory bus 22 to deliver the entire length of the requested data variable, i.e., 32 bits, to the cache memory 24. In the exemplary embodiment of the invention, the cache memory 24 is fully associative, using the hit logic 28 for addressing purposes. The cache uses a FIFO algorithm for replacing the earliest requested data with the latest requested data. Each cache position is 71 bits wide, as shown in Table 2.
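A fully associative cache with FIFO replacement, as described above, might be sketched as follows. The `FIFOCache` class and its methods are illustrative assumptions; the 16-position size and FIFO policy come from the description.

```python
# Sketch of a 16-position fully associative cache with FIFO replacement.
from collections import OrderedDict

class FIFOCache:
    def __init__(self, positions=16):
        self.positions = positions
        self.store = OrderedDict()   # address -> data, in insertion order

    def hit(self, address):
        # "Hit logic": a fully associative cache compares the request
        # address against every position simultaneously.
        return address in self.store

    def insert(self, address, data):
        if len(self.store) >= self.positions:
            # FIFO replacement: evict the earliest requested data.
            self.store.popitem(last=False)
        self.store[address] = data
```

FIFO replacement is simpler than LRU in hardware since only insertion order, not access order, must be tracked.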
Table 2 Contents of Cache Position
Figure imgf000011_0001
Each cache position contains an address and a tag for mapping the address with the data delivered from the DSUs 16. Under the exemplary arrangement, all cached data are designated as valid, which is indicated by a DV flag. When a write access modifies a data variable in a DSU 16, the cached data is invalidated. Under the present invention, when a "no cache hit" condition starts a read cycle over the memory bus 22, the DV flag for the corresponding cache position, where the requested memory location data is to be delivered, indicates a "no data" condition. Until data from the requested memory location is delivered to the cache position, the DV flag remains reset to indicate that the cache memory 24 position does not contain any data, but that its corresponding read request is under execution. As described above, in the present invention, when a subsequent read request from the RAQ 26 gets a hit in a "no data" position, the read request is transferred to one of the positions of a Pending Cache Hit (PCH) register 30, which temporarily stores the read request until the pending read cycle caches the requested data variable. While the read request is stored in the PCH register 30, a subsequent read request may proceed to retrieve data from the cache memory 24. In this way, the present invention prevents a queued read request in the RAQ 26 from being blocked while a previous request is waiting for completion of a pending read cycle, especially when reading successive data variables within the same memory location.
When a forced write cycle is started based on a dependent read request, a position in the cache memory 24 is reserved by storing the position to which the modified data of the pending write request is to be returned. The DV flag for the position is reset. When the dependent read request reaches the bottom of the RAQ 26, a hit on a "no data" position is detected and the read request is moved to the PCH register 30. As a result, compared to a non-dependent read request, the present invention reduces the number of clock cycles for servicing a dependent read request.
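The forced-write path for a dependent read can be sketched as below. All names (`service_dependent_read`, the dictionary shapes) are illustrative assumptions; the sequence of steps follows the description above.

```python
# Sketch of servicing a dependent read: pop the pending write it depends on,
# reserve a "no data" cache position, park the read in the PCH register, and
# complete the forced write cycle so the parked read can deliver.

def service_dependent_read(cache, pch, write_queue, read):
    # The read carries MTAG identifying the pending write it depends on.
    write = write_queue.pop(read["mtag"])
    # Reserve a cache position for the modified data; DV reset = "no data".
    cache[read["address"]] = {"data": None, "dv": False}
    # The read hits the reserved "no data" position and parks in the PCH.
    pch.append(read)
    # Forced write cycle completes: data goes to the DSU and cache position.
    cache[read["address"]] = {"data": write["data"], "dv": True}
    # Parked reads for this address are now ready to deliver.
    return [r for r in pch if r["address"] == read["address"]]
```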
The NR (No Read) flag is used to indicate whether cached data is dirty or not, that is, whether the data stored in a corresponding cache position is about to be invalidated by a write request. In case a subsequent read request is directed at a dirty cache position, a cache hit is not indicated. In this way, read accesses stored in the PCH register make use of the dirty data, while subsequent read requests use clean data.
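The NR check can be expressed as a small predicate in the hit logic. The function and field names here are illustrative assumptions.

```python
# Sketch of the NR ("No Read") dirty check: a position marked NR no longer
# signals a hit, so new reads fetch clean data from memory, while requests
# already parked in the PCH register may still consume the old (dirty) data.

def cache_hit(line, address):
    """Return True only for a clean, matching cache position.

    line: dict with 'address' and 'nr' (about-to-be-invalidated flag).
    """
    return line["address"] == address and not line["nr"]
```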
In the exemplary embodiment of the invention, the PCH register 30 is an 8-element register, with each element having a bit content as shown below in Table 3.
Table 3 Content of PCH register 30 element
Figure imgf000013_0001
A 17-element Data Store Queue (DSQ) 42 keeps track of executing memory accesses and memory refresh functions. Also, the DSQ 42 detects collisions by concurrently matching addresses from the RAQ 26, the WAQ 38 and the refresh function. Preferably, the length of the DSQ 42 is selected to accommodate the longest cycle time in the DSU. Mapping of the cached data and the returned data with a read request is handled by "Tag", CAPTAG and RATAG. CAPTAG is used for mapping a read request with data delivered to the cache from the DSU. CAPTAG is identical to the "Tag" shown in Table 4, as it corresponds to an ongoing read request. Once the ongoing read request is completed, the CAPTAG is used to address the cache to associate the returned data with RATAG, which is provided from the DSQ 42. RATAG is returned along with the returned data to the IQC to associate a read request with the delivered data.
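The Tag/CAPTAG/RATAG bookkeeping described above can be sketched as a small mapping. The dictionary structure and function names are illustrative assumptions.

```python
# Illustrative sketch of read-cycle tag bookkeeping: CAPTAG identifies the
# ongoing read cycle, and RATAG travels back with the returned data so the
# IQC can associate the delivery with the original read request.

dsq = {}  # CAPTAG -> RATAG for ongoing read cycles

def start_read_cycle(tag, ratag):
    # While the cycle is ongoing, CAPTAG is identical to the request's Tag.
    dsq[tag] = ratag

def complete_read_cycle(captag, data):
    # CAPTAG addresses the cache position; the associated RATAG is returned
    # to the IQC along with the delivered data.
    ratag = dsq.pop(captag)
    return (ratag, data)
```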
From the foregoing description, it will be appreciated that the present invention reduces both the number of main memory accesses needed and the number of stalled cycles for subsequent read accesses.

Claims:
1. A computer system, comprising: at least one data storage unit (DSU) that stores data variables at a plurality of addressable memory locations; a cache memory that caches the data variables stored in the DSU; and a data storage controller (DSC) that determines whether there is a cache hit for data requested by a read request from a specified memory location; wherein the DSC, upon a cache hit condition, temporarily stores the read request in a register, if the requested data variable is not cached in the cache memory, to allow a subsequent read request to be executed.
2. The computer system of claim 1 further including a read access queue that queues corresponding read requests for the data variables stored in the DSU.
3. The computer system of claim 1, wherein the cache hit condition is indicated based on a previous read request to a different variable in the specified memory location.
4. The computer system of claim 1 further including a separate write access queue that queues write requests modifying the data variables stored in the DSU, wherein a read request dependent on a pending write request in the write access queue causes the pending write request to be executed, and wherein the dependent read request is temporarily stored in the register while the pending write request is executed.
5. The computer system of claim 1, wherein the at least one DSU is accessed on a memory interleaved basis.
6. In a computer system that caches data variables stored in one or more DSUs, a method for delivering data variables comprising the steps of: generating successive read requests for the data variables; determining whether there is a cache hit condition for a data variable requested by a read request at a specified memory location; upon a cache hit condition, temporarily storing the read request in a register, if the requested data variable is not cached; and executing a subsequent read request in the read access queue, while the read request is temporarily stored.
7. The method of claim 6, wherein the cache hit condition is indicated based on a previous read request to a different variable in the specified memory location.
8. The method of claim 6 further including the steps of: queuing write requests modifying data variables stored in the DSU separately from the read requests; determining whether a read request is dependent on a queued pending write request; executing the pending write request; and temporarily storing the dependent read request in the register, while the pending write request is executed.
9. The method of claim 6 further including the step of accessing the one or more DSUs on a memory interleaved basis.
10. In a computer system, a method for delivering data variables stored in one or more DSUs, comprising the steps of: generating read requests for the data variables; caching the data variables requested by the read requests in a cache memory; executing a first read request; temporarily storing a second read request subsequent to the first read request in a register if the data variable requested by the first read request is not cached; and executing a third read request subsequent to the second read request, while the second read request is temporarily stored in the register.
11. The method of claim 10, wherein the step of executing the first read request comprises the steps of generating a read cycle and indicating a "no data" condition in the cache while the read cycle has not completed, wherein the second read request is temporarily stored in the register upon determination of the cache hit condition and "no data" condition.
12. The method of claim 10 further including the step of accessing the one or more DSUs on a memory interleaved basis.
13. The method of claim 10, wherein the first read and second read requests request data for variables at the same memory location.
PCT/SE1999/001962 1998-10-30 1999-11-01 A method for delivering data to an instruction processing unit Ceased WO2000026741A2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
AU14326/00A AU1432600A (en) 1998-10-30 1999-11-01 A method for delivering data to an instruction processing unit
EP99971528A EP1145094A3 (en) 1998-10-30 1999-11-01 A method for delivering data to an instruction processing unit

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US18286598A 1998-10-30 1998-10-30
US09/182,865 1998-10-30

Publications (2)

Publication Number Publication Date
WO2000026741A2 true WO2000026741A2 (en) 2000-05-11
WO2000026741A3 WO2000026741A3 (en) 2002-01-10

Family

ID=22670389

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SE1999/001962 Ceased WO2000026741A2 (en) 1998-10-30 1999-11-01 A method for delivering data to an instruction processing unit

Country Status (3)

Country Link
EP (1) EP1145094A3 (en)
AU (1) AU1432600A (en)
WO (1) WO2000026741A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8255669B2 (en) 2008-01-30 2012-08-28 International Business Machines Corporation Method and apparatus for thread priority control in a multi-threaded processor based upon branch issue information including branch confidence information

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5787465A (en) * 1994-07-01 1998-07-28 Digital Equipment Corporation Destination indexed miss status holding registers
US5758178A (en) * 1996-03-01 1998-05-26 Hewlett-Packard Company Miss tracking system and method


Also Published As

Publication number Publication date
EP1145094A2 (en) 2001-10-17
EP1145094A3 (en) 2002-03-27
AU1432600A (en) 2000-05-22
WO2000026741A3 (en) 2002-01-10

Similar Documents

Publication Publication Date Title
US6185660B1 (en) Pending access queue for providing data to a target register during an intermediate pipeline phase after a computer cache miss
EP0795820B1 (en) Combined prefetch buffer and instructions cache memory system and method for providing instructions to a central processing unit utilizing said system.
US5353426A (en) Cache miss buffer adapted to satisfy read requests to portions of a cache fill in progress without waiting for the cache fill to complete
CA1325285C (en) Method and apparatus for ordering and queueing multiple memory requests
JP4486750B2 (en) Shared cache structure for temporary and non-temporary instructions
JP3549079B2 (en) Instruction prefetch method of cache control
EP0457403A2 (en) Multilevel instruction cache, method for using said cache, method for compiling instructions for said cache and micro computer system using such a cache
EP0097790A2 (en) Apparatus for controlling storage access in a multilevel storage system
JP2010532057A (en) Cache and method for multi-threaded and multi-core systems
WO2004049171A2 (en) Microprocessor including cache memory supporting multiple accesses per cycle
US6715035B1 (en) Cache for processing data in a memory controller and a method of use thereof to reduce first transfer latency
KR100234647B1 (en) Data processing system with instruction prefetch
JP3763579B2 (en) Apparatus and method for reducing read miss latency by predicting instruction read first
US7461205B2 (en) Performing useful computations while waiting for a line in a system with a software implemented cache
US20180095893A1 (en) Queuing memory access requests
US5938761A (en) Method and apparatus for branch target prediction
US20070180158A1 (en) Method for command list ordering after multiple cache misses
US5860150A (en) Instruction pre-fetching of a cache line within a processor
US20070180156A1 (en) Method for completing IO commands after an IO translation miss
JP2003519836A (en) A method for managing a set associative cache using parallel and serial reads initiated while a processor is in a wait state
US7111127B2 (en) System for supporting unlimited consecutive data stores into a cache memory
US20020188805A1 (en) Mechanism for implementing cache line fills
US20040024976A1 (en) System and method for reducing access latency to shared program memory
US20090063773A1 (en) Technique to enable store forwarding during long latency instruction execution
EP1145094A2 (en) A method for delivering data to an instruction processing unit

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref country code: AU

Ref document number: 2000 14326

Kind code of ref document: A

Format of ref document f/p: F

AK Designated states

Kind code of ref document: A2

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
WWE Wipo information: entry into national phase

Ref document number: 1999971528

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWP Wipo information: published in national office

Ref document number: 1999971528

Country of ref document: EP

AK Designated states

Kind code of ref document: A3

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG