
WO2000026741A2 - A method for delivering data to an instruction processing unit - Google Patents


Info

Publication number
WO2000026741A2
WO2000026741A2 PCT/SE1999/001962 SE9901962W
Authority
WO
WIPO (PCT)
Prior art keywords
read request
data
read
request
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/SE1999/001962
Other languages
French (fr)
Other versions
WO2000026741A3 (en)
Inventor
Matiss Jonas Zervens
Orvar Per Dahl
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Telefonaktiebolaget LM Ericsson AB
Original Assignee
Telefonaktiebolaget LM Ericsson AB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Telefonaktiebolaget LM Ericsson AB filed Critical Telefonaktiebolaget LM Ericsson AB
Priority to AU14326/00A priority Critical patent/AU1432600A/en
Priority to EP99971528A priority patent/EP1145094A3/en
Publication of WO2000026741A2 publication Critical patent/WO2000026741A2/en
Anticipated expiration legal-status Critical
Publication of WO2000026741A3 publication Critical patent/WO2000026741A3/en
Ceased legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3824Operand accessing

Definitions

  • Each cache position contains an address and a tag for mapping the address to the data delivered from the DSUs 16.
  • All cached data are designated as valid, as indicated by a data valid (DV) flag. While a read cycle is pending, the DV flag for the cache position to which the requested memory location data is to be delivered indicates a "no data" condition: the flag is reset to show that the position does not yet contain any data, but that its corresponding read request is under execution.
  • When a subsequent read request from the RAQ 26 gets a hit in a "no data" position, that read request is transferred to one of the positions of a Pending Cache Hit (PCH) register 30, which stores it temporarily until the pending read cycle caches the requested data variable. While the read request is held in the PCH register 30, a subsequent read request may proceed to retrieve data from the cache memory 24. In this way, a queued read request in the RAQ 26 is not blocked while a previous request waits for completion of a pending read cycle, especially when reading successive data variables within the same memory location.
  • When a read request becomes dependent on a pending write request, a position in the cache memory 24 is reserved by storing the position to which the modified data of the pending write request is to be returned, and the DV flag for that position is reset.
  • The NR (NO READ) flag indicates whether cached data is dirty, that is, whether the data stored in a corresponding cache position is about to be invalidated by a write request. If a subsequent read request is directed at a dirty cache position, a cache hit is not indicated. In this way, read accesses stored in the PCH register make use of the dirty data, while subsequent read requests use clean data.
  • The PCH register 30 is an 8-element register, with each element having the bit content shown in Table 3.
  • A 17-element Data Store Queue (DSQ) 42 keeps track of executing memory accesses and memory refresh functions. The DSQ also detects collisions by concurrently matching addresses from the RAQ 26, the WAQ 38, and the refresh function. Preferably, the length of the DSQ is selected to accommodate the longest cycle time in the DSU. Mapping of cached and returned data to a read request is handled by "Tag", CAPTAG and RATAG. CAPTAG maps a read request to the data delivered to the cache from the DSU; it is identical to the "Tag" shown in Table 4, as it corresponds to an ongoing read request. The CAPTAG is used to address the cache so as to associate returned data with RATAG, which is provided from the DSQ 42. RATAG is returned along with the data to the IQC to associate the read request with the delivered data.
  • The present invention thus reduces both the number of main memory accesses needed and the number of stalled cycles for subsequent read accesses.
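The NR mechanism above can be sketched as follows (an illustrative model with assumed names and values; only the hit-suppression behaviour is shown, not the actual hit logic):

```python
# Illustrative model (assumed structure) of the NR (NO READ) flag: a dirty
# position stops producing hits for new reads, while a read already parked
# in the PCH register still consumes the old data.
class CachePosition:
    def __init__(self, addr: int, data: int):
        self.addr, self.data = addr, data
        self.nr = False          # NO READ: data about to be invalidated

def hit(pos: CachePosition, addr: int) -> bool:
    """Hit logic for newly arriving reads: dirty positions never hit."""
    return pos.addr == addr and not pos.nr

pos = CachePosition(0x100, 0xABCD)
pch_data = pos.data          # a parked PCH read keeps using this position
pos.nr = True                # a queued write is about to invalidate it
assert not hit(pos, 0x100)   # new reads miss and will fetch clean data
assert pch_data == 0xABCD    # the parked read still sees the dirty data
```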

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A computer system temporarily stores a read request that results in a cache hit based on a previous read request that has not yet cached the requested data in a cache memory. While it is temporarily stored, a subsequent read request may be executed without being stalled by the pendency of an ongoing memory cycle. After the requested data from the previous read request is cached, the stored read request returns its corresponding requested data.

Description

A METHOD FOR DELIVERING DATA TO AN INSTRUCTION PROCESSING UNIT
TECHNICAL FIELD:
This disclosure relates to the field of computer systems and more particularly to a method of delivering data to an instruction processing unit in such computer systems.
BACKGROUND:
Under a typical computer system architecture, during read and write cycles, a data storage controller (DSC) controls access to data storage units (DSUs) that comprise system memory with addressable memory locations, generally within a continuous range of predefined logical addresses. For accessing the system memory, the DSC processes read and write requests generated by an instruction processing unit (IPU) executing a program that requests data to be read from or written into a particular memory location. Upon receipt of the requests, the DSC initiates corresponding read or write cycles over a memory bus for accessing the addressed memory locations. The rate at which data is transferred, i.e., the data throughput, during each memory cycle depends on the bus speed as well as the width of the system's data bus and the length of a memory location, which is defined in terms of data bits, for example, 8, 16, or 32 bits.
Each memory cycle, read or write, expends a certain number of clock cycles. Because the performance of a computer system is highly dependent on the data throughput, it is necessary to maximize the data transfer rate over the memory bus, ideally making it reach the full system clock speed. Various techniques have been devised to increase the data throughput by minimizing the time required to access the system memory. For example, under a scheme known as interleaved memory access, each DSU is addressable over a corresponding internal bus having an assigned range of physical addresses. In this way, a memory access over one internal bus can start ahead of completion of a prior access over another internal bus, provided that the memory bus bandwidth supports the execution of parallel memory accesses over the internal busses. Usually, under the interleaved memory access arrangement, a separate memory management unit (MMU) maps the logical memory addresses into the physical addresses.
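As an illustration of interleaved access (a sketch with an assumed bank count and cycle time, not taken from the patent), low-order address interleaving lets accesses to different DSUs overlap while accesses to the same DSU serialize:

```python
# Illustrative sketch (not from the patent): logical addresses are striped
# across interleaved DSU banks so that back-to-back accesses to different
# banks can overlap, while accesses to the same bank serialize.
NUM_BANKS = 4          # hypothetical number of interleaved DSUs
BANK_BUSY_CYCLES = 3   # hypothetical DSU cycle time

def bank_of(addr: int) -> int:
    """Low-order interleaving: consecutive addresses map to consecutive banks."""
    return addr % NUM_BANKS

def schedule(addresses):
    """Return the start cycle of each access; a bank accepts a new access
    only after its previous one completes."""
    bank_free_at = [0] * NUM_BANKS
    starts = []
    for t, addr in enumerate(addresses):
        b = bank_of(addr)
        start = max(t, bank_free_at[b])       # wait only if the bank is busy
        bank_free_at[b] = start + BANK_BUSY_CYCLES
        starts.append(start)
    return starts

print(schedule([0, 1, 2, 3]))  # distinct banks: starts [0, 1, 2, 3]
print(schedule([0, 4]))        # same bank: starts [0, 3]
```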
Although the interleaved memory access improves data throughput, sometimes the execution of the program requires reading data from different portions of the same memory location. For example, two back-to-back read requests from a two-word (32 bits) memory location may request data contained in a first nibble and a successive second nibble.
Memory caching makes use of a principle known as locality, based upon which a program executed by the IPU is written to use a certain portion of the system memory more frequently, on either a temporal or a spatial basis. Temporal locality means that a recently accessed memory location has a high probability of soon being accessed again. Spatial locality means that locations adjacent to a recently accessed location have a high probability of soon being accessed. Although substantially smaller in size than the system memory, a cache memory is made of a significantly faster memory device, such as Static Random Access Memory (SRAM), that is positioned in close physical proximity to the IPU. By caching the frequently accessed data in the faster cache memory, the overall access time of the system memory is significantly reduced.
In a computer system that uses data variables shorter than the full length of a memory location, the spatial and temporal locality become effectively the same, since data variables are generally cached in full length (for example, 32 bits). This is true even if the read requests are for data variables that are less than the full length, for example, sub-variables comprised of two bits. In this case, a read access to a two-bit sub-variable results in caching the entire data at the specified memory location. As discussed above, under the principle of locality, there is a high probability that a subsequent read request would read an adjacently stored sub-variable of the same size. By executing a single read cycle that caches the entire length of a memory location, multiple read requests for sub-variables at the same memory location may be serviced directly from the cache memory, without initiating additional read cycles over the memory bus.
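The effect described above can be sketched as follows (a minimal model with a hypothetical address and bit layout): the first sub-variable read fetches and caches the entire 32-bit word, so a second sub-variable read at the same location costs no additional bus cycle:

```python
# Sketch (hypothetical address and bit offsets): reading any sub-variable
# caches the full 32-bit word, so later reads of neighbouring sub-variables
# are served from the cache without further bus cycles.
memory = {0x100: 0b1011_0110}   # one 32-bit memory location (low bits shown)
cache = {}                      # address -> full cached word
bus_cycles = 0                  # read cycles issued over the memory bus

def read_sub(addr: int, bit_offset: int, width: int = 2) -> int:
    """Return a `width`-bit sub-variable, caching the whole word on a miss."""
    global bus_cycles
    if addr not in cache:
        bus_cycles += 1              # one read cycle over the memory bus
        cache[addr] = memory[addr]   # the entire word is cached
    return (cache[addr] >> bit_offset) & ((1 << width) - 1)

first = read_sub(0x100, 0)    # miss: fetches and caches the whole word
second = read_sub(0x100, 2)   # hit: no additional bus cycle
assert bus_cycles == 1        # both sub-variables cost a single read cycle
```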
Some computers use separate queues for queuing read and write requests pending execution. In systems that provide for read priority, where read requests have priority over write requests, a situation arises when a read request becomes dependent on a pending write request that is queued in a path separate from the path that holds read requests. For servicing a dependent read, the dependency is resolved by forcing the execution of the queued write requests up to and including the pending write request. A write dependency in conventional systems, however, could delay the servicing of a subsequently queued non-dependent read request to another physical location. As illustrated in Fig. 1, in response to a first read request (READ #1) that requests a first sub-variable in a first memory location at a time t1, the entire length of the memory location is cached at time t2, which is within a predefined number of clock cycles from t1. Before starting a subsequent second read request (READ #2) requesting a second sub-variable at the same memory location, the content of the cache memory is checked to determine whether the data variable requested by the first read request has been cached. If so, the requested second sub-variable is returned during the next clock cycle, since, as explained above, caching a data variable from the same memory location caches the second sub-variable as well. If, however, the first read request is still pending, the second read request must wait until the requested data is delivered to the cache memory. As a result, the second read request prevents the execution of a subsequent third read request (READ #3) until time t3, even though the third read request may be directed to a separate physical address, thereby creating a long queue of non-dependent read requests. Therefore, there exists a need for a faster method of delivering data to the IPU.
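The head-of-line blocking of Fig. 1 can be modeled as below (a sketch; the miss latency and the addresses are assumed for illustration): READ #2 hits the location READ #1 is still fetching and stalls, so READ #3 to an unrelated address is delayed behind it:

```python
# Sketch of the prior-art timeline of Fig. 1 (the four-cycle miss latency
# and the addresses are assumed for illustration).
MISS_LATENCY = 4   # cycles until a read cycle delivers data to the cache

def inorder_start_times(requests):
    """In-order queue: a request waiting for in-flight data blocks every
    request queued behind it."""
    cache, pending = set(), {}   # pending: addr -> cycle when data arrives
    t, starts = 0, []
    for addr in requests:
        if addr in pending:               # hit on data still in flight: stall
            t = max(t, pending.pop(addr))
        starts.append(t)
        if addr not in cache:             # miss: start a read cycle
            pending[addr] = t + MISS_LATENCY
            cache.add(addr)
        t += 1
    return starts

# READ #2 hits the location READ #1 is still fetching, so READ #3 to an
# unrelated address is delayed behind it:
print(inorder_start_times([0x100, 0x100, 0x200]))  # [0, 4, 5]
```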
Summary of the Invention
Briefly, according to one aspect of the present invention embodied in a computer system, one or more data storage units (DSU) store data variables at a plurality of addressable memory locations. A cache memory caches data variables when they are accessed from the DSUs. A read access queue queues successive read requests for data variables. A data storage controller (DSC) determines whether there is a cache hit for a data requested by a read request, with the cache hit, for example, being based on a read request to a different variable at the same memory location. Upon a cache hit, if the requested data variable is not cached in the cache memory, the DSC temporarily stores the read request in a register to allow a subsequent read request to be executed.
According to some of the more detailed features of the present invention, the cache hit condition is indicated when accessing successive variables in the same memory location. The computer system may also be a read priority system having a separate write access queue for queuing write requests modifying data variables stored in the DSU. Under this arrangement, a dependent read request on a pending write request in the write access queue would cause the execution of the pending write request. According to one of the features of the present invention, the dependent read request is temporarily queued in the register, while the pending write request is executed. According to another feature of the invention, the DSUs are accessed on a memory interleaved basis. According to another aspect of the present invention embodied in a method for delivering data variables stored in one or more DSUs, data stored in the DSUs are cached in a cache memory. According to this method, successive read requests for the data variables are queued in a read access queue. A determination is made as to whether there is a cache hit condition for a data variable requested by a read request. Upon a cache hit condition, if the requested data variable is not cached, the read request is temporarily stored in a register, and a subsequent read request in the read access queue is executed, while the read request is temporarily stored. According to yet another aspect of the present invention, a method for delivering data variable stored in one or more DSUs queues successive read requests for data variables and caches the data variables requested by the read requests in a cache memory. Then, a first read request is executed. A determination is made as to whether a cache hit condition exists for a subsequent second read request as a result of executing the first read request. 
If so, the second read request is temporarily stored in a register, if the data variable requested by the first read request is not yet cached. While the second read request is temporarily stored in the register, a third, subsequent read request is executed. In an exemplary embodiment, the step of executing the first read request generates a read cycle and indicates a "no data" condition in the cache while the read cycle has not completed. According to this embodiment, the second read request is temporarily stored in the register upon determination of the cache hit condition and the "no data" condition. Under this aspect, the first and second read requests may request data for different variables within the same memory location.
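A sketch of this aspect (with assumed cycle counts; only the parking behaviour of the register is modeled, not the claimed hardware) shows the third read request proceeding while the second waits for the in-flight data:

```python
# Sketch of the claimed behaviour (cycle counts assumed; only the parking
# role of the PCH register is modeled): a read that hits a "no data"
# position is parked instead of blocking the queue behind it.
MISS_LATENCY = 4

def pch_start_times(requests):
    cache, pending = set(), {}   # pending: addr -> arrival cycle of its data
    parked = []                  # PCH register: (addr, arrival cycle)
    t, starts = 0, []
    for addr in requests:
        if addr in pending:                  # cache hit but "no data": park
            parked.append((addr, pending[addr]))
            starts.append(pending[addr])     # completes when the data lands
            continue                         # the queue is NOT stalled
        starts.append(t)
        if addr not in cache:
            pending[addr] = t + MISS_LATENCY
            cache.add(addr)
        t += 1
    return starts

# READ #2 is parked in the PCH register, so READ #3 starts on the very
# next cycle instead of waiting for READ #1's data:
print(pch_start_times([0x100, 0x100, 0x200]))  # [0, 4, 1]
```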
Brief Description of the Drawings
FIG. 1 is a timing diagram for delivering data according to the prior art. FIG. 2 is a block diagram of a computer system that advantageously incorporates the present invention.
FIG. 3 is a timing diagram for delivering data according to the present invention. FIG. 4 is a block diagram of a DSC that is incorporated in the computer system of FIG. 2.
Detailed Description of the Preferred Embodiment
Referring to FIG. 2, a block diagram of a computer system 10 that advantageously incorporates the present invention is shown. In the exemplary embodiment, the computer system 10 is a telephony computer system providing switching control for a public switched telephone network (PSTN) 12. The system 10 operates under the control of an Instruction Processor Unit (IPU) 14 that exchanges data stored in a plurality of interleaved Data Storage Units (DSU) 16 by executing a program that generates memory access requests, including read requests and write requests. A read request requests data variables or sub-variables from a specified memory location, and a write request modifies data variables or sub-variables in the same memory location. In the exemplary embodiment of the invention, each memory location stores 32 bits of data that are addressable by a 32-bit address. As described above, the interleaved arrangement of the DSUs 16 allows a data access to one DSU to start while an access to another DSU is continuing.
An Instruction Queue Controller (IQC) 18 within the IPU is responsible for sequencing the requests and providing them to the Data Storage Controller (DSC) 20. The DSC 20 is responsible for generating memory cycles over a memory bus 22. As described later in detail, the DSC 20 includes a cache memory that uses the locality principle to speed up the delivery of data to the IPU 14. According to one aspect of the present invention, while data requested from a memory location in response to a pending first read request has not yet returned to the cache memory 24, a subsequent second read request that produces a cache hit based on the pendency of the first read request, for example, a read request to a different data variable at the same memory location, is temporarily stored to allow a third, subsequent read request to be processed, as illustrated by Fig. 3. More specifically, the second read request is temporarily stored in a pending cache hit (PCH) register, to allow the third read request to be executed. In this way, the third read request and those subsequent thereto do not have to wait for the execution of the first and second read requests to complete. The DSC 20 saves the information regarding which ongoing read request will deliver the requested data and from which cache memory position the data can be retrieved. When the PCH register recognizes that data for a pending hit has been cached, it autonomously asserts that data is ready for delivery and the second read request can deliver the requested data. Therefore, once the data corresponding to the first read request is cached, the data requested by the second read request, which may be a part of the cached data, is returned to the IPU 14 during a subsequent clock cycle.
In order to provide the required telephony services, the computer system 10 is designed as a read priority system, where the read requests have priority over the write requests. Under the read priority arrangement, a situation arises when a read request becomes dependent on a pending write request. In this situation, the pending write request is given priority over the pending read requests. Preferably, the system 10 employs well-known pipelining and prefetching techniques for executing the IPU instructions. Under the prefetching technique, newly arrived instructions are fetched prior to the execution of a previous instruction, thereby increasing execution efficiency. Under the pipelined execution technique, the IPU instructions are subdivided into smaller sub-tasks, with each sub-task being performed by a corresponding register. For executing an ADD instruction, for example, the ADD instruction is fetched from an Instruction Storage Unit (not shown), decoded by an instruction decoder (not shown), and processed in an Arithmetic Logic Unit (ALU) (not shown). In order to execute multiple ADD instructions in a pipelined manner, corresponding registers separately perform the prefetching, decoding and ALU functions, thereby performing multiple ADD operations substantially simultaneously. Accordingly, the computer system 10 uses separate read and write queues in the DSC 20 to implement these techniques.
Referring to FIG. 4, a block diagram of the DSC 20 of the present invention is shown. For queuing the read requests, the DSC 20 includes a multiple-element Read Access Queue (RAQ) 26 that stores IQC-generated read requests for reading data variables from specified DSU memory locations. Also, the DSC 20 includes a Write Access Queue (WAQ) 38 that queues the generated write accesses. The IQC 18 may also flush the RAQ 26, clearing some or all of its elements. Preferably, the IQC 18 has an internal buffer (not shown) that is of equal length to the RAQ 26, to prevent the RAQ 26 from becoming full. In the exemplary embodiment, the RAQ 26 is an 8-element queue with each element having 46 bits as defined by Table 1 below.
Table 1 Content of RAQ
Figure imgf000010_0001
PV is a position valid flag that indicates whether a RAQ element is valid or not. For example, when a read request is flushed, PV is reset. Tag is an access sequence number assigned by the IQC 18 to each read request. Address specifies the memory location from which a data variable or sub-variable is to be read. As mentioned above, in the computer system 10, the read requests have the highest priority and usually 'pass through' the RAQ 26 unimpeded, unless one becomes dependent on an un-executed write request. MTAG is a match tag assigned to each write request and is returned to a corresponding RAQ element when a read request becomes dependent on a pending write request. Under this condition, which is indicated by the MTAG and the MTAG valid (MV) flag, a forced write cycle is started to write the data associated with the pending write request into the DSU 16 and the cache memory 24, which, in the exemplary embodiment of the present invention, has 16 positions.
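The RAQ element fields just described can be modeled as a simple record. The field widths and the `RAQElement` name are illustrative assumptions; only the field meanings come from the description above.

```python
# Hypothetical model of one RAQ element (Table 1): PV, Tag, Address, MTAG, MV.
from dataclasses import dataclass

@dataclass
class RAQElement:
    pv: bool      # position valid flag; reset when the request is flushed
    tag: int      # access sequence number assigned by the IQC
    address: int  # DSU memory location from which to read
    mtag: int     # match tag of the write request this read depends on
    mv: bool      # MTAG valid: set when the read depends on a pending write

    def flush(self):
        # Flushing a read request simply invalidates the element.
        self.pv = False
```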
For each one of the read requests arriving from the RAQ 26, a determination is made as to whether the requested data is cached or not. If cached, hit logic 28 in the cache memory 24 indicates a hit condition for the requested data. If not, a read cycle is initiated over the memory bus 22 to deliver the entire length of the requested data variable, i.e., 32 bits, to the cache memory 24. In the exemplary embodiment of the invention, the cache memory 24 is fully associative, using the hit logic 28 for addressing purposes. The cache uses a FIFO algorithm for replacing the earliest requested data with the latest requested data. Each cache position is 71 bits wide, as shown in Table 2.
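A fully associative cache with FIFO replacement, as described above, might be sketched as follows. The `FIFOCache` class and its methods are illustrative assumptions; the 16-position size and FIFO policy come from the description.

```python
# Sketch of a 16-position fully associative cache with FIFO replacement.
from collections import OrderedDict

class FIFOCache:
    def __init__(self, positions=16):
        self.positions = positions
        self.store = OrderedDict()   # address -> data, in insertion order

    def hit(self, address):
        # "Hit logic": a fully associative cache compares the request
        # address against every position simultaneously.
        return address in self.store

    def insert(self, address, data):
        if len(self.store) >= self.positions:
            # FIFO replacement: evict the earliest requested data.
            self.store.popitem(last=False)
        self.store[address] = data
```

FIFO replacement is simpler than LRU in hardware since only insertion order, not access order, must be tracked.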
Table 2 Contents of Cache Position
Figure imgf000011_0001
Each cache position contains an address and a tag for mapping the address with the data delivered from the DSUs 16. Under the exemplary arrangement, all cached data are designated as valid, which is indicated by a DV flag. When a write access modifies a data variable in a DSU 16, the cached data is invalidated. Under the present invention, when a "no cache hit" condition starts a read cycle over the memory bus 22, the DV flag for the corresponding cache position, where the requested memory location data is to be delivered, indicates a "no data" condition. Until data from the requested memory location is delivered to the cache position, the DV flag remains reset to indicate that the cache memory 24 position does not contain any data, but that its corresponding read request is under execution. As described above, in the present invention, when a subsequent read request from the RAQ 26 gets a hit in a "no data" position, the read request is transferred to one of the positions of a Pending Cache Hit (PCH) register 30, which temporarily stores the read request until the pending read cycle caches the requested data variable. While the read request is stored in the PCH register 30, a subsequent read request may proceed to retrieve data from the cache memory 24. In this way, the present invention prevents a queued read request in the RAQ 26 from being blocked while a previous request is waiting for completion of a pending read cycle, especially when reading successive data variables within the same memory location.
When a forced write cycle is started based on a dependent read request, a position in the cache memory 24 is reserved by storing the position to which the modified data of the pending write request is to be returned. The DV flag for the position is reset. When the dependent read request reaches the bottom of the RAQ 26, a hit on a "no data" position is detected and the read request is moved to the PCH register 30. As a result, compared to a non-dependent read request, the present invention reduces the number of clock cycles for servicing a dependent read request.
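The forced-write path for a dependent read can be sketched as below. All names (`service_dependent_read`, the dictionary shapes) are illustrative assumptions; the sequence of steps follows the description above.

```python
# Sketch of servicing a dependent read: pop the pending write it depends on,
# reserve a "no data" cache position, park the read in the PCH register, and
# complete the forced write cycle so the parked read can deliver.

def service_dependent_read(cache, pch, write_queue, read):
    # The read carries MTAG identifying the pending write it depends on.
    write = write_queue.pop(read["mtag"])
    # Reserve a cache position for the modified data; DV reset = "no data".
    cache[read["address"]] = {"data": None, "dv": False}
    # The read hits the reserved "no data" position and parks in the PCH.
    pch.append(read)
    # Forced write cycle completes: data goes to the DSU and cache position.
    cache[read["address"]] = {"data": write["data"], "dv": True}
    # Parked reads for this address are now ready to deliver.
    return [r for r in pch if r["address"] == read["address"]]
```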
The NR (No Read) flag is used to indicate whether cached data is dirty or not, that is, whether the data stored in a corresponding cache position is about to be invalidated by a write request. In case a subsequent read request is directed at a dirty cache position, a cache hit is not indicated. In this way, read accesses stored in the PCH register make use of the dirty data, while subsequent read requests use clean data.
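The NR check can be expressed as a small predicate in the hit logic. The function and field names here are illustrative assumptions.

```python
# Sketch of the NR ("No Read") dirty check: a position marked NR no longer
# signals a hit, so new reads fetch clean data from memory, while requests
# already parked in the PCH register may still consume the old (dirty) data.

def cache_hit(line, address):
    """Return True only for a clean, matching cache position.

    line: dict with 'address' and 'nr' (about-to-be-invalidated flag).
    """
    return line["address"] == address and not line["nr"]
```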
In the exemplary embodiment of the invention, the PCH register 30 is an 8-element register, with each element having a bit content as shown below in Table 3.
Table 3 Content of PCH register 30 element
Figure imgf000013_0001
A 17-element Data Store Queue (DSQ) 42 keeps track of executing memory accesses and memory refresh functions. Also, the DSQ 42 detects collisions by concurrently matching addresses from the RAQ 26, the WAQ 38 and the refresh function. Preferably, the length of the DSQ 42 is selected to accommodate the longest cycle time in the DSU. Mapping of the cached data and the returned data with a read request is handled by "Tag", CAPTAG and RATAG. CAPTAG is used for mapping a read request with data delivered to the cache from the DSU. CAPTAG is identical to the "Tag" shown in Table 4, as it corresponds to an ongoing read request. Once the ongoing read request is completed, the CAPTAG is used to address the cache to associate the returned data with RATAG, which is provided from the DSQ 42. RATAG is returned along with the returned data to the IQC to associate a read request with the delivered data.
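The Tag/CAPTAG/RATAG bookkeeping described above can be sketched as a small mapping. The dictionary structure and function names are illustrative assumptions.

```python
# Illustrative sketch of read-cycle tag bookkeeping: CAPTAG identifies the
# ongoing read cycle, and RATAG travels back with the returned data so the
# IQC can associate the delivery with the original read request.

dsq = {}  # CAPTAG -> RATAG for ongoing read cycles

def start_read_cycle(tag, ratag):
    # While the cycle is ongoing, CAPTAG is identical to the request's Tag.
    dsq[tag] = ratag

def complete_read_cycle(captag, data):
    # CAPTAG addresses the cache position; the associated RATAG is returned
    # to the IQC along with the delivered data.
    ratag = dsq.pop(captag)
    return (ratag, data)
```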
From the foregoing description, it will be appreciated that the present invention reduces both the number of main memory accesses needed and the number of stalled cycles for subsequent read accesses.

Claims:
1. A computer system, comprising: at least one data storage unit (DSU) that stores data variables at a plurality of addressable memory locations; a cache memory that caches the data variables stored in the DSU; and a data storage controller (DSC) that determines whether there is a cache hit for data requested by a read request from a specified memory location; wherein the DSC, upon a cache hit condition, temporarily stores the read request in a register, if the requested data variable is not cached in the cache memory, to allow a subsequent read request to be executed.
2. The computer system of claim 1 further including a read access queue that queues corresponding read requests for the data variables stored in the DSU.
3. The computer system of claim 1, wherein the cache hit condition is indicated based on a previous read request to a different variable in the specified memory location.
4. The computer system of claim 1 further including a separate write access queue that queues write requests modifying the data variables stored in the DSU, wherein a read request dependent on a pending write request in the write access queue causes the pending write request to be executed, and wherein the dependent read request is temporarily stored in the register while the pending write request is executed.
5. The computer system of claim 1, wherein the at least one DSU is accessed on a memory interleaved basis.
6. In a computer system that caches data variables stored in one or more DSUs, a method for delivering data variables comprising the steps of: generating successive read requests for the data variables; determining whether there is a cache hit condition for a data variable requested by a read request at a specified memory location; upon a cache hit condition, temporarily storing the read request in a register, if the requested data variable is not cached; and executing a subsequent read request in the read access queue, while the read request is temporarily stored.
7. The method of claim 6, wherein the cache hit condition is indicated based on a previous read request to a different variable in the specified memory location.
8. The method of claim 6 further including the steps of: queuing write requests modifying data variables stored in the DSU separately from the read requests; determining whether a read request is dependent on a queued pending write request; executing the pending write request; and temporarily storing the dependent read request in the register, while the pending write request is executed.
9. The method of claim 6 further including the step of accessing the one or more DSUs on a memory interleaved basis.
10. In a computer system, a method for delivering data variables stored in one or more DSUs, comprising the steps of: generating read requests for the data variables; caching the data variables requested by the read requests in a cache memory; executing a first read request; temporarily storing a second read request subsequent to the first read request in a register if the data variable requested by the first read request is not cached; and executing a third read request subsequent to the second read request, while the second read request is temporarily stored in the register.
11. The method of claim 10, wherein the step of executing the first read request comprises the steps of generating a read cycle and indicating a "no data" condition in the cache while the read cycle has not completed, wherein the second read request is temporarily stored in the register upon determination of the cache hit condition and "no data" condition.
12. The method of claim 10 further including the step of accessing the one or more DSUs on a memory interleaved basis.
13. The method of claim 10, wherein the first read and second read requests request data for variables at the same memory location.
PCT/SE1999/001962 1998-10-30 1999-11-01 A method for delivering data to an instruction processing unit Ceased WO2000026741A2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
AU14326/00A AU1432600A (en) 1998-10-30 1999-11-01 A method for delivering data to an instruction processing unit
EP99971528A EP1145094A3 (en) 1998-10-30 1999-11-01 A method for delivering data to an instruction processing unit

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US18286598A 1998-10-30 1998-10-30
US09/182,865 1998-10-30

Publications (2)

Publication Number Publication Date
WO2000026741A2 true WO2000026741A2 (en) 2000-05-11
WO2000026741A3 WO2000026741A3 (en) 2002-01-10

Family

ID=22670389

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SE1999/001962 Ceased WO2000026741A2 (en) 1998-10-30 1999-11-01 A method for delivering data to an instruction processing unit

Country Status (3)

Country Link
EP (1) EP1145094A3 (en)
AU (1) AU1432600A (en)
WO (1) WO2000026741A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8255669B2 (en) 2008-01-30 2012-08-28 International Business Machines Corporation Method and apparatus for thread priority control in a multi-threaded processor based upon branch issue information including branch confidence information

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5787465A (en) * 1994-07-01 1998-07-28 Digital Equipment Corporation Destination indexed miss status holding registers
US5758178A (en) * 1996-03-01 1998-05-26 Hewlett-Packard Company Miss tracking system and method


Also Published As

Publication number Publication date
EP1145094A2 (en) 2001-10-17
EP1145094A3 (en) 2002-03-27
AU1432600A (en) 2000-05-22
WO2000026741A3 (en) 2002-01-10

Similar Documents

Publication Publication Date Title
US6185660B1 (en) Pending access queue for providing data to a target register during an intermediate pipeline phase after a computer cache miss
EP0795820B1 (en) Combined prefetch buffer and instructions cache memory system and method for providing instructions to a central processing unit utilizing said system.
US5353426A (en) Cache miss buffer adapted to satisfy read requests to portions of a cache fill in progress without waiting for the cache fill to complete
CA1325285C (en) Method and apparatus for ordering and queueing multiple memory requests
JP4486750B2 (en) Shared cache structure for temporary and non-temporary instructions
JP3549079B2 (en) Instruction prefetch method of cache control
EP0457403A2 (en) Multilevel instruction cache, method for using said cache, method for compiling instructions for said cache and micro computer system using such a cache
EP0097790A2 (en) Apparatus for controlling storage access in a multilevel storage system
JP2010532057A (en) Cache and method for multi-threaded and multi-core systems
WO2004049171A2 (en) Microprocessor including cache memory supporting multiple accesses per cycle
US6715035B1 (en) Cache for processing data in a memory controller and a method of use thereof to reduce first transfer latency
KR100234647B1 (en) Data processing system with instruction prefetch
JP3763579B2 (en) Apparatus and method for reducing read miss latency by predicting instruction read first
US7461205B2 (en) Performing useful computations while waiting for a line in a system with a software implemented cache
US20180095893A1 (en) Queuing memory access requests
US5938761A (en) Method and apparatus for branch target prediction
US20070180158A1 (en) Method for command list ordering after multiple cache misses
US5860150A (en) Instruction pre-fetching of a cache line within a processor
US20070180156A1 (en) Method for completing IO commands after an IO translation miss
JP2003519836A (en) A method for managing a set associative cache using parallel and serial reads initiated while a processor is in a wait state
US7111127B2 (en) System for supporting unlimited consecutive data stores into a cache memory
US20020188805A1 (en) Mechanism for implementing cache line fills
US20040024976A1 (en) System and method for reducing access latency to shared program memory
US20090063773A1 (en) Technique to enable store forwarding during long latency instruction execution
EP1145094A2 (en) A method for delivering data to an instruction processing unit

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref country code: AU

Ref document number: 2000 14326

Kind code of ref document: A

Format of ref document f/p: F

AK Designated states

Kind code of ref document: A2

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
WWE Wipo information: entry into national phase

Ref document number: 1999971528

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWP Wipo information: published in national office

Ref document number: 1999971528

Country of ref document: EP

AK Designated states

Kind code of ref document: A3

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG