
CN1320464C - Method and apparatus for maintaining shared cache coherency - Google Patents

Info

Publication number
CN1320464C
CN1320464C · CNB2003101198312A · CN200310119831A
Authority
CN
China
Prior art keywords
cache
state
modification
shared
line
Prior art date
Legal status
Expired - Fee Related
Application number
CNB2003101198312A
Other languages
Chinese (zh)
Other versions
CN1609823A (en)
Inventor
V. Pentkovski
V. Garg
N. S. Iyer
J. Keshava
Current Assignee
Intel Corp
Original Assignee
Intel Corp
Priority date
Filing date
Publication date
Application filed by Intel Corp filed Critical Intel Corp
Priority to CNB2003101198312A priority Critical patent/CN1320464C/en
Publication of CN1609823A publication Critical patent/CN1609823A/en
Application granted granted Critical
Publication of CN1320464C publication Critical patent/CN1320464C/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Classifications

  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The present invention relates to a method and apparatus for maintaining shared cache coherency in a chip multiprocessor or multiprocessor system. In one embodiment, a multi-core processor comprises a plurality of processor cores, each core having a private cache, and a shared cache. An internal snoop bus is coupled to each private cache and to the shared cache to facilitate transfers of data from each private cache to the other private caches and to the shared cache. In another embodiment, an apparatus comprises a plurality of processor cores and a plurality of caches, one of which maintains cache lines in two different modified states: a first modified state indicating the most recent copy of a modified cache line, and a second modified state indicating a stale copy of a modified cache line.

Description

Method and apparatus for maintaining shared cache coherency
Technical field
The present disclosure relates to the field of processing systems and their associated caching arrangements.
Background
Improving the performance of a computer or other processing system generally improves overall throughput and/or provides a better user experience. One technique for increasing the total volume of instructions processed in a system is to increase the number of processors in the system. However, implementing a multiprocessing (MP) system typically requires more than simply placing processors in parallel. For example, tasks or programs may need to be partitioned so that they can execute across the parallel processing resources, and a memory coherency scheme is also required.
As logic elements shrink with improvements in manufacturing technology, it has become more practical to integrate multiple processors on a single component, and indeed many current designs implement multiple processors on a single component (a "multi-core processor"). In addition to any caches closely associated with each processor core, multi-core processors typically also integrate some additional cache memory, and use a variety of techniques to maintain coherency across the cache hierarchy within the multi-core device.
For example, in one prior-art processor, the level-one (L1) cache associated with each processor core is implemented as a write-through cache, so that the shared level-two (L2) cache receives all modifications made by each L1. Write operations from each L1 are logged into a queue to the L2 cache. When a fabric operation hits a valid L2 directory entry, the designated snooping processor responds to the snoop using only the L2 content (including any potentially queued entries). Although write-through protocols are well known, they can be at a performance disadvantage in some environments compared with a write-back protocol and/or the well-known four-state MESI (Modified, Exclusive, Shared, Invalid) protocol; using a write-through protocol does, however, eliminate the need for cross-interrogation of the L1 caches in such a prior-art multi-core processor. With no cross-interrogation between the L1 caches, no snoop bus is provided between them, and no L1-to-L1 transfers can occur.
Another prior-art processor implements a modified MESI protocol for its L2 cache, the modified MESI protocol having three "modified" states, two "shared" states, and a new "tagged" state. The modified states M, Me, and Mu correspond to a modified state, a modified-exclusive state, and a modified-unsolicited state. Both the modified-exclusive and the modified-unsolicited states indicate that the data is valid. Moreover, both the modified-exclusive and the modified-unsolicited states arise when a processor requests to hold data via special instructions.
In another prior-art multi-core processor, the two L1 caches are likewise separated by the L2 cache. In that prior-art processor, the core logic interfaces directly with the L2 cache control logic and directly with its private L1. Thus, coherency lookups in the L1 and the L2 can begin simultaneously; however, the L2 control logic separates the L1 associated with the first core from the L1 associated with the second core. Each processor's associated private L1 cache therefore no longer interfaces with the other. Thus, there is no direct cross-interrogation between the L1 caches, and no direct L1-to-L1 data transfers.
Summary of the invention
In one aspect of the invention, an apparatus is provided, comprising:
a plurality of processor cores, each processor core including a private cache to implement a first multi-state cache protocol; and
a shared cache to implement a second multi-state cache protocol, the second multi-state cache protocol being different from the first multi-state cache protocol.
In another aspect of the invention, an apparatus is provided, comprising:
a plurality of processor cores;
a plurality of caches, a first cache of the plurality of caches to maintain a plurality of cache lines in one of a plurality of states, the plurality of states comprising:
a first modified state to indicate a most recent copy of a modified line in a second cache of the plurality of caches;
a second modified state to indicate a stale copy of a modified line.
In another aspect of the invention, a method is provided, comprising:
driving a snoop cycle on an internal snoop bus to a plurality of internal private caches and a shared cache, the shared cache being maintained according to a first four-state cache protocol, wherein the first four-state cache protocol comprises a modified most-recent-copy state, a modified stale state, a shared state, and an invalid state;
driving a cache-miss cycle to memory via an external bus.
In yet another aspect of the invention, a system is provided, comprising:
a multi-core processor, the multi-core processor comprising:
a plurality of processors;
a plurality of associated caches;
a shared cache;
an internal snoop bus coupled to the plurality of caches, to allow data transfers between the plurality of caches in response to snoop cycles on the internal snoop bus;
coherency logic to maintain a plurality of cache lines in one of a plurality of states, the plurality of states comprising a modified most-recent-copy state and a modified stale state, the coherency logic to maintain a first data item having a corresponding first address in the modified stale state, and a second data item having a corresponding second address in the modified most-recent-copy state;
a memory coupled to the multi-core processor, the memory to store a third data item at the first address and a fourth data item at the second address, the memory being selectively updated to copy the first data item to the first address.
Description of drawings
The present invention is illustrated by way of example and is not limited by the implementations shown in the figures of the accompanying drawings.
Fig. 1 illustrates one embodiment of a processing system including a snoop bus that couples the core caches to each other and to a shared cache.
Fig. 2 illustrates one embodiment of a system including a chip multiprocessor that has a snoop bus and implements a shared cache coherency protocol.
Figs. 3a-3f illustrate various state transitions of the shared and private caches according to one embodiment.
Fig. 4 illustrates another embodiment of a system including a chip multiprocessor that implements a plurality of private caches with a multi-state protocol and implements a shared cache with a multi-state shared cache protocol.
Fig. 5 illustrates an embodiment including line-sharing logic.
Detailed description
The following description sets forth embodiments of a method and apparatus for shared cache coherency in a chip multiprocessor or multiprocessor system. In the following description, numerous specific details are set forth in order to provide a more thorough understanding of the present invention. It will be appreciated by those skilled in the art, however, that the invention may be practiced without these specific details.
Some disclosed embodiments use a dedicated internal snoop interface in some examples, in order to allow snoop operations and data exchange between a plurality of internal private caches and a shared cache. Some embodiments implement a cache protocol having two different modified states. These and other embodiments may be used in a chip multiprocessor having a plurality of processor cores and associated private caches, together with a shared cache. Some embodiments provide rapid resolution of data inconsistencies, reduced latency when reading data from the shared cache to a core, and/or reduced memory-access latency in private-cache-to-private-cache transfer situations.
Fig. 1 illustrates one embodiment of a processing system including a snoop bus 130. In some embodiments, the processing system of Fig. 1 may be a chip multiprocessor. A chip multiprocessor is a single integrated circuit or single module that includes a plurality of processing cores. Each processing core is a processor that executes instructions. The embodiment of Fig. 1 includes N processing cores, cores 100-1 through 100-N. Each processing core 100-1 through 100-N includes an associated private cache coupled to it, core caches 110-1 through 110-N respectively.
A "private" cache is a cache associated with one or more processor cores, to which the associated processor core(s) normally present memory transactions for cacheable memory, and to which other unrelated processors or processor cores normally present cycles only as inquiry or snoop cycles. Typically, as is the case in the embodiment shown in Fig. 1, a single private cache is associated with a single core. In some embodiments, however, a cache may be private to a plurality of cores.
As shown in Fig. 1, the core caches may be level-M caches, meaning that these caches need not be first-level caches, but rather may be first-level caches or caches at a higher level of the cache hierarchy, such as second-level caches, third-level caches, and so on. Moreover, a core cache may be a unified cache handling both instructions and data, or may be an instruction cache or a data cache alone. Furthermore, two separate caches (e.g., instruction and data caches) may be treated in the same manner. Thus, the terms data, data element, information, and information element are used interchangeably to denote instructions or data.
Each cache 110-1 through 110-N is coupled to the snoop bus 130. In addition, a shared cache 120 (a level M+1 cache) is coupled to the snoop bus 130. The shared cache 120 is also coupled to an interconnect 140. The interconnect 140 may be a multi-drop bus or may be a point-to-point interface for connecting with other processors, memories, memory interfaces, or other agents. Similarly, either a point-to-point interconnect or a multi-drop bus-type interconnect may be used for the snoop bus 130 in different embodiments.
In one embodiment, the caches 110-1 through 110-N and the shared cache 120 perform cross-interrogation, or snoop operations, on the snoop bus 130 in order to maintain data coherency. This dedicated bus or interface for snooping can relieve congestion on the interconnect 140. In one embodiment, the snoop bus 130 is an on-die bus fabricated on the die of the chip multiprocessor. Cross-interrogation can therefore be performed at a high frequency, which in some cases may be as high as the core operating frequency. Moreover, external accesses can be limited to those requests that cannot be satisfied among the cores and the shared cache on the die. In one embodiment, the caches may use the traditional MESI (Modified, Exclusive, Shared, Invalid) protocol or other known or otherwise available protocols. In other embodiments, as discussed further below, a different multi-state protocol may be used for the shared cache.
Fig. 2 illustrates an embodiment of a system including a chip multiprocessor that has a snoop bus and implements a shared cache coherency protocol. In the embodiment shown in Fig. 2, a processor 200 is coupled to a memory controller 240 through interface logic 230. The memory controller 240 is coupled to a memory 250. The memory 250 may store data to be used by the processor 200 and cached in the various caches.
The processor 200 includes the interface logic 230 to transmit information to the memory controller 240 (or some other bus agent). A variety of known or otherwise available types of interface logic may be used to couple the various components together with interconnects. The processor 200 further includes a first core 205-1 and a second core 205-2. The first core has an associated core cache 210-1, and the second core has an associated core cache 210-2. In this embodiment, the core caches operate according to the MESI protocol, although in other embodiments they may operate with other protocols. The core caches 210-1 and 210-2 have control logic 215-1 and 215-2, respectively, to maintain coherency and to handle conventional cache control and communication tasks. Both caches 210-1 and 210-2 are coupled to a shared cache 220, which also includes its own control logic 225. The caches 210-1, 210-2, and 220 are coupled to a snoop bus 227 to perform cross-interrogation and data exchange as previously described with respect to Fig. 1.
In this embodiment, the shared cache 220 operates according to a shared cache protocol. Under the shared cache protocol, a cache line may be described using two modified states. A modified stale (MStale) state may be used to represent the situation in which the shared cache 220 has a stale copy of a cache line. The shared cache often has a stale copy of a modified line when one of the processor caches 210-1 or 210-2 has modified the line. A modified most-recent-copy (MMRC) state may be used to represent the situation in which the shared cache contains the most recent copy of a modified line. A shared state indicates the situation in which a plurality of caches may hold the data without it being modified in any of them, and an invalid state simply indicates that the data is not valid. Thus, compared with the traditional MESI protocol, the Exclusive (E) state is eliminated in this embodiment, and the M state is split into two different M states.
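As an informal aid (not part of the patent text), the four shared-cache protocol states, and the key property distinguishing the two modified states, can be sketched in a few lines of Python; all identifiers here are invented for illustration:

```python
from enum import Enum

class L2State(Enum):
    """The four shared-cache states of the protocol described above."""
    MSTALE = "modified stale"            # line modified; some L1 holds the most recent copy
    MMRC = "modified most recent copy"   # line modified; the shared cache holds the most recent copy
    SHARED = "shared"                    # unmodified; may reside in several caches
    INVALID = "invalid"                  # no valid data

def l2_has_freshest_copy(state: L2State) -> bool:
    """True when the shared cache can supply data without consulting any L1."""
    return state in (L2State.MMRC, L2State.SHARED)
```

The predicate captures the practical payoff of splitting M: in the MMRC (and S) states the shared cache can answer requests directly, while MStale signals that a private cache must be snooped first.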
Table 1: Shared cache protocol states

  State                     | Abbr.  | Meaning
  Modified stale            | MStale | The cache line has been modified, and the shared cache holds a stale copy of the line; another L1 cache holds the most recent copy.
  Modified most recent copy | MMRC   | The cache line has been modified, and the shared cache holds the most recent copy, reflecting the modification.
  Shared                    | S      | The cache line may be stored in multiple caches, but is not modified in any of them; memory is also up to date.
  Invalid                   | I      | The cache line does not contain valid data.
It will be appreciated that cache lines may be maintained in, or marked as being in, a particular state in a variety of ways. For example, flag bits may be used to represent the different states. Such flag or state bits may be stored with each cache line, or may be stored separately in a separate memory array. Moreover, the protocol state bits may be combined with other information about the cache line and stored in encoded or other forms. The final handling of a cache line may also be affected by other bits or settings, such as memory typing registers and the like.
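For illustration only, here is one hypothetical way to encode the four protocol states in two flag bits packed alongside each tag-array entry; the patent does not prescribe any particular encoding, and all names below are invented:

```python
# Hypothetical 2-bit encoding of the four shared-cache states.
STATE_BITS = {"I": 0b00, "S": 0b01, "MStale": 0b10, "MMRC": 0b11}
BITS_TO_STATE = {bits: name for name, bits in STATE_BITS.items()}

def pack_tag_entry(tag: int, state: str) -> int:
    """Pack a line's tag and protocol-state bits into one tag-array entry."""
    return (tag << 2) | STATE_BITS[state]

def unpack_tag_entry(entry: int) -> tuple:
    """Recover (tag, state name) from a packed tag-array entry."""
    return entry >> 2, BITS_TO_STATE[entry & 0b11]
```

A real design might instead hold the state bits in a separate array, as the text notes; the packing above merely shows the "stored with each cache line" option.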
Fig. 3a illustrates a first group of state transitions of the shared and private caches under a first group of conditions according to one embodiment. In the situations depicted in Fig. 3a, in a first case, a cache line in an L1 cache begins in the invalid state and the cache line corresponding to the same memory location in the L2 cache begins in the modified most-recent-copy state, as shown in block 300. In a second case, the cache line in the L1 begins in either the invalid or the modified state, and the corresponding line in the L2 begins in the modified stale state, as shown in block 301. As shown in block 302, in both cases a core performs a write to the cache line.
As shown in block 305, in the first case a miss occurs in the L1, and in one embodiment a write-allocate policy is used, so that the L1 obtains the cache line (from the L2) and holds it in the modified state. Because in this case the cache line was stored in the L2 in the modified most-recent-copy state, the L2 hits, and the state of the line in the L2 is changed to the modified stale state. For example, a read-for-ownership cycle to the L2 may supply the data to the L1 and trigger the transition in the L2 from the modified most-recent-copy state to the modified stale state. As shown in block 322, this case leaves the L1 cache holding its line in the modified state and the L2 cache holding its line in the modified stale state, because the L1 has the most recently modified data. Under the write-back protocol observed by the L1, the L1 will not transfer the modified data to the L2 or the rest of the memory hierarchy until the line is evicted or a snoop hits the line.
In the second case, the core write to the L1 in block 302 causes the L1 to modify the data as shown in block 310, while the L2 keeps the cache line in the modified stale state. If the line in the L1 was in the invalid state, it transitions to the modified state; if it was already in the modified state, it remains in the modified state, as shown in block 310. In some embodiments, the L2 may not be notified that a write to the line occurred in the L1, and accordingly the L2 takes no action at all, with the result that the line remains in the modified stale state. Thus, the resulting states of the line in the L1 and L2 are the modified state and the modified stale state, respectively, as shown in block 322.
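The two write cases of Fig. 3a can be summarized in a toy transition function; the write-allocate behavior and state names follow the description above, and everything else is an illustrative assumption:

```python
def core_write(l1_state: str, l2_state: str) -> tuple:
    """Resulting (L1, L2) states after a core write, per the Fig. 3a cases."""
    if l1_state == "I" and l2_state == "MMRC":
        # Case 1: L1 miss with write-allocate. A read-for-ownership pulls the
        # line from the L2, which demotes its copy from most-recent to stale.
        return "M", "MStale"
    if l1_state in ("I", "M") and l2_state == "MStale":
        # Case 2: the L1 modifies (or re-modifies) locally; the L2 is not
        # notified and its copy stays stale.
        return "M", "MStale"
    raise ValueError("combination not covered by Fig. 3a")
```

Both cases converge on the same end state of block 322: the writing L1 holds the line in M, and the L2 holds a stale copy.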
Fig. 3b illustrates a second group of state transitions under a second group of conditions according to one embodiment. As shown in block 330, the cache line in the L1 cache begins in the modified (dirty) state, and the line corresponding to the same memory location in the L2 cache begins in the modified stale state. In this embodiment, when the L1 evicts the line (block 332), an L2 hit occurs (block 334), and the final cache states are as shown in block 336. Specifically, this situation arises when an L1 cache line is evicted and the L2 obtains the most recent copy from the L1. Thus, the line in the L1 is set to the invalid state, and the line in the L2 transitions to the modified most-recent-copy state.
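The eviction case of Fig. 3b reduces to a single transition, sketched here under the same illustrative conventions as above:

```python
def l1_evict(l1_state: str, l2_state: str) -> tuple:
    """Fig. 3b: an L1 eviction writes the dirty line back to the L2,
    which then holds the most recent copy (MMRC)."""
    if l1_state == "M" and l2_state == "MStale":
        return "I", "MMRC"
    raise ValueError("combination not covered by Fig. 3b")
```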
Fig. 3c illustrates a third group of state transitions under a third group of conditions according to one embodiment. As shown in block 340, the cache line begins in the modified state in one of the L1 caches and in the invalid state in every other L1 cache, and the line corresponding to the same memory location in the L2 cache begins in the modified stale state. Next, as shown in block 342, an L2 replacement occurs. The L2 replacement causes data to be evicted from the L2 and written back to the memory 250. As a result of this write-back, the L2 cache entry is invalidated. Of course, if the entry is being evicted to make room for a new line, the entry may become valid again later. Because the L2 data begins in the modified stale state, as shown in block 340, it follows from the L1 MESI protocol that one of the L1 caches contains the line in the modified state. Therefore, as shown in block 344, a snoop of the L1 caches is performed to extract the modified data, and if the modified data is present in any L1 cache, the line there is invalidated. The line in the L1 that held it in the modified state is thus invalidated. Accordingly, as shown in block 344, the states of the line in both the L1 and L2 caches become invalid, and the data received via the snoop of the L1 caches can be written back, as shown in block 345.
Fig. 3d illustrates a situation similar to Fig. 3c, except that the state of the line in the L2 cache is initially the modified most-recent-copy state, as shown in block 346 (the state in the L1 caches is invalid because the L2 has the most recent copy). As discussed with respect to block 342 of Fig. 3c, an L2 replacement occurs, as shown in block 348. As a result of the L2 replacement in block 348, the states of the line in both the L1 and L2 caches again become invalid, as shown in block 349. In this case, however, because the line in the L2 cache was initially in the modified most-recent-copy state, the L2 contains the most recent copy and need not make any inquiry of the L1 caches. Thus, as shown in block 349, no snoop of the L1 caches is performed, and as shown in block 350, the data from the L2 is written back to memory.
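The contrast between Figs. 3c and 3d — whether an L2 replacement must back-snoop the L1 caches before writing the victim line to memory — can be captured as follows (an illustrative sketch, not the patent's logic; the dictionary fields are invented names):

```python
def l2_replace(l2_state: str) -> dict:
    """Figs. 3c/3d: actions taken when the shared cache replaces a modified line."""
    if l2_state == "MStale":
        # Fig. 3c: some L1 holds the freshest data, so the L1s must be
        # snooped and the data they supply is what gets written back.
        return {"snoop_l1": True, "writeback_source": "L1"}
    if l2_state == "MMRC":
        # Fig. 3d: the L2 itself holds the freshest data; no L1 snoop needed.
        return {"snoop_l1": False, "writeback_source": "L2"}
    raise ValueError("state not covered by Figs. 3c/3d")
```

This is the payoff of the MStale/MMRC split: the replacement logic can tell from the L2 state alone whether the costly backward inquiry is necessary.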
In the situation shown in Fig. 3c, the snoop is performed because the modified line resides in an L1 cache. In the situation shown in Fig. 3d, no snoop of the L1 caches is performed because the shared L2 cache contains the most recent copy of the data. In some embodiments, under the circumstances shown in Fig. 3d (i.e., the L1 state is invalid and the L2 state is modified most-recent-copy), it may nevertheless be desirable to run invalidation cycles to the L1 caches associated with processors other than the requesting processor. For example, if a processor core can perform speculative loads and maintains load buffers, data may be held in the load buffers in modified form. Moreover, that data may be data that was evicted from the L1 cache. Subsequently, if a different processor attempts to write the cache line, it may experience a miss (the line in its L1 is invalid and the L2 state is modified most-recent-copy) and attempt to obtain ownership of the line to perform the requested write. Despite the fact that the L2 cache contains the data in the most recently modified state, there is no guarantee that the data no longer appears in a speculative load buffer or other buffering structure. Therefore, in some embodiments, it may be useful to generate invalidating snoop cycles to the other L1 caches.
Fig. 3e illustrates, according to one embodiment, the handling of an external invalidating snoop when the L2 contains the implicated cache line in the modified stale state. In some embodiments, other snoop operations may be handled similarly, and the invalidating-snoop example is provided as representative of the broad class of state-changing snoop operations. The external invalidating snoop may be received by the interface logic 240, and may originate from another bus agent (not shown), such as another processor, an input/output (I/O) agent, a direct memory access (DMA) agent, or the like. The external invalidating snoop indicates an address, and the line implicated by the snoop is the line corresponding to that address. As shown in block 351, the state of the cache line is the modified state in one L1 cache and the invalid state in the other L1 caches, and the state of the L2 line is the modified stale state. In block 352, the external invalidating snoop is received by the L2. As a result of the external invalidating snoop, the states of the line in the L1 and L2 caches are set to invalid, as shown in block 354. Because the L2 contains only stale data, under the MESI protocol of the L1 caches one of the L1 caches contains the modified data. Therefore, also as shown in block 354, a snoop cycle is run to the L1 caches. As shown in block 356, the modified data is supplied by one of the L1 caches to satisfy the external snoop.
For other types of snoop operations, the state of the cache line may be set to other states as a result of the snoop. For example, a snoop operation may direct a cache to change the state of a line to the shared (S) state, effectively commanding the cache to maintain the data in the line in the shared state. In response to a snoop that forces a transition to the shared state, the transition shown in block 354 would be to the shared state rather than to the invalid state.
In addition, other embodiments may implement a centralized cache directory to reduce the need to snoop individual L1 caches. For example, a cache directory for the shared cache may also track the states of the cache lines in the other L1 caches. In other embodiments, cache directories may be maintained for the different L1 caches. Using such a cache directory eliminates the need to run individual snoop cycles to every L1 cache. Instead, the cache directory indicates which L1 cache(s) should receive the invalidating snoop.
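A directory of this kind might be modeled, purely for illustration, as a map from line addresses to the set of L1 caches recorded as possible holders; without a directory, an invalidation must be broadcast to every L1:

```python
from typing import Optional

def targets_for_invalidate(directory: Optional[dict], addr: int, n_l1: int) -> set:
    """Return the set of L1 cache IDs that must receive an invalidating snoop.

    With a directory, only the recorded holders are snooped; with no
    directory (None), every L1 must be snooped (broadcast).
    """
    if directory is None:
        return set(range(n_l1))          # broadcast to all L1 caches
    return set(directory.get(addr, ()))  # targeted snoop of recorded holders only
```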
Fig. 3f illustrates the handling of an external invalidating snoop in a situation similar to Fig. 3e, except that the L2 contains the implicated cache line in the modified most-recent-copy state. As shown in block 360, the state of the line in the L1 caches is invalid, and the state of the line in the L2 cache is the modified most-recent-copy state. In block 362, the external invalidating snoop is received by the L2. As a result of the external invalidating snoop, the states of the line in the L1 and L2 caches are set to invalid, as shown in block 364. Because the L2 contains the most recent data, the external snoop is satisfied by returning the most recent, modified data from the L2 cache in response to the snoop request, as shown in block 366. Therefore, no snoop cycle to the L1 caches is needed to return data before satisfying the external snoop cycle. Moreover, other types of snoops (besides invalidating snoops) may be handled similarly, as described above.
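The external-snoop handling of Figs. 3e and 3f can likewise be sketched; the dictionary fields are invented names for the actions described above, and the final states follow blocks 354 and 364:

```python
def external_invalidate(l2_state: str) -> dict:
    """Figs. 3e/3f: servicing an external invalidating snoop at the shared cache."""
    if l2_state == "MStale":
        # Fig. 3e: an L1 has the freshest copy, so the L1s are snooped and
        # one of them supplies the data; all copies end up invalid.
        return {"snoop_l1": True, "data_from": "L1", "final_l1_l2": ("I", "I")}
    if l2_state == "MMRC":
        # Fig. 3f: the L2 supplies the data directly; no L1 snoop cycle needed.
        return {"snoop_l1": False, "data_from": "L2", "final_l1_l2": ("I", "I")}
    raise ValueError("state not covered by Figs. 3e/3f")
```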
Table 2: Overview of the M states

  Figure | Event                                               | Initial (L1, L2)       | Final (L1, L2) | Meaning / action
  3a     | Core write: L1 hit, or L1 goes I to M with an L2 hit | I or M, MMRC or MStale | M, MStale      | An L1 holds the MRC
  3b     | L1 replacement with L2 hit                           | M, MStale              | I, MMRC        | The L2 holds the MRC
  3c     | L2 replacement and write-back                        | I or M, MStale         | I, I           | One of the L1s holds the MRC; logic performs a backward inquiry to all L1s
  3d     | L2 replacement and write-back                        | I, MMRC                | I, I           | No L1 holds the MRC; no backward inquiry is needed
  3e     | External invalidating snoop                          | I or M, MStale         | I, I           | One of the L1s holds the MRC; logic sends a snoop to all L1s (or consults a directory)
  3f     | External invalidating snoop                          | I, MMRC                | I, I           | The L2 holds the MRC; no need to snoop the L1s

  (MRC = most recent copy.)
Fig. 4 illustrates another embodiment of a system including a chip multiprocessor that implements a plurality of private caches according to a multi-state protocol and implements a shared cache according to a multi-state shared cache protocol. The embodiment of Fig. 4 illustrates that a processor 400-1 may, in some embodiments, not include an internal snoop bus. Moreover, the embodiment of Fig. 4 also illustrates that a plurality of chip multiprocessors 400-1 through 400-4 may be coupled together to form a multi-chip multiprocessor system. In one embodiment, the four processors 400-1 through 400-4 may be identical. The four processors 400-1 through 400-4 are shown coupled by a multi-drop bus 401; however, point-to-point or other interconnect techniques may also be used. Moreover, additional components such as memories and I/O devices are typically also present in such a system.
Processor 400-1 includes interface logic 430 to communicate over the provided interconnect (in the exemplary embodiment, the multi-drop bus 401). Processor 400-1 includes N cores 405-1 through 405-N, each of the N cores having its own private cache, respectively 410-1 through 410-N, with each private cache coupled to the interface logic 430. A higher-level shared cache 420 is also provided, and this shared cache 420 is coupled to the interface logic 430. The various caches may support the cache protocols described above. In this embodiment, the interface logic between the processor and the interconnect participates in coherence checking to a greater degree than in embodiments that employ an internal snoop bus. In this embodiment, all cache-to-cache snoop cycles are processed through the interface logic 430, as are cycles out to the multi-drop bus 401 or another suitable (e.g., point-to-point) interconnect.
Fig. 5 illustrates additional features used in some embodiments. In the embodiment shown in Fig. 5, a core 500-N of the chip multiprocessor may include line sharing logic 517. As shown, the core 500-N may include a cache 510 (for example, a MESI-protocol cache as described above). The cache 510 includes control logic 515, part of which is the line sharing logic 517. The cache 510 may be coupled to other caches and to a higher-level shared cache by a snoop interconnect 522. In addition, the cache may be coupled to an external interface by a second interconnect 524.
The line sharing logic 517 can provide a performance improvement by sending useful information to other caches. The presence of an internal snoop interconnect 522, which can be dedicated to handling snoop traffic and data exchange, provides more bandwidth for cache-to-cache transfers; such cache-to-cache transfers might make less than ideal use of system bandwidth in a system that uses a snoop bus or interconnect for the dual purpose of also retrieving data from higher levels of the memory hierarchy. The line sharing logic 517 causes the control logic 515 to write cache lines from the L1 cache back to other caches in situations other than just an eviction. That is, the line sharing logic 517 provides cache lines to other caches, where the cache line (rather than only a just-evicted line) is then maintained, at least temporarily, in the shared cache.
In one embodiment, the line sharing logic 517 provides data to satisfy snoop cycles on the snoop interconnect 522. That is, if the cache detects a snoop cycle for which it can supply valid data, the cache responds to the snoop cycle by providing that data (as opposed to waiting for a higher level of the memory hierarchy to provide it). In a further embodiment, the L1 cache operates only as a write-through cache. In another embodiment, the line sharing logic 517 opportunistically writes data back to other caches when bandwidth permits. In yet another embodiment, the line sharing logic can be configured to write lines back to various combinations of other caches and/or memory, under various conditions and/or in response to snoop bus congestion. In one embodiment, each private cache includes line sharing logic 517.
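The snoop-response behavior described above can be sketched as follows. This is a hypothetical model (the class and field names are invented for illustration); it shows only the decision of whether a cache answers a snoop itself rather than deferring to a higher level of the memory hierarchy.

```python
# Hypothetical sketch of a private cache answering a snoop cycle on the
# internal snoop interconnect when it holds valid data, as described for
# the line sharing logic. Names are illustrative, not from the source.

class PrivateCache:
    def __init__(self):
        # address -> (MESI state, data); states: 'M', 'E', 'S', 'I'
        self.lines = {}

    def on_snoop(self, address):
        """Respond to a snoop cycle: supply data if a valid copy is held.

        Returning the data directly lets the requester avoid waiting for
        the shared cache or memory to provide it.
        """
        state, data = self.lines.get(address, ("I", None))
        if state in ("M", "E", "S"):
            return data   # cache-to-cache transfer over the snoop interconnect
        return None       # no valid copy: a higher level must respond

cache = PrivateCache()
cache.lines[0x1000] = ("M", 42)
assert cache.on_snoop(0x1000) == 42   # valid copy: this cache answers the snoop
assert cache.on_snoop(0x2000) is None # miss: defer to a higher level
```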
Active transfer of data between cores can be beneficial in some cases by reducing memory access latency and bandwidth consumption. For example, it is possible for a core to access both its private cache and the shared L2 cache and miss in both, despite the fact that a different core holds the needed data. It is therefore desirable to transfer data between cores, so as to avoid having to provide the data by generating cycles on the external interface or by accessing the L2 cache.
Various traditional read and write policies may be used in various embodiments. For example, in some cases a write-allocate policy may be used, so that when a write to a line occurs, the cache line is written into the cache. Moreover, a read from a lower-level cache may or may not cause the higher-level cache to be filled (that is, the higher-level cache may or may not be inclusive in certain embodiments).
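As a rough illustration of the write-allocate policy mentioned above (the names and structure are invented, and a real cache would fetch the whole line and merge the write), a write miss either allocates the line in the cache or bypasses the cache to the next level:

```python
# Illustrative sketch of write-allocate vs. no-write-allocate on a write
# miss. Simplified: a real write-allocate cache fetches the full line from
# the next level before merging the write.

class Cache:
    def __init__(self, write_allocate, backing):
        self.write_allocate = write_allocate
        self.backing = backing   # next level of the hierarchy: address -> value
        self.lines = {}          # cached lines: address -> value

    def write(self, address, value):
        if address in self.lines:
            self.lines[address] = value      # write hit: update the cached line
        elif self.write_allocate:
            self.lines[address] = value      # write miss: allocate the line
        else:
            self.backing[address] = value    # write miss: bypass to next level

backing = {}
wa = Cache(write_allocate=True, backing=backing)
wa.write(0x40, 7)
assert 0x40 in wa.lines          # line was allocated in the cache

nwa = Cache(write_allocate=False, backing=backing)
nwa.write(0x80, 9)
assert 0x80 not in nwa.lines     # no allocation; write went to the next level
assert backing[0x80] == 9
```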
In addition, a design may go through various stages, from creation to simulation to fabrication. Data representing a design may represent the design in a number of ways. First, as is useful in simulations, the hardware may be represented using a hardware description language or another functional description language. Additionally, a circuit-level model with logic and/or transistor gates may be produced at some stages of the design process. Furthermore, most designs, at some stage, reach a level of data representing the physical placement of various devices in the hardware model. Where conventional semiconductor fabrication techniques are used, the data representing the hardware model may be the data specifying the presence or absence of various features on different mask layers for the masks used to produce the integrated circuit. In any representation of the design, the data may be stored in any form of machine-readable medium. A modulated or otherwise generated light wave or electrical wave used to transmit such information, a memory, or a magnetic or optical storage medium such as a disk, are all machine-readable media. Any of these media may "carry" the design information.
Thus, techniques for a method and apparatus for shared cache coherency in a chip multiprocessor or multiprocessor system are disclosed. While certain exemplary embodiments have been described and shown in the accompanying drawings, it is to be understood that such embodiments are merely illustrative of, and not restrictive on, the broad invention, and that this invention is not limited to the specific constructions and arrangements shown and described, since various other modifications may occur to those skilled in the art upon studying this disclosure.

Claims (22)

1. An apparatus comprising:
a plurality of processor cores, each processor core comprising a private cache to implement a first multi-state cache protocol; and
a shared cache to implement a second multi-state cache protocol, the second multi-state cache protocol being different from said first multi-state cache protocol.
2. The apparatus of claim 1, wherein each private cache is to implement the first multi-state cache protocol to allow modified data to remain in the private cache until a snoop cycle hits an address associated with said modified data.
3. The apparatus of claim 1, wherein said second multi-state cache protocol comprises a plurality of states, said plurality of states comprising:
a first modified state in which said shared cache has a stale copy of a line modified in the private cache of one processor core of said plurality of processor cores;
a second modified state in which said shared cache has the most recent copy of a modified line.
4. The apparatus of claim 3, wherein said plurality of states further comprises:
a shared state in which said shared cache stores a line that is also stored by other caches;
an invalid state, wherein each cache line in said cache is stored in one of the first modified state, the second modified state, the shared state, or the invalid state.
5. The apparatus of claim 3, wherein a write by a first processor core of said plurality of processor cores to a cache line corresponding to a first address is to set said cache line in a first private cache associated with said first processor core to a modified state, and is to set a shared cache line corresponding to said first address to the first modified state of the shared cache multi-state cache protocol.
6. The apparatus of claim 3, wherein an eviction by a first processor core of said plurality of processor cores of a cache line corresponding to a first address is to write said cache line back to said shared cache, to set said cache line to an invalid state, and to set a shared cache line corresponding to the first address to the second modified state of the shared cache multi-state cache protocol.
7. The apparatus of claim 4, wherein said second multi-state cache protocol is a MESI protocol, and wherein said plurality of states consists of the first modified state, the second modified state, the shared state, and the invalid state.
8. The apparatus of claim 3, wherein each private cache further comprises:
line sharing logic to provide a cache line to other private caches in response to a snoop cycle.
9. The apparatus of claim 1, wherein said apparatus is embodied in machine-readable data carried on a machine-readable medium.
10. An apparatus comprising:
a plurality of processor cores;
a plurality of caches, a first cache of said plurality of caches to maintain a plurality of cache lines in one of a plurality of states, said plurality of states comprising:
a first modified state to indicate the most recent copy of a line modified in a second cache of said plurality of caches;
a second modified state to indicate a stale copy of a modified line.
11. The apparatus of claim 10, wherein said plurality of states further comprises a shared state and an invalid state, and wherein each cache line has an associated cache state entry to indicate one of the first modified state, the second modified state, the shared state, or the invalid state.
12. The apparatus of claim 10, further comprising:
an internal coherency bus coupled to each cache of said plurality of caches, for communicating between said plurality of caches.
13. The apparatus of claim 11, wherein a plurality of private caches of said plurality of caches maintain cache lines according to a second protocol, said second protocol comprising a second plurality of states, said second protocol comprising a MESI protocol.
14. The apparatus of claim 13, wherein said first cache of said plurality of caches is a shared level N+1 cache.
15. A method comprising:
driving a snoop cycle on an internal snoop bus to a plurality of internal private caches and a shared cache, said shared cache being maintained according to a first four-state cache protocol, wherein said first four-state cache protocol comprises a modified most-recent-copy state, a modified stale state, a shared state, and an invalid state;
driving a cache miss cycle to a memory via an external bus.
16. The method of claim 15, further comprising:
maintaining a second four-state cache protocol for said plurality of internal private caches, wherein said second four-state cache protocol comprises a modified state, an exclusive state, a shared state, and an invalid state.
17. The method of claim 15, further comprising:
transitioning a first cache line to a first modified state in which the first cache line comprises a stale copy of the cache line and another cache comprises a modified copy of the information associated with said first cache line.
18. The method of claim 17, further comprising:
transitioning a second cache line to a second modified state in which said second cache line comprises the most recent copy of the information associated with said cache line.
19. The method of claim 15, further comprising: sharing lines between said plurality of internal private caches.
20. A system comprising:
a multi-core processor, said multi-core processor comprising:
a plurality of processors;
a plurality of associated caches;
a shared cache;
an internal snoop bus coupled to said plurality of caches to allow data transfers between said plurality of caches in response to a snoop cycle on the internal snoop bus;
coherency logic to maintain a plurality of cache lines in one of a plurality of states, said plurality of states comprising a modified most-recent-copy state and a modified stale state, said coherency logic to maintain a first data item having a corresponding first address in said modified stale state, and to maintain a second data item having a corresponding second address in said modified most-recent-copy state;
a memory coupled to the multi-core processor, said memory to store a third data item at said first address and a fourth data item at said second address, said memory to be selectively updated to copy said first data item to said first address.
21. The system of claim 20, wherein said system is a server computer system.
22. The system of claim 20, further comprising: logic to share lines between said plurality of associated caches in response to a snoop cycle on the internal snoop bus.
CNB2003101198312A 2003-10-23 2003-10-23 Method and equipment for maintenance of sharing consistency of cache memory Expired - Fee Related CN1320464C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2003101198312A CN1320464C (en) 2003-10-23 2003-10-23 Method and equipment for maintenance of sharing consistency of cache memory

Publications (2)

Publication Number Publication Date
CN1609823A CN1609823A (en) 2005-04-27
CN1320464C true CN1320464C (en) 2007-06-06

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101677384B (en) * 2008-09-19 2011-03-23 鸿富锦精密工业(深圳)有限公司 Data storage management system and method
EP4006734A1 (en) * 2020-11-25 2022-06-01 ARM Limited Methods and apparatus for transferring data within hierarchical cache circuitry

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1278625A (en) * 1999-06-18 2001-01-03 国际商业机器公司 Method and system for keping uniforming of cache buffer memory

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20070606

Termination date: 20211023