WO2025085056A1 - Cache snoop rejection - Google Patents
- Publication number
- WO2025085056A1 (PCT/US2023/035319)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- cache
- cache line
- line
- rejection
- determining
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0811—Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0815—Cache consistency protocols
- G06F12/0817—Cache consistency protocols using directory methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0815—Cache consistency protocols
- G06F12/0831—Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means
- G06F12/0833—Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means in combination with broadcast means (e.g. for invalidation or updating)
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/12—Replacement control
- G06F12/121—Replacement control using replacement algorithms
- G06F12/123—Replacement control using replacement algorithms with age lists, e.g. queue, most recently used [MRU] list or least recently used [LRU] list
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/12—Replacement control
- G06F12/121—Replacement control using replacement algorithms
- G06F12/126—Replacement control using replacement algorithms with special data handling, e.g. priority of data or instructions, handling errors or pinning
- G06F12/127—Replacement control using replacement algorithms with special data handling, e.g. priority of data or instructions, handling errors or pinning using additional replacement algorithms
Definitions
- This specification relates to techniques for performing a snoop rejection technique for cache maintenance.
- a cache is a device that stores data retrieved from memory or data written to memory for one or more different hardware devices in a system.
- the hardware devices can be different components integrated into a system on a chip (SOC).
- Caches are commonly organized into multiple sets having multiple ways.
- the cache can receive a request to evict a cache line and the memory address of the request is used to identify a particular set that includes the cache line. A particular way within the identified set is then selected for eviction and storage of the new cache data.
- Some hierarchical cache systems maintain inclusive caches.
- L2 cache can be an inclusive L2 cache.
- each cache line present in the L1 cache is also present and maintained in the L2 cache.
- when the system evicts a cache line from the L2 cache, the corresponding cache line in the L1 cache must also be evicted by performing a back invalidation procedure.
- performing the back invalidation procedure to evict the cache line in the L1 cache can result in increased overhead because the system must determine whether an evicted cache line is present in the L1 cache in order for the L2 cache to remain inclusive to the L1 cache. This can result in increased communication overhead within the hierarchical cache system.
- the techniques described in this specification allow a system to perform a cache maintenance process.
- the system performs the cache maintenance process for a cache hierarchy of multiple caches, including a level 1 (L1) cache, a level 2 (L2) cache, and a level 3 (L3) cache.
- the system performs the cache maintenance process for the L1 cache and for an inclusive L2 cache, which includes each cache line present in the L1 cache.
- the system can determine that the cache line in the inclusive L2 cache should be evicted, and the system can determine whether the L1 cache must perform a back invalidation procedure by evicting the cache line from the L1 cache. For example, the system can send a snoop request to the L1 cache from the L2 cache, where the snoop request is a request for the L1 cache to determine whether a particular cache line is present in the L1 cache. If the L1 cache determines that the particular cache line is present, the system can determine whether to transmit a rejection message to the L2 cache from the L1 cache based on certain rejection criteria. The rejection message includes an indication that the L1 cache rejects the eviction of the cache line.
- when an inclusive L2 cache initiates eviction of a cache line and an L1 cache does not want to evict the cache line, the caches must perform additional steps to communicate so that the L2 cache maintains the cache line as well.
- performing this increased cache maintenance to ensure that the cache lines present in the L1 cache are also present in the inclusive L2 cache results in increased overhead in the system.
- the systems described in this specification can determine whether to evict the cache line based on a set of rejection criteria, which allows the system to refrain from evicting the cache line on a case-by-case basis. This can increase efficiency and reduce the communication overhead between the L1 cache and the L2 cache required to maintain inclusivity.
- the L1 and L2 caches can exchange messages to determine whether a line present in L2 is also in L1.
- the L1 cache can, for example, send a message to the inclusive L2 cache to ensure that the inclusive L2 cache refrains from evicting the one or more cache lines. For example, if the system uses the cache line in the L1 cache for multiple code executions, the system may determine to refrain from evicting the cache line from the L1 cache. However, the L1 cache must update and send these messages relatively often, which can also increase overhead in the system.
- the systems described in this specification can use the set of rejection criteria to determine whether to avoid evicting the cache line, where the set of rejection criteria can include whether a cache line was most recently used (e.g., used for one or more instructions).
- the inclusive L2 cache can send a snoop request for a cache line to the L1 cache, and the L1 cache determines whether the cache line is present in the L1 cache.
- the L1 cache can send a message to the inclusive L2 cache indicating whether the cache line is present in the L1 cache, and if the cache line is present, the inclusive L2 cache refrains from evicting the cache line.
- the cache line may remain relatively unused in the L1 cache (e.g., the L1 cache may not receive recent instructions or consistent instructions that include the cache line), and the L1 cache may be maintaining a cache line with relatively low utility, which can also increase overhead in the system due to the unnecessary maintenance of the cache line in the L1 cache.
- the systems described in this specification can use the rejection criteria to determine whether to evict a cache line from the L2 cache, even if the cache line is present in the L1 cache, based on whether the cache line is relatively unused. Accordingly, the L1 cache can perform back invalidation of the relatively unused cache line to maintain inclusivity.
- by using the rejection criteria to evict relatively unused cache lines from the L1 cache and L2 cache, the system can increase efficiency in cache utility among the caches.
- the cache maintenance subsystem has the ability to efficiently perform cache maintenance for the L1 cache and the inclusive L2 cache.
- the cache subsystem can ensure that the cache lines present in the L1 cache are present in the L2 cache, and the cache subsystem can increase efficiency in cache maintenance by evicting certain cache lines based on whether the cache line is in a least recently used position, whether the cache line is in a most recently used position, or whether the cache line has an eviction score that satisfies a threshold.
- FIG. 1 is a block diagram of an example system.
- FIG. 2 is a flow diagram of an example process for performing cache maintenance of multiple caches.
- FIG. 3 is a flow diagram of an example process for determining whether to evict a cache line based on rejection criteria.
- FIG. 1 illustrates an example system 100.
- the system 100 is an example of a system in which the systems, components, and techniques described in this specification can be implemented.
- the system 100 includes a system on chip (SOC) 102 coupled to a memory 124.
- the SOC 102 is an example of a device that can be installed on or integrated into any appropriate computing device.
- the SOC 102 includes a processor 106 and a caching system 104 that caches data between the processor 106 and the memory 124.
- the caching system 104 can store data retrieved from the memory 124 or data to be written to memory for the processor 106 or one or more other hardware devices in the SOC 102.
- the other hardware devices can be different components integrated into the SOC 102.
- the caching system 104 includes three cache levels, including a level 1 (L1) cache 108 and lower-level caches, e.g., an L2 cache 110 and an L3 cache 112.
- Each cache of the caching system 104 can be a set associative structure with a certain number of sets that an address or a cache line can be stored in.
- a set associative cache structure has a certain number of ways within each set and a certain number of address indices.
- a cache of the caching system 104 can be organized into sets, each set having multiple ways, e.g., 2, 4, 8, or 16 ways. Therefore, there are multiple possible locations in a set into which a cache line can be written.
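The set-associative organization described above can be sketched as follows (in Python; the line size, set count, and way count are illustrative assumptions rather than values from the specification):

```python
# Sketch of set-associative address mapping. The line size, set count,
# and way count are illustrative assumptions, not values from the
# specification.
LINE_SIZE = 64     # bytes per cache line
NUM_SETS = 256     # number of sets in the cache
NUM_WAYS = 8       # possible locations (ways) per set

def set_index(address: int) -> int:
    """Map a memory address to the one set it can be stored in."""
    return (address // LINE_SIZE) % NUM_SETS

def tag(address: int) -> int:
    """Bits that identify the line among the ways of its set."""
    return address // (LINE_SIZE * NUM_SETS)

# Addresses one line apart fall into adjacent sets; addresses NUM_SETS
# lines apart collide in the same set and compete for its NUM_WAYS ways.
assert set_index(0x0000) == 0
assert set_index(0x0040) == 1
assert set_index(LINE_SIZE * NUM_SETS) == 0
```

Within the identified set, any of the `NUM_WAYS` ways can hold the line, which is why a victim way must be selected when the set is full.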
- the L2 cache 110 is an inclusive L2 cache, such that each of the cache lines present in the L1 cache 108 is also present in the L2 cache 110, and the L2 cache 110 is larger (e.g., has a larger cache line capacity) than the L1 cache 108.
- the system performs cache maintenance by ensuring that the L2 cache 110 is inclusive of the L1 cache 108.
- the processor 106 can generate memory requests to the caching system 104 associated with memory operations for the memory 124.
- the caching system 104 can receive a cache request 122 for a cache line.
- the caching system 104 can determine whether the cache line is present in the L1 cache 108, and in the case of a cache miss (e.g., the cache line is not present in the L1 cache 108), the caching system can determine whether the cache line is present in the L2 cache 110.
- the L2 cache 110 can look up the index of the address of the cache line included in the cache request 122.
- the L2 cache 110 can map the index to a particular set in the L2 cache 110.
- the caching system 104 can determine to evict a cache line based on the cache request 122. For example, if the cache request 122 includes a request to write a cache line to the caching system 104, the L2 cache 110 can select a cache line to evict in order to be able to write the requested cache line to the L2 cache 110. Based on the L2 cache 110 determining to evict a cache line, the caching system 104 must ensure that the L1 cache 108 and the L2 cache 110 remain symmetrical (e.g., that the L2 cache 110 is inclusive of the cache lines in the L1 cache 108).
- the L1 cache 108 can send a rejection message 116 to the L2 cache 110 rejecting the rejectable snoop request 114, as described in further detail below with reference to FIG. 2.
- the rejection message 116 includes an eviction score corresponding to the cache line, as described in further detail below with reference to FIG. 3.
- L2 cache can send a non-rejectable snoop request 118 to the L1 cache 108.
- the non-rejectable snoop request 118 can include an indication to evict the cache line from the L1 cache 108, and the L1 cache 108 evicts the cache line in order to maintain inclusivity with the L2 cache, as described in further detail below with reference to FIGs. 2 and 3.
- FIG. 2 is a flow diagram of an example process for performing cache maintenance of multiple caches.
- the process 200 will be described as being performed by a system having a hierarchy of caches.
- a caching system e.g., the caching system 104 of FIG. 1, appropriately configured in accordance with this specification, can perform the process 200.
- the L2 cache can perform victimization of the cache line based on certain criteria, e.g., usage statistics.
- the system determines whether the selected L2 victim cache line requires back invalidation (204). In other words, the system can determine whether the cache line selected for eviction in the L2 cache is definitely not present in the L1 cache and that the L1 cache does not have to evict the cache line. The system can determine whether the cache line is present by comparing the bits of the address for each way in a particular set. If the system determines that the cache line is definitely not present in the L1 cache, the system simply evicts the cache line from the L2 cache without instructing the L1 cache to perform a back invalidation process. Alternatively, if the system determines that the cache line may be present in the L1 cache, the system determines that instructing the L1 cache to perform back invalidation may be necessary.
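The back-invalidation check at step 204 can be sketched as follows; the per-set record of tags tracked for the L1 cache is an illustrative assumption, since the specification only states that address bits are compared for each way of a set:

```python
# Sketch of the back-invalidation filter (step 204). Whether a victim is
# "definitely not present" in the L1 cache is decided by comparing the
# victim's tag against the tag of each way tracked for the relevant set.
# The set-of-tags presence structure is an illustrative assumption.
def needs_back_invalidation(l1_set_tags: set, victim_tag: int) -> bool:
    """Return True if the victim line may be present in the L1 cache."""
    return victim_tag in l1_set_tags

# A victim whose tag matches no way in the set is definitely absent, so
# the L2 cache can evict it without involving the L1 cache at all.
assert needs_back_invalidation({0x10, 0x2A}, 0x2A) is True
assert needs_back_invalidation({0x10, 0x2A}, 0x33) is False
```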
- the system instructs the L2 cache to evict the cache line (branch to 206).
- the L2 cache can then evict the selected victim cache line.
- the L2 cache can send a snoop request to the L1 cache (branch to 208).
- the snoop request includes a request to determine whether the cache line is present in the L1 cache.
- the system can determine whether the cache line is present in the L1 cache (210) based on the snoop request. If the system determines that the cache line is not present in the first cache, the system can send a response to the L2 cache (branch to 212) from the L1 cache, where the response indicates that the cache line is not present in L1, and the L2 cache (214) can evict the cache line based on the response.
- the system determines whether the cache line satisfies a set of rejection criteria (branch to 216).
- the rejection criteria are criteria for determining whether the L1 cache can reject the rejectable snoop request and refrain from evicting the cache line, as described in further detail below with reference to FIG. 3.
- the system instructs the L1 cache to evict the cache line and the L1 cache can send a response to the L2 cache (branch to 218), where the response indicates that the L1 cache has evicted the cache line.
- the L2 cache can then evict the cache line (220), and the L2 cache remains inclusive to the L1 cache.
- the system can send a rejection message to the L2 cache from the L1 cache in response to the rejectable snoop request (branch to 222).
- the rejection message includes an eviction score corresponding to the cache line, as described in further detail below with reference to FIG. 3.
- the L2 cache can reinstate the cache line with an updated position based on the rejection message (224).
- the L2 cache can then select another cache line (e.g., a second cache line) to evict, and the system can send a non-rejectable snoop request to the L1 cache (224) from the L2 cache.
- the non-rejectable snoop request includes an instruction to evict the second cache line.
- the L1 cache can evict the second cache line.
- the L2 cache also evicts the second cache line (228).
- the system ensures that the L2 cache is inclusive of the L1 cache, and based on the rejection criteria, the system can improve cache efficiency by ensuring that a relatively critical cache line is not evicted.
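The overall message flow of the process of FIG. 2 can be sketched as follows; the set-based cache model and the `handle_l2_eviction` and `satisfies_rejection_criteria` names are illustrative assumptions, not terms from the specification:

```python
# Sketch of the FIG. 2 message flow. Caches are modeled as plain sets of
# tags; `satisfies_rejection_criteria` stands in for the L1-side check
# described in FIG. 3. All names are illustrative assumptions.
def handle_l2_eviction(l1, l2, victim, satisfies_rejection_criteria):
    """Return the trace of steps taken for one L2 eviction attempt."""
    trace = []
    if victim not in l1:                        # no back invalidation needed
        l2.discard(victim)
        trace.append("l2_evict")
        return trace
    trace.append("rejectable_snoop")            # L2 -> L1 (step 208)
    if satisfies_rejection_criteria(victim):
        trace.append("rejection_message")       # L1 -> L2, line is kept (222)
        second = next(t for t in l2 if t != victim)   # pick another victim
        trace.append("non_rejectable_snoop")    # L2 -> L1, must be obeyed
        l1.discard(second)
        l2.discard(second)
        trace.append("evict_second")            # both evict the second line
    else:
        l1.discard(victim)                      # back invalidation (218)
        l2.discard(victim)
        trace.append("evict_both")              # (220)
    return trace

l1, l2 = {0xA}, {0xA, 0xB}                      # L2 inclusive of L1
steps = handle_l2_eviction(l1, l2, 0xA, lambda t: True)
assert steps == ["rejectable_snoop", "rejection_message",
                 "non_rejectable_snoop", "evict_second"]
assert 0xA in l1 and 0xA in l2                  # critical line preserved
assert l1 <= l2                                 # inclusivity maintained
```

Note how every path leaves `l1 <= l2` true: either both caches evict the same line, or the L2 cache keeps the rejected line and a different line is evicted from both.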
- FIG. 3 is a flow diagram of an example process for determining whether to evict a cache line based on rejection criteria.
- the process 300 will be described as being performed by a system having a hierarchy of caches.
- a caching system e.g., the caching system 104 of FIG. 1, appropriately configured in accordance with this specification, can perform the process 300.
- the system determines that a cache line in the second cache should be evicted (302). For example, the system determines whether the L1 cache must perform back invalidation of the cache line based on a cache request received by the L2 cache.
- the system then sends a rejectable snoop request for the cache line to the first cache (304) from the second cache. For example, the system sends the rejectable snoop request to the L1 cache from the L2 cache based on determining that the L1 cache may be required to perform back invalidation such that the L2 cache remains inclusive to the L1 cache (e.g., that each of the cache lines present in the L1 cache is present in the inclusive L2 cache).
- the system determines whether the cache line in the first cache satisfies the rejection criteria (306).
- the rejection criteria dictate whether the L1 cache can refrain from evicting the cache line based on how often the cache line is used by the system, despite the request from the L2 cache to evict it.
- the system can use a number of types of rejection criteria.
- the rejection criteria can include whether the cache line is in a least recently used position in a set of cache lines.
- the rejection criteria include whether the cache line is in a most recently used position in the set of cache lines. For example, in a four-way set associative cache, one of the cache lines can be in a least recently used position, and one of the cache lines can be in a most recently used position based on how recently the cache line was accessed (e.g., read) by the cache.
- the rejection criteria include whether an eviction score of the cache line satisfies a threshold. For example, each cache line that is added to a cache can be associated with a starting eviction score of 3. Each subsequent time the cache line is accessed, the system can modify the eviction score. For example, the system can decrease the eviction score by an increment of 1. For example, if a cache line with an initial eviction score of 3 is accessed twice, the system can update the eviction score to 1.
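The eviction-score bookkeeping in this example can be sketched as follows; the starting score of 3 and the decrement of 1 follow the example above, while the floor at 0 is an assumption:

```python
# Sketch of the eviction-score bookkeeping. The starting score of 3 and
# the decrement of 1 per access follow the example in the text; clamping
# the score at 0 is an illustrative assumption.
START_SCORE = 3

def on_access(score: int) -> int:
    """Decrease a line's eviction score each time the line is accessed."""
    return max(0, score - 1)

score = START_SCORE
for _ in range(2):          # a line accessed twice...
    score = on_access(score)
assert score == 1           # ...ends with an eviction score of 1
```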
- a cache line might satisfy the rejection criteria if the cache line is not in a least recently used position, if the cache line is in a most recently used position, or if the eviction score of the cache line satisfies a threshold.
- the threshold can be a threshold of 1, where the cache line satisfies the threshold with an eviction score of 0 or 1, and the cache line fails to satisfy the threshold with an eviction score of 2 or 3.
- the cache line can fail to satisfy the rejection criteria if the cache line is in a least recently used position, if the cache line is not in a most recently used position, or if the eviction score of the cache line fails to satisfy the threshold.
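Taken together, the three example criteria can be sketched as a single check, where satisfying any one criterion suffices to reject the snoop; the `Line` structure, the way count, and the threshold of 1 follow the examples above or are illustrative assumptions:

```python
# Sketch of the rejection-criteria check. The `Line` structure, the
# 4-way set, and the LRU-position encoding are illustrative assumptions;
# the threshold of 1 follows the example in the text.
from dataclasses import dataclass

@dataclass
class Line:
    lru_position: int       # 0 = most recently used, NUM_WAYS - 1 = least
    eviction_score: int

NUM_WAYS = 4
SCORE_THRESHOLD = 1         # scores of 0 or 1 satisfy the threshold

def satisfies_rejection_criteria(line: Line) -> bool:
    """Any one criterion suffices for the L1 cache to keep the line."""
    not_lru = line.lru_position != NUM_WAYS - 1
    is_mru = line.lru_position == 0
    score_ok = line.eviction_score <= SCORE_THRESHOLD
    return not_lru or is_mru or score_ok

# A most-recently-used line is kept even with a high eviction score;
# the least-recently-used line with a high score is evicted.
assert satisfies_rejection_criteria(Line(lru_position=0, eviction_score=3))
assert not satisfies_rejection_criteria(Line(lru_position=3, eviction_score=3))
assert satisfies_rejection_criteria(Line(lru_position=3, eviction_score=1))
```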
- the system evicts the cache line from the first cache and sends a response to the second cache (branch to 308).
- the response indicates that the system has evicted the cache line from the first cache.
- the system maintains the cache line in the first cache and provides a rejection message to the second cache (branch to 310). For example, if the cache line is in a most recently used position, the system determines that the L1 cache can refrain from evicting the cache line from the L1 cache, and the system sends a rejection message to the L2 cache from the L1 cache in response to the rejectable snoop request. Accordingly, the L2 cache refrains from evicting the cache line and selects another cache line to evict, as described above.
- the rejection message can include the eviction score for the cache line so that the L2 cache can update its own copy of the cache line with the updated eviction score.
- the system can update the eviction score of the cache line in the second cache to be the same as the eviction score of the cache line in the first cache. As such, the system can determine whether to evict the cache line in the second cache based on the updated eviction score. Overall, by determining whether to transmit a rejection message based on the rejection criteria, the system can efficiently preserve relatively critical cache lines while maintaining symmetry between the LI cache and the inclusive L2 cache.
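The score synchronization described above can be sketched as follows; the dictionary-based model of the L2 cache's per-line eviction scores is an illustrative assumption:

```python
# Sketch of score synchronization on rejection: the rejection message
# carries the L1 line's eviction score, and the L2 cache adopts it for
# its own copy of the line. The dict model is an illustrative assumption.
def on_rejection_message(l2_scores: dict, tag: int, l1_score: int) -> dict:
    """Update the L2 copy of the line with the score from the message."""
    l2_scores[tag] = l1_score
    return l2_scores

l2_scores = {0xA: 3, 0xB: 2}
on_rejection_message(l2_scores, 0xA, 1)   # rejection message carries score 1
assert l2_scores[0xA] == 1                # L2 copy now matches L1
assert l2_scores[0xB] == 2                # other lines are unchanged
```

Keeping the two copies' scores equal lets the L2 cache make its next eviction decision for the line using the same recency information the L1 cache used to reject the snoop.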
- Embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly-embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them.
- Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory storage medium for execution by, or to control the operation of, data processing apparatus.
- the computer storage medium can be a machine- readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
- the program instructions can be encoded on an artificially-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus.
- data processing apparatus refers to data processing hardware and encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers.
- the apparatus can also be, or further include, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
- the apparatus can optionally include, in addition to hardware, code that creates an execution environment for computer programs, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
- a computer program, which may also be referred to or described as a program, software, a software application, an app, a module, a software module, a script, or code, can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages; and it can be deployed in any form, including as a standalone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
- a program may, but need not, correspond to a file in a file system.
- a program can be stored in a portion of a file that holds other programs or data, e.g., one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, e.g., files that store one or more modules, subprograms, or portions of code.
- a computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a data communication network.
- engine is used broadly to refer to a software-based system, subsystem, or process that is programmed to perform one or more specific functions.
- an engine will be implemented as one or more software modules or components, installed on one or more computers in one or more locations. In some cases, one or more computers will be dedicated to a particular engine; in other cases, multiple engines can be installed and running on the same computer or computers.
- the processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output.
- the processes and logic flows can also be performed by special purpose logic circuitry, e.g., an FPGA or an ASIC, or by a combination of special purpose logic circuitry and one or more programmed computers.
- Computers suitable for the execution of a computer program can be based on general or special purpose microprocessors or both, or any other kind of central processing unit.
- a central processing unit will receive instructions and data from a read-only memory or a random access memory or both.
- the essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data.
- the central processing unit and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
- a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks.
- a computer need not have such devices.
- a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device, e.g., a universal serial bus (USB) flash drive, to name just a few.
- Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.
- embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer.
- Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
- a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user’s device in response to requests received from the web browser.
- a computer can interact with a user by sending text messages or other forms of message to a personal device, e.g., a smartphone that is running a messaging application, and receiving responsive messages from the user in return.
- Data processing apparatus for implementing machine learning models can also include, for example, special-purpose hardware accelerator units for processing common and compute-intensive parts of machine learning training or production, i.e., inference, workloads.
- Machine learning models can be implemented and deployed using a machine learning framework, e.g., a TensorFlow framework.
- the computing system can include clients and servers.
- a client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
- a server transmits data, e.g., an HTML page, to a user device, e.g., for purposes of displaying data to and receiving user input from a user interacting with the device, which acts as a client.
- Data generated at the user device e.g., a result of the user interaction, can be received at the server from the device.
- Embodiment 1 is a system comprising: a first cache; a second cache, wherein the second cache is an inclusive cache that maintains cache lines present in the first cache; and a cache maintenance subsystem comprising data processing apparatus configured to perform operations comprising: determining that a first cache line in the second cache should be evicted; sending, to the first cache, a rejectable snoop request for the first cache line; determining that the first cache line in the first cache satisfies one or more rejection criteria; and in response, maintaining the first cache line in the first cache and providing a rejection message to the second cache.
- Embodiment 2 is the system of embodiment 1, wherein the operations further comprise: receiving the rejection message by the second cache; and reinstating the first cache line in the second cache.
- Embodiment 3 is the system of embodiment 2, wherein the operations further comprise evicting, by the second cache, the first cache line before sending the rejectable snoop request.
- Embodiment 4 is the system of any one of embodiments 1-3, wherein determining that the first cache line in the first cache satisfies one or more rejection criteria comprises determining that, among cache lines in a set to which the first cache line belongs, the first cache line is not in a least recently used position.
- Embodiment 5 is the system of any one of embodiments 1-4, wherein determining that the first cache line in the first cache satisfies one or more rejection criteria comprises determining that, among cache lines in a set to which the first cache line belongs, the first cache line is in a most recently used position.
- Embodiment 6 is the system of any one of embodiments 1-5, wherein determining that the first cache line in the first cache satisfies one or more rejection criteria comprises determining that, among cache lines in a set to which the first cache line belongs, the first cache line has an eviction score that satisfies a threshold.
- Embodiment 7 is the system of any one of embodiments 1-6, wherein the operations further comprise sending a rejection message to the second cache that includes the eviction score of the first cache line.
- Embodiment 8 is the system of embodiment 7, wherein the operations further comprise: updating an eviction score of the first cache line in the second cache to be the same as the eviction score of the first cache line included in the rejection message to the second cache.
- Embodiment 9 is the system of any one of embodiments 1-8, wherein the operations further comprise: sending, to the first cache, a second snoop request for a second cache line in the second cache, wherein the second snoop request is a non-rejectable snoop request; and evicting, by the first cache, the second cache line.
- Embodiment 10 is the system of any one of embodiments 1-9, wherein the first cache is an L1 cache, and the second cache is an L2 cache that is larger than and inclusive of the L1 cache.
- Embodiment 11 is a method performed by a processing device having a first cache and a second cache that is an inclusive cache, the method comprising: determining that a first cache line in the second cache should be evicted; sending, to the first cache, a rejectable snoop request for the first cache line; determining that the first cache line in the first cache satisfies one or more rejection criteria; and in response, maintaining the first cache line in the first cache and providing a rejection message to the second cache.
- Embodiment 12 is the method of embodiment 11, further comprising: receiving the rejection message by the second cache; and reinstating the first cache line in the second cache.
- Embodiment 13 is the method of embodiment 12, further comprising evicting, by the second cache, the first cache line before sending the rejectable snoop request.
- Embodiment 14 is the method of any one of embodiments 11-13, wherein determining that the first cache line in the first cache satisfies one or more rejection criteria comprises determining that, among cache lines in a set to which the first cache line belongs, the first cache line is not in a least recently used position.
- Embodiment 15 is the method of any one of embodiments 11-14, wherein determining that the first cache line in the first cache satisfies one or more rejection criteria comprises determining that, among cache lines in a set to which the first cache line belongs, the first cache line is in a most recently used position.
- Embodiment 16 is the method of any one of embodiments 11-15, wherein determining that the first cache line in the first cache satisfies one or more rejection criteria comprises determining that, among cache lines in a set to which the first cache line belongs, the first cache line has an eviction score that satisfies a threshold.
- Embodiment 17 is the method of embodiment 16, further comprising sending a rejection message to the second cache that includes the eviction score of the first cache line.
- Embodiment 18 is the method of embodiment 17, further comprising: updating an eviction score of the first cache line in the second cache to be the same as the eviction score of the first cache line included in the rejection message to the second cache.
- Embodiment 19 is the method of any one of embodiments 11-18, further comprising: sending, to the first cache, a second snoop request for a second cache line in the second cache, wherein the second snoop request is a non-rejectable snoop request; and evicting, by the first cache, the second cache line.
- Embodiment 20 is the method of any one of embodiments 11-19, wherein the first cache is an L1 cache, and the second cache is an L2 cache that is larger than and inclusive of the L1 cache.
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing a cache maintenance process. In one aspect, a system comprises a first cache, a second cache that is an inclusive cache that maintains cache lines present in the first cache, and a cache maintenance subsystem including data processing apparatus. The cache maintenance subsystem can determine that a first cache line in the second cache should be evicted and send a rejectable snoop request for the first cache line to the first cache. The cache maintenance subsystem can determine that the first cache line in the first cache satisfies one or more rejection criteria, and in response, the cache maintenance subsystem can maintain the first cache line in the first cache and provide a rejection message to the second cache.
Description
CACHE SNOOP REJECTION
BACKGROUND
This specification relates to techniques for performing a snoop rejection technique for cache maintenance.
A cache is a device that stores data retrieved from memory or data written to memory for one or more different hardware devices in a system. The hardware devices can be different components integrated into a system on a chip (SOC). Caches are commonly organized into multiple sets having multiple ways. The cache can receive a request to store a new cache line, and the memory address of the request is used to identify a particular set for the cache line. A particular way within the identified set is then selected for eviction and storage of the new cache data.
Some hierarchical cache systems maintain inclusive caches. For example, in a cache hierarchy with multiple caches, such as a level 1 (L1) cache, a level 2 (L2) cache, and a level 3 (L3) cache, the L2 cache can be an inclusive L2 cache. This means that each cache line present in the L1 cache is also present and maintained in the L2 cache. Thus, if the system evicts a cache line from the L2 cache, the corresponding cache line in the L1 cache must also be evicted by performing a back invalidation procedure.
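The inclusivity property and the resulting back invalidation can be illustrated with a small sketch. This is a hypothetical Python model for illustration only; the names and data structures are assumptions, not part of the described apparatus.

```python
# Minimal model of an inclusive two-level cache hierarchy, where each
# cache is represented simply as the set of line addresses it holds.
l1 = {0x100, 0x140}          # cache lines present in the L1 cache
l2 = {0x100, 0x140, 0x180}   # inclusive L2 cache: a superset of the L1 contents

assert l1 <= l2              # inclusivity invariant: every L1 line is also in L2

# Evicting a line from the inclusive L2 cache forces a back invalidation:
# the same line must also be removed from L1 to preserve the invariant.
victim = 0x100
l2.discard(victim)
l1.discard(victim)           # back invalidation

assert l1 <= l2              # the invariant still holds after the eviction
```

The extra step of removing the line from L1 on every L2 eviction is exactly the communication overhead discussed below.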
However, in some cases, performing the back invalidation procedure to evict the cache line in the L1 cache can result in increased overhead because the system must determine whether an evicted cache line is present in the L1 cache in order for the L2 cache to remain inclusive of the L1 cache. This can result in increased communication overhead within the hierarchical cache system.
SUMMARY
The system described in this specification allows a system to perform a cache maintenance process. In this specification, the system performs the cache maintenance process for a cache hierarchy of multiple caches, including a level 1 (L1) cache, a level 2 (L2) cache, and a level 3 (L3) cache. In particular, the system performs the cache maintenance process for the L1 cache and for an inclusive L2 cache, which includes each cache line present in the L1 cache.
In the case of a requested cache line eviction of the inclusive L2 cache, the system can determine that the cache line in the inclusive L2 cache should be evicted, and the system can determine whether the L1 cache must perform a back invalidation procedure by evicting the cache line from the L1 cache. For example, the system can send a snoop request to the L1 cache from the L2 cache, where the snoop request is a request for the L1 cache to determine whether a particular cache line is present in the L1 cache. If the L1 cache determines that the particular cache line is present, the system can determine whether to transmit a rejection message to the L2 cache from the L1 cache based on certain rejection criteria. The rejection message includes an indication that the L1 cache rejects the eviction of the cache line.
Particular embodiments of the subject matter described in this specification can be implemented so as to realize one or more of the following advantages. The systems and techniques described in this specification solve multiple problems with inclusive caches.
For example, in prior systems, if an inclusive L2 cache initiates eviction of a cache line and an L1 cache does not want to evict the cache line, the caches must perform additional steps to communicate so that the L2 cache maintains the cache line as well. However, performing this increased cache maintenance to ensure that the cache lines present in the L1 cache are also present in the inclusive L2 cache results in increased overhead in the system.
In contrast, the systems described in this specification can determine whether to evict the cache line based on a set of rejection criteria. This allows the system to refrain from evicting the cache line on a case-by-case basis, which can increase efficiency and reduce the communication overhead between the L1 cache and the L2 cache required to maintain inclusivity.
As another example, the L1 and L2 caches can exchange messages to determine whether a line present in the L2 cache is also in the L1 cache. The L1 cache can, for example, send a message to the inclusive L2 cache to ensure that the inclusive L2 cache refrains from evicting one or more cache lines. For example, if the system uses a cache line in the L1 cache for multiple code executions, the system may determine to refrain from evicting the cache line from the L1 cache. However, the L1 cache must update and send these messages relatively often, which can also increase overhead in the system.
In contrast, the systems described in this specification can use the set of rejection criteria to determine whether to avoid evicting the cache line, where the set of rejection criteria can include whether a cache line was most recently used (e.g., used for one or more instructions). By using the rejection criteria to determine whether to evict the cache line, the system avoids having to update and send messages to the L1 cache, further reducing overhead in communication between caches.
Additionally, prior systems that use snoop requests suffer performance problems. For example, the inclusive L2 cache can send a snoop request for a cache line to the L1 cache, and the L1 cache determines whether the cache line is present in the L1 cache. The L1 cache can send a message to the inclusive L2 cache indicating whether the cache line is present in the L1 cache, and if the cache line is present, the inclusive L2 cache refrains from evicting the cache line. However, in some cases, the cache line may remain relatively unused in the L1 cache (e.g., the L1 cache may not receive recent instructions or consistent instructions that include the cache line), and the L1 cache may be maintaining a cache line with relatively low utility, which can also increase overhead in the system due to the unnecessary maintenance of the cache line in the L1 cache.
In contrast, the systems described in this specification can use the rejection criteria to determine whether to evict a cache line from the L2 cache, even if the cache line is present in the L1 cache, based on whether the cache line is relatively unused. Accordingly, the L1 cache can perform back invalidation of the relatively unused cache line to maintain inclusivity. By using the rejection criteria to evict relatively unused cache lines from the L1 cache and L2 cache, the system can increase efficiency in cache utility among the caches.
In general, by determining whether to evict a cache line based on the rejection criteria, the cache maintenance subsystem has the ability to efficiently perform cache maintenance for the L1 cache and the inclusive L2 cache. The cache subsystem can ensure that the cache lines present in the L1 cache are present in the L2 cache, and the cache subsystem can increase efficiency in cache maintenance by evicting certain cache lines based on whether the cache line is in a least recently used position, whether the cache line is in a most recently used position, or whether the cache line has an eviction score that satisfies a threshold.
The details of one or more embodiments of the subject matter of this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of an example system.
FIG. 2 is a flow diagram of an example process for performing cache maintenance of multiple caches.
FIG. 3 is a flow diagram of an example process for determining whether to evict a cache line based on rejection criteria.
Like reference numbers and designations in the various drawings indicate like elements.
DETAILED DESCRIPTION
FIG. 1 illustrates an example system 100. The system 100 is an example of a system in which the systems, components, and techniques described in this specification can be implemented. The system 100 includes a system on chip (SOC) 102 coupled to a memory 124. The SOC 102 is an example of a device that can be installed on or integrated into any appropriate computing device.
The SOC 102 includes a processor 106 and a caching system 104 that caches data between the processor 106 and the memory 124. For example, the caching system 104 can store data retrieved from the memory 124 or data to be written to memory for the processor 106 or one or more other hardware devices in the SOC 102. The other hardware devices can be different components integrated into the SOC 102.
The caching system 104 includes three cache levels, including a level 1 (L1) cache 108 and lower-level caches, e.g., an L2 cache 110 and an L3 cache 112. Each cache of the caching system 104 can be a set associative structure with a certain number of sets in which an address or a cache line can be stored. A set associative cache structure has a certain number of ways within each set and a certain number of address indices. A cache of the caching system 104 can be organized into sets, each set having multiple ways, e.g., 2, 4, 8, or 16 ways. Therefore, there are multiple possible locations in a set into which a cache line can be written.
The L2 cache 110 is an inclusive L2 cache, such that each of the cache lines present in the L1 cache 108 is also present in the L2 cache 110, and the L2 cache 110 is larger (e.g., has a larger cache line capacity) than the L1 cache 108. The system performs cache maintenance by ensuring that the L2 cache 110 is inclusive of the L1 cache 108.
The processor 106 can generate memory requests to the caching system 104 associated with memory operations for the memory 124. For example, the caching system 104 can receive a cache request 122 for a cache line. The caching system 104 can determine whether the cache line is present in the L1 cache 108, and in the case of a cache miss (e.g., the cache line is not present in the L1 cache 108), the caching system can determine whether the cache line is present in the L2 cache 110. The L2 cache 110 can look up the index of the address of the cache line included in the cache request 122. The L2 cache 110 can map the index to a particular set in the L2 cache 110.
In some cases, the caching system 104 can determine to evict a cache line based on the cache request 122. For example, if the cache request 122 includes a request to write a cache line to the caching system 104, the L2 cache 110 can select a cache line to evict in order to be able to write the requested cache line to the L2 cache 110.
Based on the L2 cache 110 determining to evict a cache line, the caching system 104 must ensure that the L1 cache 108 and the L2 cache 110 remain symmetrical (e.g., that the L2 cache 110 is inclusive of the cache lines in the L1 cache 108). For example, if the L2 cache 110 evicts the cache line, the caching system 104 can instruct the L1 cache 108 to perform back invalidation by evicting the cache line in the L1 cache in order for the L2 cache to remain inclusive of the L1 cache.
In particular, the caching system 104 can determine that the L2 cache 110 should evict the cache line based on usage statistics of the cache line (e.g., how often the cache line is used by the system). In response to determining to evict the cache line, and to ensure that the L2 cache 110 remains inclusive of the L1 cache 108, the L2 cache 110 can send a snoop request to the L1 cache 108 to determine whether the cache line is present in the L1 cache 108. The snoop request can be a rejectable snoop request 114, and the rejectable snoop request can identify the requested cache line. Based on whether the requested cache line is present in the L1 cache and on a set of rejection criteria, the caching system 104 can determine whether to maintain the cache line in the L1 cache 108, as described in further detail below with reference to FIGs. 2 and 3. The set of rejection criteria can be based on whether the cache line has been least recently used, most recently used, or on an eviction score.
In some cases, if the caching system 104 determines that the cache line satisfies the set of rejection criteria, the L1 cache 108 can send a rejection message 116 to the L2 cache 110 rejecting the rejectable snoop request 114, as described in further detail below with reference to FIG. 2. In some examples, the rejection message 116 includes an eviction score corresponding to the cache line, as described in further detail below with reference to FIG. 3.
In some cases, if the caching system 104 determines that the cache line does not satisfy the set of rejection criteria, the L2 cache 110 can send a non-rejectable snoop request 118 to the L1 cache 108. The non-rejectable snoop request 118 can include an indication to evict the cache line from the L1 cache 108, and the L1 cache 108 evicts the cache line in order to maintain inclusivity with the L2 cache, as described in further detail below with reference to FIGs. 2 and 3.
FIG. 2 is a flow diagram of an example process for performing cache maintenance of multiple caches. For convenience, the process 200 will be described as being performed by a system having a hierarchy of caches. For example, a caching system, e.g., the caching system 104 of FIG. 1, appropriately configured in accordance with this specification, can perform the process 200.
The second cache (e.g., the L2 cache) selects a cache line to evict (202). For example, the L2 cache can perform victimization of the cache line based on certain criteria, e.g., usage statistics.
The system determines whether the selected L2 victim cache line requires back invalidation (204). In other words, the system can determine whether the cache line selected for eviction in the L2 cache is definitely not present in the L1 cache, in which case the L1 cache does not have to evict the cache line. The system can determine whether the cache line is present by comparing the bits of the address for each way in a particular set. If the system determines that the cache line is definitely not present in the L1 cache, the system simply evicts the cache line from the L2 cache without instructing the L1 cache to perform a back invalidation process. Alternatively, if the system determines that the cache line may be present in the L1 cache, the system determines that instructing the L1 cache to perform back invalidation may be necessary.
If the system determines that back invalidation is not necessary, the system instructs the L2 cache to evict the cache line (branch to 206). The L2 cache can then evict the selected victim cache line.
If the system determines that instructing the L1 cache to perform back invalidation may be necessary, the L2 cache can send a snoop request to the L1 cache (branch to 208). The snoop request includes a request to determine whether the cache line is present in the L1 cache.
The system can determine whether the cache line is present in the L1 cache (210) based on the snoop request. If the system determines that the cache line is not present in the first cache, the system can send a response to the L2 cache (branch to 212) from the L1 cache, where the response indicates that the cache line is not present in the L1 cache, and the L2 cache can evict the cache line based on the response (214).
However, if the system determines that the cache line is present in the L1 cache (210), the system determines whether the cache line satisfies a set of rejection criteria (branch to 216). The rejection criteria determine whether the L1 cache can reject the rejectable snoop request and refrain from evicting the cache line, as described in further detail below with reference to FIG. 3.
If the cache line fails to satisfy the rejection criteria, the system instructs the L1 cache to evict the cache line, and the L1 cache can send a response to the L2 cache (branch to 218), where the response indicates that the L1 cache has evicted the cache line. The L2 cache can then evict the cache line (220), and the L2 cache remains inclusive of the L1 cache.
If the cache line satisfies the rejection criteria, the system can send a rejection message to the L2 cache from the L1 cache in response to the rejectable snoop request (branch to 222).
In some examples, the rejection message includes an eviction score corresponding to the cache line, as described in further detail below with reference to FIG. 3.
The L2 cache can reinstate the cache line with an updated position based on the rejection message (224).
The L2 cache can then select another cache line (e.g., a second cache line) to evict, and the system can send a non-rejectable snoop request to the L1 cache (226) from the L2 cache. The non-rejectable snoop request includes an instruction to evict the second cache line. The L1 cache can evict the second cache line.
In this case, the L2 cache also evicts the second cache line (228). The system ensures that the L2 cache is inclusive of the L1 cache, and the system can improve cache efficiency by ensuring, based on the rejection criteria, that a relatively critical cache line is not evicted.
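The flow of FIG. 2 can be sketched in a few lines of Python. This is only an illustrative model: the caches are sets of line addresses, and `pick_victim` and `satisfies_rejection_criteria` are hypothetical stand-ins for the L2 victim-selection policy and the rejection criteria of FIG. 3.

```python
def select_and_evict(l2, l1, pick_victim, satisfies_rejection_criteria):
    """Sketch of the FIG. 2 flow: the first snoop is rejectable, and after
    a rejection the L2 cache reinstates the line and sends a non-rejectable
    snoop for a second victim, which is always evicted."""
    victim = pick_victim(l2)
    if victim in l1 and satisfies_rejection_criteria(victim):
        # L1 sends a rejection message (222); L2 reinstates the line (224)
        # and selects a second victim for a non-rejectable snoop.
        victim = pick_victim(l2 - {victim})
    l1.discard(victim)  # back invalidation (a no-op if the line is not in L1)
    l2.discard(victim)
    return victim
```

For example, with `pick_victim=min` and rejection criteria that protect line `1`, the flow skips the protected line and evicts line `2` from both caches, leaving line `1` in place.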
FIG. 3 is a flow diagram of an example process for determining whether to evict a cache line based on rejection criteria. For convenience, the process 300 will be described as being performed by a system having a hierarchy of caches. For example, a caching system, e.g., the caching system 104 of FIG. 1, appropriately configured in accordance with this specification, can perform the process 300.
The system determines that a cache line in the second cache should be evicted (302). For example, the system determines whether the L1 cache must perform back invalidation of the cache line based on a cache request received by the L2 cache.
The system then sends a rejectable snoop request for the cache line to the first cache (304) from the second cache. For example, the system sends the rejectable snoop request to the L1 cache from the L2 cache based on determining that the L1 cache may be required to perform back invalidation such that the L2 cache remains inclusive of the L1 cache (e.g., that each of the cache lines present in the L1 cache is present in the inclusive L2 cache).
The system then determines whether the cache line in the first cache satisfies the rejection criteria (306). Essentially, the rejection criteria dictate whether the L1 cache can refrain from evicting the cache line, despite the request from the L2 cache, based on how often the cache line is used by the system.
The system can use a number of types of rejection criteria. For example, the rejection criteria can include whether the cache line is in a least recently used position in a set of cache lines. Alternatively or in addition, the rejection criteria can include whether the cache line is in a most recently used position in the set of cache lines. For example, in a four-way set associative cache, one of the cache lines can be in a least recently used position, and one of the cache lines can be in a most recently used position, based on how recently the cache line was accessed (e.g., read) by the cache.
In some implementations, the rejection criteria include whether an eviction score of the cache line satisfies a threshold. For example, each cache line that is added to a cache can be associated with a starting eviction score of 3. On each subsequent occasion that the cache line is accessed, the system can modify the eviction score. For example, the system can decrease the eviction score by an increment of 1. Thus, if a cache line with an initial eviction score of 3 is accessed twice, the system can update the eviction score to 1.
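Under this scheme, the score bookkeeping is straightforward. The sketch below assumes, as in the example above, a starting score of 3 that decreases by 1 on each access; the floor of zero is an added assumption, since a score cannot usefully go negative.

```python
START_SCORE = 3  # illustrative starting eviction score for a newly filled line

def on_fill(scores, line):
    """A newly added cache line begins with the starting eviction score."""
    scores[line] = START_SCORE

def on_access(scores, line):
    """Each access to the line decreases its eviction score, down to a floor of zero."""
    scores[line] = max(0, scores[line] - 1)

scores = {}
on_fill(scores, 0x200)    # score is 3
on_access(scores, 0x200)  # score is 2
on_access(scores, 0x200)  # score is 1, matching the accessed-twice example above
```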
Thus, a cache line might satisfy the rejection criteria if the cache line is not in a least recently used position, if the cache line is in a most recently used position, or if the eviction score of the cache line satisfies a threshold. The threshold can be a threshold of 1, where the cache line satisfies the threshold with an eviction score of 0 or 1, and the cache line fails to satisfy the threshold with an eviction score of 2 or 3.
Conversely, the cache line can fail to satisfy the rejection criteria if the cache line is in a least recently used position, if the cache line is not in a most recently used position, or if the eviction score of the cache line fails to satisfy the threshold.
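The criteria can be combined into a single predicate, consistent with embodiments 4-6 above. This is only a sketch: the `lru_line`, `mru_line`, and `scores` inputs are assumed bookkeeping state, and an implementation could use any subset of the criteria rather than all three.

```python
THRESHOLD = 1  # a line with an eviction score of 0 or 1 satisfies the threshold

def satisfies_rejection_criteria(line, lru_line, mru_line, scores):
    """Return True if the L1 cache may reject the rejectable snoop request
    and keep the line: the line is not in the least recently used position,
    it is in the most recently used position, or its eviction score
    satisfies the threshold. Unscored lines default to failing the threshold."""
    return (line != lru_line
            or line == mru_line
            or scores.get(line, THRESHOLD + 1) <= THRESHOLD)
```

For example, a line in the least recently used position with an eviction score of 3 fails every criterion and would be evicted, while a frequently accessed line with a score of 1 would be kept even from the least recently used position.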
If the cache line does not satisfy the rejection criteria, the system evicts the cache line from the first cache and sends a response to the second cache (branch to 308). In particular, the response indicates that the system has evicted the cache line from the first cache.
If the cache line satisfies the rejection criteria, the system maintains the cache line in the first cache and provides a rejection message to the second cache (branch to 310). For example, if the cache line is in a most recently used position, the system determines that the L1 cache can refrain from evicting the cache line from the L1 cache, and the system sends a rejection message to the L2 cache from the L1 cache in response to the rejectable snoop request. Accordingly, the L2 cache refrains from evicting the cache line and selects another cache line to evict, as described above.
In some implementations, the rejection message can include the eviction score for the cache line so that the L2 cache can update its own copy of the cache line with the updated eviction score. The system can update the eviction score of the cache line in the second cache to be the same as the eviction score of the cache line in the first cache. As such, the system can determine whether to evict the cache line in the second cache based on the updated eviction score.
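This score synchronization can be sketched as follows. The message is modeled as a plain dictionary with hypothetical field names; the actual message format is an assumption.

```python
def on_rejection_message(l2_scores, message):
    """On receiving a rejection message from the L1 cache, the L2 cache
    adopts the eviction score carried in the message, so that both cache
    levels agree on the line's utility for future victim selection."""
    line = message["line"]
    l2_scores[line] = message["eviction_score"]

l2_scores = {0x300: 3}
on_rejection_message(l2_scores, {"line": 0x300, "eviction_score": 1})
# l2_scores[0x300] now matches the L1 copy of the line's score
```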
Overall, by determining whether to transmit a rejection message based on the rejection criteria, the system can efficiently preserve relatively critical cache lines while maintaining symmetry between the L1 cache and the inclusive L2 cache.
This specification uses the term “configured” in connection with systems and computer program components. For a system of one or more computers to be configured to perform particular operations or actions means that the system has installed on it software, firmware, hardware, or a combination of them that in operation cause the system to perform the operations or actions. For one or more computer programs to be configured to perform particular operations or actions means that the one or more programs include instructions that, when executed by data processing apparatus, cause the apparatus to perform the operations or actions.
Embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly-embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory storage medium for execution by, or to control the operation of, data processing apparatus. The computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them. Alternatively or in addition, the program instructions can be encoded on an artificially-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus.
The term “data processing apparatus” refers to data processing hardware and encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can also be, or further include, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit). The apparatus can optionally include, in addition to hardware, code that creates an execution environment for computer programs, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
A computer program, which may also be referred to or described as a program, software, a software application, an app, a module, a software module, a script, or code, can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages; and it can be deployed in any form, including as a standalone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data, e.g., one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, e.g., files that store one or more modules, subprograms, or portions of code. A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a data communication network.
In this specification the term “engine” is used broadly to refer to a software-based system, subsystem, or process that is programmed to perform one or more specific functions. Generally, an engine will be implemented as one or more software modules or components, installed on one or more computers in one or more locations. In some cases, one or more computers will be dedicated to a particular engine; in other cases, multiple engines can be installed and running on the same computer or computers.
The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by special purpose logic circuitry, e.g., an FPGA or an ASIC, or by a combination of special purpose logic circuitry and one or more programmed computers.
Computers suitable for the execution of a computer program can be based on general or special purpose microprocessors or both, or any other kind of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data. The central processing unit and the memory can be supplemented by, or incorporated in, special purpose logic circuitry. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device, e.g., a universal serial bus (USB) flash drive, to name just a few.
Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.
To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user’s device in response to requests received from the web browser. Also, a computer can interact with a user by sending text messages or other forms of message to a personal device, e.g., a smartphone that is running a messaging application, and receiving responsive messages from the user in return.
Data processing apparatus for implementing machine learning models can also include, for example, special-purpose hardware accelerator units for processing common and compute-intensive parts of machine learning training or production, i.e., inference, workloads.
Machine learning models can be implemented and deployed using a machine learning framework, e.g., a TensorFlow framework.
Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface, a web browser, or an app through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (LAN) and a wide area network (WAN), e.g., the Internet.
The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. In some embodiments, a server transmits data, e.g., an HTML page, to a user device, e.g., for purposes of displaying data to and receiving user input from a user interacting with the device, which acts as a client. Data generated at the user device, e.g., a result of the user interaction, can be received at the server from the device.
In addition to the embodiments described above, the following embodiments are also innovative:
Embodiment 1 is a system comprising: a first cache; and a second cache, wherein the second cache is an inclusive cache that maintains cache lines present in the first cache; and a cache maintenance subsystem comprising data processing apparatus configured to perform operations comprising: determining that a first cache line in the second cache should be evicted; sending, to the first cache, a rejectable snoop request for the first cache line; determining that the first cache line in the first cache satisfies one or more rejection criteria; and in response, maintaining the first cache line in the first cache and providing a rejection message to the second cache.
Embodiment 2 is the system of embodiment 1, wherein the operations further comprise: receiving the rejection message by the second cache; and reinstating the first cache line in the second cache.
Embodiment 3 is the system of embodiment 2, wherein the operations further comprise evicting, by the second cache, the first cache line before sending the rejectable snoop request.
Embodiment 4 is the system of any one of embodiments 1-3, wherein determining that the first cache line in the first cache satisfies one or more rejection criteria comprises determining that, among cache lines in a set to which the first cache line belongs, the first cache line is not in a least recently used position.
Embodiment 5 is the system of any one of embodiments 1-4, wherein determining that the first cache line in the first cache satisfies one or more rejection criteria comprises determining that, among cache lines in a set to which the first cache line belongs, the first cache line is in a most recently used position.
Embodiment 6 is the system of any one of embodiments 1-5, wherein determining that the first cache line in the first cache satisfies one or more rejection criteria comprises determining that, among cache lines in a set to which the first cache line belongs, the first cache line has an eviction score that satisfies a threshold.
Embodiment 7 is the system of embodiment 6, wherein the operations further comprise sending a rejection message to the second cache that includes the eviction score of the first cache line.
Embodiment 8 is the system of embodiment 7, wherein the operations further comprise: updating an eviction score of the first cache line in the second cache to be the same as the eviction score of the first cache line included in the rejection message to the second cache.
Embodiment 9 is the system of any one of embodiments 1-8, wherein the operations further comprise: sending, to the first cache, a second snoop request for a second cache line in the second cache, wherein the second snoop request is a non-rejectable snoop request; and evicting, by the first cache, the second cache line.
Embodiment 10 is the system of any one of embodiments 1-9, wherein the first cache is an L1 cache, and the second cache is an L2 cache that is larger than and inclusive of the L1 cache.
Embodiment 11 is a method performed by a processing device having a first cache and a second cache that is an inclusive cache, the method comprising: determining that a first cache line in the second cache should be evicted; sending, to the first cache, a rejectable snoop request for the first cache line; determining that the first cache line in the first cache satisfies one or more rejection criteria; and in response, maintaining the first cache line in the first cache and providing a rejection message to the second cache.
Embodiment 12 is the method of embodiment 11, further comprising: receiving the rejection message by the second cache; and reinstating the first cache line in the second cache.
Embodiment 13 is the method of embodiment 12, further comprising evicting, by the second cache, the first cache line before sending the rejectable snoop request.
Embodiment 14 is the method of any one of embodiments 11-13, wherein determining that the first cache line in the first cache satisfies one or more rejection criteria comprises determining that, among cache lines in a set to which the first cache line belongs, the first cache line is not in a least recently used position.
Embodiment 15 is the method of any one of embodiments 11-14, wherein determining that the first cache line in the first cache satisfies one or more rejection criteria comprises determining that, among cache lines in a set to which the first cache line belongs, the first cache line is in a most recently used position.
Embodiment 16 is the method of any one of embodiments 11-15, wherein determining that the first cache line in the first cache satisfies one or more rejection criteria comprises determining that, among cache lines in a set to which the first cache line belongs, the first cache line has an eviction score that satisfies a threshold.
Embodiment 17 is the method of embodiment 16, further comprising sending a rejection message to the second cache that includes the eviction score of the first cache line.
Embodiment 18 is the method of embodiment 17, further comprising: updating an eviction score of the first cache line in the second cache to be the same as the eviction score of the first cache line included in the rejection message to the second cache.
Embodiment 19 is the method of any one of embodiments 11-18, further comprising: sending, to the first cache, a second snoop request for a second cache line in the second cache, wherein the second snoop request is a non-rejectable snoop request; and evicting, by the first cache, the second cache line.
Embodiment 20 is the method of any one of embodiments 11-19, wherein the first cache is an L1 cache, and the second cache is an L2 cache that is larger than and inclusive of the L1 cache.
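The eviction flow of the embodiments above can be sketched as a simplified software model. All class and method names, the score threshold, and the recency bookkeeping below are illustrative assumptions for exposition, not details taken from the specification; a hardware implementation would realize the same decisions in cache control logic.

```python
# Simplified software model of the rejectable-snoop eviction flow.
# Names and the score threshold are illustrative assumptions.

class CacheLine:
    def __init__(self, tag, eviction_score=0):
        self.tag = tag
        self.eviction_score = eviction_score


class L1Cache:
    """First (inner) cache. Each set is a list kept in recency order:
    index 0 is the least recently used line, the last index the most."""

    def __init__(self):
        self.sets = {}  # set index -> list of CacheLine

    def handle_rejectable_snoop(self, set_index, tag, score_threshold=5):
        lines = self.sets.get(set_index, [])
        for pos, line in enumerate(lines):
            if line.tag != tag:
                continue
            # Rejection criteria (Embodiments 4-6): keep the line if it is
            # not in the LRU position, is in the MRU position, or has an
            # eviction score meeting the threshold.
            not_lru = pos != 0
            is_mru = pos == len(lines) - 1
            hot = line.eviction_score >= score_threshold
            if not_lru or is_mru or hot:
                return ("reject", line.eviction_score)
            lines.pop(pos)  # criteria not met: honor the snoop
            return ("evicted", None)
        return ("miss", None)

    def handle_snoop(self, set_index, tag):
        # Non-rejectable snoop (Embodiment 9): always evict on a hit.
        lines = self.sets.get(set_index, [])
        self.sets[set_index] = [l for l in lines if l.tag != tag]


class L2Cache:
    """Second (outer) cache, inclusive of the L1."""

    def __init__(self, l1):
        self.l1 = l1
        self.lines = {}  # (set index, tag) -> CacheLine

    def try_evict(self, set_index, tag):
        """Tentatively evict a line, then snoop the L1. On rejection,
        reinstate the line (Embodiments 2-3) and adopt the L1's eviction
        score (Embodiments 7-8). Returns True if the eviction stands."""
        victim = self.lines.pop((set_index, tag))
        status, score = self.l1.handle_rejectable_snoop(set_index, tag)
        if status == "reject":
            victim.eviction_score = score
            self.lines[(set_index, tag)] = victim
            return False
        return True
```

In this model, attempting to evict a line that sits in the L1's MRU position fails and the line is reinstated in the L2, while a line in the LRU position with a low eviction score is removed from both cache levels, preserving inclusion.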
While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or on the scope of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially be claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings and recited in the claims in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. For example, the actions recited in the claims can be performed in a different order and still achieve desirable results. As one example, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous.
Claims
What is claimed is:
1. A system comprising:
a first cache; and
a second cache, wherein the second cache is an inclusive cache that maintains cache lines present in the first cache; and
a cache maintenance subsystem comprising data processing apparatus configured to perform operations comprising:
determining that a first cache line in the second cache should be evicted;
sending, to the first cache, a rejectable snoop request for the first cache line;
determining that the first cache line in the first cache satisfies one or more rejection criteria; and
in response, maintaining the first cache line in the first cache and providing a rejection message to the second cache.
2. The system of claim 1, wherein the operations further comprise: receiving the rejection message by the second cache; and reinstating the first cache line in the second cache.
3. The system of claim 2, wherein the operations further comprise evicting, by the second cache, the first cache line before sending the rejectable snoop request.
4. The system of any one of claims 1-3, wherein determining that the first cache line in the first cache satisfies one or more rejection criteria comprises determining that, among cache lines in a set to which the first cache line belongs, the first cache line is not in a least recently used position.
5. The system of any one of claims 1-4, wherein determining that the first cache line in the first cache satisfies one or more rejection criteria comprises determining that, among cache lines in a set to which the first cache line belongs, the first cache line is in a most recently used position.
6. The system of any one of claims 1-5, wherein determining that the first cache line in the first cache satisfies one or more rejection criteria comprises determining that, among cache lines in a set to which the first cache line belongs, the first cache line has an eviction score that satisfies a threshold.
7. The system of claim 6, wherein the operations further comprise sending a rejection message to the second cache that includes the eviction score of the first cache line.
8. The system of claim 7, wherein the operations further comprise: updating an eviction score of the first cache line in the second cache to be the same as the eviction score of the first cache line included in the rejection message to the second cache.
9. The system of any one of claims 1-8, wherein the operations further comprise: sending, to the first cache, a second snoop request for a second cache line in the second cache, wherein the second snoop request is a non-rejectable snoop request; and evicting, by the first cache, the second cache line.
10. The system of any one of claims 1-9, wherein the first cache is an L1 cache, and the second cache is an L2 cache that is larger than and inclusive of the L1 cache.
11. A method performed by a processing device having a first cache and a second cache that is an inclusive cache, the method comprising:
determining that a first cache line in the second cache should be evicted;
sending, to the first cache, a rejectable snoop request for the first cache line;
determining that the first cache line in the first cache satisfies one or more rejection criteria; and
in response, maintaining the first cache line in the first cache and providing a rejection message to the second cache.
12. The method of claim 11, further comprising: receiving the rejection message by the second cache; and reinstating the first cache line in the second cache.
13. The method of claim 12, further comprising evicting, by the second cache, the first cache line before sending the rejectable snoop request.
14. The method of any one of claims 11-13, wherein determining that the first cache line in the first cache satisfies one or more rejection criteria comprises determining that, among cache lines in a set to which the first cache line belongs, the first cache line is not in a least recently used position.
15. The method of any one of claims 11-14, wherein determining that the first cache line in the first cache satisfies one or more rejection criteria comprises determining that, among cache lines in a set to which the first cache line belongs, the first cache line is in a most recently used position.
16. The method of any one of claims 11-15, wherein determining that the first cache line in the first cache satisfies one or more rejection criteria comprises determining that, among cache lines in a set to which the first cache line belongs, the first cache line has an eviction score that satisfies a threshold.
17. The method of claim 16, further comprising sending a rejection message to the second cache that includes the eviction score of the first cache line.
18. The method of claim 17, further comprising: updating an eviction score of the first cache line in the second cache to be the same as the eviction score of the first cache line included in the rejection message to the second cache.
19. The method of any one of claims 11-18, further comprising: sending, to the first cache, a second snoop request for a second cache line in the second cache, wherein the second snoop request is a non-rejectable snoop request; and evicting, by the first cache, the second cache line.
20. The method of any one of claims 11-19, wherein the first cache is an L1 cache, and the second cache is an L2 cache that is larger than and inclusive of the L1 cache.
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2023/035319 | 2023-10-17 | 2023-10-17 | Cache snoop rejection |
| TW113137711A | 2023-10-17 | 2024-10-03 | Cache snoop rejection |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025085056A1 (en) | 2025-04-24 |
Family
ID=88793282
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20180293175A1 (en) * | 2017-04-10 | 2018-10-11 | Samsung Electronics Co., Ltd. | Techniques to reduce read-modify-write overhead in hybrid dram/nand memory |
| US20190370187A1 (en) * | 2017-03-08 | 2019-12-05 | Huawei Technologies Co., Ltd. | Cache Replacement Method, Apparatus, and System |
Also Published As
| Publication number | Publication date |
|---|---|
| TW202518257A (en) | 2025-05-01 |