TWI627536B - System and method for a shared cache with adaptive partitioning - Google Patents
Info
- Publication number
- TWI627536B
- Authority
- TW
- Taiwan
- Prior art keywords
- cache
- component
- obsolete
- storage
- cache line
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/084—Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0842—Multiuser, multiprocessor or multiprocessing cache systems for multiprocessing or multitasking
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0844—Multiple simultaneous or quasi-simultaneous cache accessing
- G06F12/0846—Cache with multiple tag or data arrays being simultaneously accessible
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0893—Caches characterised by their organisation or structure
- G06F12/0895—Caches characterised by their organisation or structure of parts of caches, e.g. directory or tag array
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/12—Replacement control
- G06F12/121—Replacement control using replacement algorithms
- G06F12/126—Replacement control using replacement algorithms with special data handling, e.g. priority of data or instructions, handling errors or pinning
- G06F12/127—Replacement control using replacement algorithms with special data handling, e.g. priority of data or instructions, handling errors or pinning using additional replacement algorithms
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0864—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using pseudo-associative means, e.g. set-associative or hashing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/12—Replacement control
- G06F12/121—Replacement control using replacement algorithms
- G06F12/123—Replacement control using replacement algorithms with age lists, e.g. queue, most recently used [MRU] list or least recently used [LRU] list
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/10—Providing a specific technical effect
- G06F2212/1041—Resource optimization
- G06F2212/1044—Space efficiency improvement
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/10—Providing a specific technical effect
- G06F2212/1048—Scalability
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/60—Details of cache memory
- G06F2212/601—Reconfiguration of cache memory
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/62—Details of cache specific to multiprocessor cache arrangements
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
A cache controller that adaptively partitions a shared cache is provided. The adaptive-partitioning cache controller includes tag-comparison and staleness logic and selection logic that respond to client access requests and to various parameters. A component cache is assigned a target occupancy that is compared with a current occupancy count. Conditional identification of stale cache lines is used to manage the data stored in the shared cache. When a conflict or a cache miss is identified, the selection logic identifies replacement candidates, preferably from among the cache lines identified as stale. For a fixed number of buckets per component cache, each cache line is assigned to a bucket; allocated cache lines are assigned to buckets in accordance with the target occupancy. After the selected number of buckets has been filled, subsequent allocations cause the oldest cache lines to be marked stale. Cache lines are also treated as stale when the active indicator of their respective component cache is deasserted.
Description
Cache partitioning to serve multiple client processes or client applications ("clients") in a computing system has several benefits. Partition sizes can be varied to match the performance requirements of each client. Clients experience robust performance because the data in their private caches is not subject to eviction caused by accesses initiated by other clients.

However, these benefits are not as large as they could be, because of the limitations imposed by the way caches are currently partitioned. Some conventional computing-device designs include multiple processors of different types (such as central processing units, graphics processing units, display controllers, hardware accelerators, and so on) and/or processors with multiple cores to support the various primary and peripheral functions desired for a particular computing device. Such designs often further integrate analog, digital, and radio-frequency circuits or other special-purpose circuits to implement particular functions on a single silicon substrate, and are commonly referred to as systems on chip (SoCs).

Currently, a cache can be partitioned only across cache ways, or by using a combination of both way partitioning and set partitioning, and both options are limited. When only way partitioning is implemented, the number of cache partitions is limited by the number of cache ways, and adding associativity to a cache partition to reduce cache conflicts reduces the maximum number of possible partitions. The minimum way-partition size is determined by the associativity. For example, the smallest partition of an 8 MB cache with 8 ways (that is, 8-way associativity) is 1 MB. Moreover, enabling smaller cache partitions can cause a loss of cache space: using the same example, 512 KB of data stored in a minimum 1 MB cache partition leaves 512 KB of unused space in that partition.

Set partitioning maintains high associativity and provides greater flexibility, but creates data-coherency management problems that may require complex solutions to manage successfully, because resizing a cache in the set dimension is not a trivial task. Set partitioning works with a baseline set-associative cache; it does not, however, work with skewed-associative cache structures.

Way partitioning is illustrated in FIG. 1. Cache storage area 1 includes an array of addressable memory segments that can be individually addressed by N ways and M sets, where N and M are positive integers. As further indicated in FIG. 1, cache storage area 1 includes 8 ways and 8 sets and is segmented into cache partitions 2, 3, and 4, where cache partition 2 has a capacity of 4×8 addressable segments, cache partition 3 includes 2×8 addressable segments, and cache partition 4 includes 2×8 addressable segments.

Combined set and way partitioning is illustrated in FIG. 2. Cache storage area 5 includes an array of addressable memory segments that can be individually addressed by N ways and M sets, where N and M are positive integers. As further indicated in FIG. 2, cache storage area 5 includes 8 ways and 8 sets and is segmented into cache partitions 6, 7, and 8, where cache partition 6 has a capacity of 4×4 addressable segments, cache partition 7 includes 2×1 addressable segments, and cache partition 8 includes 1×8 addressable segments.

The effectiveness of a combined set/way partitioning scheme is questionable beyond a relatively small number of coarse-grained and highly customized solutions. As component caches are created, deleted, and resized, the complex geometric representation of a set/way-partitioned cache must be manipulated and managed over time. The set/way partitioning approach therefore does not scale.

There is therefore a need for an improved mechanism for dynamically managing cache partitions.
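The granularity and wasted-space limits of pure way partitioning described above can be illustrated with a short calculation. The sketch below is illustrative only and reuses the 8 MB, 8-way figures from the example; it is not part of the disclosed controller.

```c
#include <stdio.h>

/* Minimal sketch of the way-partitioning granularity example above.
 * The 8 MB / 8-way numbers are the example's, not a property of the
 * disclosed controller. */
int main(void) {
    const unsigned cache_kb = 8 * 1024;   /* 8 MB shared cache          */
    const unsigned num_ways = 8;          /* 8-way associative          */
    const unsigned min_partition_kb = cache_kb / num_ways;  /* 1 MB     */

    const unsigned client_data_kb = 512;  /* client only needs 512 KB   */
    unsigned wasted_kb = 0;
    if (client_data_kb < min_partition_kb)
        wasted_kb = min_partition_kb - client_data_kb;

    printf("minimum way partition: %u KB\n", min_partition_kb); /* 1024 */
    printf("unused space for a %u KB client: %u KB\n",
           client_data_kb, wasted_kb);                          /* 512  */
    return 0;
}
```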
Alternative embodiments of computing systems and methods for adaptively partitioning a shared cache are disclosed. Example embodiments illustrate hardware elements, including storage registers, interconnects, and logic circuits that can be deployed in an improved cache controller in a computing device.
Another example embodiment includes a computing device. The computing device includes a processor, a shared cache, and a shared, or adaptive-partitioning, cache controller. The shared cache provides a data storage resource for the processor. The adaptive-partitioning cache controller is communicatively coupled to the processor and to the shared cache. The adaptive-partitioning cache controller includes an input port, registers, one tag-comparison and staleness logic module per way, and a selection logic module. The input port receives access requests from clients executing on, or communicating with, the computing device. The registers provide data storage for the operating parameters of a desired number of component caches, or segments, of the shared cache. The tag-comparison and staleness logic is coupled to the registers and is configured to receive a measure of the time that has elapsed since information was stored in a particular cache line of the shared cache, to record a stale condition, and to conditionally update a bucket pointer. The selection logic is coupled to the tag-comparison and staleness logic and to the registers, and is configured to conditionally identify a replacement candidate to be replaced in the shared cache.
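A minimal sketch of how the per-component-cache operating parameters held in these registers might be modeled in software follows. The field names and widths are assumptions introduced for illustration (the actual parameters are those listed in Tables 1-3 of the description), not a definitive register map.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model of the per-component-cache parameter registers.
 * Field names and widths are illustrative assumptions only. */
#define NUM_BUCKETS 16u   /* example bucket count used in the description */

struct component_cache_regs {
    uint8_t  ccid;               /* unique component cache identifier        */
    bool     active;             /* active bit for the component cache       */
    uint8_t  priority;           /* priority level used by victim selection  */
    uint32_t target_occupancy;   /* target occupancy, in cache lines         */
    uint32_t occupancy_count;    /* current number of allocated cache lines  */
    uint8_t  current_bucket;     /* current bucket pointer, 0..NUM_BUCKETS-1 */
    uint32_t bucket_level;       /* lines allocated into the current bucket  */
    bool     overflow;           /* set when occupancy exceeds the target    */
    bool     lru_refresh;        /* move hit lines from older buckets        */
    bool     allocate_when_full; /* optional allocation parameter            */
};
```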
Another example embodiment is a method for adaptively partitioning a shared cache. The method includes the steps of: storing parameters that define aspects of an integer number of component caches within the shared cache, where at least one of the integer number of component caches is associated with a set of parameters that includes a target occupancy and an occupancy count; applying a unique identifier corresponding to one of the integer number of component caches to client requests; in response to client requests to store client data in the shared cache, using the information stored in the set of parameters to perform cache-line allocation, where an associated occupancy count is incremented when a cache line is allocated; and using the information stored in the set of parameters to identify candidate cache lines for eviction, where the associated occupancy count is decremented when a cache line is evicted.
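The occupancy bookkeeping in this method amounts to simple counter updates around allocation and eviction. The sketch below is an illustrative model under assumed names, not the controller's actual implementation.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed per-component-cache state; only the fields the method needs. */
struct component_cache {
    uint32_t target_occupancy;  /* target occupancy, in cache lines  */
    uint32_t occupancy_count;   /* current occupancy, in cache lines */
};

/* Called when a cache line is allocated on behalf of component cache cc. */
static void on_allocate(struct component_cache *cc) {
    cc->occupancy_count++;          /* allocation increments the count */
}

/* Called when a cache line belonging to component cache cc is evicted. */
static void on_evict(struct component_cache *cc) {
    if (cc->occupancy_count > 0)
        cc->occupancy_count--;      /* eviction decrements the count */
}

/* A component cache is in the overflow state when it exceeds its target. */
static bool is_overflowing(const struct component_cache *cc) {
    return cc->occupancy_count > cc->target_occupancy;
}
```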
In another embodiment, an adaptive-partitioning controller includes registers, tag-comparison and staleness logic, and selection logic. The registers store operating parameters for a desired number of component caches within a shared cache. The tag-comparison and staleness logic is coupled to the registers and receives a cache address and a unique component cache identifier identifying a cache line in the shared cache. In response to an operating-mode input, the tag-comparison and staleness logic produces a measure of the time that has elapsed since information was stored in a particular cache line of the shared cache, conditionally records a stale condition and a valid condition, and updates a bucket pointer. The selection logic is coupled to the tag-comparison and staleness logic and to the registers. The selection logic is configured to receive the measure of the time elapsed since information was stored in the particular cache line of the shared cache, the unique component cache identifier, and a priority indicator. The selection logic identifies a candidate cache line to be replaced in the shared cache based on the stale condition, a valid condition, and the component cache identifier.
In an alternative embodiment, portions of the novel cache controller and method can be implemented by a non-transitory processor-readable medium having stored thereon processor-executable instructions that, when executed, direct a processor and an adaptive-partitioning cache controller to perform operations. The operations include: defining configuration parameters, occupancy parameters, and time-related parameters, including a unique identifier, a target occupancy, an occupancy count, and an integer number of bucket pointers; identifying a shared-cache access request from a client application; producing a measure of the time that has elapsed since information was stored in a particular cache line of a system cache; recording a stale condition; conditionally updating a bucket pointer; and conditionally identifying a replacement candidate to be replaced in the system cache.
The word "exemplary" is used herein to mean "serving as an example, instance, or illustration." Any aspect described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other aspects.

In this description, the term "application" may also include files having executable content, such as object code, scripts, byte code, markup-language files, and patches. In addition, an "application" referred to herein may also include files that are not executable in nature, such as documents that may need to be opened or other data files that need to be accessed.

Various aspects will be described in detail with reference to the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts. References to specific examples and implementations are for illustrative purposes and are not intended to limit the scope of the disclosed systems and methods or of the claims.

The terms "computing device" and "mobile device" are used interchangeably herein to refer to any one or all of cellular telephones, smartphones, personal or mobile multimedia players, personal data assistants (PDAs), laptop computers, tablet computers, smart notebooks, ultrabooks, palmtop computers, wireless electronic-mail receivers, multimedia-Internet-enabled cellular telephones, wireless gaming controllers, and similar personal electronic devices that include a memory and a programmable processor.

While the various aspects are particularly useful for mobile computing devices, such as smartphones, which have limited energy resources, the aspects are generally useful in any electronic device that implements a plurality of memory devices and in which energy efficiency is a concern. Accordingly, the adaptive-partitioning cache controller and methods may also be applied in computing devices configured in desktop and rack-mounted form factors, which typically use batteries only when other power sources are unavailable.

The term "system on chip" (SoC) is used herein to refer to a set of interconnected electronic circuits that typically, but not exclusively, includes a hardware core, a memory, and a communication interface. A hardware core may include a variety of different types of processors, such as a general-purpose processor, a central processing unit (CPU), a digital signal processor (DSP), a graphics processing unit (GPU), an accelerated processing unit (APU), an auxiliary processor, a single-core processor, and a multi-core processor. A hardware core may further embody other hardware and hardware combinations, such as a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), other programmable logic devices, discrete gate logic, transistor logic, performance-monitoring hardware, watchdog hardware, and time references. Integrated circuits may be configured such that the components of the integrated circuit reside on a single piece of semiconductor material, such as silicon.

The term "content" is used herein to describe files having executable content, such as object code, scripts, byte code, markup-language files, and patches. In addition, "content" referred to herein may also include files that are not executable in nature, such as data files or data values that need to be accessed.

As used in this description, the terms "component," "module," "system," and the like are intended to refer to a computer-related entity, which may be hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device itself may be components. One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer and/or distributed between two or more computers. In addition, these components may execute from various processor-readable media having various data structures stored thereon. Components may communicate by way of local and/or remote processes, such as in accordance with a signal having one or more data packets (for example, data from one component interacting with another component in a local system, in a distributed system, and/or across a network such as the Internet with other systems by way of the signal).

As used in this description, the term "adaptive" describes, and includes, a component cache that can grow beyond its target occupancy when there is no concurrency among multiple clients, but that stays close to its target occupancy when there is concurrent use by multiple clients. The adaptive-partitioning cache controller guarantees a minimum occupancy for each component cache, regardless of the relative concurrency. In addition, a component cache can opportunistically occupy additional storage space when concurrency is low, resulting in increased performance. Thus, in some embodiments, the adaptive-partitioning cache controller can achieve better performance by adjusting dynamically, without the software overhead and intervention of managing component cache parameters.

An example embodiment includes an adaptive-partitioning cache controller. The adaptive-partitioning cache controller includes registers, tag-comparison and staleness logic modules, and a victim-selection logic module. There is one tag-comparison and staleness logic module per way. The registers are configured to store a set of operating parameters for a desired number of component caches of the shared cache. Each logic module is communicatively coupled to the registers and configured to receive the cache address associated with a transaction presented to the cache, a unique component cache identifier, and, optionally, particular attributes. In response to the current bucket pointer, an overflow bit, an active bit, a priority level, and a replacement-policy mode, each tag-comparison and staleness logic module produces a measure of the time that has elapsed since information was stored in a particular cache line of the shared cache, records a stale condition, and updates the bucket pointer according to the replacement-policy mode. Under certain conditions, an identified component cache is permitted to exceed its respective assigned target occupancy. The victim-selection logic is communicatively coupled to the tag-comparison and staleness logic modules and to the registers. The victim-selection logic is configured to receive, in parallel from each tag-comparison and staleness logic module, the measure of the elapsed time since information was stored in a particular cache line of the shared cache, the component cache identifier, the priority level, and the stale and valid conditions, and to identify a candidate cache line to be replaced in the shared cache according to a victim-selection algorithm. The selection algorithm preferentially selects an invalid candidate; otherwise it selects a stale candidate; otherwise it selects a lower-priority candidate; otherwise it selects the oldest candidate with the same component cache identifier; otherwise no victim is selected.
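The victim-selection priority order just described (invalid, then stale, then lower priority, then oldest within the same component cache, else no victim) can be sketched as a simple comparison loop. The structure, field names, and priority encoding below are assumptions introduced for illustration, not the hardware implementation.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of one replacement candidate (one way of the indexed set).
 * Assumption: a numerically smaller priority value means lower priority. */
struct candidate {
    bool     valid;     /* valid condition for the line                    */
    bool     stale;     /* stale condition computed by the staleness logic */
    uint8_t  ccid;      /* component cache identifier of the line          */
    uint8_t  priority;  /* priority level of the line's component cache    */
    uint32_t age;       /* elapsed-time measure; larger means older        */
};

/* Returns the index of the selected victim, or -1 if no victim is chosen.
 * incoming_ccid / incoming_priority describe the line being allocated.   */
static int select_victim(const struct candidate *c, size_t n,
                         uint8_t incoming_ccid, uint8_t incoming_priority) {
    int pick = -1;

    for (size_t i = 0; i < n; i++)            /* 1. prefer an invalid line */
        if (!c[i].valid) return (int)i;

    for (size_t i = 0; i < n; i++)            /* 2. otherwise a stale line */
        if (c[i].stale) return (int)i;

    for (size_t i = 0; i < n; i++)            /* 3. otherwise the lowest-  */
        if (c[i].priority <= incoming_priority &&  /* priority line not    */
            (pick < 0 || c[i].priority < c[pick].priority)) /* above the   */
            pick = (int)i;                          /* incoming line's     */
    if (pick >= 0) return pick;

    for (size_t i = 0; i < n; i++)            /* 4. otherwise the oldest   */
        if (c[i].ccid == incoming_ccid &&     /*    line of the same CCID  */
            (pick < 0 || c[i].age > c[pick].age))
            pick = (int)i;

    return pick;                              /* 5. else no victim (-1)    */
}
```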
FIG. 3 illustrates a system that includes a computing device 10 in communication with a remote computer 50, suitable for use with the various aspects. The computing device 10 includes an SoC 12 having a processor 14, a memory 16, a communication interface 18, and a storage interface 20. The computing device 10 further includes a communication component 22, such as a wired or wireless modem, a storage component 24, an antenna 26 for establishing a wireless connection 32 to a wireless network 30, and/or a network interface 28 for connection to a wired connection 44 to the Internet 40. The processor 14 may include any of a variety of hardware cores, as well as a number of processor cores. The computing device 10 may also include more than one processor 14 and two or more SoCs 12, thereby increasing the number of processors 14 and processor cores. The computing device 10 may also include processor cores 14 that are not associated with an SoC 12. Individual processors 14 may be multi-core processors as described below with reference to FIG. 4. The processors 14 may each be configured for specific purposes that may be the same as, or different from, those of the other processors 14 of the computing device 10. One or more of the processors 14 and processor cores of the same or different configurations may be grouped together as part of one or more subsystems of the computing device 10, as described below with reference to FIG. 5.
The memory 16 of the SoC 12 may be volatile or non-volatile memory configured to store data and processor-executable code for access by the processor 14. In an aspect, the memory 16 may be configured to at least temporarily store data structures, such as tables for managing the component caches of a cache memory that is adaptively partitioned as described below. As discussed in further detail below, each of the processor cores may access multiple component caches of the cache memory.
The computing device 10 and/or the SoC 12 may include one or more memories 16 configured for various purposes. In an aspect, one or more memories 16 may be dedicated to storing the data structures used for storing component cache information, in such a manner that the data-structure information can be accessed by the processor 14 for managing component cache access requests. When the memory 16 storing the data structures is non-volatile, the memory 16 can retain the information of the data structures even after power to the computing device 10 has been shut off. When power is restored and the computing device 10 reboots, the information of the data structures stored in the non-volatile memory 16 is available to the computing device 10.

The communication interface 18, communication component 22, antenna 26, and/or network interface 28 may work together to enable the computing device 10 to communicate over the wireless network 30, via the wireless connection 32 and/or the wired network connection 44, with nodes of the Internet 40, including the remote computing device 50. The wireless network 30 may be implemented using a variety of wireless communication technologies, including, for example, radio-frequency spectrum used for wireless communications, to provide the computing device 10 with a connection to the Internet 40, over which the computing device 10 may exchange data with the remote computing device 50.

The storage interface 20 and the storage component 24 may work together to allow the computing device 10 to store data on non-volatile storage media. The storage component 24 may be configured much like an aspect of the memory 16, in which the storage component 24 may store data structures such that the information stored in the data structures can be accessed by one or more processors 14. The storage component 24, being non-volatile, may retain the stored information even after power to the computing device 10 has been shut off. When power is restored and the computing device 10 reboots, the information stored in the storage component 24 is available to the computing device 10. The storage interface 20 may control access to the storage component 24 and allow the processor 14 to read data from, and write data to, the storage component 24.

Some or all of the components of the computing device 10 may be arranged and/or combined differently while still serving the necessary functions. Moreover, the computing device 10 is not limited to one of each of the components, and multiple instances of each component may be included in various configurations of the computing device 10.

FIG. 4 illustrates a multi-core processor 14 suitable for implementing an aspect. The multi-core processor 14 may have a plurality of equivalent processor cores 200, 201, 202, 203. The processor cores 200, 201, 202, 203 may be equivalent in that the processor cores 200, 201, 202, 203 of a single processor 14 may be configured for the same purpose and have the same or similar performance characteristics. For example, the processor 14 may be a general-purpose processor and the processor cores 200, 201, 202, 203 may be equivalent general-purpose processor cores. Alternatively, the processor 14 may be a GPU or a DSP, and the processor cores 200, 201, 202, 203 may be equivalent GPU cores or DSP cores, respectively. Through variations in the manufacturing process and materials, the performance characteristics of the processor cores 200, 201, 202, 203 may differ from core to core within the same multi-core processor 14, or within another multi-core processor 14 that uses processor cores of the same design.

In the example illustrated in FIG. 4, the multi-core processor 14 includes four processor cores 200, 201, 202, 203 (that is, processor core 0, processor core 1, processor core 2, and processor core 3). For ease of explanation, the examples herein may refer to the four processor cores 200, 201, 202, 203 illustrated in FIG. 4. However, it should be noted that the four processor cores 200, 201, 202, 203 illustrated in FIG. 4 and described herein are provided merely as an example and are in no way meant to be limiting. The computing device 10, the SoC 12, or the multi-core processor 14 may, individually or in combination, include fewer or more than the four processor cores 200, 201, 202, 203 illustrated and described herein.

FIG. 5 illustrates an SoC 12 configured to partition a cache memory into component caches in a self-adaptive manner. The SoC 12 may include a variety of components as described above. Some of these components, and additional components, may be employed to implement the component caches. For example, an SoC 12 configured to implement component caches may include a system hub 300, a system cache 302, a system cache controller 304, a CPU cluster 306, a protocol converter 308, a GPU 310, a modem DSP 312, an application DSP 314, a memory interface 316, a camera subsystem 318, a video subsystem 320, a display subsystem 322, a system network on chip (NoC) 324, a memory controller 326, and random-access memory (RAM) 328.

The system hub 300 is a component of the SoC 12 that manages access to the various memories by the CPU cluster 306, via the protocol converter 308, and by the GPU 310, the modem DSP 312, and the application DSP 314. As illustrated in FIG. 5, the system hub 300 further manages access to the various memory devices by the camera subsystem 318, the video subsystem 320, and the display subsystem 322 via the system NoC 324. Each of the subsystems may access the system cache 302 via various paths established through the system NoC 324, the memory interface 316, and the system cache controller 304. Multiple instances of the RAM 328 are coupled to the system cache 302 via the memory controller 326 and the system cache controller 304. Similarly, each of the GPU 310, the modem DSP 312, and the application DSP 314 may access the system cache 302 via the memory interface 316 and the system cache controller 304. In an aspect, the system hub 300 may manage access to the system cache 302 of the SoC 12 as well as access to the RAM 328. Some of the processors that may access the various memories may be included in the CPU cluster 306 and in the various subsystems, such as the camera subsystem 318, the video subsystem 320, and the display subsystem 322, and may also include other specialized processors such as the GPU 310, the modem DSP 312, and the application DSP 314.

The system cache 302 may be a shared memory device in the SoC 12 used to replace or supplement cache memories that may be associated with the various processors and/or subsystems. The system cache 302 may centralize the cache-memory resources of the SoC 12 so that the various processors and subsystems may access the system cache 302 to read and write program commands and data designated for repeated and/or quick access. The system cache 302 may store data from the various processors and subsystems, and also from other memory devices of the computing device, such as a main memory (not shown), the RAM 328, or other storage devices (for example, solid-state memory modules or hard disk drives). In an aspect, the system cache 302 may be backed by such memories and storage devices in case a cache miss occurs because an item requested from the system cache 302 cannot be located. In an aspect, the system cache 302 may be used as scratchpad memory for the various processors and subsystems. The system cache 302 may be smaller in storage space and physical size than the combination of the local cache memories of an SoC of similar architecture that does not use a system cache 302. However, as described further herein, management of the system cache 302 may allow greater energy savings and equal or better performance of the SoC 12, despite the smaller storage space and physical size of the system cache.

The system cache 302 is managed logically in partitions referred to as component caches. Each component cache contains a finite set of cache lines that are associated with one another by a component cache identifier. A process or thread active on the computing device 10 can identify a component cache when communicating with the system cache controller 304. Cache transactions received by the system cache controller 304 that do not carry a component cache identifier (such as cache transactions issued by a CPU) can be assigned to the same component cache by using a default component cache identifier for all such transactions.

After a component cache is activated, the cache lines belonging to the component cache are allocated on demand according to the current workload on the computing device 10. The system cache controller 304 tracks component cache use, or occupancy, by a count of the cache lines allocated to the component cache. The cache controller 304 applies, or assigns, the target occupancy of a component cache as an integer number of cache lines. The respective target occupancies are stored in registers for use by the system cache controller 304. When a component cache exceeds its target occupancy, the component cache is in an overflow state. The component cache may continue to grow by adding additional cache lines, but the oldest cache lines stored in the component cache are then marked as stale. An overflowing component cache will therefore have at least two portions: a newest portion and a stale portion. Under some conditions, even the newest portion may exceed the target occupancy.

The identification of stale cache lines, or of stale processes, is used by the replacement policy to identify candidate cache lines for removal from a component cache. When a conflict miss occurs, the system cache controller 304 applies the replacement policy in response to the cache allocation request.

The system cache controller 304, and the method for adaptively partitioning the system cache 302, assign each cache line to a group referred to as a bucket. A fixed number of buckets is assigned to a component cache. For purposes of illustration and explanation, the integer 16 is used. It should be understood that the number of buckets may be fewer or more than 16; the choice may be based on the expected workload of the computing device 10 or on other criteria. Unique bucket pointers are assigned to the 16 buckets. In an example embodiment, the bucket pointers are identified by the numbers 0 through 15, which can be represented by 4 bits. This is a lower number of bits compared with prior techniques that use coarse timestamps, in which a much larger number of bits must be used to reduce the occurrence of age miscalculations caused by aliasing. The present circuits and modules use a new mechanism, based on a stale-bit condition, to achieve the same goal at a significantly lower hardware cost.

An example method for adaptively partitioning a shared cache further includes the step of assigning bucket pointers and a bucket-level count to at least one of the integer number of component caches, the component cache being logically divided into an integer number of buckets. For a cache-line allocation operation addressing a particular component cache, the bucket-level value in the bucket-level count assigned to the particular component cache is incremented. When the bucket-level value reaches a fixed fraction of the target occupancy of the respective component cache, the bucket pointer is incremented and the bucket-level count is reset. The bucket pointer associated with the allocated cache line is stored in the shared cache.
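The bucket bookkeeping on allocation described above (increment the bucket-level count, and advance the bucket pointer and reset the count once a fixed fraction of the target occupancy has been filled) might be modeled as follows. The fraction of one sixteenth and the field names are assumptions for illustration only.

```c
#include <stdint.h>

#define NUM_BUCKETS 16u     /* example bucket count from the description */

/* Assumed per-component-cache bucket state. */
struct bucket_state {
    uint32_t target_occupancy; /* target occupancy, in cache lines          */
    uint8_t  current_bucket;   /* bucket pointer, 0..NUM_BUCKETS-1 (4 bits)  */
    uint32_t bucket_level;     /* lines allocated into the current bucket    */
};

/* Called on each cache-line allocation for this component cache.
 * Returns the bucket pointer to store alongside the allocated line. */
static uint8_t allocate_line_bucket(struct bucket_state *bs) {
    /* Assumed fixed fraction: one bucket holds 1/NUM_BUCKETS of the target. */
    uint32_t bucket_capacity = bs->target_occupancy / NUM_BUCKETS;
    if (bucket_capacity == 0)
        bucket_capacity = 1;

    uint8_t assigned = bs->current_bucket;   /* line joins the current bucket */
    bs->bucket_level++;

    if (bs->bucket_level >= bucket_capacity) {
        /* Advance to the next bucket (wrapping modulo 16) and reset. */
        bs->current_bucket = (uint8_t)((bs->current_bucket + 1) % NUM_BUCKETS);
        bs->bucket_level = 0;
    }
    return assigned;
}
```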
In addition, for an access operation addressing a respective component cache, the system cache controller 304 identifies a set of candidate cache lines for replacement and, for that set of candidate cache lines, computes respective ages by comparing the value in the bucket pointer associated with each candidate cache line with the current bucket value of the respective component cache. The system cache controller 304 determines a stale-bit condition depending on the respective age, and stores the stale-bit condition in the shared cache. The set of parameters associated with a respective component cache includes a least-recently-used parameter that conditionally identifies when cache lines from older buckets should be moved to the current bucket. When a cache line is moved from an older bucket to the current bucket, the system cache controller 304 sets, or asserts, a refresh bit. For a component cache access operation that results in a hit condition, the system cache controller 304 conditionally updates the bucket pointer, the stale-bit condition, and the bucket-level value of the respective component cache. When the least-recently-used parameter is set, or asserted, and the bucket pointer does not equal the component cache's current bucket pointer, the bucket pointer is refreshed, the stale-bit condition is reset, or deasserted, and the bucket-level value is incremented.

For a component cache access operation that results in a miss condition, the system cache controller 304 conditionally identifies a replacement candidate. More precisely, the system cache controller 304 assigns a priority parameter to the component caches, and conditionally identifies a replacement candidate when no stale candidate exists. In a preferred sequence, the replacement candidate is selected from among any stale cache lines; otherwise from the lowest-priority cache line having a priority lower than or equal to that of the incoming cache line; and, when multiple newest replacement candidates are identified, from the oldest cache line present in the shared cache. When multiple replacement candidates are identified, the system cache controller 304 applies an alternative replacement-candidate identification criterion. In one example embodiment, the alternative replacement-candidate criterion includes identifying a default way in the shared cache 302. The system cache controller 304 also assigns an optional allocation parameter to the respective component caches in the shared cache 302, and conditionally allocates a cache line when the identified component cache is not full. Alternatively, the system cache controller 304 does not allocate a cache line when the respective component cache is full and the optional allocation parameter is set, or asserted.

FIG. 6 is a schematic diagram illustrating the components of the system cache controller 304 of FIG. 5 that enable adaptive partitioning of the system cache 302 in communication with the system cache controller 304. Conventional buffers and logic circuits for enabling communication between the various processors on the SoC 12 and the system cache 302 are well known to those of ordinary skill in the art. Those conventional elements are omitted for ease of illustrating and describing the adaptive-partitioning cache controller and related functions.

The system cache controller 304, which in the illustrated embodiment is an adaptive-partitioning cache controller, includes an input/output (I/O) port 610 coupled to a communication bus that connects the memory interface 316, the memory controller 326, and the protocol converter 308 to the system cache controller 304. Cache initialization and allocation requests are received on the communication bus 610 from threads or processes on execution resources in the CPU cluster 306, the camera subsystem 318, the video subsystem 320, the display subsystem 322, the GPU 310, the modem DSP 312, or the application DSP 314. As illustrated, the I/O port connection 610a conveys the incoming CCID to both the component cache registers 620 and the selection logic module 640. The update logic module 650 receives the incoming CCID from the component cache registers 620. In an alternative embodiment (not shown), the update logic module 650 may be provided with a dedicated connection to receive the incoming CCID. In addition to the incoming CCID, the cache address is conveyed along connection 610b to the tag-comparison and staleness logic modules 630. As indicated in FIG. 6, one instance of the tag-comparison and staleness logic module 630 is present for each identified way in the shared, or system, cache 302.

As indicated in FIG. 6, the I/O port 610 of the adaptive-partitioning cache controller 304 receives the address of an incoming transaction and the associated component cache identifier (or CCID). The CCID is conveyed to a set of component cache registers 620 and to the update logic module 650. The component cache registers 620 include storage locations for cache configuration parameters, component cache occupancy parameters, and bucket parameters. Example component cache parameters are presented in Table 1. An important aspect of the adaptive-partitioning cache controller is the ability to define different behaviors for different component caches in the shared cache 302 through the setting of parameters, such as the parameters defined in Table 1. New features can be added, and old features can be adjusted per component cache as needed, without affecting the component cache clients or the SoC infrastructure. Example component cache occupancy parameters are included in Table 2. Example bucket, or timing, parameters are presented in Table 3. The address identifying a portion of the system cache 302 is forwarded to the tag-comparison and staleness logic modules 630 and to the selection logic module 640. The tag-comparison and staleness logic modules 630 associate tag information with, or apply tag information to, each cache line stored in the system cache 302. Example tag information is presented in Table 4.
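The age and stale-bit computation described in this section, where a line's age is derived from the distance between its stored bucket pointer and the component cache's current bucket pointer, and the hit-time refresh under the LRU parameter, might be modeled as shown below. The wrap-around arithmetic and the staleness threshold are assumptions introduced for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_BUCKETS 16u   /* example bucket count from the description */

/* Age of a line: how many buckets behind the current bucket it was filed,
 * computed modulo NUM_BUCKETS so the 4-bit pointers can wrap around. */
static uint8_t line_age(uint8_t line_bucket, uint8_t current_bucket) {
    return (uint8_t)(((unsigned)current_bucket + NUM_BUCKETS - line_bucket)
                     % NUM_BUCKETS);
}

/* Assumed staleness rule: a line is stale once its age reaches a threshold
 * (for example, once the buckets assigned to the component cache have all
 * been filled past it). The threshold value here is illustrative only.   */
static bool is_stale(uint8_t line_bucket, uint8_t current_bucket,
                     uint8_t stale_threshold) {
    return line_age(line_bucket, current_bucket) >= stale_threshold;
}

/* On a hit with the LRU (refresh) parameter asserted: if the line is not
 * already in the current bucket, move it there, clear its stale bit, and
 * count it against the current bucket's level.                           */
static void on_hit_refresh(uint8_t *line_bucket, bool *line_stale,
                           uint8_t current_bucket, uint32_t *bucket_level) {
    if (*line_bucket != current_bucket) {
        *line_bucket = current_bucket;  /* refresh the stored bucket pointer */
        *line_stale  = false;           /* deassert the stale-bit condition  */
        (*bucket_level)++;              /* increment the bucket-level value  */
    }
}
```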
1‧‧‧Cache storage area
2‧‧‧Cache partition
3‧‧‧Cache partition
4‧‧‧Cache partition
5‧‧‧Cache storage area
6‧‧‧Cache partition
7‧‧‧Cache partition
8‧‧‧Cache partition
10‧‧‧ Computing device
12‧‧‧System on chip (SoC)
14‧‧‧ Processor
16‧‧‧ memory
18‧‧‧Communication interface
20‧‧‧ Storage interface
22‧‧‧Communication components
24‧‧‧Storage components
26‧‧‧Antenna
28‧‧‧Network interface
30‧‧‧Wireless network
32‧‧‧Wireless connection
40‧‧‧Internet
44‧‧‧Wired connection
50‧‧‧ remote computer
200‧‧‧ processor core
201‧‧‧ processor core
202‧‧‧ Processor Core
203‧‧‧ processor core
300‧‧‧System Hub
302‧‧‧System cache/shared cache
304‧‧‧System cache controller
306‧‧‧Central Processing Unit (CPU) Cluster
308‧‧‧Protocol converter
310‧‧‧Graphical Processing Unit (GPU)
312‧‧‧Modem digital signal processor (DSP)
314‧‧‧Application Digital Signal Processor (DSP)
316‧‧‧ memory interface
318‧‧‧ camera subsystem
320‧‧‧Video subsystem
322‧‧‧Display subsystem
324‧‧‧System on-chip network (NoC)
326‧‧‧ memory controller
328‧‧‧ Random Access Memory (RAM)
610‧‧‧Input/output (I/O) port
610a‧‧‧Input/output (I/O) port connection
610b‧‧‧Connection
620‧‧‧Component cache registers
625‧‧‧Bus
630‧‧‧Tag comparison and staleness logic module
630a‧‧‧Tag comparison and staleness logic module
630b‧‧‧Tag comparison and staleness logic module
630p‧‧‧Tag comparison and staleness logic module
637‧‧‧Bus
640‧‧‧Selection logic module
650‧‧‧Update Logic Module
710‧‧‧SRAM
720‧‧‧Multiplexer
722‧‧‧Multiplexer
724‧‧‧Multiplexer
726‧‧‧Multiplexer
730‧‧‧ Comparator
732‧‧‧"and" gate
733‧‧‧Multiplexer
734‧‧‧"and" gate
740‧‧‧ comparator
742‧‧‧"or" gate
750‧‧‧Modulo comparator
752‧‧‧ Comparator
754‧‧‧"and" gate
756‧‧‧"or" gate
758‧‧‧"and" gate
760‧‧‧"and" gate
800‧‧‧ method
802‧‧‧ block
804‧‧‧ Block
806‧‧‧ Block
808‧‧‧ Block
902a‧‧‧ comparator
902p‧‧‧ comparator
904a‧‧‧ comparator
904p‧‧‧ comparator
906a‧‧‧ comparator
906p‧‧‧ comparator
914‧‧‧"or" logic gate
916‧‧‧"and" gate
918‧‧‧Multiplexer
920‧‧‧"or" logic gate
922‧‧‧"and" gate
924‧‧‧Multiplexer
926‧‧‧Selected oldest candidate logic module
928‧‧‧"or" gate
930‧‧‧Multiplexer
932a‧‧‧Inverter
932p‧‧‧Inverter
933‧‧‧"or" gate
934‧‧‧Multiplexer
938‧‧‧"or" gate
940‧‧‧"or" gate
942‧‧‧Multiplexer
944‧‧‧"or" gate
946‧‧‧Select any logic module
950‧‧‧Multiplexer
1100‧‧‧ Computing device
1120‧‧‧Bus
1200‧‧‧Processor-readable medium
1210‧‧‧Component cache configuration parameters
1220‧‧‧ occupancy parameter
1230‧‧‧ time related parameters
1240‧‧‧Shared cache access identification logic
1250‧‧‧Bucket identification logic
1260‧‧‧Replacement candidate selection logic
The accompanying drawings, which are incorporated herein and constitute part of this specification, illustrate example aspects of the invention and, together with the general description given above and the detailed description given below, serve to explain the features of the novel systems and methods.

In the drawings, like reference numerals refer to like parts throughout the various views unless otherwise indicated. For reference numerals with letter-character designations, such as "102A" or "102B," the letter-character designations may differentiate two like parts or elements present in the same figure. Letter-character designations for reference numerals may be omitted when it is intended that a reference numeral encompass all parts having the same reference numeral in all figures.

FIG. 1 is a schematic diagram illustrating conventional way partitioning of a shared cache.

FIG. 2 is a schematic diagram illustrating conventional set and way partitioning of a shared cache.

FIG. 3 is a functional block diagram illustrating an example embodiment of a computing environment including a computing device suitable for implementing the improved cache controller.

FIG. 4 is a schematic diagram illustrating an example embodiment of a multi-core processor that may be implemented in the computing device of FIG. 3.

FIG. 5 is a schematic diagram illustrating an example embodiment of the system on chip of FIG. 3.

FIG. 6 is a schematic diagram illustrating components of the shared cache controller of FIG. 5.

FIG. 7 is a schematic diagram illustrating an example embodiment of the tag-comparison and staleness logic module of FIG. 6.

FIG. 8 is a timing diagram illustrating an example embodiment of the staleness process implemented by the staleness logic module and the update logic module of FIG. 7.

FIG. 9 is a schematic diagram illustrating an example embodiment of the selection logic module of FIG. 6.

FIG. 10 is a flow chart illustrating an example embodiment of a method for adaptively partitioning a shared cache.

FIG. 11 is a schematic diagram illustrating an example embodiment of a computing device.

FIG. 12 is a schematic diagram illustrating a computing device including an example embodiment of a processor-readable medium.
Claims (30)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US14/921,468 | 2015-10-23 | | |
| US14/921,468 US9734070B2 (en) | 2015-10-23 | 2015-10-23 | System and method for a shared cache with adaptive partitioning |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| TW201717040A (en) | 2017-05-16 |
| TWI627536B (en) | 2018-06-21 |
Family
ID=57043042
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| TW105133396A TWI627536B (en) | 2016-10-17 | 2016-10-17 | System and method for a shared cache with adaptive partitioning |
Country Status (3)
| Country | Link |
|---|---|
| US (1) | US9734070B2 (en) |
| TW (1) | TWI627536B (en) |
| WO (1) | WO2017069907A1 (en) |
Families Citing this family (19)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10078613B1 (en) | 2014-03-05 | 2018-09-18 | Mellanox Technologies, Ltd. | Computing in parallel processing environments |
| US9832277B2 (en) * | 2015-11-13 | 2017-11-28 | Western Digital Technologies, Inc. | Systems and methods for adaptive partitioning in distributed cache memories |
| US10366013B2 (en) * | 2016-01-15 | 2019-07-30 | Futurewei Technologies, Inc. | Caching structure for nested preemption |
| US10503652B2 (en) | 2017-04-01 | 2019-12-10 | Intel Corporation | Sector cache for compression |
| US10424040B2 (en) | 2017-04-21 | 2019-09-24 | Intel Corporation | Dynamic allocation of cache based on instantaneous bandwidth consumption at computing devices |
| US10528519B2 (en) | 2017-05-02 | 2020-01-07 | Mellanox Technologies Ltd. | Computing in parallel processing environments |
| US10394747B1 (en) | 2017-05-31 | 2019-08-27 | Mellanox Technologies Ltd. | Implementing hierarchical PCI express switch topology over coherent mesh interconnect |
| US10789175B2 (en) * | 2017-06-01 | 2020-09-29 | Mellanox Technologies Ltd. | Caching policy in a multicore system on a chip (SOC) |
| CN111164580B (en) | 2017-08-03 | 2023-10-31 | 涅克斯硅利康有限公司 | Reconfigurable cache architecture and method for cache coherency |
| US10678690B2 (en) * | 2017-08-29 | 2020-06-09 | Qualcomm Incorporated | Providing fine-grained quality of service (QoS) control using interpolation for partitioned resources in processor-based systems |
| US10394716B1 (en) * | 2018-04-06 | 2019-08-27 | Arm Limited | Apparatus and method for controlling allocation of data into a cache storage |
| US10884959B2 (en) | 2019-02-13 | 2021-01-05 | Google Llc | Way partitioning for a system-level cache |
| WO2021089117A1 (en) * | 2019-11-05 | 2021-05-14 | Microsoft Technology Licensing Llc | Eviction mechanism |
| CN111143245B (en) * | 2019-11-15 | 2021-07-13 | 海光信息技术股份有限公司 | A cache data processing method, circuit, processor and chip |
| US11656997B2 (en) | 2019-11-26 | 2023-05-23 | Intel Corporation | Flexible cache allocation technology priority-based cache line eviction algorithm |
| US11294808B2 (en) | 2020-05-21 | 2022-04-05 | Micron Technology, Inc. | Adaptive cache |
| US11422934B2 (en) | 2020-07-14 | 2022-08-23 | Micron Technology, Inc. | Adaptive address tracking |
| US11409657B2 (en) | 2020-07-14 | 2022-08-09 | Micron Technology, Inc. | Adaptive address tracking |
| US11507516B2 (en) | 2020-08-19 | 2022-11-22 | Micron Technology, Inc. | Adaptive cache partitioning |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| TW457427B (en) * | 1999-03-31 | 2001-10-01 | Ibm | Method and system for dynamically partitioning a shared cache |
| US6591347B2 (en) * | 1998-10-09 | 2003-07-08 | National Semiconductor Corporation | Dynamic replacement technique in a shared cache |
| US20070143546A1 (en) * | 2005-12-21 | 2007-06-21 | Intel Corporation | Partitioned shared cache |
| US20080235457A1 (en) * | 2007-03-21 | 2008-09-25 | Hasenplaugh William C | Dynamic quality of service (QoS) for a shared cache |
| US20120042127A1 (en) * | 2010-08-13 | 2012-02-16 | Advanced Micro Devices, Inc. | Cache partitioning |
| US20140040556A1 (en) * | 2012-08-05 | 2014-02-06 | William L. Walker | Dynamic Multithreaded Cache Allocation |
Family Cites Families (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP5413001B2 (en) | 2009-07-09 | 2014-02-12 | 富士通株式会社 | Cache memory |
| US20120144118A1 (en) | 2010-12-07 | 2012-06-07 | Advanced Micro Devices, Inc. | Method and apparatus for selectively performing explicit and implicit data line reads on an individual sub-cache basis |
| US9658960B2 (en) | 2010-12-22 | 2017-05-23 | Advanced Micro Devices, Inc. | Subcache affinity |
| US20130097387A1 (en) | 2011-10-14 | 2013-04-18 | The Board Of Trustees Of The Leland Stanford Junior University | Memory-based apparatus and method |
| US9298626B2 (en) | 2013-09-26 | 2016-03-29 | Globalfoundries Inc. | Managing high-conflict cache lines in transactional memory computing environments |
| CN103699497B (en) | 2013-12-19 | 2017-01-04 | 京信通信系统(中国)有限公司 | A kind of cache allocation method and device |
-
2015
- 2015-10-23 US US14/921,468 patent/US9734070B2/en not_active Expired - Fee Related
-
2016
- 2016-09-22 WO PCT/US2016/053082 patent/WO2017069907A1/en not_active Ceased
- 2016-10-17 TW TW105133396A patent/TWI627536B/en not_active IP Right Cessation
Patent Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6591347B2 (en) * | 1998-10-09 | 2003-07-08 | National Semiconductor Corporation | Dynamic replacement technique in a shared cache |
| TW457427B (en) * | 1999-03-31 | 2001-10-01 | Ibm | Method and system for dynamically partitioning a shared cache |
| US20070143546A1 (en) * | 2005-12-21 | 2007-06-21 | Intel Corporation | Partitioned shared cache |
| US20080235457A1 (en) * | 2007-03-21 | 2008-09-25 | Hasenplaugh William C | Dynamic quality of service (QoS) for a shared cache |
| US20120042127A1 (en) * | 2010-08-13 | 2012-02-16 | Advanced Micro Devices, Inc. | Cache partitioning |
| US20140040556A1 (en) * | 2012-08-05 | 2014-02-06 | William L. Walker | Dynamic Multithreaded Cache Allocation |
Also Published As
| Publication number | Publication date |
|---|---|
| US9734070B2 (en) | 2017-08-15 |
| US20170116118A1 (en) | 2017-04-27 |
| WO2017069907A1 (en) | 2017-04-27 |
| TW201717040A (en) | 2017-05-16 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| TWI627536B (en) | 2018-06-21 | System and method for a shared cache with adaptive partitioning |
| US11163699B2 (en) | Managing least recently used cache using reduced memory footprint sequence container | |
| US11487675B1 (en) | Collecting statistics for persistent memory | |
| CN108804031B (en) | Optimal record lookup | |
| US11263149B2 (en) | Cache management of logical-physical translation metadata | |
| US10474397B2 (en) | Unified indirection in a multi-device hybrid storage unit | |
| US9904473B2 (en) | Memory and processor affinity in a deduplicated environment | |
| US9996542B2 (en) | Cache management in a computerized system | |
| US9798655B2 (en) | Managing a cache on storage devices supporting compression | |
| JP6496626B2 (en) | Heterogeneous integrated memory unit and its extended integrated memory space management method | |
| US9501419B2 (en) | Apparatus, systems, and methods for providing a memory efficient cache | |
| US20140115260A1 (en) | System and method for prioritizing data in a cache | |
| US20120117328A1 (en) | Managing a Storage Cache Utilizing Externally Assigned Cache Priority Tags | |
| US11392323B2 (en) | Memory system and method of controlling nonvolatile memory | |
| KR20190052546A (en) | Key-value storage device and method of operating the key-value storage device | |
| KR20170097609A (en) | Apparatus, system and method for caching compressed data background | |
| US10503647B2 (en) | Cache allocation based on quality-of-service information | |
| US20220382672A1 (en) | Paging in thin-provisioned disaggregated memory | |
| US9699254B2 (en) | Computer system, cache management method, and computer | |
| US10353829B2 (en) | System and method to account for I/O read latency in processor caching algorithms | |
| CN116107926B (en) | Management methods, devices, equipment, media and program products for cache replacement strategies | |
| US9880778B2 (en) | Memory devices and methods | |
| US11797183B1 (en) | Host assisted application grouping for efficient utilization of device resources | |
| US11132128B2 (en) | Systems and methods for data placement in container-based storage systems | |
| JP7337228B2 (en) | Memory system and control method |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | MM4A | Annulment or lapse of patent due to non-payment of fees | |