
WO2024196846A1 - System memory address decoding for address interleaving across physical regions of a system on chip (SoC) and across shared memory resources in a processor-based system - Google Patents

System memory address decoding for address interleaving across physical regions of a system on chip (SoC) and across shared memory resources in a processor-based system

Info

Publication number
WO2024196846A1
Authority
WO
WIPO (PCT)
Prior art keywords
memory
hashing
region
physical
available
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2024/020378
Other languages
English (en)
Inventor
Keith Robert Pflederer
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qualcomm Inc
Original Assignee
Qualcomm Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US18/511,079 external-priority patent/US12380019B2/en
Application filed by Qualcomm Inc filed Critical Qualcomm Inc
Publication of WO2024196846A1 publication Critical patent/WO2024196846A1/fr
Anticipated expiration legal-status Critical
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/06Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • G06F12/0607Interleaved addressing
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0813Multiuser, multiprocessor or multiprocessing cache systems with a network or matrix configuration
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/084Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0864Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using pseudo-associative means, e.g. set-associative or hashing
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/10Providing a specific technical effect
    • G06F2212/1016Performance improvement
    • G06F2212/1024Latency reduction
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/10Providing a specific technical effect
    • G06F2212/1041Resource optimization
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/25Using a specific main memory architecture
    • G06F2212/254Distributed memory

Definitions

  • the technology of the disclosure relates generally to system memory addressing in a computer system, and more particularly to mapping system memory addresses to distributed memory resources in a computer system.
  • Microprocessors also known as processing units (PUs), perform computational tasks in a wide variety of applications.
  • One type of conventional microprocessor or PU is a central processing unit (CPU).
  • Another type of microprocessor or PU is a dedicated processing unit known as a graphics processing unit (GPU).
  • a GPU is designed with specialized hardware to accelerate the rendering of graphics and video data for display.
  • a GPU may be implemented as an integrated element of a general-purpose CPU or as a discrete hardware element that is separate from the CPU.
  • a PU(s) executes software instructions that instruct a processor to fetch data from a location in memory and to perform one or more processor operations using the fetched data. The result may then be stored in memory.
  • this memory can be a cache memory local to the PU, a shared local cache among PUs in a PU block, a shared cache among multiple PU blocks, and/or a system memory in a processor-based system.
  • Cache memory, which can also be referred to as just “cache,” is a smaller, faster memory that stores copies of data stored at frequently-accessed memory addresses in a main memory or higher-level cache memory to reduce memory access latency.
  • a cache memory can be used by a PU to reduce memory access times.
  • PUs utilize system memory addresses to route memory requests to the appropriate memory resource such as a shared cache or system memory.
  • Computer systems utilize system memory address maps which map memory address ranges to physical memory resources.
  • Computer systems also include hashing hardware for decoding system memory addresses and identifying the appropriate memory resource to direct memory access requests such as memory writes and memory reads to the memory resource that is configured to store the data requested.
  • a SoC is an integrated circuit (IC) chip that includes a processor with one or more processor units and an on-chip memory system.
  • a SoC may also include other computing resource circuits on the IC chip as well.
  • the on- chip memory system includes shared memory resources such as caches, snoop filters, and memory interfaces to system memory.
  • the processor units are configured to issue memory access requests to the on-chip memory system.
  • the SoC includes a plurality of memory interfaces in which respective memory chips (e.g., DRAM chips) can be coupled as a shared memory resource.
  • the shared memory resources may reside in different physical locations/regions in the SoC relative to processor units to balance access times between processor units and memory interfaces.
  • the physical address space can be configured into a plurality of address ranges based on the number and size of memory chips that are coupled to the memory interfaces and the number of operable caches and snoop filters.
  • the SoC includes a way in which to access the physical address space given that any number of memory interfaces may be coupled to memory chips.
  • the SoC is configured to discover, for every range of addresses, the number of physical regions and the number and/or size of the shared memory resources available including caches, snoop filters and memory interfaces within each physical region.
  • the SoC may include a system memory address decoding circuit that is configured to adaptively decode a memory address received in a memory access request from a processor unit based on the memory address range in which the system address resides and then direct such memory access request to the proper shared memory resource.
  • the SoC is configured to interleave sequential memory addresses to the available shared memory resources across both physical regions in which the available shared memory resources are located and across shared memory resources within a physical region.
  • the SoC is also configured to decode memory access requests to effectuate such interleaving.
  • a method of determining a target identification for a memory request comprises discovering configuration parameters comprising a plurality of available shared memory resources on a system on chip in one or more physical regions, each of the plurality of available shared memory resources associated with a target identifier, the configuration parameters comprising locations for system memory addresses which are interleaved across one or more physical regions and one or more available shared memory resources within a physical region, the configuration parameters further comprising a plurality of hashing regions wherein each hashing region includes a hash circuit and corresponds to a unique combination of one or more physical regions and one or more available memory resources.
  • the method further comprises determining a first hashing region of the plurality of hashing regions in which the system address resides, hashing the system address based on a first hash circuit corresponding to the first one of the plurality of hashing regions to identify a first physical region, hashing the system address based on a second hash circuit corresponding to the first one of the plurality of hashing regions to select a first available shared memory resource within the first physical region; and determining a first target identifier of the first available shared memory resource.
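The claimed decode flow above (find the hashing region containing the system address, apply that region's two hash circuits, then map to a target identifier) can be sketched in Python. All names, table contents, bit positions, and the 64-byte granularity below are illustrative assumptions, not details taken from the disclosure:

```python
# Hypothetical sketch of the claimed decode flow. A hashing region carries
# two hash functions: one selecting the physical region on the SoC and one
# selecting a shared memory resource within that region.

def decode_target(addr, hashing_regions):
    """Return the target identifier for a system memory address."""
    for region in hashing_regions:
        if region["base"] <= addr < region["base"] + region["size"]:
            # First hash circuit selects the physical region.
            phys = region["region_hash"](addr)
            # Second hash circuit selects a shared memory resource.
            resource = region["resource_hash"](addr)
            return region["target_ids"][phys][resource]
    raise ValueError("address not covered by any hashing region")

# Toy configuration: one hashing region, 2 physical regions x 2 resources,
# 64-byte interleave granularity.
regions = [{
    "base": 0x0,
    "size": 1 << 20,
    "region_hash": lambda a: (a >> 6) & 0x1,
    "resource_hash": lambda a: (a >> 7) & 0x1,
    "target_ids": [[0, 1], [2, 3]],
}]
```

With this toy table, four successive 64-byte lines decode to targets 0, 2, 1, and 3, so sequential lines are spread across both physical regions and both resources within each region.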
  • each memory address range within a physical address space may map to a different number of physical regions and the number and size of shared memory resources present in each physical region within the SoC may also be different.
  • the SoC and/or its system memory address decoding circuit may advantageously employ different memory decoding techniques to direct a memory request to the proper shared memory resource while achieving interleaving across physical regions and across shared memory resources within the same physical region.
  • the number of shared memory resources may include over 32 shared system memory caches and over 32 shared memory interface controllers such that if only a small portion of the SoC is not usable, there will be plenty of shared memory resources to address workloads of the SoC. By utilizing SoCs with unusable portions, chip yields can be increased which will reduce waste and save costs. Additionally, given today’s complex SoCs, interleaving memory addresses across all the shared memory resources will lessen the opportunity for hot spots at some memory resources and reduce delays by the interconnect network routing memory requests between resources on the SoC. Interleaving memory means that the next sequential address of a physical address space will map to the next shared memory resource. Given that the size of a manufacturing failure may be unpredictable, various unique configurations of memory resources may exist on a SoC and the system memory address decoder will have to adapt to continue interleaving memory addresses across its useable shared memory resources.
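For a fixed granularity, the interleaving property described above (the next sequential address of the physical address space maps to the next shared memory resource) reduces to a simple modulo over interleave blocks; the 64-byte line size and the count of eight resources here are assumptions for illustration only:

```python
# Minimal illustration (not the patented circuit) of round-robin
# interleaving: each successive cache-line-sized block of the physical
# address space maps to the next shared memory resource in turn.

LINE = 64          # interleave granularity in bytes (assumed)
NUM_RESOURCES = 8  # usable shared memory resources (assumed)

def resource_for(addr):
    """Index of the shared memory resource serving this address."""
    return (addr // LINE) % NUM_RESOURCES
```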
  • the SoC and/or the system memory address decode circuit can take advantage of the power of two (2) relationship to decode the region and shared memory resources either by employing an OR-based logic operation (e.g. XOR or OR operations) in an OR-based logic circuit of multiple bit positions in the system memory address or decoding specific bit positions in the system memory address for the region, cache/snoop filter, and memory interfaces.
  • the SoC and/or the system memory address decode circuit utilizes the same bits for decoding these shared memory resources.
  • the SoC and/or the system memory address decode circuit applies modular arithmetic to the system memory address to decode the target entity which does not have a power of two relation.
  • the SoC and/or the system memory address decode circuit can apply a modulo three circuit to the system address to decode the specific region and either apply an XOR circuit of multiple bit positions in the system memory address to decode the cache/snoop filter and memory channel or decode specific bit positions in the system memory address for the cache/snoop filter and memory channel.
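The two decode styles described above, XOR-folding of address bits when the counts are powers of two and a modulo-three circuit for the region when they are not, can be sketched as follows. The specific bit positions, the region/cache counts, and the 64-byte interleave granularity are invented for illustration and are not taken from the disclosure:

```python
# Hedged sketch of the two decode styles. A real SoC would choose the
# folded bit positions per hashing region.

def xor_bits(addr, bits):
    """XOR-fold the given bit positions of the address into one bit."""
    out = 0
    for b in bits:
        out ^= (addr >> b) & 1
    return out

def decode_pow2(addr):
    # Power-of-two case: 4 physical regions from two XOR-folded bits,
    # 8 shared caches decoded directly from three address bits.
    region = (xor_bits(addr, [6, 12, 18]) << 1) | xor_bits(addr, [7, 13, 19])
    cache = (addr >> 8) & 0x7
    return region, cache

def decode_mod3(addr):
    # Non-power-of-two case: modulo three over 64-byte blocks selects one
    # of three regions; the cache/snoop filter is still XOR-decoded.
    region = (addr >> 6) % 3
    cache = (xor_bits(addr, [6, 9, 12]) << 1) | xor_bits(addr, [7, 10, 13])
    return region, cache
```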
  • when a memory address range of a physical address space spans physical regions and shared memory resources whose numbers are not powers of two, the SoC and/or the system memory address decoding circuit applies modulo operations to decode both the physical region and the shared memory resource. For example, if the memory address range spans three physical regions, and, in each physical region, there are seven shared cache memories and six shared memory interfaces, the SoC and/or the system memory address decoding circuit applies a modulo three circuit to the system address to decode the physical region and a modulo seven circuit to the system address to decode the shared cache memory resource.
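A minimal sketch of the modulo decode for the three-region, seven-cache example above; the 64-byte block granularity and the div-by-3 slicing of the block index are assumptions for illustration:

```python
# Illustrative modulo decode for 3 physical regions x 7 shared caches.

LINE_SHIFT = 6  # 64-byte interleave blocks (assumed)

def decode_3x7(addr):
    block = addr >> LINE_SHIFT
    region = block % 3          # one of three physical regions
    cache = (block // 3) % 7    # one of seven shared caches in that region
    return region, cache
```

Because 3 and 7 are coprime, 21 consecutive blocks visit all 21 region/cache pairs before the pattern repeats, which is the interleaving the passage describes.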
  • to decode the memory interface, the system memory address decode circuit has to apply a modulo nine circuit, a div three circuit, and an add circuit of the lowest bit of a portion of the system address, and also concatenate the lowest bit. In doing so, the SoC and/or the system memory address decoding circuit ensures that the memory addresses in this memory range are interleaved over all the memory interfaces in each of the three (3) physical regions.
  • When a first address range ends and a second address range begins, the system address decoding circuit ensures that the first address in the second address range decodes the system address to select the next sequential shared memory resource.
  • the system address decoding circuit ensures that the first address in the second address range decodes the system address to direct the memory request to the first shared memory resource in the first physical region.
  • the system address decoding circuit ensures that the first address in the second address region decodes the system address to direct the memory request to the targeted resource in the set of targeted resources next to the targeted resource which was targeted by the last address in a first address region.
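The continuity requirement above, that the first address of a new range picks up the round-robin where the previous range left off, can be illustrated with a per-range start offset; the range sizes, granularity, and resource count here are invented for illustration:

```python
# Hedged sketch of keeping the round-robin sequence continuous across an
# address-range boundary: the second range's decode is offset by where the
# first range left off.

LINE = 64
NUM_RESOURCES = 8

def resource_in_range(addr, range_base, start_offset):
    """Round-robin resource select, continuing from start_offset."""
    block = (addr - range_base) // LINE
    return (block + start_offset) % NUM_RESOURCES

# Range 0 covers [0, 0x400); its last block lands on resource r.
r = resource_in_range(0x400 - LINE, 0, 0)
# Range 1 begins at 0x400 and must start at the next resource, r + 1.
first = resource_in_range(0x400, 0x400, (r + 1) % NUM_RESOURCES)
```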
  • the term “target resources” is defined to mean physical regions, shared cache memory resources, and memory interfaces in a particular hashing region.
  • the SoC and/or the system memory address decoding circuit described herein can be flexible to interleave memory addresses across various unique hardware configurations of shared memory resources. These unique hardware configurations may be caused by various situations including an equipment manufacturer adding various size system DRAM memory modules to connect to a SoC or, during manufacturing, portions of the SoC may be deemed inoperable such that some memory controllers and/or some shared system memory caches are unusable while other portions of the SoC are usable.
  • Figure 1 is a block diagram illustrating an exemplary processor-based system for interleaving addresses across physical regions and across shared memory resources;
  • Figure 2 is a physical layout on an integrated circuit (IC) chip of an exemplary configuration of the processor-based system in Figure 1;
  • Figure 3 is an exemplary set of interleave address ranges interleaved across four (4) physical regions in the IC chip in Figure 2;
  • Figure 4 is an exemplary set of hashing regions comprising ranges of addresses that are interleaved across both the interleave address ranges of Figure 3 and four (4) memory interface pairs in Figure 2;
  • Figure 5 is an exemplary flowchart for decoding a system memory address of a memory request in accordance with interleaving memory addresses across physical regions and across shared memory resources, including, but not limited to, the shared memory resources in the IC chip in Figures 1 and 2;
  • Figure 6 is a block diagram of an exemplary system address decoding circuit utilized by processing units in Figure 2;
  • Figure 7 is a block diagram of an exemplary system address decoding circuit utilized by shared cache memories in Figure 2 when a memory request misses in a shared cache memory;
  • Figure 8 is an exemplary XOR tree circuit for a system memory address which resides in an address range that spans two sockets, four (4) physical regions in each socket, eight (8) shared cache memory resources in each physical region, and eight (8) memory interfaces in each physical region;
  • Figure 9 is an exemplary XOR tree circuit in Figures 6 and 7 for a system memory address which resides in an address range that spans one socket, four (4) physical regions in the socket, eight (8) shared cache memory resources in each physical region, and eight (8) memory interfaces in each physical region;
  • Figure 10 is an exemplary bit hash circuit in Figures 6 and 7 for a system memory address which resides in an address range that spans one socket, four (4) physical regions in the socket, eight (8) shared cache memory resources in each physical region, and eight (8) memory interfaces in each physical region;
  • Figure 11 is an exemplary hash circuit in Figures 6 and 7 for a system memory address which resides in an address range that spans one socket, three (3) physical regions in the socket, eight (8) shared cache memory resources in each physical region, and eight (8) memory interfaces in each physical region;
  • Figure 12 is an exemplary hash circuit in Figures 6 and 7 for a system memory address which resides in an address range that spans one socket, three (3) physical regions in the socket, seven (7) shared cache memory resources in each physical region, and six (6) memory interfaces in each physical region;
  • Figure 13 is a flowchart for the logic process of an exemplary aggregated hash circuit which may be utilized by one or more of the hash circuits described in Figures 6 and 7 that hashes system memory addresses based on the factor of the number of targeted resources in a set of targeted resources when the set of targeted resources is a set of physical regions, shared cache memory resources, or memory interfaces;
  • Figure 14 is an exemplary configuration of a collection of memory index tables where each memory index table is unique for a shared cache memory resource in the same physical region to interleave system memory addresses across memory interfaces when the ratio of shared cache memory resources to memory interfaces is 7:8;
  • Figure 15 is an exemplary flowchart for decoding a system memory address of a memory request in accordance with interleaving memory addresses across physical regions and across shared memory resources, including, but not limited to, the shared memory resources in the IC chip in Figures 1 and 2; and
  • Figure 16 is a block diagram of an exemplary processor-based system that can include a system address decoding circuit for interleaving addresses across physical regions and across shared memory resources in Figures 6 and 7.
  • FIG. 1 is a block diagram illustrating an exemplary processor-based system 100 for interleaving addresses across physical regions and across shared memory resources.
  • the processor-based system 100 includes a multiple (multi-) processing unit (PU) (multi-PU) processor (multi-processor) 102 that includes multiple PUs 104(0)-104(N) and a hierarchical memory system.
  • PU 104(0) includes a private local cache memory 106, which may be a level 2 (L2) cache memory.
  • PUs 104(1), 104(2) and PUs 104(3)-104(N) are configured to interface with respective local shared cache memories 106S(0)-106S(X), which may also be L2 cache memories, for example. If a memory request issued by any one of the PUs 104(0)-104(N) results in a cache miss in the respective cache memories 106, 106S(0)-106S(X), the memory request may be communicated to a next-level cache memory, which in this example is a shared cache memory 108(A)-108(Y).
  • the shared cache memory 108(A)-108(Y) may be a level 3 (L3) cache memory.
  • a snoop controller 110, also known as a snoop filter, will monitor the memory request to determine whether another shared L2 cache contains the latest data for the memory request.
  • the snoop controller 110 and the shared cache memory 108(A)-108(Y) are treated similarly with respect to routing memory requests.
  • the cache memory 106, the local shared cache memories 106S(0)-106S(X), the snoop controller 110, and the shared cache memory 108(A)-108(Y) are part of a hierarchical cache memory system 112.
  • An interconnect bus 114, which may be a coherent bus, is provided that allows each of the PUs 104(0)-104(N) to access the shared cache memories 106S(0)-106S(X) (if shared to the PUs 104(0)-104(N)), the snoop controller 110, the shared cache memory 108(A)-108(Y), and other shared resources coupled to the interconnect bus 114. If a memory request does not hit in the shared cache memory 108(A)-108(Y), the memory request will be directed to a memory interface 116(A)-116(N), which will either read or write data to an appropriate system memory module 118(A)-118(N).
  • the system memory modules 118(A)-118(N) may be pluggable high-speed random access memory modules which may have different sizes.
  • High-speed random access memory modules may include dynamic random access memory (DRAM), dual data rate memory (DDR), or any other random access solid state memory devices.
  • the term “shared memory resources” includes the shared cache memory 108(A)-108(Y), the snoop controller 110, and the memory interfaces 116(A)-116(N).
  • the term “shared cache memory resources” includes the shared cache memory 108(A)-108(Y) and the snoop controller 110.
  • Each of the shared memory resources is associated with a unique target identifier, which allows the interconnect bus 114 to direct the memory request to the appropriate shared memory resource.
  • the multi-processor 102 includes a socket gateway 120(A)-120(H) through which a memory request is routed.
  • the processor-based system 100 in Figure 1 includes a system address decoding circuit 122(0) or a shared system address decoding circuit 122(1) that is configured to decode a system memory address in a memory request and determine the unique target identifier for the interconnect bus 114 to route the memory request.
  • the processor-based system 100 discovers configuration parameters associated with the characteristics of the multi-processor 102 including whether and to what extent any shared system cache memories 108(A)-108(Y) and memory interfaces 116(A)-116(N) are available, the size of each available shared system cache memory 108(A)-108(Y), the size of each system memory module 118(A)-118(N) coupled to the respective available memory interfaces 116(A)-116(N), and optionally the number of socket gateways 120(A)-120(H).
  • the configuration parameters include memory regions containing a plurality of address ranges which interleaves memory addresses across physical regions and shared memory resources within each physical region.
  • the description associated with Figure 2 will illustrate an exemplary physical layout of the processor-based system 100.
  • the description associated with Figures 3 and 4 will illustrate exemplary memory regions which interleave memory addresses across physical regions, available shared cache memory resources within a physical region, and memory interfaces within a physical region.
  • the system address decoding circuit 122(0) is configured to determine whether and to which socket gateway to route the memory request. Refer to the discussion in connection with Figure 5 for a more detailed discussion of decoding a system memory address. If the memory request is not destined to a socket gateway 120(A)-120(H), the system address decoding circuit 122(0) is configured to determine an address region in which the system address resides and to hash the system address corresponding to the hash circuit associated with the address region to decode a physical region and a shared memory resource which should be the target of the memory request.
  • the system address decoding circuit 122(0) is also configured to determine the target identifier associated with the shared cache memory resource and to output the target identifier to the interconnect bus 114 to route the memory request to a specific shared cache memory resource 108(A)-108(Y), say 108(C) for example.
  • the system address decoding circuit 122(1) of shared cache memory resource 108(C) is configured to determine an address region in which the system address resides and to hash the system address corresponding to the hash circuit associated with the address region to decode a memory interface 116(A)-116(N) which should be the next target of the memory request.
  • the shared system address decoding circuit 122(1) is also configured to determine the target identifier associated with the specific memory interface 116(A)-116(N) and to output the target identifier to the interconnect bus 114 to route the memory request to a specific memory interface 116(A)-116(N).
  • if socket gateways 120(A)-120(H) are deployed, they would receive memory requests originating from another SoC coupled to a socket gateway 120(A)-120(H).
  • the socket gateway 120(A)-120(H) will invoke the system address decoding circuit 122(0) to determine the unique target identifier for the shared cache memory resource on this chip in the same way as the system address decoding circuit 122(0) does when a memory request originates from any one of the PUs 104(0)-104(N).
  • FIG. 2 is a physical layout of an IC chip for an exemplary configuration of the processor-based system 100 in Figure 1.
  • a SoC 200 is an IC chip containing a particular configuration of the multi-processor 102.
  • the SoC 200 includes PUs 104(1)- 104(20) coupled to the interconnect bus 114.
  • the SoC 200 also includes four physical regions 202(1)-202(4). In each physical region 202(1)-202(4), there are eight shared cache memory resources, for example, 108(1)-108(8), and eight memory interfaces 116(1)-116(8).
  • Each shared cache memory resource 108(1)-108(8) may route a memory request to any of the eight memory interfaces 116(1)-116(8).
  • Each memory interface 116(1)-116(8) couples to a respective system memory module 118(1)-118(8).
  • the SoC 200 also includes eight gateways 120(A)-120(H).
  • Figure 3 and Figure 4 together describe how, during the boot-up process, the processor-based system 100 creates configuration parameters including a set of hashing regions where each address range interleaves memory addresses across a unique combination of the physical regions 202(1)-202(4), the shared memory resources, and the system memory modules 118(A)-118(N) within each physical region 202(1)-202(4).
  • Figure 3 is an exemplary set of address ranges that are interleaved across four physical regions in Figure 2 and assumes that shared cache memory resources 108(1)-108(32) and memory interfaces 116(1)-116(32) are available.
  • the processor-based system 100 is configured to discover at boot up that the memory capacity of the system memory modules 118(1)-118(8) in the physical region 202(1) is larger than the memory capacity of the system memory modules 118(9)-118(16) in the physical region 202(2), which is larger than the memory capacity of the system memory modules 118(17)-118(24) in the physical region 202(3), which is larger than the memory capacity of the system memory modules 118(25)-118(32) in the physical region 202(4).
  • the processor-based system 100 is also configured to create interleave physical regions 300(1)-300(4) at boot up.
  • System memory addresses falling within interleave physical region 300(1) are interleaved across all four physical regions 202(1)-202(4).
  • System memory addresses falling within interleave physical region 300(2) are interleaved across the physical regions 202(1)-202(3).
  • System memory addresses falling within interleave physical region 300(3) are interleaved across physical regions 202(1)-202(2).
  • System memory addresses falling within interleave physical region 300(4) are interleaved across physical region 202(1).
  • the processor-based system 100 creates the interleave physical region 300(1) by consuming all memory installed in the smallest physical region 202(4).
  • the processor-based system 100 creates the interleave physical region 300(2) by consuming all memory installed in the next smallest physical region 202(3), and so on.
  • the processor-based system 100 creates the interleave physical regions 300(1)-300(4) where the lowest interleave physical region 300(1) spans all the memory in all physical regions of the SoC 200 until the memory in the smallest-sized physical region 202(4) is exhausted, and the next interleave physical region 300(2) spans the remaining memory until the next smallest-sized region 202(3) is exhausted, and so on.
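The boot-time construction of interleave physical regions described above can be sketched as follows. This is a simplified Python model, not actual boot firmware; the function name, region ids, and unit capacities are all hypothetical for illustration.

```python
def build_interleave_regions(capacities):
    """Model of the boot-time construction of interleave physical regions.

    capacities: hypothetical per-physical-region memory sizes keyed by
    region id, e.g. {1: 64, 2: 48, 3: 32, 4: 16} (units are illustrative).
    Returns (start, end, participating_region_ids) tuples, lowest first.
    """
    remaining = dict(capacities)
    regions = []
    base = 0
    while remaining:
        # Each pass consumes the smallest surviving region entirely;
        # every surviving region contributes that much memory to the span.
        smallest = min(remaining.values())
        ids = tuple(sorted(remaining))
        span = smallest * len(ids)
        regions.append((base, base + span, ids))
        base += span
        remaining = {r: c - smallest for r, c in remaining.items()
                     if c - smallest > 0}
    return regions
```

With the capacities above, the sketch yields four interleave regions spanning all four, then three, then two, then one physical region, mirroring interleave physical regions 300(1)-300(4).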
  • the interleave physical regions 300(l)-300(4) are associated with unique regional hash circuits.
  • Socket address region 302 is optional and used when the SoC 200 is coupled to another chip through the socket gateways 120(A)-120(H).
  • the shared cache memory resources 108(A)-108(Y) are caches which are flexible in the memory addresses for which they are able to store data.
  • the processor-based system 100 determines the size of each system memory module 118(A)-118(N) coupled to each memory interface 116(A)-116(N) in each physical region 202(1)-202(4) to interleave memory addresses across the system memory modules in that respective region.
  • the processor-based system 100 is configured to determine that the system memory modules 118(1)-118(2) coupled to the memory interfaces 116(1)-116(2) have a larger capacity than the system memory modules 118(3)-118(4) coupled to the memory interfaces 116(3)-116(4), that the system memory modules 118(3)-118(4) coupled to the memory interfaces 116(3)-116(4) have a larger capacity than the system memory modules 118(5)-118(6) coupled to the memory interfaces 116(5)-116(6), and so on.
  • the processor-based system 100 determines memory interface interleave regions 400(l)-400(4) based on the sizes of the memory interface pairs.
  • the processor-based system 100 determines hashing regions 402(1)-402(8) by overlaying the memory interface interleave regions 400(1)-400(4) and the interleave physical regions 300(1)-300(4). For every system memory address in hashing regions 402(1)-402(8), the processor-based system 100 establishes an order of targeted resources that will be interleaved in each hashing region 402(1)-402(8).
  • the socket address region 302 and socket hashing region 402(1) are optional and used when the SoC 200 is coupled to another chip through the socket gateways 120(A)-120(H) for sending memory requests to another chip and receiving memory requests from the other chip.
  • the hashing regions 402(1 )-402(8) define the physical address space of system addresses. Each hashing region that borders another hashing region is defined as being an adjacent hashing region. Additionally, the lowest memory address in one hashing region, e.g., 402(3), is adjacent to the highest memory address of the adjacent hashing region, e.g., 402(2). The lowest memory address in one hashing region maps to the available shared memory resource that is sequential to the available shared memory resource mapped by the highest memory address in the adjacent hashing region.
  • a unique hashing circuit is utilized to decode a system memory address since the discovered configuration for each hashing region 402(1)-402(8) is unique with respect to the number of physical regions over which system memory addresses are interleaved or the number of shared memory resources in a physical region in which system memory addresses are interleaved.
  • system memory addresses that fall in a respective hashing region are interleaved across the unique combination of one or more physical regions and shared memory resources within a physical region.
  • hashing region 402(1) employs a hashing circuit that decodes system addresses that are interleaved across multiple chip sockets.
  • hashing region 402(2) employs a hashing circuit that decodes system addresses that are interleaved across four physical regions 202(1)-202(4).
  • hashing region 402(3) employs a hashing circuit that decodes system addresses which are interleaved across three physical regions 202(1)-202(3) and four memory interface pairs 116(1)-116(8), while hashing region 402(4) employs a different hashing circuit that decodes system addresses which are interleaved across the same three physical regions 202(1)-202(3) and three memory interface pairs 116(1)-116(6).
  • Hashing region 402(5) employs a hashing circuit that decodes system addresses which are interleaved across two physical regions 202(1)-202(2) and three memory interface pairs 116(1)-116(6), while hashing region 402(6) employs a different hashing circuit that decodes system addresses which are interleaved across the same two physical regions 202(1)-202(2) and two memory interface pairs 116(1)-116(4).
  • Hashing region 402(7) employs a hashing circuit that decodes system addresses which are interleaved across one physical region 202(1) and two memory interface pairs 116(1)-116(4), while hashing region 402(8) employs a different hashing circuit that decodes system addresses which are interleaved across the same one physical region 202(1) and one memory interface pair 116(1)-116(2).
  • each hashing region 402(1)-402(8) defines a unique combination of physical regions, shared cache memory resources, and memory interfaces over which to interleave memory addresses.
  • the system decoding circuit 122 may need to ensure that the first address in a hashing region decodes to target the resource (i.e., shared cache memory resource or memory interface) which is the next sequential resource after the resource targeted by the last address in the lower adjacent hashing region.
  • the processor-based system 100 determines a smoothing factor, Sf, that will be utilized in the respective hashing circuits associated with hashing regions 402(1)-402(8).
  • the processor-based system 100 performs a trial hashing process for each hashing circuit associated with hashing regions 402(2)-402(8) to calculate a smoothing factor for hashing regions 402(2)-402(8).
  • the trial hashing process for hashing region 402(2) includes applying the highest system memory address of the lower adjacent hashing region, hashing region 402(1) for example, to the hashing circuit associated with hashing region 402(1) to determine the first targeted resource.
  • the trial hashing process for hashing region 402(2) also includes applying the lowest system memory address of hashing region 402(2) to the hashing circuit associated with hashing region 402(2) to determine the second targeted resource.
  • the smoothing factor is calculated by taking the absolute difference between the unique identifiers of the second and first targeted resources. For example, if the first targeted resource is shared cache memory resource 108(1) and the second targeted resource is shared cache memory resource 108(7), the smoothing factor would be equal to six (6).
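The boundary trial-hashing above can be sketched in a few lines of Python. The hash functions here are hypothetical stand-ins for the per-region hashing circuits; only the absolute-difference step comes directly from the text.

```python
def smoothing_factor(lower_hash, upper_hash, lower_top_addr, upper_bottom_addr):
    """Trial-hash the boundary between two adjacent hashing regions.

    lower_hash / upper_hash: stand-ins for the hashing circuits of the
    lower and upper adjacent hashing regions; each maps a system memory
    address to a targeted resource index.  The smoothing factor is the
    absolute difference between the two targeted resource identifiers.
    """
    first_target = lower_hash(lower_top_addr)      # highest address of lower region
    second_target = upper_hash(upper_bottom_addr)  # lowest address of upper region
    return abs(second_target - first_target)
```

Feeding it stand-in circuits that target resource indices 1 and 7 reproduces the smoothing factor of six from the example above.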
  • a correction factor is utilized by the corresponding hashing circuit when hashing for a targeted resource in a set of targeted resources whose size is not a power of two (2), to ensure that each of the targeted resources within the set will be sequentially targeted for each sequential address within the hashing region.
  • the processor-based system 100 performs a correction factor trial hashing process for each hashing region 402(1)-402(8) to calculate a correction factor for any of the sets of targeted resources in the respective hashing regions 402(1)-402(8) where the number of targeted resources in the set is not a power of two (2).
  • the correction factor trial hashing process determines a correction factor for addressing three physical regions and a correction factor for addressing six memory interfaces.
  • the processor-based system 100 solves for Cf, the correction factor, in the following formula:
  • D is the distance between two targeted resources in the same set for two sequential system memory addresses
  • Nr is the number of targeted resources in the same set.
  • D is determined by running the corresponding hashing circuit of a hashing region for two adjacent system memory addresses to yield two targeted resources in the set.
  • the distance, D, is calculated by taking the absolute difference between the unique identifiers of the two targeted resources.
  • hashing region 402(6) employs a unique hashing circuit that decodes system addresses which are interleaved across three physical regions 202(1)-202(3) and two memory interface pairs 116(1)-116(4) (in other words, four (4) memory interfaces).
  • the sets of targeted resources in this example, the physical regions (three) and the memory interfaces (four), are not both a power of two (2).
  • the correction factor trial hashing process in this example, will run the hashing circuit corresponding to hashing region 402(6) twice for two consecutive system memory addresses which target two physical regions and will assume a correction factor Cf having a value of 1. Assuming the two resulting physical regions are physical region 202(1) and physical region 202(3), the distance, D, would be two (2).
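The distance calculation used in the correction factor trial hashing above can be sketched as follows. The hash function in the test is a toy stand-in chosen so two consecutive addresses land two physical regions apart, matching the D = 2 example; the 64-byte granule is an assumption, not stated in the text.

```python
def interleave_distance(hash_fn, addr, granule):
    """Distance D between the resources targeted by two sequential system
    memory addresses under the same hashing circuit.

    hash_fn: hypothetical stand-in for a region's hashing circuit.
    granule: assumed interleave stride between sequential addresses.
    """
    return abs(hash_fn(addr + granule) - hash_fn(addr))
```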
  • When routing a memory request, a PU is configured to determine a target identifier for the memory request having a system memory address so the memory request can be routed first to the appropriate shared cache memory resource.
  • the system address decoding circuit 122(0) determines which unique hashing circuit to use to decode the system memory address based on the hashing region 402(1)-402(8) in which the system memory address resides.
  • the selected hashing circuit is used to decode the physical region for which the memory request is destined, as well as the shared cache memory resource and the memory interface within the determined physical region.
  • the term “unique hashing circuit” as used herein is agnostic to a particular implementation.
  • Hashing circuits may be unique by having different logic circuitry or, if the hashing circuits have the same or common logic circuitry, may be unique by having unique parameters inputted to a respective hashing circuit.
  • the hashing circuits as described further in Figures 6-7 may have the same logic circuitry as described in Figure 13 but vary based on the parameters inputted to the respective hashing circuits.
  • FIG. 5 is an exemplary flowchart for decoding a system memory address of a memory request in accordance with interleaving memory addresses across physical regions and across shared memory resources within a physical region.
  • Process 500 involves two levels of system memory address decoding hierarchies.
  • a first level of system memory address decoding 502 is addressed by the system address decoding circuit 122(0) at a PU 104(0)-104(N) which performs blocks 506-514 and optional blocks 524-530, if the SoC 200 is deployed with multiple sockets.
  • a second level of system address decoding 504 is addressed by the system address decoding circuit 122(1) at a shared cache memory resource 108(A)-108(Y) which performs blocks 518-522.
  • the process 500 receives a system memory address, determines in which hashing region 402(1)-402(8) the system memory address resides, and determines whether the processor-based system 100 deploys a multiple-socket system. If not, the process 500 proceeds to block 508.
  • the process 500 particularly configures a hash circuit based on the specific hashing region 402(1)-402(8) in which the system memory address resides and hashes the system memory address to determine the target physical region for which the memory request is destined.
  • the process 500 particularly configures the hash circuit, which is specific to the target physical region, based on the hashing region 402(1)-402(8) in which the system memory address resides and proceeds to block 512.
  • the process 500 hashes the system memory address to determine which shared cache memory resource should be targeted in the specific target physical region.
  • the process 500 determines the unique target identifier of the targeted shared cache memory resource and submits the memory request to the interconnect bus 114 which routes the memory request to the selected shared cache memory resource.
  • a look-up table is used to convert a local reference of the shared cache memory resource from block 512 to a unique target identifier.
  • the targeted shared cache memory resource determines whether there is a “hit.” If there is, the memory request is serviced by the selected shared cache memory resource. If there is a “miss,” the process 500 proceeds to block 518.
  • the process 500 determines regional specific hashing parameters for finding the appropriate memory interface to route the memory request.
  • the process 500 particularly configures the hash circuit based on the regional specific hashing parameters.
  • the process 500 hashes the system memory address to determine the appropriate memory interface in the specific physical region that should be targeted.
  • the process 500 determines the unique target identifier of the targeted memory interface and directs the memory request to the coupled system memory module.
  • a look-up table is used to convert a local reference of the memory interface from block 520 to a unique target identifier.
  • the process 500 proceeds to block 524.
  • the process 500 consumes the socket configuration parameters that were discovered at boot up.
  • the process 500 hashes the system memory address to determine if the system address should be destined to a shared cache memory resource on the chip in which the memory request originated. The process 500 makes this decision either by determining that the socket configuration parameters indicate that the SoC 200 is deployed in a one-socket configuration or, if the SoC 200 is deployed in a multiple-socket configuration, determining the socket number in the system memory address.
  • If the memory request is destined for a shared cache memory resource on the originating chip, the process 500 proceeds to block 508. If not, the process 500 proceeds to block 528 to hash the system memory address according to the hashing circuit associated with the hashing region 402(1) to determine to which socket gateway 120(A)-120(H) to route the memory request. At block 530, the process 500 utilizes a look-up table to find the unique target identifier associated with the selected socket gateway 120(A)-120(H) and submits the memory request to the interconnect bus 114 for routing to the selected socket gateway 120(A)-120(H).
  • Figure 6 is a block diagram of an exemplary system address decoding circuit 122(0) utilized by the PUs 104(1)-104(20) in Figure 2.
  • Blocks 602-618 are utilized when the SoC 200 is deployed in a single-socket configuration or in a multiple-socket configuration.
  • Blocks 620-630 are optional and utilized only when the SoC 200 is deployed in a multiple-socket configuration.
  • XOR tree circuit 602 is a set of XOR gates which are configured to XOR particular bits in a system memory address to decode a socket, a physical region, a shared cache memory resource, and a memory interface based on the hashing region 402(1)-402(8) in which the system address resides.
  • Figure 8 illustrates six XOR trees for decoding a system memory address which resides in the hashing region 402(1) which is associated with two sockets, four physical regions, eight shared cache memory resources, and eight memory interfaces. (XOR gates are shown for simplicity by connecting bits with horizontal lines.) For addresses in that region, bits 6, 12, 18, 24, 30, 36, 42, and 48 are XOR’d to determine a socket 802 from a system memory address 800. Bits 7, 13, 19, 25, 31, 37, 43, and 49 are XOR’d to determine the least significant bit of a physical region 804, and bits 8, 14, 20, 26, 32, 38, 44, and 50 are XOR’d to determine the most significant bit of the physical region 804.
  • Bits 9, 15, 21, 27, 33, 39, 45, and 51 are XOR’d to determine the least significant bit of a shared cache memory resource 806, bits 10, 16, 22, 28, 34, 40, and 46 are XOR’d to determine the middle bit of the shared cache memory resource 806, and bits 11, 17, 23, 29, 35, 41, and 47 are XOR’d to determine the most significant bit of the shared cache memory resource 806.
  • Figure 9 shows five XOR trees for decoding a system memory address 900 which resides in the hashing region 402(2) which is associated with one socket, four physical regions, eight shared cache memory resources, and eight memory interfaces. For addresses in that region, bits 6, 11, 16, 21, 26, 31, 36, 41, 46, and 51 are XOR’d to determine the least significant bit of a physical region 902 and bits 7, 12, 17, 22, 27, 32, 37, 42, and 47 are XOR’d to determine the most significant bit of the physical region 902.
  • Bits 8, 13, 18, 23, 28, 33, 38, 43, and 48 are XOR’d to determine the least significant bit of a shared cache memory resource 904, bits 9, 14, 19, 24, 29, 34, 39, 44, and 49 are XOR’d to determine the middle bit of the shared cache memory resource 904, and bits 10, 15, 20, 25, 30, 35, 40, 45, and 50 are XOR’d to determine the most significant bit of the shared cache memory resource 904.
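Each XOR tree of Figures 8 and 9 reduces to a parity fold over selected address bits. A minimal Python sketch, using the socket-tree bit positions listed for Figure 8 (the function name is illustrative):

```python
def xor_fold(addr, bit_positions):
    """Collapse the named address bits into one output bit by XOR,
    mirroring a single XOR tree of Figures 8 and 9."""
    out = 0
    for b in bit_positions:
        out ^= (addr >> b) & 1
    return out

# Bit positions of the Figure 8 socket tree, as listed in the text.
SOCKET_BITS = (6, 12, 18, 24, 30, 36, 42, 48)
```

Each decoded field (socket, physical region, shared cache memory resource) is then one or more such folds over its own set of bit positions.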
  • the XOR tree circuit 602 of Figure 6 is configurable to address multiple combinations of XOR bits to decode aspects of a system memory address. The XOR tree circuit 602 determines which subset of XOR gates to apply to what bits of the system memory address by reading XOR configuration parameters 604 which were discovered at boot up.
  • a physical region match circuit 606 determines the hashing region in which the system memory address resides by consuming regional configuration parameters 608 and particularly configures the regional hash circuit 610 to determine the physical region to which the system memory address should be routed.
  • the regional configuration parameters 608 also include the number of available shared cache memory resources per physical region. For example, at boot up, the configuration parameters will hold information indicating to what extent, if any, certain cache memories 108(A)-108(Y) are unavailable in each physical region 202(1)-202(4).
  • the physical region match circuit 606 indicates whether the hashed region is configured for multi-socket interleaving by referring to the parameters associated with the hashing region 402(1)-402(8) in which the system address resides.
  • the physical region match circuit 606 also forwards shared cache memory resource hashing configuration parameters for each physical region to particularly configure regional hash circuit 610.
  • the physical region match circuit 606 may also include parameters for smoothing factors or correcting factors that may be utilized in the hashing regions 402(1)-402(8). See the discussion on smoothing or correcting factor calculations in connection with the description of Figure 4.
  • a regional hash circuit 610 determines whether, and to what extent, to utilize the output of the XOR tree circuit 602 or decode specific bits of the system memory address to determine the specific physical region to direct the memory request carrying the system memory address.
  • Figure 10 is an example of simply decoding specific bits from a system memory address.
  • Figure 10 is an exemplary bit hash circuit that can be used in Figure 6 and Figure 7 for a system memory address which resides in an address range that spans one socket, four physical regions in the socket, eight shared cache memory resources in each physical region, and eight memory interfaces in each physical region.
  • a hashing circuit 1000 determines a physical region 1002 by decoding bits 6-7 and a specific shared cache memory resource 1004 by decoding bits 8-10. Since Figure 10 addresses a one-socket hashing region, no bits are decoded for a socket.
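The Figure 10 direct bit decode can be sketched as follows, assuming the bit positions given above (bits 6-7 for the physical region, bits 8-10 for the shared cache memory resource); the function name is illustrative:

```python
def bit_hash(addr):
    """Figure 10-style direct bit decode for a power-of-two configuration:
    bits 6-7 select one of four physical regions and bits 8-10 select one
    of eight shared cache memory resources; no socket bit is decoded."""
    physical_region = (addr >> 6) & 0b11
    shared_cache = (addr >> 8) & 0b111
    return physical_region, shared_cache
```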
  • the regional hash circuit 610 performs smoothing or correcting calculations, if needed, when decoding a system memory address to ensure that memory addresses remain interleaved across the multiple hashing regions 402(1)-402(8). The application of these factors will be discussed in connection with Figure 13.
  • the regional hash circuit 610 forwards the specific physical region index for which the memory request is destined to a multiplexer (mux) 612, a mux 616, and a shared cache memory resource index table 614.
  • The mux 616 receives the number of available shared cache memory resources in each physical region.
  • the mux 616 outputs the appropriate number of available shared cache memory resources in the targeted physical region to a shared cache memory resource hash circuit 618.
  • the shared cache memory resource hash circuit 618 determines whether, and to what extent, to utilize the output of the XOR tree circuit 602 or decodes specific bits of the system memory address to determine an index 619 of the specific shared cache memory resource in the targeted physical region to direct the memory request. For example, if the system memory address targets physical region 202(1) and there are eight available shared cache memories 108(1)-108(8), the shared cache memory resource hash circuit 618 hashes the system memory address to identify which one of the eight available cache memories the memory request should be targeted to. Additionally, the shared cache memory resource hash circuit 618 performs smoothing or correcting calculations, if needed, when decoding a system memory address to ensure that memory addresses remain interleaved across the multiple hashing regions 402(1)-402(8).
  • the shared cache memory resource index table 614 receives the index 619 of the shared cache memory resource from the shared cache memory resource hash circuit 618 and the targeted physical region from the regional hash circuit 610 and looks up the index 619 to a unique target identifier 621 for the targeted shared cache memory resource. If the SoC 200 is deployed in a one-socket configuration, the shared cache memory resource index table 614 forwards the memory request with the unique target identifier 621 of the targeted shared cache memory resource to the interconnect bus 114 to route the memory request to the targeted shared cache memory resource. If the SoC 200 is deployed in a multiple-socket configuration, the unique target identifier 621 for the targeted shared cache memory resource is forwarded to a mux 620.
  • a gateway hash circuit 622 receives input gateway configuration parameters 624 which were determined at boot up indicating the number of available socket gateways 120(A)-120(H) on the SoC 200.
  • the specific socket gateway is forwarded to a gateway index table 626 to look up the index 625 for the specific socket gateway to determine a unique gateway target identifier 627.
  • a socket hash circuit 628 receives the number of sockets in which the SoC 200 is deployed from socket configuration parameters 630.
  • the socket configuration parameters 630 are determined during boot up.
  • the socket hash circuit 628 determines whether the memory request is destined to a shared cache memory resource on the SoC 200 or a remote SoC connected through a socket gateway 120(A)-120(H).
  • the socket hash circuit 628 determines whether to use the output of the XOR tree circuit 602 or to decode specific bits of the system memory address to obtain the specific socket to which to direct the memory request carrying the system memory address, and sends a select signal 629 to the mux 620 to direct whether the memory request should be sent to the target gateway or a local shared cache memory resource.
  • FIG 7 is a block diagram of an exemplary system address decoding circuit 122(1) utilized by the shared cache memory resources in Figure 2 when a memory request misses in a shared cache memory resource.
  • the memory request is at the specific shared cache memory resource in a specific physical region on the SoC 200 and the specific shared cache memory resource has to direct the memory request to one of the available memory interfaces in the same physical region.
  • the shared cache memory resource 108(1) invokes its system address decoding circuit to determine the unique target of the proper memory interface, one of memory interfaces 116(1)-116(8).
  • An XOR tree circuit 702 is a set of XOR gates which are configured to XOR particular bits in a system memory address to decode a memory interface based on the hashing region 402(1)-402(8) in which the system memory address resides. Since there are a maximum of eight memory interfaces per physical region in an exemplary SoC 200, three XOR trees are determined to XOR three sets of bits from the system memory address to decode the memory interface index. Additionally, if there are the same number of available memory interfaces as there are available cache memories in the same physical region, the same bits used for decoding a shared cache memory resource can be used for decoding a memory interface.
  • the XOR tree circuit 702 determines which subset of XOR gates to apply to what bits of the system memory address by reading XOR configuration parameters 704 which were discovered at boot up.
  • a memory interface match circuit 706 receives the output of the XOR tree circuit 702 and the system memory address. Based on hashing region configuration parameters 708, the memory interface match circuit 706 determines the hashing region 402(1)-402(8) in which the system memory address resides and sends configuration parameters 707 describing how to calculate the index for the memory interface to a memory interface hash circuit 710.
  • the hashing region configuration parameters 708 will also include information regarding the number of available memory interfaces in each physical region, the boundaries for each hashing region 402(1)-402(8), and any smoothing or correction calculations that may be utilized in the hashing regions 402(1)-402(8). As such, the memory interface match circuit 706 will forward those configuration parameters to the memory interface hash circuit 710.
  • the memory interface hash circuit 710 utilizes the configuration parameters 707 and either utilizes the XOR output from the XOR tree circuit 702 or decodes specific bits of the system memory address to determine an index 711 for the specific memory interface to which to route the memory request.
  • a memory interface index table 712 receives the memory interface index 711 for the specific memory interface and looks up a corresponding unique target identifier 716. The memory interface index table 712 submits the unique target identifier 716 to the interconnect bus 114 for routing to the associated memory interface.
  • the memory interface index table 712 may be unique for each available shared cache memory resource in the physical region, and the memory interface index 711 is a decoded address which is used to look up in the memory interface index table 712 to retrieve the target identification of the targeted memory interface. See the description in connection with Figure 14 for more detail.
  • the memory interface routes the memory request to its associated system memory module 118(A)-118(N).
  • the hashing circuits 610, 618, 622, and 710 utilize XOR trees as input rather than bitwise decoding of the system memory address when, for a hashing region, the number of all entities (i.e., physical regions, available shared cache memory resources, and available memory interfaces) in the multi-level hierarchy 502 and 504 are a power of two (2).
  • the hashing circuits 610, 618, 622, and 710 are more complicated to decode system memory addresses which are interleaved over a configuration of entities (i.e., physical regions, available shared cache memory resources, and memory interfaces within a physical region) when any one of the number of physical regions, the number of available shared resources within a physical region, or the number of memory interfaces within the physical region are not a power of two (2).
  • Figure 11 is an exemplary hashing circuit in Figure 6 and Figure 7 for a system memory address which resides in an address region that spans one socket, three physical regions in the socket, eight shared cache memory resources in each physical region, and eight memory interfaces in each physical region.
  • exemplary hashing circuit 1100 decodes bits 6-8 (1102) to find the targeted shared cache memory resource and decodes these same bits 6-8 (1102) on a cache miss to find the targeted memory interface. However, since the address range spans three physical regions, the hashing circuit 1100 applies a mod 3 operator circuit 1104 to bits 6-51 to calculate the targeted physical region.
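The Figure 11 hash can be sketched under the stated assumptions (three physical regions, eight shared cache memory resources, bits 6-8 for the cache index, mod 3 over bits 6-51 for the region); the function name is illustrative:

```python
def hash_three_regions(addr):
    """Figure 11-style hash: bits 6-8 select one of eight shared cache
    memory resources (reused on a miss for the memory interface), while a
    mod-3 over bits 6-51 selects one of three physical regions."""
    shared_cache = (addr >> 6) & 0b111
    region_field = (addr >> 6) & ((1 << 46) - 1)  # bits 6 through 51
    physical_region = region_field % 3
    return physical_region, shared_cache
```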
  • Figure 12 is an exemplary hashing circuit in Figure 6 and Figure 7 for a system memory address which resides in an address range that spans one socket, three physical regions in the socket, seven shared cache memory resources in a physical region, and six memory interfaces in a physical region.
  • exemplary hashing circuit 1200 applies a mod 3 circuit 1202 to bits 6-51 and applies a correction and smoothing factor circuit (ckt) 1204 to the result of the mod 3 circuit 1202 to calculate the specific physical region.
  • exemplary hashing circuit 1200 also applies a mod 7 circuit 1206 to bits 6-51 to calculate the shared cache memory resource. Correction and smoothing circuit calculations were discussed above in connection with Figure 4.
  • the hashing circuit 1200 applies a distinguishing factor circuit 1208.
  • the distinguishing factor circuit 1208, for this example, includes a mod 9 div 3 circuit 1210, a first correction and smoothing circuit 1212, an add/mod 3 circuit 1214, a second correction and smoothing circuit 1216, and a concatenate circuit 1218.
  • the mod 9 div 3 circuit 1210 performs a modulo 9 operation on bits 6-51 and then a divide-by-3 operation on that result.
  • Correction and smoothing circuits 1212, 1216 perform correction and smoothing calculations on the previous circuit's result and are described in connection with Figure 4.
  • the add/mod 3 circuit 1214 adds bit 6 to the result of the first correction and smoothing circuit 1212 and then performs a mod 3 operation and forwards its output to the second correction and smoothing circuit 1216.
  • the concatenate circuit 1218 concatenates bit 6 to the result of the second correction and smoothing circuit 1216 to calculate the memory interface.
  • the distinguishing factor circuit 1208 resolves to an available memory interface index which is sequential to a last available memory interface mapped from the last address in a lower adjacent hash region.
  • the memory interface index for the targeted memory interface resource is used as input to the memory interface index table 712 to obtain the target id to which a memory request is sent.
  • Other distinguishing factor calculations which address different numbers of physical regions, shared cache memory resources, or memory interfaces are deployed in the system address decoding circuit 122(0) and are discussed in connection with Figure 13.
  • FIG. 13 is a flowchart for the logic process 1300 of an exemplary aggregated hash circuit which may be utilized by one or more of the hash circuits described in Figures 6 and 7 that hashes system memory addresses based on the factor of the number of targeted resources in a set of targeted resources, where the set of targeted resources is a set of physical regions, shared cache memory resources, or memory interfaces.
  • This flexibility is deployed to handle incomplete good die manufacturing or various combinations of system memory modules coupling to less than the maximum number of available memory interfaces. For example, it may be determined during manufacturing that some shared cache memory resources within specific quadrants are not available. For instance, although the SoC 200 was designed to have 64 shared cache memory resources 108(1)-108(64), due to a manufacturing issue, shared cache memory resources 108(1), 108(9)-108(10), 108(17)-108(19), and 108(25)-108(28) were defective and, thus, unavailable.
  • the system memory address decoding circuit 122(0) deploys a hash circuit to appropriately hash a system memory address to support seven shared cache memory resources in the physical region 202(1), six shared cache memory resources in the physical region 202(2), five shared cache memory resources in the physical region 202(3), and four shared cache memory resources in the physical region 202(4).
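As a concrete sketch of the degraded configuration in the example above, the configuration parameters might be represented as follows. The identifiers are taken from the example; the grouping of eight resources per physical region is an assumption made for illustration only.

```python
# Hypothetical configuration after the manufacturing defects in the example:
# shared cache memory resources 108(1), 108(9)-108(10), 108(17)-108(19), and
# 108(25)-108(28) are fused off, leaving seven, six, five, and four usable
# resources in physical regions 202(1)-202(4), respectively.
defective = {1, 9, 10, 17, 18, 19, 25, 26, 27, 28}

available = {
    "202(1)": [n for n in range(1, 9) if n not in defective],
    "202(2)": [n for n in range(9, 17) if n not in defective],
    "202(3)": [n for n in range(17, 25) if n not in defective],
    "202(4)": [n for n in range(25, 33) if n not in defective],
}

# Per-region resource counts consumed by the hash circuits
counts = {region: len(ids) for region, ids in available.items()}
```

The hash circuit for each hashing region would consume these counts (seven, six, five, and four here) to choose among the factor-specific hashing paths described in connection with Figure 13.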
  • a device manufacturer may not completely populate the system memory modules 118(A)-118(N) into the available memory interfaces 116(A)-116(N).
  • the system memory address decoding circuit 122(1) deploys the combined hash circuit to appropriately hash a system memory address to support various combinations that a device manufacturer may use in plugging in system memory modules.
  • the aggregated hash circuit can be deployed at both the first and second levels of system memory address decoding (i.e., 502 and 504) in system address decoding circuit 122(0) or a shared system address decoding circuit 122(1).
  • the logic process 1300, for convenience, will use the example of a system memory address being directed to a shared cache memory resource for a particular hashing region.
  • the logic process 1300 proceeds to block 1302(2). If the number of shared cache memory resources has a factor of three, the logic process 1300 proceeds to block 1304. At block 1304, the logic process 1300 determines if there is another targeted resource that has a factor of three, e.g., a physical region in order to ensure that the hashing region for this hashing circuit completely populates the targeted resources in the hashing region. If there are other targeted resources within the hashing region that have a factor of three, say for example a physical region, the logic process 1300 proceeds to block 1306.
  • the logic process 1300 performs a mod 9 div 3 on the system address and proceeds to block 1312.
  • the logic process 1300 performs a modulo add of smoothing factor Sf to the result from block 1306 and proceeds to block 1314.
  • the logic process 1300 saves the result from block 1312 in case there are other factors of the number of shared cache memory resources that need to be addressed later in the logic process 1300 (e.g., if there are six shared cache memory resources in the hashing region) and proceeds to block 1302(3).
  • the logic process 1300 proceeds to block 1308.
  • the logic process 1300 performs a mod 3 calculation on the system memory address and proceeds to block 1310. If the number of shared cache memory resources had a factor of five (5) or seven (7), the logic process 1300 would perform a mod 5 or mod 7, respectively, on a system memory address at block 1308 which will occur if the logic process 1300 proceeds through block 1302(2) before block 1308.
  • the logic process 1300 performs a modulo multiply of the result of block 1308 by correction factor Cf and then performs a modulo three (3), five (5), or seven (7) operation on the result before proceeding to block 1312 depending on the factor of the number of shared cache memory resources.
  • the logic process 1300 performs a modulo add of smoothing factor Sf to the result from block 1310 and then a modulo three (3), five (5), or seven (7) of the result based on the factor of the shared cache memory resources.
  • the logic process 1300 saves the result from block 1312 in case there are other factors of the number of shared cache memory resources that need to be addressed later in the logic process 1300 and proceeds to block 1302(3).
  • the logic process 1300 determines if the number of shared cache memory resources in the hashing region has a factor of two (2), four (4), or eight (8). If so, the logic process 1300 proceeds to block 1316. At block 1316, the logic process 1300 determines whether to use select bits in the system memory address or XOR select bits in the system memory address to find the target index of the shared cache memory address. The logic process 1300 makes this determination by consuming the configuration parameters which indicate the number of other resources in the hashing region such as physical regions and memory interfaces. If any of the sets of targeted resources are not a power of two (2), the logic process 1300 proceeds to block 1318 to select particular bits in the system address bits to identify a shared cache memory resource.
  • the logic process 1300 proceeds to block 1320 to XOR selected bits to identify a shared cache memory resource.
  • the logic process 1300 performs a modulo multiply of the results of either block 1318 or 1320 by correction factor Cf and then modulo two (2), four (4), or eight (8) of the result depending on the factor of shared cache memory resources that caused the logic process 1300 to exit block 1302(3).
  • the logic process 1300 performs a modulo add of smoothing factor Sf to the result from block 1322 then modulo two (2), four (4), or eight (8) of the result depending on the factor of shared cache memory resources that caused the logic process 1300 to exit block 1302(3).
  • the logic process 1300 saves the result from block 1324 in case there are other factors of the number of shared cache memory resources that need to be addressed from other paths in the logic process 1300 and proceeds to block 1302(4).
  • the logic process 1300 determines if there is only one shared cache memory resource in the hashing region by consuming configuration parameters. If there is one shared cache memory resource, the logic process 1300 proceeds to block 1328 and returns index zero (0) for the shared cache memory resource which is received by the shared cache memory resource index table 614. If there is more than one shared cache memory resource, the logic process 1300 proceeds to block 1302(5).
  • the logic process 1300 determines if the number of shared cache memory resources in the hashing region has a factor of six (6) by consuming configuration parameters. If it does, the logic process 1300 proceeds to block 1330 to add the first result calculated in block 1314 (i.e., factor of three (3) result) to the second result calculated in block 1326 (i.e., factor of two (2) result) and then perform a mod 3 operation on the result. The logic process 1300 proceeds to block 1332 where it concatenates the second result to the output of block 1330. At block 1334, the logic process 1300 returns the concatenated result from block 1332 as the index for the targeted shared cache memory resource which is used as input to the shared cache memory resource index table 614.
  • the logic process 1300 proceeds to block 1336 and returns the target index previously calculated for the factor of the shared cache memory resources. For example, if the shared cache memory resources had a factor of three (3), five (5), or seven (7) and not six (6), the first result would be returned as the target index to shared cache memory resource index table 614. In another example, if the shared cache memory resources had a factor of two (2), four (4), or eight (8) and not six (6), the second result would be returned as the target index to shared cache memory resource index table 614.
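The branching described above can be sketched in software. This is a simplified model, not the actual circuit: it covers resource counts whose prime factors are 2 and/or 3 (i.e., 1, 2, 3, 4, 6, and 8 resources), the correction factor `cf` and smoothing factor `sf` are assumed identity placeholders, and the bit-selection step of block 1318 is reduced to simple modular arithmetic.

```python
def hash_resource_index(addr, n_resources, cf=1, sf=0):
    """Simplified sketch of logic process 1300 for n_resources with
    prime factors 2 and/or 3; cf and sf are assumed placeholders."""
    if n_resources == 1:
        return 0                                  # block 1328: single resource
    first = second = None
    if n_resources % 3 == 0:                      # factor-of-three path
        r = addr % 3                              # block 1308 (mod 3)
        r = (r * cf) % 3                          # block 1310: modulo multiply
        first = (r + sf) % 3                      # block 1312: modulo add
    if n_resources % 2 == 0:                      # factor of 2, 4, or 8 path
        p2 = 8 if n_resources % 8 == 0 else (4 if n_resources % 4 == 0 else 2)
        r = addr % p2                             # block 1318: select bits (sketch)
        r = (r * cf) % p2                         # block 1322: modulo multiply
        second = (r + sf) % p2                    # block 1324: modulo add
    if n_resources % 6 == 0:                      # factor-of-six combination
        combined = (first + second) % 3           # block 1330: add results, mod 3
        return (combined << 1) | second           # block 1332: concatenate
    return second if first is None else first     # block 1336: single-factor result
```

Under this sketch, sequential addresses walk through all of the resource indices; for six resources, for example, addresses 0-5 map onto all six indices.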
  • Individual blocks 1310 and 1322 are generally referred to as “modulo multiply operations.”
  • Individual blocks 1312 and 1324 are generally referred to as “modulo add operations.”
  • Although Figure 13 was described, for convenience, with respect to targeting a shared cache memory resource with a system memory address, Figure 13 is also applicable to targeting socket gateways, physical regions, and memory interfaces.
  • the output of Figure 13 is the index to the socket gateway which is input to the gateway index table 626.
  • the output of Figure 13 is the index to the physical region which is input to the shared cache memory resource index table 614.
  • the output of Figure 13 is the index to the memory interface which is input to the memory interface index table 712.
  • the operations performed in block groupings 1306, 1308-1312, 1318-1324, and 1330-1332 may be implemented in individual correction circuits within the hashing circuit.
  • the ratio between available shared cache memory resources and available memory interfaces may not be 1:1.
  • the memory interface index table 712 for each shared cache memory resource in the same physical region is configured individually so that, when viewed holistically as a combined table, the decoded system memory addresses are interleaved across all the memory interfaces.
  • Figure 14 is an exemplary configuration of a collection of memory index tables where each memory index table is unique for a shared cache memory resource in the same physical region to interleave system memory addresses across memory interfaces when the ratio of shared cache memory resources to memory interfaces is 7:8.
  • a memory index table 1400 is a collection of memory interface index tables 1402(1)-1402(7) where each memory interface index table 1402(1)-1402(7) is configured in a corresponding system memory address decoding circuit 122(1) in one of the seven available shared cache memory resources in the physical region.
  • Each memory interface index table 1402(1)-1402(7) directs eight (8) memory address ranges to no more than two memory interfaces (MI).
  • the physical region 202(1) may have only seven (7) available shared cache memory resources 108(1)-108(7) and eight (8) available memory interfaces 116(1)-116(8) in the physical region 202(1) due to manufacturing issues.
  • a shared cache memory resource 108(1) is configured to use memory interface index table 1402(1) when determining the target identification for one of the memory interfaces.
  • the memory interface index table 1402(1) directs the memory request to memory interface 1 (MI_1).
  • the memory interface index table 1402(1) directs the memory request to memory interface 0 (MI_0).
  • a shared cache memory resource (e.g., 108(2)) is configured to use memory interface index table 1402(2), which directs the memory request to memory interface 2 (MI_2) for some address ranges.
  • for other address ranges, the memory interface index table 1402(2) directs the memory request to memory interface 1 (MI_1).
  • there is an equal number of entries in the collective memory index table 1400 for each of the eight memory interfaces to equally interleave the memory addresses from seven shared cache memory resources to eight memory interfaces.
  • the collection of memory index tables interleaves system memory addresses equally across the one or more memory interfaces in the physical region 202(1).
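One hypothetical way to populate the collective memory index table 1400 with the properties described above — each per-resource table touching at most two memory interfaces, while the 7 × 8 = 56 entries spread equally over eight interfaces — is sketched below. The specific slot assignment is an assumption for illustration; the actual tables are configuration-dependent.

```python
from collections import Counter

# Assumed layout: table 1402(k), for k = 1..7, directs k of its eight
# address-range slots to memory interface MI_k and the remaining 8 - k
# slots to MI_(k-1). Each table then touches at most two memory
# interfaces, and every one of the eight interfaces (MI_0-MI_7)
# receives exactly seven of the 56 entries.
N_SLOTS = 8
tables = {
    k: [k] * k + [k - 1] * (N_SLOTS - k)  # memory interface per address range
    for k in range(1, 8)                  # tables 1402(1)-1402(7)
}

# Count how many entries each memory interface receives collectively
usage = Counter(mi for table in tables.values() for mi in table)
```

Under this assumed layout, table 1402(1) sends one address range to MI_1 and seven to MI_0, consistent with the behavior described for shared cache memory resource 108(1).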
  • Figure 15 is an exemplary flowchart for decoding a system memory address of a memory request in accordance with interleaving memory addresses across physical regions and across shared memory resources, including, but not limited to, the shared memory resources in the IC chip in Figures 1 and 2.
  • Process 1500 begins at block 1502.
  • process 1500 determines configuration parameters comprising a plurality of available shared memory resources on a SoC in one or more physical regions, each of the plurality of available shared memory resources associated with a target identifier.
  • the configuration parameters comprise locations for system memory addresses which are interleaved across the one or more physical regions and one or more available shared memory resources within a physical region.
  • the configuration parameters further comprise a plurality of hashing regions wherein each hashing region includes a hash circuit and corresponds to a unique combination of the one or more physical regions and the one or more available shared memory resources.
  • the process 1500 determines a first hashing region of the plurality of hashing regions in which the system address resides.
  • the process 1500 hashes the system address based on a first hash circuit corresponding to the first hashing region to identify a first physical region.
  • the process 1500 hashes the system address based on a second hash circuit corresponding to the first hashing region to select a first available shared memory resource within the first physical region.
  • the process 1500 determines a first target identifier of the first available shared memory resource.
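The decode sequence of process 1500 can be sketched end-to-end as follows. The `config` structure, hash functions, and target identifiers are assumptions standing in for the configuration parameters and hash circuits, chosen only to make the two-level decode concrete.

```python
def decode_system_address(addr, config):
    """Sketch of process 1500: two-level decode of a system memory
    address into a target identifier; `config` is an assumed structure
    holding per-hashing-region bounds, hash functions, and target ids."""
    # determine the hashing region in which the system address resides
    region = next(c for c in config
                  if c["base"] <= addr < c["base"] + c["size"])
    # first-level hash identifies the physical region
    phys = region["hash_physical"](addr)
    # second-level hash selects an available shared memory resource
    resource = region["hash_resource"](addr)
    # look up the target identifier of the selected resource
    return region["target_ids"][phys][resource]

# Usage with a toy single-region configuration (all values assumed):
config = [{
    "base": 0,
    "size": 1 << 20,
    "hash_physical": lambda a: (a >> 6) % 2,   # 2 physical regions
    "hash_resource": lambda a: (a >> 7) % 4,   # 4 resources per region
    "target_ids": [[f"TID_{p}_{r}" for r in range(4)] for p in range(2)],
}]
```

In the real system each hashing region would carry its own hash circuits, so the decode adapts to the number of physical regions and available resources in that region.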
  • Electronic devices that include a processor-based system 100 for interleaving addresses across physical regions and across shared memory resources as described in Figures 1 and 2, including the system address decoding circuits 122(0), 122(1) (“122”) as described in Figures 6 and 7, and according to, but not limited to, any of the exemplary processes 500 and 1500, and according to any aspects disclosed herein, may be provided in or integrated into any processor-based device.
  • Figure 16 illustrates an example of a processor-based system 1600 that can include a system address decoding circuit 122 for interleaving addresses across physical regions and available shared memory resources within a physical region as described in Figures 6 and 7, and according to any exemplary aspects disclosed herein.
  • the processor-based system 1600 may be formed as a SoC 1606 which is an IC 1604.
  • the processor-based system 1600 includes a central processing unit (CPU) 1608 that includes one or more processors 1610, which may also be referred to as CPU cores or processor units.
  • the CPU 1608 may have cache memory 1612 coupled to the CPU 1608 for rapid access to temporarily stored data.
  • the CPU 1608 is coupled to a system bus 1614, which can intercouple master and slave devices included in the processor-based system 1600. As is well known, the CPU 1608 communicates with these other devices by exchanging address, control, and data information over the system bus 1614. For example, the CPU 1608 can communicate bus transaction requests to a memory controller 1616, as an example of a slave device. Although not illustrated in Figure 16, multiple system buses 1614 could be provided, wherein each system bus 1614 constitutes a different fabric.
  • Other master and slave devices can be connected to the system bus 1614. As illustrated in Figure 16, these devices can include a memory system 1620 that includes the memory controller 1616 and a memory array(s) 1618, one or more input devices 1622, one or more output devices 1624, one or more network interface devices 1626, and one or more display controllers 1628, as examples. Each of the memory system 1620, the one or more input devices 1622, the one or more output devices 1624, the one or more network interface devices 1626, and the one or more display controllers 1628 can be provided in the same or different electronic devices 1602.
  • the input device(s) 1622 can include any type of input device, including, but not limited to, input keys, switches, voice processors, etc.
  • the output device(s) 1624 can include any type of output device, including, but not limited to, audio, video, other visual indicators, etc.
  • the network interface device(s) 1626 can be any device configured to allow exchange of data to and from a network 1630.
  • the network 1630 can be any type of network, including, but not limited to, a wired or wireless network, a private or public network, a local area network (LAN), a wireless local area network (WLAN), a wide area network (WAN), a BLUETOOTHTM network, and the Internet.
  • the network interface device(s) 1626 can be configured to support any type of communications protocol desired.
  • the CPU 1608 may also be configured to access the display controller(s) 1628 over the system bus 1614 to control information sent to one or more displays 1632.
  • the display controller(s) 1628 sends information to the display(s) 1632 to be displayed via one or more video processor(s) 1634, which process the information to be displayed into a format suitable for the display(s) 1632.
  • the display controller(s) 1628 and video processor(s) 1634 can be included as ICs in the same or different electronic devices 1602, and in the same or different electronic devices 1602 containing the CPU 1608, as an example.
  • the display(s) 1632 can include any type of display, including, but not limited to, a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, a light emitting diode (LED) display, etc.
  • The various illustrative logical blocks, modules, and circuits described in connection with the aspects disclosed herein may be implemented or performed with a processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein.
  • a processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine.
  • a processor may also be implemented as a combination of computing devices (e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration).
  • The aspects disclosed herein may be embodied in hardware and in instructions that are stored in hardware, and may reside, for example, in Random Access Memory (RAM), flash memory, Read Only Memory (ROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), registers, a hard disk, a removable disk, a CD-ROM, or any other form of computer readable medium known in the art.
  • An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.
  • the storage medium may be integral to the processor.
  • the processor and the storage medium may reside in an ASIC.
  • the ASIC may reside in a remote station.
  • the processor and the storage medium may reside as discrete components in a remote station, base station, or server.
  • a processor-based system for determining a target identification for a memory request comprising: a processor configured to determine at least one configuration parameter comprising a plurality of available shared memory resources on a system-on-chip (SoC) in one or more physical regions, each of the plurality of available shared memory resources associated with a target identifier, the at least one configuration parameter further comprising: a plurality of hashing regions for system memory addresses which are interleaved across the one or more physical regions and one or more of the plurality of available shared memory resources within a physical region of the one or more physical regions, each hashing region corresponding to a hash circuit and a unique combination of the one or more physical regions and one or more available memory resources; and in response to receiving the memory request having a system memory address: the processor is configured to determine a first hashing region of the plurality of hashing regions in which the system memory address resides; the processor is configured to hash the system memory address based on a first hash circuit corresponding to the first hashing region to identify a first physical region; the processor is configured to hash the system memory address based on a second hash circuit corresponding to the first hashing region to select a first available shared memory resource within the first physical region; and the processor is configured to determine a first target identifier of the first available shared memory resource.
  • the processor-based system is further configured to: determine whether the memory request misses in the first available shared cache memory resource; and in response to the memory request missing in the first available shared cache memory resource, the processor-based system is further configured to: hash the system memory address based on a third hash circuit corresponding to the first hashing region to select a first available memory interface out of one or more available memory interfaces within the first physical region; and determine a second target identifier of the first available memory interface.
  • processor-based system of clause 3 wherein the processor is configured to hash the system memory address based on the second hash circuit corresponding to the first hashing region to select the first available shared memory resource within the first physical region, the processor further configured to: modify the selected first available shared memory resource by a modulo add operation to ensure that a lowest memory address of the first hashing region resolves to an available shared memory resource which is sequential to a last available shared memory resource mapped from a last address in a lower adjacent hash region.
  • a method of determining a target identification for a memory request in a processor-based system comprising: determining at least one configuration parameter comprising a plurality of available shared memory resources on a system-on-chip (SoC) in one or more physical regions, each of the plurality of available shared memory resources associated with a target identifier, the at least one configuration parameter further comprising: a plurality of hashing regions for system memory addresses which are interleaved across the one or more physical regions and one or more of the plurality of available shared memory resources within a physical region of the one or more physical regions, each hashing region corresponds to a hash circuit and a unique combination of the one or more physical regions and one or more available memory resources; and in response to receiving the memory request having a system memory address: determining a first hashing region of the plurality of hashing regions in which the system memory address resides; hashing the system memory address based on a first hash circuit corresponding to the first hashing region to identify a first physical region; hashing the system memory address based on a second hash circuit corresponding to the first hashing region to select a first available shared memory resource within the first physical region; and determining a first target identifier of the first available shared memory resource.
  • the method further comprising: determining whether the memory request misses in the first available shared cache memory resource; and in response to the memory request missing in the first available shared cache memory resource: hashing the system memory address based on a third hash circuit corresponding to the first hashing region to select a first available memory interface out of one or more available memory interfaces within the first physical region; and determining a second target identifier of the first available memory interface.
  • hashing the system memory address based on the second hash circuit corresponding to the first hashing region to select the first available shared memory resource within the first physical region further comprises: modifying the selected first available shared memory resource by a modulo add operation to ensure that the lowest memory address of the first hashing region resolves to an available shared memory resource which is sequential to a last available shared memory resource mapped from a last address in a lower adjacent hash region.
  • determining the at least one configuration parameter of the plurality of available shared memory resources on the SoC in the one or more physical regions further comprises: configuring a collection of memory index tables wherein each memory index table is unique for each of one or more shared memory resources in the first physical region, the collection of memory index tables interleaving the system memory addresses equally across the one or more available memory interfaces in the first physical region, and wherein determining the second target identifier of the first available memory interface further comprises: looking up an address into a first memory index table configured to the first available shared memory resource to retrieve the first target identifier.
  • a non-transitory computer-readable storage medium comprising instructions executable by a processor, which, when executed by the processor, cause the processor to determine a target identification for a memory request in a processor-based system comprising: determining at least one configuration parameter comprising a plurality of available shared memory resources on a system-on-chip (SoC) in one or more physical regions, each of the plurality of available shared memory resources associated with a target identifier, the at least one configuration parameter further comprising: a plurality of hashing regions for system memory addresses which are interleaved across the one or more physical regions and one or more of the plurality of available shared memory resources within a physical region of the one or more physical regions, each hashing region corresponding to a hash circuit and a unique combination of the one or more physical regions and one or more available memory resources; and in response to receiving the memory request having a system memory address: determining a first hashing region of the plurality of hashing regions in which the system memory address resides; hashing the system memory address based on a first hash circuit corresponding to the first hashing region to identify a first physical region; hashing the system memory address based on a second hash circuit corresponding to the first hashing region to select a first available shared memory resource within the first physical region; and determining a first target identifier of the first available shared memory resource.
  • non-transitory computer-readable storage medium of clauses 15-16 wherein the plurality of hashing regions define a physical address space in the processor-based system, each of the plurality of hashing regions comprising a subset of the physical address space which are contiguous with adjacent hashing regions; and wherein hashing the system memory address based on the second hash circuit corresponding to the first hashing region to select the first available shared memory resource within the first physical region further comprises: applying a modulo multiply operation to ensure that each of the one or more of the plurality of available shared memory resources will be sequentially targeted for each sequential address within the first hashing region.
  • hashing the system memory address based on the second hash circuit corresponding to the first hashing region to select the first available shared memory resource within the first physical region further comprises: modifying the selected first available shared memory resource by a modulo add operation to ensure that a lowest memory address of the first hashing region resolves to an available shared memory resource which is sequential to a last available shared memory resource mapped from a last address in a lower adjacent hash region.
  • determining the at least one configuration parameter of the plurality of available shared memory resources on the SoC in the one or more physical regions further comprises: configuring a collection of memory index tables wherein each memory index table is unique for each of one or more shared memory resources in the first physical region, the collection of memory index tables interleaving the system memory addresses equally across the one or more available memory interfaces in the first physical region, and wherein determining the second target identifier of the first available memory interface further comprises: looking up an address into a first memory index table configured to the first available shared memory resource to retrieve the first target identifier.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

Disclosed are system memory address decoding for interleaving addresses across physical regions of a system-on-chip (SoC) and across shared memory resources in a processor-based system, and related hashing circuits. In exemplary aspects, the SoC is configured to discover, for each address range, the number of physical regions and the number and/or size of the available shared memory resources, including cache memories, snoop filters, and memory interfaces, within each physical region. The SoC may include a system memory address decoding circuit that is configured to adaptively decode a memory address based on the memory address range in which the system address resides and then direct such a memory access request to the appropriate shared memory resource such that each address in the memory address range is spread across the entirety of the shared memory resources in the range.
PCT/US2024/020378 2023-03-21 2024-03-18 Décodage d'adresse mémoire système pour entrelacement d'adresses à travers des régions physiques d'un système sur puce (soc) et à travers des ressources de mémoire partagée dans un système basé sur un processeur Pending WO2024196846A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202363491506P 2023-03-21 2023-03-21
US63/491,506 2023-03-21
US18/511,079 US12380019B2 (en) 2023-03-21 2023-11-16 System memory address decoding for interleaving addresses across physical regions of a system-on-chip (SOC) and across shared memory resources in a processor-based system
US18/511,079 2023-11-16

Publications (1)

Publication Number Publication Date
WO2024196846A1 true WO2024196846A1 (fr) 2024-09-26

Family

ID=90826490

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2024/020378 Pending WO2024196846A1 (fr) 2023-03-21 2024-03-18 Décodage d'adresse mémoire système pour entrelacement d'adresses à travers des régions physiques d'un système sur puce (soc) et à travers des ressources de mémoire partagée dans un système basé sur un processeur

Country Status (1)

Country Link
WO (1) WO2024196846A1 (fr)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100042759A1 (en) * 2007-06-25 2010-02-18 Sonics, Inc. Various methods and apparatus for address tiling and channel interleaving throughout the integrated system
US10403333B2 (en) * 2016-07-15 2019-09-03 Advanced Micro Devices, Inc. Memory controller with flexible address decoding

Similar Documents

Publication Publication Date Title
US10503661B2 (en) Providing memory bandwidth compression using compressed memory controllers (CMCs) in a central processing unit (CPU)-based system
KR102504728B1 (ko) Cpu(central processing unit)-기반 시스템에서 다수의 llc(last-level cache) 라인들을 사용하여 메모리 대역폭 압축을 제공하는 것
US9558120B2 (en) Method, apparatus and system to cache sets of tags of an off-die cache memory
JP2019532412A (ja) プロセッサベースシステムにおける空間サービス品質(QoS)タグ付けを使用する異種メモリシステムの柔軟な管理の実現
US10176090B2 (en) Providing memory bandwidth compression using adaptive compression in central processing unit (CPU)-based systems
US20160224241A1 (en) PROVIDING MEMORY BANDWIDTH COMPRESSION USING BACK-TO-BACK READ OPERATIONS BY COMPRESSED MEMORY CONTROLLERS (CMCs) IN A CENTRAL PROCESSING UNIT (CPU)-BASED SYSTEM
US12007896B2 (en) Apparatuses, systems, and methods for configuring combined private and shared cache levels in a processor-based system
US20240134650A1 (en) Devices transferring cache lines, including metadata on external links
US9442675B2 (en) Redirecting data from a defective data entry in memory to a redundant data entry prior to data access, and related systems and methods
US12380019B2 (en) System memory address decoding for interleaving addresses across physical regions of a system-on-chip (SOC) and across shared memory resources in a processor-based system
US11194744B2 (en) In-line memory module (IMM) computing node with an embedded processor(s) to support local processing of memory-based operations for lower latency and reduced power consumption
WO2024196846A1 (fr) Décodage d'adresse mémoire système pour entrelacement d'adresses à travers des régions physiques d'un système sur puce (soc) et à travers des ressources de mémoire partagée dans un système basé sur un processeur
US11880306B2 (en) Apparatus, system, and method for configuring a configurable combined private and shared cache
CN117795490A (zh) 用于在基于处理器的系统中配置组合私有和共享高速缓存层级的设备、系统和方法
EP4352620B1 (fr) Appareils, systèmes et procédés de configuration de niveaux de cache privés et partagés combinés dans un système à base de processeur
WO2022261223A1 (fr) Appareil, système et procédé de configuration d'une mémoire cache privée et partagée combinée configurable
US11947454B2 (en) Apparatuses, systems, and methods for controlling cache allocations in a configurable combined private and shared cache in a processor-based system
US12493552B1 (en) Performing snoop filter replacement based on history-augmented victimization priority values of snoop filter entries in processor-based devices
US20240248843A1 (en) Apparatuses, systems, and methods for controlling cache allocations in a configurable combined private and shared cache in a processor-based system
JP5752331B2 (ja) 物理タグ付けされたデータキャッシュへのトラフィックをフィルタリングするための方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24720949

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE