WO2011147884A1 - Fast remote communication and computation between processors - Google Patents
- Publication number
- WO2011147884A1 (PCT/EP2011/058582)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- bytes
- processor
- memory
- remote
- remote processor
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/544—Buffers; Shared memory; Pipes
Definitions
- the present invention relates generally to an improved data processing system, and in particular, to a computer implemented method for improving operations in a multiprocessor or multi-core data processing environment. Still more particularly, the present invention relates to a computer-implemented method, system, and computer-usable program code for fast remote communication and computation between processors or processor cores in a multiprocessor or multi-core data processing environment.
- Data processing systems include processors for performing computations.
- a processor can include multiple processing cores.
- a core is a processor or a unit of a processor circuitry that is capable of operating as a separate processing unit. Some data processing systems can include multiple processors.
- a data processing environment can include data processing systems including single processors, multi-core processors, and multiprocessor systems.
- a data processing environment including multiple processors or processors with multiple cores is collectively referred to as a multiprocessor environment.
- a thread is a stream of executable code within an application that can be executed on a processor.
- An application executing in a data processing system spawns threads that are executed by a processor in the data processing system.
- the implementation of threads and processes differs from one operating system to another, but in most cases, a thread is contained inside a process associated with the application. Multiple threads can exist within the same process and share resources such as memory.
- a processor in a multiprocessor environment operates on data that can be referenced using an address space associated with a process executing on the processor. Such an address space is called a context. Thus, a processor performs computations within a context.
- An effective address is a memory address as provided in an instruction that a processor executes. Generally, an effective address resolves to an address space of a memory accessible to the processor.
- a global address is an address that resolves to a global address space.
- a global address space is associated with a memory accessible to all processors in the data processing environment (hence global to the data processing environment). An effective address can be transformed into a global address under suitable configuration of memory in a multiprocessor data processing environment.
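The transformation above can be sketched with a per-context base offset. The following toy model is an illustrative assumption only; the table, names, and offsets are not details taken from this disclosure:

```python
# Hypothetical effective-to-global address transformation: each context owns
# a region of the global address space, and an effective address resolves to
# a global address by adding that context's base offset.

CONTEXT_BASE = {
    "ctx_p1": 0x1000_0000,  # assumed region backing context ctx_p1
    "ctx_p2": 0x2000_0000,  # assumed region backing context ctx_p2
}

def to_global(context_id: str, effective_addr: int) -> int:
    """Resolve an effective address within a context to a global address."""
    return CONTEXT_BASE[context_id] + effective_addr
```

In this model the "suitable configuration of memory" mentioned above amounts to populating the base-offset table so every processor agrees on where each context's region lives.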
- the preferred embodiments provide a method, system, and computer usable program product for fast remote communication and computation between processors.
- An embodiment configures a direct core to core communication unit (DCC) to operate with a first processor, the first processor being a remote processor.
- the embodiment receives in a memory associated with the DCC a set of bytes, the set of bytes being sent from a second processor.
- the embodiment creates without software intervention a hardware execution context using data specified in the set of bytes at the remote processor.
- the embodiment executes an operation specified in the set of bytes at the remote processor within the created context.
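The four steps of this embodiment can be sketched as a toy software model. The `Message` layout, the `add` opcode, and all names below are illustrative assumptions, not the disclosed hardware:

```python
from dataclasses import dataclass

@dataclass
class Message:
    context_id: int    # identifies the context the remote processor should use
    opcode: str        # operation the remote processor is asked to perform
    operands: tuple    # data specified in the set of bytes

class DCC:
    """Toy direct core-to-core communication unit for a remote processor."""

    def __init__(self) -> None:
        self.memory = []                  # stands in for the DCC's local memory

    def receive(self, msg: Message) -> None:
        # Step 2: a second processor stores the set of bytes into DCC memory.
        self.memory.append(msg)

    def execute_next(self):
        # Steps 3-4: build an execution context from the message data and run
        # the requested operation within it, with no software scheduling.
        msg = self.memory.pop(0)
        context = {"context_id": msg.context_id}
        if msg.opcode == "add":
            return context["context_id"], sum(msg.operands)
        raise ValueError(f"unsupported opcode: {msg.opcode}")

dcc = DCC()                                # step 1: DCC set up for the remote core
dcc.receive(Message(context_id=7, opcode="add", operands=(3, 4)))
ctx_id, result = dcc.execute_next()        # ctx_id == 7, result == 7
```

The essential point the model captures is that the message itself carries everything needed to construct the context and perform the operation, so no thread on the remote side has to be scheduled to interpret it.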
- DCC direct core to core communication unit
- the set of bytes includes information about a context available at the remote processor within which the remote processor has to execute an instruction included in the set of bytes.
- An embodiment further determines whether the set of bytes form a complete instruction. The execution occurs in response to the determining being positive. In another embodiment, executing the set of bytes further loads the bytes using a load instruction at the remote processor for computation without requiring a first thread for reading the contents of the set of bytes and a second thread for executing the contents of the set of bytes.
- executing the set of bytes further loads the bytes using a load instruction at the remote processor for computation without requiring sending an interrupt to the remote processor.
- Another embodiment further assesses whether the set of bytes is in the first logical position in a FIFO queue in the memory. Execution occurs in response to the assessing being affirmative.
- Another embodiment further assesses whether the set of bytes is blocked from execution. Execution occurs in response to the assessing being negative.
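The dispatch conditions in the embodiments above (complete instruction, first logical position in the FIFO, not blocked) can be combined into a single predicate. The dictionary-based message representation is an assumption for illustration only:

```python
from collections import deque

def eligible_for_execution(fifo: deque, msg: dict) -> bool:
    """Check the three conditions described for a queued set of bytes."""
    return (
        msg.get("complete", False)            # forms a complete instruction
        and len(fifo) > 0 and fifo[0] is msg  # first logical position in FIFO
        and not msg.get("blocked", False)     # not blocked from execution
    )

head = {"complete": True, "blocked": False}
later = {"complete": True, "blocked": False}
queue = deque([head, later])
```

Only `head` passes all three checks; `later` fails solely because it is not yet at the head of the queue.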
- the set of bytes are sent by a thread executing on the second processor.
- Another embodiment further composes, using the thread, the set of bytes.
- the embodiment writes the set of bytes directly to the memory of the DCC of the remote processor.
- the set of bytes are sent using a store instruction that permits the second processor to write directly to the memory of the DCC of the remote processor.
- the memory is a static random access memory (SRAM).
- SRAM static random access memory
- the memory is configured to store several sets of bytes in a first- in first-out (FIFO) queue.
- An embodiment further enables the second processor to write to the memory of the DCC of the remote processor.
- the embodiment configures the remote processor to allow the second processor to write to the memory of the DCC of the remote processor.
- arrival of the set of bytes in the memory triggers execution without an interrupt being sent to the remote processor.
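The interrupt-free trigger can be modeled by letting the write path of the DCC memory invoke execution directly. The callback wiring below is an assumption standing in for the hardware mechanism, not the disclosed circuit:

```python
class TriggeringDCCMemory:
    """DCC memory whose write path itself starts execution (no interrupt)."""

    def __init__(self, execute):
        self.fifo = []
        self.execute = execute    # hardware execution path, not an interrupt handler

    def store(self, payload: bytes):
        # The direct store from the second processor both enqueues the bytes
        # and triggers execution; the remote processor never takes an interrupt.
        self.fifo.append(payload)
        return self.execute(self.fifo.pop(0))

executed = []
mem = TriggeringDCCMemory(execute=executed.append)
mem.store(b"\x01\x02")
```

In the model, a single `store` is all the sender performs; everything after arrival happens on the receive side without software involvement.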
- the present invention provides a method for fast remote communication and computation between processors, the method comprising the steps of: configuring a direct core to core communication unit (DCC) to operate with a first processor, the first processor being a remote processor; receiving in a memory associated with the DCC a set of bytes, the set of bytes being sent from a second processor; creating without software intervention a hardware execution context using data specified in the set of bytes at the remote processor; and executing an operation specified in the set of bytes at the remote processor using the created context.
- the present invention provides a method wherein the set of bytes include information about a context available at the remote processor within which the remote processor has to execute an instruction included in the set of bytes.
- the present invention provides a method further comprising: determining whether the set of bytes form a complete instruction, wherein the executing is responsive to the determining being positive.
- the present invention provides a method wherein executing the set of bytes comprises: loading the bytes using a load instruction that loads the set of bytes at the remote processor for computation without requiring a first thread for reading the contents of the set of bytes and a second thread for executing the contents of the set of bytes.
- the present invention provides a method wherein executing the set of bytes further comprises: loading the bytes using an instruction that loads the set of bytes at the remote processor for computation without requiring sending an interrupt to the remote processor.
- the present invention provides a method further comprising: assessing whether the set of bytes is the first in a FIFO queue in the memory, wherein the executing is responsive to the assessing being affirmative.
- the present invention provides a method further comprising: assessing whether the set of bytes is blocked from execution, wherein the executing is responsive to the assessing being negative.
- the present invention provides a method wherein the set of bytes are sent by a thread executing on the second processor.
- the present invention provides a method further comprising: composing, using the thread, the set of bytes; and writing the set of bytes directly to the memory of the DCC of the remote processor.
- the present invention provides a method wherein the set of bytes are sent using a store instruction that permits the second processor to write directly to the memory of the DCC of the remote processor.
- the present invention provides a method wherein the memory is a static random access memory (SRAM).
- the present invention provides a method wherein the memory is configured to store a plurality of sets of bytes in a first-in first-out (FIFO) queue.
- FIFO first-in first-out
- the present invention provides a method further comprising: enabling the second processor to write to the memory of the DCC of the remote processor; and configuring the remote processor to allow the second processor to write to the memory of the DCC of the remote processor.
- the present invention provides a method wherein arrival of the set of bytes in the memory triggers the executing without an interrupt being sent to the remote processor.
- the present invention provides an apparatus for fast remote communication and computation between processors, comprising: a first processor, the first processor being a remote processor; a direct core to core communication unit (DCC) configured to operate with the first processor; and a memory associated with the DCC, the memory receiving a set of bytes, the set of bytes being sent from a second processor, wherein the remote processor executes an operation specified in the set of bytes using a hardware execution context that is created without software intervention by using data specified in the set of bytes.
- the present invention provides an apparatus wherein the set of bytes include information about a context available at the remote processor within which the remote processor has to execute an instruction included in the set of bytes.
- the present invention provides an apparatus wherein the executing is responsive to the set of bytes forming a complete instruction.
- the present invention provides an apparatus wherein the remote processor executes the set of bytes responsive to loading the bytes using a load instruction that loads the set of bytes at the remote processor for computation without requiring a first thread for reading the contents of the set of bytes and a second thread for executing the contents of the set of bytes.
- the present invention provides an apparatus wherein the remote processor executes the set of bytes responsive to loading the bytes using an instruction that loads the set of bytes at the remote processor for computation without requiring sending an interrupt to the remote processor.
- the present invention provides an apparatus wherein the remote processor executes the set of bytes responsive to the set of bytes occupying first logical position in a FIFO queue in the memory.
- the present invention provides an apparatus wherein the remote processor executes the set of bytes responsive to the set of bytes not being blocked from execution.
- the present invention provides an apparatus wherein the set of bytes are sent by a thread executing on the second processor.
- the present invention provides an apparatus wherein the memory is a static random access memory (SRAM).
- the present invention provides an apparatus wherein the memory is configured to store a plurality of sets of bytes in a first-in first-out (FIFO) queue.
- the present invention provides an apparatus further comprising: a first configuration enabling the second processor to write to the memory of the DCC of the remote processor; and a second configuration configuring the remote processor to allow the second processor to write to the memory of the DCC of the remote processor.
- the present invention provides an apparatus wherein arrival of the set of bytes in the memory triggers the executing without an interrupt being sent to the remote processor.
- the present invention provides a data processing system for fast remote communication and computation between processors, the data processing system comprising: a storage device including a storage medium, wherein the storage device stores computer usable program code; and a processor, wherein the processor executes the computer usable program code, and wherein the computer usable program code comprises: computer usable code for configuring a direct core to core communication unit (DCC) to operate with the processor, the processor being a remote processor; computer usable code for receiving in a memory associated with the DCC a set of bytes, the set of bytes being sent from a second processor; computer usable code for creating without software intervention a hardware execution context using data specified in the set of bytes at the remote processor; and computer usable code for executing an operation specified in the set of bytes at the remote processor using the created context.
- the present invention provides a data processing system wherein the set of bytes include information about a context available at the remote processor within which the remote processor has to execute an instruction included in the set of bytes.
- the present invention provides a data processing system further comprising: computer usable code for determining whether the set of bytes form a complete instruction, wherein the executing is responsive to the determining being positive.
- the present invention provides a data processing system wherein executing the set of bytes comprises: computer usable code for loading the bytes using a load instruction that loads the set of bytes at the remote processor for computation without requiring a first thread for reading the contents of the set of bytes and a second thread for executing the contents of the set of bytes.
- the present invention provides a data processing system wherein executing the set of bytes further comprises: computer usable code for loading the bytes using an instruction that loads the set of bytes at the remote processor for computation without requiring sending an interrupt to the remote processor.
- the present invention provides a data processing system further comprising: computer usable code for assessing whether the set of bytes is the first in a FIFO queue in the memory, wherein the executing is responsive to the assessing being affirmative.
- the present invention provides a data processing system further comprising: computer usable code for assessing whether the set of bytes is blocked from execution, wherein the executing is responsive to the assessing being negative.
- Figure 1 depicts a block diagram of a data processing system in which preferred embodiments of the present invention may be implemented;
- Figure 2 depicts a block diagram of an example logical partitioned platform in which a preferred embodiment of the present invention may be implemented;
- Figure 3 depicts a block diagram of an example remote computing environment with respect to which preferred embodiments of the present invention may be implemented;
- Figure 4 depicts a block diagram of an example configuration for fast remote computation and communication between processors in accordance with a preferred embodiment of the present invention;
- Figure 5 depicts a block diagram of one part of performing fast remote communication and computation between processors in accordance with a preferred embodiment of the present invention;
- Figure 6 depicts a block diagram of another part of performing fast remote communication and computation between processors in accordance with a preferred embodiment of the present invention;
- Figure 7 depicts a flowchart of an example process for a part of fast remote communication and computation between processors in accordance with a preferred embodiment of the present invention;
- Figure 8 depicts a flowchart of an example process for another part of fast remote communication and computation between processors in accordance with a preferred embodiment of the present invention;
- Figure 9 depicts a flowchart of an example process for managing the volume of fast remote communication and computation between processors in accordance with a preferred embodiment of the present invention;
- Figure 10 depicts a flowchart of an example process for another part of fast remote communication and computation between processors in accordance with a preferred embodiment of the present invention; and
- Figure 11 depicts a flowchart of an example process for configuring fast remote communication and computation between processors in accordance with a preferred embodiment of the present invention.
- a multiprocessor or multi-core data processing environment can be configured such that a thread executing on one processor or core can perform operations using another processor or core.
- communications between processors are accomplished in many ways, including shared memory (SM), message passing (MP), remote procedure call (RPC), active message (AM), and active memory operation (AMO).
- SM shared memory
- MP message passing
- RPC remote procedure call
- AM active message
- AMO active memory operation
- the preferred embodiments recognize that performing remote computations using a presently available method incurs substantial overhead cost in terms of computing resources. Remote computations are computations performed on one processor for the benefit of a thread executing on another processor. For example, an instruction that may take only four cycles to execute and perform the desired computation may consume a thousand cycles by the time the procedure of an existing method for communication is complete.
- the preferred embodiments further recognize that some of the overhead cost in remote communication and computations between processors arises from the cost of reading and writing to dynamic random access memory (DRAM) devices such as those used in general memory or main memory in present data processing systems. Initiating computation on the remote processor, such as through generating a hardware interrupt and scheduling a software thread, is also presently an expensive process. Additional overhead comes from reading and interpreting the contents of the message, whatever form they take. Recognizing, reaching, and retrieving data for such computations is also presently an expensive process.
- the invention recognizes that a hardware mechanism to enable a lower overhead cost remote operation may be desirable.
- the preferred embodiments used to describe the invention generally address and solve the above-described problems and other problems related to communicating with remote processors or invoking computation on remote processors in multiprocessor environments.
- the preferred embodiments of the invention provide a method, computer usable program product, and data processing system for fast remote communication and computation between processors.
- the preferred embodiments may be implemented with respect to any type of data processing system.
- a preferred embodiment described with respect to a processor may be implemented in a multi-core processor or a multiprocessor system within the scope of the invention.
- an embodiment of the invention may be implemented with respect to any type of client system, server system, platform, or a combination thereof.
- the preferred embodiments are further described with respect to certain parameters, attributes, and configurations only as examples. Such descriptions are not intended to be limiting on the invention.
- An implementation of an embodiment may take the form of data objects, code objects, encapsulated instructions, application fragments, distributed application or a portion thereof, drivers, routines, services, systems - including basic I/O system (BIOS), and other types of software implementations available in a data processing environment.
- BIOS basic I/O system
- Java® Virtual Machine (JVM®) Java® object, an Enterprise Java Bean (EJB®), a servlet, or an applet may be manifestations of an application with respect to which, within which, or using which, the invention may be implemented.
- JVM® Java® Virtual Machine
- EJB® Enterprise Java Bean
- Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.
- a preferred embodiment may be implemented in hardware, software, or a combination of hardware and software.
- the examples in this disclosure are used only for the clarity of the description and are not limiting on the preferred embodiments. Additional or different information, data, operations, actions, tasks, activities, and manipulations will be conceivable from this disclosure, and the same are contemplated within the scope of the preferred embodiments.
- the preferred embodiments are described using specific code, data structures, files, file systems, logs, designs, architectures, layouts, schematics, and tools only as examples and are not limiting on the preferred embodiments. Furthermore, the preferred embodiments are described in some instances using particular data processing environments only as an example for the clarity of the description. The preferred embodiments may be used in conjunction with other comparable or similarly purposed structures, systems, applications, or architectures.
- Data processing system 100 may be a symmetric multiprocessor (SMP) system including a plurality of processors 101, 102, 103, and 104, which connect to system bus 106.
- SMP symmetric multiprocessor
- data processing system 100 may be an IBM Power System® implemented as a server within a network. (Power Systems is a product and a trademark of International Business Machines Corporation in the United States and other countries). Alternatively, a single processor system may be employed.
- Also connected to system bus 106 is memory controller/cache 108, which provides an interface to a plurality of local memories 160-163.
- I/O bus bridge 110 connects to system bus 106 and provides an interface to I/O bus 112. Memory controller/cache 108 and I/O bus bridge 110 may be integrated as depicted.
- Data processing system 100 is a logical partitioned data processing system.
- data processing system 100 may have multiple heterogeneous operating systems (or multiple instances of a single operating system) running simultaneously. Each of these multiple operating systems may have any number of software programs executing within it.
- Data processing system 100 is logically partitioned such that different PCI I/O adapters 120-121, 128-129, and 136, graphics adapter 148, and hard disk adapter 149 may be assigned to different logical partitions.
- graphics adapter 148 connects to a display device (not shown), while hard disk adapter 149 connects to and controls hard disk 150.
- memories 160-163 may take the form of dual in-line memory modules (DIMMs). DIMMs are not normally assigned on a per DIMM basis to partitions. Instead, a partition will get a portion of the overall memory seen by the platform. For example, processor 101, some portion of memory from local memories 160-163, and I/O adapters 120, 128, and 129 may be assigned to logical partition P1; processors 102-103, some portion of memory from local memories 160-163, and PCI I/O adapters 121 and 136 may be assigned to logical partition P2; and processor 104, some portion of memory from local memories 160-163, graphics adapter 148, and hard disk adapter 149 may be assigned to logical partition P3.
- Each operating system executing within data processing system 100 is assigned to a different logical partition. Thus, each operating system executing within data processing system 100 may access only those I/O units that are within its logical partition.
- AIX® Advanced Interactive Executive
- a second instance (image) of the AIX operating system may be executing within partition P2
- a Linux® or IBM-i® operating system may be operating within logical partition P3.
- AIX and IBM-i are trademarks of International Business Machines Corporation in the United States and other countries. Linux is a trademark of Linus Torvalds.
- Peripheral component interconnect (PCI) host bridge 114 connected to I/O bus 112 provides an interface to PCI local bus 115.
- PCI input/output adapters 120-121 connect to PCI local bus 115 through PCI-to-PCI bridge 116, PCI bus 118, PCI bus 119, I/O slot 170, and I/O slot 171.
- PCI-to-PCI bridge 116 provides an interface to PCI bus 118 and PCI bus 119.
- PCI I/O adapters 120 and 121 are placed into I/O slots 170 and 171, respectively.
- Typical PCI bus implementations support between four and eight I/O adapters (i.e. expansion slots for add-in connectors).
- Each PCI I/O adapter 120-121 provides an interface between data processing system 100 and input/output devices such as, for example, other network computers, which are clients to data processing system 100.
- An additional PCI host bridge 122 provides an interface for an additional PCI local bus 123.
- PCI local bus 123 connects to a plurality of PCI I/O adapters 128-129.
- PCI I/O adapters 128-129 connect to PCI local bus 123 through PCI-to-PCI bridge 124, PCI bus 126, PCI bus 127, I/O slot 172, and I/O slot 173.
- PCI-to-PCI bridge 124 provides an interface to PCI bus 126 and PCI bus 127.
- PCI I/O adapters 128 and 129 are placed into I/O slots 172 and 173, respectively. In this manner, additional I/O devices, such as, for example, modems or network adapters may be supported through each of PCI I/O adapters 128-129.
- data processing system 100 allows connections to multiple network computers.
- a memory mapped graphics adapter 148 is inserted into I/O slot 174 and connects to I/O bus 112 through PCI bus 144, PCI-to-PCI bridge 142, PCI local bus 141, and PCI host bridge 140.
- Hard disk adapter 149 may be placed into I/O slot 175, which connects to PCI bus 145.
- this bus connects to PCI-to-PCI bridge 142, which connects to PCI host bridge 140 by PCI local bus 141.
- a PCI host bridge 130 provides an interface for a PCI local bus 131 to connect to I/O bus 112.
- PCI I/O adapter 136 connects to I/O slot 176, which connects to PCI-to-PCI bridge 132 by PCI bus 133.
- PCI-to-PCI bridge 132 connects to PCI local bus 131.
- This PCI bus also connects PCI host bridge 130 to the service processor mailbox interface and ISA bus access pass-through logic 194 and PCI-to-PCI bridge 132.
- Service processor mailbox interface and ISA bus access pass-through logic 194 forwards PCI accesses destined for the ISA bus.
- NVRAM storage 192 connects to the ISA bus 196.
- Service processor 135 connects to service processor mailbox interface and ISA bus access pass-through logic 194 through its local PCI bus 195.
- Service processor 135 also connects to processors 101-104 via a plurality of JTAG/I2C busses 134.
- JTAG/I2C busses 134 are a combination of JTAG/scan busses (see IEEE 1149.1) and Philips I2C busses.
- JTAG/I2C busses 134 may be replaced by only Philips I2C busses or only JTAG/scan busses. All SP-ATTN signals of the host processors 101, 102, 103, and 104 connect together to an interrupt input signal of service processor 135.
- Service processor 135 has its own local memory 191 and has access to the hardware OP-panel 190.
- service processor 135 uses the JTAG/I2C busses 134 to interrogate the system (host) processors 101-104, memory controller/cache 108, and I/O bridge 110.
- service processor 135 has an inventory and topology understanding of data processing system 100.
- Service processor 135 also executes Built-In-Self-Tests (BISTs), Basic Assurance Tests (BATs), and memory tests on all elements found by interrogating the host processors 101-104, memory controller/cache 108, and I/O bridge 110. Any error information for failures detected during the BISTs, BATs, and memory tests are gathered and reported by service processor 135.
- BISTs Built-In-Self-Tests
- BATs Basic Assurance Tests
- data processing system 100 is allowed to proceed to load executable code into local (host) memories 160- 163.
- Service processor 135 then releases host processors 101-104 for execution of the code loaded into local memory 160-163. While host processors 101-104 are executing code from respective operating systems within data processing system 100, service processor 135 enters a mode of monitoring and reporting errors.
- the type of items monitored by service processor 135 include, for example, the cooling fan speed and operation, thermal sensors, power supply regulators, and recoverable and non-recoverable errors reported by processors 101-104, local memories 160-163, and I/O bridge 110.
- Service processor 135 saves and reports error information related to all the monitored items in data processing system 100.
- Service processor 135 also takes action based on the type of errors and defined thresholds. For example, service processor 135 may take note of excessive recoverable errors on a processor's cache memory and decide that this is predictive of a hard failure. Based on this determination, service processor 135 may mark that resource for deconfiguration during the current running session and future Initial Program Loads (IPLs). IPLs are also sometimes referred to as a "boot” or "bootstrap”.
- IPLs Initial Program Loads
- Data processing system 100 may be implemented using various commercially available computer systems.
- data processing system 100 may be implemented using IBM Power Systems available from International Business Machines Corporation.
- Such a system may support logical partitioning using an AIX operating system, which is also available from International Business Machines Corporation.
- Figure 2 depicts a block diagram of an example logical partitioned platform in which the preferred embodiments may be implemented.
- the hardware in logical partitioned platform 200 may be implemented as, for example, data processing system 100 in Figure 1.
- Logical partitioned platform 200 includes partitioned hardware 230, operating systems 202, 204, 206, 208, and platform firmware 210.
- a platform firmware such as platform firmware 210, is also known as partition management firmware.
- Operating systems 202, 204, 206, and 208 may be multiple copies of a single operating system or multiple heterogeneous operating systems simultaneously run on logical partitioned platform 200. These operating systems may be implemented using IBM-i, which is designed to interface with a partition management firmware, such as Hypervisor. IBM-i is used only as an example in these preferred embodiments. Of course, other types of operating systems, such as AIX and Linux, may be used depending on the particular implementation.
- Operating systems 202, 204, 206, and 208 are located in partitions 203, 205, 207, and 209.
- Hypervisor software is an example of software that may be used to implement partition management firmware 210 and is available from International Business Machines Corporation.
- Firmware is "software" stored in a memory chip that holds its content without electrical power, such as, for example, read-only memory (ROM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), and nonvolatile random access memory (nonvolatile RAM). Additionally, these partitions also include partition firmware 211, 213, 215, and 217.
- Partition firmware 211, 213, 215, and 217 may be implemented using initial boot strap code, IEEE-1275 Standard Open Firmware, and runtime abstraction software (RTAS), which is available from International Business Machines Corporation.
- the processors associated with or assigned to the partitions are then dispatched to the partition's memory to execute the partition firmware.
- Partitioned hardware 230 includes a plurality of processors 232-238, a plurality of system memory units 240-246, a plurality of input/output (I/O) adapters 248-262, and a storage unit 270.
- processors 232-238, memory units 240-246, NVRAM storage 298, and I/O adapters 248-262 may be assigned to one of multiple partitions within logical partitioned platform 200, each of which corresponds to one of operating systems 202, 204, 206, and 208.
- Partition management firmware 210 performs a number of functions and services for partitions 203, 205, 207, and 209 to create and enforce the partitioning of logical partitioned platform 200.
- Partition management firmware 210 is a firmware implemented virtual machine identical to the underlying hardware.
- partition management firmware 210 allows the simultaneous execution of independent OS images 202, 204, 206, and 208 by virtualizing all the hardware resources of logical partitioned platform 200.
- Service processor 290 may be used to provide various services, such as processing of platform errors in the partitions. These services also may act as a service agent to report errors back to a vendor, such as International Business Machines Corporation. Operations of the different partitions may be controlled through a hardware management console, such as hardware management console 280.
- Hardware management console 280 is a separate data processing system from which a system administrator may perform various functions including reallocation of resources to different partitions.
- the hardware in Figures 1-2 may vary depending on the implementation.
- Other internal hardware or peripheral devices such as flash memory, equivalent non- volatile memory, or optical disk drives and the like, may be used in addition to or in place of certain hardware depicted in Figures 1-2.
- An implementation of the preferred embodiments may also use alternative architecture for managing partitions without departing from the scope of the invention.
- FIG. 3 depicts a block diagram of an example remote computing environment with respect to which a preferred embodiment may be implemented.
- Processors 302 and 304 may each be implemented using any of processors 101-104 in Figure 1.
- Processors 302 and 304 may be distinct processors or separate cores in a multi- core processor.
- Processor 302 may send a message, a function call, or an operation for execution by performing send 306.
- a corresponding receive - receive 308 - of the sent message, call, or operation occurs at processor 304.
- thread 310 (sender thread) on processor 302 (sending processor) may pass a message to cause another thread on processor 304 (remote processor) to execute a desired computation.
- message passing may only be a transmission of data and may not include a request for remote computation.
- a receiving thread is scheduled 314 to read the message.
- the receiving thread executes at the scheduled time to read the message.
- thread 316 (target thread) may perform a desired computation.
- Target thread 316 is woken up at the scheduled time and executed to perform the desired computation.
- The presently used RPC method of communication with remote processors differs from MP in that the contents of the message include a function identifier and function parameters.
- steps 308, 312 and 314 are substantially as described above in terms of computing cost.
- RPC involves additional execution of having thread 318 locate the function using the function identifier and executing 320 the function using the supplied parameters.
- the function execution may utilize or spawn additional threads.
- The presently used AM method of communication with remote processors differs from RPC in that the contents of the message involve a function call, including a pointer to the function to invoke and one or more function parameters.
- Cost incurred in processing an AM may also include cost of executing thread 322 for setting up the context in which the function will execute, resolving the pointer to the function, or otherwise handling the AM and then executing 324 the function.
- AMO allows executing operations on a special-purpose processor associated with the memory controller on the home node of the data used in the operation.
- the computations typically supported in an AMO are limited to a small set of special-purpose operations, e.g., adding a scalar value to a single data point or attempting to acquire a lock or other singular operations or computations.
- an embodiment of the invention is not limited to singular operations from the small set of special-purpose operations.
- any type of operation in any numerosity without limitation may be performed remotely.
- send 332, 334, 336, or 338 for sending the results back to the sender thread in the sending processor may involve additional scheduling and executing costs.
- Data processing system 402 may include processor or core 404.
- Processor 404 may be similar to any of processors 302 or 304 in Figure 3.
- Direct core-to-core communication unit (DCC) 406 includes memory 408.
- data may be written to or extracted from memory 408 in a first-in first-out (FIFO) manner.
- FIFO is only an example method of reading and writing data into memory 408 and is not intended to be limiting on the invention.
- a processor may be allocated a specific slot of memory in which to write the data.
- one processor may write data in the fifth position only, and another processor may write first in the seventh position and then in the eighth position in memory 408.
- memory 408 may be treated as a scratchpad without any particular organization.
- any method of ordering the reading and writing of data in memory 408 may be used in conjunction with an embodiment without limiting the invention.
- FIFO is chosen as an example method of operation of memory 408 and similar apparatus in other embodiments only for the clarity of the description and not as a limitation.
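The ordering policies described above can be illustrated with a small sketch. The following Python model is purely illustrative; the patent does not prescribe any API, so the class and method names here are assumptions. It shows memory 408 operating either as a FIFO queue or as a set of fixed slots allocated to particular sending processors.

```python
from collections import deque

class DccMemory:
    """Toy model of DCC memory 408 supporting two illustrative ordering
    policies: a FIFO queue, or fixed slots assigned per sending processor."""

    def __init__(self, num_slots=16):
        self.fifo = deque()              # FIFO-ordered messages
        self.slots = [None] * num_slots  # slot-addressed messages

    def fifo_store(self, payload):
        # Messages are read back in the order they were written.
        self.fifo.append(payload)

    def fifo_load(self):
        return self.fifo.popleft()

    def slot_store(self, slot, payload):
        # e.g. one processor may be allocated slot 5 only.
        self.slots[slot] = payload

mem = DccMemory()
mem.fifo_store(b"msg-1")
mem.fifo_store(b"msg-2")
mem.slot_store(5, b"from-proc-A")
```

Either policy (or a free-form scratchpad) satisfies the description above; FIFO is simply the running example of the disclosure.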
- memory 408 may be implemented using static random access memory (SRAM). In another embodiment memory 408 may be implemented using dynamic random access memory (DRAM).
- FIFO description table (FDT) 410 manages the FIFO read/write in memory 408.
- FDT 410 may additionally enable triggering of computations on processor 404 using data from memory 408 as explained elsewhere in this disclosure.
- FDT 410 is only described as an example to correspond with the example of FIFO ordering.
- FDT 410 as a structure is not limiting on the invention.
- FDT 410 may be replaced with any suitable structure
- Level 1 (L1) cache 412
- Level 2 (L2) cache 414
- Memory 416, such as local memory 160 in Figure 1
- memory 408 may be a peer, to wit, at a comparable level of access, hierarchy, or speed, as L1 cache 412.
- memory 408 may be superior, to wit, at a level of access, hierarchy, or speed higher than that of L1 cache 412.
- memory 408 may be at a hierarchical level comparable to L2 cache 414.
- FIG. 5 depicts a block diagram of one part of performing fast remote communication and computation between processors in accordance with a preferred embodiment.
- Data processing system 502 may be similar to data processing system 402 in Figure 4.
- Processor 504, DCC 506, memory 508, and FDT 510 may be similar to their corresponding artifacts in Figure 4.
- An area of memory 508 may be allocated as global address space in which data may be written or read using a global address. Furthermore, in one embodiment, such an area may be contiguous in memory 508.
- Processor 504 may be a remote processor on which thread 512 executing on processor 514 may wish to perform an operation.
- An embodiment may provide special load and store instructions for reading from or writing to memory 508 in FIFO mode. Using such a store instruction, thread 512 may perform write 516 by executing a FIFO store with respect to memory 508.
- the FIFO store instruction of write 516 may store into memory 508 a certain number of bytes, which include context information.
- thread 512 may form data that is sufficient to cause the execution of the desired operation at processor 504.
- the bytes corresponding to the data written in write 516 may include one or more bytes of information that encode the context or address space to be used for executing the operation at processor 504. Further, the bytes corresponding to the data written in write 516 may include one or more bytes of data that contain instructions to execute on processor 504, the address of a function to execute on processor 504, or a function identifier that indicates an operation to invoke on processor 504. Further, the bytes corresponding to the data written in write 516 may include one or more bytes of parameters that are to be used with the operation in a specified context.
- the total number of bytes in write (FIFO store) 516 is determined at processor 514.
- the effective address of the bytes in thread 512's context (sender's effective address) is translated to or associated with a global address reachable by processes executing at either processor 504 or processor 514.
- the global address associated with the sender's effective address may be in the area of memory 508 that has been allocated as global address space.
- the bytes of write (FIFO store) 516 are then written to that global address in memory 508.
- An entry in FDT 510 is made relating to the bytes being written into memory 508.
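As a rough illustration of the write path just described, the following Python sketch packs context, function, and parameter bytes into a message, maps a sender's effective address to a global address, writes the bytes there, and records an FDT entry. The wire layout, the fixed-offset translation, and all names are assumptions for illustration; the patent leaves the exact encoding open.

```python
import struct

# Hypothetical wire layout: 2 bytes context id, 2 bytes function id,
# 2 bytes parameter length, then the parameter bytes themselves.
def compose_message(context_id, function_id, params):
    return struct.pack(">HHH", context_id, function_id, len(params)) + params

# Illustrative effective-to-global translation: a fixed offset into the
# area of memory 508 allocated as global address space.
GLOBAL_BASE = 0x4000
def to_global_address(effective_addr):
    return GLOBAL_BASE + effective_addr

fdt = []  # entries relating to bytes written into the DCC memory

def fifo_store(memory, effective_addr, msg):
    # Write the bytes at the global address and make an FDT entry.
    g = to_global_address(effective_addr)
    memory[g:g + len(msg)] = msg
    fdt.append({"global_addr": g, "size": len(msg)})
    return g

dcc_memory = bytearray(0x8000)
msg = compose_message(context_id=1, function_id=7, params=b"\x2a")
addr = fifo_store(dcc_memory, 0x10, msg)
```

The recorded size lets the receiving side later check whether all of a message's bytes have arrived.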
- FIG. 6 depicts a block diagram of another part of performing fast remote communication and computation between processors in accordance with a preferred embodiment.
- Processor 604, DCC 606, memory 608, and FDT 610 may be similar to their corresponding artifacts in Figure 5.
- Trigger 612 may be a triggering mechanism to initiate execution of an instruction on processor 604. In one embodiment, trigger 612 may be implemented in hardware.
- Bytes 614 may be bytes including a context as may be written by write (FIFO store) 516 in Figure 5. Bytes 614 may be written in the order they are received at memory 608 operating as a FIFO queue. Bytes 614 may progressively move higher in the FIFO queue logical order, eventually to occupy the first logical position in the FIFO queue.
- FDT 610 may determine whether certain bytes in the FIFO queue in memory 608 are ready to be executed. For example, bytes not in the first position in the FIFO queue are not ready to be executed. As another example, bytes that have not been written completely at the time of checking are not ready to be executed. For example, the size or number of bytes from a certain global address may indicate that 64 bytes should be found or used from that address but only 32 bytes may be present at the time of checking.
- FDT 610 may determine that the bytes in the first position in the FIFO queue, such as bytes 614 having progressed to the first position, are ready to be executed. Using trigger 612, those bytes may be read (FIFO load) 616, or loaded into processor 604 for execution. Recall that the bytes include the proper context within which to execute the instruction in those bytes. As an example, at or before loading the bytes, FDT 610 may load the context information sent as part of write 516 in Figure 5 to appropriate registers in processor 604 so that addresses presented in that message are made to correspond to effective addresses usable by a thread in processor 604. These addresses may be used to encode the location of instructions to be executed as part of the requested operation. These addresses may also be used to encode the location of parameters to the function to be performed on processor 604.
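The readiness check performed by FDT 610 can be sketched as follows. The entry fields are assumptions, but the two conditions, occupying the first logical FIFO position and having all declared bytes present, follow the description above (e.g., 64 bytes expected but only 32 present means not ready).

```python
def ready_to_execute(entry, queue, bytes_present):
    # Not ready unless the entry occupies the first logical FIFO position.
    at_head = bool(queue) and queue[0] is entry
    # Not ready unless all declared bytes have been written.
    complete = bytes_present >= entry["size"]
    return at_head and complete

entry_a = {"size": 64}
entry_b = {"size": 16}
queue = [entry_a, entry_b]
```

Only when both conditions hold would a trigger such as trigger 612 load the bytes into the processor for execution.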
- the arrival of the bytes at DCC 606 can automatically kick off computation at remote processor 604 in this manner.
- the bytes sent from a sending processor to a remote processor can include an instruction according to any convention.
- the instruction may resemble RPC and may include (optionally) a function identifier, a set of parameters for the function, and a context.
- a set of parameters is zero or more parameters.
- the instruction may resemble RPC but may indirectly imply the destination (the remote processor).
- the embodiment may identify the DCC of the remote processor, the memory of the DCC of the remote processor, or an address space within the memory of the DCC of the remote processor. Any other indirect or implicit identification of the destination where the instruction should execute may be used in conjunction with an embodiment within the scope of the invention.
- the instruction may resemble AM and may include (optionally) a remote processor identifier, a function code, a set of parameters for the function, and a context.
- the instruction may resemble AM but may indirectly imply the destination in any manner, including but not limited to those described above as examples.
- Other embodiments may encode the instruction to resemble any method of remote computation, including but not limited to any presently used method or specification. Such encoded instructions are going to be apparent from this disclosure to those of ordinary skill in the art and are contemplated within the scope of the invention.
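By way of illustration only, the RPC-like and AM-like conventions described above might be encoded as follows. The field names are assumptions, since an embodiment may use any agreed convention, and the destination of an RPC-like message may be implied simply by which remote DCC memory is written.

```python
# RPC-like convention: a function identifier, a set of (zero or more)
# parameters, and a context; the destination is implied by the DCC written.
def encode_rpc_like(function_id, params, context):
    return {"kind": "rpc", "fn_id": function_id, "params": params, "ctx": context}

# AM-like convention: an explicit remote processor identifier plus a
# function code (or pointer), parameters, and a context.
def encode_am_like(remote_proc, function_code, params, context):
    return {"kind": "am", "dest": remote_proc, "fn": function_code,
            "params": params, "ctx": context}

rpc_msg = encode_rpc_like(function_id=3, params=(1, 2), context="ctx0")
am_msg = encode_am_like(remote_proc=4, function_code=0x1000, params=(), context="ctx1")
```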
- the process of executing the instruction from the memory in the DCC can be accomplished in a variety of ways without limitation on the invention.
- an existing thread may be woken up to perform the instruction or invoke the function included therein.
- a new thread may be created to perform the instruction.
- a hardware thread may be utilized to perform the instruction.
- the instruction (or the function therein) to be executed may be executed in a variety of ways without limitation on the invention.
- a function may be executed by performing a function table lookup and jumping to the function address found in the corresponding table entry.
- the function may be executed by jumping to a specified address.
- the contents of the message sent may include a set of binary (executable) instructions which are themselves executed by a thread on the remote processor. The result of executing the bytes sent from the sending processor may be returned from the remote processor in any manner suitable for a given implementation without departing the scope of the invention.
- a remote processor or another component utilized by the remote processor for computing according to the bytes sent from a sending processor, may hibernate, power down, go to a sleep mode, or otherwise exist in a power conservation mode at a given time.
- the remote processor or another component may be woken up at a suitable time to execute the requested operation.
- a remote processor or another component may be woken up at the arrival of the message (bytes) into a DCC's memory.
- the wake-up process may be automatic, periodic, event-based, or performed in any other way suitable for an implementation within the scope of the invention.
- an embodiment may allow a runtime environment or an operating system, at the sending processor, the remote processor, or both processors to operate as they presently do after an initial configuration according to an embodiment.
- a processor may be configured to be able to send a remote operation request to a second processor's DCC memory but not to a third processor's DCC memory.
- all or a subset of processors in a given data processing environment may be configured with the ability to write to each other's DCC memories.
- a processor and any other processors related to that processor in some way (a first gang of processors) may be configured to write to one or more remote processors in a remote gang of processors.
- the address spaces associated with various processors may be enabled for reading, writing, or execution by distant processors, instead of or in addition to associating processors with each other as described above.
- controlled portions of the address spaces may be enabled for reading, writing, or execution by distant processors.
- Access control to such address spaces may be implemented at any level suitable for a particular implementation. For example, one implementation of access control may enable writing to any available address within reach of the processor or DCC. In another implementation, certain address spaces may be reserved or demarcated for such remote communications. Access control with other granularities may be more suitable for other implementations.
- a trust relationship may be pre-created and may last for a period of time.
- a trust relationship may also be created on demand and may last only for a specific operation, such as for sending one or more communications.
- a trust relationship may be created directly between two processors, or may be inferred from other trust relationships of a processor.
- a remote processor may allocate registers, threads, or other computing resources for executing the bytes sent using an embodiment in any manner suitable to the particular processor's configuration. For example, a resource may be allocated from a shared pool of that resource or a pool of that resource dedicated for operating an embodiment of the invention.
- Process 700 may be implemented in the code of a thread, such as thread 512 in Figure 5.
- Process 700 begins by composing an instruction to write in a DCC of a remote processor (step 702).
- the instruction may be a set of bytes formed in any manner described or suggested within the scope of the invention as described above.
- a set of bytes is one or more bytes.
- Process 700 may accomplish step 702, for example, by writing the necessary data, such as function pointer, context, and parameters, to a local buffer.
- Process 700 writes the instruction to the remote processor's DCC using a suitable command, such as the FIFO send instruction (step 704).
- the instruction being written includes information about a context for executing the instruction. Process 700 ends thereafter.
- Process 800 may be implemented in a DCC, such as DCC 606 in Figure 6.
- Process 800 begins by receiving an instruction (the bytes as described with respect to Figures 5 and 6) into a DCC of a (remote) processor (step 802).
- Another process may enter process 800 at the entry point marked "A".
- Process 800 determines whether the instruction of step 802 is at the head of the logical FIFO queue stored in the DCC memory (step 804). If the instruction is not at the head of the FIFO queue (the "No" path of step 804), process 800 may wait or otherwise allow an interval to elapse (step 806). Process 800 then returns to step 804.
- Process 800 determines whether the instruction is complete, or in other words, whether the instruction is not blocked in any way (step 808). If the instruction is not complete or the instruction is blocked (the "No" path of step 808), process 800 returns to step 802.
- process 800 sends the instruction to the processor for execution within the provided context (step 810).
- Process 800 may end thereafter or return to step 802 to operate on the next instruction in the FIFO queue.
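A minimal sketch of the dispatch loop of process 800, assuming illustrative "complete", "ctx", and "payload" fields on each queued instruction:

```python
from collections import deque

def drain_fifo(queue, execute):
    # Repeatedly examine the head of the FIFO queue (step 804) and
    # dispatch complete instructions for execution within their provided
    # context (step 810); stop on an incomplete (blocked) instruction,
    # standing in for waiting an interval (step 806).
    results = []
    while queue:
        head = queue[0]
        if not head["complete"]:
            break
        queue.popleft()
        results.append(execute(head["ctx"], head["payload"]))
    return results

q = deque([
    {"complete": True, "ctx": "c1", "payload": 10},
    {"complete": True, "ctx": "c2", "payload": 20},
    {"complete": False, "ctx": "c3", "payload": 30},
])
done = drain_fifo(q, lambda ctx, p: (ctx, p * 2))
```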
- sending instructions for execution in this manner causes an operation encoded within the instructions to be invoked without requiring the help of software threads.
- the invocation occurs within a hardware execution context using data specified in the instruction at the remote processor.
- the hardware execution context is the context corresponding to the context information included with the instruction.
- Process 900 may be implemented in a DCC, such as DCC 606 in Figure 6.
- Process 900 begins by determining whether usage of a memory associated with the DCC for maintaining the FIFO queue according to an embodiment has reached a threshold capacity (step 902). If the memory usage has not reached the threshold (the "No" path of step 902), process 900 ends thereafter. Otherwise (the "Yes" path of step 902), process 900 overflows the FIFO queue to another memory while maintaining the sequencing of the various instructions stored in the FIFO queue (step 904). Process 900 ends thereafter. For overflowing to another memory, process 900 may allocate and configure a region of a memory for use as a FIFO queue in accordance with an embodiment (not shown).
- the overflow space may be allocated and configured in a memory different from the memory associated with the DCC, such as memory 608 in Figure 6.
- the overflow memory may be a peer of L2 cache or a portion of main memory.
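The overflow behavior of process 900 might be sketched as follows. The threshold value and the policy of spilling the newest entries are assumptions; what the description requires is only that instruction order be preserved across the two memories.

```python
def maybe_overflow(dcc_queue, overflow_queue, capacity, threshold=0.75):
    # Step 902: has the DCC memory holding the FIFO queue reached a
    # threshold of its capacity?
    limit = int(capacity * threshold)
    if len(dcc_queue) > limit:
        # Step 904: spill the excess (newest) entries to the overflow
        # memory, preserving the sequencing of the instructions.
        overflow_queue.extend(dcc_queue[limit:])
        del dcc_queue[limit:]
    return dcc_queue, overflow_queue

dcc = list(range(10))   # ten queued instructions
spill = []              # e.g. a region of main memory or an L2 peer
maybe_overflow(dcc, spill, capacity=8)
```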
- Process 1000 may be implemented in a DCC, such as DCC 606 in Figure 6.
- Process 1000 begins by detecting the arrival of (the bytes of) an operation in the DCC of a remote processor (step 1002).
- the operation arriving in step 1002 may be a DCC message containing information regarding an operation to be performed at the associated processor.
- the operation may be encoded in any manner suitable for a given implementation.
- Process 1000 determines whether a power save mode is active (step 1004). If a power save mode is not active (the "No" path of step 1004), process 1000 exits at the exit point marked "A" and enters another process with a corresponding entry point marked "A" in Figure 8.
- process 1000 wakes up the remote processor or a component associated therewith (step 1006).
- Process 1000 may additionally or alternatively wake up a thread as a part of returning from a power save mode (step 1008).
- Process 1000 exits at the exit point marked "A" and enters another process with a corresponding entry point marked "A" in Figure 8.
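A sketch of process 1000's power-save handling, with the processor state modeled as a dictionary (an assumption made only for illustration):

```python
def on_arrival(processor, instruction, dispatch):
    # Step 1002 has detected arrival of the operation's bytes in the DCC.
    # Step 1004: is a power save mode active?
    if processor["power_save"]:
        processor["power_save"] = False   # step 1006: wake the processor
        processor["thread_awake"] = True  # step 1008: wake a thread
    # Exit point "A": hand off to the dispatch path (process 800).
    return dispatch(instruction)

cpu = {"power_save": True, "thread_awake": False}
result = on_arrival(cpu, "op-1", lambda instr: "dispatched:" + instr)
```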
- Process 1100 may be implemented in an operating system or another application executing on a data processing system that includes a sending processor and a receiving processor, such as data processing system 502 in Figure 5.
- Process 1100 begins by configuring a sending processor to be able to send messages, instructions, or commands to a remote receiving processor's DCC (step 1102).
- Process 1100 or an equivalent process executing on the remote receiving processor's data processing system may configure the remote processor to receive messages, instructions or commands into the DCC from the sender of step 1102 (step 1104).
- Process 1100 may determine whether the sending process should be allowed to initiate remote communication or computation with the receiving process, and opt not to establish a connection if permissions checks indicate that the sender should not be allowed to perform the requested remote operation.
- Process 1100 determines whether more processors are to be configured for writing to remote DCC (step 1106). If more processors are to be configured (the "Yes" path of step 1106), process 1100 returns to step 1102. Otherwise (the "No" path of step 1106), process 1100 ends thereafter.
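The pairwise configuration performed by process 1100 might be recorded in a structure such as the following. The table and its sender/receiver granularity are assumptions, since the disclosure leaves the granularity of access control open; it shows only how a processor could be permitted to reach a second processor's DCC but not a third's.

```python
class DccAccessTable:
    """Illustrative record of which senders may write into which remote
    processors' DCC memories (steps 1102-1104 of process 1100)."""

    def __init__(self):
        self.allowed = set()  # (sender, receiver) pairs

    def configure(self, sender, receiver, permitted=True):
        # Permissions checks may deny the pairing (as in process 1100).
        if permitted:
            self.allowed.add((sender, receiver))

    def may_send(self, sender, receiver):
        return (sender, receiver) in self.allowed

acl = DccAccessTable()
acl.configure("P1", "P2")                   # P1 may write to P2's DCC
acl.configure("P1", "P3", permitted=False)  # but not to P3's DCC
```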
- a computer implemented method, apparatus, and computer program product are provided in the preferred embodiments for fast remote communication and computation between processors.
- a data processing environment may be able to synchronize operations between two or more processors.
- Remote updates may be executed in-place at remote processors by using an embodiment.
- An embodiment may allow executing operations over a range of remote addresses.
- An embodiment may also enable efficient remote execution of short functions that do not consume a significant number of cycles. Additionally, an embodiment may enable the execution of short functions without using interrupts, polling, or thread scheduling, or with reduced cost of interrupts or scheduling.
- the invention can take the form of an embodiment containing both hardware and software elements.
- the invention is implemented in software or program code, which includes but is not limited to firmware, resident software, and microcode.
- the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system.
- a computer-usable or computer-readable medium can be any tangible apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
- the medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium.
- Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk, and an optical disk.
- Current examples of optical disks include compact disk - read only memory (CD-ROM), compact disk - read/write (CD-R/W) and DVD.
- a computer storage medium may contain or store a computer-readable program code such that when the computer-readable program code is executed on a computer, the execution of this computer-readable program code causes the computer to transmit another computer-readable program code over a communications link.
- This communications link may use a medium that is, for example without limitation, physical or wireless.
- a data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus.
- the memory elements can include local memory employed during actual execution of the program code, bulk storage media, and cache memories, which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage media during execution.
- a data processing system may act as a server data processing system or a client data processing system.
- Server and client data processing systems may include data storage media that are computer usable, such as being computer readable.
- a data storage medium associated with a server data processing system may contain computer usable code.
- a client data processing system may download that computer usable code, such as for storing on a data storage medium associated with the client data processing system, or for using in the client data processing system.
- the server data processing system may similarly upload computer usable code from the client data processing system.
- the computer usable code resulting from a computer usable program product embodiment of the preferred embodiments may be uploaded or downloaded using server and client data processing systems in this manner.
- I/O devices (including but not limited to keyboards, displays, pointing devices, and the like) can be coupled to the system either directly or through intervening I/O controllers.
- Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks.
- Modems, cable modem and Ethernet cards are just a few of the currently available types of network adapters.
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB1222539.7A GB2494578B (en) | 2010-05-27 | 2011-05-25 | Fast remote communication and computation between processors |
| DE112011100854.6T DE112011100854B4 (en) | 2010-05-27 | 2011-05-25 | Fast remote data transmission and remote calculation between processors |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US12/789,082 US9934079B2 (en) | 2010-05-27 | 2010-05-27 | Fast remote communication and computation between processors using store and load operations on direct core-to-core memory |
| US12/789,082 | 2010-05-27 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2011147884A1 (en) | 2011-12-01 |
Family
ID=44279668
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/EP2011/058582 Ceased WO2011147884A1 (en) | 2010-05-27 | 2011-05-25 | Fast remote communication and computation between processors |
Country Status (5)
| Country | Link |
|---|---|
| US (2) | US9934079B2 (en) |
| DE (1) | DE112011100854B4 (en) |
| GB (1) | GB2494578B (en) |
| TW (1) | TW201211899A (en) |
| WO (1) | WO2011147884A1 (en) |
Families Citing this family (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9411532B2 (en) * | 2001-09-07 | 2016-08-09 | Pact Xpp Technologies Ag | Methods and systems for transferring data between a processing device and external devices |
| US20130117168A1 (en) | 2011-11-04 | 2013-05-09 | Mark Henrik Sandstrom | Maximizing Throughput of Multi-user Parallel Data Processing Systems |
| US8789065B2 (en) | 2012-06-08 | 2014-07-22 | Throughputer, Inc. | System and method for input data load adaptive parallel processing |
| US8490111B2 (en) * | 2011-04-16 | 2013-07-16 | Throughputer, Inc. | Efficient network and memory architecture for multi-core data processing system |
| US9448847B2 (en) | 2011-07-15 | 2016-09-20 | Throughputer, Inc. | Concurrent program execution optimization |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20030084269A1 (en) * | 2001-06-12 | 2003-05-01 | Drysdale Tracy Garrett | Method and apparatus for communicating between processing entities in a multi-processor |
| WO2007092747A2 (en) * | 2006-02-02 | 2007-08-16 | Texas Instruments Incorporated | Multi-core architecture with hardware messaging |
| WO2009134217A1 (en) * | 2008-04-28 | 2009-11-05 | Hewlett-Packard Development Company, L.P. | Method and system for generating and delivering inter-processor interrupts in a multi-core processor and in certain shared-memory multi-processor systems |
Family Cites Families (16)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6249822B1 (en) | 1995-04-24 | 2001-06-19 | Microsoft Corporation | Remote procedure call method |
| US5887172A (en) | 1996-01-10 | 1999-03-23 | Sun Microsystems, Inc. | Remote procedure call system and method for RPC mechanism independent client and server interfaces interoperable with any of a plurality of remote procedure call backends |
| US6487607B1 (en) | 1998-02-26 | 2002-11-26 | Sun Microsystems, Inc. | Methods and apparatus for remote method invocation |
| US6904601B1 (en) | 2000-04-07 | 2005-06-07 | International Business Machines Corporation | Method and system for providing remote procedure calls in a multiprocessing system |
| US6744765B1 (en) | 2000-08-24 | 2004-06-01 | Sun Microsystems, Inc. | Mechanism for completing messages in memory |
| JP3800037B2 (en) * | 2001-06-06 | 2006-07-19 | 日本電気株式会社 | Interprocessor communication system and interprocessor communication method used therefor |
| US7315897B1 (en) * | 2002-09-13 | 2008-01-01 | Alcatel Lucent | Adaptable control plane architecture for a network element |
| US7346757B2 (en) * | 2002-10-08 | 2008-03-18 | Rmi Corporation | Advanced processor translation lookaside buffer management in a multithreaded system |
| US7536468B2 (en) | 2004-06-24 | 2009-05-19 | International Business Machines Corporation | Interface method, system, and program product for facilitating layering of a data communications protocol over an active message layer protocol |
| US7844973B1 (en) * | 2004-12-09 | 2010-11-30 | Oracle America, Inc. | Methods and apparatus providing non-blocking access to a resource |
| US7464115B2 (en) | 2005-04-25 | 2008-12-09 | Silicon Graphics, Inc. | Node synchronization for multi-processor computer systems |
| US7624250B2 (en) | 2005-12-05 | 2009-11-24 | Intel Corporation | Heterogeneous multi-core processor having dedicated connections between processor cores |
| US20070180310A1 (en) | 2006-02-02 | 2007-08-02 | Texas Instruments, Inc. | Multi-core architecture with hardware messaging |
| US7937532B2 (en) | 2007-03-30 | 2011-05-03 | Intel Corporation | Method and apparatus for speculative prefetching in a multi-processor/multi-core message-passing machine |
| US8239879B2 (en) * | 2008-02-01 | 2012-08-07 | International Business Machines Corporation | Notification by task of completion of GSM operations at target node |
| US8813091B2 (en) * | 2008-08-04 | 2014-08-19 | Oracle America, Inc. | Distribution data structures for locality-guided work stealing |
- 2010
  - 2010-05-27 US US12/789,082 patent/US9934079B2/en active Active
- 2011
  - 2011-05-18 TW TW100117377A patent/TW201211899A/en unknown
  - 2011-05-25 DE DE112011100854.6T patent/DE112011100854B4/en active Active
  - 2011-05-25 GB GB1222539.7A patent/GB2494578B/en active Active
  - 2011-05-25 WO PCT/EP2011/058582 patent/WO2011147884A1/en not_active Ceased
- 2012
  - 2012-03-07 US US13/413,787 patent/US8799625B2/en not_active Expired - Fee Related
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20030084269A1 (en) * | 2001-06-12 | 2003-05-01 | Drysdale Tracy Garrett | Method and apparatus for communicating between processing entities in a multi-processor |
| WO2007092747A2 (en) * | 2006-02-02 | 2007-08-16 | Texas Instruments Incorporated | Multi-core architecture with hardware messaging |
| WO2009134217A1 (en) * | 2008-04-28 | 2009-11-05 | Hewlett-Packard Development Company, L.P. | Method and system for generating and delivering inter-processor interrupts in a multi-core processor and in certain shared-memory multi-processor systems |
Non-Patent Citations (2)
| Title |
|---|
| DAVID WENTZLAFF ET AL: "On-Chip Interconnection Architecture of the Tile Processor", 1 September 2007, IEEE MICRO, IEEE SERVICE CENTER, LOS ALAMITOS, CA, US, ISSN: 0272-1732, pages: 15 - 31, XP011196754 * |
| STRONG R ET AL: "Fast switching of threads between cores", OPERATING SYSTEMS REVIEW, vol. 43, no. 2, April 2009 (2009-04-01), ACM USA, pages 1 - 11, XP002653957, ISSN: 0163-5980 * |
Also Published As
| Publication number | Publication date |
|---|---|
| GB2494578A (en) | 2013-03-13 |
| GB201222539D0 (en) | 2013-01-30 |
| TW201211899A (en) | 2012-03-16 |
| US20110296138A1 (en) | 2011-12-01 |
| US9934079B2 (en) | 2018-04-03 |
| US8799625B2 (en) | 2014-08-05 |
| DE112011100854B4 (en) | 2020-06-10 |
| DE112011100854T5 (en) | 2013-01-24 |
| US20120191946A1 (en) | 2012-07-26 |
| GB2494578B (en) | 2017-11-29 |
Similar Documents
| Publication | Title |
|---|---|
| US8799892B2 (en) | Selective memory donation in virtual real memory environment |
| US8510749B2 (en) | Framework for scheduling multicore processors |
| US8359449B2 (en) | Prioritizing virtual real memory paging based on disk capabilities |
| US8302102B2 (en) | System utilization through dedicated uncapped partitions |
| US8448006B2 (en) | Performing virtual and/or physical resource management for power management |
| US8156498B2 (en) | Optimization of thread wake up for shared processor partitions |
| US8799908B2 (en) | Hardware-enabled lock mediation for controlling access to a contested resource |
| US20100100892A1 (en) | Managing hosted virtualized operating system environments |
| US8799625B2 (en) | Fast remote communication and computation between processors using store and load operations on direct core-to-core memory |
| US20120272016A1 (en) | Memory affinitization in multithreaded environments |
| US8458431B2 (en) | Expanding memory size |
| US9513952B2 (en) | Sharing resources allocated to an entitled virtual machine |
| US8139595B2 (en) | Packet transfer in a virtual partitioned environment |
| JP4852585B2 (en) | Computer-implemented method, computer-usable program product, and data processing system for saving energy in multipath data communication |
| US9092205B2 (en) | Non-interrupting performance tuning using runtime reset |
| US20120124298A1 (en) | Local synchronization in a memory hierarchy |
| US8880858B2 (en) | Estimation of boot-time memory requirement |
| US9183056B2 (en) | Expanding memory size |
Legal Events
| Code | Title | Description |
|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 11723031; Country of ref document: EP; Kind code of ref document: A1 |
| WWE | Wipo information: entry into national phase | Ref document number: 112011100854; Country of ref document: DE. Ref document number: 1120111008546; Country of ref document: DE |
| ENP | Entry into the national phase | Ref document number: 1222539; Country of ref document: GB; Kind code of ref document: A; Free format text: PCT FILING DATE = 20110525 |
| WWE | Wipo information: entry into national phase | Ref document number: 1222539.7; Country of ref document: GB |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 11723031; Country of ref document: EP; Kind code of ref document: A1 |