US20170024249A1 - Parallel Execution Mechanism and Operating Method Thereof - Google Patents
Parallel Execution Mechanism and Operating Method Thereof Download PDFInfo
- Publication number
- US20170024249A1 US20170024249A1 US15/287,272 US201615287272A US2017024249A1 US 20170024249 A1 US20170024249 A1 US 20170024249A1 US 201615287272 A US201615287272 A US 201615287272A US 2017024249 A1 US2017024249 A1 US 2017024249A1
- Authority
- US
- United States
- Prior art keywords
- thread
- priority
- transaction
- instruction
- memory
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Links
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0614—Improving the reliability of storage systems
- G06F3/0619—Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0655—Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
- G06F3/0659—Command handling arrangements, e.g. command buffers, queues, command scheduling
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0673—Single storage device
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/3004—Arrangements for executing specific machine instructions to perform operations on memory
- G06F9/30043—LOAD or STORE instructions; Clear instruction
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30076—Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
- G06F9/30087—Synchronisation or serialisation instructions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30076—Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
- G06F9/3009—Thread control instructions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/52—Program synchronisation; Mutual exclusion, e.g. by means of semaphores
- G06F9/526—Mutual exclusion algorithms
- G06F9/528—Mutual exclusion algorithms by using speculative mechanisms
Definitions
- the present invention relates to a hardware mechanism for performing thread-level speculative parallel execution.
- response time is one of the most important indicators to customers.
- response times depend largely on the single thread performance of the processor. In recent years, however, the growth rate of single-thread performance is slowing down.
- Thread-level speculative parallel execution is one well-known response. Thread-level speculative parallel execution speeds up the execution of single-thread programs by allowing a compiler or programmer to speculatively parallelize a single-thread program. This typically requires a complicated hardware mechanism.
- Hardware transaction memory is one technique used to speed up execution.
- a transaction is a sequence of instructions between special instructions such as transaction begins and transaction end.
- a data access conflict occurs between two transactions being executed in parallel such as read-after-write, write-after-write, and read-after-read conflicts, the hardware cancels the execution of the transaction.
- Thread-level speculative parallel execution cannot be performed by the hardware transaction memory alone. Thread-level speculative parallel execution requires the completion of transactions in order. However, the runtime for controlling the completion order causes transaction conflicts.
- Laid-open Patent Publication No. 2009-521767 describes software transaction memory (STM) access processing which is executed when the preceding hardware transaction memory (HTM) access processing fails.
- STM software transaction memory
- Laid-open Patent Publication No. 2010-532053 describes the use of transaction memory hardware to facilitate the updating of a dispatch table in a multi-thread environment utilizing an atomic commit function.
- an emulator uses a dispatch table stored in the main memory to convert a guest program counter to a host program counter.
- PCT Publication No. WO2010/001736 describes a multi-processor system including a plurality of processors for executing multi-threads in the processing of data, and a data processing control unit for determining satisfactory conditions allowing the processors to execute the threads in order, and for starting the execution of each thread so as to satisfy these conditions.
- U.S. Pat. No. 8,151,252 describes the speculative parallelization of a program using transactional memory by scoping program variables during compilation, and by inserting code into the program during compilation.
- the scoping is determined based on whether a scalar variable being scoped is involved in inter-loop non-reduction data dependencies, whether the scalar variable is used outside the loop defining it, and at what point in a loop the scalar variable is defined.
- speculative parallel execution is performed on a sequential program composed of a plurality of blocks which may have a dependency with each other by using hardware transaction memory to detect conflicts and to recover from mistaken speculative execution.
- a thread-level speculation mechanism which has content-addressable memory, an address register and a comparator for recording transaction footprints, and a control logic circuit for supporting CAD (compare and delay) instructions.
- This thread-level speculation mechanism includes a priority up bit for recording an attribute operand in a CAD instruction, a means for generating a priority up event when a thread wake-up event has occurred and the priority up bit is 1, and a means for preventing the CAM from storing the load/store address when the instruction is a non-transaction instruction.
- the transaction of the application program raises the priority in response to a priority up event.
- the present invention provides a thread priority controlling mechanism which uses the completion event of the preceding transaction to raise the priority of the next transaction in the order of execution when the transaction status has been changed from speculative to non-speculative. This reduces the amount of wasted resources due to speculative failure.
- FIG. 1 is a diagram showing the hardware configuration for a computer system embodying the present invention.
- FIG. 2 is a diagram showing an example of a transaction conflict occurring during thread-level speculation parallel execution processing in the prior art.
- FIG. 3 is a diagram showing an example of a hardware logic configuration embodying the present invention.
- FIG. 4 is a diagram used to explain the operations of the present invention.
- a transaction has to be completed in order in order to maintain the semantics of the original program.
- Threads in a transaction cannot communicate with other threads without stopping the transaction.
- the runtime for non-transactional instructions should allow the transactions to be completed in order.
- the present invention provides a thread priority controlling mechanism which uses the completion event of the preceding transaction to raise the priority of the next transaction in the order of execution when the transaction status has been changed from speculative to non-speculative. This reduces the amount of wasted resources due to speculative failure.
- FIG. 1 is a block diagram of computer hardware used to realize the system configuration and processing in an embodiment of the present invention.
- the computer hardware is preferably configured according to the IBMTM System zTM architecture.
- a CPU 104 main memory (RAM) 106 , a hard disk drive (HDD) 108 , a keyboard 110 , a mouse 112 , and a display 114 are connected to a system bus 102 .
- the CPU 104 is preferably based on the architecture of the z/OTM processor chip with modifications related to the present invention.
- the CPU 104 is a multi-core processor including a cache memory, fetch unit, decode unit, register file, arithmetic unit, load/store unit, transaction memory control circuit, and other logic circuits and control circuits.
- a z/OTM transaction memory control circuit is added. More specifically, a modification has been made to the CAD (compare and delay) instruction executing unit. The hardware modifications will be explained in greater detail below with reference to FIG. 3 .
- the main memory 106 has a capacity of at least 16 GB.
- the capacity of the hard disk drive 108 can be, for example, 1 TB.
- the operating system is stored in the hard disk drive 108 .
- the operating system can be any operating system compatible with the computer hardware, such as z/OS, z/VM or zNSE.
- the keyboard 110 and mouse 112 are used to load programs in the main memory 106 from the hard disk drive 108 , operate the program displayed on the display 114 (not shown) and enter text according to the functions of the operating system.
- the display 114 is preferably a liquid crystal display. Any resolution can be used, including XGA (resolution: 1024 ⁇ 768) or UXGA (resolution: 1600 ⁇ 1200). While not shown in the drawings, the display 114 is used to display numerical values such as accounting data calculated using a COBOL program.
- FIG. 2 thread-level speculation parallel execution is performed on sequential code consisting of multiple instruction blocks (t 1 and t 2 ).
- pointer aliasing may occur, that is, the same pointer may be accessed from multiple locations.
- t 2 has to be executed after t 1 .
- t 1 and t 2 may be performed in parallel as a transaction without a conflict occurring.
- the HTM detects a read-after-write, and allows t 2 to be executed again.
- the HTM detects a write-after-read, and t 1 is aborted. However, t 1 cannot be executed again in order to obtain the correct result.
- This instruction takes as operands register R 1 , base address register B 2 , and memory displacement value D 2 .
- the total value of value D 2 and value B 2 is the memory address, and the execution of subsequent instructions is delayed until the value of the memory at the memory address equals the value in the register R 1 .
- the original configuration to be modified has an address register 302 , a comparator 304 , and content-addressable memory (CAM) 306 .
- the content-addressable memory 306 keeps track of the transaction footprints (the cache line accessed in the transaction) to detect any transaction conflicts.
- the content-addressable memory 306 receives external store addresses and external load addresses from other threads and cores.
- An event notification is issued when a data access conflict occurs between transactions (read-after-write, write-after-write, and read-after-read conflicts).
- the address register 302 stores a memory address from the CAD.
- the comparator 304 compares an external store address from another thread or core with the content of the address register 302 , and issues a thread wake-up event notification when there is a match.
- a control logic circuit (sel) 312 to prevent the content-addressable memory 306 from storing a load/store address when the non-transaction flag is 1, that is, when the instruction is a non-transaction instruction.
- the flag is set at 1 .
- the added configurational elements are depicted using thick lines in FIG. 3 .
- CAD is used as the non-transaction instruction for synchronization, and the runtime maintains the completion of transactions in order.
- the following example is written using a System/z assembler.
- CAD is understood to be CAD with functions modified in accordance with the present invention.
- the initial CAD indicates the synchronization word, and a priority up attribute is simply added to the cache line indicated by the memory operand (R 2 ).
- the thread executes the CAD instruction again after the thread has completed the execution of the main body of the transaction.
- the thread continues the execution process if another thread has already completed the execution of the preceding transaction. Otherwise, the thread stops the execution process until the other thread has completed the execution of the preceding transaction.
- the thread loads the value of the synchronization word by using a non-transaction load (NTLG) in order to verify the completion of the preceding transaction.
- NTLG non-transaction load
- non-transaction instructions CAD and NTLG
- An ordinary (transaction) instruction causes a read-after-write conflict, and the transaction is aborted.
- FIG. 4 is a schematic diagram of the processing.
- sequential code (A) is executed as speculative parallel execution code (B).
- the wait operation waits for a notification from the preceding thread using a non-transaction instruction in order to keep the notification process from causing a transaction conflict.
- Primer up indicates raising the priority of a thread in response to a notification from the preceding thread.
- the ‘tbegin’ instruction and ‘tend’ instruction refer, respectively, to transaction begin and transaction end.
- the present invention was explained above with reference to a particular example in which CAD instructions in System/z were used.
- the present invention is not limited to this particular CPU architecture, but can be applied to any instructions used in inter-thread synchronization or memory synchronization.
- the instruction for which the present invention modifies the hardware is not limited to CAD, but also includes other memory synchronizing instructions such as BUSY WAIT
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Computer Security & Cryptography (AREA)
- Advance Control (AREA)
- Executing Machine-Instructions (AREA)
Abstract
A thread priority control mechanism is provided which uses the completion event of the preceding transaction to raise the priority of the next transaction in the order of execution when the transaction status has been changed from speculative to non-speculative. In one aspect of the present invention, a thread-level speculation mechanism is provided which has content-addressable memory, an address register and a comparator for recording transaction footprints, and a control logic circuit for supporting memory synchronization instructions. This supports hardware transaction memory in detecting transaction conflicts. This thread-level speculation mechanism includes a priority up bit for recording an attribute operand in a memory synchronization instruction, a means for generating a priority up event when a thread wake-up event has occurred and the priority up bit is 1, and a means for preventing the CAM from storing the load/store address when the instruction is a non-transaction instruction.
Description
- This application is a continuation of U.S. patent application Ser. No. 14/061,775 filed Oct. 24, 2013, which claims priority under 35 U.S.C. §119 from Japanese Patent Application No. 2012-234293 filed Oct. 24, 2012, the entire contents of which are incorporated herein by reference.
- The present invention relates to a hardware mechanism for performing thread-level speculative parallel execution.
- In real time transaction applications, response time is one of the most important indicators to customers. However, response times depend largely on the single thread performance of the processor. In recent years, however, the growth rate of single-thread performance is slowing down.
- Thread-level speculative parallel execution is one well-known response. Thread-level speculative parallel execution speeds up the execution of single-thread programs by allowing a compiler or programmer to speculatively parallelize a single-thread program. This typically requires a complicated hardware mechanism.
- Hardware transaction memory is one technique used to speed up execution. In hardware transaction memory, a transaction is a sequence of instructions between special instructions such as transaction begins and transaction end. When a data access conflict occurs between two transactions being executed in parallel such as read-after-write, write-after-write, and read-after-read conflicts, the hardware cancels the execution of the transaction.
- However, thread-level speculative parallel execution cannot be performed by the hardware transaction memory alone. Thread-level speculative parallel execution requires the completion of transactions in order. However, the runtime for controlling the completion order causes transaction conflicts.
- The following prior art technologies are known to be related to this.
- Laid-open Patent Publication No. 2009-521767 describes software transaction memory (STM) access processing which is executed when the preceding hardware transaction memory (HTM) access processing fails.
- Laid-open Patent Publication No. 2010-532053 describes the use of transaction memory hardware to facilitate the updating of a dispatch table in a multi-thread environment utilizing an atomic commit function. Here, an emulator uses a dispatch table stored in the main memory to convert a guest program counter to a host program counter.
- PCT Publication No. WO2010/001736 describes a multi-processor system including a plurality of processors for executing multi-threads in the processing of data, and a data processing control unit for determining satisfactory conditions allowing the processors to execute the threads in order, and for starting the execution of each thread so as to satisfy these conditions.
- U.S. Pat. No. 8,151,252 describes the speculative parallelization of a program using transactional memory by scoping program variables during compilation, and by inserting code into the program during compilation. In this technique, the scoping is determined based on whether a scalar variable being scoped is involved in inter-loop non-reduction data dependencies, whether the scalar variable is used outside the loop defining it, and at what point in a loop the scalar variable is defined.
- Architecture based on thread-level speculation is presented in Jeffrey Thomas Oplinger, “Enhancing Software Reliability with Speculative Threads”, Graduate Studies of Stanford University, August 2004. A programmer can use this to add monitoring code for checking the execution of a program. This architecture mitigates speed reductions when the monitoring code is executed speculatively in parallel with the main computations. In order to recover from an error, the programmer can define transactions with fine granularity. Side effects of these transactions are committed or aborted via program control. These transactions are implemented efficiently via thread-level hardware support.
- A hybrid conflict management mechanism is presented in Ruben Titos, Manuel E. Acacio, Jose M. Garcia,” Speculation-Based Conflict Resolution in Hardware Transaction Memory”, Parallel & Distributed Processing, 2009. IPDPS 2009. IEEE International Symposium on 23-29 May 2009. In hardware transaction memory, this hybrid conflict management mechanism uses a mechanism with an enthusiastic policy as the base, but combines the advantages of an enthusiastic policy with a lazy policy to allow many conflict-prone transactions to coexist.
- A process for detecting and addressing conflicts in hardware transaction memory has been disclosed in the prior art literature, but the literature does not suggest a mechanism for enabling thread-level speculative parallel execution when the completion of transactions in order has been requested.
- Therefore, it is an object of the present invention to provide a mechanism enabling thread-level speculative parallel execution to be performed in hardware transaction memory.
- In the present invention, speculative parallel execution is performed on a sequential program composed of a plurality of blocks which may have a dependency with each other by using hardware transaction memory to detect conflicts and to recover from mistaken speculative execution.
- In one aspect of the present invention, a thread-level speculation mechanism is provided which has content-addressable memory, an address register and a comparator for recording transaction footprints, and a control logic circuit for supporting CAD (compare and delay) instructions. This supports hardware transaction memory in detecting transaction conflicts. This thread-level speculation mechanism includes a priority up bit for recording an attribute operand in a CAD instruction, a means for generating a priority up event when a thread wake-up event has occurred and the priority up bit is 1, and a means for preventing the CAM from storing the load/store address when the instruction is a non-transaction instruction.
- The transaction of the application program raises the priority in response to a priority up event.
- The present invention provides a thread priority controlling mechanism which uses the completion event of the preceding transaction to raise the priority of the next transaction in the order of execution when the transaction status has been changed from speculative to non-speculative. This reduces the amount of wasted resources due to speculative failure.
-
FIG. 1 is a diagram showing the hardware configuration for a computer system embodying the present invention. -
FIG. 2 is a diagram showing an example of a transaction conflict occurring during thread-level speculation parallel execution processing in the prior art. -
FIG. 3 is a diagram showing an example of a hardware logic configuration embodying the present invention. -
FIG. 4 is a diagram used to explain the operations of the present invention. - The following problems related to hardware transaction memory occur in the prior art.
- A transaction has to be completed in order in order to maintain the semantics of the original program.
- Threads in a transaction cannot communicate with other threads without stopping the transaction.
- When a block is the earliest of all the blocks in a given sequence of instructions, the block is non-speculative.
- While CPU resources are divided evenly between speculatively executed blocks and non-speculatively executed blocks when a simultaneous multi-threading (SMT) processor executes multiple blocks in parallel, the allocation of resources should favor the non-speculatively executed blocks.
- In order to enable thread-level speculative execution in hardware transaction memory so as not to cause conflict between transactions, the runtime for non-transactional instructions should allow the transactions to be completed in order.
- The present invention provides a thread priority controlling mechanism which uses the completion event of the preceding transaction to raise the priority of the next transaction in the order of execution when the transaction status has been changed from speculative to non-speculative. This reduces the amount of wasted resources due to speculative failure.
- The following is an explanation of embodiments of the present invention with reference to the drawings. The embodiments are used to explain preferred embodiments of the present invention, and are not intended to limit the scope of the invention in any way. In the drawings, identical objects are denoted by the same numbers unless otherwise indicated.
-
FIG. 1 is a block diagram of computer hardware used to realize the system configuration and processing in an embodiment of the present invention. The computer hardware is preferably configured according to the IBM™ System z™ architecture. - In
FIG. 1 , aCPU 104, main memory (RAM) 106, a hard disk drive (HDD) 108, akeyboard 110, amouse 112, and adisplay 114 are connected to asystem bus 102. TheCPU 104 is preferably based on the architecture of the z/O™ processor chip with modifications related to the present invention. In other words, theCPU 104 is a multi-core processor including a cache memory, fetch unit, decode unit, register file, arithmetic unit, load/store unit, transaction memory control circuit, and other logic circuits and control circuits. In the present invention, a z/O™ transaction memory control circuit is added. More specifically, a modification has been made to the CAD (compare and delay) instruction executing unit. The hardware modifications will be explained in greater detail below with reference toFIG. 3 . - Preferably, the
main memory 106 has a capacity of at least 16 GB. The capacity of thehard disk drive 108 can be, for example, 1 TB. - While not shown in the drawing, the operating system is stored in the
hard disk drive 108. The operating system can be any operating system compatible with the computer hardware, such as z/OS, z/VM or zNSE. - The
keyboard 110 andmouse 112 are used to load programs in themain memory 106 from thehard disk drive 108, operate the program displayed on the display 114 (not shown) and enter text according to the functions of the operating system. - The
display 114 is preferably a liquid crystal display. Any resolution can be used, including XGA (resolution: 1024×768) or UXGA (resolution: 1600×1200). While not shown in the drawings, thedisplay 114 is used to display numerical values such as accounting data calculated using a COBOL program. - Before explaining the configuration of the present invention in greater detail, the problem of the prior art will be explained with reference to
FIG. 2 . This problem relates to the difficulty of controlling the completion order in thread-level speculation parallel execution using hardware transaction memory (HTM).FIG. 2 thread-level speculation parallel execution is performed on sequential code consisting of multiple instruction blocks (t1 and t2). Here, pointer aliasing may occur, that is, the same pointer may be accessed from multiple locations. In order to obtain correct results, t2 has to be executed after t1. - At this time, as shown in the drawing, the following three scenarios are possible.
- (A) Pointer aliasing does not occur. In other words, p≠q.
- In this scenario, t1 and t2 may be performed in parallel as a transaction without a conflict occurring.
- (B) Pointer aliasing occurs (p=q), and t1 is completed before t2.
- In this scenario, the HTM detects a read-after-write, and allows t2 to be executed again.
- (C) Pointer aliasing occurs (p=q), and t2 is completed before t1.
- In this scenario, the HTM detects a write-after-read, and t1 is aborted. However, t1 cannot be executed again in order to obtain the correct result.
- The following is an explanation of a memory synchronization (CAD) instruction. This instruction takes as operands register R1, base address register B2, and memory displacement value D2. The total value of value D2 and value B2 is the memory address, and the execution of subsequent instructions is delayed until the value of the memory at the memory address equals the value in the register R1.
- The following is an explanation, with reference to
FIG. 3 , of the hardware modifications to the execution unit for memory synchronization (CAD) instructions in the embodiment of the present invention. - The original configuration to be modified has an
address register 302, acomparator 304, and content-addressable memory (CAM) 306. The content-addressable memory 306 keeps track of the transaction footprints (the cache line accessed in the transaction) to detect any transaction conflicts. - The content-
addressable memory 306 receives external store addresses and external load addresses from other threads and cores. An event notification is issued when a data access conflict occurs between transactions (read-after-write, write-after-write, and read-after-read conflicts). - The address register 302 stores a memory address from the CAD. The
comparator 304 compares an external store address from another thread or core with the content of theaddress register 302, and issues a thread wake-up event notification when there is a match. - Only the prior art configuration has been described so far. The following logic circuits are added to this configuration in the example of the present invention.
- A priority up bit (pu) 308 from an additional operand to track an attribute of the memory synchronization instruction.
- A control logic circuit (and) 310 for generating a priority up event when a thread wake-up event has occurred, and the pu bit is 1.
- A control logic circuit (sel) 312 to prevent the content-
addressable memory 306 from storing a load/store address when the non-transaction flag is 1, that is, when the instruction is a non-transaction instruction. - When the memory synchronization instruction is a non-transaction instruction, the flag is set at 1. The added configurational elements are depicted using thick lines in
FIG. 3 . - In this configuration, CAD is used as the non-transaction instruction for synchronization, and the runtime maintains the completion of transactions in order. The following example is written using a System/z assembler. In this code, CAD is understood to be CAD with functions modified in accordance with the present invention.
-
typedef struct_tran_t{ int cont; int pad1[sizeof(cacheline)/sizeof(int)-1];} tran_t; tran_t TRAN[NUM_THREADS]; // one cache line per thread TRAN_N: TBEGIN //TBEGIN lowers priority of this thread JNZ TABORT_N LAR2,TRAN[N] //R2=&(TRAN[N].cont); load the address of a synch. LHI R1,1 CAD R1,PU,(R2) //TRAN[N].cont!=1 Therefore, thread does not stop. //TRAN[N].cont is initialized by 0 (N>0). //This means a PU (priority up) attribute added to each cache line. <mainbody> // Body of Nth speculative thread. LHI R1,0 CAD R1,PU,(R2) //if TRAN[N].cont==0, stop; (non-transaction cap) NTLG R4,0(R2) //R4=TRAN[N].cont; (non-transaction load) LHI R1,1 CR R4,R1 JNZABORT_N //if TRAN[N].cont!=1, abort; (e.g., timeout) TEND LHI R1,1 // TRAN_(N+1) is assumed to continue. ST R1,256+(R2) //TRAN[N+1].cont=1; increase the priority of the next ABORT_N: TABORT_N: ... // Delay the next try, optional logic J TRAN_N // Retry the transaction. - In this code, the initial CAD indicates the synchronization word, and a priority up attribute is simply added to the cache line indicated by the memory operand (R2).
- The thread executes the CAD instruction again after the thread has completed the execution of the main body of the transaction. The thread continues the execution process if another thread has already completed the execution of the preceding transaction. Otherwise, the thread stops the execution process until the other thread has completed the execution of the preceding transaction. Next, the thread loads the value of the synchronization word by using a non-transaction load (NTLG) in order to verify the completion of the preceding transaction.
- Here, the non-transaction instructions (CAD and NTLG) are essentially used in a single transaction. An ordinary (transaction) instruction causes a read-after-write conflict, and the transaction is aborted.
-
FIG. 4 is a schematic diagram of the processing. In this diagram, sequential code (A) is executed as speculative parallel execution code (B). - In speculative parallel execution code (B), the wait operation waits for a notification from the preceding thread using a non-transaction instruction in order to keep the notification process from causing a transaction conflict.
- ‘Priority up’ indicates raising the priority of a thread in response to a notification from the preceding thread.
- The ‘tbegin’ instruction and ‘tend’ instruction refer, respectively, to transaction begin and transaction end.
- The present invention was explained above with reference to a particular example in which CAD instructions in System/z were used. However, the present invention is not limited to this particular CPU architecture, but can be applied to any instructions used in inter-thread synchronization or memory synchronization. In other words, the instruction for which the present invention modifies the hardware is not limited to CAD, but also includes other memory synchronizing instructions such as BUSY WAIT
Claims (20)
1. A computer-implemented method for speculative parallel thread execution, the method comprising:
executing a first thread in parallel with a second thread;
waiting for a notification from the first thread using a non-transaction instruction to prevent a transaction conflict; and
raising a priority of the second thread in response to receiving the notification from the first thread.
2. The computer-implemented method of claim 1 , wherein the non-transaction memory access instruction includes an inter-thread synchronization instruction and a non-transaction load instruction.
3. The computer-implemented method of claim 2 , wherein the inter-thread synchronization instruction is a memory synchronization instruction.
4. The computer-implemented method of claim 2 , wherein the non-transaction load instruction is an NTLG instruction.
5. The computer-implemented method of claim 1 , wherein raising a priority of the second thread further comprises setting a priority up bit to track an attribute of a memory synchronization instruction.
6. The computer-implemented method of claim 5 , wherein raising a priority of the second thread further comprises generating a priority up event when a thread wake-up event has occurred and the priority up bit is set.
7. The computer-implemented method of claim 6 , wherein raising a priority of the second thread further comprises preventing a content-addressable memory from storing a load/store address.
8. A system for speculative parallel thread execution, the system comprising:
a memory comprising computer readable instructions; and
a processing device for executing the computer readable instructions for performing a method, the method comprising:
executing a first thread in parallel with a second thread;
waiting for a notification from the first thread using a non-transaction instruction to prevent a transaction conflict; and
raising a priority of the second thread in response to receiving the notification from the first thread.
9. The system of claim 8 , wherein the non-transaction memory access instruction includes an inter-thread synchronization instruction and a non-transaction load instruction.
10. The system of claim 9 , wherein the inter-thread synchronization instruction is a memory synchronization instruction.
11. The system of claim 9 , wherein the non-transaction load instruction is an NTLG instruction.
12. The system of claim 8 , wherein raising a priority of the second thread further comprises setting a priority up bit to track an attribute of a memory synchronization instruction.
13. The system of claim 12 , wherein raising a priority of the second thread further comprises generating a priority up event when a thread wake-up event has occurred and the priority up bit is set.
14. The system of claim 13 , wherein raising a priority of the second thread further comprises preventing a content-addressable memory from storing a load/store address.
15. A computer program product for speculative parallel thread execution, the computer program product comprising:
a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, the program instructions executable by a processing device to cause the processing device to perform a method comprising:
executing a first thread in parallel with a second thread;
waiting for a notification from the first thread using a non-transaction instruction to prevent a transaction conflict; and
raising a priority of the second thread in response to receiving the notification from the first thread.
16. The computer program product of claim 15 , wherein the non-transaction memory access instruction includes an inter-thread synchronization instruction and a non-transaction load instruction.
17. The computer program product of claim 16 , wherein the inter-thread synchronization instruction is a memory synchronization instruction.
18. The computer program product of claim 15 , wherein raising a priority of the second thread further comprises setting a priority up bit to track an attribute of a memory synchronization instruction.
19. The computer program product of claim 18 , wherein raising a priority of the second thread further comprises generating a priority up event when a thread wake-up event has occurred and the priority up bit is set.
20. The computer program product of claim 19 , wherein raising a priority of the second thread further comprises preventing a content-addressable memory from storing a load/store address.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US15/287,272 US20170024249A1 (en) | 2012-10-24 | 2016-10-06 | Parallel Execution Mechanism and Operating Method Thereof |
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2012-234293 | 2012-10-24 | ||
| JP2012234293A JP2014085839A (en) | 2012-10-24 | 2012-10-24 | Concurrent execution mechanism and operation method thereof |
| US14/061,775 US9495225B2 (en) | 2012-10-24 | 2013-10-24 | Parallel execution mechanism and operating method thereof |
| US15/287,272 US20170024249A1 (en) | 2012-10-24 | 2016-10-06 | Parallel Execution Mechanism and Operating Method Thereof |
Related Parent Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US14/061,775 Continuation US9495225B2 (en) | 2012-10-24 | 2013-10-24 | Parallel execution mechanism and operating method thereof |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| US20170024249A1 true US20170024249A1 (en) | 2017-01-26 |
Family
ID=50486418
Family Applications (2)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US14/061,775 Expired - Fee Related US9495225B2 (en) | 2012-10-24 | 2013-10-24 | Parallel execution mechanism and operating method thereof |
| US15/287,272 Abandoned US20170024249A1 (en) | 2012-10-24 | 2016-10-06 | Parallel Execution Mechanism and Operating Method Thereof |
Family Applications Before (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US14/061,775 Expired - Fee Related US9495225B2 (en) | 2012-10-24 | 2013-10-24 | Parallel execution mechanism and operating method thereof |
Country Status (2)
| Country | Link |
|---|---|
| US (2) | US9495225B2 (en) |
| JP (1) | JP2014085839A (en) |
Families Citing this family (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20140351826A1 (en) * | 2013-05-21 | 2014-11-27 | Nvidia Corporation | Application programming interface to enable the construction of pipeline parallel programs |
| US9323568B2 (en) * | 2014-01-24 | 2016-04-26 | International Business Machines Corporation | Indicating a low priority transaction |
| US10346196B2 (en) * | 2015-08-11 | 2019-07-09 | Oracle International Corporation | Techniques for enhancing progress for hardware transactional memory |
| JP6645348B2 (en) | 2016-05-06 | 2020-02-14 | 富士通株式会社 | Information processing apparatus, information processing program, and information processing method |
| CN108228905B (en) * | 2018-02-08 | 2020-09-25 | 中国人民解放军战略支援部队信息工程大学 | Parallel comparison model and method for massive normalized data |
| EP3594804B1 (en) * | 2018-07-09 | 2021-11-03 | ARM Limited | Transactional compare-and-discard instruction |
| CN109508337A (en) * | 2018-11-12 | 2019-03-22 | 杭州秘猿科技有限公司 | A kind of transaction is parallel to execute method, apparatus, electronic equipment and system |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20070143287A1 (en) * | 2005-12-15 | 2007-06-21 | Ali-Reza Adl-Tabatabai | Coordinating access to memory locations for hardware transactional memory transactions and software transactional memory transactions |
| US20090217253A1 (en) * | 2008-02-22 | 2009-08-27 | Sun Microsystems, Inc. | Compiler framework for speculative automatic parallelization with transactional memory |
Family Cites Families (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8683143B2 (en) | 2005-12-30 | 2014-03-25 | Intel Corporation | Unbounded transactional memory systems |
| US9043553B2 (en) | 2007-06-27 | 2015-05-26 | Microsoft Technology Licensing, Llc | Leveraging transactional memory hardware to accelerate virtualization and emulation |
| CN101587447B (en) * | 2008-05-23 | 2013-03-27 | 国际商业机器公司 | System supporting transaction storage and prediction-based transaction execution method |
| WO2010001736A1 (en) | 2008-07-04 | 2010-01-07 | 日本電気株式会社 | Multiprocessor system, multithread processing method, and program |
| US8438568B2 (en) * | 2010-02-24 | 2013-05-07 | International Business Machines Corporation | Speculative thread execution with hardware transactional memory |
| US9424015B2 (en) * | 2011-02-08 | 2016-08-23 | Oracle International Corporation | System and method for optimizing software transactional memory operations using static caching of memory objects |
| US9772958B2 (en) * | 2011-10-31 | 2017-09-26 | Hewlett Packard Enterprise Development Lp | Methods and apparatus to control generation of memory access requests |
-
2012
- 2012-10-24 JP JP2012234293A patent/JP2014085839A/en active Pending
-
2013
- 2013-10-24 US US14/061,775 patent/US9495225B2/en not_active Expired - Fee Related
-
2016
- 2016-10-06 US US15/287,272 patent/US20170024249A1/en not_active Abandoned
Patent Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20070143287A1 (en) * | 2005-12-15 | 2007-06-21 | Ali-Reza Adl-Tabatabai | Coordinating access to memory locations for hardware transactional memory transactions and software transactional memory transactions |
| US20090217253A1 (en) * | 2008-02-22 | 2009-08-27 | Sun Microsystems, Inc. | Compiler framework for speculative automatic parallelization with transactional memory |
Also Published As
| Publication number | Publication date |
|---|---|
| US9495225B2 (en) | 2016-11-15 |
| JP2014085839A (en) | 2014-05-12 |
| US20140115249A1 (en) | 2014-04-24 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US20170024249A1 (en) | Parallel Execution Mechanism and Operating Method Thereof | |
| US8301849B2 (en) | Transactional memory in out-of-order processors with XABORT having immediate argument | |
| US8544022B2 (en) | Transactional memory preemption mechanism | |
| US6182210B1 (en) | Processor having multiple program counters and trace buffers outside an execution pipeline | |
| US8332619B2 (en) | Primitives to enhance thread-level speculation | |
| US7899997B2 (en) | Systems and methods for implementing key-based transactional memory conflict detection | |
| US9626187B2 (en) | Transactional memory system supporting unbroken suspended execution | |
| CN100392622C (en) | Processor with memory sequential buffer | |
| JP5118652B2 (en) | Transactional memory in out-of-order processors | |
| EP1989619B1 (en) | Hardware acceleration for a software transactional memory system | |
| US6772324B2 (en) | Processor having multiple program counters and trace buffers outside an execution pipeline | |
| JP5416223B2 (en) | Memory model of hardware attributes in a transactional memory system | |
| US8468526B2 (en) | Concurrent thread execution using user-level asynchronous signaling | |
| US7860847B2 (en) | Exception ordering in contention management to support speculative sequential semantics | |
| JP2002508567A (en) | Out-of-pipeline trace buffer for instruction re-execution after misleading | |
| JP7007371B2 (en) | Handling of inter-element address hazards for vector instructions | |
| JP2015501019A (en) | Maintenance of operand activity information in computer systems | |
| CN111767159A (en) | A coroutine-based asynchronous system call system | |
| EP4363991A1 (en) | Providing atomicity for complex operations using near-memory computing | |
| US20050283783A1 (en) | Method for optimizing pipeline use in a multiprocessing system | |
| JP3146058B2 (en) | Parallel processing type processor system and control method of parallel processing type processor system | |
| US10209997B2 (en) | Computer architecture for speculative parallel execution | |
| US20190065160A1 (en) | Pre-post retire hybrid hardware lock elision (hle) scheme |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| AS | Assignment |
Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:JACOBI, CHRISTIAN;MITRAN, MARCEL;OHARA, MORIYOSHI;SIGNING DATES FROM 20130918 TO 20131023;REEL/FRAME:039963/0077 |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: ADVISORY ACTION MAILED |
|
| STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |