US20170024249A1 - Parallel Execution Mechanism and Operating Method Thereof - Google Patents

Info

Publication number
US20170024249A1
US20170024249A1 (Application US15/287,272)
Authority
US
United States
Prior art keywords
thread
priority
transaction
instruction
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US15/287,272
Inventor
Christian Jacobi
Marcel Mitran
Moriyoshi Ohara
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Priority to US15/287,272
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: JACOBI, CHRISTIAN; OHARA, MORIYOSHI; MITRAN, MARCEL
Publication of US20170024249A1
Status: Abandoned

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614Improving the reliability of storage systems
    • G06F3/0619Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0659Command handling arrangements, e.g. command buffers, queues, command scheduling
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0673Single storage device
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/3004Arrangements for executing specific machine instructions to perform operations on memory
    • G06F9/30043LOAD or STORE instructions; Clear instruction
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30076Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
    • G06F9/30087Synchronisation or serialisation instructions
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30076Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
    • G06F9/3009Thread control instructions
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/52Program synchronisation; Mutual exclusion, e.g. by means of semaphores
    • G06F9/526Mutual exclusion algorithms
    • G06F9/528Mutual exclusion algorithms by using speculative mechanisms

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Security & Cryptography (AREA)
  • Advance Control (AREA)
  • Executing Machine-Instructions (AREA)

Abstract

A thread priority control mechanism is provided which uses the completion event of the preceding transaction to raise the priority of the next transaction in the order of execution when the transaction status has been changed from speculative to non-speculative. In one aspect of the present invention, a thread-level speculation mechanism is provided which has content-addressable memory, an address register and a comparator for recording transaction footprints, and a control logic circuit for supporting memory synchronization instructions. This supports hardware transaction memory in detecting transaction conflicts. This thread-level speculation mechanism includes a priority up bit for recording an attribute operand in a memory synchronization instruction, a means for generating a priority up event when a thread wake-up event has occurred and the priority up bit is 1, and a means for preventing the CAM from storing the load/store address when the instruction is a non-transaction instruction.

Description

  • This application is a continuation of U.S. patent application Ser. No. 14/061,775 filed Oct. 24, 2013, which claims priority under 35 U.S.C. §119 from Japanese Patent Application No. 2012-234293 filed Oct. 24, 2012, the entire contents of which are incorporated herein by reference.
  • BACKGROUND
  • The present invention relates to a hardware mechanism for performing thread-level speculative parallel execution.
  • In real-time transaction applications, response time is one of the most important indicators to customers. Response times, however, depend largely on the single-thread performance of the processor, and in recent years the growth rate of single-thread performance has been slowing down.
  • Thread-level speculative parallel execution is one well-known response to this problem. It speeds up the execution of single-thread programs by allowing a compiler or programmer to speculatively parallelize a single-thread program, but it typically requires a complicated hardware mechanism.
  • Hardware transaction memory is one technique used to speed up execution. In hardware transaction memory, a transaction is a sequence of instructions delimited by special instructions such as transaction begin and transaction end. When a data access conflict occurs between two transactions being executed in parallel, such as a read-after-write, write-after-write, or write-after-read conflict, the hardware cancels the execution of the transaction.
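  • As a rough illustration of this execution model (not taken from the patent text), the following C sketch shows how a transactional region is typically wrapped in a retry loop. The helpers tx_begin() and tx_end() are hypothetical stand-ins for the processor's transaction-begin and transaction-end instructions; the trivial stub definitions exist only so the sketch compiles.
    /* Hypothetical stand-ins for the transaction-begin and transaction-end
     * instructions; a real implementation would issue the corresponding
     * hardware instructions. */
    static int  tx_begin(void) { return 0; }  /* 0 = transaction started */
    static void tx_end(void)   { }            /* commit the transaction  */

    /* Execute 'body' as a transaction, retrying whenever the hardware
     * cancels the transaction because of a data access conflict. */
    static void run_transaction(void (*body)(void *), void *arg)
    {
        for (;;) {
            if (tx_begin() == 0) {   /* transaction started */
                body(arg);           /* speculative work */
                tx_end();            /* commit if no conflict was detected */
                return;
            }
            /* a conflict aborted the transaction; fall through and retry */
        }
    }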
  • However, thread-level speculative parallel execution cannot be performed by hardware transaction memory alone: it requires that transactions complete in order, and the runtime code that controls the completion order itself causes transaction conflicts.
  • The following prior art technologies are known to be related to this problem.
  • Laid-open Patent Publication No. 2009-521767 describes software transaction memory (STM) access processing which is executed when the preceding hardware transaction memory (HTM) access processing fails.
  • Laid-open Patent Publication No. 2010-532053 describes the use of transaction memory hardware to facilitate the updating of a dispatch table in a multi-thread environment utilizing an atomic commit function. Here, an emulator uses a dispatch table stored in the main memory to convert a guest program counter to a host program counter.
  • PCT Publication No. WO2010/001736 describes a multi-processor system including a plurality of processors for executing multi-threads in the processing of data, and a data processing control unit for determining satisfactory conditions allowing the processors to execute the threads in order, and for starting the execution of each thread so as to satisfy these conditions.
  • U.S. Pat. No. 8,151,252 describes the speculative parallelization of a program using transactional memory by scoping program variables during compilation, and by inserting code into the program during compilation. In this technique, the scoping is determined based on whether a scalar variable being scoped is involved in inter-loop non-reduction data dependencies, whether the scalar variable is used outside the loop defining it, and at what point in a loop the scalar variable is defined.
  • An architecture based on thread-level speculation is presented in Jeffrey Thomas Oplinger, “Enhancing Software Reliability with Speculative Threads”, Ph.D. dissertation, Stanford University, August 2004. A programmer can use this architecture to add monitoring code for checking the execution of a program, and the architecture mitigates the slowdown incurred when the monitoring code is executed speculatively in parallel with the main computations. In order to recover from an error, the programmer can define transactions with fine granularity. Side effects of these transactions are committed or aborted via program control, and the transactions are implemented efficiently with thread-level hardware support.
  • A hybrid conflict management mechanism is presented in Ruben Titos, Manuel E. Acacio, and Jose M. Garcia, “Speculation-Based Conflict Resolution in Hardware Transactional Memory”, IEEE International Symposium on Parallel & Distributed Processing (IPDPS 2009), 23-29 May 2009. In hardware transaction memory, this hybrid mechanism uses an eager conflict management policy as its base, but combines the advantages of the eager policy with those of a lazy policy to allow many conflict-prone transactions to coexist.
  • SUMMARY
  • Processes for detecting and addressing conflicts in hardware transaction memory have been disclosed in the prior art literature, but the literature does not suggest a mechanism for enabling thread-level speculative parallel execution when transactions are required to complete in order.
  • Therefore, it is an object of the present invention to provide a mechanism enabling thread-level speculative parallel execution to be performed in hardware transaction memory.
  • In the present invention, speculative parallel execution is performed on a sequential program composed of a plurality of blocks which may have a dependency with each other by using hardware transaction memory to detect conflicts and to recover from mistaken speculative execution.
  • In one aspect of the present invention, a thread-level speculation mechanism is provided which has content-addressable memory, an address register and a comparator for recording transaction footprints, and a control logic circuit for supporting CAD (compare and delay) instructions. This supports hardware transaction memory in detecting transaction conflicts. This thread-level speculation mechanism includes a priority up bit for recording an attribute operand in a CAD instruction, a means for generating a priority up event when a thread wake-up event has occurred and the priority up bit is 1, and a means for preventing the CAM from storing the load/store address when the instruction is a non-transaction instruction.
  • The transaction of the application program raises the priority in response to a priority up event.
  • The present invention provides a thread priority controlling mechanism which uses the completion event of the preceding transaction to raise the priority of the next transaction in the order of execution when the transaction status has been changed from speculative to non-speculative. This reduces the amount of wasted resources due to speculative failure.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a diagram showing the hardware configuration for a computer system embodying the present invention.
  • FIG. 2 is a diagram showing an example of a transaction conflict occurring during thread-level speculation parallel execution processing in the prior art.
  • FIG. 3 is a diagram showing an example of a hardware logic configuration embodying the present invention.
  • FIG. 4 is a diagram used to explain the operations of the present invention.
  • DETAILED DESCRIPTION
  • The following problems related to hardware transaction memory occur in the prior art.
  • Transactions have to be completed in order so as to maintain the semantics of the original program.
  • Threads in a transaction cannot communicate with other threads without stopping the transaction.
  • When a block is the earliest of all the blocks in a given sequence of instructions, the block is non-speculative.
  • While CPU resources are divided evenly between speculatively executed blocks and non-speculatively executed blocks when a simultaneous multi-threading (SMT) processor executes multiple blocks in parallel, the allocation of resources should favor the non-speculatively executed blocks.
  • In order to enable thread-level speculative execution in hardware transaction memory without causing conflicts between transactions, the runtime should use non-transactional instructions to ensure that the transactions are completed in order.
  • The present invention provides a thread priority controlling mechanism which uses the completion event of the preceding transaction to raise the priority of the next transaction in the order of execution when the transaction status has been changed from speculative to non-speculative. This reduces the amount of wasted resources due to speculative failure.
  • The following is an explanation of embodiments of the present invention with reference to the drawings. The embodiments are used to explain preferred embodiments of the present invention, and are not intended to limit the scope of the invention in any way. In the drawings, identical objects are denoted by the same numbers unless otherwise indicated.
  • FIG. 1 is a block diagram of computer hardware used to realize the system configuration and processing in an embodiment of the present invention. The computer hardware is preferably configured according to the IBM™ System z™ architecture.
  • In FIG. 1, a CPU 104, main memory (RAM) 106, a hard disk drive (HDD) 108, a keyboard 110, a mouse 112, and a display 114 are connected to a system bus 102. The CPU 104 is preferably based on the architecture of the z/O™ processor chip with modifications related to the present invention. In other words, the CPU 104 is a multi-core processor including a cache memory, fetch unit, decode unit, register file, arithmetic unit, load/store unit, transaction memory control circuit, and other logic circuits and control circuits. In the present invention, a z/O™ transaction memory control circuit is added. More specifically, a modification has been made to the CAD (compare and delay) instruction executing unit. The hardware modifications will be explained in greater detail below with reference to FIG. 3.
  • Preferably, the main memory 106 has a capacity of at least 16 GB. The capacity of the hard disk drive 108 can be, for example, 1 TB.
  • While not shown in the drawing, the operating system is stored in the hard disk drive 108. The operating system can be any operating system compatible with the computer hardware, such as z/OS, z/VM, or z/VSE.
  • The keyboard 110 and mouse 112 are used to load programs into the main memory 106 from the hard disk drive 108, to operate the programs (not shown) displayed on the display 114, and to enter text, according to the functions of the operating system.
  • The display 114 is preferably a liquid crystal display. Any resolution can be used, including XGA (resolution: 1024×768) or UXGA (resolution: 1600×1200). While not shown in the drawings, the display 114 is used to display numerical values such as accounting data calculated using a COBOL program.
  • Before explaining the configuration of the present invention in greater detail, the problem of the prior art will be explained with reference to FIG. 2. This problem relates to the difficulty of controlling the completion order in thread-level speculation parallel execution using hardware transaction memory (HTM). In FIG. 2, thread-level speculation parallel execution is performed on sequential code consisting of multiple instruction blocks (t1 and t2). Here, pointer aliasing may occur, that is, the same memory location may be accessed through different pointers. In order to obtain correct results, t2 has to be executed after t1.
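  • As a concrete, purely illustrative C rendering of the two blocks in FIG. 2, t1 writes through a pointer p and t2 reads through a pointer q; whether the blocks conflict depends entirely on whether p and q alias. The function names and the placeholder computation are assumptions of this sketch, not taken from the patent.
    /* Placeholder work performed by the first block. */
    static int compute_value(void) { return 42; }

    /* Two instruction blocks from the original sequential program.
     * If p == q (pointer aliasing), t2 reads the value stored by t1 and the
     * blocks must complete in program order; if p != q, the blocks are
     * independent and can safely run in parallel as transactions. */
    static void t1(int *p)       { *p = compute_value(); }
    static int  t2(const int *q) { return *q + 1; }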
  • At this time, as shown in the drawing, the following three scenarios are possible.
  • (A) Pointer aliasing does not occur. In other words, p≠q.
  • In this scenario, t1 and t2 may be performed in parallel as a transaction without a conflict occurring.
  • (B) Pointer aliasing occurs (p=q), and t1 is completed before t2.
  • In this scenario, the HTM detects a read-after-write, and allows t2 to be executed again.
  • (C) Pointer aliasing occurs (p=q), and t2 is completed before t1.
  • In this scenario, the HTM detects a write-after-read, and t1 is aborted. However, simply executing t1 again does not yield the correct result, because t2 has already completed using the value from before t1's write.
  • The following is an explanation of a memory synchronization (CAD) instruction. This instruction takes as operands a register R1, a base address register B2, and a memory displacement value D2. The sum of the displacement value D2 and the contents of the base register B2 forms the memory address, and the execution of subsequent instructions is delayed until the value of the memory at that address equals the value in register R1.
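  • Purely as a software model of the behavior described above (the real CAD is a hardware instruction; the function name cad_model and the 32-bit width of the synchronization word are assumptions of this sketch), the address formation and the delay can be pictured as follows:
    #include <stdint.h>

    /* Model of the compare-and-delay behavior as described in the text:
     * the memory address is the sum of the displacement D2 and the contents
     * of base register B2, and execution of subsequent instructions is
     * delayed until the word at that address equals the value in R1. */
    static void cad_model(uintptr_t b2_base, uintptr_t d2_disp, uint32_t r1_value)
    {
        volatile uint32_t *addr = (volatile uint32_t *)(b2_base + d2_disp);
        while (*addr != r1_value) {
            /* the hardware may park the thread or lower its priority here
             * instead of busy-waiting */
        }
    }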
  • The following is an explanation, with reference to FIG. 3, of the hardware modifications to the execution unit for memory synchronization (CAD) instructions in the embodiment of the present invention.
  • The original configuration to be modified has an address register 302, a comparator 304, and content-addressable memory (CAM) 306. The content-addressable memory 306 keeps track of the transaction footprints (the cache line accessed in the transaction) to detect any transaction conflicts.
  • The content-addressable memory 306 receives external store addresses and external load addresses from other threads and cores. An event notification is issued when a data access conflict occurs between transactions (a read-after-write, write-after-write, or write-after-read conflict).
  • The address register 302 stores a memory address from the CAD. The comparator 304 compares an external store address from another thread or core with the content of the address register 302, and issues a thread wake-up event notification when there is a match.
  • Only the prior art configuration has been described so far. The following logic circuits are added to this configuration in the example of the present invention.
  • A priority up bit (pu) 308 from an additional operand to track an attribute of the memory synchronization instruction.
  • A control logic circuit (and) 310 for generating a priority up event when a thread wake-up event has occurred, and the pu bit is 1.
  • A control logic circuit (sel) 312 to prevent the content-addressable memory 306 from storing a load/store address when the non-transaction flag is 1, that is, when the instruction is a non-transaction instruction.
  • When the memory synchronization instruction is a non-transaction instruction, the flag is set to 1. The added configurational elements are depicted using thick lines in FIG. 3.
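  • The added logic can be summarized, purely as an illustrative software model of FIG. 3 (the structure and signal names below are invented for this sketch), as follows:
    #include <stdbool.h>

    /* Illustrative model of the logic added in FIG. 3. */
    struct cad_unit_state {
        bool priority_up_bit;   /* pu bit 308: attribute operand of the CAD */
        bool non_transaction;   /* 1 when the instruction is a non-transaction instruction */
    };

    /* Control logic 310: generate a priority up event only when a thread
     * wake-up event occurs while the pu bit is set. */
    static bool priority_up_event(const struct cad_unit_state *s, bool thread_wakeup_event)
    {
        return thread_wakeup_event && s->priority_up_bit;
    }

    /* Control logic 312: the content-addressable memory 306 records a
     * load/store address as a transaction footprint only for transactional
     * instructions. */
    static bool cam_should_record(const struct cad_unit_state *s)
    {
        return !s->non_transaction;
    }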
  • In this configuration, CAD is used as the non-transaction instruction for synchronization, and the runtime ensures that the transactions are completed in order. The following example is written in System/z assembler. In this code, CAD is understood to be the CAD instruction with its functions modified in accordance with the present invention.
  • typedef struct _tran_t {
        int cont;
        int pad1[sizeof(cacheline)/sizeof(int) - 1];   // pad to one cache line
    } tran_t;
    tran_t TRAN[NUM_THREADS];      // one cache line per thread

    TRAN_N:   TBEGIN               // TBEGIN lowers the priority of this thread
              JNZ   TABORT_N
              LA    R2,TRAN[N]     // R2 = &(TRAN[N].cont); load the address of the synch. word
              LHI   R1,1
              CAD   R1,PU,(R2)     // TRAN[N].cont != 1, so the thread does not stop.
                                   // TRAN[N].cont is initialized to 0 (N > 0).
                                   // The PU (priority up) attribute is added to this cache line.
              <mainbody>           // Body of the Nth speculative thread.
              LHI   R1,0
              CAD   R1,PU,(R2)     // if TRAN[N].cont == 0, stop (non-transaction CAD)
              NTLG  R4,0(R2)       // R4 = TRAN[N].cont (non-transaction load)
              LHI   R1,1
              CR    R4,R1
              JNZ   ABORT_N        // if TRAN[N].cont != 1, abort (e.g., timeout)
              TEND
              LHI   R1,1           // TRAN_(N+1) is assumed to continue.
              ST    R1,256(R2)     // TRAN[N+1].cont = 1; increase the priority of the next thread
    ABORT_N:
    TABORT_N: ...                  // Delay the next try, optional logic
              J     TRAN_N         // Retry the transaction.
  • In this code, the initial CAD indicates the synchronization word, and a priority up attribute is simply added to the cache line indicated by the memory operand (R2).
  • The thread executes the CAD instruction again after the thread has completed the execution of the main body of the transaction. The thread continues the execution process if another thread has already completed the execution of the preceding transaction. Otherwise, the thread stops the execution process until the other thread has completed the execution of the preceding transaction. Next, the thread loads the value of the synchronization word by using a non-transaction load (NTLG) in order to verify the completion of the preceding transaction.
  • Here, the non-transaction instructions (CAD and NTLG) are essential within the transaction: an ordinary (transactional) instruction would cause a read-after-write conflict, and the transaction would be aborted.
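  • The ordering protocol implemented by the assembler listing can also be pictured at the C level. The helpers tx_begin(), tx_end(), tx_abort(), cad_wait_pu(), and ntload() are hypothetical stand-ins for TBEGIN, TEND, TABORT, the modified CAD, and NTLG, and run_body() stands for the body of the Nth speculative block; this is a loose sketch of the flow, not the patented implementation itself.
    /* Hypothetical stand-ins for the hardware primitives used in the listing. */
    extern int  tx_begin(void);                       /* TBEGIN: returns 0 when started */
    extern void tx_end(void);                         /* TEND: commit */
    extern void tx_abort(void);                       /* TABORT: abort the transaction */
    extern void cad_wait_pu(volatile int *w, int v);  /* modified CAD with the PU attribute */
    extern int  ntload(volatile int *w);              /* NTLG: non-transaction load */
    extern void run_body(int n);                      /* body of the Nth speculative block */

    extern volatile int sync_word[];                  /* TRAN[n].cont, one cache line per thread */

    static void speculative_block(int n)
    {
        for (;;) {
            if (tx_begin() != 0)
                continue;                        /* transaction aborted: retry block n */
            cad_wait_pu(&sync_word[n], 1);       /* first CAD: attach the PU attribute;
                                                    the thread does not stop here */
            run_body(n);                         /* speculative body of block n */
            cad_wait_pu(&sync_word[n], 0);       /* second CAD: stop until the predecessor
                                                    updates the synchronization word */
            if (ntload(&sync_word[n]) != 1) {    /* NTLG: verify the predecessor completed */
                tx_abort();                      /* e.g. timeout */
                continue;
            }
            tx_end();                            /* commit: block n is now non-speculative */
            sync_word[n + 1] = 1;                /* ordinary store: wake block n+1 and raise
                                                    its priority via the priority up event */
            return;
        }
    }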
  • FIG. 4 is a schematic diagram of the processing. In this diagram, sequential code (A) is executed as speculative parallel execution code (B).
  • In speculative parallel execution code (B), the wait operation waits for a notification from the preceding thread using a non-transaction instruction in order to keep the notification process from causing a transaction conflict.
  • ‘Priority up’ indicates raising the priority of a thread in response to a notification from the preceding thread.
  • The ‘tbegin’ instruction and ‘tend’ instruction refer, respectively, to transaction begin and transaction end.
  • The present invention was explained above with reference to a particular example in which CAD instructions in System/z were used. However, the present invention is not limited to this particular CPU architecture, but can be applied to any instructions used in inter-thread synchronization or memory synchronization. In other words, the instruction for which the present invention modifies the hardware is not limited to CAD, but also includes other memory synchronizing instructions such as BUSY WAIT.

Claims (20)

What is claimed is:
1. A computer-implemented method for speculative parallel thread execution, the method comprising:
executing a first thread in parallel with a second thread;
waiting for a notification from the first thread using a non-transaction instruction to prevent a transaction conflict; and
raising a priority of the second thread in response to receiving the notification from the first thread.
2. The computer-implemented method of claim 1, wherein the non-transaction memory access instruction includes an inter-thread synchronization instruction and a non-transaction load instruction.
3. The computer-implemented method of claim 2, wherein the inter-thread synchronization instruction is a memory synchronization instruction.
4. The computer-implemented method of claim 2, wherein the non-transaction load instruction is an NTLG instruction.
5. The computer-implemented method of claim 1, wherein raising a priority of the second thread further comprises setting a priority up bit to track an attribute of a memory synchronization instruction.
6. The computer-implemented method of claim 5, wherein raising a priority of the second thread further comprises generating a priority up event when a thread wake-up event has occurred and the priority up bit is set.
7. The computer-implemented method of claim 6, wherein raising a priority of the second thread further comprises preventing a content-addressable memory from storing a load/store address.
8. A system for speculative parallel thread execution, the system comprising:
a memory comprising computer readable instructions; and
a processing device for executing the computer readable instructions for performing a method, the method comprising:
executing a first thread in parallel with a second thread;
waiting for a notification from the first thread using a non-transaction instruction to prevent a transaction conflict; and
raising a priority of the second thread in response to receiving the notification from the first thread.
9. The system of claim 8, wherein the non-transaction memory access instruction includes an inter-thread synchronization instruction and a non-transaction load instruction.
10. The system of claim 9, wherein the inter-thread synchronization instruction is a memory synchronization instruction.
11. The system of claim 9, wherein the non-transaction load instruction is an NTLG instruction.
12. The system of claim 8, wherein raising a priority of the second thread further comprises setting a priority up bit to track an attribute of a memory synchronization instruction.
13. The system of claim 12, wherein raising a priority of the second thread further comprises generating a priority up event when a thread wake-up event has occurred and the priority up bit is set.
14. The system of claim 13, wherein raising a priority of the second thread further comprises preventing a content-addressable memory from storing a load/store address.
15. A computer program product for speculative parallel thread execution, the computer program product comprising:
a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, the program instructions executable by a processing device to cause the processing device to perform a method comprising:
executing a first thread in parallel with a second thread;
waiting for a notification from the first thread using a non-transaction instruction to prevent a transaction conflict; and
raising a priority of the second thread in response to receiving the notification from the first thread.
16. The computer program product of claim 15, wherein the non-transaction memory access instruction includes an inter-thread synchronization instruction and a non-transaction load instruction.
17. The computer program product of claim 16, wherein the inter-thread synchronization instruction is a memory synchronization instruction.
18. The computer program product of claim 15, wherein raising a priority of the second thread further comprises setting a priority up bit to track an attribute of a memory synchronization instruction.
19. The computer program product of claim 18, wherein raising a priority of the second thread further comprises generating a priority up event when a thread wake-up event has occurred and the priority up bit is set.
20. The computer program product of claim 19, wherein raising a priority of the second thread further comprises preventing a content-addressable memory from storing a load/store address.
US15/287,272 2012-10-24 2016-10-06 Parallel Execution Mechanism and Operating Method Thereof Abandoned US20170024249A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/287,272 US20170024249A1 (en) 2012-10-24 2016-10-06 Parallel Execution Mechanism and Operating Method Thereof

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
JP2012-234293 2012-10-24
JP2012234293A JP2014085839A (en) 2012-10-24 2012-10-24 Concurrent execution mechanism and operation method thereof
US14/061,775 US9495225B2 (en) 2012-10-24 2013-10-24 Parallel execution mechanism and operating method thereof
US15/287,272 US20170024249A1 (en) 2012-10-24 2016-10-06 Parallel Execution Mechanism and Operating Method Thereof

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US14/061,775 Continuation US9495225B2 (en) 2012-10-24 2013-10-24 Parallel execution mechanism and operating method thereof

Publications (1)

Publication Number Publication Date
US20170024249A1 true US20170024249A1 (en) 2017-01-26

Family

ID=50486418

Family Applications (2)

Application Number Title Priority Date Filing Date
US14/061,775 Expired - Fee Related US9495225B2 (en) 2012-10-24 2013-10-24 Parallel execution mechanism and operating method thereof
US15/287,272 Abandoned US20170024249A1 (en) 2012-10-24 2016-10-06 Parallel Execution Mechanism and Operating Method Thereof

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US14/061,775 Expired - Fee Related US9495225B2 (en) 2012-10-24 2013-10-24 Parallel execution mechanism and operating method thereof

Country Status (2)

Country Link
US (2) US9495225B2 (en)
JP (1) JP2014085839A (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140351826A1 (en) * 2013-05-21 2014-11-27 Nvidia Corporation Application programming interface to enable the construction of pipeline parallel programs
US9323568B2 (en) * 2014-01-24 2016-04-26 International Business Machines Corporation Indicating a low priority transaction
US10346196B2 (en) * 2015-08-11 2019-07-09 Oracle International Corporation Techniques for enhancing progress for hardware transactional memory
JP6645348B2 (en) 2016-05-06 2020-02-14 富士通株式会社 Information processing apparatus, information processing program, and information processing method
CN108228905B (en) * 2018-02-08 2020-09-25 中国人民解放军战略支援部队信息工程大学 Parallel comparison model and method for massive normalized data
EP3594804B1 (en) * 2018-07-09 2021-11-03 ARM Limited Transactional compare-and-discard instruction
CN109508337A (en) * 2018-11-12 2019-03-22 杭州秘猿科技有限公司 A kind of transaction is parallel to execute method, apparatus, electronic equipment and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070143287A1 (en) * 2005-12-15 2007-06-21 Ali-Reza Adl-Tabatabai Coordinating access to memory locations for hardware transactional memory transactions and software transactional memory transactions
US20090217253A1 (en) * 2008-02-22 2009-08-27 Sun Microsystems, Inc. Compiler framework for speculative automatic parallelization with transactional memory

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8683143B2 (en) 2005-12-30 2014-03-25 Intel Corporation Unbounded transactional memory systems
US9043553B2 (en) 2007-06-27 2015-05-26 Microsoft Technology Licensing, Llc Leveraging transactional memory hardware to accelerate virtualization and emulation
CN101587447B (en) * 2008-05-23 2013-03-27 国际商业机器公司 System supporting transaction storage and prediction-based transaction execution method
WO2010001736A1 (en) 2008-07-04 2010-01-07 日本電気株式会社 Multiprocessor system, multithread processing method, and program
US8438568B2 (en) * 2010-02-24 2013-05-07 International Business Machines Corporation Speculative thread execution with hardware transactional memory
US9424015B2 (en) * 2011-02-08 2016-08-23 Oracle International Corporation System and method for optimizing software transactional memory operations using static caching of memory objects
US9772958B2 (en) * 2011-10-31 2017-09-26 Hewlett Packard Enterprise Development Lp Methods and apparatus to control generation of memory access requests

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070143287A1 (en) * 2005-12-15 2007-06-21 Ali-Reza Adl-Tabatabai Coordinating access to memory locations for hardware transactional memory transactions and software transactional memory transactions
US20090217253A1 (en) * 2008-02-22 2009-08-27 Sun Microsystems, Inc. Compiler framework for speculative automatic parallelization with transactional memory

Also Published As

Publication number Publication date
US9495225B2 (en) 2016-11-15
JP2014085839A (en) 2014-05-12
US20140115249A1 (en) 2014-04-24

Similar Documents

Publication Publication Date Title
US20170024249A1 (en) Parallel Execution Mechanism and Operating Method Thereof
US8301849B2 (en) Transactional memory in out-of-order processors with XABORT having immediate argument
US8544022B2 (en) Transactional memory preemption mechanism
US6182210B1 (en) Processor having multiple program counters and trace buffers outside an execution pipeline
US8332619B2 (en) Primitives to enhance thread-level speculation
US7899997B2 (en) Systems and methods for implementing key-based transactional memory conflict detection
US9626187B2 (en) Transactional memory system supporting unbroken suspended execution
CN100392622C (en) Processor with memory sequential buffer
JP5118652B2 (en) Transactional memory in out-of-order processors
EP1989619B1 (en) Hardware acceleration for a software transactional memory system
US6772324B2 (en) Processor having multiple program counters and trace buffers outside an execution pipeline
JP5416223B2 (en) Memory model of hardware attributes in a transactional memory system
US8468526B2 (en) Concurrent thread execution using user-level asynchronous signaling
US7860847B2 (en) Exception ordering in contention management to support speculative sequential semantics
JP2002508567A (en) Out-of-pipeline trace buffer for instruction re-execution after misleading
JP7007371B2 (en) Handling of inter-element address hazards for vector instructions
JP2015501019A (en) Maintenance of operand activity information in computer systems
CN111767159A (en) A coroutine-based asynchronous system call system
EP4363991A1 (en) Providing atomicity for complex operations using near-memory computing
US20050283783A1 (en) Method for optimizing pipeline use in a multiprocessing system
JP3146058B2 (en) Parallel processing type processor system and control method of parallel processing type processor system
US10209997B2 (en) Computer architecture for speculative parallel execution
US20190065160A1 (en) Pre-post retire hybrid hardware lock elision (hle) scheme

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW YORK

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:JACOBI, CHRISTIAN;MITRAN, MARCEL;OHARA, MORIYOSHI;SIGNING DATES FROM 20130918 TO 20131023;REEL/FRAME:039963/0077

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: ADVISORY ACTION MAILED

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION