HK1210845B

HK1210845B - Method and system for improving exception processing

Info

Publication number: HK1210845B
Application number: HK15111616.4A
Authority: HK
Inventors: J. D. Bradbury; E. M. Schwarz; T. Slegel; M. K. Gschwind
Original assignee: International Business Machines Corporation
Priority date: 2013-01-23
Filing date: 2013-12-06
Publication date: 2019-06-28

Description

Method and system for facilitating exception handling

Background

One or more aspects relate generally to processing within a computing environment, and more particularly to vector processing within such an environment.

Processing within a computing environment includes controlling the operation of one or more central processing units (CPUs). Generally, the operation of a central processing unit is controlled by instructions in memory. Instructions may have different formats and typically specify registers to be used in performing various operations.

Depending on the architecture of the central processing unit, various types of registers may be used, including, for example, general purpose registers, special purpose registers, floating point registers, and/or vector registers. Different types of registers may be used with different types of instructions. For example, floating point registers store floating point numbers to be used by floating point instructions, and vector registers hold data for vector processing performed by Single Instruction Multiple Data (SIMD) instructions, including vector instructions.

Disclosure of Invention

The shortcomings of the prior art are overcome and advantages are provided through the provision of a computer program product for executing machine instructions. The computer program product includes a computer readable storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing a method. The method includes, for instance: determining, by a processor, that an instruction executing within a computing environment has caused an exception, the instruction performing an operation with respect to a vector register comprising a plurality of elements; and obtaining a vector exception code based on the exception, the vector exception code including a location of the element of the plurality of elements of the vector register that caused the exception.

Methods and systems relating to one or more aspects are also described and claimed herein. Further, services relating to one or more aspects are also described and may be claimed herein.

Additional features and advantages are realized through one or more aspects of the technology. Other embodiments and aspects are described in detail herein and are considered a part of the claims.

Drawings

One or more aspects are particularly pointed out and distinctly claimed as examples in the claims at the conclusion of the specification. The above and other objects, features and advantages will be apparent from the following detailed description when read in conjunction with the accompanying drawings in which:

FIG. 1 illustrates an example of a computing environment to incorporate and use one or more aspects;

FIG. 2A illustrates another example of a computing environment to incorporate and use one or more aspects;

FIG. 2B shows further details of the memory of FIG. 2A;

FIG. 3 illustrates one example of a register file;

FIG. 4A illustrates an example of a format of a vector floating point test data class immediate instruction;

FIG. 4B illustrates an example of bit values of a third operand of the vector floating point test data class immediate instruction of FIG. 4A;

FIG. 4C illustrates one embodiment of logic associated with the vector floating point test data class immediate instruction of FIG. 4A;

FIG. 4D illustrates one example of a block diagram of an execution of the vector floating point test data class immediate instruction of FIG. 4A;

FIG. 4E illustrates an example of the definition of multiple classes of binary floating point data;

FIG. 5A illustrates one example of a format of a vector checksum instruction;

FIG. 5B illustrates one embodiment of the logic associated with the vector checksum instruction of FIG. 5A;

FIG. 5C illustrates one example of a block diagram of an execution of the vector checksum instruction of FIG. 5A;

FIG. 6A illustrates one example of a format of a vector Galois field multiply sum and accumulate instruction;

FIG. 6B illustrates one embodiment of the logic associated with the vector Galois field multiply sum and accumulate instruction of FIG. 6A;

FIG. 6C illustrates one example of a block diagram for execution of the vector Galois field multiply-sum and accumulate instruction of FIG. 6A;

FIG. 7A illustrates one example of a format of a vector generate mask instruction;

FIG. 7B illustrates one embodiment of the logic associated with the vector generate mask instruction of FIG. 7A;

FIG. 7C illustrates one example of a block diagram for execution of the vector generate mask instruction of FIG. 7A;

FIG. 8A illustrates one example of a format of a vector element rotate and insert under mask instruction;

FIG. 8B illustrates one embodiment of the logic associated with the vector element rotate and insert under mask instruction of FIG. 8A;

FIG. 8C illustrates one example of a block diagram for execution of the vector element rotate and insert under mask instruction of FIG. 8A;

FIG. 9A illustrates an example of a vector exception code;

FIG. 9B illustrates one embodiment of logic to set the vector exception code of FIG. 9A;

FIG. 10 depicts one embodiment of a computer program product incorporating one or more aspects;

FIG. 11 illustrates one embodiment of a host computer system;

FIG. 12 illustrates a further example of a computer system;

FIG. 13 illustrates another example of a computer system including a computer network;

FIG. 14 illustrates one embodiment of various elements of a computer system;

FIG. 15A illustrates one embodiment of an execution unit of the computer system of FIG. 14;

FIG. 15B illustrates one embodiment of a branch unit of the computer system of FIG. 14;

FIG. 15C illustrates one embodiment of a load/store unit of the computer system of FIG. 14; and

FIG. 16 illustrates one embodiment of an emulated host computer system.

Detailed Description

According to one or more aspects, a vector facility is provided that includes various vector instructions, as well as vector exception handling. Each instruction described herein is a Single Instruction Multiple Data (SIMD) instruction that uses one or more vector registers (also referred to herein as vectors). A vector register is, for example, a processor register (also referred to as a hardware register), which is a small amount of storage (e.g., not main memory) available as part of a central processing unit (CPU) or other processor. Each vector register contains a vector operand having one or more elements, and an element is, for example, 1, 2, 4 or 8 bytes in length. In other embodiments, elements can be of other sizes, and vector instructions need not be SIMD instructions.

One embodiment of a computing environment to incorporate and use one or more aspects is described with reference to FIG. 1. Computing environment 100 includes, for instance, a processor 102 (e.g., a central processing unit), a memory 104 (e.g., a main memory), and one or more input/output (I/O) devices and/or interfaces 106, coupled to one another via, for instance, one or more buses 108 and/or other connections.

In one example, the processor 102 is based on the z/Architecture offered by International Business Machines Corporation and is part of a server, such as a System z server, which is also offered by International Business Machines Corporation and implements the z/Architecture. One embodiment of the z/Architecture is described in an IBM publication entitled "z/Architecture Principles of Operation," IBM Publication No. SA22-7832-09, Tenth Edition, September 2012, which is hereby incorporated herein by reference in its entirety. In one example, the processor executes an operating system, such as z/OS, also offered by International Business Machines Corporation. IBM, z/Architecture and z/OS are registered trademarks of International Business Machines Corporation, Armonk, New York, USA. Other names used herein may be registered trademarks, trademarks or product names of International Business Machines Corporation or other companies.

In a further embodiment, the processor 102 is based on the Power Architecture offered by International Business Machines Corporation. One embodiment of the Power Architecture is described in "Power ISA™ Version 2.06 Revision B," International Business Machines Corporation, July 23, 2010, which is hereby incorporated herein by reference in its entirety. POWER ARCHITECTURE is a registered trademark of International Business Machines Corporation.

In another embodiment, the processor 102 is based on the Intel architecture offered by Intel Corporation. One embodiment of the Intel architecture is described in "Intel 64 and IA-32 Architectures Developer's Manual: Vol. 2B, Instructions Set Reference, A-L" (Order Number 253666-045US, January 2013) and "Intel 64 and IA-32 Architectures Developer's Manual: Vol. 2B, Instructions Set Reference, M-Z" (Order Number 253667-…). Intel is a registered trademark of Intel Corporation, Santa Clara, California.

Another embodiment of a computing environment to incorporate and use one or more aspects is described with reference to FIG. 2A. In this example, computing environment 200 includes, for instance, a native central processing unit 202, a memory 204, and one or more input/output devices and/or interfaces 206, coupled to one another via, for example, one or more buses 208 and/or other connections. As examples, computing environment 200 may include a PowerPC processor, a pSeries server or an xSeries server offered by International Business Machines Corporation, Armonk, New York; an HP Superdome with Intel Itanium II processors offered by Hewlett Packard Co., Palo Alto, California; and/or other machines based on architectures offered by International Business Machines Corporation, Hewlett Packard, Intel, Oracle, or other companies.

The native central processing unit 202 includes one or more native registers 210, such as one or more general purpose registers and/or one or more special purpose registers used during processing within the environment. These registers contain information that represents the state of the environment at any particular time.

In addition, the native central processing unit 202 executes instructions and code stored in memory 204. In one particular example, the central processing unit executes emulator code 212 stored in memory 204. The code enables a processing environment in one architecture configuration to emulate another architecture. For example, the emulator code 212 allows machines based on architectures other than z/Architecture (e.g., PowerPC processors, pSeries servers, xSeries servers, HP Superdome servers, etc.) to emulate and execute software and instructions developed based on z/Architecture.

Further details regarding the emulator code 212 are described with reference to FIG. 2B. The guest instructions 250 stored in the memory 204 include software instructions (e.g., associated with machine instructions) developed to execute in an architecture other than that of the native CPU 202. For example, the guest instruction 250 may have been designed to execute on the z/Architecture processor 102, but in fact is emulated on the native CPU 202, which may be, for example, an Intel Itanium II processor. In one example, the emulator code 212 includes an instruction fetch routine 252 to fetch one or more guest instructions 250 from the memory 204 and optionally to provide local buffering for the fetched instructions. It also includes an instruction conversion routine 254 to determine the type of guest instruction that has been obtained and to convert the guest instruction into one or more corresponding native instructions 256. The translation includes, for example, identifying a function to be performed by the guest instruction, and selecting the native instruction(s) to perform the function.

Further, emulator code 212 includes an emulation control routine 260 to cause the native instructions to be executed. Emulation control routine 260 may cause native CPU 202 to execute a routine of native instructions that emulate one or more previously obtained guest instructions and, at the conclusion of such execution, return control to the instruction fetch routine to emulate the obtaining of the next guest instruction or group of guest instructions. Execution of the native instructions 256 may include loading data into a register from memory 204; storing data back to memory from a register; or performing some type of arithmetic or logical operation, as determined by the conversion routine.
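The fetch, convert and execute cycle of FIG. 2B can be illustrated with a toy Python sketch. It is only a model under stated assumptions: the guest "instructions" are tuples with invented mnemonics (LOAD, STORE, ADD), and the "native instructions" are Python closures acting on a dictionary that stands in for registers 210 and memory 204. It does not decode binary z/Architecture instructions as emulator code 212 would.

```python
from typing import Callable

# A "native" operation acts on the emulated state (registers + memory).
NativeOp = Callable[[dict], None]


def translate(instr: tuple) -> list[NativeOp]:
    """Conversion routine: map one guest instruction to one or more native
    operations. The guest mnemonics are hypothetical, for illustration."""
    op, *args = instr
    if op == "LOAD":    # LOAD r, addr : memory -> register
        r, addr = args
        def load(st): st["regs"][r] = st["mem"][addr]
        return [load]
    if op == "STORE":   # STORE r, addr : register -> memory
        r, addr = args
        def store(st): st["mem"][addr] = st["regs"][r]
        return [store]
    if op == "ADD":     # ADD r1, r2 : r1 = (r1 + r2) mod 2**32
        r1, r2 = args
        def add(st): st["regs"][r1] = (st["regs"][r1] + st["regs"][r2]) & 0xFFFFFFFF
        return [add]
    raise ValueError(f"unsupported guest instruction {op!r}")


def emulate(program: list[tuple], state: dict) -> dict:
    """Emulation control: fetch a guest instruction, convert it, execute the
    resulting native operations, then return to fetch the next one."""
    pc = 0
    while pc < len(program):
        guest = program[pc]                # instruction fetch routine
        for native in translate(guest):    # instruction conversion routine
            native(state)                  # execute the native operations
        pc += 1
    return state
```

For example, running LOAD, LOAD, ADD and STORE over memory holding 5 and 7 leaves 12 in the target location. A production emulator would typically cache translations rather than reconverting each guest instruction every time it is fetched.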

Each routine is, for example, implemented in software, which is stored in memory and executed by native central processing unit 202. In other examples, one or more of the routines or operations are implemented in firmware, hardware, software, or some combination thereof. The registers of the emulated processor may be emulated using registers 210 of the native CPU or using locations in memory 204. In various embodiments, the guest instructions 250, native instructions 256, and emulator code 212 may reside in the same memory or may be distributed among different memory devices.

As used herein, firmware includes, for example, microcode, and/or macrocode of a processor. It includes, for example, hardware-level instructions and/or data structures used in the implementation of higher-level machine code. In one embodiment, it comprises, for example, specialized code that is typically provided as microcode (including trusted software or microcode specific to the underlying hardware) and that controls operating system access to the system hardware.

In one example, the guest instructions 250 that are obtained, translated, and executed are the instructions described herein. The instructions, which are of one architecture (e.g., the z/Architecture), are fetched from memory, translated, and represented as a sequence of native instructions 256 of another architecture (e.g., PowerPC, pSeries, xSeries, Intel, etc.). These native instructions are then executed.

In one embodiment, the instructions described herein are vector instructions, which are part of a vector facility. The vector facility provides, for example, fixed-size vectors ranging from 1 to 16 elements. Each vector includes data that is operated on by the vector instructions defined in the facility. In one embodiment, if a vector is made up of multiple elements, each element is processed in parallel with the other elements. Instruction completion does not occur until processing of all the elements is complete. In other embodiments, the elements are processed partially in parallel and/or sequentially.

Vector instructions may be implemented as part of a variety of architectures including, but not limited to, z/Architecture, Power, x86, IA-32, IA-64, and the like. Although the embodiments described herein are directed to z/Architecture, the vector instructions and one or more other aspects described herein may be based on many other architectures. The z/Architecture is just one example.

In one embodiment in which the vector facility is implemented as part of the z/Architecture, to use the vector registers and instructions, a vector enablement control and a register control in a specified control register (e.g., control register 0) are set to, for example, 1. If the vector facility is installed and a vector instruction is executed without the enablement controls set, a data exception is recognized. If the vector facility is not installed and a vector instruction is executed, an operation exception is recognized.
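The two rules above can be written as a small decision function. This is a sketch under assumptions: the exact bit positions of the controls in control register 0 are not given here, so they are modelled as plain booleans, and treating a cleared register control the same as a cleared vector enablement control is an interpretation. The function name `vector_instruction_check` is hypothetical.

```python
from typing import Optional


def vector_instruction_check(facility_installed: bool,
                             vector_enablement: bool,
                             register_control: bool) -> Optional[str]:
    """Return the program exception recognized when a vector instruction is
    executed, or None if it may proceed.

    Assumption: both controls must be 1 for the facility to be enabled."""
    if not facility_installed:
        return "operation exception"     # facility not installed
    if not (vector_enablement and register_control):
        return "data exception"          # installed but not enabled
    return None
```

Note that the installation test comes first: an uninstalled facility yields an operation exception regardless of the control settings.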

In one embodiment, there are 32 vector registers, and other types of registers may map to one quarter of the vector registers. For example, as shown in FIG. 3, register file 300 includes 32 vector registers 302, each 128 bits in length. Sixteen floating point registers 304, each 64 bits in length, may overlay the vector registers. Thus, as an example, when floating point register 2 is modified, vector register 2 is also modified. Other mappings for other types of registers are possible.
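A minimal Python model of the overlay, assuming (as in the z/Architecture) that floating point register n occupies the leftmost 64 bits of vector register n for n = 0-15. Because the FPRs are views onto the vector register storage rather than separate arrays, writing FPR 2 necessarily changes VR 2. The class and method names are hypothetical.

```python
class RegisterFile:
    """32 vector registers of 16 bytes each. FPR n (n = 0-15) is not stored
    separately: it is a view of bytes 0-7 (bits 0-63) of VR n."""

    def __init__(self) -> None:
        self.vr = [bytearray(16) for _ in range(32)]

    @staticmethod
    def _check_fpr(n: int) -> None:
        if not 0 <= n < 16:
            raise IndexError("floating point registers are numbered 0-15")

    def write_fpr(self, n: int, value: int) -> None:
        """Write a 64-bit FPR value; this modifies VR n as well."""
        self._check_fpr(n)
        self.vr[n][0:8] = value.to_bytes(8, "big")

    def read_fpr(self, n: int) -> int:
        """Read FPR n back out of the leftmost half of VR n."""
        self._check_fpr(n)
        return int.from_bytes(self.vr[n][0:8], "big")
```

The rightmost 64 bits of VR 0-15 and all of VR 16-31 are reachable only through vector instructions in this model.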

Vector data appears in storage, for example, in the same left-to-right sequence as other data formats. Bits numbered 0-7 of a data format constitute the byte in the leftmost (lowest-numbered) byte location in storage, bits 8-15 form the byte in the next sequential location, and so on. In a further example, the vector data may appear in storage in another sequence (e.g., right to left).
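The left-to-right ordering means element 0 is the leftmost element and each element reads most-significant byte first. A short Python sketch of element extraction from a 16-byte register image under that assumption; `get_element` is a hypothetical helper name.

```python
def get_element(vr: bytes, index: int, size: int) -> int:
    """Return element `index` of a 16-byte vector register image whose
    elements are `size` bytes long (1, 2, 4 or 8). Element 0 is leftmost,
    and each element is read left to right (big-endian)."""
    if len(vr) != 16 or size not in (1, 2, 4, 8):
        raise ValueError("expected a 16-byte register and size 1, 2, 4 or 8")
    if not 0 <= index < 16 // size:
        raise IndexError("element index out of range")
    start = index * size
    return int.from_bytes(vr[start:start + size], "big")
```

For a register holding the bytes 00, 01, ..., 0F, word element 0 is 0x00010203 and doubleword element 1 is 0x08090A0B0C0D0E0F.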

Each vector instruction described herein has a plurality of fields, and one or more of the fields has a subscript number associated with it. The subscript number associated with a field of the instruction denotes the operand to which the field applies. For example, the subscript number 1 associated with vector register V₁ denotes that the register specified in V₁ includes the first operand, and so forth. A register operand is one register in length, which is, for example, 128 bits.

Further, many of the vector instructions provided with the vector facility have a field that includes specified bits. This field, also referred to as the register extension bit or RXB, includes the most significant bit for each vector-register-designated operand. Bits for register designations not specified by the instruction are reserved and set to 0. The most significant bit is concatenated, for example, to the left of the four-bit register designation to create a five-bit vector register designation.

In one example, the RXB field includes four bits (e.g., bits 0-3), which are defined as follows:

0 - Most significant bit for the first vector register designation (e.g., in bits 8-11) of the instruction.

1 - Most significant bit for the second vector register designation (e.g., in bits 12-15) of the instruction, if any.

2 - Most significant bit for the third vector register designation (e.g., in bits 16-19) of the instruction, if any.

3 - Most significant bit for the fourth vector register designation (e.g., in bits 32-35) of the instruction, if any.

Each bit is set to either 0 or 1, for example by an assembler, depending on the register number. For example, for registers 0-15, the bit is set to 0; for registers 16-31, the bit is set to 1, and so on.

In one embodiment, each RXB bit is an extension bit for a particular location in an instruction for one or more vector registers. For example, in one or more vector instructions, bit 0 of RXB is an extension bit for locations 8-11, which are assigned to, for example, V₁; bit 1 of RXB is an extension bit for locations 12-15, which are assigned to, for example, V₂; and so on. In another embodiment, the RXB field includes additional bits, and more than one bit is used as an extension for each vector or location.
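The assembler's side of the RXB scheme can be sketched in Python. It assumes the four-bit RXB layout listed above (RXB bit 0, the leftmost, extends the first register field, and so on), with unused bits left 0. `encode_vector_regs` is a hypothetical helper, not part of any real assembler.

```python
def encode_vector_regs(*regs: int) -> tuple[list[int], int]:
    """Split up to four vector register numbers (0-31) into the four-bit
    values placed in the instruction's V fields and a four-bit RXB value.
    RXB bit 0 (leftmost) extends the first field, bit 1 the second, etc."""
    if len(regs) > 4:
        raise ValueError("at most four vector register operands")
    fields, rxb = [], 0
    for i, r in enumerate(regs):
        if not 0 <= r <= 31:
            raise ValueError(f"vector register {r} out of range")
        fields.append(r & 0xF)          # low four bits go in the V field
        rxb |= (r >> 4) << (3 - i)      # high bit goes in RXB bit i
    return fields, rxb
```

For instance, registers 6 and 17 encode as fields [6, 1] with RXB 0b0100: register 6 is in the range 0-15, so its extension bit is 0, while register 17 sets RXB bit 1.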

One instruction provided according to an aspect that includes an RXB field is a vector floating point test data class immediate (VFTCI) instruction, an example of which is shown in FIG. 4A. In one example, the vector floating point test data class immediate instruction 400 includes: opcode fields 402a (e.g., bits 0-7) and 402b (e.g., bits 40-47) indicating a vector floating point test data class immediate operation; a first vector register field 404 (e.g., bits 8-11) used to designate a first vector register (V₁); a second vector register field 406 (e.g., bits 12-15) used to designate a second vector register (V₂); an immediate field (I₃) 408 (e.g., bits 16-27), which includes a bitmask; a first mask field (M₅) 410 (e.g., bits 28-31); a second mask field (M₄) 412 (e.g., bits 32-35); and an RXB field 414 (e.g., bits 36-39). In one example, each of the fields 404-414 is separate and independent from the opcode field(s). Further, in one embodiment, they are separate and independent from one another; however, in other embodiments, more than one field may be combined. Further information regarding the use of these fields is described below.

In one example, selected bits (e.g., the first two bits) of the opcode designated by opcode field 402a specify the length of the instruction. In this particular example, the selected bits indicate a length of three halfwords. Further, the format of the instruction is a vector register-and-immediate operation with an extended opcode field. Each vector (V) field, along with its corresponding extension bit specified by RXB, designates a vector register. Specifically, for a vector register, the register containing the operand is specified using, for example, the four-bit register field with its corresponding register extension bit (RXB) added as the most significant bit. For example, if the four-bit field is 0110 and the extension bit is 0, the five-bit field 00110 indicates register number 6.
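The field layout of FIG. 4A, the length rule and the five-bit register numbers can be combined into a small decoder. This Python sketch assumes the bit positions given above (bit 0 leftmost) and the usual z/Architecture length convention for the two leftmost opcode bits (00 for one halfword, 01 or 10 for two, 11 for three). The function names and the test bytes are made up for illustration and are not claimed to be a documented VFTCI encoding.

```python
def instruction_length(first_byte: int) -> int:
    """Instruction length in bytes from the two leftmost opcode bits."""
    return {0b00: 2, 0b01: 4, 0b10: 4, 0b11: 6}[first_byte >> 6]


def decode_vftci(text: bytes) -> dict:
    """Split a 6-byte instruction laid out as in FIG. 4A into its fields,
    forming 5-bit register numbers from the V fields and the RXB bits."""
    if len(text) != 6 or instruction_length(text[0]) != 6:
        raise ValueError("expected a three-halfword instruction")
    word = int.from_bytes(text, "big")           # bits 0-47, bit 0 leftmost

    def bits(start: int, end: int) -> int:       # inclusive bit range
        return (word >> (47 - end)) & ((1 << (end - start + 1)) - 1)

    rxb = bits(36, 39)
    return {
        "opcode": (bits(0, 7) << 8) | bits(40, 47),
        "v1": (((rxb >> 3) & 1) << 4) | bits(8, 11),   # RXB bit 0 extends V1
        "v2": (((rxb >> 2) & 1) << 4) | bits(12, 15),  # RXB bit 1 extends V2
        "i3": bits(16, 27),
        "m5": bits(28, 31),
        "m4": bits(32, 35),
    }
```

With the hypothetical bytes E7 61 A0 00 34 4A, the decoder yields V₁ = 6, V₂ = 17 (four-bit field 0001 plus RXB bit 1), I₃ = 0xA00, M₅ = 0 and M₄ = 3.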

Further, in one embodiment of the VFTCI instruction, V₁ 404 and V₂ 406 specify vector registers that include a first operand and a second operand, respectively, for the instruction. In addition, I₃ 408 includes a bitmask having a plurality of bits, each of which is used to represent a binary floating point element class and sign (positive or negative), as described in further detail below.

In another embodiment, the bitmask may be provided, for example, in a general purpose register, in memory, in an element of a vector register (differing per element), or by an address calculation. The bitmask may be included as an explicit operand of the instruction or as an implicit operand or input.

The M₅ field 410 has, for example, four bits, 0-3, and specifies a single-element control (S), for example, in bit 0. If bit 0 is set to 1, the operation takes place only on the zero-indexed element of the vector, and the bit positions of all other elements in the first operand vector are unpredictable. If bit 0 is set to 0, the operation takes place on all elements in the vector.

The M₄ field 412 is used, for example, to specify the size of the floating point numbers in the second operand of the instruction. In one example, this field is set to 3, indicating double-precision binary floating point numbers. Other examples are possible.

In executing one embodiment of the vector floating point test data class immediate instruction, the class and sign of each floating point element of the second operand are examined to select one or more bits from the third operand. If a selected bit is set, all bit positions of the corresponding element in the first operand are set to 1; otherwise, they are set to 0. That is, if the class/sign of the floating point number contained in an element of the second operand matches a set bit (i.e., a bit set to, for example, 1) in the third operand, the element of the first operand corresponding to that element of the second operand is set to all ones. In one example, all operand elements contain long-format BFP (binary floating point) numbers.

As indicated herein, the 12 bits of the third operand (bits 16-27 of the instruction text) are used to specify 12 combinations of BFP data class and sign. In one example, as shown in FIG. 4B, BFP operand elements are divided into six classes 430: zero, normal number, subnormal number, infinity, quiet NaN (Not-a-Number), and signaling NaN, and each class has a sign 432 (positive or negative) associated with it. Thus, for example, bit 0 of I₃ specifies the zero class with a positive sign, bit 1 specifies the zero class with a negative sign, and so on.
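The correspondence between class/sign and I₃ bit can be expressed as a simple formula. This Python sketch assumes that the bit pairs follow the order in which the six classes are listed above, with the positive sign first in each pair. That assumption is consistent with the text's statements that bit 0 is positive zero and that a positive normal number maps to bit 2. The names are hypothetical.

```python
# The six BFP classes in the order listed for FIG. 4B (assumed order).
BFP_CLASSES = ("zero", "normal", "subnormal", "infinity",
               "quiet NaN", "signaling NaN")


def class_sign_bit(bfp_class: str, negative: bool) -> int:
    """I3 bit number (0-11, bit 0 leftmost) for a class and sign."""
    return 2 * BFP_CLASSES.index(bfp_class) + (1 if negative else 0)


def i3_bit_is_set(i3: int, bit: int) -> bool:
    """Test a bit of the 12-bit immediate, numbering from the left."""
    return bool((i3 >> (11 - bit)) & 1)
```

Because bit 0 is the leftmost bit of the 12-bit field, testing bit b shifts right by 11 - b. For example, an I₃ value of 0x200 selects only positive normal numbers.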

One or more of the third operand bits may be set to 1. Further, in one embodiment, the instruction may operate on one or more elements simultaneously.

Operand elements, including SNaNs (signaling NaNs) and QNaNs (quiet NaNs), are examined without causing an IEEE exception.

Resulting Condition Code:

0 Selected bit is 1 for all elements (match)

1 Selected bit is 1 for at least one element, but not all elements (when the S bit is 0)

2 --

3 Selected bit is 0 for all elements (no match)

IEEE Exceptions: None

Program Exceptions:

Data with data exception code (DXC) FE, vector register, indicating that the vector facility is not enabled

Operation (if the vector facility for z/Architecture is not installed)

Specification

Transaction constraint

Programming Notes:

1. The instruction provides a way to test operand elements without the risk of an exception or of setting IEEE flags.

2. When the S bit is set, condition code 1 is not used.

Further details of one embodiment of the vector floating point test data class immediate instruction are described with reference to FIGS. 4C and 4D. In particular, FIG. 4C illustrates one embodiment of logic associated with the vector floating point test data class immediate instruction, executed by a processor (e.g., a CPU), and FIG. 4D is one example of a block diagram illustrating execution of the vector floating point test data class immediate instruction.

Referring to FIG. 4C, first, a variable called the element index (Ei) is initialized to 0 (step 450). Then, the value in element Ei (in this case, element 0) is extracted from the second operand of the instruction (e.g., from the operand stored in the register designated by V₂) (step 452). This value, a long-format binary floating point value, is converted to a type number to obtain the class and sign of the floating point element of the second operand, as described below (step 454). In one example, the floating point size 453 is also input to the conversion logic. The class and sign obtained are associated with a particular class/sign bit, as described with reference to FIG. 4B. For example, if the conversion indicates that the floating point number is a positive normal number, then bit 2 is associated with the floating point number.

After the conversion, the bit in the third operand corresponding to the particular bit determined by the conversion (referred to as the selected bit) is checked (step 456). If the selected bit is set (INQUIRY 458), the element in the first operand corresponding to element Ei is set equal to all ones (step 460); otherwise, that element in the first operand is set equal to 0 (step 462). For example, if the conversion of the floating point number in element 0 indicates a positive normal number, bit 2 is associated with that number. Thus, bit 2 of the third operand is checked, and if it is set to 1, element 0 of the first operand is set to all ones.

Thereafter, a determination is made as to whether Ei is equal to the maximum number of elements of the second operand (INQUIRY 464). If not, Ei is incremented by, for example, 1 (step 466), and processing continues with step 452. Otherwise, if Ei is equal to the maximum number of elements, a summary condition code is generated (step 468). The summary condition code summarizes the processing of all the elements of the second operand. For example, if the selected bit is 1 (match) for all elements, the resulting condition code is 0. If, on the other hand, the selected bit is 1 for at least one element, but not all elements (when the S bit is 0), the condition code is 1; and if the selected bit is 0 (no match) for all elements, the condition code is 3.
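The loop of FIG. 4C and its summary condition code can be modelled in Python. This is a minimal sketch under two assumptions. First, step 454 is treated as already done, so each second-operand element arrives as its class/sign bit number (0-11). Second, elements are long-format (8 bytes), so a 128-bit register holds two. With the S bit set, only element 0 is processed, and the other result elements are left as `None` to stand for "unpredictable". The function name `vftci_elements` is hypothetical.

```python
from typing import Optional

ALL_ONES = (1 << 64) - 1   # a 64-bit (long-format) element with every bit set


def vftci_elements(class_bits: list[int], i3: int,
                   single: bool = False) -> tuple[list[Optional[int]], int]:
    """Model steps 450-468 of FIG. 4C.

    class_bits[i] is the class/sign bit number (0-11) that step 454
    produced for element i of the second operand. Returns the first-operand
    elements and the summary condition code."""
    count = 1 if single else len(class_bits)
    result: list[Optional[int]] = [None] * len(class_bits)
    matches = 0
    for ei in range(count):                             # steps 450, 464, 466
        selected = (i3 >> (11 - class_bits[ei])) & 1    # step 456, bit 0 leftmost
        result[ei] = ALL_ONES if selected else 0        # steps 458-462
        matches += selected
    if matches == count:
        cc = 0        # selected bit is 1 for all processed elements
    elif matches == 0:
        cc = 3        # selected bit is 0 for all processed elements
    else:
        cc = 1        # mixed; cannot occur when S is set (one element)
    return result, cc
```

For example, `vftci_elements([2, 3], 0x200)` (a positive and a negative normal number, testing only for positive normal) returns `([ALL_ONES, 0], 1)`.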

The above process is illustrated graphically in the block diagram of FIG. 4D. As shown, vector register 480 includes a plurality of elements 482a-482n, each of which includes a floating point number. Each floating point number, together with the floating point size 483a-483n, is input to convert-to-type logic 484a-484n, whose output is a particular bit representing the class/sign of that floating point number. Then, the selected bit in each mask 486a-486b corresponding to each particular bit is checked. The first operand in vector register 488 is set depending on whether the selected bit is set. For example, if the selected bit is set for element 0 of the second operand, element 490a of the first operand is set to all ones. Similarly, if the selected bit for element 1 of the second operand is not set (e.g., is set to 0), element 490b of the first operand is set to 0, and so on.

Further details of one embodiment of the convert-to-type logic are now described. First, the floating point number, which is a standard IEEE binary floating point number, is split into three parts, as is well known: the sign, the biased exponent, and the fraction (for the short format, an 8-bit exponent biased by 127 and a 23-bit fraction; for the long format used in this example, an 11-bit exponent biased by 1023 and a 52-bit fraction). The values of these three parts are then examined to determine the class and sign, as shown in FIG. 4E. For example, the sign is the value of the sign part, and the class (i.e., the entity in FIG. 4E) is based on the values of the exponent and fraction (the unit bit in FIG. 4E is the implicit bit of the significand). As an example, if the exponent and fraction (including the unit bit) are 0, the class is zero, and if the sign part is positive, the sign is positive. Thus, bit 0 (FIG. 4B) represents the class/sign of this floating point number.
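The convert-to-type step for a long-format element can be sketched in Python by splitting the 64-bit pattern into its three parts. The sketch takes the raw bit pattern rather than a Python `float`, so that signaling NaN patterns are examined exactly as given. It assumes the IEEE 754-2008 convention that a NaN is quiet when the leftmost fraction bit is 1. `float_bits` and `classify_long_bfp` are hypothetical names.

```python
import struct


def float_bits(x: float) -> int:
    """64-bit IEEE long-format bit pattern of a Python float."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def classify_long_bfp(bits: int) -> tuple[str, bool]:
    """Return (class, negative) for a 64-bit long-format BFP pattern:
    1 sign bit, 11-bit biased exponent, 52-bit fraction."""
    negative = bool(bits >> 63)
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    if exponent == 0:
        # Implicit unit bit is 0: zero or subnormal.
        cls = "zero" if fraction == 0 else "subnormal"
    elif exponent == 0x7FF:
        if fraction == 0:
            cls = "infinity"
        elif fraction >> 51:            # leftmost fraction bit set
            cls = "quiet NaN"
        else:
            cls = "signaling NaN"
    else:
        cls = "normal"                  # implicit unit bit is 1
    return cls, negative
```

For example, -0.0 classifies as ("zero", True), which corresponds to I₃ bit 1. The pattern 0x7FF0000000000001 classifies as ("signaling NaN", False), and no floating point operation is performed, so no IEEE exception can arise.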

One embodiment of an instruction that tests the floating point class of the elements of a vector and sets a resulting bitmask is described above. The vector floating point test data class immediate instruction has an immediate field in which each bit represents a class and sign of floating point number to be detected. Each floating point element of the input vector is tested to see whether its value belongs to any of the classes specified by the instruction. If a floating point element belongs to one of the classes, the bit positions of the corresponding element of the output vector are set to 1. This provides a technique for determining certain attributes (e.g., class and sign) of a binary floating point number without causing any exceptions or interrupts.

In another embodiment, the testing may be performed by checking which bits in the third operand are set (e.g., to 1) and then determining whether the class/sign of one or more elements of the second operand matches one of the set bits. The first operand is then set based on the comparison.

In another aspect, a vector checksum instruction is provided, an example of which is shown in FIG. 5A. In one example, the vector checksum instruction 500 includes: opcode fields 502a (e.g., bits 0-7) and 502b (e.g., bits 40-47) indicating a vector checksum operation; a first vector register field 504 (e.g., bits 8-11) used to designate a first vector register (V₁); a second vector register field 506 (e.g., bits 12-15) used to designate a second vector register (V₂); a third vector register field 508 (e.g., bits 16-19) used to designate a third vector register (V₃); and an RXB field 510 (e.g., bits 36-39). In one example, each of the fields 504-510 is separate and independent from the opcode field(s). Further, in one embodiment, they are separate and independent from one another; however, in other embodiments, more than one field may be combined.

In another embodiment, the third vector register field is not included as an explicit operand of the instruction, but instead is an implicit operand or input. Further, the values provided in the operands may be provided in other ways, such as in general purpose registers, in memory, as address calculations, and so forth.

In yet another embodiment, no explicit or implicit third operand is provided at all.

In one example, selected bits (e.g., the first two bits) of the opcode designated by opcode field 502a specify the length of the instruction. In this particular example, the selected bits indicate a length of three halfwords. Further, the format of the instruction is a vector register-and-register operation with an extended opcode field. Each vector (V) field, along with its corresponding extension bit specified by RXB, designates a vector register. Specifically, for a vector register, the register containing the operand is specified using, for example, the four-bit register field with its corresponding register extension bit (RXB) added as the most significant bit.

In executing one embodiment of the vector checksum instruction, the elements of the second operand (e.g., word-sized elements) are added together one by one, along with a selected element of the third operand (e.g., word element 1 of the third operand). (In another embodiment, adding the selected element of the third operand is optional.) The sum is placed in a selected location (e.g., word element 1) of the first operand, and zeros are placed in the other word elements (e.g., word elements 0, 2 and 3) of the first operand. The word-sized elements are all treated as 32-bit unsigned binary integers. After each element is added, any carry out of the sum (e.g., out of bit position 0) is added, for example, to bit position 31 of the result in word element 1 of the first operand.
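The word-element arithmetic described above can be modeled in software. The following Python sketch is a hypothetical model, not the architected definition: the function names `eac_add32` and `vector_checksum`, and the representation of a 128-bit vector register as a list of four 32-bit integers, are assumptions for illustration. It follows the embodiment in which all second-operand words and word element 1 of the third operand are summed with end-around carry into word element 1 of the result.

```python
MASK32 = 0xFFFFFFFF


def eac_add32(a: int, b: int) -> int:
    """Add two 32-bit unsigned values with end-around carry.

    Any carry out of bit position 0 (the leftmost bit) is added back
    into bit position 31 (the rightmost bit) of the sum.
    """
    s = a + b
    return (s & MASK32) + (s >> 32)


def vector_checksum(op2: list[int], op3: list[int]) -> list[int]:
    """Hypothetical model of the checksum on 4 x 32-bit word elements.

    The words of op2 are summed one by one together with word element 1
    of op3; the sum is placed in word element 1 of the result and zeros
    are placed in word elements 0, 2 and 3.
    """
    acc = 0
    for word in op2:
        acc = eac_add32(acc, word & MASK32)
    acc = eac_add32(acc, op3[1] & MASK32)
    return [0, acc, 0, 0]


# A carry out of the leftmost bit wraps around into the rightmost bit.
print(vector_checksum([0xFFFFFFFF, 0x00000001, 0, 0], [0, 0, 0, 0]))  # prints [0, 1, 0, 0]
```

Because a carry is folded back in immediately, the intermediate sum never exceeds 32 bits, so a single extra addition per element is sufficient.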

Condition code: The code remains unchanged.

Program exceptions:

Operation (if the vector facility for z/Architecture is not installed)

Transaction constraint

Programming notes:

1. The third operand should contain 0 at the beginning of a checksum computation algorithm.

2. A 16-bit checksum is used, for example, in TCP/IP applications. The following program may be executed after the 32-bit checksum has been computed:

VERLLF V2,V1,16(0)   (VERLLF: Vector Element Rotate Left Logical, 4-byte values)

VAF V2,V1,V2   (VAF: Vector Add, 4-byte values)

The halfword in element 2 contains the 16-bit checksum.
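Programming note 2 can be checked with a small model. The Python sketch below is an illustration, not the architected definition: it assumes a vector register is represented as four 32-bit words (so halfword element 2 is the leftmost halfword of word element 1), emulates VERLLF with a 32-bit left rotate and VAF with an ordinary modulo-2^32 add, and uses hypothetical helper names.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits (VERLLF on one word element)."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def fold_to_16(v1: list[int]) -> int:
    """Apply the VERLLF/VAF sequence to a 4-word vector and return halfword 2."""
    # VERLLF V2,V1,16(0): rotate every word element left by 16 bits
    v2 = [rotl32(w, 16) for w in v1]
    # VAF V2,V1,V2: ordinary 32-bit add per word element (carries out are lost)
    v2 = [(a + b) & MASK32 for a, b in zip(v1, v2)]
    # Split into eight halfword elements, leftmost halfword first
    halfwords = []
    for w in v2:
        halfwords += [w >> 16, w & 0xFFFF]
    return halfwords[2]


print(hex(fold_to_16([0, 0xFFFF0001, 0, 0])))  # prints 0x1
```

Adding the word to its halfword-swapped copy leaves, in the leftmost halfword, the sum of the two halves plus the carry out of the right-hand halfword addition, which is the 16-bit end-around-carry fold: in the example, 0xFFFF + 0x0001 wraps to 0x0001.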

Further details regarding the vector checksum instruction are described with reference to fig. 5B and 5C. In one example, FIG. 5B illustrates one embodiment of logic executed by a processor in executing a vector checksum instruction and FIG. 5C illustrates a block diagram of one example of executing a vector checksum instruction.

Referring to FIG. 5B, first, the element index (Ey) of the first operand (OP1) is set to 1, for example, thereby indicating element 1 of the first operand (step 530). Similarly, the element index (Ex) of the third operand (OP3) is set to, for example, 1, indicating element 1 of the third operand (step 532). Then, the element index (Ei) is set equal to 0, and the element at the element index (Ey) (i.e., element 1 in this example) is initialized to 0 (step 534). In further embodiments, Ex and Ey may be set to any valid element index.

An end-around-carry (EAC) addition is performed, in which OP1(Ey) = OP1(Ey) + OP2(Ei) + OP2(Ei+1) (step 536). Thus, on the first pass (Ei = 0), element 1 of the output vector (OP1) is set equal to the contents of that element plus the value in element 0 of the second operand (OP2) and the value in element 1 of the second operand. In an end-around-carry addition, the add operation is performed and any carry out of the addition is added back into the sum to produce a new sum.

In another embodiment, instead of the addition described above, a temporary accumulator value is defined, initialized to 0, and the elements are then added to it one at a time. In yet another embodiment, all words are added in parallel and no temporary accumulator is used. Other variations are also possible.

Thereafter, a determination is made as to whether there are additional elements of the second operand to be added (INQUIRY 538), e.g., whether Ei + 2 is less than the number of elements of the second operand. If there are more second operand elements to be added, Ei is incremented, e.g., by 2 (step 540), and processing continues with step 536.

After the elements across the second operand have been added, the result is added to the value in the third operand. For example, an end-around-carry addition is performed of element Ey of the first operand, which holds the EAC sum across all of the second operand elements, and the value in element Ex of the third operand (OP3), i.e., EAC ADD OP1(Ey) + OP3(Ex) (step 542). This is shown graphically in FIG. 5C.

As shown in FIG. 5C, the second operand 550 includes a plurality of elements 552a-552n, and these elements are added together one by one along with the element in word 1 (562) of the third operand 560. The result is placed in element 1 (572) of the first operand 570. Mathematically, $\mathrm{OP1}(E_y) = \mathrm{OP3}(E_x) + \sum_{i=0}^{n} \mathrm{OP2}(E_i)$, where each addition is an end-around-carry addition.
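The flowchart of FIG. 5B and the equation above can be traced with the following Python sketch. It is a hypothetical model (function and parameter names are illustrative, and an even number of second-operand elements is assumed) that mirrors steps 530-542 literally, adding second-operand elements two at a time. Because end-around-carry addition does not depend on the order of the additions, it produces the same word as summing the elements one by one.

```python
MASK32 = 0xFFFFFFFF


def eac_add(a: int, b: int) -> int:
    """32-bit end-around-carry addition."""
    s = a + b
    return (s & MASK32) + (s >> 32)


def checksum_fig5b(op2: list[int], op3: list[int], ey: int = 1, ex: int = 1) -> list[int]:
    """Step-by-step model of FIG. 5B; steps 530/532 choose ey and ex (default 1)."""
    op1 = [0] * len(op2)       # step 534: OP1(Ey) (and the other elements) start at 0
    ei = 0                      # step 534: Ei = 0
    while True:
        # step 536: OP1(Ey) = OP1(Ey) + OP2(Ei) + OP2(Ei+1), end-around carry
        op1[ey] = eac_add(eac_add(op1[ey], op2[ei]), op2[ei + 1])
        if ei + 2 < len(op2):   # inquiry 538: more second-operand elements?
            ei += 2             # step 540
        else:
            break
    # step 542: OP1(Ey) = OP1(Ey) + OP3(Ex), end-around carry
    op1[ey] = eac_add(op1[ey], op3[ex])
    return op1


print(checksum_fig5b([1, 2, 3, 4], [0, 0, 7, 0], ey=0, ex=2))  # prints [17, 0, 0, 0]
```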

Described above is one embodiment of a vector checksum instruction that computes a checksum across the elements of a vector register, rather than performing lane arithmetic. In one embodiment, the vector checksum instruction computes the checksum by performing a sum across the elements using end-around-carry addition. In one example, the vector checksum instruction takes four 4-byte integer elements from a vector register and adds them together, adding any carry from the addition back into the sum. The 4-byte sum is added to the 4-byte element in another operand and then saved in yet another vector register (e.g., in the low-order 4-byte element of that vector register, with 0 stored in the high-order elements of the vector register).

In another embodiment, rather than using an additional vector register to hold the value, one of the other registers (i.e., operands) is used as an accumulator.

The checksum may be used to maintain data integrity. Checksums are often computed over data sent across noisy channels, to verify that the received data is correct. In this example, the checksum is computed by adding consecutive 4-byte integers together, as described herein. If an integer addition produces a carry, the carry (an extra 1) is added to the running sum.

Although checksums are described herein, similar techniques may be used for other end-around-carry additions.

Another instruction provided according to one aspect is a vector Galois field multiply sum and accumulate (VGFMA) instruction, an example of which is shown in FIG. 6A. In one example, the vector Galois field multiply sum and accumulate instruction 600 includes: opcode fields 602a (e.g., bits 0-7) and 602b (e.g., bits 40-47) that indicate a vector Galois field multiply sum and accumulate operation; a first vector register field 604 (e.g., bits 8-11) used to specify a first vector register (V₁); a second vector register field 606 (e.g., bits 12-15) used to specify a second vector register (V₂); a third vector register field 608 (e.g., bits 16-19) used to specify a third vector register (V₃); a mask field (M₅) 610 (e.g., bits 20-23); a fourth vector register field 612 (e.g., bits 32-35) used to specify a fourth vector register (V₄); and an RXB field 614 (e.g., bits 36-39). In one example, each of the fields 604-614 is separate and independent from the opcode field(s). Further, in one embodiment, they are separate and independent from each other; however, in other embodiments, more than one field may be combined.

In one example, selected bits (e.g., the first two bits) of the opcode specified by opcode field 602a specify the length of the instruction. In this particular example, the selected bits indicate a length of three halfwords. Further, the format of the instruction is a vector register-and-register operation with an extended opcode field. Each vector (V) field, together with its corresponding extension bit specified by RXB, designates a vector register. Specifically, the register containing the operand is specified using the four-bit register field with its corresponding register extension bit (RXB) added as the most significant bit.

The M₅ field 610 has, for example, four bits (0-3) and specifies an element size (ES) control. The element size control specifies the size of the elements in vector register operands 2 and 3; the elements in the first and fourth operands are twice the size of the elements specified by the ES control. For example, a value of 0 in M₅ indicates byte-sized elements; 1 indicates halfwords; 2 indicates words; and 3 indicates doublewords.

In executing one embodiment of the vector Galois field multiply sum and accumulate instruction, each element of the second operand is multiplied in a Galois field (i.e., a finite field having a finite number of elements) by the corresponding element of the third operand. That is, each element of the second operand is multiplied by the corresponding element of the third operand using carryless multiplication. In one example, the Galois field has order two. The multiplication is similar to standard binary multiplication, except that the shifted multiplicands are combined by exclusive OR (XOR) instead of being added. For example, the resulting double-element-size products of each even-odd pair of elements are XORed with each other and with the corresponding (e.g., double-wide) element of the fourth operand. The result is placed, for example, into the corresponding double-wide element of the first operand.
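The multiply-sum-accumulate arithmetic can be sketched in Python as follows. This is a hypothetical model, not the architected definition: vector operands are lists of unsigned integers, `es_bits` is the element width of the second and third operands in bits, and the fourth operand and the result hold elements twice that wide, one per even-odd pair.

```python
def clmul(a: int, b: int) -> int:
    """Carryless multiply: shifted copies of a are XORed instead of added."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def vgfma(op2: list[int], op3: list[int], op4: list[int], es_bits: int) -> list[int]:
    """Hypothetical model of a Galois field multiply sum and accumulate."""
    mask = (1 << es_bits) - 1
    wide_mask = (1 << (2 * es_bits)) - 1
    op1 = []
    for j in range(0, len(op2), 2):
        # Products of an n-bit pair fit in 2n-1 bits, so no truncation is needed.
        even = clmul(op2[j] & mask, op3[j] & mask)
        odd = clmul(op2[j + 1] & mask, op3[j + 1] & mask)
        # XOR the pair of double-wide products with each other and the accumulator.
        op1.append((even ^ odd ^ op4[j // 2]) & wide_mask)
    return op1


# Byte elements: 0x03 (x+1) times 0x03 is 0x05 (x^2+1) in GF(2)[x].
print(vgfma([0x03, 0x02], [0x03, 0x03], [0x0001], 8))  # prints [2]
```

Full GF(2^n) arithmetic, as used for CRCs or GCM, additionally reduces the product modulo a field polynomial; the sketch stops at the carryless products and XOR accumulation described in the text.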

Condition code: The code remains unchanged.

Program exceptions:

Operation (if the vector facility for z/Architecture is not installed)

Specification

Transaction constraint

In another embodiment, the instruction may include fewer operands. For example, the value to be XORed is located in the first operand, which also receives the result, rather than in the fourth operand. Other variations are also possible.

Further details regarding one embodiment of execution of a vector galois field multiply sum and accumulate instruction are described with reference to fig. 6B and 6C. In one example, FIG. 6B illustrates one embodiment of logic executed by a processor to perform a vector Galois field multiply-sum-and-accumulate instruction and FIG. 6C is one example of a block diagram illustrating execution of the logic.

Referring to FIG. 6B, first, even/odd pairs are extracted from the second operand (OP2), the third operand (OP3) and the fourth operand (OP4) (step 630), and the carryless multiply-sum-accumulate function is performed (step 632). For example, when operating in a Galois field whose order is a power of 2, carryless multiplication consists of shifts and XORs (exclusive ORs), which effectively ignores any carry. The result is placed in the first operand (OP1) (step 634), and a determination is made as to whether there are more pairs to be extracted (INQUIRY 636). If there are more pairs, processing continues with step 630; otherwise, processing is complete (step 638). In one example, the element size 631 is an input to steps 630 and 634.

Further details of the carryless multiply-sum-accumulate function of step 632 are described with reference to FIG. 6C. As shown, a pair of operands OP2H 652a and OP2L 652b is extracted from the second operand 650. Further, operand pair OP3H 662a and OP3L 662b is extracted from the third operand 660, and operand pair OP4H 672a and OP4L 672b is extracted from the fourth operand 670. Operand OP2H 652a is multiplied by operand OP3H 662a using carryless multiplication, providing result H 680a. Similarly, OP2L 652b is multiplied by operand OP3L 662b using carryless multiplication, providing result L 680b. Result H 680a is then XORed with result L 680b, that result is XORed with operands OP4H 672a and OP4L 672b, and the results are placed into OP1H 690a and OP1L 690b.

Described herein is a vector instruction that performs carryless multiplication operations followed by a final exclusive-OR operation to create an accumulated sum. This technique can be used for error detection codes and for cryptographic operations performed in a finite field of order two.

In one example, the instruction performs carryless multiplication operations on a plurality of elements of a vector register to obtain sums. Further, the instruction performs a final exclusive OR on the sums to produce an accumulated sum. When executed, the instruction multiplies corresponding elements of the second and third vectors in a Galois field, performing an exclusive OR on the shifted multiplicands. The double-wide products are XORed with each other, and the result is XORed with the corresponding double-wide element of the first vector. The result is stored in the first vector register. Although double-wide elements are described above, other element sizes may be used; the instruction is capable of operating on a plurality of different element sizes.

Another instruction provided in accordance with an aspect is a Vector Generate Mask (VGM) instruction, an example of which is described with reference to FIG. 7A. In one example, the vector generate mask instruction 700 includes: opcode fields 702a (e.g., bits 0-7) and 702b (e.g., bits 40-47) that indicate a vector generate mask operation; a first vector register field 704 (e.g., bits 8-11) used to specify a first vector register (V₁); a first immediate field (I₂) 706 (e.g., bits 16-23) used to specify a first value; a second immediate field (I₃) 708 (e.g., bits 24-31) used to specify a second value; a mask field (M₄) 710 (e.g., bits 32-35); and an RXB field 712 (e.g., bits 36-39). In one example, each of the fields 704-712 is separate and independent from the opcode field(s). Further, in one embodiment, they are separate and independent from each other; however, in other embodiments, more than one field may be combined.

In another embodiment, the first value and/or the second value may be provided in a general register, in memory, in an element of a vector register (differing by element), or by an address calculation. Each may be included as an explicit operand of the instruction or as an implicit operand or input.

In one example, selected bits (e.g., the first two bits) of the opcode specified by opcode field 702a specify the length of the instruction. In this particular example, the selected bits indicate a length of three halfwords. Further, the format of the instruction is a vector register-and-immediate operation with an extended opcode field. Each vector (V) field, together with its corresponding extension bit specified by RXB, designates a vector register. Specifically, the register containing the operand is specified using the four-bit register field with its corresponding register extension bit (RXB) added as the most significant bit.

The M₄ field specifies, for example, an element size (ES) control. The element size control specifies the size of the elements in the vector register operand. In one example, a value of 0 in the M₄ field specifies a byte; 1 specifies a halfword (e.g., 2 bytes); 2 specifies a word (e.g., 4 bytes, i.e., a fullword); and 3 specifies a doubleword.

In executing one embodiment of the vector generate mask instruction, a bitmask is generated for each element in the first operand. The mask includes bits set to 1 starting at the bit position specified by, e.g., the unsigned integer value in I₂ and ending at the bit position specified by, e.g., the unsigned integer value in I₃. All other bit positions are set to 0. In one example, only the number of bits of the I₂ and I₃ fields required to represent all bit positions of the specified element size are used; the other bits are ignored. If the bit position in the I₂ field is greater than the bit position in the I₃ field, the range of bits wraps around at the maximum bit position for the specified element size. For example, assuming a byte-sized element, if I₂ = 1 and I₃ = 6, the resulting mask is X'7E' or B'01111110'. However, if I₂ = 7 and I₃ = 0, the resulting mask is X'81' or B'10000001'.
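The wrap-around rule can be illustrated with a short Python sketch. It is a hypothetical model under stated assumptions: bits are numbered 0 (leftmost) through n-1 as in z/Architecture, the element size is given as an ES value of 0-3 (byte, halfword, word, doubleword), the low-order bits of I₂ and I₃ are the ones used, and a vector register is 128 bits wide.

```python
def generate_mask(i2: int, i3: int, es: int) -> int:
    """Mask for one element: ones from bit i2 through bit i3, wrapping if i2 > i3."""
    nbits = 8 << es                 # ES 0..3 -> 8, 16, 32 or 64 bits
    start = i2 & (nbits - 1)        # use only the bits needed for a bit position
    end = i3 & (nbits - 1)
    mask = 0
    pos = start
    while True:
        mask |= 1 << (nbits - 1 - pos)  # bit 0 is the leftmost (most significant) bit
        if pos == end:
            break
        pos = (pos + 1) % nbits     # wrap around past the last bit position
    return mask


def vector_generate_mask(i2: int, i3: int, es: int) -> list[int]:
    """Replicate the element mask across a 128-bit vector register."""
    nbits = 8 << es
    return [generate_mask(i2, i3, es)] * (128 // nbits)


print(hex(generate_mask(1, 6, 0)), hex(generate_mask(7, 0, 0)))  # prints 0x7e 0x81
```

Note that with wrap-around, I₂ = 6 and I₃ = 1 would select bits 6, 7, 0 and 1 of a byte, giving X'C3'.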

Condition code: The code remains unchanged.

Program exceptions:

Operation (if the vector facility for z/Architecture is not installed)

Specification

Transaction constraint

Further details regarding one embodiment of a vector generate mask instruction are described with reference to fig. 7B and 7C. In particular, FIG. 7B illustrates one embodiment of the logic associated with a vector generate mask instruction executed by a processor; figure 7C is one example of a block diagram illustrating one embodiment of execution of a vector generate mask instruction.

Referring to FIG. 7B, first, a mask is generated for each element in the first operand (step 720). This step uses a number of inputs, including the value specified as the starting position in the second operand field (722), the value specified as the ending position in the third operand field (724), and the element size specified in the M₄ field (726). These inputs are used to generate a mask and to fill in the positions of a selected element (e.g., element 0) of the first operand (OP1) (step 730). For example, element 0 of the first operand (OP1) includes multiple positions (e.g., bit positions), and the positions (e.g., bits) of element 0 starting at the position specified by the unsigned integer value in I₂ and ending at the position specified by the unsigned integer value in I₃ are set to 1. The other bit positions are set to 0. Then, a determination is made as to whether there are more elements in the first operand (INQUIRY 734). If there are more elements, processing continues with step 720. Otherwise, processing is complete (step 736).

The generation of the mask and the filling of the first operand are shown graphically in FIG. 7C. As shown, a mask for each element of the first operand is generated (720) using the inputs (e.g., 722-726), and the resulting mask is stored in the elements of the first operand 740.

An instruction that generates a bitmask for each element of a vector is described in detail above. In one embodiment, the instruction takes a start bit position and an end bit position and generates a bitmask that is replicated in each element. The instruction specifies a range of bits; for each element of the vector register, each bit within this range is set to 1, while the other bits are set to 0.

In one embodiment, using an instruction to generate bitmasks provides advantages over, for example, loading the bitmasks from memory, which increases the cache footprint of the instruction stream and, depending on the number of masks required, may increase the latency of critical loops.

Yet another instruction provided in accordance with an aspect is a vector element rotate and insert under mask (VERIM) instruction, an example of which is illustrated in FIG. 8A. In one example, the vector element rotate and insert under mask instruction 800 includes: opcode fields 802a (e.g., bits 0-7) and 802b (e.g., bits 40-47) that indicate a vector element rotate and insert under mask operation; a first vector register field 804 (e.g., bits 8-11) used to specify a first vector register (V₁); a second vector register field 806 (e.g., bits 12-15) used to specify a second vector register (V₂); a third vector register field 808 (e.g., bits 16-19) used to specify a third vector register (V₃); an immediate field (I₄) 812 (e.g., bits 24-31) that includes, for example, an unsigned binary integer specifying the number of bits by which each element is rotated; a mask field (M₅) 814 (e.g., bits 32-35); and an RXB field 816 (e.g., bits 36-39). In one example, each of the fields 804-816 is separate and independent from the opcode field(s). Further, in one embodiment, they are separate and independent from each other; however, in other embodiments, more than one field may be combined.

In one example, selected bits (e.g., the first two bits) of the opcode specified by opcode field 802a specify the length of the instruction. In this particular example, the selected bits indicate a length of three halfwords. Further, the format of the instruction is a vector register-and-immediate operation with an extended opcode field. Each vector (V) field, together with its corresponding extension bit specified by RXB, designates a vector register. Specifically, the register containing the operand is specified using the four-bit register field with its corresponding register extension bit (RXB) added as the most significant bit.

The M₅ field specifies the element size (ES) control. The element size control specifies the size of the elements in the vector register operands. In one example, a value of 0 in the M₅ field specifies a byte; 1 specifies a halfword (e.g., 2 bytes); 2 specifies a word (e.g., 4 bytes, i.e., a fullword); and 3 specifies a doubleword.

When the vector element rotate and insert under mask instruction is executed, each element of the second operand is rotated left by the number of bits specified by the fourth operand. Each bit shifted out of the leftmost bit position of the element reenters at the rightmost bit position of the element. The third operand includes a mask in each element. For each bit in the third operand that is 1, the corresponding bit of the rotated element of the second operand replaces the corresponding bit in the first operand. For each bit in the third operand that is 0, the corresponding bit of the first operand remains unchanged. The second and third operands remain unchanged, except when the first operand is the same register as the second or third operand.

The fourth operand is, for example, an unsigned binary integer that specifies the number of bits that each element in the second operand is rotated. If the value is greater than the number of bits in the specified element size, the value is reduced modulo the number of bits in the element.
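The rotate and merge-under-mask behaviour described above can be modeled in Python. This sketch is illustrative only: operands are lists of unsigned integers of `nbits` bits each, the function names are hypothetical, and the rotation amount is reduced modulo the element width as described.

```python
def rotl(x: int, n: int, nbits: int) -> int:
    """Rotate an nbits-wide value left; bits leaving the left end reenter on the right."""
    m = (1 << nbits) - 1
    x &= m
    n %= nbits                       # amounts >= the element width are reduced modulo it
    return ((x << n) | (x >> (nbits - n))) & m


def verim(op1: list[int], op2: list[int], op3: list[int], i4: int, nbits: int) -> list[int]:
    """Rotate each op2 element left by i4 and insert it into op1 under the op3 mask."""
    m = (1 << nbits) - 1
    result = []
    for target, source, mask in zip(op1, op2, op3):
        rotated = rotl(source, i4, nbits)
        # Mask bit 1: take the rotated bit; mask bit 0: keep the op1 bit.
        result.append(((target & ~mask) | (rotated & mask)) & m)
    return result


print([hex(e) for e in verim([0xAA], [0x0F], [0xF0], 4, 8)])  # prints ['0xfa']
```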

In one example, the mask included in the third operand is generated using the VGM instruction described herein.

Condition code: The code remains unchanged.

Program exceptions:

Operation (if the vector facility for z/Architecture is not installed)

Specification

Transaction constraint

Programming notes:

1. A combination of VERIM and VGM may be used to perform all of the functions of the rotate-then-insert-selected-bits instructions.

2. Although the bits of the I₄ field are defined to contain an unsigned binary integer specifying the number of bits by which each element is rotated to the left, a negative value, which effectively specifies a rotation to the right, may also be encoded.
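Both programming notes can be illustrated together. The Python sketch below is hypothetical: names are illustrative, 32-bit word elements are assumed, and only the basic insert behaviour of a rotate-then-insert-selected-bits operation is mimicked. A VGM-style mask selects the bit range, a VERIM-style step rotates and inserts, and a negative rotation amount is encoded in the 8-bit I₄ field as its two's-complement value. Because every element width (8, 16, 32 or 64 bits) divides 256, rotating left by 256 - k is the same as rotating right by k.

```python
def rotl(x: int, n: int, nbits: int) -> int:
    """Rotate an nbits-wide value left by n (reduced modulo nbits)."""
    m = (1 << nbits) - 1
    x &= m
    n %= nbits
    return ((x << n) | (x >> (nbits - n))) & m


def range_mask(start: int, end: int, nbits: int) -> int:
    """VGM-style mask: ones from bit start to bit end (bit 0 leftmost), wrapping."""
    mask, pos = 0, start % nbits
    while True:
        mask |= 1 << (nbits - 1 - pos)
        if pos == end % nbits:
            break
        pos = (pos + 1) % nbits
    return mask


def rotate_insert_selected(target: int, source: int, start: int, end: int,
                           amount: int, nbits: int = 32) -> int:
    """Rotate source, then insert bits start..end of the result into target."""
    i4 = amount & 0xFF               # 8-bit I4 field; a negative amount becomes 256 - k
    mask = range_mask(start, end, nbits)
    rotated = rotl(source, i4, nbits)
    return (target & ~mask & ((1 << nbits) - 1)) | (rotated & mask)


# Insert the low-order byte of the source into bits 0-7 of the target.
print(hex(rotate_insert_selected(0x12345678, 0x000000AB, 0, 7, 24)))  # prints 0xab345678
# An encoded rotation of -8 rotates right by 8 and gives the same result.
print(hex(rotate_insert_selected(0x12345678, 0x000000AB, 0, 7, -8)))  # prints 0xab345678
```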

Further details regarding the execution of the vector element rotate and insert under mask instruction are described with reference to fig. 8B and 8C. In particular, FIG. 8B illustrates one embodiment of the logic associated with a vector element rotate and insert under mask instruction executed by a processor, and FIG. 8C illustrates graphically one example of execution of the vector element rotate and insert under mask instruction.

Referring to FIG. 8B, the selected elements of the second operand are rotated by an amount specified in the fourth operand (820) (step 830). If the value specified in the fourth operand is greater than the number of bits specified in the element size (822), the value is reduced modulo the number of bits in the element.

After the bits of the element are rotated, a merge under mask is performed (step 832). For example, for each bit of the third operand (824) that is 1, the corresponding bit of the rotated element of the second operand replaces the corresponding bit in the first operand.

Then, a determination is made as to whether there are more elements to be rotated (query 834). If there are more elements to be rotated, processing continues with step 830. Otherwise, processing is complete (step 836).

Referring to FIG. 8C, the elements of the second operand are rotated (830) based on inputs 820 and 822, as shown. Further, a merge under mask (832) is performed using input 824. The output is provided in the first operand 850.

One example of a vector element rotate and insert under mask instruction is described above. The instruction is used to rotate the elements of a selected operand by a defined number of bits. Although bits are specified, in another embodiment, the elements may be rotated by a number of positions, where the positions may be units other than bits. Furthermore, the instruction may be used with different element sizes.

As one example, such an instruction is used to select a particular bit range from a number of table lookups.

During execution of certain vector instructions or other SIMD operations, an exception may occur. When an exception occurs on a SIMD operation, it is generally not known which element of the vector register caused the exception; the software interrupt handler must fetch each element and redo the computation in scalar mode to determine which element or elements caused the exception. However, according to one aspect, when a machine (e.g., a processor) handles a program interrupt caused by a vector operation, an element index is reported, for example, indicating the lowest-indexed element in the vector that caused the exception. The software interrupt handler can then jump directly to the problematic element and perform any required or desired action.

For example, in one embodiment, when a vector data exception causes a program interrupt, the vector exception code (VXC) is stored, for example, in real storage location 147 (X'93'), and zeros are stored, for example, in real storage locations 144-146 (X'90'-X'92'). In another embodiment, if a specified bit (e.g., bit 45) of a specified control register (e.g., CR0) is 1, the VXC is also placed in the data exception code (DXC) field of the floating point control (FPC) register. When bit 45 of control register 0 is 0 and bit 46 of control register 0 is 1, the DXC in the FPC register and the contents of location 147 (X'93') are unpredictable.

In one embodiment, the VXC distinguishes between multiple types of vector floating point exceptions and indicates which element caused the exception. In one example, as shown in FIG. 9A, the vector exception code 900 includes a vector index (VIX) 902 and a vector interrupt code (VIC) 904. In one example, the vector index is in bits 0-3 of the vector exception code, and its value is the index of the leftmost element of the selected vector register that recognized the exception. Further, the vector interrupt code is in bits 4-7 of the vector exception code and has, for example, the following values:

In another embodiment, the VXC includes only the vector index or another position indicator of the element that caused the exception.

In one embodiment, the VXC may be set by a plurality of instructions, including, for example, the following: Vector Floating Point (FP) Add, Vector FP Compare Scalar, Vector FP Compare Equal, Vector FP Compare High or Equal, Vector FP Convert From Fixed 64-Bit, Vector FP Convert To Fixed 64-Bit, Vector FP Divide, Vector Load FP Integer, Vector FP Load Lengthened, Vector FP Load Rounded, Vector FP Multiply, Vector FP Multiply and Add, Vector FP Multiply and Subtract, Vector FP Square Root and Vector FP Subtract, as well as other types of vector and/or floating point instructions.

Further details regarding setting vector exception codes are described with reference to FIG. 9B. In one embodiment, a processor of the computing environment executes the logic.

Referring to FIG. 9B, first, an instruction that performs an operation on a vector register, such as one of the instructions listed above or another instruction, is executed (step 920). During execution of the instruction, an exception condition is encountered (step 922). In one example, the exception condition causes an interrupt. A determination is made as to which element of the vector register caused the exception (step 924). For example, one or more hardware units of the processor that perform computations on one or more elements of the vector register detect the exception and provide a signal. For instance, if multiple hardware units perform computations in parallel on multiple elements of a vector register and an exception is encountered during processing of one or more of the elements, each hardware unit that encountered the exception signals an exception condition, along with an indication of the element it is processing. In another embodiment, if the elements of the vector are processed sequentially and an exception is encountered during processing of an element, the hardware indicates which element in the sequence it was processing when the exception occurred.

Based on the signaled exception, a vector exception code is set (step 926). This includes, for example, indicating the location of the element in the vector register that caused the exception, and an interrupt code.

A vector exception code that enables efficient vector exception handling is described in detail above. In one example, when a machine processes a program interrupt caused by a vector operation, an element index is reported that indicates the lowest-indexed element in the vector register that caused the exception. As a specific example, if a vector add is being performed, each vector register has two elements, A0 + B0 and A1 + B1 are computed, and an inexact result is received for A0 + B0 but not for A1 + B1, then VIX is set to 0 and VIC is set to 0101. In another example, if A0 + B0 does not encounter an exception but A1 + B1 does, VIX is set to 1 (with VIC 0101). If both encounter an exception, VIX is set to 0, since that is the leftmost index position, and VIC is 0101.
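The reporting rule in these examples can be sketched as follows. This Python model is hypothetical: the per-element exception flags stand in for the signals from the hardware units, `make_vxc` and `decode_vxc` are illustrative names, and only the VIC value 0101 used in the examples above is modeled. Bits 0-3 of the 8-bit VXC are its leftmost (high-order) four bits, so VIX occupies the high nibble and VIC the low nibble.

```python
from typing import Optional

VIC_INEXACT = 0b0101  # VIC value used in the examples above for an inexact result


def make_vxc(exception_flags: list[bool], vic: int) -> Optional[int]:
    """Build a VXC byte: VIX (bits 0-3) = leftmost excepting element, VIC in bits 4-7."""
    for index, flagged in enumerate(exception_flags):
        if flagged:                     # lowest index wins when several elements signal
            return ((index & 0xF) << 4) | (vic & 0xF)
    return None                         # no element signalled an exception


def decode_vxc(vxc: int) -> tuple[int, int]:
    """Split a VXC byte into (VIX, VIC) so a handler can go straight to the element."""
    return vxc >> 4, vxc & 0xF


# Two-element vector add where only A1 + B1 is inexact.
vxc = make_vxc([False, True], VIC_INEXACT)
print(hex(vxc), decode_vxc(vxc))  # prints 0x15 (1, 5)
```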

Various vector instructions and vector exception codes are described in detail above, the exception code indicating the location of an exception within a vector register. In the flow charts provided, certain processes may occur sequentially; however, in one or more embodiments, the elements are processed in parallel, so, for example, it may not be necessary to check whether there are more elements to be processed. Many other variations are possible.

Furthermore, in other embodiments, the contents of one or more fields of an instruction may be provided in a general purpose register, in memory, in elements of a vector register (differing by element), or by address calculation. They may be included as explicit operands of the instruction or as implicit operands or inputs. Further, one or more instructions may use fewer operands or inputs, and conversely, one or more operands may be used for multiple operations or steps.

Furthermore, as described above, the element size control may be provided in ways other than a field of the instruction. For example, the element size may be specified by the opcode, such that a particular opcode of the instruction specifies both the operation and the size of the elements.

Herein, memory, main memory, storage, and main storage are used interchangeably unless otherwise indicated explicitly or by context.

As will be appreciated by one skilled in the art, the present invention may be embodied as a system, method or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects, all of which may generally be referred to herein as a "circuit," "module" or "system." Furthermore, in some embodiments, the invention may also take the form of a computer program product embodied in one or more computer readable media having computer readable program code embodied therein.

Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

Referring now to FIG. 10, in one example, a computer program product 1000 includes, for instance, one or more non-transitory computer-readable storage media 1002 having computer-readable program code means or logic 1004 stored thereon to provide and facilitate one or more aspects of the present invention.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The present invention is described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to one or more embodiments. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instruction means (instructions) which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In addition to the foregoing, one or more aspects of the present invention may be provided, offered, deployed, managed, serviced, etc. by a service provider who offers management of a user's environment. For example, a service provider can create, maintain, support, etc., computer code and/or computer infrastructure that performs one or more aspects of the present invention for one or more users. The service provider, in turn, may accept payment from the user, for example, according to a subscription and/or fee agreement. Additionally or alternatively, the service provider may receive payment from the sale of advertising content to one or more third parties.

In one aspect of the invention, an application may be deployed to perform one or more aspects of the invention. As one example, deploying an application comprises providing a computer infrastructure operable to perform one or more aspects of the present invention.

As yet another aspect of the present invention, a computing infrastructure may be deployed comprising integrating computer-readable code into a computer system, wherein the code in combination with the computing system is capable of performing one or more aspects of the present invention.

As yet another aspect of the present invention, a process for integrating computing infrastructure comprising integrating computer readable code into a computer system may be provided. The computer system includes a computer-readable medium, wherein the computer-readable medium includes one or more aspects of the present invention. The code in combination with the computer system is capable of performing one or more aspects of the present invention.

Although various embodiments are described above, these embodiments are by way of example only. For example, computing environments with other architectures may incorporate and use one or more aspects. Moreover, vectors having other sizes may be used, and changes to the instructions may be made without departing from one or more aspects. Further, registers other than vector registers may be used. Furthermore, in other embodiments, the vector operands may be storage locations, rather than vector registers. Other variations are also possible.

Moreover, other types of computing environments may benefit from one or more aspects of the present invention. By way of example, a data processing system that is suitable for storing and/or executing program code and that includes at least two processors coupled directly or indirectly to memory elements through a system bus can be used. The memory elements include, for instance, local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.

Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, DASD, magnetic tape, CDs, DVDs, thumb drives, and other storage media) can be coupled to the system either directly or through intervening I/O controllers. Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems, and Ethernet cards are just a few of the available types of network adapters.

Referring to FIG. 11, representative components of a host computer system 5000 to implement one or more aspects of the present invention are depicted. The representative host computer 5000 includes one or more CPUs 5001 in communication with computer memory (i.e., central storage) 5002, as well as I/O interfaces to storage media devices 5011 and networks 5010 for communicating with other computers or SANs and the like. The CPU 5001 conforms to an architecture having an architected instruction set and architected functionality. The CPU 5001 may have dynamic address translation (DAT) 5003 for translating program addresses (virtual addresses) into real addresses of memory. A DAT typically includes a translation lookaside buffer (TLB) 5007 for caching translations, so that later accesses to a block of computer memory 5002 do not incur the delay of address translation. Typically, a cache 5009 is used between the computer memory 5002 and the processor 5001. The cache 5009 may be hierarchical, having a large cache available to more than one CPU and smaller, faster (lower level) caches between the large cache and each CPU. In some embodiments, the lower level caches are split to provide separate lower level caches for instruction fetching and data accesses. In one embodiment, an instruction is fetched from memory 5002 by an instruction fetch unit 5004 via the cache 5009. The instruction is decoded in an instruction decode unit 5006 and dispatched (in some embodiments, with other instructions) to one or more instruction execution units 5008. Typically, several execution units 5008 are used, such as an arithmetic execution unit, a floating point execution unit, and a branch instruction execution unit. The instruction is executed by the execution unit, which accesses operands from instruction-specified registers or memory as needed. If an operand is to be loaded from or stored to memory 5002, a load/store unit 5005 typically handles the access under the control of the instruction being executed. Instructions may be executed in hardware circuitry, in internal microcode (firmware), or in a combination thereof.
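The benefit of the TLB 5007 described above can be illustrated with a minimal Python sketch. It assumes 4 KB pages and models the translation tables as a single hypothetical dictionary (`page_table`) mapping virtual page numbers to real frame numbers; real z/Architecture DAT walks multi-level region, segment, and page tables, which is not modeled here. The TLB is simply a cache of completed translations, so a repeated access to the same page skips the table lookup.

```python
PAGE_SHIFT = 12                    # assumption: 4 KB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class SimpleDat:
    """Toy dynamic address translation with a translation lookaside buffer."""

    def __init__(self, page_table: dict[int, int]):
        self.page_table = page_table    # hypothetical: virtual page number -> real frame number
        self.tlb: dict[int, int] = {}   # cached translations
        self.hits = 0
        self.misses = 0

    def translate(self, virtual_address: int) -> int:
        page = virtual_address >> PAGE_SHIFT
        offset = virtual_address & PAGE_MASK
        if page in self.tlb:
            self.hits += 1
            frame = self.tlb[page]
        else:
            self.misses += 1
            # Stand-in for the table walk; a KeyError plays the role of a
            # translation exception for an unmapped page.
            frame = self.page_table[page]
            self.tlb[page] = frame
        return (frame << PAGE_SHIFT) | offset


dat = SimpleDat({0x5: 0x2A})
print(hex(dat.translate(0x5123)))  # TLB miss, then table lookup: 0x2a123
print(hex(dat.translate(0x5FFC)))  # same page, TLB hit: 0x2affc
print(dat.hits, dat.misses)        # 1 1
```

Only the page number is translated; the byte offset within the page passes through unchanged, which is why one TLB entry serves every address in the page.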

As noted, a computer system includes information in local (or main) memory, as well as addressing, protection, and reference and change recording. Some aspects of addressing include the format of addresses, the concept of address spaces, the various types of addresses, and the manner in which one type of address is translated to another type of address. Some main memories include permanently assigned memory locations. Main memory provides the system with directly addressable, fast-access storage of data. Both data and programs must be loaded into main memory (from input devices) before they can be processed.

The main memory may include one or more smaller, faster-access cache memories, sometimes referred to as caches. The cache is typically physically associated with the CPU or I/O processor. The effects of the physical structure and use of different storage media are not typically observed by a program except in terms of performance.

Separate caches may be maintained for instructions and for data operands. Information within a cache is maintained in contiguous bytes on an integral boundary called a cache block or cache line (or line, for short). A model may provide an EXTRACT CACHE ATTRIBUTE instruction that returns the size of a cache line in bytes. A model may also provide PREFETCH DATA and PREFETCH DATA RELATIVE LONG instructions, which effect the prefetching of storage into the data or instruction cache, or the releasing of data from the cache.

Memory is viewed as a long horizontal string of bits. For most operations, accesses to memory proceed in left-to-right sequence. The string of bits is subdivided into units of eight bits. An eight-bit unit is called a byte, which is the basic building block of all information formats. Each byte location in memory is identified by a unique non-negative integer, which is the address of that byte location, or simply, the byte address. Adjacent byte locations have consecutive addresses, starting with 0 on the left and proceeding in left-to-right sequence. Addresses are unsigned binary integers and are 24, 31, or 64 bits.

Information is transferred between memory and the CPU or the channel subsystem one byte, or a group of bytes, at a time. Unless otherwise specified, in the z/Architecture, for example, a group of bytes in memory is addressed by the leftmost byte of the group. The number of bytes in the group is either implied or explicitly specified by the operation to be performed. When used in a CPU operation, a group of bytes is called a field.

Within each group of bytes, in the z/Architecture, for example, bits are numbered in left-to-right sequence. In the z/Architecture, the leftmost bits are sometimes referred to as the "high-order" bits and the rightmost bits as the "low-order" bits. Bit numbers, however, are not memory addresses; only bytes can be addressed. To operate on a single bit of a byte in memory, the entire byte is accessed. The bits in a byte are numbered 0 through 7, from left to right (in the z/Architecture, for example). The bits in an address are numbered 8-31 or 40-63 for 24-bit addresses, 1-31 or 33-63 for 31-bit addresses, and 0-63 for 64-bit addresses. In any other fixed-length format of multiple bytes, the bits making up the format are numbered consecutively starting from 0.

For purposes of error detection, and preferably for correction, one or more check bits may be transmitted with each byte or with a group of bytes. Such check bits are generated automatically by the machine and cannot be directly controlled by the program. Storage capacities are expressed in numbers of bytes. When the length of a memory-operand field is implied by the opcode of an instruction, the field is said to have a fixed length, which can be one, two, four, eight, or sixteen bytes. Larger fields may be implied for some instructions. When the length of a memory-operand field is not implied but is stated explicitly, the field is said to have a variable length. Variable-length operands can vary in length by increments of one byte (or, with some instructions, in multiples of two bytes or other multiples). When information is placed in memory, only the contents of the byte locations included in the designated field are replaced, even though the width of the physical path to memory may be greater than the length of the field being stored.
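Because bit 0 is the leftmost (high-order) bit in this numbering, software that uses conventional shift operators, which count from the low-order end, has to convert bit numbers before extracting a field. A minimal Python sketch, assuming the value is held as an unsigned integer of known width; the helper name `extract_bits` is hypothetical:

```python
def extract_bits(value: int, first: int, last: int, width: int = 64) -> int:
    """Return bits first..last of a width-bit value, where bit 0 is the
    leftmost (most significant) bit, as in the numbering described above."""
    if not 0 <= first <= last < width:
        raise ValueError("bit range outside the field")
    count = last - first + 1
    shift = width - 1 - last  # distance of the rightmost selected bit from the low-order end
    return (value >> shift) & ((1 << count) - 1)


reg = 0x0000_0000_1234_5678
print(hex(extract_bits(reg, 40, 63)))     # 24-bit address in bits 40-63: 0x345678
print(hex(extract_bits(reg, 33, 63)))     # 31-bit address in bits 33-63: 0x12345678
print(extract_bits(0x80, 0, 0, width=8))  # bit 0 of a byte is its leftmost bit: 1
```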

Some units of information are located on integral boundaries in memory. A boundary is called integral for a unit of information when its memory address is a multiple of the length of the unit in bytes. Special names are given to fields of 2, 4, 8, and 16 bytes on an integral boundary. A halfword is a group of two consecutive bytes on a two-byte boundary and is the basic building block of instructions. A word is a group of four consecutive bytes on a four-byte boundary. A doubleword is a group of eight consecutive bytes on an eight-byte boundary. A quadword is a group of 16 consecutive bytes on a 16-byte boundary. When a memory address designates a halfword, word, doubleword, or quadword, the binary representation of the address contains one, two, three, or four rightmost zero bits, respectively. Instructions are on two-byte integral boundaries. The memory operands of most instructions do not have boundary-alignment requirements.
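The rule that a halfword, word, doubleword, or quadword address has one, two, three, or four rightmost zero bits follows from the unit lengths being powers of two, so an integral-boundary check reduces to a bit mask. A minimal sketch; the table and function names are hypothetical:

```python
# Unit lengths in bytes, as named above.
UNIT_SIZES = {"halfword": 2, "word": 4, "doubleword": 8, "quadword": 16}


def rightmost_zero_bits(unit: str) -> int:
    """Number of low-order address bits that must be zero for the unit (1, 2, 3, or 4)."""
    return UNIT_SIZES[unit].bit_length() - 1


def on_integral_boundary(address: int, unit: str) -> bool:
    """True if address is a multiple of the unit's length in bytes."""
    size = UNIT_SIZES[unit]
    # size is a power of two, so a multiple of size has its low log2(size) bits clear
    return address & (size - 1) == 0


for unit in UNIT_SIZES:
    print(unit, rightmost_zero_bits(unit), on_integral_boundary(0x1004, unit))
# halfword 1 True
# word 2 True
# doubleword 3 False
# quadword 4 False
```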

On devices that implement separate caches for instructions and data operands, significant delays may be experienced if a program stores in a cache line and an instruction is subsequently fetched from the cache line, regardless of whether the store alters the subsequently fetched instruction.

In one embodiment, the invention may be implemented by software (sometimes referred to as licensed internal code, firmware, microcode, millicode, picocode, etc., any of which may be consistent with one or more aspects). Referring to fig. 11, software program code embodying one or more aspects is typically accessible by the processor 5001 of the host system 5000 from a long-term storage media device 5011, such as a CD-ROM drive, tape drive or hard drive. The software program code may be embodied on any of a variety of known media for use with a data processing system, such as a floppy disk, a hard drive, or a CD-ROM. The code may be distributed on such media, or may be distributed to users of other computer systems from the computer memory 5002 or storage devices of one computer system over the network 5010 for use by users of such other systems.

The software program code includes an operating system which controls the function and interaction of the various computer components and one or more application programs. The program code is typically paged from the storage media device 5011 to the relatively higher speed computer memory 5002, where it is available to the processor 5001. Techniques and methods for embodying software program code in memory, on physical media, and/or distributing software code via networks are well known and will not be discussed further herein. When the program code is created and stored on a tangible medium, including but not limited to an electronic memory module (RAM), flash memory, Compact Discs (CDs), DVDs, tapes, etc., it is often referred to as a "computer program product". The computer program product medium is typically readable by a processing circuit, preferably located in a computer system, for execution by the processing circuit.

FIG. 12 illustrates a representative workstation or server hardware system in which one or more aspects may be implemented. The system 5020 of FIG. 12 includes a representative base computer system 5021, such as a personal computer, workstation, or server, including optional peripheral devices. The base computer system 5021 includes one or more processors 5026 and a bus used to connect and enable communication between the processors 5026 and the other components of the system 5021 in accordance with known techniques. The bus connects the processor 5026 to memory 5025 and long-term storage 5027, which may include a hard drive (including any of magnetic media, CD, DVD, and flash memory, for example) or a tape drive, for example. The system 5021 can also include a user interface adapter that connects the microprocessor 5026 via the bus to one or more interface devices, such as a keyboard 5024, a mouse 5023, a printer/scanner 5030, and/or other interface devices, which can be any user interface device, such as a touch-sensitive screen, a digitized entry pad, etc. The bus may also connect a display device 5022, such as an LCD screen or monitor, to the microprocessor 5026 via a display adapter.

The system 5021 may communicate with other computers or networks of computers via a network adapter 5028 capable of communicating with a network 5029. Exemplary network adapters are communications channels, token ring, Ethernet, or modems. Alternatively, the system 5021 may communicate using a wireless interface, such as a CDPD (cellular digital packet data) card. The system 5021 can be associated with such other computers in a Local Area Network (LAN) or a Wide Area Network (WAN), or the system 5021 can be a client in a client/server arrangement with another computer, etc. All of these configurations, as well as the appropriate communication hardware and software, are known in the art.

Fig. 13 illustrates a data processing network 5040 in which one or more aspects may be implemented. The data processing network 5040 may include a plurality of separate networks, such as wireless and wired networks, each of which may include a plurality of separate workstations 5041, 5042, 5043, 5044. Further, those skilled in the art will appreciate that one or more LANs may be included, wherein a LAN may include a plurality of intelligent workstations coupled to a host processor.

Still referring to FIG. 13, the network may also include mainframe computers or servers, such as a gateway computer (client server 5046) or application server (remote server 5048, which may access a data repository and may also be accessed directly from a workstation 5045). The gateway computer 5046 serves as a point of entry into each individual network. A gateway is needed when connecting one networking protocol to another. The gateway 5046 may preferably be coupled to another network (e.g., the Internet 5047) by a communications link. The gateway 5046 may also be directly coupled to one or more workstations 5041, 5042, 5043, 5044 using a communications link. The gateway computer may be implemented using an IBM eServer™ System z server available from International Business Machines Corporation.

Referring concurrently to fig. 12 and 13, software programming code which may embody one or more aspects of the present invention may be accessed by the processor 5026 of the system 5020 from long-term storage media 5027, such as a CD-ROM drive or hard drive. The software programming code may be embodied on any of a variety of known media for use with a data processing system, such as a floppy disk, a hard drive, or a CD-ROM. The code may be distributed on such media, or from the memory or storage of one computer system over a network to users 5050, 5051 of other computer systems for use by users of such other systems.

Alternatively, the programming code may be embodied in the memory 5025 and accessed by the processor 5026 using a processor bus. Such programming code includes an operating system which controls the function and interaction of the various computer components and one or more application programs 5032. Program code is typically paged from the storage medium 5027 to high-speed memory 5025 where it is available for processing by the processor 5026. Techniques and methods for embodying software programming code in memory, on physical media, and/or distributing software code via networks are well known and will not be discussed further herein. Program code, when created and stored on tangible media, including but not limited to electronic memory modules (RAM), flash memory, Compact Discs (CDs), DVDs, tapes, etc., is commonly referred to as a "computer program product". The computer program product medium is typically readable by a processing circuit, preferably located in a computer system, for execution by the processing circuit.

The cache most readily used by the processor (which is typically faster and smaller than the other caches of the processor) is the lowest level (L1 or level 1) cache, and main storage (main memory) is the highest level cache (L3 if there are three levels). The lowest level cache is often divided into an instruction cache (I-cache) that holds the machine instructions to be executed, and a data cache (D-cache) that holds the data operands.

Referring to FIG. 14, an exemplary processor embodiment is shown for the processor 5026. Typically, one or more levels of cache 5053 are used to buffer memory blocks in order to improve processor performance. The cache 5053 is a high-speed buffer holding cache lines of memory data that are likely to be used. Typical cache lines are 64, 128, or 256 bytes of memory data. Separate caches are often used for caching instructions and for caching data. Cache coherence (synchronization of copies of lines in memory and the caches) is often provided by various "snoop" algorithms well known in the art. The main memory 5025 of a processor system is often referred to as a cache. In a processor system having 4 levels of cache 5053, main memory 5025 is sometimes referred to as the level 5 (L5) cache, since it is typically faster than, and holds only a portion of, the non-volatile storage (DASD, tape, etc.) that is available to the computer system. Main memory 5025 "caches" pages of data paged in and out of main memory 5025 by the operating system.
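Because typical line sizes such as 64, 128, or 256 bytes are powers of two, the line that holds a given address, and whether an operand straddles a line boundary, can be determined with simple arithmetic. A hedged sketch; the line size is passed in as a parameter here, whereas on a real model it would come from the machine (for example, via EXTRACT CACHE ATTRIBUTE), and the function names are hypothetical:

```python
def cache_line_base(address: int, line_size: int) -> int:
    """Address of the first byte of the cache line containing address."""
    if line_size <= 0 or line_size & (line_size - 1):
        raise ValueError("line size must be a power of two")
    return address & ~(line_size - 1)


def lines_spanned(address: int, length: int, line_size: int) -> int:
    """Number of cache lines touched by an operand of `length` bytes at `address`."""
    if length < 1:
        raise ValueError("operand length must be at least one byte")
    first = address // line_size
    last = (address + length - 1) // line_size
    return last - first + 1


print(hex(cache_line_base(0x12345, 256)))  # 0x12300
print(lines_spanned(0xFC, 8, 256))         # 2: bytes 0xFC-0x103 cross a line boundary
print(lines_spanned(0x100, 8, 256))        # 1
```

An operand that spans two lines may require two cache accesses, which is one practical reason alignment still matters for performance even where the architecture does not require it.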

The program counter (instruction counter) 5061 keeps track of the address of the current instruction to be executed. The program counter in a z/Architecture processor is 64 bits and may be truncated to 31 or 24 bits to support prior addressing limits. The program counter is typically embodied in the PSW (program status word) of the computer so that it persists across context switches. Thus, a program in progress, having a program counter value, may be interrupted by, for example, the operating system (a context switch from the program environment to the operating system environment). While a program is inactive, its PSW maintains its program counter value, and while the operating system executes, the program counter (in the PSW) of the operating system is used. Typically, the program counter is incremented by an amount equal to the number of bytes of the current instruction. RISC (Reduced Instruction Set Computing) instructions are typically of fixed length, while CISC (Complex Instruction Set Computing) instructions are typically of variable length. Instructions of the IBM z/Architecture are CISC instructions having a length of 2, 4, or 6 bytes. The program counter 5061 is modified by, for example, a context switch operation or the branch taken operation of a branch instruction. In a context switch operation, the current program counter value is saved in the program status word along with other state information about the program being executed (such as condition codes), and a new program counter value is loaded, pointing to an instruction of the new program module to be executed. A branch taken operation is performed to permit the program to make decisions or loop within the program by loading the target address produced by the branch instruction into the program counter 5061.
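In the z/Architecture, the length of an instruction is encoded in the first two bits of its opcode (00 for 2 bytes, 01 or 10 for 4 bytes, 11 for 6 bytes), which is what allows the program counter to be advanced by the length of the current instruction as soon as its first byte is seen. A minimal sketch, under the assumption that the updated address wraps within the current addressing mode; the function names are hypothetical:

```python
def instruction_length(first_opcode_byte: int) -> int:
    """Length in bytes of a z/Architecture instruction, from bits 0-1 of its first byte."""
    return (2, 4, 4, 6)[(first_opcode_byte >> 6) & 0b11]


def next_sequential_address(pc: int, first_opcode_byte: int, amode: int = 64) -> int:
    """Advance the program counter past the current instruction, wrapping
    within the addressing mode (assumed to be 24, 31, or 64 bits)."""
    return (pc + instruction_length(first_opcode_byte)) & ((1 << amode) - 1)


print(instruction_length(0x1A))  # AR (RR format): 2
print(instruction_length(0x5A))  # A (RX format): 4
print(instruction_length(0xE3))  # first byte of the RXY-format opcodes: 6
print(hex(next_sequential_address(0xFFFFFE, 0x1A, amode=24)))  # wraps to 0x0
```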

Typically, instructions are fetched on behalf of the processor 5026 using an instruction fetch unit 5055. The fetch unit may fetch "next sequential instructions," target instructions of taken branches, or the first instructions of a program following a context switch. Modern instruction fetch units often use prefetch techniques to speculatively prefetch instructions based on the likelihood that the prefetched instructions will be used. For example, the fetch unit may fetch 16 bytes of instructions, including the next sequential instruction and additional bytes of further sequential instructions.

The fetched instructions are then executed by the processor 5026. In one embodiment, the fetched instructions are passed to the dispatch unit 5056 of the fetch unit. The dispatch unit decodes the instructions and forwards information about the decoded instructions to the appropriate units 5057, 5058, 5060. The execution unit 5057 will typically receive information about decoded arithmetic instructions from the instruction fetch unit 5055 and will perform arithmetic operations on operands according to the opcode of the instruction. Operands are preferably provided to the execution unit 5057 from memory 5025, from architected registers 5059, or from an immediate field of the instruction being executed. Results of the execution, when stored, are stored in memory 5025, in registers 5059, or in other machine hardware (such as control registers, PSW registers, etc.).

The processor 5026 typically has one or more units 5057, 5058, 5060 for executing the function of instructions. Referring to FIG. 15A, an execution unit 5057 may communicate with architected general registers 5059, the decode/dispatch unit 5056, the load/store unit 5060, and other processor units 5065 via interface logic 5071. The execution unit 5057 may use several register circuits 5067, 5068, 5069 to hold information that the Arithmetic Logic Unit (ALU) 5066 is to operate on. The ALU performs arithmetic operations such as add, subtract, multiply, and divide, as well as logical operations such as AND, OR, exclusive OR (XOR), rotate, and shift. Preferably, the ALU supports specialized operations that are design dependent. Other circuitry may provide other architected facilities 5072, including condition codes and recovery support logic, for example. Typically, the result of an ALU operation is held in an output register circuit 5070, which may forward the result to a variety of other processing functions. There are many arrangements of processor units; this description is intended only to provide a representative understanding of one embodiment.
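Functionally, the ALU can be pictured as a table that maps the operation selected by the decoded opcode to a function over fixed-width operands. A minimal sketch, assuming 64-bit unsigned operands and a hypothetical operation table; multiply, divide, and condition-code setting are omitted, and using only the low 6 bits of a shift count is an assumption of this sketch:

```python
MASK64 = (1 << 64) - 1


def rotate_left(value: int, amount: int, width: int = 64) -> int:
    """Rotate a width-bit value left; bits shifted out on the left re-enter on the right."""
    amount %= width
    mask = (1 << width) - 1
    return ((value << amount) | (value >> (width - amount))) & mask


# Hypothetical operation table: operation name -> function of two 64-bit operands.
ALU_OPS = {
    "add": lambda a, b: (a + b) & MASK64,       # wraps modulo 2**64
    "sub": lambda a, b: (a - b) & MASK64,
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
    "shl": lambda a, b: (a << (b & 63)) & MASK64,
    "rotl": lambda a, b: rotate_left(a, b & 63),
}


def alu(op: str, a: int, b: int) -> int:
    """Apply the selected operation to operands truncated to 64 bits."""
    return ALU_OPS[op](a & MASK64, b & MASK64)


print(hex(alu("rotl", 0x8000_0000_0000_0001, 1)))  # 0x3
print(bin(alu("xor", 0b1100, 0b1010)))             # 0b110
print(alu("sub", 0, 1) == MASK64)                  # True
```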

For example, an ADD instruction would be executed in an execution unit 5057 having arithmetic and logical functionality, while a floating point instruction would be executed in a floating point execution unit having specialized floating point capability. Preferably, an execution unit operates on the operands identified by an instruction by performing the function defined by the opcode on those operands. For example, an ADD instruction may be executed by the execution unit 5057 on operands found in two registers 5059 identified by register fields of the instruction.

The execution unit 5057 performs the arithmetic addition on the two operands and stores the result in a third operand, which may be a third register or one of the two source registers. The execution unit preferably uses an Arithmetic Logic Unit (ALU) 5066 that can perform a variety of logical functions, such as shift, rotate, AND, OR, and XOR, as well as a variety of algebraic functions, including add, subtract, multiply, and divide. Some ALUs 5066 are designed for scalar operations and some for floating point. Depending on the architecture, data may be big endian (where the least significant byte is at the highest byte address) or little endian (where the least significant byte is at the lowest byte address). The IBM z/Architecture is big endian. Depending on the architecture, signed fields may be in sign-and-magnitude, 1's complement, or 2's complement form. A 2's complement number is advantageous in that the ALU does not need a separate subtraction capability: subtracting a value, whether negative or positive, requires only adding its 2's complement in the ALU. Numbers are commonly described in shorthand; for example, a 12-bit field defines an address of a 4,096-byte block and is commonly described as a 4 Kbyte block.
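The two representation points above can be checked with a short sketch: subtraction carried out purely as addition of the 2's complement, and the byte order of a 32-bit word in big-endian versus little-endian storage. The helper names are hypothetical and the 32-bit width is an assumption chosen for illustration:

```python
def twos_complement_negate(value: int, width: int = 32) -> int:
    """Invert all bits and add one, within a width-bit field."""
    mask = (1 << width) - 1
    return (~value + 1) & mask


def subtract_with_adder(a: int, b: int, width: int = 32) -> int:
    """Compute a - b using only addition of b's 2's complement, then read the
    width-bit result back as a signed number."""
    mask = (1 << width) - 1
    raw = (a + twos_complement_negate(b, width)) & mask
    return raw - (1 << width) if raw >> (width - 1) else raw


print(subtract_with_adder(5, 7))     # -2
print(subtract_with_adder(-3, -10))  # 7

word = 0x12345678
print(word.to_bytes(4, "big").hex())     # '12345678': low-order byte 0x78 at the highest address
print(word.to_bytes(4, "little").hex())  # '78563412': low-order byte 0x78 at the lowest address
print(1 << 12)                           # a 12-bit field spans 4096 byte addresses (a 4K block)
```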

Referring to FIG. 15B, branch instruction information for executing a branch instruction is typically sent to a branch unit 5058, which often uses a branch prediction algorithm, such as a branch history table 5082, to predict the outcome of the branch before other conditional operations are complete. The target of the current branch instruction is fetched and speculatively executed before the conditional operations are complete. When the conditional operations are completed, the speculatively executed branch instructions are either completed or discarded based on the conditions of the conditional operation and the speculated outcome. A typical branch instruction may test condition codes and branch to a target address if the condition codes meet the branch requirement of the branch instruction. The target address may be calculated based on several numbers, including, for example, ones found in register fields or an immediate field of the instruction. The branch unit 5058 may use an ALU 5074 having a plurality of input register circuits 5075, 5076, 5077 and an output register circuit 5080. The branch unit 5058 may communicate with, for example, general registers 5059, the decode/dispatch unit 5056, or other circuitry 5073.
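As one concrete instance of testing a condition code against a branch requirement, the sketch below follows the 4-bit mask convention of the z/Architecture BRANCH ON CONDITION family, in which mask bit 0 (value 8) selects condition code 0 and mask bit 3 (value 1) selects condition code 3, and computes a relative branch target from a signed halfword count in an immediate field. The source does not describe these encodings; they are used here only as an illustration, and the function names are hypothetical:

```python
def branch_taken(mask: int, condition_code: int) -> bool:
    """Mask bits 0-3 (values 8, 4, 2, 1) select condition codes 0-3."""
    if not 0 <= condition_code <= 3:
        raise ValueError("condition code must be 0-3")
    return bool(mask & (8 >> condition_code))


def relative_branch_target(instruction_address: int, imm16: int, amode: int = 64) -> int:
    """Target = branch instruction address + 2 * signed 16-bit halfword offset."""
    offset = imm16 - 0x10000 if imm16 & 0x8000 else imm16
    return (instruction_address + 2 * offset) & ((1 << amode) - 1)


print(branch_taken(0b1111, 2))                  # mask 15 branches on any CC: True
print(branch_taken(0b1000, 1))                  # mask 8 branches only on CC 0: False
print(hex(relative_branch_target(0x1000, 0xFFFE)))  # two halfwords back: 0xffc
```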

Execution of a group of instructions may be interrupted for a variety of reasons, including, for example, a context switch initiated by the operating system, a program exception or error causing a context switch, an I/O interruption signal causing a context switch, or multi-threading activity of multiple programs (in a multi-threaded environment). Preferably, a context switch action saves state information about the currently executing program and then loads state information about another program being invoked. The state information may be saved, for example, in hardware registers or in memory. The state information preferably includes a program counter value pointing to the next instruction to be executed, condition codes, memory translation information, and architected register contents. Context switch activity may be performed by hardware circuitry, application programs, operating system programs, or firmware code (microcode, picocode, or Licensed Internal Code (LIC)), alone or in combination.
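The save-then-load pattern of a context switch can be sketched in a few lines. The `ProgramState` record below is hypothetical and holds only a subset of the state listed above (program counter, condition code, and 16 general registers); memory translation information is omitted:

```python
from dataclasses import dataclass, field


@dataclass
class ProgramState:
    """Hypothetical saved state for one program."""
    program_counter: int
    condition_code: int = 0
    registers: list[int] = field(default_factory=lambda: [0] * 16)


def context_switch(cpu: ProgramState, saved: dict[str, ProgramState],
                   current: str, incoming: str) -> ProgramState:
    """Save the running program's state under `current`, then return the
    previously saved state of `incoming` so that it can be resumed."""
    saved[current] = ProgramState(cpu.program_counter, cpu.condition_code, list(cpu.registers))
    return saved.pop(incoming)


saved = {"os": ProgramState(program_counter=0x8000)}
cpu = ProgramState(program_counter=0x1000, condition_code=2)
cpu.registers[1] = 42
cpu = context_switch(cpu, saved, current="app", incoming="os")  # run the operating system
cpu = context_switch(cpu, saved, current="os", incoming="app")  # resume the application
print(hex(cpu.program_counter), cpu.condition_code, cpu.registers[1])  # 0x1000 2 42
```

Copying the register list when saving keeps the saved snapshot independent of later changes made by the incoming program.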

The processor accesses operands in the manner defined by the instruction. The instruction may provide an immediate operand using the value of a portion of the instruction, or may provide one or more register fields that explicitly point to either general purpose registers or special purpose registers (for example, floating point registers). The instruction may use implied registers, identified by the opcode field, as operands. The instruction may use memory locations for operands. The memory location of an operand may be provided by a register, an immediate field, or a combination of registers and an immediate field, as exemplified by the z/Architecture long displacement facility, in which the instruction defines a base register, an index register and an immediate field (displacement field) that are added together to provide, for example, the address of the operand in memory. Unless otherwise indicated, a location herein typically means a location in main memory (main storage).
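The base + index + displacement address generation described above can be sketched as follows. The sketch assumes z/Architecture conventions for RXY-format operands: a base or index register number of 0 contributes zero rather than the contents of general register 0, the long displacement is the 20-bit signed value formed by concatenating the DH and DL fields, and the sum wraps to the width of the current addressing mode. The function and parameter names are illustrative.

```python
from typing import Sequence


def effective_address(gprs: Sequence[int], b: int, x: int, dl: int,
                      dh: int = 0, amode_bits: int = 64) -> int:
    """Sketch of base + index + displacement operand-address generation.

    gprs: values of the 16 general registers
    b, x: base and index register numbers (0 means "no register")
    dl:   12-bit low-order displacement field
    dh:   8-bit high-order displacement field (long-displacement form)
    """
    if not (0 <= dl < 1 << 12 and 0 <= dh < 1 << 8):
        raise ValueError("displacement field out of range")
    disp = (dh << 12) | dl          # 20-bit DH||DL value
    if disp & (1 << 19):            # sign-extend: the long displacement is signed
        disp -= 1 << 20
    base = gprs[b] if b else 0      # register 0 contributes zero, not GR0
    index = gprs[x] if x else 0
    return (base + index + disp) % (1 << amode_bits)  # wrap to the addressing mode
```

For example, with general register 5 holding 0x1000 and DH = 0xFF, DL = 0xFFF (a displacement of -1), the address is 0xFFF.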

Referring to FIG. 15C, a processor accesses storage using a load/store unit 5060. The load/store unit 5060 may perform a load operation by obtaining the address of the target operand in memory 5053 and loading the operand into a register 5059 or another memory 5053 location, or may perform a store operation by obtaining the address of the target operand in memory 5053 and storing data obtained from a register 5059 or another memory 5053 location in the target operand location in memory 5053. The load/store unit 5060 may be speculative and may access memory in a sequence that is out of order relative to the instruction sequence; however, the load/store unit 5060 maintains, for programs, the appearance that instructions were executed in order. The load/store unit 5060 may communicate with general registers 5059, decode/dispatch unit 5056, cache/memory interface 5053 or other elements 5083, and comprises various register circuits, ALUs 5085 and control logic 5090 to calculate storage addresses and to provide pipeline sequencing that keeps operations in order. Some operations may be out of order, but the load/store unit provides functionality so that the out-of-order operations appear to the program to have been performed in order, as is well known in the art.
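One well-known way to preserve in-order appearance while stores reach memory late is store-to-load forwarding from a store buffer. The following is a simplified model of that technique, not the design of load/store unit 5060: it assumes whole-word accesses with no partial overlaps and ignores other processors, and the class name is hypothetical.

```python
from typing import Dict, List, Tuple


class LoadStoreUnit:
    """Simplified model: stores are buffered in program order and written to
    memory later, while loads check the buffer first (store-to-load
    forwarding), so a load returns the value an in-order machine would."""

    def __init__(self, memory: Dict[int, int]) -> None:
        self.memory = memory
        self.store_buffer: List[Tuple[int, int]] = []  # (address, value), oldest first

    def store(self, addr: int, value: int) -> None:
        self.store_buffer.append((addr, value))   # memory is not updated yet

    def load(self, addr: int) -> int:
        for buffered_addr, value in reversed(self.store_buffer):  # youngest first
            if buffered_addr == addr:
                return value
        return self.memory.get(addr, 0)           # no pending store: read memory

    def drain(self) -> None:
        for addr, value in self.store_buffer:     # commit in program order
            self.memory[addr] = value
        self.store_buffer.clear()
```

Searching from the youngest buffered store matters: with two pending stores to the same address, the later one in program order must win.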

Addresses that an application program "sees" are often referred to as virtual addresses. Virtual addresses are sometimes referred to as "logical addresses" or "effective addresses". These virtual addresses are virtual in that they are redirected to a physical memory location by one of a variety of Dynamic Address Translation (DAT) techniques. These techniques include, but are not limited to, simply prefixing a virtual address with an offset value, and translating the virtual address via one or more translation tables; the translation tables preferably comprise at least a segment table and a page table, alone or in combination, with the segment table preferably having an entry pointing to the page table. In the z/Architecture, a hierarchy of translation is provided, including a region first table, a region second table, a region third table, a segment table and an optional page table. The performance of address translation is often improved by a Translation Lookaside Buffer (TLB), which comprises entries mapping virtual addresses to the associated physical memory locations. An entry is created when DAT translates a virtual address using the translation tables. Subsequent uses of that virtual address can then use the entry in the fast TLB rather than the slow sequential translation table accesses. TLB content may be managed by a variety of replacement algorithms, including LRU (least recently used).
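The TLB behavior described above can be sketched with an LRU-ordered dictionary. To stay short, the sketch replaces the multi-level z/Architecture table hierarchy with a hypothetical single-level page table (a dict from virtual page number to frame number) and assumes 4 KB pages; a missing page-table entry simply raises `KeyError`, standing in for a translation exception.

```python
from collections import OrderedDict
from typing import Dict

PAGE_SHIFT = 12  # assumed 4 KB pages


class Tlb:
    def __init__(self, page_table: Dict[int, int], capacity: int = 4) -> None:
        self.page_table = page_table  # hypothetical flat table: page -> frame
        self.capacity = capacity
        self.entries: "OrderedDict[int, int]" = OrderedDict()  # LRU entry first
        self.walks = 0                # number of slow table lookups performed

    def translate(self, vaddr: int) -> int:
        vpage = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        if vpage in self.entries:
            self.entries.move_to_end(vpage)      # hit: now most recently used
        else:
            self.walks += 1
            frame = self.page_table[vpage]       # slow path through the table
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
            self.entries[vpage] = frame          # create the TLB entry
        return (self.entries[vpage] << PAGE_SHIFT) | offset
```

`move_to_end` on a hit and `popitem(last=False)` on eviction together implement the LRU policy mentioned in the text.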

Where the processors are processors of a multiprocessor system, each processor is responsible for keeping shared resources, such as I/O, caches, TLBs and memory, interlocked for coherency. Typically, "snoop" techniques are used to maintain cache coherency. In a snoop environment, each cache line may be marked as being in one of a shared state, an exclusive state, a changed state, an invalid state and the like, in order to facilitate sharing.
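The four line states listed above correspond to the widely used MESI protocol, with "changed" playing the role of "modified". The description does not say which protocol is used, so the following transition functions are a sketch of textbook MESI for one cache's copy of a line, with bus messages and write-backs described only in comments; the names are illustrative.

```python
from enum import Enum


class LineState(Enum):
    MODIFIED = "M"   # the "changed" state
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def on_local_read(state: LineState, other_caches_have_copy: bool) -> LineState:
    if state is LineState.INVALID:   # miss: the line is fetched
        return LineState.SHARED if other_caches_have_copy else LineState.EXCLUSIVE
    return state                     # hit: state unchanged


def on_local_write(state: LineState) -> LineState:
    # From SHARED or INVALID the cache must first invalidate other copies
    # (not modeled); from EXCLUSIVE the upgrade needs no bus traffic.
    return LineState.MODIFIED


def on_snooped_read(state: LineState) -> LineState:
    # Another processor reads the line; a MODIFIED line is written back first.
    return LineState.INVALID if state is LineState.INVALID else LineState.SHARED


def on_snooped_invalidate(state: LineState) -> LineState:
    # Another processor takes the line for writing; a MODIFIED line is
    # written back first, then this copy is discarded.
    return LineState.INVALID
```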

The I/O units 5054 (FIG. 14) provide the processor with means for attaching to peripheral devices including, for example, tapes, disks, printers, displays and networks. I/O units are often presented to the computer program by software drivers. In mainframes such as System z from IBM, channel adapters and open system adapters are the I/O units that provide communication between the operating system and peripheral devices.

Moreover, other types of computing environments can benefit from one or more aspects of the present invention. As an example, an environment may include an emulator (e.g., software or other emulation mechanisms), in which a particular architecture (including, for example, instruction execution, architected functions such as address translation, and architected registers) or a subset thereof is emulated (e.g., on a native computer system having a processor and memory). In such an environment, one or more emulation functions of the emulator can implement one or more aspects of the present invention, even though the computer executing the emulator may have an architecture different from the one being emulated. As one example, in emulation mode, the specific instruction or operation being emulated is decoded, and an appropriate emulation function is built to implement that individual instruction or operation.

In an emulation environment, a host computer includes, for example, memory to store instructions and data; an instruction fetch unit to fetch instructions from memory and, optionally, to provide local buffering for the fetched instructions; an instruction decode unit to receive the fetched instructions and to determine the type of instructions that have been fetched; and an instruction execution unit to execute the instructions. Execution may include loading data from memory into a register; storing data from a register back to memory; or performing some type of arithmetic or logical operation, as determined by the decode unit. In one example, each unit is implemented in software. For instance, the operations performed by the units are implemented as one or more subroutines within emulator software.
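The fetch, decode and execute units described above, each implemented as a subroutine, can be sketched as a small interpreter loop. The instruction set here is entirely hypothetical (16-bit words with an 8-bit opcode and two 4-bit register numbers, word-addressed memory) and exists only to show the structure; it does not model any real architecture.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy instruction set: 16-bit words, opcode(8) | r1(4) | r2(4).
HALT, LOAD, STORE, ADD = 0x00, 0x01, 0x02, 0x03


def encode(opcode: int, r1: int, r2: int) -> int:
    return (opcode << 8) | (r1 << 4) | r2


class Emulator:
    def __init__(self, memory: List[int]) -> None:
        self.mem = memory          # one list holds both instructions and data
        self.regs = [0] * 16
        self.pc = 0                # emulated program counter
        self.running = True
        # Execution "unit": one subroutine per decoded opcode.
        self.execute: Dict[int, Callable[[int, int], None]] = {
            HALT: self._halt, LOAD: self._load, STORE: self._store, ADD: self._add,
        }

    def fetch(self) -> int:                           # instruction fetch unit
        word = self.mem[self.pc]
        self.pc += 1
        return word

    @staticmethod
    def decode(word: int) -> Tuple[int, int, int]:    # instruction decode unit
        return word >> 8, (word >> 4) & 0xF, word & 0xF

    def _halt(self, r1: int, r2: int) -> None:
        self.running = False

    def _load(self, r1: int, r2: int) -> None:        # r1 <- mem[r2]
        self.regs[r1] = self.mem[self.regs[r2]]

    def _store(self, r1: int, r2: int) -> None:       # mem[r2] <- r1
        self.mem[self.regs[r2]] = self.regs[r1]

    def _add(self, r1: int, r2: int) -> None:         # r1 <- r1 + r2
        self.regs[r1] += self.regs[r2]

    def run(self) -> None:
        while self.running:
            opcode, r1, r2 = self.decode(self.fetch())
            self.execute[opcode](r1, r2)   # an unknown opcode raises KeyError
```

For example, with registers 2, 4 and 5 preset to 5, 6 and 7, the memory image `[encode(LOAD, 1, 2), encode(LOAD, 3, 4), encode(ADD, 1, 3), encode(STORE, 1, 5), encode(HALT, 0, 0), 40, 2, 0]` ends with 42 stored in word 7.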

More specifically, in a mainframe, architected machine instructions are used by programmers (usually, today, "C" programmers), often by way of a compiler application. These instructions stored in the storage medium may be executed either natively in a z/Architecture server or in machines executing other architectures. They can be emulated in existing and future IBM mainframe servers and on other IBM machines (e.g., Power Systems servers and System x servers). They can be executed in machines running Linux on a wide variety of machines using hardware manufactured by AMD and others. Besides execution on that hardware under the z/Architecture, Linux can also be used on machines that use emulation provided by Hercules, UMX or FSI (Fundamental Software, Inc.), where execution is generally in an emulation mode. In emulation mode, emulation software is executed by a native processor to emulate the architecture of an emulated processor.

The native processor typically executes emulation software, comprising either firmware or a native operating system, to perform emulation of the emulated processor. The emulation software is responsible for fetching and executing instructions of the emulated processor architecture. The emulation software maintains an emulated program counter to keep track of instruction boundaries. The emulation software may fetch one or more emulated machine instructions at a time and convert them into a corresponding group of native machine instructions for execution by the native processor. These converted instructions may be cached so that later executions can reuse them without converting them again. The emulation software maintains the architecture rules of the emulated processor architecture so as to ensure that operating systems and applications written for the emulated processor operate correctly. Furthermore, the emulation software provides resources defined by the emulated processor architecture, including, but not limited to, control registers, general purpose registers, floating point registers, dynamic address translation functions including, for example, segment tables and page tables, interrupt mechanisms, context switch mechanisms, time of day (TOD) clocks and architected interfaces to I/O subsystems, such that an operating system or an application program designed to run on the emulated processor can run on the native processor having the emulation software.
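The convert-and-cache approach can be illustrated with a small dynamic translator. This is a sketch under several assumptions: a hypothetical three-instruction guest instruction set represented as Python tuples, a translation unit of one basic block (ending at a branch or halt), and Python closures standing in for the generated native instructions. The cache is keyed by the emulated program counter, so each block is converted once and then reused.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical guest instructions: ("add", reg, imm), ("jnz", reg, target), ("halt",)
GuestInsn = Tuple


class TranslatingEmulator:
    def __init__(self, program: List[GuestInsn]) -> None:
        self.program = program
        self.regs = [0] * 4
        self.cache: Dict[int, Callable[[], int]] = {}  # guest pc -> translated block
        self.translations = 0                           # times the slow path ran

    def _make_add(self, r: int, imm: int) -> Callable[[], None]:
        regs = self.regs
        def op() -> None:
            regs[r] += imm
        return op

    def _translate(self, pc: int) -> Callable[[], int]:
        """Convert the basic block starting at pc into one host callable that
        runs the block and returns the next guest pc (-1 means halt)."""
        self.translations += 1
        ops: List[Callable[[], None]] = []
        regs = self.regs
        while True:
            insn = self.program[pc]
            if insn[0] == "add":
                ops.append(self._make_add(insn[1], insn[2]))
                pc += 1
                continue
            if insn[0] == "jnz":
                _, r, target = insn
                fallthrough = pc + 1
                def jump_block() -> int:
                    for op in ops:
                        op()
                    return target if regs[r] != 0 else fallthrough
                return jump_block
            if insn[0] == "halt":
                def halt_block() -> int:
                    for op in ops:
                        op()
                    return -1
                return halt_block
            raise ValueError(f"unknown guest instruction {insn!r}")

    def run(self, pc: int = 0) -> None:
        while pc != -1:
            block = self.cache.get(pc)
            if block is None:          # miss: convert once, then reuse
                block = self.cache[pc] = self._translate(pc)
            pc = block()
```

Running the loop `[("add", 0, 3), ("add", 1, 10), ("add", 0, -1), ("jnz", 0, 1), ("halt",)]` leaves 30 in register 1 with only three translations, because the loop body starting at guest address 1 is converted once and then served from the cache on later iterations.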

A specific instruction being emulated is decoded, and a subroutine is called to perform the function of that individual instruction. An emulation software function emulating a function of an emulated processor is implemented, for example, in a "C" subroutine or driver, or by some other method of providing a driver for the specific hardware, as will be within the skill of those in the art after understanding the description of one or more embodiments. Various software and hardware emulation patents, including, but not limited to, U.S. Pat. No. 5,551,013, entitled "Multiprocessor for Hardware Emulation", by Beausoleil et al.; U.S. Pat. No. 6,009,261, entitled "Preprocessing of Stored Target Routines for Emulating Incompatible Instructions on a Target Processor", by Scalzi et al.; U.S. Pat. No. 5,574,873, entitled "Decoding Guest Instruction to Directly Access Emulation Routines that Emulate the Guest Instructions", by Davidian et al.; U.S. Pat. No. 6,308,255, entitled "Symmetrical Multiprocessing Bus and Chipset Used for Coprocessor Support Allowing Non-Native Code to Run in a System", by Gorishek et al.; U.S. Pat. No. 6,463,582, entitled "Dynamic Optimizing Object Code Translator for Architecture Emulation and Dynamic Optimizing Object Code Translation Method", by Lethin et al.; and U.S. Pat. No. 5,790,825, entitled "Method for Emulating Guest Instructions on a Host Computer Through Dynamic Recompilation of Host Instructions", by Eric Traut (the entire contents of each of which are incorporated herein by reference), as well as many others, illustrate a variety of known ways, available to those skilled in the art, to emulate on a target machine an instruction format architected for a different machine.

In FIG. 16, an example of an emulated host computer system 5092 is provided that emulates a host computer system 5000' of a host architecture. In the emulated host computer system 5092, the host processor (CPU) 5091 is an emulated host processor (or virtual host processor) and comprises an emulation processor 5093 having a different native instruction set architecture than that of the processor 5091 of the host computer 5000'. The emulated host computer system 5092 has memory 5094 accessible to the emulation processor 5093. In the example embodiment, the memory 5094 is partitioned into a host computer memory 5096 portion and an emulation routines 5097 portion. The host computer memory 5096 is available to programs of the emulated host computer 5092 according to the host computer architecture. The emulation processor 5093 executes native instructions of an architected instruction set of an architecture other than that of the emulated processor 5091, the native instructions being obtained from the emulation routines memory 5097. The emulation processor 5093 may access a host instruction for execution from a program in the host computer memory 5096 by employing one or more instructions obtained in a sequence and access/decode routine, which may decode the accessed host instruction to determine a native instruction execution routine for emulating the function of that host instruction. Other facilities defined for the architecture of the host computer system 5000' may be emulated by architected facilities routines, including such facilities as general purpose registers, control registers, dynamic address translation, I/O subsystem support and processor caches. The emulation routines may also take advantage of functions available in the emulation processor 5093 (such as general registers and dynamic translation of virtual addresses) to improve the performance of the emulation routines. Special hardware and offload engines may also be provided to assist the processor 5093 in emulating the function of the host computer 5000'.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, and/or components.

The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below, if any, are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of one or more aspects is presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of one or more aspects. The embodiments were chosen and described in order to best explain the principles of one or more aspects and the practical application, and to enable others of ordinary skill in the art to understand one or more aspects for various embodiments with various modifications as are suited to the particular use contemplated.

Claims

1. A method of facilitating exception handling, the method comprising the steps of:

determining, by a processor, that an instruction executing within a computing environment has caused an exception, the instruction comprising at least one Single Instruction Multiple Data (SIMD) operation and performing the operation against a vector register comprising a plurality of elements; and

obtaining a vector exception code based on the exception, the vector exception code including a location within the vector register of an element of the plurality of elements of the vector register that caused the exception.

2. The method of claim 1, wherein the location comprises an index within the vector register corresponding to the element causing the exception.

3. The method of claim 1, wherein the location comprises a lowest index element within the vector register that caused the exception.

4. The method of claim 1, wherein the obtaining is based on the exception causing an interrupt.

5. The method of claim 1, wherein the vector exception code comprises a first portion to specify the location and a second portion to specify a vector interrupt code.

6. The method of claim 5, wherein the location comprises a lowest index element in the vector register that caused the exception.

7. The method of claim 5, wherein the vector interrupt code comprises a value indicating one of an invalid operation, divide by zero, overflow, underflow, or inexact result.

8. The method of claim 1, wherein the method further comprises determining which element or elements in the vector register caused the exception, and based on determining which element or elements caused the exception, obtaining the location to be included in the vector exception code.

9. The method of claim 8, wherein obtaining the location comprises determining a lowest index element of one or more elements that caused the exception, and using an index of the lowest index element as the location.

10. The method of claim 1, wherein the method further comprises placing the vector exception code within a data exception code field of a floating point control register.

11. The method of claim 1, wherein a size of the element causing the exception is specified in a field of the instruction.

12. A system for facilitating exception handling comprising means adapted for carrying out all the steps of the method according to any preceding method claim.
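The vector exception code of claims 1 to 11 can be sketched as follows. The claims do not give a bit layout or numeric values, so the one-byte layout (element index in the high-order four bits, vector interrupt code in the low-order four bits), the code values 1 to 5, the 128-bit register width and the position of the data exception code byte in the floating point control register are all assumptions for illustration; the function names are hypothetical.

```python
from typing import List, Optional, Tuple

VECTOR_REGISTER_BYTES = 16   # assumed 128-bit vector registers

# Assumed 4-bit vector interrupt codes for the conditions listed in claim 7.
VIC = {"invalid_operation": 0x1, "divide_by_zero": 0x2, "overflow": 0x3,
       "underflow": 0x4, "inexact": 0x5}


def vector_exception_code(element_status: List[Optional[str]], element_size: int) -> int:
    """Return a one-byte code: lowest faulting element index in the high 4 bits,
    vector interrupt code in the low 4 bits. element_status[i] is None if
    element i completed normally, else the name of the condition it raised."""
    elements = VECTOR_REGISTER_BYTES // element_size  # size comes from the instruction
    if len(element_status) != elements:
        raise ValueError(f"expected {elements} element results")
    for index, condition in enumerate(element_status):  # scan from index 0 upward
        if condition is not None:
            return (index << 4) | VIC[condition]         # lowest faulting index wins
    raise ValueError("no element caused an exception")


def decode_vxc(vxc: int) -> Tuple[int, int]:
    """Split a code back into (element index, vector interrupt code)."""
    return vxc >> 4, vxc & 0xF


def place_in_fpc(fpc: int, vxc: int) -> int:
    """Write the code into the data exception code byte of a 32-bit FPC,
    assumed here to be the third byte from the left (bits 16-23)."""
    return (fpc & 0xFFFF00FF) | ((vxc & 0xFF) << 8)
```

For four-byte elements where element 2 divides by zero and element 3 overflows, the sketch reports only element 2, the lowest-indexed element, giving the code 0x22.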