HK1201351B - Convert to zoned format from decimal floating point format - Google Patents
- Publication number
- HK1201351B (application No. HK15101782.3A)
- Authority
- HK
- Hong Kong
- Prior art keywords
- field
- operand
- instruction
- memory
- sign
Description
Technical Field
One aspect of the present invention relates generally to processing within a computing environment, and in particular, to converting data from one format to another.
Background
Data may be stored in internal computer memory or external memory in many different formats, including Extended Binary Coded Decimal Interchange Code (EBCDIC), American Standard Code for Information Interchange (ASCII), and decimal floating point, among others.
Different computer architectures support different data formats, and it may be desirable to perform operations on a particular format. In this case, it may be necessary to convert data in one format to the desired format.
In addition, operations that process numerical decimal data stored in a database in EBCDIC or ASCII format have conventionally operated directly on memory. These operations, referred to as memory-to-memory decimal operations, are limited in execution by the latency of the memory interface. Each operation that depends on the result of a previous operation must wait until that result is written out to memory before it can begin. As the gap between memory latency and processor speed continues to increase, the relative performance of these operations continues to decrease.
Disclosure of Invention
The shortcomings of the prior art are overcome and advantages are provided through the provision of a computer program product for executing machine instructions in a central processing unit. The computer program product includes a computer-readable storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing a method. The method includes obtaining, for execution, by a processor, a machine instruction defined for computer execution according to a computer architecture, the machine instruction including: at least one opcode field to provide an opcode, the opcode identifying a convert from decimal floating point to zoned function; a first register field specifying a first register, the first register containing a first operand; a second register field and a displacement field, wherein contents of a second register specified by the second register field are combined with contents of the displacement field to form an address of a second operand; and a mask field, the mask field including one or more controls used during execution of the machine instruction; and executing the machine instruction, the executing comprising: converting at least a portion of the first operand in decimal floating point format to a zoned format; and placing the result of the conversion at a location specified by the address of the second operand.
Methods and systems relating to one or more aspects of the present invention are also described and claimed herein. Additionally, services relating to one or more aspects of the present invention are also described and may be claimed herein.
Additional features and advantages are realized through the techniques of the present invention. Other embodiments and aspects of the invention are described in detail herein and are considered a part of the claimed invention.
Drawings
One or more aspects of the present invention are particularly pointed out and distinctly claimed as examples in the claims at the conclusion of the specification. The foregoing and other objects, features and advantages of the invention will be apparent from the following detailed description taken in conjunction with the accompanying drawings.
FIG. 1 depicts one embodiment of a computing environment to incorporate and use one or more aspects of the present invention;
FIG. 2A depicts another embodiment of a computing environment incorporating and using one or more aspects of the present invention;
FIG. 2B depicts further details of the memory of FIG. 2A, in accordance with an aspect of the present invention;
FIG. 3 depicts an overview of the logic for converting from zoned format to decimal floating point format, in accordance with an aspect of the present invention;
FIG. 4 depicts one embodiment of a format for a convert from zoned instruction, used in accordance with an aspect of the present invention;
FIG. 5 depicts further details of the logic to convert from zoned to decimal floating point, in accordance with an aspect of the present invention;
FIG. 6 depicts an overview of the logic for converting from decimal floating point format to zoned format, in accordance with an aspect of the present invention;
FIG. 7 depicts one embodiment of a convert from decimal floating point to zoned instruction, used in accordance with an aspect of the present invention;
FIG. 8 depicts further details of the logic for converting from decimal floating point to zoned, in accordance with an aspect of the present invention;
FIG. 9 depicts one embodiment of a computer program product incorporating one or more aspects of the present invention;
FIG. 10 depicts one embodiment of a host computer system incorporating and using one or more aspects of the present invention;
FIG. 11 depicts yet another example of a computer system incorporating and using one or more aspects of the present invention;
FIG. 12 depicts another example of a computer system including a computer network incorporating and using one or more aspects of the present invention;
FIG. 13 depicts one embodiment of various elements of a computer system incorporating and using one or more aspects of the present invention;
FIG. 14A depicts one embodiment of an execution unit of the computer system of FIG. 13 incorporating and using one or more aspects of the present invention;
FIG. 14B depicts one embodiment of a branch unit of the computer system of FIG. 13 incorporating and using one or more aspects of the present invention;
FIG. 14C depicts one embodiment of a load/store unit of the computer system of FIG. 13 incorporating and using one or more aspects of the present invention; and
FIG. 15 depicts one embodiment of an emulated host computer system incorporating and using one or more aspects of the present invention.
Detailed Description
Different computer architectures may support different data formats, and the supported data formats may change over time. For example, machines offered by International Business Machines Corporation have traditionally supported EBCDIC and ASCII formats. Later machines began to support decimal floating point (DFP) formats and operations, for which there is an IEEE standard (IEEE 754-2008). However, to use DFP operations, EBCDIC and ASCII data must first be converted to DFP.
According to one aspect of the present invention, an efficient mechanism for converting between EBCDIC or ASCII and decimal floating point is provided. In one example, this mechanism performs the translation without the memory overhead of other techniques.
In one aspect of the present invention, a machine instruction is provided that reads EBCDIC or ASCII data (which has a zoned format) from memory, converts it to the appropriate decimal floating point format, and writes it to a target floating point register or floating point register pair. These instructions are referred to herein as the long convert from zoned (CDZT) and extended convert from zoned (CXZT) instructions.
In yet another aspect of the present invention, a machine instruction is provided that converts a decimal floating point (DFP) operand in a source floating point register or floating point register pair to EBCDIC or ASCII data and stores it to a target memory location. These instructions are referred to herein as the long convert to zoned (CZDT) and extended convert to zoned (CZXT) instructions.
One embodiment of a computing environment to incorporate and use one or more aspects of the present invention is described with reference to FIG. 1. Computing environment 100 includes a processor 102 (e.g., a central processing unit), a memory 104 (e.g., a main memory), and one or more input/output (I/O) devices and/or interfaces 106 coupled to one another, e.g., via one or more buses 108 and/or other connections.
In one example, the processor 102 is a processor that is part of a server offered by International Business Machines Corporation (Armonk, N.Y.). The server implements an architecture, also offered by International Business Machines Corporation, which specifies the logical structure and functional operation of the computer. One example of this architecture is described in the publication entitled "z/Architecture Principles of Operation" (Publication No. SA22-7832-08, Ninth Edition, August 2010), which is hereby incorporated by reference in its entirety. In one example, the server runs an operating system that is also provided by International Business Machines Corporation. The product names mentioned are registered trademarks of International Business Machines Corporation, Armonk, New York, USA. Other names used herein may be registered trademarks, trademarks, or product names of International Business Machines Corporation or other companies.
Another embodiment of a computing environment to incorporate and use one or more aspects of the present invention is described with reference to FIG. 2A. In this example, the computing environment 200 includes a native central processing unit 202, a memory 204, and one or more input/output devices and/or interfaces 206 coupled to one another via, for example, one or more buses 208 and/or other connections. By way of example, the computing environment 200 may include a processor or server offered by International Business Machines Corporation (Armonk, N.Y.); an HP Superdome with Intel Itanium processors offered by Hewlett Packard Co. (Palo Alto, Calif.); and/or other machines based on architectures offered by Hewlett Packard, Intel, Sun Microsystems, or others. Intel and Itanium are registered trademarks of Intel Corporation (Santa Clara, California).
The native central processing unit 202 includes one or more native registers 210, such as one or more general purpose registers and/or one or more special purpose registers, used during processing within the environment. These registers contain information that represents the state of the environment at any particular point in time.
In addition, the native central processing unit 202 executes instructions and code stored in the memory 204. In one particular example, the central processing unit executes emulator code 212 stored in the memory 204. This code enables a processing environment configured in one architecture to emulate another architecture. For example, the emulator code 212 allows machines based on architectures other than the z/Architecture (e.g., HP Superdome servers or others) to emulate the z/Architecture and to execute software and instructions developed based on the z/Architecture.
Further details regarding the emulator code 212 are described with reference to FIG. 2B. Guest instructions 250 include software instructions (e.g., machine instructions) that were developed to be executed in an architecture other than that of native CPU 202. For example, guest instructions 250 may have been designed to execute on a z/Architecture processor 102, but instead are being emulated on native CPU 202 (which may be, for example, an Intel Itanium processor). In one example, the emulator code 212 includes an instruction fetch unit 252 to obtain one or more guest instructions 250 from the memory 204 and, optionally, to provide local buffering for the obtained instructions. It also includes an instruction translation routine 254 to determine the type of guest instruction that has been obtained and to translate the guest instruction into one or more corresponding native instructions 256. This translation includes, for example, identifying the function to be performed by the guest instruction and selecting the native instructions to perform that function.
Additionally, the emulator code 212 includes an emulation control routine 260 to cause the native instructions to be executed. The emulation control routine 260 may cause native CPU 202 to execute a routine of native instructions that emulate one or more previously obtained guest instructions and, at the conclusion of such execution, return control to the instruction fetch routine to emulate the obtaining of the next guest instruction or group of guest instructions. Execution of the native instructions 256 may include loading data from the memory 204 into a register; storing data from a register back to memory; or performing some type of arithmetic or logical operation, as determined by the translation routine.
Each routine is implemented, for example, in software that is stored in memory and executed by the native central processing unit 202. In other examples, one or more of the routines or operations are implemented in firmware, hardware, software, or some combination thereof. Registers of the emulated processor may be emulated using registers 210 of the native CPU or by using locations in memory 204. In embodiments, guest instructions 250, native instructions 256, and emulator code 212 may reside in the same memory or may be distributed among different memory devices.
As used herein, firmware includes, for example, the microcode, millicode, and/or macrocode of the processor. It includes, for example, the hardware-level instructions and/or data structures used in the implementation of higher-level machine code. In one embodiment, it includes, for example, proprietary code that is typically delivered as microcode that includes trusted software or microcode specific to the underlying hardware and that controls operating system access to the system hardware.
In one example, the guest instruction 250 that is obtained, translated, and executed is one of the instructions described herein. The instruction, which is of one architecture (in this example, the z/Architecture), is fetched from memory, translated, and represented as a series of native instructions 256 of another architecture, which are then executed.
In another embodiment, one or more of the instructions are executed in another architectural environment, including, for example, an architecture as described in: the "Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 1" (Order No. 253665-, November 2006); the "Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2A" (Order No. 253666-, November 2006); the "Architecture Software Developer's Manual, Volume 1" (document No. 245826-, January 2006); the "Architecture Software Developer's Manual, Volume 2" (document No. 245851-, January 2006); and/or the "Architecture Software Developer's Manual, Volume 3" (document No. 245826-, January 2006).
The processors described herein, as well as others, execute instructions to perform certain functions, such as converting between EBCDIC or ASCII and decimal floating point formats. In one example, EBCDIC or ASCII data has a zoned format, and thus example instructions include convert from zoned to decimal floating point instructions and convert from decimal floating point to zoned instructions, as described herein.
However, before describing the instructions, the various data formats mentioned herein are described. For example, in the zoned format, the four rightmost bits of a byte are called the numeric bits (N) and typically contain a code representing a decimal digit. The four leftmost bits of the byte are called the zone bits (Z), except in the rightmost byte of a decimal operand, where these bits may be treated either as zone bits or as the sign (S).
The decimal digits in zoned format may be part of a larger character set that also includes letters and special characters. The zoned format is thus suitable for inputting, editing, and outputting numeric data in human-readable form. In one embodiment, the decimal arithmetic instructions do not operate directly on decimal numbers in zoned format; such numbers are first converted to, for example, one of the decimal floating point formats.
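The zoned layout described above can be illustrated with a minimal Python sketch. The function name `split_zoned` is hypothetical, and the sets of plus and minus sign codes follow the usual z/Architecture decimal convention, which is an assumption here because the text does not list them.

```python
# Assumed sign codes (z/Architecture decimal convention; not listed in the text).
PLUS_CODES = {0xA, 0xC, 0xE, 0xF}
MINUS_CODES = {0xB, 0xD}


def split_zoned(data: bytes, signed: bool):
    """Return (digits, negative) for a zoned decimal field.

    digits   -- decimal digits taken from the numeric (rightmost) nibbles
    negative -- True if the sign nibble of the rightmost byte means minus
    """
    digits = []
    for byte in data:
        numeric = byte & 0x0F            # four rightmost bits: numeric bits (N)
        if numeric > 9:
            raise ValueError("invalid digit code")
        digits.append(numeric)

    negative = False
    if signed:
        sign = data[-1] >> 4             # four leftmost bits of the rightmost byte (S)
        if sign in MINUS_CODES:
            negative = True
        elif sign not in PLUS_CODES:
            raise ValueError("invalid sign code")
    return digits, negative
```

For instance, the EBCDIC bytes F1 F2 D3 read as a signed field give the digits 1, 2, 3 with a minus sign, while the ASCII bytes 31 32 33 read as an unsigned field give the same digits with no sign processing.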
Decimal floating point data may be represented in any of three data formats: short, long, or extended. The content of each data format represents encoded information. Special codes are assigned to distinguish finite numbers from NaNs (Not-a-Number) and infinities.
For finite numbers, a biased exponent is used in the format. For each format, a different bias is used for right-units-view (RUV) exponents than for left-units-view (LUV) exponents. Biased exponents are unsigned numbers. The biased exponent is encoded, together with the leftmost digit (LMD) of the significand, in the combination field. The remaining digits of the significand are encoded in the encoded trailing significand field.
Examples of these data formats are:
DFP short format
When a DFP short format operand is loaded into a floating-point register, it occupies the left half of the register, and the right half remains unchanged.
DFP Long Format
When a DFP long format operand is loaded into a floating point register, it occupies the entire register.
DFP extended format
An operand in the DFP extended format occupies a floating point register pair. The leftmost 64 bits occupy the entire lower-numbered register of the pair, and the rightmost 64 bits occupy the entire higher-numbered register.
The sign bit is in bit 0 of each format and is, for example, 0 for positive and 1 for negative.
For finite numbers, the combination field includes the biased exponent and the leftmost digit of the significand; for NaNs and infinities, this field includes codes that identify them.
When bits 1-5 of the format are in the range 00000-11101, the operand is a finite number. The two leftmost bits of the biased exponent and the leftmost digit of the significand are encoded in bits 1-5 of the format. Bit 6 through the end of the combination field contains the rest of the biased exponent.
When bits 1-5 of the format are 11110, the operand is an infinity. All bits in the combination field to the right of bit 5 of the format constitute the reserved field for infinity. A nonzero value in the reserved field of a source infinity is accepted; the reserved field is set to zero in a resultant infinity.
When bits 1-5 of the format are 11111, the operand is a NaN, and bit 6 (called the SNaN bit) further distinguishes a QNaN from an SNaN. If bit 6 is 0, it is a QNaN; otherwise, it is an SNaN. All bits in the combination field to the right of bit 6 of the format constitute the reserved field for NaN. A nonzero value in the reserved field of a source NaN is accepted; the reserved field is set to zero in a resultant NaN.
The following table summarizes the encoding and layout of the combination field. In this table, the biased exponent of a finite number is the concatenation of two parts: (1) the two leftmost bits, derived from bits 1-5 of the format, and (2) the remaining bits in the combination field. For example, if the combination field of the DFP short format contains 10101010101 binary, it represents a biased exponent of 10010101 binary and a leftmost significand digit of 5.
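The decoding rule can be checked against the worked example with a short Python sketch. The function name `decode_combination` is hypothetical, and the way bits 1-5 are split between exponent bits and the leftmost digit follows the IEEE 754-2008 decimal interchange encoding, which is an assumption as far as this text is concerned.

```python
def decode_combination(field: int, width: int):
    """Decode a DFP combination field given as an integer of `width` bits.

    Returns ("finite", biased_exponent, leftmost_digit), ("infinity",),
    or ("nan", signaling).
    """
    rest_bits = width - 5
    top5 = field >> rest_bits                  # bits 1-5 of the format
    rest = field & ((1 << rest_bits) - 1)      # bit 6 to the end of the field

    if top5 == 0b11110:
        return ("infinity",)
    if top5 == 0b11111:
        signaling = bool(rest >> (rest_bits - 1))   # bit 6: 0 = QNaN, 1 = SNaN
        return ("nan", signaling)

    # Assumed IEEE 754-2008 layout of bits 1-5 for finite numbers.
    if top5 >> 3 != 0b11:
        exp_high = top5 >> 3                   # two leftmost exponent bits
        digit = top5 & 0b111                   # leftmost digit 0-7
    else:
        exp_high = (top5 >> 1) & 0b11
        digit = 8 + (top5 & 1)                 # leftmost digit 8 or 9
    return ("finite", (exp_high << rest_bits) | rest, digit)
```

Calling `decode_combination(0b10101010101, 11)` for the 11-bit combination field of the short format yields a biased exponent of 10010101 binary and a leftmost digit of 5, matching the example in the text.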
The encoded trailing significand field contains an encoded decimal number that represents the digits in the trailing significand. The trailing significand includes all of the significand digits except the leftmost digit. For infinities, nonzero trailing significand digits are accepted in a source infinity; all trailing significand digits in a resultant infinity are set to zero, unless otherwise specified. For NaNs, this field contains diagnostic information called the payload.
The encoded trailing significand field consists of a number of 10-bit blocks called declets. The number of declets depends on the format. Each declet represents three decimal digits in a 10-bit value.
The values of finite numbers in the various formats are shown in the following table:
The term significand is used to mean, for example, the following:
1. For finite numbers, the significand contains all of the trailing significand digits padded on the left with the leftmost digit of the significand derived from the combination field.
2. For infinities and NaNs, the significand contains all of the trailing significand digits padded on the left with a zero digit.
For finite numbers, the DFP significant digits begin with the leftmost nonzero significand digit and end with the rightmost significand digit.
For finite numbers, the number of DFP significant digits is the format precision minus the number of leading zeros. The number of leading zeros is the number of zeros in the significand to the left of the leftmost nonzero digit.
In addition to the above, there is the densely packed decimal (DPD) format. An example of a mapping of a 3-digit decimal number (000-999) to a 10-bit value (called a declet) is shown in the table below. The DPD entries are shown in hexadecimal. The first two digits of the decimal number are shown in the leftmost column, and the third digit is shown along the top row.
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| 00_ | 000 | 001 | 002 | 003 | 004 | 005 | 006 | 007 | 008 | 009 |
| 01_ | 010 | 011 | 012 | 013 | 014 | 015 | 016 | 017 | 018 | 019 |
| 02_ | 020 | 021 | 022 | 023 | 024 | 025 | 026 | 027 | 028 | 029 |
| 03_ | 030 | 031 | 032 | 033 | 034 | 035 | 036 | 037 | 038 | 039 |
| 04_ | 040 | 041 | 042 | 043 | 044 | 045 | 046 | 047 | 048 | 049 |
| 05_ | 050 | 051 | 052 | 053 | 054 | 055 | 056 | 057 | 058 | 059 |
| 06_ | 060 | 061 | 062 | 063 | 064 | 065 | 066 | 067 | 068 | 069 |
| 07_ | 070 | 071 | 072 | 073 | 074 | 075 | 076 | 077 | 078 | 079 |
| 08_ | 00A | 00B | 02A | 02B | 04A | 04B | 06A | 06B | 04E | 04F |
| 09_ | 01A | 01B | 03A | 03B | 05A | 05B | 07A | 07B | 05E | 05F |
| 10_ | 080 | 081 | 082 | 083 | 084 | 085 | 086 | 087 | 088 | 089 |
| 90_ | 08C | 08D | 18C | 18D | 28C | 28D | 38C | 38D | 0AE | 0AF |
| 91_ | 09C | 09D | 19C | 19D | 29C | 29D | 39C | 39D | 0BE | 0BF |
| 92_ | 0AC | 0AD | 1AC | 1AD | 2AC | 2AD | 3AC | 3AD | 1AE | 1AF |
| 93_ | 0BC | 0BD | 1BC | 1BD | 2BC | 2BD | 3BC | 3BD | 1BE | 1BF |
| 94_ | 0CC | 0CD | 1CC | 1CD | 2CC | 2CD | 3CC | 3CD | 2AE | 2AF |
| 95_ | 0DC | 0DD | 1DC | 1DD | 2DC | 2DD | 3DC | 3DD | 2BE | 2BF |
| 96_ | 0EC | 0ED | 1EC | 1ED | 2EC | 2ED | 3EC | 3ED | 3AE | 3AF |
| 97_ | 0FC | 0FD | 1FC | 1FD | 2FC | 2FD | 3FC | 3FD | 3BE | 3BF |
| 98_ | 08E | 08F | 18E | 18F | 28E | 28F | 38E | 38F | 0EE | 0EF |
| 99_ | 09E | 09F | 19E | 19F | 29E | 29F | 39E | 39F | 0FE | 0FF |
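The entries in the table above can be reproduced with a minimal Python sketch of the decimal-to-DPD mapping. The function name `encode_declet` is hypothetical, and the bit rules used are the commonly published densely packed decimal encoding rules, which the text itself gives only in tabular form.

```python
def encode_declet(number: int) -> int:
    """Encode a 3-digit decimal number (0-999) as a 10-bit DPD declet."""
    if not 0 <= number <= 999:
        raise ValueError("expected a value in 000-999")
    d2, d1, d0 = number // 100, (number // 10) % 10, number % 10

    a, e, i = d2 >> 3, d1 >> 3, d0 >> 3          # 1 where a digit is 8 or 9
    bcd, fgh, jkm = d2 & 7, d1 & 7, d0 & 7       # low three bits of each digit
    d, h, m = d2 & 1, d1 & 1, d0 & 1             # low bit of each digit
    fg, jk = (d1 >> 1) & 3, (d0 >> 1) & 3        # middle two bits of d1 and d0

    key = (a, e, i)
    if key == (0, 0, 0):
        pqr, stu, v, wxy = bcd, fgh, 0, jkm
    elif key == (0, 0, 1):
        pqr, stu, v, wxy = bcd, fgh, 1, 0b000 | m
    elif key == (0, 1, 0):
        pqr, stu, v, wxy = bcd, (jk << 1) | h, 1, 0b010 | m
    elif key == (0, 1, 1):
        pqr, stu, v, wxy = bcd, 0b100 | h, 1, 0b110 | m
    elif key == (1, 0, 0):
        pqr, stu, v, wxy = (jk << 1) | d, fgh, 1, 0b100 | m
    elif key == (1, 0, 1):
        pqr, stu, v, wxy = (fg << 1) | d, 0b010 | h, 1, 0b110 | m
    elif key == (1, 1, 0):
        pqr, stu, v, wxy = (jk << 1) | d, 0b000 | h, 1, 0b110 | m
    else:
        pqr, stu, v, wxy = d, 0b110 | h, 1, 0b110 | m
    return (pqr << 7) | (stu << 4) | (v << 3) | wxy
```

For example, `encode_declet(88)` gives hexadecimal 04E and `encode_declet(944)` gives hexadecimal 2CC, in agreement with rows 08_ and 94_ of the table.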
An example of a mapping of a 10-bit declet to a 3-digit decimal number is shown in the table below. The 10-bit declet value is split into a 6-bit index shown in the left column and a 4-bit index shown along the top row, both in hexadecimal.
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
| 00_ | 000 | 001 | 002 | 003 | 004 | 005 | 006 | 007 | 008 | 009 | 080 | 081 | 800 | 801 | 880 | 881 |
| 01_ | 010 | 011 | 012 | 013 | 014 | 015 | 016 | 017 | 018 | 019 | 090 | 091 | 810 | 811 | 890 | 891 |
| 02_ | 020 | 021 | 022 | 023 | 024 | 025 | 026 | 027 | 028 | 029 | 082 | 083 | 820 | 821 | 808 | 809 |
| 03_ | 030 | 031 | 032 | 033 | 034 | 035 | 036 | 037 | 038 | 039 | 092 | 093 | 830 | 831 | 818 | 819 |
| 04_ | 040 | 041 | 042 | 043 | 044 | 045 | 046 | 047 | 048 | 049 | 084 | 085 | 840 | 841 | 088 | 089 |
| 05_ | 050 | 051 | 052 | 053 | 054 | 055 | 056 | 057 | 058 | 059 | 094 | 095 | 850 | 851 | 098 | 099 |
| 06_ | 060 | 061 | 062 | 063 | 064 | 065 | 066 | 067 | 068 | 069 | 086 | 087 | 860 | 861 | 888 | 889 |
| 07_ | 070 | 071 | 072 | 073 | 074 | 075 | 076 | 077 | 078 | 079 | 096 | 097 | 870 | 871 | 898 | 899 |
| 08_ | 100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 | 109 | 180 | 181 | 900 | 901 | 980 | 981 |
| 09_ | 110 | 111 | 112 | 113 | 114 | 115 | 116 | 117 | 118 | 119 | 190 | 191 | 910 | 911 | 990 | 991 |
| 0A_ | 120 | 121 | 122 | 123 | 124 | 125 | 126 | 127 | 128 | 129 | 182 | 183 | 920 | 921 | 908 | 909 |
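The reverse mapping shown in the table above can be sketched in Python as follows. The function name `decode_declet` is hypothetical, and the bit rules are the commonly published densely packed decimal decoding rules rather than anything stated explicitly in the text.

```python
def decode_declet(declet: int) -> int:
    """Decode a 10-bit DPD declet into a 3-digit decimal number (0-999)."""
    if not 0 <= declet <= 0x3FF:
        raise ValueError("expected a 10-bit value")
    pq = (declet >> 8) & 3
    r = (declet >> 7) & 1
    st = (declet >> 5) & 3
    u = (declet >> 4) & 1
    v = (declet >> 3) & 1
    wx = (declet >> 1) & 3
    y = declet & 1

    small2 = (pq << 1) | r        # digit 0-7 held in bits p, q, r
    small1 = (st << 1) | u        # digit 0-7 held in bits s, t, u
    if v == 0:                    # all three digits are 0-7
        d2, d1, d0 = small2, small1, (wx << 1) | y
    elif wx == 0b00:              # only the right digit is 8 or 9
        d2, d1, d0 = small2, small1, 8 | y
    elif wx == 0b01:              # only the middle digit is 8 or 9
        d2, d1, d0 = small2, 8 | u, (st << 1) | y
    elif wx == 0b10:              # only the left digit is 8 or 9
        d2, d1, d0 = 8 | r, small1, (pq << 1) | y
    elif st == 0b00:              # left and middle digits are 8 or 9
        d2, d1, d0 = 8 | r, 8 | u, (pq << 1) | y
    elif st == 0b01:              # left and right digits are 8 or 9
        d2, d1, d0 = 8 | r, (pq << 1) | u, 8 | y
    elif st == 0b10:              # middle and right digits are 8 or 9
        d2, d1, d0 = small2, 8 | u, 8 | y
    else:                         # all three digits are 8 or 9
        d2, d1, d0 = 8 | r, 8 | u, 8 | y
    return d2 * 100 + d1 * 10 + d0
```

For example, `decode_declet(0x02E)` returns 808 and `decode_declet(0x0A0)` returns 120, matching rows 02_ and 0A_ of the table.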
According to one aspect of the present invention, instructions are provided for converting from a zoned format to decimal floating point. In one embodiment, there are two types of convert from zoned to decimal floating point instructions, including a long convert from zoned instruction (CDZT) and an extended convert from zoned instruction (CXZT), each of which is described below. These instructions provide an efficient way to convert data from EBCDIC or ASCII directly in memory to decimal floating point format in registers.
For example, referring to FIG. 3, in one embodiment, EBCDIC or ASCII data is read from memory by a machine instruction (step 300); it is converted to the appropriate decimal floating point format (step 302); and it is written to the target floating point register or floating point register pair (step 304).
The long convert from zoned instruction, CDZT, reads operand data from a specified memory location, converts it to a double precision DFP operand with a zero exponent, and writes it to a specified target floating point register. The extended convert from zoned instruction, CXZT, reads operand data from a specified memory location, converts it to an extended precision DFP operand with a zero exponent, and writes it to a specified target floating point register pair. The number of bytes in the source memory location is specified in the instruction and may be 1 to 16 bytes for CDZT or 1 to 34 bytes for CXZT. The digits of the source operand are all checked for valid digit codes. A sign field in the instruction indicates whether the sign nibble of the source operand is to be processed. If the sign field is set, the sign is checked for a valid sign code. Assuming it is valid, the sign of the DFP result is set to the sign indicated by the sign nibble of the source operand. If an invalid digit or sign code is detected, a decimal data exception is recognized.
In one embodiment, each of the convert from zoned instructions has the same format (the RSL-b format), an example of which is depicted in FIG. 4. As depicted, in one embodiment, the format 400 of a convert from zoned instruction includes, for example, the following fields:
Opcode fields 402a, 402b: the opcode fields provide an opcode that indicates the function being performed by the instruction. As an example, one defined opcode specifies the function as the long convert from zoned instruction, and another predefined opcode indicates that it is an extended convert from zoned instruction.
Length field (L2) 404: the length field 404 specifies the length (e.g., in bytes) of the second operand. As an example, for an extended convert from zoned instruction, the length field contains a length code of 0 to 33, and for a long convert from zoned instruction, the length field contains a length code of 0 to 15.
Base register field (B2) 406: the base register field designates a general register, the contents of which are added to the contents of the displacement field to form the second operand address.
Displacement field (D2) 408: the displacement field contains a value that is added to the contents of the general register designated by the base register field to form the second operand address.
Register field (R1) 410: the register field designates a register, the contents of which are the first operand. The register that contains the first operand is sometimes referred to as the first operand location.
Mask field (M3) 412: the mask field includes, for example, a sign (S) control (e.g., a bit), which in one example is bit 0 of the M3 field. When the bit is 0, the second operand does not have a sign field, and the sign bit of the DFP first operand result is set to 0. When the bit is 1, the second operand is signed; that is, the four leftmost bits of the rightmost byte are a sign. The sign bit of the DFP first operand result is set to 0 when the sign field indicates a positive value and to 1 when the sign field indicates a negative value. In one embodiment, bits 1 through 3 of the M3 field are ignored.
In operation of the convert from zoned instruction, the second operand, in zoned format, is converted to DFP format, and the result is placed at the first operand location. In one example, the quantum is one, and the delivered value is represented with that quantum. The result placed at the first operand location is canonical.
In one embodiment, a decimal operand data exception is recognized when an invalid digit or sign code is detected in the second operand. A specification exception is recognized and the operation is suppressed when, for example, any of the following is true: for CDZT, the L2 field is greater than or equal to 16; and for CXZT, the R1 field designates an invalid floating point register pair, or the L2 field is greater than or equal to 34.
In one embodiment, when an ASCII second operand is specified, bit 0 of the M3 field is 0; otherwise, a decimal operand data exception is recognized. That is, a sign value of 0011 binary is not a valid sign.
Further details regarding execution of a convert from zoned instruction are described with reference to FIG. 5. In one example, it is a processor executing the convert from zoned instruction that performs this logic.
Initially, a determination is made as to whether the opcode of the convert from zoned instruction indicates the extended or the long format (INQUIRY 500). That is, is the instruction being executed a long convert from zoned instruction or an extended convert from zoned instruction? If the opcode indicates a long convert from zoned instruction, a further determination is made as to whether the length field (L2) provided in the instruction specifies a length greater than 15 (INQUIRY 502). If the length field specifies a length greater than 15, an exception is provided indicating more than 16 digits (0-15) (step 504).
Returning to INQUIRY 502, if the length field does not specify a length greater than 15, the source zoned digits (at least a portion of the second operand) are read from memory (step 506). Thereafter, the source zoned digits read from memory are converted to decimal floating point format (step 508). In this example, they are converted to a double precision DFP operand with a zero exponent.
In addition, a determination is made as to whether the sign control specified in the mask field (M3) is set to 1 (INQUIRY 510). If the sign control is not equal to 1, the sign of the DFP number is forced positive (step 512), and the target floating point register is updated with the converted value, including the forced sign (step 514).
Returning to INQUIRY 510, if the sign control is equal to 1, the source sign field (of the second operand) is read from memory (step 516). Thereafter, the sign of the DFP number is set to the sign of the source (step 518), and the target floating point register is updated with the converted value and sign (e.g., bit 0 of the DFP format) (step 514).
Returning to INQUIRY 500, if the opcode indicates an extended convert from zoned instruction, a determination is made as to whether the length field of the instruction specifies a length greater than 33 (INQUIRY 530). If the length field specifies a length greater than 33, an exception is provided indicating more than 34 digits (0-33) (step 532). However, if the length field does not specify a length greater than 33, a determination is made as to whether the R1 field of the instruction designates an invalid floating point register pair (INQUIRY 534). If an invalid floating point register pair is designated, an exception is provided (step 536). Otherwise, the source zoned digits (at least a portion of the second operand) are read from memory (step 538). Thereafter, the source zoned digits read from memory are converted to decimal floating point format (step 540). In this example, the digits are converted to an extended precision decimal floating point operand with a zero exponent.
Thereafter, a determination is made as to whether the sign (S) control in the mask field of the instruction is set to 1 (INQUIRY 542). If the sign control is not equal to 1, the sign of the decimal floating point number is forced positive (step 544). However, if the sign control is equal to 1, the source sign field (of the second operand) is read from memory (step 546), and the sign of the DFP number is set to the sign of the source (step 548). After the sign is set in step 544 or step 548, the target floating point register pair is updated with the converted decimal floating point value and sign (step 550).
Mentioned above are two steps for converting source region bits read from memory to decimal floating point format. Specifically, step 508 converts the source to a double precision decimal floating point operand having a zero exponent, and step 540 converts the source to an extended precision data floating point operand having a zero exponent. Further details regarding the conversion are described below and above in the context of the "z/Architecture Principles of Operation"Publication No. SA22-7832-08, ninth edition, 8 months 2010).
One embodiment of the process for converting a zoned formatted number to DFP format is as follows: the source digits are read from memory. The Binary Coded Decimal (BCD) digits in the rightmost 4 bits of each byte of source data are padded with zeros on the left, if necessary, so that there are a total of 16 BCD digits for double precision operations and 34 BCD digits for extended precision operations. These BCD digits are then converted from BCD to Densely Packed Decimal (DPD) such that every 3 BCD digits, starting on the right of the source data, are converted to a 10-bit DPD group for all BCD digits except the leftmost BCD digit. Thus, there are 5 DPD groups for double precision conversion and 11 DPD groups for extended precision conversion. These DPD groups constitute bits 14-63 of the double precision result and bits 18-127 of the extended precision result. For double precision operations, bits 6-13 are exponent field bits which, together with 2 of bits 1-5 of the combination field, are set to the value 398. For extended precision operations, bits 6-17 are exponent field bits which, together with 2 bits of the combination field, are set to the value 6176.
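As one non-limiting illustration, the padding and BCD-to-DPD steps just described can be sketched in software as follows. The function names `bcd3_to_dpd` and `trailing_significand` are hypothetical, and the bit layout of each 10-bit group follows the standard densely packed decimal encoding of the IEEE 754-2008 decimal formats, which the present description does not spell out.

```python
def bcd3_to_dpd(d2, d1, d0):
    """Encode three decimal digits (d2 is most significant) as one 10-bit DPD group."""
    a, e, i = d2 >> 3, d1 >> 3, d0 >> 3        # 1 when the digit is 8 or 9
    d, h, m = d2 & 1, d1 & 1, d0 & 1           # low-order bit of each digit
    bcd, fgh, jkm = d2 & 7, d1 & 7, d0 & 7     # low three bits of each digit
    fg, jk = (d1 >> 1) & 3, (d0 >> 1) & 3      # middle two bits of d1 and d0
    # Each entry is (pqr, stu, v, wxy), selected by which digits are 8 or 9.
    layout = {
        (0, 0, 0): (bcd, fgh, 0, jkm),
        (0, 0, 1): (bcd, fgh, 1, m),
        (0, 1, 0): (bcd, (jk << 1) | h, 1, 0b010 | m),
        (1, 0, 0): ((jk << 1) | d, fgh, 1, 0b100 | m),
        (1, 1, 0): ((jk << 1) | d, h, 1, 0b110 | m),
        (1, 0, 1): ((fg << 1) | d, 0b010 | h, 1, 0b110 | m),
        (0, 1, 1): (bcd, 0b100 | h, 1, 0b110 | m),
        (1, 1, 1): (d, 0b110 | h, 1, 0b110 | m),
    }
    pqr, stu, v, wxy = layout[(a, e, i)]
    return (pqr << 7) | (stu << 4) | (v << 3) | wxy


def trailing_significand(digits, precision=16):
    """Left-pad the BCD digits with zeros and DPD-encode all but the leftmost.

    precision is 16 for double precision or 34 for extended precision.
    Returns (leftmost_digit, trailing_bits).
    """
    if not 1 <= len(digits) <= precision:
        raise ValueError("digit count out of range")
    padded = [0] * (precision - len(digits)) + list(digits)
    trailing = 0
    for k in range(1, precision, 3):           # 5 groups for 16, 11 groups for 34
        trailing = (trailing << 10) | bcd3_to_dpd(*padded[k:k + 3])
    return padded[0], trailing
```

The leftmost digit is returned separately because, as described next, it is carried in the combination field rather than in a DPD group.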
If the most significant BCD digit is "8" or "9," bits 1 and 2 are set to "1"; bits 3 and 4 are the most significant 2 bits of the exponent, and thus are set to "01"; and bit 5 is set to "0" for "8" or "1" for "9". If the most significant BCD digit is "0" to "7," bits 1 and 2 are the most significant 2 bits of the exponent, and thus are set to "01," and bits 3-5 are set to the rightmost 3 bits of the most significant BCD digit.
If S is 1, the leftmost 4 bits of the rightmost byte of the source data are a sign code. In this case, if the value of the sign code is "1011" or "1101", the resultant sign bit (bit 0) is set to 1.
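By way of example only, the sign, combination field, and exponent bits described above can be assembled for the double precision case as in the following sketch. The helper names `dfp64_header` and `zoned_sign_bit` are hypothetical; the split of the biased exponent 398 into a 2-bit part and an 8-bit part simply follows the bit positions given in the text.

```python
BIASED_ZERO_EXPONENT = 398   # value given in the text for double precision


def dfp64_header(msd, sign_bit):
    """Return bits 0-13 of the result (sign, combination field, bits 6-13)."""
    exp_hi = BIASED_ZERO_EXPONENT >> 8       # most significant 2 exponent bits, "01"
    exp_lo = BIASED_ZERO_EXPONENT & 0xFF     # remaining exponent bits, bits 6-13
    if msd >= 8:
        # bits 1-2 are "11", bits 3-4 carry the exponent, bit 5 selects 8 or 9
        comb = 0b11000 | (exp_hi << 1) | (msd & 1)
    else:
        # bits 1-2 carry the exponent, bits 3-5 carry the digit
        comb = (exp_hi << 3) | msd
    return (sign_bit << 13) | (comb << 8) | exp_lo


def zoned_sign_bit(src, s):
    """Bit 0 of the result: 1 only when S is 1 and the sign code is 1011 or 1101."""
    if not s:
        return 0                              # sign is forced positive
    return 1 if (src[-1] >> 4) in (0b1011, 0b1101) else 0
```

Shifting the header left by 50 bits and combining it with the 50 trailing significand bits gives the full 64-bit result; for instance, `(dfp64_header(0, 0) << 50) | 1` is 0x2238000000000001, the double precision encoding of the integer 1 with a zero exponent.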
Described in detail above are two instructions that provide a way to significantly improve traditional memory-to-memory decimal workloads. In a traditional memory-to-memory decimal workload, EBCDIC or ASCII operands are first converted to packed decimal format, which strips out the zone codes and places the numeric and sign digits of the two operands in another portion of memory. The packed operands are then operated on by arithmetic operations such as addition, subtraction, multiplication, or division. These arithmetic operations must wait for the pack operations to complete before they can begin, and they then store their results to memory. Once the result storage is complete, the result is unpacked back to the target format (EBCDIC or ASCII). Memory and operation dependencies dominate performance.
According to one aspect of the invention, depending on the target format, the PACK or PKA instruction is replaced with the new CDZT or CXZT instruction (e.g., by recompiling the code with the new instructions enabled). The mathematical operation may then be replaced by its DFP equivalent (e.g., AD/XTR, SD/XTR, MD/XTR, DD/XTR) so that there is no wait to store any operands to memory or read any operands from memory. These instructions execute in a similar amount of time as the Add (AP), Subtract (SP), Multiply (MP), or Divide (DP) operations, but without the memory overhead. A second memory dependency is avoided when the UNPK or UNPKA operation is replaced and the result is converted directly to the target format via the CZDT or CZXT instruction described below.
Conventional memory-to-memory decimal pack operations are capable of handling 15 digits and a sign, so 3 overlapping pack operations are required to handle each 31 digit (and sign) operand typically found in applications such as COBOL applications. Having to split the operands into smaller overlapping mini-operands increases the complexity of the compiler and the compiled code; requires additional instructions to be executed to perform a given task, such as handling carry/borrow between mini-operands; and degrades performance. Since CXZT is capable of converting 34 digits and a sign code to a DFP operand, a compiler can treat common 31 digit and sign operands (e.g., COBOL operands) as a single entity, simplifying the compiled code and improving performance.
As described herein, the CDZT and CXZT instructions provide an efficient way to convert data from EBCDIC or ASCII in memory directly to DFP format in registers. This allows the data to be converted from EBCDIC or ASCII to DFP format in a single step. Previously, this procedure required the use of PACK or PKA operations to convert the data into packed decimal format. The data then had to be loaded into General Purpose Registers (GPRs), which often required a mix of word, halfword and byte load operations since there is currently no length controlled load in the instruction set architecture. Other instructions, CDSTR or CXSTR, then had to be used to convert the packed decimal data in a GPR or GPR pair to the target DFP format. According to one aspect of the invention, PACK/PKA and CDSTR/CXSTR are replaced by a single instruction, CDZT or CXZT.
In addition to the convert from zoned to decimal floating point instructions, according to yet another aspect of the present invention, convert from decimal floating point to zoned instructions are also provided. These instructions provide an efficient way to convert data in decimal floating point format held in a floating point register or a pair of floating point registers to EBCDIC or ASCII data and to store it directly to memory.
For example, referring to FIG. 6, in one example, a DFP operand in a source register or pair of source registers is converted to EBCDIC or ASCII data (step 600). The result of the conversion is then stored in the target memory location (step 602). These instructions allow the data to be converted directly from DFP format to EBCDIC or ASCII in a single step.
Examples of these instructions include a long convert to zoned instruction (CZDT) and an extended convert to zoned instruction (CZXT). The long convert to zoned instruction CZDT reads the double precision DFP operand data from the specified floating point register (FPR), converts the significand to zoned format, and writes it to the target memory location. Likewise, the extended convert to zoned instruction CZXT reads the extended precision DFP operand data from the specified FPR pair, converts the significand to zoned format, and writes it to the target memory location. If the length of the specified memory location is not sufficient to fit all of the leftmost non-zero digits of the source operand, a decimal overflow exception condition is recognized (assuming that the decimal overflow mask is enabled). If not all digits fit into the specified memory location, a specific condition code (e.g., 3) is set. The sign of the DFP operand is copied to the sign nibble of the result in memory (if the sign control is set). The positive sign encoding used is controlled by the P field in the instruction text described below, and the result of a zero operand may be conditionally forced positive by the F field in the instruction text, also described below. Such sign manipulation is typically required in compiled code, and including this function directly in the instruction provides performance savings and simplifies the compiled code.
One embodiment of the convert to zoned instruction format (RSL-b) is described with reference to FIG. 7. In one example, format 700 for a convert to zoned instruction includes the following fields:
Opcode fields 702a, 702b: the opcode fields provide an opcode that indicates the function being performed by the instruction. As an example, one defined opcode specifies the function as a long convert to zoned instruction, and another predefined opcode indicates that it is an extended convert to zoned instruction.
Length field (L2) 704: the length field 704 specifies the length (e.g., in bytes) of the second operand. As an example, for an extended convert to zoned instruction, the length field includes length codes of 0 to 33, and for a long convert to zoned instruction, the length field includes length codes of 0 to 15. In addition, the number of rightmost significand digits of the first operand to be converted is specified by L2.
Base register field (B2) 706: the base register field specifies a general register whose contents are added to the contents of the displacement field to form the second operand address.
Displacement field (D2) 708: the displacement field includes content that is added to the contents of the general register specified by the base register field to form the second operand address.
Register field (R1) 710: the register field specifies a register whose contents are the first operand.
Mask field (M3) 712: the mask field includes (for example):
Sign control (S): bit 0 of the M3 field is the sign control. When S is 0, the second operand does not have a sign field. When S is 1, the second operand has a sign field. That is, the four leftmost bit positions of the rightmost byte are the sign.
Zone control (Z): bit 1 of the M3 field is the zone control. When Z is 0, each zone field of the second operand is stored as 1111 binary. When Z is 1, each zone field of the second operand is stored as 0011 binary.
Plus sign code control (P): bit 2 of the M3 field is the plus sign code control. When P is 0, the plus sign is encoded as 1100 binary. When P is 1, the plus sign is encoded as 1111 binary. When the S bit is 0, the P bit is ignored and is assumed to be 0.
Force plus zero control (F): bit 3 of the M3 field is the force plus zero control. When F is 0, no action is taken. When F is 1 and the absolute value of the result placed at the second operand location is 0, the sign of the result is set to indicate a positive value, using the sign code specified by the P bit. When the S bit is 0, the F bit is ignored and is assumed to be 0.
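As a non-limiting sketch, the four controls of the mask field can be extracted in software as shown below. The names `M3Controls` and `decode_m3` are hypothetical, and the sketch assumes that bit 0 is the leftmost (most significant) bit of the 4-bit field, as is conventional for the architecture referenced herein.

```python
from typing import NamedTuple


class M3Controls(NamedTuple):
    s: int   # sign control
    z: int   # zone control
    p: int   # plus sign code control
    f: int   # force plus zero control


def decode_m3(m3):
    """Split a 4-bit mask field into its S, Z, P and F controls."""
    if not 0 <= m3 <= 0xF:
        raise ValueError("M3 is a 4-bit field")
    s = (m3 >> 3) & 1        # bit 0 (assumed to be the leftmost bit)
    z = (m3 >> 2) & 1        # bit 1
    p = (m3 >> 1) & 1        # bit 2
    f = m3 & 1               # bit 3
    if not s:
        p = f = 0            # P and F are ignored and assumed 0 when S is 0
    return M3Controls(s, z, p, f)
```

For example, a mask of binary 1100 requests a signed result with 0011 zone fields, a 1100 plus sign code, and no forcing of a positive zero.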
In operation, the specified number of rightmost significand digits of the DFP first operand and the sign bit of the first operand are converted to zoned format, and the result is placed at the second operand location. A right-units view of the first operand with a quantum of 1 is implied. The exponent in the combination field is ignored and is treated as if it had a value of zero prior to biasing.
The number of rightmost significand digits of the first operand to be converted is specified by L2. The length in bytes of the second operand is 1-34 for CZXT, corresponding to a length code in L2 of 0 to 33, meaning 1-34 digits. The length in bytes of the second operand is 1-16 for CZDT, corresponding to a length code in L2 of 0 to 15, meaning 1-16 digits.
In one embodiment, the operation is performed for any first operand (including an infinity, QNaN, or SNaN) without causing an IEEE exception condition. If the first operand is an infinity or a NaN, then a zero digit is assumed to be the leftmost digit of the significand, the specified number of rightmost significand digits and the sign bit are converted to zoned format, the result is placed at the second operand location, and execution completes with a specific condition code (e.g., 3).
When leftmost non-zero digits of the result are lost because the second operand field is too short, the result is obtained by ignoring the overflow digits, a specified condition code (e.g., 3) is set, and, if the decimal overflow mask bit is 1, a program interruption for decimal overflow occurs. The operand length alone is not an indication of overflow; non-zero digits must actually be lost during the operation.
A specification exception condition is recognized and the operation is suppressed when, for example, any of the following is true: for CZDT, the L2 field is greater than or equal to 16, meaning 17 or more digits. For CZXT, the R1 field specifies an invalid floating point register pair, or the L2 field is greater than or equal to 34, meaning 35 or more digits.
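The length code and specification checks described above can be illustrated, by way of example only, with the sketch below. The names `SpecificationException`, `VALID_FPR_PAIRS` and `check_convert_to_zoned` are hypothetical. The set of valid register pair designations is an assumption taken from the general rules of the referenced architecture for extended floating point operands; it is not stated in the present description.

```python
# Assumption: extended operands occupy register pairs designated by these R1 values.
VALID_FPR_PAIRS = {0, 1, 4, 5, 8, 9, 12, 13}


class SpecificationException(Exception):
    """Models the specification exception condition; the operation is suppressed."""


def check_convert_to_zoned(extended, l2, r1):
    """Validate L2 and R1 and return the number of digits (bytes) to be stored.

    extended is True for CZXT and False for CZDT.
    """
    max_code = 33 if extended else 15
    if l2 > max_code:
        raise SpecificationException(f"L2={l2} means more than {max_code + 1} digits")
    if extended and r1 not in VALID_FPR_PAIRS:
        raise SpecificationException(f"R1={r1} is not a valid register pair")
    return l2 + 1            # a length code of n means n + 1 digits
```

A length code of 15 with CZDT therefore yields 16 digits, while a length code of 16 is rejected before any operand is read.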
Examples of resulting condition codes include:
0 Source is 0
1 Source is less than 0
2 Source is greater than 0
3 Infinity, QNaN, SNaN, partial result
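A minimal sketch of how these example condition codes might be selected is given below. The function name `condition_code` and its parameters are hypothetical, and giving code 3 priority over the other codes is an assumption, since the list above does not state an ordering.

```python
def condition_code(is_special, digits_lost, is_zero, is_negative):
    """Pick the example condition code for a convert to zoned operation.

    is_special:  the source is an infinity, QNaN or SNaN
    digits_lost: non-zero digits did not fit (partial result)
    is_zero:     the absolute value of the source is 0
    is_negative: the sign bit of the source is 1
    """
    if is_special or digits_lost:
        return 3             # assumed to take priority over codes 0-2
    if is_zero:
        return 0
    return 1 if is_negative else 2
```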
In one embodiment, when the S bit is 1, an ASCII zoned decimal operand may be stored as signed. This is left to the program, because ASCII representations are usually unsigned and positive, with no concept of the rightmost zone being used as a sign. Additionally, completion with a particular condition code (e.g., 0) indicates that the absolute value of the first operand is 0.
The relationship between the M3 control bits, the sign of the DFP first operand, whether the absolute value of the result is 0, and the resulting sign of the second operand is illustrated in the following table, which is provided as an example:
x Ignored
--- Not applicable
Further details regarding the logic for the convert to zoned instruction are described with reference to FIG. 8. In one example, the logic is executed by a processor executing a convert to zoned machine instruction.
Referring to FIG. 8, initially, a determination is made as to whether the instruction is an extended convert to zoned instruction or a long convert to zoned instruction, as indicated by the opcode of the instruction (INQUIRY 800). If it is a long convert to zoned instruction (as indicated by the opcode), a determination is made as to whether the L2 field specifies a length greater than 15 (INQUIRY 802). If the L2 field does specify a length greater than 15, then an exception condition is provided (step 804) because there are more than 16 digits (0-15).
Returning to INQUIRY 802, if the length field does not specify a length greater than 15, then the DFP operand is read from the floating point register specified in the convert instruction (using R1) (step 806). The source DFP digits of the DFP operand that was read are then converted to BCD digits (step 808).
After conversion, a determination is made as to whether the non-zero digits fit in the length specified by L2 (INQUIRY 810). If the non-zero digits do not fit, an overflow exception condition is indicated (step 812). Otherwise, a further determination is made as to whether the Z bit of the mask field is equal to 1 (INQUIRY 814). If Z is equal to 1, then the zone field and sign code are set to "0011" (step 816). Otherwise, the zone field and sign code are set to "1111" (step 818).
After the zone field and the sign code are set, a further determination is made as to whether the S bit of the mask field is set to 1 (INQUIRY 820). If the S bit is not set to 1, the BCD digits, sign field, and zone codes are stored to memory in the appropriate format (step 822). An example of the zoned format is as follows:
In this example, the rightmost four bits of a byte are referred to as the numeric bits (N) and typically include a code representing a decimal digit. The leftmost four bits of a byte are called the zone bits (Z), except for the rightmost byte of a decimal operand, where these bits may be treated either as a zone or as a sign (S).
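By way of illustration only, the byte layout just described can be taken apart in software as in the following sketch. The function name `split_zoned` is hypothetical.

```python
def split_zoned(data, signed):
    """Split a zoned decimal field into (zones, digits, sign_code).

    zones:     leftmost four bits of each byte that is treated as a zone
    digits:    rightmost four bits (N) of every byte
    sign_code: leftmost four bits of the rightmost byte when signed, else None
    """
    digits = [b & 0x0F for b in data]
    zones = [b >> 4 for b in data]
    sign_code = zones.pop() if signed else None
    return zones, digits, sign_code
```

For instance, the EBCDIC bytes F1 F2 C3 treated as signed yield zones 1111 and 1111, digits 1, 2 and 3, and the sign code 1100, while the ASCII string "123" treated as unsigned yields three 0011 zones and the same digits.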
Returning to INQUIRY 820, if the S bit is equal to 1, then yet another determination is made as to whether the F bit in the mask is set to 1 (INQUIRY 824). If F is equal to 1, a determination is made as to whether the result is equal to 0 (INQUIRY 826). If the result is equal to 0, the result sign is set to positive (step 828). If the result is not equal to 0 or F is not equal to 1, the result sign is set to the DFP sign (step 830).
After the result sign is set, a determination is made as to whether the result sign is positive (INQUIRY 832). If the result sign is not positive, processing continues with step 822, where the BCD digits, sign field, and zone codes are stored to memory in the appropriate format. However, if the result sign is positive (INQUIRY 832), then a further determination is made as to whether the P bit of the mask field is set to 1 (INQUIRY 834). If the P bit is set to 1, then the sign is set equal to 1111; otherwise, the sign is set equal to 1100 (step 838). After the sign is set, processing continues with step 822.
Returning to INQUIRY 800, if this is an extended convert to zoned instruction, a determination is made as to whether the length field specifies a length greater than 33 (INQUIRY 850). If the length field specifies a length greater than 33, an exception condition indicating more than 34 digits is provided (step 852). Otherwise, a determination is made as to whether the register field (R1) specifies an invalid floating point register pair (INQUIRY 854). If not, processing continues with step 806. Otherwise, an exception condition is provided (step 856). This completes the description of the embodiment of the convert to zoned instruction.
The step for converting the source DFP digits into BCD digits is mentioned above. Further details regarding the conversion are described below and in the above-referenced "z/Architecture Principles of Operation," IBM Publication No. SA22-7832-08, Ninth Edition, August 2010. The following description also provides details regarding the process of converting from DFP to zoned format.
In one example, for the double precision format, the combination field (bits 1-5 of the source data) contains the most significant digit of the significand data to be converted to zoned format. Bit 0 is the sign bit, where a negative value is indicated by bit 0 being "1". Bits 6-13 are the exponent continuation field and are ignored by this operation. Bits 14-63 are the encoded trailing significand, which contains the remaining 15 digits of decimal data encoded in DPD (densely packed decimal) format.
In one example, for the extended precision format, the combination field (bits 1-5 of the source data) contains the most significant digit of the significand data to be converted to zoned format. Bit 0 is the sign bit, where a negative value is indicated by bit 0 being "1". Bits 6-17 are the exponent continuation field and are ignored by this operation. Bits 18-127 are the encoded trailing significand, which contains the remaining 33 digits of decimal data encoded in DPD format.
For both the double precision and extended precision formats, the DPD-encoded trailing significand digits are converted from DPD format to BCD (binary coded decimal) format, and the digit from the combination field (bits 1-5) is prepended to those digits. The DPD-to-BCD conversion requires only a few gates, through which each 10-bit block of DPD data is expanded into a 12-bit block of BCD data, so that each BCD block includes three 4-bit BCD digits. The resulting string of digits is checked for leading zeros and then compared with the L2 field of the instruction to determine whether an overflow condition has occurred. If it has, the appropriate most significant digits (those that will not fit within the memory length specified by L2 once the data is expanded to zoned decimal format) are zeroed out.
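As one possible software illustration of the DPD-to-BCD expansion and the overflow check for the double precision case, consider the sketch below. The function names are hypothetical, the decoding rules follow the standard densely packed decimal encoding rather than any gate-level design described herein, and the digits that do not fit are simply dropped from the returned list instead of being zeroed out in place.

```python
def dpd_to_digits(declet):
    """Decode one 10-bit DPD group into three decimal digits."""
    pq, r = (declet >> 8) & 3, (declet >> 7) & 1
    st, u = (declet >> 5) & 3, (declet >> 4) & 1
    v = (declet >> 3) & 1
    wx, y = (declet >> 1) & 3, declet & 1
    small = lambda hi, lo: (hi << 1) | lo    # a digit in the range 0-7
    large = lambda lo: 8 | lo                # the digit 8 or 9
    if v == 0:
        return small(pq, r), small(st, u), small(wx, y)
    if wx == 0:
        return small(pq, r), small(st, u), large(y)
    if wx == 1:
        return small(pq, r), large(u), small(st, y)
    if wx == 2:
        return large(r), small(st, u), small(pq, y)
    if st == 0:
        return large(r), large(u), small(pq, y)
    if st == 1:
        return large(r), small(pq, u), large(y)
    if st == 2:
        return small(pq, r), large(u), large(y)
    return large(r), large(u), large(y)


def significand_digits64(dfp):
    """Return the 16 BCD digits of a 64-bit DFP value, leftmost digit first."""
    comb = (dfp >> 58) & 0x1F                # bits 1-5
    if comb >> 1 == 0b1111:
        msd = 0                              # infinity or NaN: zero digit assumed
    elif comb >> 3 == 0b11:
        msd = 8 | (comb & 1)                 # leading digit 8 or 9
    else:
        msd = comb & 7                       # leading digit 0-7
    digits = [msd]
    for shift in range(40, -1, -10):         # five 10-bit groups in bits 14-63
        digits.extend(dpd_to_digits((dfp >> shift) & 0x3FF))
    return digits


def fit_to_length(digits, l2):
    """Keep the rightmost L2 + 1 digits; report whether non-zero digits were lost."""
    n = l2 + 1
    return digits[-n:], any(digits[:-n])
```

With the value 0x22380000000000A3 (the integer 123), a length code of 2 keeps the digits 1, 2, 3 without overflow, whereas a length code of 1 keeps only 2, 3 and reports that a non-zero digit was lost.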
Next, a 4-bit zone field is inserted to the left of each BCD digit so that each byte (8 bits) now includes a 4-bit zone field followed by a 4-bit BCD digit. Each zone field is either "1111" or "0011" depending on whether the Z bit is 0 or 1. Next, if S = 1 in the instruction, the sign bit from the DFP source operand is used to determine the sign code. If the BCD digits are all 0 and F = 1, the sign is ignored and a positive sign code is created. Otherwise, the generated sign code reflects the sign of the DFP source operand from bit 0: a negative sign is encoded as "1101"; a positive sign is encoded as "1100" if P = 0, or as "1111" if P = 1. This sign code then replaces the zone field to the left of the least significant BCD digit. (In one embodiment, the sign is processed in parallel with the zone fields and inserted to the left of the least significant BCD digit, in place of the zone field.) This result is then written to memory.
Described in detail above are two machine instructions, CZDT and CZXT, which convert a decimal floating point operand in a source floating point register or register pair to EBCDIC or ASCII data and store it to a target memory location. These instructions provide a way to significantly improve traditional memory-to-memory decimal workloads. Conventional memory-to-memory decimal unpack operations are capable of handling 15 digits and a sign, so three overlapping unpack operations are required to handle the 31 digit (and sign) results typically found in applications such as COBOL applications. Having to split the results into smaller overlapping mini-results increases compiler complexity and impacts performance because it requires additional instructions to be executed to perform a given task. Since CZXT is capable of converting a DFP operand containing up to 34 digits and a sign code and storing it to memory in a single instruction, a compiler can handle the common 31 digit and sign result (e.g., a COBOL result) as a single entity, simplifying the compiled code and improving performance.
Previously, programs needed to use CSDTR or CSXTR to convert data from DFP format to packed decimal format in GPRs. The data then had to be stored from the GPRs to memory, but since there is currently no length controlled store in the instruction set architecture, this often required a mix of word, halfword and byte store operations. Finally, an UNPK or UNPKA operation was required to convert the data in memory back to EBCDIC or ASCII. These new instructions allow the data to be converted directly from DFP format to EBCDIC or ASCII in a single step. The CZDT or CZXT instruction replaces both the CSDTR/CSXTR and UNPK/UNPKA instructions.
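The zone and sign insertion just described can be sketched, as an example only, as follows. The function name `to_zoned` and its parameters are hypothetical; the zone and sign codes are those given in the description of the mask field.

```python
def to_zoned(digits, dfp_sign_bit, s=0, z=0, p=0, f=0):
    """Build the zoned result bytes from BCD digits and the S, Z, P, F controls."""
    zone = 0b0011 if z else 0b1111               # Z = 1 gives 0011, Z = 0 gives 1111
    out = bytearray((zone << 4) | d for d in digits)
    if s:
        if f and not any(digits):
            negative = False                     # force plus zero
        else:
            negative = bool(dfp_sign_bit)        # sign comes from bit 0 of the source
        if negative:
            code = 0b1101
        else:
            code = 0b1111 if p else 0b1100       # plus sign code selected by P
        out[-1] = (code << 4) | digits[-1]       # sign replaces the rightmost zone
    return bytes(out)
```

For example, the digits 1, 2 with a negative source and S = 1 produce the bytes F1 D2, while a negative zero with S = 1 and F = 1 produces the single byte C0.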
As will be appreciated by one skilled in the art, one or more aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, one or more aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit," "module" or "system." Furthermore, one or more aspects of the present invention may take the form of a computer program product embodied in one or more computer-readable media having computer-readable code embodied thereon.
Any combination of one or more computer-readable media may be utilized. The computer readable medium may be a computer readable storage medium. For example, a computer readable storage medium may be, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Referring now to FIG. 9, in one example, a computer program product 900 includes, for instance, one or more non-transitory computer-readable storage media 902 to store computer-readable code means or logic 904 thereon to provide and facilitate one or more aspects of the present invention.
Code embodied on a computer readable medium may be transmitted using an appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer code for carrying out operations for one or more aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language, assembler or similar programming languages. The code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
One or more aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in these figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of one or more aspects of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition to the above, one or more aspects of the present invention may be provided, offered, deployed, managed, serviced, etc. by a service provider who offers management of customer environments. For example, a service provider can create, maintain, support, etc., computer code and/or computer infrastructure that performs one or more aspects of the present invention for one or more customers. In return, the service provider may receive payment from the customer under a subscription and/or fee agreement, as examples. Additionally or alternatively, the service provider may receive payment from the sale of advertising content to one or more third parties.
In one aspect of the invention, an application program may be deployed to perform one or more aspects of the invention. As one example, deployment of an application program involves providing a computer infrastructure operable to perform one or more aspects of the present invention.
As yet another aspect of the present invention, a computing infrastructure may be deployed comprising integrating computer-readable code into a computing system, wherein the code in combination with the computing system is capable of performing one or more aspects of the present invention.
As yet another aspect of the present invention, a process for integrating computing infrastructure, comprising integrating computer readable code into a computer system, may be provided. The computer system includes a computer-readable medium, wherein the computer-readable medium includes one or more aspects of the present invention. The code in combination with the computer system is capable of performing one or more aspects of the present invention.
While various embodiments are described above, these embodiments are merely examples. For example, computing environments of other architectures may incorporate and use one or more aspects of the present invention. In addition, although some fields and/or bits are described, others may be used. Additionally, some steps of the flow diagrams may be performed in parallel or in a different order. Many changes and/or additions may be made without departing from the spirit of the invention.
In addition, other types of computing environments may benefit from one or more aspects of the present invention. As an example, a data processing system suitable for storing and/or executing code is available that includes at least two processors coupled directly or indirectly to memory elements through a system bus. These memory elements include, for example, local memory used during actual execution of the code, bulk storage, and cache memories which provide temporary storage of at least some code in order to reduce the number of times code must be fetched from bulk storage during execution.
Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, DASD, magnetic tape, CDs, DVDs, flash drives, and other memory media) can be coupled to the system either directly or through intervening I/O controllers. Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems, and Ethernet cards are just a few of the available types of network adapters.
Other examples of computing environments are described below that may incorporate and/or use one or more aspects of the present invention.
Referring to FIG. 10, representative components of a host computer system 5000 to implement one or more aspects of the present invention are depicted. The representative host computer 5000 includes one or more CPUs 5001 in communication with computer memory (i.e., central storage) 5002, as well as I/O interfaces to storage media devices 5011 and networks 5010 for communicating with other computers or SANs and the like. The CPU 5001 is compliant with an architecture having an architected instruction set and architected functionality. The CPU 5001 may have dynamic address translation (DAT) 5003 for transforming program addresses (virtual addresses) into real addresses of memory. A DAT typically includes a translation lookaside buffer (TLB) 5007 for caching translations so that later accesses to the same block of computer memory 5002 do not incur the delay of address translation. Typically, a cache 5009 is employed between the computer memory 5002 and the processor 5001. The cache 5009 may be hierarchical, having a large cache available to more than one CPU and smaller, faster (lower-level) caches between the large cache and each CPU. In some implementations, the lower-level caches are split to provide separate low-level caches for instruction fetching and data accesses. In one embodiment, an instruction is fetched from memory 5002 by an instruction fetch unit 5004 via the cache 5009. The instruction is decoded in an instruction decode unit 5006 and dispatched (with other instructions in some embodiments) to one or more instruction execution units 5008. Typically, several execution units 5008 are employed, for example an arithmetic execution unit, a floating point execution unit, and a branch instruction execution unit. The instruction is executed by the execution unit, accessing operands from instruction-specified registers or memory as needed.
If operands are to be accessed (loaded or stored) from the memory 5002, the load/store unit 5005 typically handles the access under the control of the instruction being executed. The instructions may be executed in hardware circuitry, or in internal microcode (firmware), or by a combination of both.
As noted, a computer system includes information in local (or main) storage, as well as addressing, protection, and reference and change recording. Some aspects of addressing include the format of addresses, the concept of address spaces, the various types of addresses, and the manner in which one type of address is translated to another type of address. Some of main storage includes permanently assigned storage locations. Main storage provides the system with directly addressable, fast-access storage of data. Both data and programs are loaded into main storage (from input devices) before they can be processed.
Main memory may include one or more smaller, fast-access buffer memories, sometimes referred to as cache memories. Cache memories are typically associated with CPU or I/O processor entities. The effects of physical construction and use of distinct storage media (other than on performance) are generally not observable by programs.
Separate caches may be maintained for instructions and for data operands. Information within a cache is maintained in contiguous bytes on an integral boundary called a cache block or cache line (or simply a line). A model may provide an EXTRACT CACHE ATTRIBUTE instruction that returns the size of a cache line in bytes. A model may also provide PREFETCH DATA and PREFETCH DATA RELATIVE LONG instructions that effect the prefetching of memory into the data or instruction cache or the releasing of data from the cache.
The memory is viewed as a long horizontal string of bits. For most operations, accesses to memory proceed in a left-to-right sequence. The string of bits is subdivided into units of eight bits. An eight-bit unit is called a byte, which is the basic building block of all information formats. Each byte location in memory is identified by a unique non-negative integer, which is the address of that byte location or, simply, the byte address. Adjacent byte locations have consecutive addresses, starting with 0 on the left and proceeding in a left-to-right sequence. Addresses are unsigned binary integers and are 24, 31, or 64 bits.
Information is transferred between the memory and the CPU or channel subsystem one byte or a group of bytes at a time. Unless otherwise specified, a group of bytes in memory is addressed by the leftmost byte of the group. The number of bytes in the group is either implied or explicitly specified by the operation to be performed. When used in a CPU operation, a group of bytes is called a field. Within each group of bytes, bits are numbered in a left-to-right sequence. The leftmost bits are sometimes referred to as the "high-order" bits, and the rightmost bits as the "low-order" bits. Bit numbers are not memory addresses, however. Only bytes can be addressed. To operate on individual bits of a byte in memory, the entire byte is accessed. The bits in a byte are numbered 0 to 7, from left to right. The bits in an address may be numbered 8-31 or 40-63 for 24-bit addresses, or 1-31 or 33-63 for 31-bit addresses; they are numbered 0-63 for 64-bit addresses. Within any other fixed-length format of multiple bytes, the bits making up the format are numbered consecutively starting from 0. For purposes of error detection, and preferably for correction, one or more check bits may be transmitted with each byte or with a group of bytes. Such check bits are generated automatically by the machine and cannot be directly controlled by the program. Storage capacities are expressed in number of bytes. When the length of a storage-operand field is implied by the operation code of an instruction, the field is said to have a fixed length, which may be one, two, four, eight, or sixteen bytes. Larger fields may be implied for some instructions. When the length of a storage-operand field is not implied but is stated explicitly, the field is said to have a variable length.
The variable length operands may vary in length by increments of one byte (or in the case of some instructions, by multiples of two bytes or other multiples). When information is placed in memory, the contents of only those byte locations included in the specified field are replaced, even though the physical path to memory width may be greater than the length of the field being stored.
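The left-to-right bit numbering described above differs from the right-to-left numbering used by the shift operators of most programming languages. As a minimal sketch only (the helper name `bit_at` is hypothetical and is not part of the described architecture), the following Python maps a left-to-right bit number onto a shift:

```python
def bit_at(value: int, bit_number: int, width: int = 8) -> int:
    """Return the bit at left-to-right position `bit_number` (0 = leftmost)
    within a fixed-length field of `width` bits."""
    if not 0 <= bit_number < width:
        raise ValueError("bit number lies outside the field")
    # Bit 0 is the high-order bit, so it sits width-1 places from the right.
    return (value >> (width - 1 - bit_number)) & 1


byte = 0b1000_0001
high_order = bit_at(byte, 0)  # leftmost bit of the byte
low_order = bit_at(byte, 7)   # rightmost bit of the byte
```

The same helper covers wider fixed-length formats by passing, for example, `width=64`, in which case bit 63 is the low-order bit.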
Certain units of information must be on an integral boundary in memory. A boundary is called integral for a unit of information when its storage address is a multiple of the length of the unit in bytes. Special names are given to fields of 2, 4, 8, and 16 bytes on an integral boundary. A halfword is a group of two consecutive bytes on a two-byte boundary and is the basic building block of instructions. A word is a group of four consecutive bytes on a four-byte boundary. A doubleword is a group of eight consecutive bytes on an eight-byte boundary. A quadword is a group of 16 consecutive bytes on a 16-byte boundary. When a memory address designates a halfword, a word, a doubleword, or a quadword, the binary representation of the address contains one, two, three, or four rightmost zero bits, respectively. Instructions must be on two-byte integral boundaries. The storage operands of most instructions have no boundary alignment requirements.
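The relationship between integral boundaries and rightmost zero bits can be checked with a few lines of Python. This is an illustrative sketch; the function and table names are assumptions introduced here, not names used by the architecture.

```python
# Unit lengths in bytes, as named in the text above.
UNIT_LENGTHS = {"halfword": 2, "word": 4, "doubleword": 8, "quadword": 16}


def on_integral_boundary(address: int, unit: str) -> bool:
    """True when `address` is a multiple of the unit's length in bytes."""
    return address % UNIT_LENGTHS[unit] == 0


def rightmost_zero_bits(address: int) -> int:
    """Count the zero bits at the right end of a nonzero address."""
    if address <= 0:
        raise ValueError("expected a positive address")
    # address & -address isolates the lowest set bit.
    return (address & -address).bit_length() - 1


address = 0x1008
aligned = {unit: on_integral_boundary(address, unit) for unit in UNIT_LENGTHS}
```

For the address `0x1008`, the binary representation ends in three zero bits, so the address can designate a halfword, a word, or a doubleword, but not a quadword.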
On devices that implement separate caches for instructions and data operands, a significant delay may be experienced if the program stores into a cache line from which instructions are subsequently fetched, regardless of whether the store alters the instructions that are subsequently fetched.
In one embodiment, the invention may be practiced by software (sometimes referred to as licensed internal code, firmware, microcode, millicode, picocode, and the like, any of which would be consistent with one or more aspects of the invention). Referring to FIG. 10, software program code that embodies one or more aspects of the present invention may be accessed by the processor 5001 of the host system 5000 from long-term storage media devices 5011, such as a CD-ROM drive, tape drive, or hard drive. The software program code may be embodied on any of a variety of known media for use with a data processing system, such as a diskette, hard drive, or CD-ROM. The code may be distributed on such media, or may be distributed to users from computer memory 5002 or from the storage of one computer system over a network 5010 to other computer systems for use by users of such other systems.
The software code includes an operating system that controls the function and interaction of various computer components and one or more application programs. Code is typically paged from storage media device 5011 to the relatively higher speed computer memory 5002 where it is available for processing by processor 5001. Techniques and methods for embodying software code in memory, on tangible media, and/or distributing software code via networks are well known and will not be discussed further herein. The code is often referred to as a "computer program product" when created and stored on tangible media, including, but not limited to, electronic memory modules (RAM), cache memory, Compact Discs (CD), DVDs, tapes, and the like. The computer program product medium is generally readable by a processing circuit, preferably in a computer system, for execution by the processing circuit.
FIG. 11 illustrates a representative workstation or server hardware system in which one or more aspects of the present invention may be practiced. The system 5020 of fig. 11 comprises a representative base computer system 5021 (such as a personal computer, workstation or server), including optional peripherals. A basic computer system 5021 includes one or more processors 5026, and a bus for connecting the processor 5026 to other components of the system 5021 and for enabling communication between the processor 5026 and other components of the system 5021, in accordance with known techniques. The bus connects the processor 5026 to memory 5025 and long-term storage 5027, the long-term storage 5027 may comprise, for example, a hard drive (including, for example, any of magnetic media, CDs, DVDs, and caches), or a tape drive. The system 5021 can also include a user interface adapter that connects the microprocessor 5026 via the bus to one or more interface devices, such as a keyboard 5024, a mouse 5023, a printer/scanner 5030, and/or other interface devices that can be any user interface device, such as a touch-sensitive screen, a digitized tablet, etc. The bus also connects a display device 5022, such as an LCD screen or monitor, to the microprocessor 5026 via a display adapter.
The system 5021 may communicate with other computers or networks of computers through a network adapter capable of communicating 5028 with a network 5029. Example network adapters are communications channels, token ring, Ethernet or modems. Alternatively, the system 5021 may communicate using a wireless interface, such as a CDPD (cellular digital packet data) card. The system 5021 can be associated with such other computers in a Local Area Network (LAN) or a Wide Area Network (WAN), or the system 5021 can be a client in a client/server configuration with another computer, and so on. All of these configurations, as well as appropriate communication hardware and software, are known in the art.
Fig. 12 illustrates a data processing network 5040 in which one or more aspects of the present invention may be practiced. Data processing network 5040 may include a plurality of individual networks (such as wireless networks and wired networks), each of which may include a plurality of individual workstations 5041, 5042, 5043, 5044. Additionally, one or more LANs may be included, where a LAN may include a plurality of intelligent workstations coupled to the host processor, as will be appreciated by those skilled in the art.
Still referring to FIG. 12, the networks may also include mainframe computers or servers, such as a gateway computer (client server 5046) or application server (remote server 5048, which may access a data repository and may also be accessed directly from a workstation 5045). The gateway computer 5046 serves as a point of entry into each individual network. A gateway is needed when connecting one networking protocol to another. The gateway 5046 may be coupled to another network (e.g., the Internet 5047), preferably by means of a communications link. The gateway 5046 may also be directly coupled to one or more workstations 5041, 5042, 5043, 5044 using a communications link. The gateway computer may be implemented utilizing an IBM eServer™ server available from International Business Machines Corporation.
Referring concurrently to fig. 11 and 12, software programming code which may embody one or more aspects of the present invention may be accessed by the processor 5026 of the system 5020 from long-term storage media 5027, such as a CD-ROM drive or hard drive. The software programming code may be embodied on any of a variety of known media for use with a data processing system, such as a magnetic diskette, hard drive, or CD-ROM. The code may be distributed on such media, or may be distributed to users 5050, 5051 from the storage or memory of one computer system over a network to other computer systems for use by users of such other systems.
Alternatively, the programming code may be embodied in the memory 5025 and accessed by the processor 5026 using a processor bus. This programming code includes an operating system which controls the function and interaction of the various computer components and one or more application programs 5032. Code is typically paged from the storage medium 5027 to high speed memory 5025 where it is available for processing by the processor 5026. Techniques and methods for embodying software programming code in memory, on tangible media, and/or distributing software code via a network are well known and will not be discussed further herein. The code is often referred to as a "computer program product" when created and stored on tangible media, including, but not limited to, electronic memory modules (RAM), flash memory, Compact Discs (CD), DVDs, tapes, and the like. The computer program product medium is generally readable by a processing circuit, preferably in a computer system, for execution by the processing circuit.
The cache that is most readily available to the processor (normally faster and smaller than the other caches of the processor) is the lowest (L1 or level one) cache, and main store (main memory) is the highest level cache (L3 if there are 3 levels). The lowest level cache is often divided into an instruction cache (I-cache) holding the machine instructions to be executed and a data cache (D-cache) holding the data operands.
Referring to FIG. 13, an exemplary processor embodiment is depicted for the processor 5026. Typically, one or more levels of cache 5053 are employed to buffer memory blocks in order to improve processor performance. The cache 5053 is a high-speed buffer holding cache lines of memory data that are likely to be used. Typical cache lines are 64, 128, or 256 bytes of memory data. Separate caches are often employed for caching instructions and for caching data. Cache coherence (synchronization of copies of lines in memory and the caches) is often provided by various "snoop" algorithms well known in the art. The main memory 5025 of a processor system is itself often referred to as a cache. In a processor system having 4 levels of cache 5053, main memory 5025 is sometimes referred to as the level 5 (L5) cache, since it is typically faster than, and holds only a portion of, the non-volatile storage (DASD, tape, etc.) that is available to the computer system. Main memory 5025 "caches" pages of data paged in and out of main memory 5025 by the operating system.
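Because a cache line is a block of contiguous bytes on an integral boundary, any byte address can be split into the address of its containing line and an offset within that line. The following Python is a sketch under stated assumptions: the function name `cache_line_of` is hypothetical, and the line size is taken to be a power of two, as the sizes of 64, 128, and 256 bytes mentioned above are.

```python
def cache_line_of(address: int, line_size: int = 256) -> tuple:
    """Split a byte address into (line base address, offset within line).

    Assumes line_size is a power of two such as 64, 128, or 256."""
    if line_size <= 0 or line_size & (line_size - 1):
        raise ValueError("line size must be a power of two")
    offset = address & (line_size - 1)  # low-order bits select the byte
    line_base = address - offset        # remaining bits select the line
    return line_base, offset


base_256, offset_256 = cache_line_of(0x12345, 256)
base_64, offset_64 = cache_line_of(0x12345, 64)
```

Two addresses fall in the same cache line exactly when their line base addresses are equal, which is the condition relevant to the store-into-instruction-line delay noted earlier.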
Program counter (instruction counter) 5061 keeps track of the address of the current instruction to be executed. The program counter in the processor is 64 bits and may be truncated to 31 or 24 bits to support prior addressing limits. The program counter is typically embodied in a program status word (PSW) of the computer such that it persists during context switching. Thus, a program in progress, having a program counter value, may be interrupted by, for example, the operating system (a context switch from the program environment to the operating system environment). The PSW of the program maintains the program counter value while the program is not active, and the program counter (in the PSW) of the operating system is used while the operating system is executing. Typically, the program counter is incremented by an amount equal to the number of bytes of the current instruction. RISC (reduced instruction set computing) instructions are typically of fixed length, while CISC (complex instruction set computing) instructions are typically of variable length. The IBM instructions of the present example are CISC instructions having a length of 2, 4, or 6 bytes. The program counter 5061 is modified by, for example, a context switch operation or a branch taken operation of a branch instruction. In a context switch operation, the current program counter value is saved in the program status word along with other state information about the program being executed (such as condition codes), and a new program counter value is loaded that points to an instruction of a new program module to be executed. A branch taken operation is performed in order to permit the program to make decisions or to loop within the program by loading the result of the branch instruction into the program counter 5061.
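The program counter behavior described above (sequential advance by the instruction length, replacement on a taken branch, and truncation to 24 or 31 bits) can be modeled in a few lines. The class below is a simplified sketch: the name `ProgramCounter` is hypothetical, and wrapping the address around at the truncated width is an assumption made for illustration.

```python
class ProgramCounter:
    """Toy model of a program counter with 24-, 31-, or 64-bit addressing."""

    def __init__(self, address: int = 0, mode: int = 64):
        if mode not in (24, 31, 64):
            raise ValueError("addressing mode must be 24, 31, or 64 bits")
        self.mask = (1 << mode) - 1
        self.address = address & self.mask

    def advance(self, instruction_length: int) -> int:
        """Sequential execution: add the byte length of the current instruction."""
        if instruction_length not in (2, 4, 6):
            raise ValueError("instruction length must be 2, 4, or 6 bytes")
        # Assumption: the address wraps at the truncated width.
        self.address = (self.address + instruction_length) & self.mask
        return self.address

    def branch(self, target: int) -> int:
        """Taken branch: load the branch target into the counter."""
        self.address = target & self.mask
        return self.address


pc = ProgramCounter(0x1000)
pc.advance(4)       # a 4-byte instruction
pc.advance(6)       # a 6-byte instruction
pc.branch(0x2000)   # a taken branch replaces the value
```

A context switch could be modeled on top of this by saving `pc.address` with the other state of the interrupted program and loading the saved address of the next program.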
Typically, an instruction fetch unit 5055 is employed to fetch instructions on behalf of the processor 5026. The fetch unit fetches either the "next sequential instructions", the target instructions of branch taken instructions, or the first instructions of a program following a context switch. Modern instruction fetch units often employ prefetch techniques to speculatively prefetch instructions based on the likelihood that the prefetched instructions might be used. For example, a fetch unit may fetch 16 bytes of instruction that include the next sequential instruction and additional bytes of further sequential instructions.
The fetched instructions are then executed by the processor 5026. In one embodiment, the fetched instructions are passed to dispatch unit 5056 of the fetch unit. The dispatch unit decodes the instruction and forwards information about the decoded instruction to the appropriate units 5057, 5058, 5060. The execution unit 5057 will typically receive information from the instruction fetch unit 5055 regarding decoded arithmetic instructions, and will perform arithmetic operations on operands according to the opcode of the instruction. Operands are preferably provided to the execution units 5057 from storage 5025, architected registers 5059, or from the immediate field of the instruction being executed. The results of the execution, when stored, are stored in storage 5025, registers 5059, or other machine hardware (such as control registers, PSW registers, and the like).
A processor 5026 typically has one or more units 5057, 5058, 5060 for executing the function of the instruction. Referring to FIG. 14A, an execution unit 5057 may communicate with architected general registers 5059, a decode/dispatch unit 5056, a load/store unit 5060, and other processor units 5065 by way of interfacing logic 5071. An execution unit 5057 may employ several register circuits 5067, 5068, 5069 to hold information that the arithmetic logic unit (ALU) 5066 will operate on. The ALU performs arithmetic operations (such as add, subtract, multiply, and divide) as well as logical functions (such as AND, OR, exclusive OR (XOR), rotate, and shift). Preferably, the ALU supports specialized operations that are design dependent. Other circuits may provide other architected facilities 5072, including, for example, condition codes and recovery support logic. Typically, the result of an ALU operation is held in an output register circuit 5070, which can forward the result to a variety of other processing functions. There are many arrangements of processor units, and the present description is intended only to provide a representative understanding of one embodiment.
An ADD instruction, for example, would be executed in an execution unit 5057 having arithmetic and logical functionality, while a floating point instruction, for example, would be executed in a floating point execution unit having specialized floating point capability. Preferably, an execution unit operates on operands identified by an instruction by performing an opcode-defined function on the operands. For example, an ADD instruction may be executed by an execution unit 5057 on operands found in two registers 5059 identified by register fields of the instruction.
The execution unit 5057 performs the arithmetic addition on the two operands and stores the result in a third operand, where the third operand may be a third register or one of the two source registers. The execution unit preferably utilizes an arithmetic logic unit (ALU) 5066 that is capable of performing a variety of logical functions, such as shift, rotate, AND, OR, and XOR, as well as a variety of algebraic functions, including any of add, subtract, multiply, and divide. Some ALUs 5066 are designed for scalar operations and some for floating point. Depending on the architecture, data may be big endian (where the least significant byte is at the highest byte address) or little endian (where the least significant byte is at the lowest byte address). The IBM architecture of the present example is big endian. Depending on the architecture, signed fields may be sign and magnitude, 1's complement, or 2's complement. A 2's complement number is advantageous in that the ALU does not need a separate subtract capability, since either a negative value or a positive value in 2's complement requires only an addition within the ALU. Numbers are commonly described in shorthand, where a 12-bit field defines an address of a 4,096-byte block and is commonly described as, for example, a 4 Kbyte block.
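The remarks on 2's complement and byte order can be illustrated concretely. The sketch below assumes a 32-bit field width chosen only for the example; the function names are hypothetical. Subtraction is carried out purely by negating one operand and adding, and `int.to_bytes` shows where the least significant byte lands in each byte order.

```python
WIDTH = 32                  # field width assumed for this example
MASK = (1 << WIDTH) - 1
SIGN_BIT = 1 << (WIDTH - 1)


def encode(value: int) -> int:
    """Encode a signed integer as a WIDTH-bit 2's complement bit pattern."""
    return value & MASK


def decode(bits: int) -> int:
    """Interpret a WIDTH-bit pattern as a signed 2's complement value."""
    return bits - (1 << WIDTH) if bits & SIGN_BIT else bits


def subtract_by_adding(a: int, b: int) -> int:
    """Compute a - b using only inversion and addition, as an adder would."""
    negated_b = (~encode(b) + 1) & MASK      # 2's complement negation
    return decode((encode(a) + negated_b) & MASK)


difference = subtract_by_adding(7, 10)

value = 0x0A0B0C0D
big = value.to_bytes(4, "big")        # least significant byte at highest address
little = value.to_bytes(4, "little")  # least significant byte at lowest address
```

Here the least significant byte `0x0D` is the last byte of `big` and the first byte of `little`, matching the definitions of the two byte orders given above.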
Referring to FIG. 14B, branch instruction information for executing a branch instruction is typically sent to a branch unit 5058, which often employs a branch prediction algorithm, such as a branch history table 5082, to predict the outcome of the branch before other conditional operations are complete. The target of the current branch instruction will be fetched and speculatively executed before the conditional operations are complete. When the conditional operations are completed, the speculatively executed branch instructions are either completed or discarded based on the conditions of the conditional operation and the speculated outcome. A typical branch instruction may test condition codes and branch to a target address if the condition codes meet the branch requirement of the branch instruction. The target address may be calculated from several numbers, including, for example, ones found in register fields or an immediate field of the instruction. The branch unit 5058 may employ an ALU 5074 having a plurality of input register circuits 5075, 5076, 5077 and an output register circuit 5080. The branch unit 5058 may communicate with, for example, general registers 5059, decode/dispatch unit 5056, or other circuits 5073.
The execution of a group of instructions can be interrupted for a variety of reasons including, for example: a context switch initiated by the operating system, a program exception or error causing a context switch, an I/O interruption signal causing a context switch, or multi-threading activity of a plurality of programs (in a multi-threaded environment). Preferably, a context switch action saves state information about the currently executing program and then loads state information about the other program being invoked. State information may be saved in hardware registers or in memory, for example. State information preferably comprises a program counter value pointing to the next instruction to be executed, condition codes, memory translation information, and architected register content. A context switch activity can be exercised by hardware circuits, application programs, operating system programs, or firmware code (microcode, picocode, or licensed internal code (LIC)), alone or in combination.
A processor accesses operands according to instruction-defined methods. The instruction may provide an immediate operand using the value of a portion of the instruction, or may provide one or more register fields explicitly pointing to either general purpose registers or special purpose registers (floating point registers, for example). The instruction may utilize implied registers identified by an opcode field as operands. The instruction may utilize memory locations for operands. A memory location of an operand may be provided by a register, an immediate field, or a combination of registers and an immediate field, as exemplified by the long displacement facility, in which the instruction defines a base register, an index register, and an immediate field (displacement field) that are added together to provide the address of the operand in memory. Location herein typically implies a location in main memory (main storage), unless otherwise indicated.
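The base, index, and displacement form of operand addressing reduces to a single sum. The following Python is a hedged sketch: the function name `operand_address` is hypothetical, and both the acceptance of a negative (signed) displacement and the wraparound at the address width are assumptions made for illustration rather than statements about the described facility.

```python
def operand_address(base: int, index: int, displacement: int,
                    address_bits: int = 64) -> int:
    """Form an operand address by adding the contents of a base register,
    the contents of an index register, and a displacement field.

    Assumptions: the displacement may be negative, and the sum wraps
    at the address width (24, 31, or 64 bits)."""
    if address_bits not in (24, 31, 64):
        raise ValueError("address width must be 24, 31, or 64 bits")
    return (base + index + displacement) & ((1 << address_bits) - 1)


# Register contents and displacement chosen only for illustration.
forward = operand_address(base=0x2000, index=0x10, displacement=0x8)
backward = operand_address(base=0x2000, index=0, displacement=-8)
```

An instruction with only a register field and a displacement field, such as the second operand of the conversion instruction summarized earlier, corresponds to the same sum with the index contribution set to zero.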
Referring to fig. 14C, a processor accesses memory using load/store unit 5060. The load/store unit 5060 may perform a load operation by obtaining the address of the target operand in memory 5053 and loading the operand in a register 5059 or another memory 5053 location, or may perform a store operation by obtaining the address of the target operand in memory 5053 and storing data obtained from a register 5059 or another memory 5053 location in the target operand location in memory 5053. The load/store unit 5060 may be speculative and may access memory in an out-of-order (relative to instruction sequence) sequence, however the load/store unit 5060 will maintain the appearance of a program executing instructions in order. The load/store unit 5060 may communicate with general registers 5059, decode/dispatch unit 5056, cache/memory interface 5053, or other elements 5083, and includes various register circuits, ALUs 5085, and control logic 5090 to calculate storage locations and provide pipeline sequencing to order operations. Some operations may be out of order, but the load/store unit provides functionality that makes out of order operations appear to a program as having been executed in order, as is well known in the art.
Preferably, addresses that an application program "sees" are often referred to as virtual addresses. Virtual addresses are sometimes referred to as "logical addresses" and "effective addresses". These virtual addresses are virtual in that they are redirected to a physical memory location by one of a variety of dynamic address translation (DAT) technologies, including, but not limited to, simply prefixing a virtual address with an offset value, or translating the virtual address via one or more translation tables. The translation tables preferably comprise at least a segment table and a page table, alone or in combination, the segment table preferably having an entry pointing to the page table. In one such architecture, a hierarchy of translation is provided, including a region first table, a region second table, a region third table, a segment table, and an optional page table. The performance of address translation is often improved by utilizing a translation lookaside buffer (TLB), which comprises entries mapping a virtual address to an associated physical memory location. The entries are created when the DAT translates a virtual address using the translation tables. Subsequent use of the virtual address can then utilize the entry of the fast TLB rather than the slow sequential translation table accesses. TLB content may be managed by a variety of replacement algorithms, including LRU (least recently used).
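The interplay of translation tables and a TLB with LRU replacement can be sketched briefly. The code below is a deliberate simplification and not the translation scheme of any particular architecture: it assumes a single flat page table (rather than the multi-level hierarchy described above), a 4,096-byte page, and hypothetical names `TLB` and `translate`.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumption: 4 Kbyte pages


class TLB:
    """Small translation lookaside buffer with LRU replacement."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()  # virtual page number -> frame number

    def lookup(self, page: int):
        if page not in self.entries:
            return None
        self.entries.move_to_end(page)  # mark as most recently used
        return self.entries[page]

    def insert(self, page: int, frame: int) -> None:
        self.entries[page] = frame
        self.entries.move_to_end(page)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used


def translate(virtual_address: int, page_table: dict, tlb: TLB):
    """Return (physical address, tlb_hit) for a virtual address."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    frame = tlb.lookup(page)
    hit = frame is not None
    if not hit:
        # Slow path: consult the table; a missing entry raises KeyError,
        # standing in for a translation exception.
        frame = page_table[page]
        tlb.insert(page, frame)
    return frame * PAGE_SIZE + offset, hit


page_table = {0: 5, 1: 9, 2: 7}  # illustrative mappings only
tlb = TLB(capacity=2)
first = translate(0x0010, page_table, tlb)   # miss, entry created
second = translate(0x0010, page_table, tlb)  # hit on the same page
```

The first access misses and creates a TLB entry from the table; the second access to the same page is served from the TLB without consulting the table.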
In the case where the processor is a processor of a multi-processor system, each processor has responsibility for keeping shared resources (such as I/O, caches, TLBs, and memory) interlocked for coherency. Typically, "snoop" technologies will be utilized in maintaining cache coherency. In a snoop environment, each cache line may be marked as being in any one of a shared state, an exclusive state, a changed state, an invalid state, and the like, in order to facilitate sharing.
I/O units 5054 (FIG. 13) provide the processor with means for attaching to peripheral devices, including, for example, tape, disk, printers, displays, and networks. I/O units are often presented to the computer program by software drivers. In mainframe computers, channel adapters and open system adapters are I/O units of the mainframe that provide the communications between the operating system and peripheral devices.
In addition, other types of computing environments may benefit from one or more aspects of the present invention. As an example, as referred to herein, an environment may include a simulator (e.g., software or other simulation mechanism) in which a particular architecture (including, for example, instruction execution, architected functions (such as address translation), and architected registers) or a subset thereof is simulated (e.g., on a native computer system having a processor and memory). In this environment, one or more simulation functions of a simulator may implement one or more aspects of the present invention, even though the computer executing the simulator may have a different architecture than the capabilities being simulated. As one example, in emulation mode, a particular instruction or operation being emulated is decoded, and the appropriate emulation function is built to implement the individual instruction or operation.
In a simulation environment, a host computer includes (for example): a memory storing instructions and data; an instruction fetch unit that fetches instructions from memory and, optionally, provides local buffering of fetched instructions; an instruction decode unit that receives the fetched instruction and determines the type of instruction that has been fetched; and an instruction execution unit that executes the instructions. Execution may include loading data from memory into a register; storing data from the register back to the memory; or perform some type of arithmetic or logical operation (as determined by the decode unit). In one example, each unit is implemented in software. For example, the operations being performed by these units are implemented as one or more subroutines within simulator software.
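The fetch, decode, and execute units described above can each be realized as ordinary software. The sketch below uses an entirely hypothetical three-byte instruction format (opcode, register number, memory address) invented for this illustration; it is not the instruction set of the described architecture.

```python
# Hypothetical opcodes for a toy guest machine (illustration only).
HALT, LOAD, STORE, ADD = 0x00, 0x01, 0x02, 0x03


def run(memory: bytearray, registers: list, pc: int = 0) -> int:
    """Emulate the toy machine until HALT; return the final program counter."""

    def load(reg: int, addr: int) -> None:   # memory -> register
        registers[reg] = memory[addr]

    def store(reg: int, addr: int) -> None:  # register -> memory
        memory[addr] = registers[reg] & 0xFF

    def add(reg: int, addr: int) -> None:    # register += memory operand
        registers[reg] = (registers[reg] + memory[addr]) & 0xFF

    subroutines = {LOAD: load, STORE: store, ADD: add}

    while True:
        # Fetch unit: read one fixed-length instruction and advance.
        opcode, reg, addr = memory[pc], memory[pc + 1], memory[pc + 2]
        pc += 3
        if opcode == HALT:
            return pc
        # Decode unit: select the subroutine for this instruction type.
        # Execute unit: the subroutine performs the operation.
        subroutines[opcode](reg, addr)


program = bytes([LOAD, 0, 16, ADD, 0, 17, STORE, 0, 18, HALT, 0, 0])
memory = bytearray(32)
memory[:len(program)] = program
memory[16], memory[17] = 20, 22  # data operands
registers = [0, 0, 0, 0]
final_pc = run(memory, registers)
```

After the run, register 0 and memory byte 18 both hold the sum of the two data operands. A dictionary of subroutines keyed by opcode is one simple way to express the decode step; a real emulator would also handle unknown opcodes and exceptions.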
More particularly, in a mainframe, architected machine instructions are used by programmers (today, usually "C" programmers), often by way of a compiler application. These instructions, stored in the storage medium, may be executed natively in a server of the architecture, or alternatively in machines executing other architectures. They can be emulated in existing and future mainframe servers and on other machines (e.g., Power Systems servers). They can be executed in machines running Linux on a wide variety of machines using hardware manufactured by AMD™ and others. Besides execution on that hardware, Linux can be used, as well as machines that use emulation by Hercules, UMX, or FSI (Fundamental Software, Inc), where execution is generally in an emulation mode. In emulation mode, emulation software is executed by a native processor to emulate the architecture of an emulated processor.
The native processor typically executes emulation software comprising either firmware or a native operating system to perform emulation of the emulated processor. The emulation software is responsible for fetching and executing instructions of the emulated processor architecture. The emulation software maintains an emulated program counter to keep track of instruction boundaries. The emulation software may fetch one or more emulated machine instructions at a time and convert the one or more emulated machine instructions to a corresponding group of native machine instructions for execution by the native processor. These converted instructions may be cached so that a faster conversion can be accomplished. Notwithstanding, the emulation software must maintain the architecture rules of the emulated processor architecture so as to assure that operating systems and applications written for the emulated processor operate correctly. Furthermore, the emulation software must provide the resources identified by the emulated processor architecture, including, but not limited to, control registers, general purpose registers, floating point registers, dynamic address translation function (including, for example, segment tables and page tables), interrupt mechanisms, context switch mechanisms, time-of-day (TOD) clocks, and architected interfaces to I/O subsystems, such that an operating system or an application program designed to run on the emulated processor can be run on the native processor having the emulation software.
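Caching converted instructions amounts to remembering, per guest instruction address, the native routine produced by the conversion step. The Python below is a minimal sketch under loud assumptions: the class `TranslationCache`, the function `toy_translator`, and the two-operation guest instruction form are all hypothetical, and native routines are stood in for by Python closures.

```python
class TranslationCache:
    """Remembers the native routine produced for each guest address."""

    def __init__(self, translator):
        self._translator = translator  # the slow guest-to-native conversion
        self._routines = {}
        self.conversions = 0           # how many times conversion actually ran

    def routine_for(self, guest_address, guest_instruction):
        routine = self._routines.get(guest_address)
        if routine is None:
            # Not yet converted: convert once and keep the result.
            # A real emulator must also discard entries when the guest
            # stores into already converted code; that is omitted here.
            self.conversions += 1
            routine = self._translator(guest_instruction)
            self._routines[guest_address] = routine
        return routine


def toy_translator(guest_instruction):
    """Hypothetical conversion: build a closure acting on an accumulator."""
    opcode, operand = guest_instruction
    if opcode == "ADDI":
        return lambda acc: acc + operand
    if opcode == "MULI":
        return lambda acc: acc * operand
    raise ValueError("unknown guest opcode")


guest_program = {0: ("ADDI", 5), 2: ("MULI", 3)}  # address -> instruction
cache = TranslationCache(toy_translator)
accumulator = 0
for _ in range(2):                # run the two-instruction loop body twice
    for address in (0, 2):
        routine = cache.routine_for(address, guest_program[address])
        accumulator = routine(accumulator)
```

Although four guest instructions are executed, the conversion step runs only twice, because the second pass through the loop reuses the cached routines.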
The particular instruction being emulated is decoded, and a subroutine is called to perform the function of the individual instruction. An emulation software function emulating a function of an emulated processor is implemented, for example, in a "C" subroutine or driver, or by some other method of providing a driver for the specific hardware, as will be within the skill of those in the art after understanding the description of the preferred embodiment. Various software and hardware emulation patents, including but not limited to the following, illustrate a variety of known ways to achieve emulation of an instruction format architected for a different machine on a target machine available to those skilled in the art: U.S. Pat. No. 5,551,013, entitled "Multiprocessor for Hardware Emulation", by Beausoleil et al.; U.S. Pat. No. 6,009,261, entitled "Preprocessing of Stored Target Routines for Emulating Incompatible Instructions on a Target Processor", by Scalzi et al.; U.S. Pat. No. 5,574,873, entitled "Decoding Guest Instruction to Directly Access Emulation Routines that Emulate the Guest Instructions", by Davidian et al.; U.S. Pat. No. 6,308,255, entitled "Symmetrical Multiprocessing Bus and Chipset Used for Coprocessor Support Allowing Non-Native Code to Run in a System", by Gorishek et al.; U.S. Pat. No. 6,463,582, entitled "Dynamic Optimizing Object Code Translator for Architecture Emulation and Dynamic Optimizing Object Code Translation Method", by Lethin et al.; and U.S. Pat. No. 5,790,825, entitled "Method for Emulating Guest Instructions on a Host Computer Through Dynamic Recompilation of Host Instructions", by Eric Traut (each of these patents is hereby incorporated by reference in its entirety); and many others.
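As a hedged illustration of decoding an instruction and calling a subroutine for it, the following sketch dispatches on an opcode byte through a table of handler functions. The opcodes `0x01` and `0x02`, the instruction layouts, and the names `Cpu`, `DISPATCH`, and `emulate` are assumptions made for the example and do not correspond to any real architecture.

```python
from typing import Callable, Dict


class Cpu:
    """Emulated state: sixteen general purpose registers and a program counter."""

    def __init__(self) -> None:
        self.gpr = [0] * 16
        self.pc = 0


def op_load_imm(cpu: Cpu, mem: bytes) -> int:
    """Hypothetical 3-byte instruction: opcode, register number, immediate."""
    reg, imm = mem[cpu.pc + 1], mem[cpu.pc + 2]
    cpu.gpr[reg] = imm
    return 3  # instruction length in bytes


def op_add_reg(cpu: Cpu, mem: bytes) -> int:
    """Hypothetical 2-byte instruction: opcode, then r1 and r2 in one byte."""
    r1, r2 = mem[cpu.pc + 1] >> 4, mem[cpu.pc + 1] & 0x0F
    cpu.gpr[r1] = (cpu.gpr[r1] + cpu.gpr[r2]) & 0xFFFFFFFF
    return 2


# Decode step: the opcode byte selects the subroutine that emulates it.
DISPATCH: Dict[int, Callable[[Cpu, bytes], int]] = {
    0x01: op_load_imm,
    0x02: op_add_reg,
}


def emulate(mem: bytes) -> Cpu:
    cpu = Cpu()
    while cpu.pc < len(mem):
        handler = DISPATCH.get(mem[cpu.pc])
        if handler is None:
            raise ValueError(f"unknown opcode {mem[cpu.pc]:#04x} at {cpu.pc}")
        cpu.pc += handler(cpu, mem)  # each handler reports its own length
    return cpu
```

Having each handler return its instruction length lets the loop keep the emulated program counter on instruction boundaries even when instructions differ in size.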
In FIG. 15, an example of an emulated host computer system 5092 is provided that emulates a host computer system 5000' of a host architecture. In the emulated host computer system 5092, the host processor (CPU) 5091 is an emulated host processor (or virtual host processor) and comprises an emulation processor 5093 having a native instruction set architecture different from that of the processor 5091 of the host computer system 5000'. The emulated host computer system 5092 has a memory 5094 accessible to the emulation processor 5093. In the example embodiment, the memory 5094 is partitioned into a host computer memory 5096 portion and an emulation routines 5097 portion. The host computer memory 5096 is available to programs of the emulated host computer 5092 according to the host computer architecture. The emulation processor 5093 executes native instructions of an architected instruction set of an architecture other than that of the emulated processor 5091, the native instructions being obtained from the emulation routines memory 5097. It may access a host instruction for execution from a program in the host computer memory 5096 by employing one or more instructions obtained in a sequence and access/decode routine, which may decode the accessed host instruction to determine a native instruction execution routine for emulating the function of the accessed host instruction. Other facilities defined for the host computer system 5000' architecture may be emulated by architected facilities routines, including such facilities as general purpose registers, control registers, dynamic address translation, I/O subsystem support, and processor cache. The emulation routines may also take advantage of functions available in the emulation processor 5093, such as general purpose registers and dynamic translation of virtual addresses, to improve performance of the emulation routines.
Special hardware and off-load engines may also be provided to assist the processor 5093 in emulating the functionality of the host computer 5000'.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below, if any, are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of one or more aspects of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
Claims (10)
1. A computer system for executing a machine instruction in a central processing unit, the computer system comprising:
a memory; and
a processor in communication with the memory, wherein the computer system is configured to perform a method comprising:
obtaining a machine instruction for execution, the machine instruction defined for computer execution according to a computer architecture, the machine instruction comprising:
at least one opcode field to provide an opcode, the opcode identifying a convert from decimal floating point to zoned function;
a first register field specifying a first register, the first register containing a first operand;
a second register field and a displacement field, wherein contents of a second register specified by the second register field are combined with contents of the displacement field to form an address of a second operand; and
a mask field containing one or more controls used during execution of the machine instruction; and
executing the machine instruction, the executing comprising:
converting at least a portion of the first operand in decimal floating point format to zoned format, wherein the converting comprises: converting a specified number of rightmost significant digits of the first operand in decimal floating point format, and a sign bit of the first operand, to the zoned format; and
placing a result of the converting at a location specified by the address of the second operand.
2. The computer system of claim 1, wherein the mask field includes a sign control for indicating whether the second operand has a sign field.
3. The computer system of claim 1, wherein the mask field includes a zone control for determining a value of a zone field of the second operand.
4. The computer system of claim 1, wherein the mask field includes a plus sign code control for encoding a plus sign.
5. The computer system of claim 1, wherein the mask field includes a force plus zero control for determining a sign of a result placed in the second operand.
6. The computer system of claim 1, wherein the mask field comprises a zone field and a sign field, and wherein the method further comprises using at least one of the zone field and the sign field to determine a value of at least one of the sign field and a zone code of the result stored in the second operand.
7. The computer system of claim 1, wherein the machine instruction comprises a length field specifying at least one of a number of rightmost significant digits of the first operand to be converted and a length of the second operand.
8. A method for executing a machine instruction in a central processing unit, the method comprising:
obtaining, by a processor, a machine instruction for execution, the machine instruction defined for computer execution according to a computer architecture, the machine instruction comprising:
at least one opcode field to provide an opcode, the opcode identifying a convert from decimal floating point to zoned function;
a first register field specifying a first register, the first register containing a first operand;
a second register field and a displacement field, wherein contents of a second register specified by the second register field are combined with contents of the displacement field to form an address of a second operand; and
a mask field containing one or more controls used during execution of the machine instruction; and
executing the machine instruction, the executing comprising:
converting at least a portion of the first operand in decimal floating point format to zoned format, wherein the converting comprises: converting a specified number of rightmost significant digits of the first operand in decimal floating point format, and a sign bit of the first operand, to the zoned format; and
placing a result of the converting at a location specified by the address of the second operand.
9. The method of claim 8, wherein the mask field includes a sign control for indicating whether the second operand has a sign field, a zone control for determining a value of a zone field of the second operand, a plus sign code control for encoding a plus sign, and a force plus zero control for determining a sign of a result placed in the second operand.
10. The method of claim 8, wherein the machine instruction comprises a length field specifying at least one of a number of rightmost significant digits of the first operand to be converted and a length of the second operand.
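For illustration only, and without limiting the claims above, the conversion of a specified number of rightmost digits and the sign into zoned format can be sketched as follows. The function `to_zoned` and its parameter names are hypothetical. The zone codes (1111 for EBCDIC, 0011 for ASCII) and sign codes (1100 or 1111 for plus, 1101 for minus) are assumptions drawn from common zoned-decimal conventions and are not stated in this section. The sketch further assumes a finite operand and ignores its exponent.

```python
from decimal import Decimal


def to_zoned(value: Decimal, length: int, *, signed: bool = True,
             ascii_zone: bool = False, plus_is_f: bool = False,
             force_plus_zero: bool = False) -> bytes:
    """Return `length` zoned bytes for the rightmost digits of `value`.

    signed          -- assumed sign control: last byte carries a sign code
    ascii_zone      -- assumed zone control: zone 0011 instead of 1111
    plus_is_f       -- assumed plus sign code control: plus is 1111, not 1100
    force_plus_zero -- assumed force plus zero control: zero is never negative
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    sign, digits, _exponent = value.as_tuple()  # exponent is ignored here
    # Keep the rightmost `length` digits, padding on the left with zeros.
    kept = ([0] * length + list(digits))[-length:]
    zone = 0x3 if ascii_zone else 0xF
    out = bytearray((zone << 4) | d for d in kept)
    if signed:
        negative = bool(sign)
        if force_plus_zero and not any(kept):
            negative = False
        code = 0xD if negative else (0xF if plus_is_f else 0xC)
        out[-1] = (code << 4) | kept[-1]  # sign replaces the last zone
    return bytes(out)
```

For example, under these assumptions `to_zoned(Decimal("-12345"), 3)` keeps the digits 3, 4, 5 and yields the bytes F3 F4 D5, while an unsigned result with the ASCII zone is directly readable as ASCII digit characters.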
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US13/339,526 | 2011-12-29 | | |
| US13/339,526 US9329861B2 (en) | 2011-12-29 | 2011-12-29 | Convert to zoned format from decimal floating point format |
| PCT/IB2012/056368 WO2013098668A1 (en) | 2011-12-29 | 2012-11-13 | Convert to zoned format from decimal floating point format |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1201351A1 (en) | 2015-08-28 |
| HK1201351B (en) | 2017-11-10 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US10303478B2 (en) | Convert from zoned format to decimal floating point format |
| CN104025043B (en) | Convert from decimal floating-point format to zoned format |
| CN104169877B (en) | Convert non-adjacent instruction specifiers to adjacent instruction specifiers |
| HK1201351B (en) | Convert to zoned format from decimal floating point format |
| HK1210845A1 (en) | Method and system for improving exception processing |
| HK1210845B (en) | Method and system for improving exception processing |