
WO1990002995A1 - Improved modular processor, computer incorporating it, and its method of operation - Google Patents

Improved modular processor, computer incorporating it, and its method of operation

Info

Publication number
WO1990002995A1
WO1990002995A1 PCT/US1989/003656 US8903656W WO9002995A1 WO 1990002995 A1 WO1990002995 A1 WO 1990002995A1 US 8903656 W US8903656 W US 8903656W WO 9002995 A1 WO9002995 A1 WO 9002995A1
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
processor
instructions
block
bus
Prior art date
Application number
PCT/US1989/003656
Other languages
English (en)
Inventor
Ronald L. Yin
Original Assignee
Yin Ronald L
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yin Ronald L
Publication of WO1990002995A1

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38: Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885: Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00: Digital computers in general; Data processing equipment in general
    • G06F15/76: Architectures of general purpose stored program computers
    • G06F15/80: Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G06F15/8007: Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors

Definitions

  • the present invention relates to an improved universal, modular processor, and more particularly to an improved modular processor having a plurality of instruction processors, connected in parallel, receiving substantially simultaneously an instruction from an instruction fetch unit.
  • the plurality of instruction processors collectively processes all of the instructions of the instruction set of the improved processor.
  • the present invention also relates to a universal computer using the improved processor.
  • the present invention relates to a method for parallel processing.
  • Electronic digital processors are well known in the art. Whether they are circuits in discrete components or are integrated circuits in a single die, they are generally characterized by a central decoder for decoding all of the instructions (the instruction set) of the processor, and a central sequencer to carry out the execution of the instruction decoded.
  • prior art single chip processors have been of two general types. The first type is termed CISC or Complex Instruction Set Computer. The second type is termed RISC or Reduced Instruction Set Computer.
  • CISC: Complex Instruction Set Computer
  • RISC: Reduced Instruction Set Computer
  • the processor has a central microcoded control store which decodes a single instruction received into a plurality of microcoded instructions. Each microcoded instruction is executed in approximately one clock cycle.
  • Since the typical instruction is a complex instruction, there are many microcoded instructions for each instruction of the processor.
  • the effective processing speed of a CISC processor can be approximated by dividing the clock speed by the average number of microcode instructions per instruction received.
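The approximation above can be written out as a one-line calculation. The following is a minimal sketch only; the function name and the sample figures (a 10 MHz clock and an average of 5 microcode instructions per instruction) are hypothetical and do not come from the description.

```python
def effective_instruction_rate(clock_hz: float, avg_microinstructions: float) -> float:
    """Approximate instructions per second for a microcoded (CISC) processor.

    Each microcoded instruction is assumed to take about one clock cycle,
    so the rate is the clock speed divided by the average number of
    microcode instructions per instruction received.
    """
    if avg_microinstructions <= 0:
        raise ValueError("average microcode count must be positive")
    return clock_hz / avg_microinstructions


# Hypothetical figures, chosen only to illustrate the division.
rate = effective_instruction_rate(10_000_000, 5)
print(rate)
```

With an average of one microcode instruction per instruction, the same function returns the clock speed itself, which matches the observation that many RISC instructions execute in a single clock cycle.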
  • RISC processors differ from CISC processors only in that RISC processors reduce the complexity of the instruction set to a smaller number of primitive instructions. Since the processor executes less complex instructions, the number of microcode instructions per instruction received has also decreased. Many instructions in a RISC processor can be executed in a single clock cycle.
  • a modular processor for processing a plurality of different instructions which collectively form an instruction set for the processor comprises an instruction fetch unit for fetching an instruction.
  • An instruction bus receives the instruction from the instruction fetch unit.
  • a plurality of instruction processors are all connected in parallel to the instruction bus and receive substantially simultaneously the instruction from the instruction fetch unit.
  • the plurality of instruction processors collectively process all of the instructions of the instruction set.
  • Each of the instruction processors comprises an instruction decoder for decoding one of the instructions from the instruction set.
  • Each of the instruction processors further comprises an instruction execution circuit for executing the one decoded instruction.
  • each of the instruction processors has circuitry for signaling the instruction fetch unit upon completion of execution.
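The arrangement summarized above (one fetch unit, one shared instruction bus, and many instruction processors that each decode a single instruction) can be sketched in software. This is a behavioural illustration under stated assumptions, not the circuit itself: the class and function names are hypothetical, the two bit patterns are the 8080 JUMP and CALL op codes cited later in the description, and returning from `offer` stands in for the completion signal sent to the instruction fetch unit.

```python
from typing import Callable, List, Optional


class InstructionProcessor:
    """One decoder plus execution circuit, responsible for a single op code."""

    def __init__(self, opcode: int, execute: Callable[[List[int]], str]) -> None:
        self.opcode = opcode
        self.execute = execute

    def offer(self, opcode: int, operands: List[int]) -> Optional[str]:
        # Every processor sees the instruction; only the matching one acts.
        if opcode != self.opcode:
            return None
        return self.execute(operands)


def broadcast(processors: List[InstructionProcessor], opcode: int,
              operands: List[int]) -> str:
    """Place one instruction on the shared bus and collect the single response."""
    results = []
    for p in processors:
        r = p.offer(opcode, operands)
        if r is not None:
            results.append(r)
    if len(results) != 1:
        raise ValueError("exactly one instruction processor should respond")
    return results[0]


def address(operands: List[int]) -> int:
    # Low-order byte first, then high-order byte.
    return operands[0] | (operands[1] << 8)


processors = [
    InstructionProcessor(0b11000011, lambda ops: f"JUMP {address(ops):#06x}"),
    InstructionProcessor(0b11001101, lambda ops: f"CALL {address(ops):#06x}"),
]

print(broadcast(processors, 0b11000011, [0x34, 0x12]))
```

Supporting a new instruction in this model means appending one more `InstructionProcessor` to the list, which mirrors the modularity the description claims for the hardware.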
  • a method of executing a computer program is disclosed.
  • the method comprises the steps of determining a block of sequential non-branching instructions with the operands of the instructions in the block not depending upon the results of execution by the other instructions in the block.
  • the instructions of the block are then supplied to a plurality of instruction processors, which process each instruction without waiting for the completion of execution of the instruction(s) preceding it. All of the instructions in the block are executed completely before an instruction after the block is executed.
  • the present invention also relates to a universal computer using the improved modular processor.
  • Figure 1 is a schematic block diagram of one embodiment of the processor of the present invention, showing the instruction fetch unit and a plurality of instruction processors.
  • Figure 2 is a detailed schematic block diagram of the instruction fetch unit of the embodiment of the processor shown in Figure 1.
  • Figure 3 is a detailed schematic block diagram of an instruction processor of the embodiment of the processor shown in Figure 1.
  • Figure 4 is a detailed circuit diagram of the instruction decode circuit portion of the instruction processor shown in Figure 3.
  • Figure 5 is a timing diagram showing the timing of various signals used in the instruction decode circuit shown in Figure 4.
  • Figure 6 is a block schematic diagram of the instruction execution circuit portion of the instruction processor shown in Figure 3.
  • Figure 7 is a detailed schematic diagram of a portion of another embodiment of the processor of the present invention capable of performing parallel processing.
  • Figure 8 is a detailed circuit diagram of the parallel instruction decode circuit and the parallel instruction execution circuit shown in Figure 7.
  • Figure 9 is a schematic block diagram of yet another embodiment of the processor of the present invention capable of performing parallel processing.
  • Figure 10 is a detailed schematic block diagram of one embodiment of an instruction processor shown in Figure 9.
  • Figure 11 is a detailed schematic block diagram of another embodiment of an instruction processor shown in Figure 9.
  • Figure 12 is a detailed schematic circuit diagram of the instruction fetch unit of the embodiment of the processor shown in Figure 9.
  • Figures 13 & 14 are detailed circuit diagrams of the parallel instruction decode circuit and the parallel instruction execution circuit, respectively, shown in Figure 12.
  • Figure 15 is a schematic block diagram of an embodiment of an intelligent pre-fetch circuit.
  • Figure 16 is a detailed circuit diagram of the logic circuit portion shown in Figure 15.
  • Figure 17 is a schematic block diagram of the computer of the present invention incorporating the processor of the present invention.
  • Figure 18 is a diagram showing the relationship between a computer's hardware and its software.
  • Referring to Figure 1, there is shown in block diagram form an embodiment of a processor 10 of the present invention.
  • the processor 10 is connected to a memory 12 through a data bus 14, an address bus 16, and a control bus 18.
  • Each of the data bus 14, address bus 16, and control bus 18 comprises a plurality of signal lines.
  • the processor 10 is connected to the memory 12 in a conventional well known manner.
  • the processor 10 comprises an instruction fetch unit 20 which serves to interface the processor 10 with the data bus 14, the address bus 16, and the control bus 18.
  • the processor 10 also comprises a plurality of instruction processors 22 (a..z).
  • An internal data bus 24 connects the instruction fetch unit 20 to each of the instruction processors 22 (a...z).
  • Each of the instruction processors 22(a..z) is connected in parallel to the internal data bus 24 and can receive substantially simultaneously the instruction or data that is present on the internal data bus 24.
  • Each of the instruction processors 22 (a..z) is a.
  • Each of the instruction processors 22(a..z) is adapted to process a group of instructions.
  • a group of instructions can include, for example, "Add" and "Subtract".
  • each of the instruction processors 22(a..z) processes a single instruction.
  • a single instruction can be, for example, "Add Register".
  • each instruction processor 22(a...z) and the instruction fetch unit 20 can each be made of a single integrated circuit die. Because only one instruction processor 22(a..z) is on a die, the die area is smaller than that of a conventional single die processor. Thus, each of the instruction processors can be made entirely of hard-wired logic, without any microcode. Alternatively, the processor 10 can be a single integrated circuit.
  • the instruction fetch unit 20 comprises a memory data register (MDR) 32 which stores the instruction/data received on the data bus 14.
  • the instruction fetch unit 20 also comprises a memory address register (MAR) 34 which stores the value of the address at which the instruction/data is supplied on the data bus 14 to or from the processor 10.
  • the contents of the memory address register (MAR) 34 are supplied from a program counter 36.
  • the program counter 36 is incremented or decremented by an incrementer/decrementer 38.
  • the instruction fetch unit 20 also comprises a register array 40.
  • the data on the register data bus 26 is stored in a data register 42 to which the register data bus 26 is connected.
  • the address for the register array 40 is stored in an address register 44, to which the register address bus 28 is connected.
  • One of the signals on the internal control bus 30 is the line PC 46.
  • a signal on the line PC 46 signifies to the incrementer/decrementer 38 to increment the program counter 36 which is then supplied to the MAR 34 for the next fetch cycle.
  • Another signal line on the internal control bus 30 is the line CTL 48.
  • the function of the line CTL 48 will be discussed hereinafter.
  • other signal lines 50 are part of the internal control bus 30. These include lines such as write enable, hold, acknowledge, wait, ready, sync, and various clock signals.
  • the instruction processor 22a comprises an instruction decode circuit 22a1 and an instruction execute circuit 22a2, each of which is connected in parallel to receive substantially simultaneously the data/instruction from the internal data bus 24.
  • the instruction decode circuit 22a1 is connected to the lines PC 46 and CTL 48 (in addition to other control lines 50).
  • an instruction execution enable (IEE) line 52 is supplied from the instruction decode circuit 22a1 to the instruction execute circuit 22a2.
  • the instruction execute circuit 22a2 is also connected to the PC line 46 (in addition to the register data bus 26 and the register address bus 28).
  • the interconnection of the instruction decode circuit 22a1 and the instruction execute circuit 22a2 to the various control lines of the internal control bus 30 is highly dependent upon the instruction architecture of the processor 10.
  • the embodiment shown in Figure 3 and as discussed hereinafter in greater detail emulates the Intel 8080 processor instruction set, wherein the data bus 24 is 8 bits wide and each instruction is a single byte.
  • For the 8080 instruction set, see The Intelligent Microcomputer by Roy W. Goody, Science Research Associates, Inc., 1986 (Second Edition).
  • Referring to Figure 4, there is shown in greater circuit detail the instruction decode circuit 22a1.
  • the operation of the processor 10 will be described with respect to the processing of an eight-bit wide data bus 14 upon which instructions are transferred having the format of a first byte being the instruction code or operation code (op code) followed by one or more bytes that comprise the data (operand) upon which the op code operates.
  • the signal lines D0..DN are the 8 lines of the internal data bus 24.
  • Each of the lines of the data bus 24 is gated with the line CTL 48 through an AND gate 60 into an instruction decode logic circuit 62.
  • the instruction decode logic circuit 62 is a specific logic circuit for each of the individual instruction decoders (22a1...22z1).
  • the instruction decode logic circuit 62 of one particular instruction decoder 22a1 may decode the signal pattern of "11000011", which in the 8080 instruction set is the instruction for JUMP.
  • Another instruction decode logic circuit 62 may decode the bit pattern of "11001101", which is CALL ADDRESS.
  • Each of the instruction decode logic circuits 62 is an electrically alterable logic circuit, e.g., an EEPAL (electrically erasable programmable array logic). Such a circuit is well known in the art. Such a circuit can be reprogrammed by the contents of the memory 64.
  • the output 70 of the instruction decode logic circuit 62 will be high if the bit pattern present on the data bus 24 matches the pattern set for the instruction decode logic circuit 62.
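The gating and pattern match just described can be illustrated with a small function. This is a sketch under assumptions: the function name is hypothetical, and it treats a low CTL as "no match" outright, which is a simplification of passing each data line through an AND gate 60 before the decode logic.

```python
JUMP_PATTERN = 0b11000011   # 8080 JUMP, as cited in the description
CALL_PATTERN = 0b11001101   # 8080 CALL ADDRESS, as cited in the description


def decoder_output(bus_byte: int, ctl: bool, pattern: int) -> bool:
    """Model of output 70 for one instruction decode logic circuit 62.

    The byte on the internal data bus is only considered when CTL is high,
    i.e. at the start of a new instruction. Assumption: a low CTL simply
    suppresses the match.
    """
    return ctl and (bus_byte & 0xFF) == pattern


# The JUMP decoder fires on its op code only while CTL is high.
print(decoder_output(0b11000011, True, JUMP_PATTERN))
print(decoder_output(0b11000011, False, JUMP_PATTERN))
```

Because the pattern is an ordinary parameter here, changing it corresponds loosely to reprogramming the electrically alterable decode logic from the memory 64.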
  • the output 70 is supplied to both an OR gate 72 and an AND gate 74.
  • the processor 10 also processes each byte in a single clock cycle.
  • the output of the OR gate 72 is supplied to another AND gate 76 which has as another input thereof, the clock signal CLK.
  • the output of the AND gate 76 is supplied to a counter 80.
  • the counter 80 is a four-bit counter and the outputs thereof are supplied to another logic circuit 84 and to an OR gate 82.
  • the output of the logic circuit 84 is supplied to another AND gate 86 to which the signal CLK is also supplied.
  • the output of the AND gate 86 is used to reset the counter 80.
  • the signal is supplied as the IEE signal 52 and is also supplied to a NOR gate 90 having multiple inputs.
  • the other inputs of the NOR gate 90 are supplied from each of the other individual instruction decode circuits 22x1.
  • the output of the gate 90 is the CTL line 48.
  • Referring to Figure 6, there is shown a schematic block diagram of the gating of the signals from the data bus 24 by the IEE line 52 into the instruction execution circuit 22a2.
  • the instruction decode unit 22a1 is adapted to decode a JUMP instruction of which three bytes are provided.
  • the first byte is the op code which is followed by two bytes representing the lower order and the higher order respectively of the address to which control is transferred.
  • the first byte or the op code is received by all of the instruction processors 22(a..z) .
  • only the particular instruction decode logic circuit 62 which is adapted to decode the bit pattern of "11000011" results in the output 70 going high.
  • the counter 80 is set to "0000". This results in IEE 52 being low.
  • the clock signal CLK that accompanies the data would cause a pulse to be sent through the AND gate 74 to the PC line 46. This would be a signal to the instruction fetch unit 20 to fetch the next byte of data/instruction.
  • With the output 70 going high, the CLK pulse would be gated through the AND gate 76 to trigger the counter 80 to "0001".
  • the output of the OR gate 82 would be high. This would send the IEE line 52 to high.
  • the input of the NOR gate 90 would also be high, which causes CTL 48 to go low.
  • the logic circuit 84 would still output a low.
  • the output of the OR gate 82 would still be high and IEE line 52 would still be high.
  • CTL 48 would continue to be low.
  • With IEE 52 high, the signal on the data bus 24, representing the first byte of the operand, will be gated into the instruction execution circuit 22a2 that is associated with the instruction decode circuit 22a1.
  • the second byte of the instruction is then gated by the AND gates 92 into the instruction execution circuit 22a2.
  • the instruction execution circuit 22a2 may simply store the data signal into the registers 40 of the instruction fetch unit 20. It would then issue a PC signal on the line PC 46 to the instruction fetch unit 20 to fetch the next sequential byte of instruction.
  • the third byte of the instruction again would not be decoded by the instruction decode logic circuit 62.
  • the clock signal CLK would be gated through AND gate 76 bringing the counter to the count of "0011". Since this bit pattern is detected by the logic circuit 84, the output of the logic circuit 84 goes high.
  • the inverse clock signal (the complement of CLK), which occurs after the CLK signal, is gated through the AND gate 86, resetting the counter 80. Before that occurs, however, the IEE line 52 would still have been high and would gate the third byte of the instruction through the AND gates 92 into the instruction execution unit 22a2.
  • With the counter 80 being reset, the output of the counter 80 would have a bit pattern of "0000", which causes IEE 52 to go low. With IEE 52 going low, CTL 48 would return to a high state. This would then permit the next block of bytes comprising a single instruction having an op code and its operand to be processed by the processor 10.
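The three-byte JUMP walk-through above amounts to a small state machine built from the counter 80, the IEE line 52, and the shared CTL line 48. The following behavioural sketch is an assumption-laden model rather than the circuit: class and function names are hypothetical, one call to `clock` stands for one CLK pulse with its inverse phase, and the operand byte values are arbitrary.

```python
from typing import List


class DecodeCircuit:
    """Behavioural model of one instruction decode circuit (one byte per clock)."""

    def __init__(self, pattern: int, length: int) -> None:
        self.pattern = pattern          # pattern set in decode logic 62
        self.length = length            # byte count detected by logic 84
        self.counter = 0                # counter 80
        self.captured: List[int] = []   # bytes gated into the execution circuit

    @property
    def iee(self) -> bool:
        # OR gate 82: IEE is high whenever the counter is non-zero.
        return self.counter != 0

    def clock(self, bus_byte: int, ctl: bool) -> None:
        match = ctl and bus_byte == self.pattern   # output 70
        if self.iee:
            self.captured.append(bus_byte)         # AND gates 92 pass the operand
        if match or self.iee:                      # OR gate 72 enables AND gate 76
            self.counter += 1
            if self.counter == self.length:        # logic 84, reset via AND gate 86
                self.counter = 0


def run(circuits: List[DecodeCircuit], stream: List[int]) -> List[bool]:
    """Feed bytes to all circuits; return the CTL level seen with each byte."""
    trace = []
    for byte in stream:
        ctl = not any(c.iee for c in circuits)     # NOR gate 90
        for c in circuits:
            c.clock(byte, ctl)
        trace.append(ctl)
    return trace


jump = DecodeCircuit(0b11000011, 3)
call = DecodeCircuit(0b11001101, 3)
trace = run([jump, call], [0xC3, 0x34, 0x12, 0xCD, 0x78, 0x56])
print(trace, jump.captured, call.captured)
```

The CTL line is what keeps an operand byte that happens to equal an op code from being decoded: while any IEE is high, CTL is low and no decoder can match.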
  • the processor 10 does not have a central control store. Rather, each instruction processor 22(a...z) has a control store to decode the instruction and then to execute it.
  • the processor 10 is modular. A new processor, which is upwardly software compatible, would require only the addition of a new instruction processor. Further, a plurality of the same instruction processors 22(a...z) can be used in the processor 10 to achieve greater speed through parallel processing (discussed hereinafter) .
  • Because the instruction decode logic circuit 62 is electrically alterable, the processor 10 functions as a universal processor. The processor 10 can process programs that have been written with the same functional instructions but with different bit patterns or formats.
  • both op code and operand can be supplied in a single data word. In that event, the line CTL 48 would not be needed.
  • Referring to Figure 7, there is shown a schematic circuit diagram of a portion of another embodiment of a processor 110.
  • the processor 110 is adapted to execute parallel processing.
  • the processor 110 comprises an instruction fetch unit 120, and a plurality of instruction processors 22 (a..z) (shown in Figure 1) , all as connected and shown in Figure 1.
  • One of the instruction processors 22 (a..z) is a parallel instruction processor 22p (shown in Figure 7) .
  • the function of the parallel instruction processor 22p is to permit the processor 110 to operate a block of certain instructions in parallel.
  • the instructions must comprise a plurality of non-branching sequential instructions, with each instruction having operand(s) which do not depend upon the results of execution by the other instructions in the block. Some may even have the same op code.
  • a branching instruction such as JUMP or CALL
  • the branching instruction must be the last instruction of the block.
  • the block must not contain more than one branching instruction. This is because if an instruction is a branching instruction (conditional or unconditional), the execution of that instruction may result in program execution branched to a non-sequential location.
  • non-branching instructions up to and including the last branching instruction, are sequential.
  • a plurality of non-branching instructions can include a conditional or unconditional branching instruction as the last instruction in the block.
  • the op code of each instruction in the block must not be the same if there is only one instruction processor adapted to execute that instruction. Thus, for example, if only one instruction processor operates on the instruction MOV, then clearly the instruction MOV cannot be in the block more than once.
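The constraints on a parallel block listed above can be collected into a single check. This is a minimal sketch, assuming a simplified instruction record: the `Instr` fields, the register names, and the `units` table (how many instruction processors exist per op code) are all hypothetical, and dependence is modelled only as one instruction reading a location that another instruction in the block writes.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Instr:
    op: str
    reads: FrozenSet[str] = frozenset()
    writes: FrozenSet[str] = frozenset()
    is_branch: bool = False


def can_run_in_parallel(block: List[Instr], units: Dict[str, int]) -> bool:
    """Return True if the block satisfies the constraints described above."""
    # A branching instruction may appear only as the last instruction.
    if any(i.is_branch for i in block[:-1]):
        return False
    # An op code may not appear more often than there are processors for it.
    for op, count in Counter(i.op for i in block).items():
        if count > units.get(op, 0):
            return False
    # No instruction may read a location written by another one in the block.
    for ai, a in enumerate(block):
        for bi, b in enumerate(block):
            if ai != bi and a.reads & b.writes:
                return False
    return True


# Hypothetical configuration: two LXI processors, one of everything else.
units = {"LXI": 2, "MOV": 1, "ADD": 1, "JMP": 1}
block = [
    Instr("LXI", writes=frozenset({"B"})),
    Instr("LXI", writes=frozenset({"D"})),
    Instr("MOV", reads=frozenset({"H"}), writes=frozenset({"A"})),
    Instr("JMP", is_branch=True),
]
print(can_run_in_parallel(block, units))
```

A check of this kind would presumably run when the program is prepared, before the parallel instruction is emitted; the description does not say which tool performs it.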
  • the choice of which instruction processors 22(a...z) will be duplicated so that instructions with the same op code can be operated on in the block will be dependent on the application for the processor 110.
  • the instruction processors 22 should process instructions dealing with computation, such as addition, subtraction etc. As will be appreciated by those skilled in the art, many different embodiments of the processor 110 are possible.
  • a parallel instruction which comprises an op code for parallel instruction followed by the operand which is a number indicating the number of instructions that follow, in which parallel processing can occur.
  • non-branching, sequential instructions 3-7 operate on operands which do not depend upon the results of execution by the other instructions in the same block. Thus, they can be processed in parallel.
  • This assumes that there are two instruction processors that can decode and execute the instruction LXI, at the same time.
  • an instruction in a program is fetched from memory, decoded, and the operand is completely executed before the execution of a second instruction is commenced.
  • the processor 110 comprises a plurality of independent instruction processors 22(a...z), each of which can operate independently of the other instruction processors 22(a...z), so long as the operands upon which each instruction processor 22(a...z) operates do not depend upon the results of execution by the other instruction processors
  • the processor 110, having a parallel instruction processor 22p which recognizes the instruction for parallel processing, would permit the instructions that follow the parallel instruction operand to be processed as fast as the instruction fetch unit 120 can retrieve the instruction from memory (or from registers within the fetch unit 120, which have been pre-fetched) without waiting for a single instruction to be completely executed before commencing the execution of the next instruction.
  • the processor 110 can greatly speed up the processing of that block of instructions.
  • the instructions are processed sequentially in a conventional manner.
  • the parallel instruction processor unit 22p1 and the parallel instruction execution unit 22p2 can be made a part of the instruction fetch unit 120. Because the parallel instruction processor 22p is similar to any other instruction processor 22(a..z), the parallel instruction processor 22p can be added on modularly, as previously described.
  • the instruction fetch unit 120 is very similar to the instruction fetch unit 20 described previously. However, instead of a single memory data register 32, the instruction fetch unit 120 has a plurality of (8) shift registers 132 arranged in parallel. Thus, a single clock cycle shifts 8 bits out of the 8 shift registers 132 onto the data bus 24. Similar to the instruction fetch unit 20, the instruction fetch unit 120 has an MAR 34, the program counter 36, an incrementer/decrementer 38, a register array 40, a data register 42, and an address register 44. In addition to the PC line 46, a PPC line 136 and a Parallel line 134 interconnect the instruction fetch unit 120 with the parallel instruction processor 22p.
  • Referring to Figure 8, there is shown in greater detail a schematic circuit diagram of the parallel instruction decode circuit 22p1 and the parallel instruction execution circuit 22p2.
  • CTL 48 is high, i.e., this is the beginning of a new instruction code cycle. If CTL 48 is high then data along the data bus 24 is gated through the AND gates 60 into the instruction decode logic circuit 62.
  • the instruction decode logic circuit 62 is specifically designed to decode the bit pattern that corresponds to the instruction for parallel processing.
  • the instruction decode logic circuit 62 is an electrically alterable circuit, i.e., an electrically alterable programmable logic array. The contents of memory 64 can be used to alter the electrically alterable instruction decode logic circuit 62.
  • the output 70 of the instruction decode logic circuit 62 will be high.
  • the clock signal CLK is gated through AND gate 74 resulting in a pulse sent to the instruction fetch unit 120 along the line PC 46. This would signal the instruction fetch unit 120 to fetch the operand portion of the parallel processing instruction.
  • the pulse which is the output of the AND gate 74 is also used to set a flip flop 150.
  • the Q output of the flip flop 150 is connected to the line Parallel 134. With the flip flop 150 set, Parallel 134 would go high. When Parallel 134 goes high, it is inverted by the inverter 138 and is low at the input to the AND gate 140.
  • the format for the parallel instruction code is that the first byte would contain the op code, with the operand being in the second byte, which arrives in the second clock cycle.
  • the operand would comprise a number indicating the number of instructions after the parallel instruction which can be processed in parallel.
  • the operand would then be gated into the parallel instruction execution circuit 22p2 by IEE 52.
  • the operand would be gated through the AND gates 92 into the counter 152.
  • the clock signal CLK which accompanies the second byte of the instruction would be gated through the AND gate 153 along with IEE 52 into the load port of counter 152.
  • the operand is then loaded into the counter 152.
  • the output of the counter 152 is then supplied to an OR gate 154.
  • the output of the OR gate 154 would be high.
  • the clock CLK is gated through the AND gate 156.
  • When Parallel 134 is high and the output of the OR gate 154 is high, the signals from CLK are used to supply pulses to PPC 136 and to the count down pin of the counter 152. In addition, the pulses would also be supplied into the count up input of the counter 158.
  • the effect of the clock signal CLK is to supply as many pulses to PPC 136 and counter 158 as there are in the number loaded in the counter 152. Thus, for example, if the counter 152 were loaded with the number 10, signifying 10 bytes that follow which can be processed in parallel, there would be 10 CLK pulses gated through the AND gate 156. When counter 152 reaches 0, the output of the OR gate
  • the counter 158 will have the value of 10.
  • 10 pulses would have been sent along the line PPC 136.
  • the function of the PPC line 136 is to supply 10 pulses to the incrementer/decrementer 38 through the OR gate 142.
  • 10 pulses would be supplied to the array of shift register 132 bringing 10 sequential bytes down the data bus 24.
  • the shift register 132 would have been pre-loaded with instructions by pre-fetching, a technique well known in the art.
  • the CLK pulse that accompanies it is also passed through the AND gate 74 (the output of OR gate 72 would still be high because IEE 52 is still high) .
  • the second pulse through the AND gate 76 would set the counter 80 to the state of "10". This is detected by the inverter and the AND gate 84 such that the inverse clock signal that follows the CLK signal is gated through the AND gate 86 to reset the counter 80.
  • the output of OR gate 82 will be "0", bringing IEE 52 to low. This would then turn off the input to the AND gate 76. Further, the NOR gate 90 would return CTL 48 to high.
  • As each of the other instruction processors 22(a...z) finishes processing its operand and generates a pulse along the PC line 46, the PC signals are inhibited from affecting the incrementer/decrementer 38 by Parallel 134 being high. However, each of the PC signals would be passed through AND gate 160. The output of the AND gate 160 is supplied to the count down input of the counter 158. Thus, each pulse along PC 46 is used to decrement the counter 158. When the counter 158 reaches the count of "0001", the output of the counter 158 through the inverter 162, OR gate 164, and inverter 166 would be high.
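The bookkeeping performed by the counters 152 and 158 can be sketched as follows. This is a simplified model under assumptions: the class and method names are hypothetical, and the block is treated as finished once the counter 158 has returned to zero, which stands in for the detection logic (inverter 162, OR gate 164, inverter 166) whose exact behaviour is not fully spelled out above.

```python
class ParallelController:
    """Behavioural model of the issue and completion counters."""

    def __init__(self) -> None:
        self.parallel = False    # flip flop 150, driving the Parallel line 134
        self.to_issue = 0        # counter 152, loaded with the operand
        self.outstanding = 0     # counter 158, issued but not yet completed

    def start(self, count: int) -> None:
        """Parallel instruction decoded: load the operand into counter 152."""
        self.parallel = True
        self.to_issue = count

    def clock(self) -> bool:
        """One CLK pulse; returns True if a PPC pulse is sent to the fetch unit."""
        if self.parallel and self.to_issue > 0:   # OR gate 154 high
            self.to_issue -= 1                    # count down counter 152
            self.outstanding += 1                 # count up counter 158
            return True
        return False

    def completion_pulse(self) -> None:
        """A PC pulse from an instruction processor that finished its work."""
        if self.parallel and self.outstanding > 0:
            self.outstanding -= 1                 # count down counter 158
            if self.to_issue == 0 and self.outstanding == 0:
                self.parallel = False             # assumed end-of-block condition


ctrl = ParallelController()
ctrl.start(10)
ppc_pulses = sum(ctrl.clock() for _ in range(15))
for _ in range(10):
    ctrl.completion_pulse()
print(ppc_pulses, ctrl.parallel)
```

In this model, extra clock pulses after the counter 152 reaches zero issue nothing further, which corresponds to the AND gate 156 closing once the OR gate 154 goes low.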
  • the processor 110 is completely compatible with existing programs. Programs which do not have the parallel instruction would not activate the parallel instruction processor 22p.
  • the processor 110 can execute existing programs and programs using the parallel instruction. However, as can be seen, each of the bytes within the block is supplied along the data bus 24 in a single clock cycle.
  • Referring to Figure 9, there is shown in block diagram another embodiment of a processor 210 of the present invention.
  • the processor 210 is similar to the processor 110 in that it is capable of parallel processing.
  • the theoretical basis for parallel processing of the processor 210 is the same as the basis for the processor 110.
  • the processor 210 comprises an instruction fetch unit 220 which serves to interface the processor 210 with the data bus 14, address bus 16, and control bus 18.
  • the processor 210 also comprises a plurality of instruction processors 22(a...z..).
  • a plurality of internal data buses 24(a,b) connect the instruction fetch unit 220 to each of the instruction processors 22(a...z..).
  • Each of the instruction processors 22(a..z..) is connected in parallel to one or more of the internal data buses 24(a or b) and can receive substantially simultaneously the instruction or data that is present on that connected internal data bus 24(a or b).
  • Some of the instruction processors, e.g. 22(k...), are connected to both of the data buses 24(a,b).
  • the other instruction processors 22(a...) and 22(z..) are connected to only data bus 24a or 24b, respectively.
  • Each of the instruction processors 22(a..k...z.. ) is also connected to the register data bus 26, the register address bus 28, and the internal processor control bus 30.
  • the register data bus 26, the register address bus 28 and the internal processor control bus 30 are all connected to the instruction fetch unit 220 which contains the internal registers 40 of the processor 210.
  • the internal control bus 30 comprises all the internal signal lines for the control of the processor 10.
  • each of the instruction processors 22(a..k..z..) is adapted to process a group of instructions.
  • a group of instructions can include, for example, add and subtract.
  • each of the instruction processors 22 (a..k..z.. ) processes a single instruction.
  • a single instruction can be, for example, "Add Register".
  • the instruction processors 22 (a%) that are connected to the data bus 24a are duplicative of the instruction processors 22 (z..)- As for tne instruction processors (k%), connected to both data buses 24(a,b), any instruction on the data bus 24a or 24b can be processed by those instruction processors 22 (k). All of the instruction processors 22 (a%) and 22 (k... ) connected to the instruction bus 24a process all of the instructions of the instruction set of the processor 210.
  • In FIG. 10 there is shown in greater detail a schematic diagram of an instruction processor of the type 22(a).
  • the instruction processor 22a is identical to the instruction processor 22a shown in Figure 3. The only difference is that since there are two internal data/instruction buses 24(a,b), there are two PC 46 and CTL 48 lines. Thus, the instruction processor 22a shown in Figure 10 is connected to PC1 line 46a and CTL1 line 48a. Of course an instruction processor of the type 22(z...) would be connected to PC2 line 46b and CTL2 line 48b.
  • Each of the instruction processors of the type 22(k...) which is connected to both data buses 24(a,b) is shown generally in Figure 11.
  • An instruction decode circuit 22k1a, similar to the instruction decode circuit 22a1, decodes the instruction from the bus 24a. If the instruction received on the bus 24a is the instruction for the instruction processor 22k to execute, then the IEE signal 52a is generated. Similarly, the instruction decode circuit 22k1b decodes the instruction from the bus 24b. If appropriate, the IEE signal 52b is generated. The IEE signal 52a or 52b is used to switch the multiplexer 212, which receives both of the buses 24a and 24b.
  • the output of the MUX 212 is supplied to a plurality of AND gates 92, which also receive the output of the OR gate 214.
  • the OR gate 214 receives as its inputs the IEE signals 52a and 52b.
  • If one of the buses 24(a or b) contains the instruction for decode and execution by the instruction processor 22k, then the corresponding decode circuit 22k1a or 22k1b is activated, activating IEE 52a or 52b.
  • the corresponding instruction bus 24a or 24b is then passed through the MUX 212 and is supplied through the AND gates 92 into the instruction execute circuit 22k2.
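  • The gating just described for a dual-bus instruction processor 22k can be sketched in software. This is only an illustrative single-cycle model, not the circuit itself: the function name `dual_bus_gate`, the use of 8-bit integers for bus contents, and the choice to prefer bus 24a if both decoders fire are assumptions made for the sketch.

```python
def dual_bus_gate(bus_a: int, bus_b: int, my_opcode: int) -> int:
    """Model one cycle of the decode/MUX/AND path of processor 22k.

    bus_a, bus_b: 8-bit values on buses 24a and 24b (assumed width).
    my_opcode: the bit pattern this processor is built to decode.
    Returns the byte that reaches the execute circuit, or 0 if gated off.
    """
    iee_a = bus_a == my_opcode          # decode circuit 22k1a -> IEE 52a
    iee_b = bus_b == my_opcode          # decode circuit 22k1b -> IEE 52b
    enable = iee_a or iee_b             # OR gate 214
    # MUX 212: switched by the IEE signals. Preferring bus 24a when both
    # are high is an assumption; the text restricts that case from occurring.
    selected = bus_a if iee_a else bus_b
    # AND gates 92: pass the selected bus only when the OR gate is high.
    mask = 0xFF if enable else 0x00
    return selected & mask


# Usage: the op code appears on bus 24b only, so bus 24b is passed through.
print(hex(dual_bus_gate(0x80, 0xC3, 0xC3)))
```

  • In this model a byte that matches on neither bus yields 0, which corresponds to the AND gates 92 blocking the execute circuit.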
  • Since some of the instruction processors 22(k...) are connected to both of the internal data buses 24(a,b), the processor 210 is restricted to operate in a mode where the same instruction, for decoding and execution by one of the instruction processors 22(k...), does not appear at the same time on both buses 24(a,b).
  • the choice of which instruction processors 22(a...) and 22(z..) will be duplicated so that instructions with the same op code can be operated on simultaneously will be dependent on the application for the processor 210.
  • the instruction processors 22(a...) and 22(z..) should process instructions dealing with computation, such as addition, subtraction etc. As will be appreciated by those skilled in the art, many different embodiments of the processor 210 are possible.
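  • The bus connectivity and the resulting restriction can be illustrated with a small dispatch model. It is a sketch under stated assumptions: the class `InstructionProcessor`, the function `dispatch`, and the mnemonic strings used as op codes are hypothetical names introduced here, and raising an error stands in for the "restricted mode" that the text leaves to the program.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstructionProcessor:
    name: str              # e.g. "22a", "22k", "22z"
    opcodes: frozenset     # instructions it decodes (hypothetical mnemonics)
    buses: frozenset       # subset of {"24a", "24b"} it is wired to


def dispatch(processors, on_bus):
    """Map each bus to the processor that executes its instruction.

    on_bus: dict of bus name -> op code present on that bus this cycle.
    Raises RuntimeError when both buses need the same shared processor.
    """
    claimed = {}
    for bus, opcode in on_bus.items():
        takers = [p for p in processors
                  if bus in p.buses and opcode in p.opcodes]
        if not takers:
            raise ValueError(f"no processor decodes {opcode} on bus {bus}")
        free = [p for p in takers if p.name not in claimed.values()]
        if not free:
            raise RuntimeError(f"{opcode} on both buses needs one processor")
        claimed[bus] = free[0].name
    return claimed


# Duplicated arithmetic processors, one shared JUMP processor.
processors = [
    InstructionProcessor("22a", frozenset({"ADD", "SUB"}), frozenset({"24a"})),
    InstructionProcessor("22z", frozenset({"ADD", "SUB"}), frozenset({"24b"})),
    InstructionProcessor("22k", frozenset({"JUMP"}), frozenset({"24a", "24b"})),
]
print(dispatch(processors, {"24a": "ADD", "24b": "ADD"}))
```

  • Because ADD is duplicated on both buses, two ADD instructions can issue together, whereas two simultaneous JUMP instructions would contend for the single processor 22k.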
  • the instruction fetch unit 220 comprises a pre-fetch unit 202.
  • the pre-fetch unit 202 is of conventional design and can comprise, for example, a memory data register array 132 which receives the instructions/data from the external data bus 14.
  • the pre-fetch unit 202 also comprises a memory address register 34, a program counter 36, and an incrementer/decrementer 38, which are connected to the external address bus 16.
  • the pre-fetch unit 202 also comprises a central register array 40 to which all of the instruction processors 22a...22z are connected via register data bus 26 and register address bus 28. Some of the logic circuits necessary for parallel processing are also shown.
  • the instruction fetch unit 220 also comprises a single internal data bus 24 from the memory data register array 132.
  • the internal bus 24 is as wide as the external bus 14.
  • a parallel instruction processor 22p, comprising a parallel instruction decode 22p1 and a parallel instruction execute 22p2, is connected to the internal bus 24.
  • the function of the parallel instruction processor 22p is to permit the processor 210 to operate on the block of instructions in parallel, by supplying each instruction in the block to each of the data buses 24 (a,b).
  • A block of instructions that can be processed in parallel is preceded by a parallel instruction, which comprises an op code for the parallel instruction followed by an operand: a number indicating how many of the instructions that follow can be processed in parallel.
  • In FIG. 13 there is shown in greater detail a schematic circuit diagram of the parallel instruction decode circuit 22p1 and the parallel instruction execution circuit 22p2.
  • CTL1 48a is high, i.e., this is the beginning of a new instruction code cycle. If CTL1 48a is high then data along the internal bus 24 is gated through the AND gates 60 into the instruction decode logic circuit 62.
  • the instruction decode logic circuit 62 is specifically designed to decode the bit pattern that corresponds to the instruction for parallel processing. Further, the instruction decode logic circuit 62 is an electrically alterable circuit, i.e., an electrically alterable programmable logic array.
  • the contents of memory 64 can be used to alter the electrically alterable instruction decode logic circuit 62. If the instruction decoded is the parallel processing instruction, then the output 70 of the instruction decode logic circuit 62 will be high.
  • the clock signal CLK is gated through AND gate 74 resulting in a pulse sent to the pre-fetch unit 202 along the line PC1 46a. This would signal the pre-fetch unit 202 to fetch the operand portion of the parallel processing instruction.
  • the pulse, which is the output of the AND gate 74 is also used to set a flip flop 150.
  • the Q output of the flip flop 150 is connected to the line Parallel 134. With the flip flop 150 set, Parallel 134 would go high.
  • When Parallel 134 goes high, it is inverted by the inverter 138 and is low at the input to the AND gate 140. This would inhibit any further signals on PC1 46a from being gated through the AND gate 140 into the incrementer/decrementer 38. Thus, once Parallel 134 goes high, the signal that is normally sent along PC1 46a by each of the instruction processors 22(a...k...) after the completion of the processing of each byte of an instruction will be inhibited from affecting the pre-fetch unit 202.
  • the output 70 is gated through the OR gate 72 and is supplied to the AND gate 76.
  • the CLK pulse which accompanies the first byte of the instruction of parallel processing is gated through AND gate 76 into the counter 80.
  • the counter 80 is a two-bit counter. Once the first byte of data is decoded by the instruction decode logic circuit 62 and the clock signal CLK is supplied through the AND gate 76 into the counter 80, the output of the counter 80 would have a bit pattern of "01". That would set the output of the OR gate 82 to high, bringing IEE 52 to high. This is also supplied to the NOR gate 90 which would then bring CTL1 48a to low.
  • the format for the parallel instruction code is that the first byte contains the op code with the operand being in the second byte which is in the second clock cycle.
  • the operand would comprise a number indicating the number of instructions after the parallel instruction which can be processed in parallel. Since in this embodiment there are only two buses 24(a & b), the operand portion of the parallel instruction need not be specified. If the parallel instruction is present, then the operand must be "10" - the only number possible in the operand portion of the parallel instruction. Further, it is possible not to use the parallel instruction preceding the block of instructions capable of being parallel processed. For example, it is also possible to use an extra single bit with all of the instructions in the block of instructions capable of parallel processing, signifying that these instructions can be processed in parallel. Instructions that are not parallel processed will not have this extra bit turned on.
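  • The two-byte format (op code, then a count operand) lends itself to a short parsing sketch. The text does not give the bit pattern of the parallel op code, so `PARALLEL_OPCODE = 0xF0` below is a purely hypothetical placeholder, as is the simplifying assumption that every other instruction is a single byte; `group_instructions` is a name introduced for illustration.

```python
PARALLEL_OPCODE = 0xF0  # hypothetical value; the source does not specify it


def group_instructions(stream):
    """Split a byte stream into serial instructions and parallel blocks.

    Assumes (for illustration only) one-byte instructions, none of which
    equals PARALLEL_OPCODE. Returns a list of (kind, [bytes]) tuples.
    """
    groups = []
    i = 0
    while i < len(stream):
        if stream[i] == PARALLEL_OPCODE:
            if i + 1 >= len(stream):
                raise ValueError("parallel op code without operand byte")
            count = stream[i + 1]                 # second byte: the operand
            block = list(stream[i + 2:i + 2 + count])
            if len(block) != count:
                raise ValueError("stream ends inside a parallel block")
            groups.append(("parallel", block))
            i += 2 + count
        else:
            groups.append(("serial", [stream[i]]))
            i += 1
    return groups


# With two buses the operand is binary "10", i.e. two instructions.
print(group_instructions([0x01, PARALLEL_OPCODE, 0b10, 0x02, 0x03, 0x04]))
```

  • The alternative mentioned in the text, an extra flag bit on each instruction, would remove the prefix bytes at the cost of widening every instruction by one bit.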
  • the operand would then be gated into the parallel instruction execution circuit 22p2 by IEE 52.
  • the operand would be gated through the AND gates 92 into the counter 152.
  • the clock signal CLK which accompanies the second byte of the instruction would be gated through the AND gate 153 along with IEE 52 into the load port of counter 152.
  • the operand is then loaded into the counter 152.
  • the output of the counter 152 is then supplied to an OR gate 154. Once the counter 152 has been loaded with the operand of the parallel instruction code, the output of the OR gate 154 would be high.
  • the clock CLK is gated through the AND gate 156.
  • the function of the PPC line 136 is to increment the program counter 38 with the number of bytes which have been processed in parallel.
  • the output of the counter 152 is also supplied to inverter 212 and AND gate 214.
  • the output, PAR-MUX 216, of the AND gate 214 is used to switch the MUX 218 so that the instruction/data from the internal bus 24 will be supplied to the registers 224(a,b) associated with each bus 24(a,b). Since in this embodiment there are only two buses 24(a & b) , the operand portion of the parallel instruction must be the number "10".
  • PAR-MUX 216 is high and switches the MUX 218 so that the first instruction after the parallel instruction is supplied to the bus register 224b.
  • the PC1 signal would be inhibited from affecting incrementer/decrementer 38 by Parallel 134 being high.
  • the signal PC1 is used to set the bus flag 226a.
  • the signal PC2 generated by the instruction processor processing the instruction from the bus 24b is used to set the bus flag 226b.
  • the output of the bus flags 226(a,b) is gated into an AND gate 228, which is used to reset flipflop 150, thereby turning Parallel 134 to low.
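  • The interplay of flip-flop 150, the inhibit on PC1, and the bus flags 226(a,b) can be modelled as a small state machine. This is a behavioural sketch, not the gate-level circuit: the class and method names are hypothetical, and clearing the bus flags when parallel mode ends is an assumption that the text does not state.

```python
class ParallelControl:
    """Behavioural model of the Parallel 134 line and bus flags 226(a,b)."""

    def __init__(self):
        self.parallel = False  # Q output of flip-flop 150 (Parallel 134)
        self.flag_a = False    # bus flag 226a, set by PC1
        self.flag_b = False    # bus flag 226b, set by PC2

    def enter_parallel(self):
        """Parallel instruction decoded: set flip-flop 150."""
        self.parallel = True

    def pc1_pulse(self):
        """Return True if the pulse reaches incrementer/decrementer 38."""
        if not self.parallel:
            return True            # AND gate 140 passes PC1 as usual
        self.flag_a = True         # inhibited; only the bus flag is set
        self._check_done()
        return False

    def pc2_pulse(self):
        """Completion pulse from the processor serving bus 24b."""
        if self.parallel:
            self.flag_b = True
            self._check_done()

    def _check_done(self):
        if self.flag_a and self.flag_b:       # AND gate 228
            self.parallel = False             # reset flip-flop 150
            self.flag_a = self.flag_b = False  # assumed clearing of flags


ctl = ParallelControl()
ctl.enter_parallel()
print(ctl.pc1_pulse(), ctl.parallel)  # PC1 inhibited, still in parallel mode
ctl.pc2_pulse()
print(ctl.parallel)                   # both buses done, parallel mode ends
```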
  • the processor 210 is completely compatible with existing programs. Programs which do not have the parallel instruction would not activate the parallel instruction processor 22p.
  • the processor 210 can execute existing programs and programs using the parallel instruction. Further, the conversion of existing programs into one capable of parallel processing is extremely simplified. The conversion process can be done with a compiler compiling a source program or with a utility program operating on a binary file. All that is required is to determine which immediate pairs (in the case of two buses 24(a,b) ) of instructions can be processed in parallel.
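  • A conversion utility of the kind mentioned above might be sketched as follows. The text does not say how to decide that a pair can be processed in parallel; the register-dependency test used here (no read-after-write, write-after-write, or write-after-read overlap) is an assumption, as are the tuple representation `(mnemonic, writes, reads)` and the names `independent` and `mark_pairs`.

```python
def independent(first, second):
    """True if two instructions share no register hazards (assumed test)."""
    _, writes1, reads1 = first
    _, writes2, reads2 = second
    hazard = (writes1 & reads2) or (writes1 & writes2) or (reads1 & writes2)
    return not hazard


def mark_pairs(program):
    """Insert a ("PARALLEL", 2) marker before each independent adjacent pair.

    program: list of (mnemonic, set_of_written_regs, set_of_read_regs).
    Scans greedily from the start; a paired instruction is not reused.
    """
    out = []
    i = 0
    while i < len(program):
        if i + 1 < len(program) and independent(program[i], program[i + 1]):
            out.append(("PARALLEL", 2))
            out.extend(program[i:i + 2])
            i += 2
        else:
            out.append(program[i])
            i += 1
    return out


program = [
    ("ADD", {"A"}, {"A", "B"}),
    ("SUB", {"C"}, {"C", "D"}),
    ("MOV", {"E"}, {"A"}),
    ("ADD", {"E"}, {"E", "C"}),   # depends on the MOV before it
]
for entry in mark_pairs(program):
    print(entry[0])
```

  • A real utility would also have to respect the restriction that an instruction handled only by a shared processor 22(k...) cannot appear on both buses at once.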
  • In FIG. 15 there is shown a schematic block diagram of an intelligent pre-fetch circuit 250 which can be a portion of the instruction fetch unit 20, 120 or 220.
  • the instruction/data is fetched from memory 12 and is supplied on the external bus to the Memory Data Register 32. From the Memory Data Register 32, the instructions are supplied on the bus 248.
  • a branch instruction decode circuit 22b is connected to the bus 248.
  • the branch instruction decode circuit 22b1 can decode all of the branching instructions (conditional or unconditional) of the instruction set processed by the processor 10. These can include JUMP, CALL, RETURN etc.
  • the branch instruction decode circuit 22b1 is like the decode circuit discussed heretofore, except that it comprises only a logic circuit 62.
  • the IEE signal 52 is the output of the logic circuit 62.
  • the IEE signal 52 is also used to set a one bit register 252, preserving the high status of IEE 52.
  • the output 254 of the register 252 is supplied to the logic circuit 256, which is shown in greater detail in Figure 16.
  • the branch instruction execute circuit takes the byte(s) following the branch instruction as the address to which branching should or might occur, and supplies the address to the MUX 258. From the MUX 258, the address is supplied to either PC1 36a or PC2 36b. The address is stored in the Program Counter (PC1 or PC2) which is not being currently used.
  • the logic circuit 256 waits for eight clock pulses before setting SW high.
  • the SW signal is used to switch the MUX 268.
  • the eight clock pulses permit eight bytes to be stored into one of the register arrays 262a or 262b, which corresponds to the program counter 36a or 36b, which is currently being used.
  • instruction/data are then supplied to the register array 262a or 262b, which corresponds to the program counter which is not currently being used.
  • the fetched instructions are stored in the non-current register array 262a or 262b.
  • the logic circuit 256 gates another plurality of clock pulses (eight in one embodiment) , and thus permits eight bytes of instruction/data to be stored in the non-current register array 262a or 262b.
  • the logic circuit 256 then generates the Reset signal.
  • the operation of the intelligent pre-fetch circuit 250 will be described with reference to the following example.
  • Program Counter PC1 36a is being used, and data/instructions from the register array 262a are supplied to the bus 24.
  • the next fetched instruction is a branching instruction. This is decoded by the circuit 22b1, generating IEE 52. With IEE 52 going high, register 252 is set. Further, IEE 52 going high sets the MUX 258 such that the next byte(s) following the branch instruction, indicating the address for the branch, would be loaded into the Program Counter PC2 36b.
  • the circuit 250 would continue to fetch instructions from the memory 12, using the contents of PC1 36a, and storing the instructions into the register array 262a. Thus, eight bytes immediately after the branch instruction would be stored in the register array 262a.
  • the processor 10, 110 or 210 would continue to use the instructions from the register array 262a for execution. This is done through bus 24 and MUX 270.
  • the MUX 268 and MUX 272 are switched.
  • the fetching of the instructions from the memory 12 would now be based upon the address from the contents of the PC2 36b - the branched to address.
  • the fetched instructions are stored in the register array 262b.
  • After eight bytes have been fetched, the logic circuit 256 generates the Reset signal.
  • the Reset signal is used to switch the MUX 268, MUX 258 and MUX 272 back to their original setting.
  • the execution of the instruction/data from the register array 262a continues.
  • the pre-fetch circuit 250 would continue to fetch based upon the address from the Program Counter PC1 36a and store the instructions in the register array 262a.
  • the BR signal (from the processor 10, 110 or 210) sets the flip-flop 278, which changes the connection of the bus 24 through MUX 270 to the register array 262b. Since the instructions from the branched to address have been pre-fetched and stored in the register array 262b, execution by the various instruction processors 22 can continue without waiting for the processor 10 to fetch the requisite instruction from memory 12. Further, the BR signal switches the MUX 268 so that instructions from the bus 248 are then supplied to the register array 262b. Further, the current Program Counter being used becomes PC2 36b, because MUX 272 and MUX 274 are also switched. Finally MUX 258 is also switched to point to PC2 36b as the current Program Counter. This would permit PC1 36a and register array 262a to be used in the event a branch instruction is detected on the bus 248.
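  • The example above amounts to pre-fetching both the fall-through path and the branch target, then switching arrays if the branch is taken. The following is a simplified behavioural sketch: the class `DualPathPrefetch`, its method names, and the list-based memory are hypothetical, and the two fills happen in one call rather than over successive clock pulses as in the circuit.

```python
class DualPathPrefetch:
    """Two program counters and two register arrays, one per path."""

    def __init__(self, memory, depth=8):
        self.memory = memory
        self.depth = depth          # eight bytes in the described embodiment
        self.pc = [0, 0]            # PC1 36a, PC2 36b
        self.arrays = [[], []]      # register arrays 262a, 262b
        self.current = 0            # which path currently feeds bus 24

    def _fill(self, which):
        start = self.pc[which]
        chunk = list(self.memory[start:start + self.depth])
        self.arrays[which] = chunk
        self.pc[which] = start + len(chunk)

    def on_branch_decoded(self, fall_through_addr, target_addr):
        """Branch seen on bus 248: pre-fetch both possible paths."""
        other = 1 - self.current
        self.pc[self.current] = fall_through_addr
        self.pc[other] = target_addr   # stored in the unused counter
        self._fill(self.current)       # bytes right after the branch
        self._fill(other)              # bytes at the branched-to address

    def resolve(self, taken):
        """BR signal: switch to the other array if the branch is taken."""
        if taken:
            self.current = 1 - self.current
        return self.arrays[self.current]


memory = list(range(100))              # stand-in for memory 12
unit = DualPathPrefetch(memory)
unit.on_branch_decoded(fall_through_addr=10, target_addr=50)
print(unit.resolve(taken=True))
```

  • After a taken branch the roles swap, so the formerly current counter and array become the spare pair for the next branch, which mirrors the last sentence of the example.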
  • the computer 300 comprises the processor 10, 110 or 210 of the present invention and is connected to a memory 12 through a data bus 14, an address bus 16, and a control bus 18.
  • the computer 300 also comprises a storage device 322, such as a DASD (Direct Access Storage Device). Typically, the computer programs for the computer 300 are stored in the DASD 322 and are retrieved into the memory 12, by a DASD controller 320, where they can be accessed by the processor 10 and executed. This is well known in the art.
  • the computer 300 also comprises an input device 310.
  • the computer 300 comprises a display 312 to receive the output of the processor 10.
  • the input 310 and the display 312 form the input/output devices of the computer 300 and are well known in the art.
  • Control memory 64 reprograms the logic circuit 62 of each instruction processor 22, such that for a different instruction set the same function may be expressed in a different binary format.
  • the JUMP instruction in the 8080 instruction set is represented by the binary op code of "11000011".
  • a JUMP function is represented by the binary op code of
  • control memory 64 is used to change the instruction decode logic circuit 62 so that for another instruction set, the different binary format will activate the instruction processor 22.
  • FIG. 18 there is shown a diagram of the relationship between a computer's hardware components and its software components.
  • the hardware of a computer includes the instruction set of the computer.
  • An operating system program is designed to be executed on the particular hardware components of the computer including its particular instruction set.
  • Application programs are written to operate under specific operating systems. Thus, application programs are also constrained by the instruction set of the processor used in the hardware components of the computer.
  • a plurality of different computer programs are stored in the DASD 322.
  • Each of the different computer programs is executable under a different instruction set.
  • a first computer program is retrieved from the DASD 322 by the controller 320, and is loaded into the memory 12.
  • the first program retrieved from the DASD 322 is executable under a first instruction set.
  • the control memory 64 activates the particular instruction processors 22 which define the first instruction set. Depending on the instruction set chosen, this may be all of the instruction processors 22 (a...z..) . However, some instruction processors 22 might not be chosen and thus the line to the AND gates 60 would not be activated.
  • control memory 64 changes the instruction decode logic circuit 62 such that the format of the binary sequence for the activated instruction processor 22 would match the format of the binary sequence of the instructions of the first instruction set.
  • the control memory 64 activates another set of instruction processors 22, which correspond to the instructions of a second different instruction set. If the instructions of the second instruction set do not have identical corresponding function as the instructions of the first instruction set, some of the functions (represented by the instruction processor) of the first instruction set may not be in the second instruction set. Thus, some of the instruction processors 22, activated for the first instruction set, may not be activated for the second instruction set. Further, some instruction processors 22, not activated for the first instruction set, may be activated for the second instruction set.
  • the format of the activated instruction processors 22 is changed, if need be, so that the binary sequence format of the instruction processors 22 matches the expected binary sequence format of the instructions of the second instruction set. Once the plurality of instruction processors 22 are chosen to process the second instruction set, the second program is then executed.
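  • The role of the control memory 64 in activating instruction processors and reprogramming their decode logic can be sketched with a lookup table. Only the 8080 op codes are real (JUMP is "11000011" as stated above, and DAA is 00100111); the second instruction set, its name `ISA_B`, and its JUMP pattern are hypothetical placeholders, as are the names `CONTROL_MEMORY` and `DecodeLogic`.

```python
# Per instruction set: bit pattern -> function performed.
CONTROL_MEMORY = {
    "8080": {0b11000011: "JUMP", 0b00100111: "DAA"},
    "ISA_B": {0b00001111: "JUMP"},   # hypothetical set with no DAA
}


class DecodeLogic:
    """Electrically alterable decode circuit 62 of one instruction processor."""

    def __init__(self, function):
        self.function = function   # what the execute circuit does
        self.pattern = None        # None means the processor is not activated

    def program(self, instruction_set):
        """Load the bit pattern for this function from the control memory."""
        table = CONTROL_MEMORY[instruction_set]
        matches = [op for op, fn in table.items() if fn == self.function]
        self.pattern = matches[0] if matches else None

    def decode(self, byte):
        """True (IEE high) if the byte is this processor's instruction."""
        return self.pattern is not None and byte == self.pattern


jump = DecodeLogic("JUMP")
daa = DecodeLogic("DAA")
for isa in ("8080", "ISA_B"):
    jump.program(isa)
    daa.program(isa)
    print(isa, jump.pattern is not None, daa.pattern is not None)
```

  • Switching instruction sets leaves the execute side untouched; only the pattern each decoder responds to changes, and a function absent from the new set simply stays inactive.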
  • the processor 10 is configured to execute the instruction set of an Intel 80x86.
  • An operating system program such as MS-DOS is retrieved from DASD 322 and is loaded into memory 12.
  • An application program such as word processing, executable under MS-DOS is also retrieved from DASD 322 and is executed by the processor 10.
  • the processor 10 is reconfigured to execute the instruction set of the Motorola 680x0.
  • a UNIX operating system and an application program executable under the UNIX operating system and under the 680x0 instruction set are retrieved from DASD 322 and are executed by the processor 10.
  • With each new generation of processor, new instructions can simply be modularly added. Further, because the processor has an electrically alterable decoding logic circuit, it is a universal instruction set processor, adaptable to process instruction sets that are different from one another but have the same functionality. Further, with the parallel instruction processing unit, the processor of the present invention can perform parallel processing on conventional programs.
  • the processor can be made using any type of semiconductor processing technique, such as bipolar (ECL, TTL, I²L) and FET (NMOS, CMOS). Further, the processor can be used in any type of computer: microcomputer, minicomputer, mainframe or supercomputer. As for the method of operating the processor, it can be seen that speed is greatly enhanced when instructions that can be processed in parallel are so processed, and yet conventional sequential processing capability is maintained. Thus, the method is fully compatible with existing software.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Advance Control (AREA)

Abstract

A modular processor (10) comprises an instruction fetch unit (20) to which an instruction bus (24) is connected. A plurality of instruction processors (22(a...z)) are connected in parallel to the instruction bus (24). No single instruction processor processes all of the instructions of the instruction set of the processor, but all of the instruction processors (22(a...z)) collectively process all of the instructions of the instruction set. Parallel processing is accomplished either by the instruction processors (22(a...z)) operating independently or by a plurality of instruction buses (24(a,b)) to which said instruction processors (22(a...k...z...)) are connected. A universal computer (300) using the modular processor is also disclosed.
PCT/US1989/003656 1988-09-01 1989-08-24 Processeur modulaire ameliore, ordinateur l'integrant, et son mode de fonctionnement WO1990002995A1 (fr)

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
US23952188A 1988-09-01 1988-09-01
US239,521 1988-09-01
US29329289A 1989-01-04 1989-01-04
US29337389A 1989-01-04 1989-01-04
US293,373 1989-01-04
US293,292 1989-01-04
US33756789A 1989-04-13 1989-04-13
US337,567 1989-04-13

Publications (1)

Publication Number Publication Date
WO1990002995A1 true WO1990002995A1 (fr) 1990-03-22

Family

ID=27499973

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1989/003656 WO1990002995A1 (fr) 1988-09-01 1989-08-24 Processeur modulaire ameliore, ordinateur l'integrant, et son mode de fonctionnement

Country Status (2)

Country Link
AU (1) AU4212089A (fr)
WO (1) WO1990002995A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1168129A1 (fr) * 2000-06-27 2002-01-02 Philips Corporate Intellectual Property GmbH Microcontrôleur avec décodage variable d'instructions

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4471433A (en) * 1980-04-21 1984-09-11 Tokyo Shibaura Denki Kabushiki Kaisha Branch guess type central processing unit
US4498134A (en) * 1982-01-26 1985-02-05 Hughes Aircraft Company Segregator functional plane for use in a modular array processor
US4635186A (en) * 1983-06-20 1987-01-06 International Business Machines Corporation Detection and correction of multi-chip synchronization errors
US4825360A (en) * 1986-07-30 1989-04-25 Symbolics, Inc. System and method for parallel processing with mostly functional languages
US4829422A (en) * 1987-04-02 1989-05-09 Stellar Computer, Inc. Control of multiple processors executing in parallel regions

Also Published As

Publication number Publication date
AU4212089A (en) 1990-04-02

Similar Documents

Publication Publication Date Title
EP0901071B1 (fr) Procédés permettant l'interface entre un processeur et un processeur adjoint
US5923893A (en) Method and apparatus for interfacing a processor to a coprocessor
US4438492A (en) Interruptable microprogram controller for microcomputer systems
US4674089A (en) In-circuit emulator
US5235686A (en) Computer system having mixed macrocode and microcode
JP3105223B2 (ja) マイクロコンピュータ,マイクロプロセッサおよびコア・プロセッサ集積回路用デバッグ周辺装置
EP0352103B1 (fr) Ecrasement des bulles du pipeline dans un système de calcul
US6134653A (en) RISC processor architecture with high performance context switching in which one context can be loaded by a co-processor while another context is being accessed by an arithmetic logic unit
US4719565A (en) Interrupt and trap handling in microprogram sequencer
US4172281A (en) Microprogrammable control processor for a minicomputer or the like
CA1145478A (fr) Ordinateur synchrone a grande vitesse
CA1036713A (fr) Resolution par priorite d'interruption peripherique dans un dispositif de traitement de donnees microprogrammes ayant plusieurs niveaux d'ensemble de sous-instructions
US5983338A (en) Method and apparatus for interfacing a processor to a coprocessor for communicating register write information
US5062041A (en) Processor/coprocessor interface apparatus including microinstruction clock synchronization
JP2001525568A (ja) 命令デコーダ
NZ201809A (en) Microprocessor
JPH076151A (ja) オンチップメモリデバイスのアクセスのために最適化されたcpuコアバス
US4348720A (en) Microcomputer arranged for direct memory access
AU644065B2 (en) Arithmetic unit
US4258417A (en) System for interfacing between main store memory and a central processor
US6012138A (en) Dynamically variable length CPU pipeline for efficiently executing two instruction sets
EP0010196B1 (fr) Circuit et procédé de contrôle pour mémoires digitales
EP0279953B1 (fr) Ordinateur exécutant un mélange de micro- et macro-instructions
EP0448127B1 (fr) ContrÔleur de séquence de microprogramme
US5590293A (en) Dynamic microbranching with programmable hold on condition, to programmable dynamic microbranching delay minimization

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AU JP

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE FR GB IT LU NL SE