WO2002057893A2 - Method and apparatus for reducing power consumption in a digital processor - Google Patents
- Publication number
- WO2002057893A2 (PCT/US2001/051064)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- pipeline
- instruction
- processor
- data
- logic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30076—Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
- G06F9/30083—Power or thermal control instructions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3206—Monitoring of events, devices or parameters that trigger a change in power modality
- G06F1/3228—Monitoring task completion, e.g. by use of idle timers, stop commands or wait commands
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3234—Power saving characterised by the action undertaken
- G06F1/3237—Power saving characterised by the action undertaken by disabling clock generation or distribution
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3234—Power saving characterised by the action undertaken
- G06F1/3243—Power saving in microcontroller unit
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30145—Instruction analysis, e.g. decoding, instruction word fields
- G06F9/30156—Special purpose encoding of instructions, e.g. Gray coding
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3842—Speculative instruction execution
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3867—Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- the present invention relates to the field of integrated circuit design, specifically to (i) power reduction techniques; and (ii) the use of a hardware description language (HDL) for implementing related instructions and control; in a pipelined central processing unit (CPU) or user-customizable microprocessor.
- HDL hardware description language
- RISC reduced instruction set computer
- RISC processors are well known in the computing arts.
- RISC processors generally have the fundamental characteristic of utilizing a substantially reduced instruction set as compared to non-RISC (commonly known as "CISC") processors.
- CISC non-RISC
- RISC processor machine instructions are not all micro-coded, but rather may be executed immediately without decoding, thereby affording significant economies in terms of processing speed.
- This "streamlined" instruction handling capability furthermore allows greater simplicity in the design of the processor (as compared to non-RISC devices), thereby allowing smaller silicon and reduced cost of fabrication.
- RISC processors are also typically characterized by (i) load/store memory architecture (i.e., only the load and store instructions have access to memory; other instructions operate via internal registers within the processor); (ii) unity of processor and compiler; and (iii) pipelining.
- a significant concern in RISC processors (and for that matter, most every integrated circuit) is power consumption and dissipation.
- The power that is consumed only when a signal toggles (i.e., changes from 0 to 1 or from 1 to 0) is defined as dynamic power consumption.
- Toggles are also commonly referred to as switching activity.
- the much smaller amount of power that is consumed in a cell (e.g. a gate or flipflop) when there is no switching activity is called static power consumption or cell leakage power. In a modern CMOS technology, static power consumption represents less than 1% of the total power consumption and can thus be ignored in most applications.
- Dynamic power in turn consists of two components: net switching power and cell internal power.
- Net switching power is the power consumed on a net when the signal it is carrying is toggling. Net switching power is proportionally dependent on the switching activity, the net load and the squared voltage.
- the net load is the capacitive load of the net itself plus the capacitive loads of all input pins of the cells connected to the net. Thus the net load is dependent on its length (its load) and its fanout (the load of connected cells).
- Net switching power can also be defined as only the net load if the capacitive load of the input pins is added to the cell internal power. The total power consumption will be the same since both definitions include the same loads in aggregate.
- V power supply voltage to the gate
- f switching frequency
- Cell internal power is the power consumed when one or more cell input signals toggle. During the transition time when an input or an output signal changes state, both the pull-down and pull-up transistors will be open and a large current will flow through the cell. This is also often called short circuit power. The transition time depends on the chosen technology, but the number of times the transition occurs depends on the switching activity. Cell internal power is proportionally dependent on the switching activity and the squared voltage. Voltage is generally the most important parameter for determining the total power consumption as it is the only squared term in the power equation. Therefore, the choice of technology (where the voltage is defined) is the most important factor that determines total power consumption.
- HDL specifications typically do not permit designers to set the operating voltage level within the target design. Instead, HDL permits designers to address the second and third most important parameters, switching activity and net load. The product of these two parameters affects the power.
- the principle of most power reduction strategies at the HDL level is to add logic that reduces the switching activity and thereby the power consumption.
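- The proportionalities described above can be illustrated numerically. The following is a minimal sketch that assumes the commonly used CMOS approximation P = 0.5 * a * C * V^2 * f; the 0.5 factor, the function name, and all numeric values are illustrative assumptions and are not taken from the specification.

```python
def net_switching_power(activity, load_farads, voltage, clock_hz):
    """Estimate net switching power in watts.

    Assumed model (not given explicitly in the text):
        P = 0.5 * activity * C * V^2 * f
    where `activity` is the average number of toggles per clock cycle.
    """
    return 0.5 * activity * load_farads * voltage ** 2 * clock_hz


# Hypothetical operating point: 10 fF net load, 1.8 V supply, 100 MHz clock.
baseline = net_switching_power(0.2, 10e-15, 1.8, 100e6)
half_activity = net_switching_power(0.1, 10e-15, 1.8, 100e6)
half_voltage = net_switching_power(0.2, 10e-15, 0.9, 100e6)

print(f"baseline:      {baseline:.3e} W")
print(f"half activity: {half_activity / baseline:.2f} x baseline")
print(f"half voltage:  {half_voltage / baseline:.2f} x baseline")
```

- Under this assumed model, halving the switching activity halves the power, while halving the voltage cuts it to one quarter. This is why voltage dominates, and why switching activity is the lever left to the HDL designer.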
- Gray codes also called cyclical or progressive codes
- Gray codes have historically been useful in mechanical encoders since a slight change in location only affects one bit.
- these same codes offer other benefits well understood to one skilled in the art including being hazard-free for logic races and other conditions that could give rise to faulty operation of the circuit.
- the use of such Gray codes also has important advantages in power saving designs. Because only one bit changes per state change, there is a minimal number of circuit elements involved in switching per input change. This in turn reduces the amount of dynamic power by limiting the number of switched nodes toggled per clock change. Using a typical binary code, up to n bits could change, with up to n subnets changing per clock or input change.
- the present invention satisfies the aforementioned needs by providing an improved method and apparatus for reducing power consumption within a digital processor using sleep modes.
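- The difference in switching activity between a binary count and a Gray-coded count can be checked directly. The sketch below uses the standard binary-reflected Gray code (i XOR (i >> 1)) as one possible Gray code; the function names are illustrative only.

```python
def to_gray(i):
    """Binary-reflected Gray code of the integer i."""
    return i ^ (i >> 1)


def total_toggles(codes):
    """Count bit flips over one full cycle of the sequence,
    including the wrap from the last code back to the first."""
    wrapped = codes[1:] + codes[:1]
    return sum(bin(a ^ b).count("1") for a, b in zip(codes, wrapped))


n_bits = 3
binary_seq = list(range(2 ** n_bits))
gray_seq = [to_gray(i) for i in binary_seq]

binary_toggles = total_toggles(binary_seq)
gray_toggles = total_toggles(gray_seq)
print(binary_toggles, gray_toggles)
```

- For a 3-bit counter, one full cycle costs 14 bit toggles in plain binary and 8 in Gray code, i.e. exactly one toggle per state change.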
- an improved method for reducing power consumption within a digital processor comprises first defining an instruction which invokes a "sleep mode" within the processor and its pipeline; inserting the instruction into the pipeline during operation of the processor; decoding and executing the instruction; stalling the pipeline in response to the sleep mode instruction; disabling processor memory in response to the sleep mode instruction; and awaking the core from sleep mode based on the occurrence of a predetermined event.
- the programmer can selectively shut down portions of the processor under certain circumstances, thereby significantly reducing power consumption during such periods, and reducing the power consumption of the processor as a whole.
- the aforementioned sleep mode methodology is combined with a pipeline low power enable configuration which stalls unnecessary data in the pipeline, thereby conserving power within the processor.
- the method comprises providing a logic circuit adapted for detection of a predetermined condition of the data within the pipeline; inserting data into the pipeline; detecting, using the aforementioned logic circuit, that the predetermined condition exists with respect to certain of the data; invoking a sleep mode within the pipeline in response to the detected condition; and restarting the pipeline when the condition no longer exists.
- Gray coding is used in the design of the pipeline logic and in conjunction with the aforementioned sleep mode technique to further reduce power consumption.
- Such Gray coding comprises forming a binary sequence of data in which only one bit changes at any given time.
- an improved instruction format for invoking the aforementioned "sleep mode" comprises (i) a base instruction element or kernel, (ii) one or more operand bits or fields, and (iii) one or more flag bits or fields.
- the instruction is coded within the base instruction set of the processor.
- an improved method of synthesizing the design of an integrated circuit incorporating the aforementioned sleep mode functionality comprises obtaining user input regarding the design configuration; creating a customized HDL functional block description based on the user input and existing libraries of functions; determining a design hierarchy based on the user input and existing libraries; running a makefile to create the structural HDL and script; running the script to create a makefile for the simulator and a synthesis script; and synthesizing and/or simulating the design from the simulation makefile or synthesis script, respectively.
- an improved computer program useful for synthesizing processor designs and embodying the aforementioned sleep mode functionality comprises an object code representation stored on the magnetic storage device of a microcomputer, and adapted to run on the central processing unit thereof.
- the computer program further comprises an interactive, menu-driven graphical user interface (GUI), thereby facilitating ease of use.
- GUI graphical user interface
- an improved apparatus for running the aforementioned computer program used for synthesizing gate logic associated with the aforementioned sleep mode functionality comprises a stand-alone microcomputer system having a display, central processing unit, data storage device(s), and input device.
- the processor comprises a reduced instruction set computer (RISC) having a four stage pipeline comprising instruction fetch, decode, execute, and writeback stages and an instruction set comprising at least one SLEEP instruction, which is used in a delay slot of the pipeline of the processor.
- Fig. 1a is a graphical representation of a first embodiment ("base case") of the SLEEP instruction format according to the present invention.
- Fig. 1b is a graphical representation of a second embodiment of the SLEEP instruction format according to the present invention, having associated operand and flag fields.
- Fig. 1c is a graphical representation of the debug register of the processor core, including ZZ and ED fields.
- Fig. 2 is a logical flow diagram illustrating a first embodiment of the method of reducing power consumption within a digital processor according to the present invention.
- Figs. 3a and 3b are schematic diagrams illustrating exemplary embodiments of the logic used to implement the sleep mode functionality according to the present invention.
- Fig. 4a is a functional block diagram illustrating the relationship of the core clock module to other components within the processor core.
- Figs. 4b and 4c are schematic diagrams illustrating exemplary clock module gate logic for the instances where clock gating is selected and not selected during core build, respectively.
- Figs. 4d-4f are schematic diagrams illustrating exemplary embodiments of the logic used to implement the clock gating functionality according to the present invention.
- Fig. 5 is a logical flow diagram illustrating a second embodiment of the method of reducing power consumption within a digital processor by stalling the pipeline in response to the detection of invalid data.
- Fig. 6 is a logical flow diagram illustrating the generalized methodology of synthesizing processor logic which incorporates the sleep mode functionality of the present invention.
- Fig. 7 is a block diagram of a pipelined processor design incorporating the sleep mode functionality of the present invention.
- Fig. 8 is a functional block diagram of one exemplary embodiment of a computer system useful for synthesizing gate logic implementing the aforementioned sleep mode functionality within a processor device.
- processor is meant to include any integrated circuit or other electronic device capable of performing an operation on at least one instruction word including, without limitation, reduced instruction set core (RISC) processors such as the ARC user-configurable core manufactured by the Assignee hereof, central processing units (CPUs), and digital signal processors (DSPs).
- RISC reduced instruction set core
- CPUs central processing units
- DSPs digital signal processors
- the hardware of such devices may be integrated onto a single piece of silicon (“die”), or distributed among two or more die.
- various functional aspects of the processor may be implemented solely as software or firmware associated with the processor.
- stage refers to various successive stages within a pipelined processor; i.e., stage 1 refers to the first pipelined stage, stage 2 to the second pipelined stage, and so forth.
- toggle refers to the number of times a signal changes from 0 to 1 or from 1 to 0. If a signal changes from 0 to 1 it has toggled once. If it changes back to 0 again it has toggled twice.
- a clock signal generally toggles twice per clock period, and all other signals toggle at a maximum of once per clock period (except if the signals are generated on both clock edges, etc.).
- VHSIC hardware description language (VHDL)
- other hardware description languages such as Verilog® may be used to describe various embodiments of the invention with equal success.
- an exemplary Synopsys® synthesis engine such as the Design Compiler 1999.05 (DC99) is used to synthesize the various embodiments set forth herein
- other synthesis engines such as Buildgates® available from, inter alia, Cadence Design Systems, Inc.
- IEEE std. 1076.3-1997, IEEE Standard VHDL Synthesis Packages, describes an industry-accepted language for specifying a Hardware Definition Language-based design and the synthesis capabilities that may be expected to be available to one of ordinary skill in the art. Appendix I hereto provides relevant portions of the HDL code relating to the various aspects of the invention.
Sleep Mode
- the present invention comprises a "sleep mode" wherein the core pipeline (and optionally memory devices associated with the core) is shut down to conserve power.
- the sleep mode is initiated using a SLEEP instruction which comprises an assembly language instruction of the type well known in the art which is placed within an instruction slot in the processor pipeline.
- the SLEEP instruction, when executed by the processor, allows the processor core to go into a sleep mode which, inter alia, stalls the processor pipeline until an interrupt or designated restart event occurs, thereby reducing power consumption.
- interrupt refers to a state wherein the processor causes programmatic control to be transferred to an interrupt service routine
- restart refers to that condition when the processor is re-enabled after having been halted.
- SLEEP instruction of the invention is configured only to be detected in pipeline stage 2, and has no associated options or operands. Such embodiment represents the "baseline" functionality. It will be appreciated, however, that other configurations which utilize operands and/or flags may be employed with equal success, depending on the required attributes for the particular core design.
- Fig. 1b illustrates an exemplary embodiment of such an alternative instruction encoding (format) for the SLEEP instruction. As illustrated in Fig. 1b, the format 100 comprises (i) a base instruction element or kernel 102; (ii) one or more operand fields 104; and (iii) one or more flag fields 106. Other configurations are also possible consistent with the invention.
- the SLEEP instruction of the present invention may advantageously be put anywhere in the code, for example as shown below:
- the SLEEP instruction comprises a single operand instruction without flags or other operands. This instruction is part of the base case instruction set of the core.
- one or more additional control bits are introduced in the debug register 190 of the core of the present embodiment to control lower power modes. The following outlines the general functionality of the sleep mode control bits:
- ZZ (Sleep Mode): indicates when the core is in sleep mode. 0 - core is not in sleep mode (default)
- the Sleep Mode flag is set when the core enters sleep mode as previously described.
- the ZZ flag is set when a SLEEP instruction arrives in pipeline stage 2, and cleared when the core is restarted or receives an interrupt request of the type previously described.
- the timer register aux_timer of Assignee's ARC core is incremented by one on every clock cycle. If the least significant bit in the aux_tcontrol register is set, the timer generates an interrupt when the register aux_timer "wraps." This wrapping occurs one cycle after the aux_timer has reached the maximum value of 0x00FFFFFF. Hence, when the timer wraps, the interrupt signal is generated, and the core wakes up from sleep mode as previously described.
- the following exemplary code illustrates this concept:
- JAL _start ivech_handler <User defined code> sr 0x0, [aux_tcontrol] ; Disable interrupt generation
- the "sr 0x1" instruction (aux_tcontrol) enables interrupt generation, while "flag 2" enables level 1 interrupts.
- the "sr 0x00FF0000" instruction sets the start value of the timer to 0x00FF0000.
- when the core encounters the SLEEP instruction, it goes into sleep mode until the timer has counted to 0x00FFFFFF (from the starting value of 0x00FF0000).
- the timer then wraps (i.e., is set to the value 0x00000000) and generates an interrupt signal (IRQ3), whereby the core wakes up.
- the interrupt enable flag for level 1 has been set to allow the interrupt signal (IRQ3) to be recognized.
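- The length of the sleep period in the timer example follows from the start value and the wrap rule described above (the wrap occurs one cycle after the maximum value is reached). The following sketch is a simplified software model for illustration, not the core's HDL; the function names are hypothetical, and interrupt latency and pipeline effects are ignored.

```python
TIMER_MAX = 0x00FFFFFF  # maximum aux_timer value given in the text


def cycles_until_wrap(start_value):
    """Clock cycles from loading start_value until the timer wraps to 0."""
    if not 0 <= start_value <= TIMER_MAX:
        raise ValueError("start value out of range for the timer")
    # Count up to the maximum, then one more cycle for the wrap itself.
    return (TIMER_MAX - start_value) + 1


def simulate_until_wrap(start_value):
    """Cycle-by-cycle cross-check of the closed form above."""
    timer, cycles = start_value, 0
    while True:
        cycles += 1
        if timer == TIMER_MAX:
            return cycles  # this cycle wraps the timer and raises the interrupt
        timer += 1


sleep_cycles = cycles_until_wrap(0x00FF0000)
print(sleep_cycles)
```

- With the start value 0x00FF0000 used in the example, the model gives 0x10000 (65,536) clock cycles before the wake-up interrupt. A start value closer to the maximum shortens the sleep period accordingly.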
- the first step 202 of the method 200 comprises defining a sleep mode for the processor via an instruction word format (such as the foregoing SLEEP word).
- the SLEEP instruction is also coded to invoke a pipeline stall and optional disabling of the RAM via the HDL code that defines the pipeline operation.
- the SLEEP instruction is inserted into the pipeline at stage 1.
- the pipeline is advanced, with the SLEEP instruction being advanced to stage 2 (decode) of the pipeline.
- the SLEEP instruction at stage 2 sets the ZZ flag when stage 2 is allowed to move into stage 3.
- the processor enters the sleep mode. No more instruction fetches are allowed and pipeline stage 1 is prevented from moving into stage 2 (step 210). Stages 2 and above flow freely, however, which means that pipeline stages 2 and above will be flushed at the beginning of the sleep mode (step 212). This means that the SLEEP instruction itself will also be flushed, since the SLEEP instruction in stage 2 is advanced to stage 3 as described above. Also, upon execution, the RAM associated with the processor is optionally disabled per step 213, depending on the HDL coding of the instruction.
- the sleep mode duration may then be optionally controlled using a timer or similar function, such as the aux_timer function as previously described herein (step 216).
- an interrupt is generated (step 220), and the core wakes from the sleep mode per step 222.
- the aforementioned interrupt signal may be generated by another function within the core, or may be generated by an external module, such as a disk drive.
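- The sequence of method 200 (SLEEP reaches stage 2, the ZZ flag is set, fetching stops, stage 1 is held, stages 2 and above drain, and an interrupt wakes the core) can be traced with a small behavioral model. This is a hedged sketch only: the class and its fields are hypothetical, the interrupt service routine is not modeled, and it assumes that stage 1 is held in the same cycle in which ZZ is set.

```python
class Pipeline:
    """Toy four-stage pipeline: fetch, decode, execute, writeback."""

    def __init__(self, program):
        self.program = list(program)
        self.pc = 0
        self.stages = [None, None, None, None]  # stages 1..4
        self.zz = False                          # sleep-mode flag
        self.retired = []

    def clock(self, interrupt=False):
        if interrupt:
            self.zz = False                      # wake-up event clears ZZ
        s1, s2, s3, s4 = self.stages
        if s4 is not None:
            self.retired.append(s4)              # stage 4 completes
        if s2 == "sleep":
            self.zz = True                       # set as SLEEP leaves stage 2
        if self.zz:
            # Asleep: no fetch, stage 1 held, stages 2 and above drain.
            self.stages = [s1, None, s2, s3]
            return
        fetched = None
        if self.pc < len(self.program):
            fetched = self.program[self.pc]
            self.pc += 1
        self.stages = [fetched, s1, s2, s3]


pipe = Pipeline(["add", "sleep", "sub", "mov"])
for _ in range(10):
    pipe.clock()
asleep_state = (pipe.zz, list(pipe.stages), list(pipe.retired))

pipe.clock(interrupt=True)                       # wake the core
for _ in range(5):
    pipe.clock()
print(asleep_state)
print(pipe.zz, pipe.retired)
```

- In this model the instruction after SLEEP ("sub") waits in stage 1 for the whole sleep period while "add" and SLEEP itself drain out of the pipeline; after the interrupt, execution resumes with "sub".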
- the SLEEP instruction of the present invention may also advantageously be put in a delay slot present in the pipeline, as in the following code example: bal.d after_sleep sleep
- the term "delay slot" refers to the slot within a pipeline subsequent to a branching or jump instruction being decoded.
- Branching used consistent with the present invention may be conditional (i.e., based on the truth or value of one or more parameters, such as the value of a flag bit) or unconditional. It may also be absolute (e.g., based on an absolute memory address), or relative (e.g., based on a relative addressing scheme and independent of any particular memory address).
- the processor core enters the sleep mode after the branch instruction has been executed.
- the program counter PC points to the "add" instruction after the label "after_sleep".
- the core wakes up, executes the interrupt service routine, and continues with the add instruction to which the PC is pointing.
- the SLEEP instruction of the present invention can be put in the delay slot of a jump instruction to solve the problem with a real-time operating system (RTOS) that sets the interrupt flags in the main memory, the latter required to be cleared before entering the sleep mode.
- RTOS real-time operating system
- the current flag settings are first stored in core register r1.
- the PC address to which the program jumps after it has been woken up from SLEEP mode is also stored in r1. Consequently the core register r1 will contain both the current flag settings and the exit address to which the program goes after the sleep mode.
- the interrupt enable flags are disabled so that no new interrupt requests can be detected by the processor. All interrupt flags in the memory are serviced until there are no more interrupt flags set.
- the SLEEP instruction of the present invention acts as a no-operation (NOP) instruction during single-step mode since every single-step is treated as a restart and the core wakes up at the next single-step.
- NOP no-operation
- single-step mode refers generally to modes wherein the processor steps sequentially through a limited number of cycles, a specific example of which being where one processor cycle is initiated per switch closure on the single step pin of the processor. This mode is useful for software debugging and evaluation of pipeline contents during execution.
- Figs. 3a and 3b illustrate first and second exemplary embodiments, respectively, of synthesized gate logic 300, 320 used to implement the foregoing sleep mode power reduction functionality within the core.
- clock gating is a hardware option that is selected when the core build is created by the hardware engineer (described in greater detail below). Hence, the software programmer has no control over clock gating.
- enable debug is a clock gating option for the action points of the core. If this option is selected, then the action point clock is gated when the action points are not used.
- ED enable debug
- the enable debug (ED) flag is used to enable the debug clock and thereby turn on the debug extensions.
- debug extensions refers to optional instructions and other hardware capabilities that are included in the processor to facilitate the debugging process, such as for example extension instructions included as part of the extension instruction set designed to facilitate debug or related processes.
- ED flag setting is typically accomplished via the host by the debugger just before it needs to access the debug extensions. When the ED flag is clear the debug clock is gated, and the debug extensions are thereby completely switched off. Conversely, when the flag is set, the debug clock is not gated, and the debug extensions are enabled.
- the ED flag does not affect the sleep mode in any way; rather, it only controls the clock gating of the debug extensions.
- the ED flag only works if clock gating was selected by the programmer. If clock gating was not selected during the core build, the ED flag is removed during the synthesis process, the latter being described below.
- Fig. 4a illustrates the relationship of the core clock module to the rest of the design.
- the clock module 450 is a part of all core builds, even if clock gating was not selected in the build; however, the content of the clock module varies accordingly. If clock gating was selected, the clock module 450 contains the clock gating (see Fig. 4b). If this option was not selected during core build, the clock module 450 is empty, with all clock outputs directly connected to the input clock (see Fig. 4c).
- a constant called ck_gating (defined in extutil.vhdl) controls the clock module configuration.
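- The interaction between the build-time clock gating option and the run-time ED flag can be summarized as a truth function. The sketch below is an illustrative assumption rather than the contents of the clock module: it models the gate as a simple AND of the input clock and the ED flag, whereas practical clock gates typically add a latch to avoid glitches. Only the name ck_gating comes from the text.

```python
def debug_clock(clk, ed_flag, ck_gating):
    """Debug-extension clock level for one input clock level.

    ck_gating mirrors the build-time constant named in the text;
    the AND-gate structure is an assumption made for illustration.
    """
    if not ck_gating:
        return clk           # clock module empty: output wired to input clock
    return clk and ed_flag   # gated: the clock only passes while ED is set


def count_edges(levels):
    """Number of transitions in a sampled waveform."""
    return sum(1 for a, b in zip(levels, levels[1:]) if a != b)


clk_wave = [bool(i % 2) for i in range(16)]
gated_off = [debug_clock(c, ed_flag=False, ck_gating=True) for c in clk_wave]
gated_on = [debug_clock(c, ed_flag=True, ck_gating=True) for c in clk_wave]
ungated = [debug_clock(c, ed_flag=False, ck_gating=False) for c in clk_wave]

print(count_edges(gated_off), count_edges(gated_on), count_edges(ungated))
```

- With clock gating built in and ED clear, the debug clock never toggles, so the debug extensions draw no dynamic power. Without clock gating in the build, the ED flag has no effect and the clock always runs.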
- Figs. 4d-4f illustrate exemplary embodiments of logic 440, 460, 480 used to implement the foregoing clock gating functionality within the processor core. It will be recognized, however, that other logic configurations may be substituted to perform the foregoing functions with equal success, such other configurations being readily determined by those of ordinary skill in the processor design and logic synthesis arts.
Gray Coding
- Gray coding comprises forming a binary sequence in which only one bit changes at any given time. By restricting the core design during build such that only one bit changes at a time, power consumption is reduced. Specifically, Gray coding reduces power consumption by reducing the number of nodes that toggle per clock cycle. Since the core's pipeline employs a clock that operates at the highest frequency of the processor, reductions in the number of nodes toggled per clock cycle can be significant. Pipeline control logic is often implemented by state machine logic.
- Gray code can generally be implemented in two ways within the processor core of the present invention: (i) within the HDL; or (ii) within the synthesis script. Full control over the Gray coding is often best achieved in the HDL.
- the significant benefit to Gray coding, in contrast to many other power reduction techniques, is that it does not add any extra control logic to the design. Consequently there are very few if any downsides to implementing Gray coding.
- Gray coding may be implemented in conjunction with the sleep mode functionality described above to further reduce core power consumption with effectively no detriments to other aspects of core operation.
- One Gray code for 3 bits is (000, 010, 011, 001, 101, 111, 110, 100).
- An n-bit Gray code corresponds to a Hamiltonian cycle on an n-dimensional hypercube. While the term Gray code is used herein as if there is only one Gray code, it will be recognized that Gray codes are not unique.
- One way to construct a Gray code for n bits is to use a Gray code for n-1 bits with each code prefixed by 0 (for the first half of the code) and append the n-1 Gray code reversed with each code prefixed by 1 for the second half.
- the following example illustrates the creation of a 3-bit Gray code from a 2-bit Gray code (algorithm derived from "Combinatorial Algorithms," Reingold, Nievergelt, Deo):
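- The prefix-and-reflect construction described above can be sketched as a short recursive routine. The function name and the string representation are illustrative choices and are not taken from the cited reference.

```python
def gray_code(n):
    """Return an n-bit Gray code as a list of bit strings,
    built by the reflect-and-prefix construction."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if n == 1:
        return ["0", "1"]
    previous = gray_code(n - 1)
    first_half = ["0" + code for code in previous]
    second_half = ["1" + code for code in reversed(previous)]
    return first_half + second_half


two_bit = gray_code(2)
three_bit = gray_code(3)
print(two_bit)    # ['00', '01', '11', '10']
print(three_bit)  # ['000', '001', '011', '010', '110', '111', '101', '100']
```

- The 3-bit sequence produced here differs from the 3-bit sequence listed earlier, which is consistent with the observation that Gray codes are not unique. Both change exactly one bit per step, including the step from the last code back to the first.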
- the method 500 of reducing power consumption comprises first providing a logic circuit adapted for detection of a predetermined condition of the data within the pipeline (step 502); inserting data into the pipeline (step 504); detecting, using the logic circuit, that the predetermined condition exists with respect to certain of the data (step 506); invoking a sleep mode within the pipeline in response to the detected condition (508); and restarting the pipeline when the condition no longer exists (step 510).
- Such conditions include anticipatory execution of an instruction which is then subsequently stopped by a conditional evaluation.
- the pipeline logic may be modified to prevent unnecessary switching activity in two ways: (i) by generating a "low power" version of the pipeline enable signal en1 (e.g., en1_lowpower); and (ii) by generating the enable signal en2 (which controls the data path to the ALU of the core) differently.
- the modification comprises activating the two enable signals (individually) if the pipeline stage contains valid data. Accordingly, data determined to be no longer valid, or of no further use, is not propagated down the pipeline, thereby conserving power.
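The effect of qualifying an enable signal with a valid indication can be illustrated by counting bit transitions in a single stage register. The sketch below is a simplified software model under stated assumptions: the helper names `toggles` and `run_stage`, the 8-bit sample values, and the use of toggle counts as a stand-in for dynamic power are all illustrative choices, not details from the patent.

```python
def toggles(old: int, new: int) -> int:
    """Number of bit positions that change between two register values."""
    return bin(old ^ new).count("1")


def run_stage(stream, gate_on_valid: bool) -> int:
    """Clock one stage register over (value, valid) pairs; return total bit toggles."""
    reg = 0
    total = 0
    for value, valid in stream:
        # With gating, the register is enabled only when the data is valid.
        enable = valid if gate_on_valid else True
        if enable:
            total += toggles(reg, value)
            reg = value
    return total


# Hypothetical input: two valid words interleaved with two invalid ones.
stream = [(0xFF, True), (0x00, False), (0xF0, True), (0x0F, False)]
ungated = run_stage(stream, gate_on_valid=False)
gated = run_stage(stream, gate_on_valid=True)
print(ungated, gated)
```

For this particular input the gated register toggles fewer than half as many bits, because the invalid words are never latched and therefore never propagate.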
- ALU: arithmetic logic unit
- The foregoing process may add a delay to the critical path and thereby reduce the maximum clock frequency. If this is not acceptable, it is a simple matter to use the non-low power version. If a timing problem exists with one of the extensions, the normal data path (s1val and s2val) is selected. It is acceptable to change only the extension that is on the critical path, while letting all the other extensions use the low power version of the data path. Hence, the only reason not to use the low power version is if the extension in question will be on the critical path and add too much delay, thereby adversely impacting the target clock frequency of the resulting design.
- the small multi-cycle extensions of the ARC core can be further reduced in power consumption by using Gray code of the type previously described herein.
- Of the two methods of introducing Gray code previously discussed (i.e., in the synthesis script or in the HDL code), only the HDL solution gives a robust result, even though it provides only a few percent overall power reduction. Further reduction in the overall power consumption can be achieved by modifying the extension ALU of the core.
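To see why Gray coding a small state counter can reduce switching, the following Python sketch compares the number of bit transitions over one full cycle of a 3-bit counter in plain binary and in Gray code. It is an illustration only: the helper names are hypothetical, the standard binary-to-Gray conversion `i ^ (i >> 1)` is used rather than any encoding taken from the patent, and toggle counts are a simplified proxy for power.

```python
def to_gray(i: int) -> int:
    """Standard binary-reflected Gray encoding of an integer."""
    return i ^ (i >> 1)


def cycle_toggles(sequence: list[int]) -> int:
    """Total bit flips when stepping through the sequence and wrapping to the start."""
    total = 0
    for a, b in zip(sequence, sequence[1:] + sequence[:1]):
        total += bin(a ^ b).count("1")
    return total


n_bits = 3
binary_states = list(range(2 ** n_bits))
gray_states = [to_gray(i) for i in binary_states]

binary_cost = cycle_toggles(binary_states)
gray_cost = cycle_toggles(gray_states)
print(binary_cost, gray_cost)
```

The Gray-coded counter flips exactly one bit per step, so its cost equals the number of states, whereas the binary counter flips several bits on carries and on wrap-around.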
- The fact that the exemplary ARC core described herein is configurable is highly advantageous from a power point of view. By choosing only those modules that will actually be used by the design, much unnecessary power consumption can be removed. This is a major advantage of configurable cores (such as the ARC) over non-configurable cores. Another important feature of such cores is the ability to design extensions to minimize cycle counts for common or recurring functions, thereby reducing the power consumption. Hence, by (i) choosing only modules used by the design; (ii) designing extensions adapted to minimize cycle counts; and (iii) utilizing one or more of the foregoing power reduction functions (e.g., sleep mode, clock gating, pipeline logic modification), the overall power consumption of the core can be significantly reduced.
- MAC: multiply and accumulate
- the technology library files in the present invention store all of the information related to cells necessary for the synthesis process, including for example logical function, input/output timing, and any associated constraints.
- each user can define his/her own library name and location(s), thereby adding further flexibility.
- In step 603, the user creates customized HDL functional blocks based on the user's input and the existing library of functions specified in step 602.
- In step 604, the design hierarchy is determined based on user input and the aforementioned library files.
- a hierarchy file, new library file, and makefile are subsequently generated based on the design hierarchy.
- The term "makefile" refers to the commonly used UNIX makefile function or similar function of a computer system well known to those of skill in the computer programming arts.
- the makefile function causes other programs or algorithms resident in the computer system to be executed in the specified order.
- It further specifies the names or locations of data files and other information necessary to the successful operation of the specified programs. It is noted, however, that the invention disclosed herein may utilize file structures other than the "makefile" type to produce the desired functionality.
- the user is interactively asked via display prompts to input information relating to the desired design such as the type of "build" (e.g., overall device or system configuration), width of the external memory system data bus, different types of extensions, cache type/size, use of clock gating, Gray coding restrictions, etc. Many other configurations and sources of input information may be used, however, consistent with the invention.
- The user runs the makefile generated in step 604 to create the structural HDL.
- This structural HDL ties the discrete functional blocks in the design together so as to make a complete design.
- In step 608, the script generated in step 606 is run to create a makefile for the simulator.
- The user also runs the script to generate a synthesis script in step 608.
- a decision is made whether to synthesize or simulate the design (step 610). If simulation is chosen, the user runs the simulation using the generated design and simulation makefile (and user program) in step 612. Alternatively, if synthesis is chosen, the user runs the synthesis using the synthesis script(s) and generated design in step 614. After completion of the synthesis/simulation scripts, the adequacy of the design is evaluated in step 616.
- a synthesis engine may create a specific physical layout of the design that meets the performance criteria of the overall design process yet does not meet the die size requirements. In this case, the designer will make changes to the control files, libraries, or other elements that can affect the die size. The resulting set of design information is then used to re-run the synthesis script. If the generated design is acceptable, the design process is completed. If the design is not acceptable, the process steps beginning with step 602 are re-performed until an acceptable design is achieved. In this fashion, the method 600 is iterative.
- Fig. 7 illustrates an exemplary pipelined processor fabricated using a 1.0 um process.
- the processor 700 is an ARC microprocessor-like CPU device having, inter alia, a processor core 702, on-chip memory 704, and an external interface 706.
- The device is fabricated using the customized VHDL design obtained using the method 600 of the present invention, which is subsequently synthesized into a logic level representation, and then reduced to a physical device using compilation, layout and fabrication techniques well known in the semiconductor arts.
- The processor of Figure 7 may contain any commonly available peripheral such as serial communications devices, parallel ports, timers, counters, high current drivers, analog to digital (A/D) converters, digital to analog (D/A) converters, interrupt processors, LCD drivers, memories and other similar devices. Further, the processor may also include custom or application specific circuitry.
- the present invention is not limited to the type, number or complexity of peripherals and other circuitry that may be combined using the method and apparatus. Rather, any limitations are imposed by the physical capacity of the extant semiconductor processes which improve over time. Therefore it is anticipated that the complexity and degree of integration possible employing the present invention will further increase as semiconductor processes improve.
- the computing device 800 comprises a motherboard 801 having a central processing unit (CPU) 802, random access memory (RAM) 804, and memory controller 805.
- a storage device 806 (such as a hard disk drive or CD-ROM), input device 807 (such as a keyboard or mouse), and display device 808 (such as a CRT, plasma, or TFT display), as well as buses necessary to support the operation of the host and peripheral components, are also provided.
- The aforementioned VHDL descriptions and synthesis engine are stored in the form of an object code representation of a computer program in the RAM 804 and/or storage device 806 for use by the CPU 802 during design synthesis, the latter being well known in the computing arts.
- The user (not shown) synthesizes logic designs by inputting design configuration specifications into the synthesis program via the program displays and the input device 807 during system operation. Synthesized designs generated by the program are stored in the storage device 806 for later retrieval, displayed on the graphic display device 808, or output to an external device such as a printer, data storage unit, or other peripheral component via a serial or parallel port 812 if desired.
- Sleep Mode signals: out AP_p3disable_r L To flags.vhdl. This signals to the ARC that the pipeline has been flushed due to a breakpoint or sleep instruction. If it was due to a breakpoint instruction, the ARC is halted via the 'en' bit, and the AH bit is set to '1' in the debug register.
- the sleep instruction is determined at stage 2 from:
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Advance Control (AREA)
- Microcomputers (AREA)
- Power Sources (AREA)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| AU2002246904A AU2002246904A1 (en) | 2000-10-27 | 2001-10-25 | Method and apparatus for reducing power consumption in a digital processor |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US24407100P | 2000-10-27 | 2000-10-27 | |
| US60/244,071 | 2000-10-27 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2002057893A2 (fr) | 2002-07-25 |
| WO2002057893A3 (fr) | 2003-05-30 |
Family
ID=22921254
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2001/051064 Ceased WO2002057893A2 (fr) | Method and apparatus for reducing power consumption in a digital processor | 2000-10-27 | 2001-10-25 |
Country Status (2)
| Country | Link |
|---|---|
| AU (1) | AU2002246904A1 (fr) |
| WO (1) | WO2002057893A2 (fr) |
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2004051450A3 (fr) * | 2002-12-04 | 2005-02-03 | Koninkl Philips Electronics Nv | Software control of microprocessor power dissipation |
| WO2006094196A3 (fr) * | 2005-03-03 | 2007-02-01 | Qualcomm Inc | Method and apparatus for reducing power consumption using a processor with multiple heterogeneous pipelines |
| WO2007101216A3 (fr) * | 2006-02-27 | 2008-01-03 | Qualcomm Inc | Floating-point processor with reduced power requirements for selectable subprecision |
| WO2008009366A1 (fr) * | 2006-07-21 | 2008-01-24 | Sony Service Centre (Europe) N.V. | System having a plurality of hardware blocks and method of operating the same |
| US8918446B2 (en) | 2010-12-14 | 2014-12-23 | Intel Corporation | Reducing power consumption in multi-precision floating point multipliers |
| CN107977227A (zh) * | 2016-10-21 | 2018-05-01 | Advanced Micro Devices, Inc. | Pipeline including separate hardware data paths for different instruction types |
Family Cites Families (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5485625A (en) * | 1992-06-29 | 1996-01-16 | Ford Motor Company | Method and apparatus for monitoring external events during a microprocessor's sleep mode |
| JPH06332693A (ja) * | 1993-05-27 | 1994-12-02 | Hitachi Ltd | Method for issuing a halt instruction with a timeout function |
| US5584031A (en) * | 1993-11-09 | 1996-12-10 | Motorola Inc. | System and method for executing a low power delay instruction |
| GB9419246D0 (en) * | 1994-09-23 | 1994-11-09 | Cambridge Consultants | Data processing circuits and interfaces |
| US5774709A (en) * | 1995-12-06 | 1998-06-30 | Lsi Logic Corporation | Enhanced branch delay slot handling with single exception program counter |
| JP2001282548A (ja) * | 2000-03-29 | 2001-10-12 | Matsushita Electric Ind Co Ltd | Communication apparatus and method therefor |
2001
- 2001-10-25: AU application AU2002246904A, published as AU2002246904A1 (en), status: Abandoned
- 2001-10-25: WO application PCT/US2001/051064, published as WO2002057893A2 (fr), status: Ceased
Cited By (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2004051450A3 (fr) * | 2002-12-04 | 2005-02-03 | Koninkl Philips Electronics Nv | Software control of microprocessor power dissipation |
| WO2006094196A3 (fr) * | 2005-03-03 | 2007-02-01 | Qualcomm Inc | Method and apparatus for reducing power consumption using a processor with multiple heterogeneous pipelines |
| WO2007101216A3 (fr) * | 2006-02-27 | 2008-01-03 | Qualcomm Inc | Floating-point processor with reduced power requirements for selectable subprecision |
| US8595279B2 (en) | 2006-02-27 | 2013-11-26 | Qualcomm Incorporated | Floating-point processor with reduced power requirements for selectable subprecision |
| WO2008009366A1 (fr) * | 2006-07-21 | 2008-01-24 | Sony Service Centre (Europe) N.V. | System having a plurality of hardware blocks and method of operating the same |
| US8161276B2 (en) | 2006-07-21 | 2012-04-17 | Sony Service Centre (Europe) N.V. | Demodulator device and method of operating the same |
| US8918446B2 (en) | 2010-12-14 | 2014-12-23 | Intel Corporation | Reducing power consumption in multi-precision floating point multipliers |
| CN107977227A (zh) * | 2016-10-21 | 2018-05-01 | Advanced Micro Devices, Inc. | Pipeline including separate hardware data paths for different instruction types |
Also Published As
| Publication number | Publication date |
|---|---|
| AU2002246904A1 (en) | 2002-07-30 |
| WO2002057893A3 (fr) | 2003-05-30 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US20030070013A1 (en) | Method and apparatus for reducing power consumption in a digital processor | |
| US6477697B1 (en) | Adding complex instruction extensions defined in a standardized language to a microprocessor design to produce a configurable definition of a target instruction set, and hdl description of circuitry necessary to implement the instruction set, and development and verification tools for the instruction set | |
| US7010558B2 (en) | Data processor with enhanced instruction execution and method | |
| US20020010848A1 (en) | Data processing system | |
| Mantovani et al. | HL5: a 32-bit RISC-V processor designed with high-level synthesis | |
| US11086631B2 (en) | Illegal instruction exception handling | |
| US7171631B2 (en) | Method and apparatus for jump control in a pipelined processor | |
| US7752592B2 (en) | Scheduler design to optimize system performance using configurable acceleration engines | |
| US20020032558A1 (en) | Method and apparatus for enhancing the performance of a pipelined data processor | |
| Christensen et al. | The design of an asynchronous TinyRISC/sup TM/TR4101 microprocessor core | |
| WO2000070483A2 (fr) | Method and apparatus for processor pipeline segmentation and re-assembly | |
| WO2002057893A2 (fr) | Method and apparatus for reducing power consumption in a digital processor | |
| US6993674B2 (en) | System LSI architecture and method for controlling the clock of a data processing system through the use of instructions | |
| JP4800582B2 (ja) | Arithmetic processing device | |
| EP1190305B1 (fr) | Method and apparatus for jump delay slot control in a pipelined processor | |
| EP1190303B1 (fr) | Method and apparatus for jump control in a pipelined processor | |
| WO2000070446A2 (fr) | Method and apparatus for loose register encoding in a pipelined processor | |
| US6044460A (en) | System and method for PC-relative address generation in a microprocessor with a pipeline architecture | |
| Namjoo et al. | Implementing sparc: A high-performance 32-bit risc microprocessor | |
| US20060168431A1 (en) | Method and apparatus for jump delay slot control in a pipelined processor | |
| Wu et al. | Instruction buffering for nested loops in low-power design | |
| Krashinsky | Microprocessor energy characterization and optimization through fast, accurate, and flexible simulation | |
| Cho et al. | CalmRISC/sup TM/-32: a 32-bit low-power MCU core | |
| Meyer et al. | HDL FSM code generation using a MIPS-based assembler | |
| Alana et al. | in SCL 180 nm Technology |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AK | Designated states | Kind code of ref document: A2 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW |
| | AL | Designated countries for regional patents | Kind code of ref document: A2 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
| | REG | Reference to national code | Ref country code: DE Ref legal event code: 8642 |
| | 122 | Ep: pct application non-entry in european phase | |
| | NENP | Non-entry into the national phase | Ref country code: JP |