
WO2005041074A1 - System and method for functional verification of an electronic integrated circuit design - Google Patents


Info

Publication number
WO2005041074A1
Authority
WO
WIPO (PCT)
Prior art keywords
simulation
server
network
code
host system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/IB2003/004762
Other languages
French (fr)
Inventor
Asger Munk Nielsen
Søren KRAGH
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ATINEC APS
Original Assignee
ATINEC APS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ATINEC APS filed Critical ATINEC APS
Priority to PCT/IB2003/004762 priority Critical patent/WO2005041074A1/en
Priority to AU2003274453A priority patent/AU2003274453A1/en
Publication of WO2005041074A1 publication Critical patent/WO2005041074A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/30Circuit design
    • G06F30/32Circuit design at the digital level
    • G06F30/33Design verification, e.g. functional simulation or model checking

Definitions

  • the present invention relates to electronic design automation (EDA), and more specifically to a simulation method and system for functional verification of an electronic integrated circuit design.
  • EDA tools may be special purpose software running on a workstation or similar computer platform.
  • EDA tools may also be dedicated special purpose computation devices, built to accelerate or ease the task of verifying, altering, creating or analyzing an electronic design description. In some cases EDA tools have been created as a combination of software and dedicated hardware.
  • Functional verification is often a substantial and complicated task that must be completed in the process of designing and producing a functionally correct integrated circuit.
  • the term functional verification refers to the task of verifying that the functional behavior of an electronic circuit or description thereof or parts of such a circuit or description thereof is in accordance with a specification of its intended function.
  • the design of an integrated circuit is typically described in a high-level Hardware Description Language (HDL) such as VHDL, Verilog, SystemC or SystemVerilog. Functional verification is commonly performed by simulating the behavior of the device in a manner where the device is presented with input data representing stimuli as part of a plausible test scenario. A response is then computed based on the HDL description of the design, the input data and the current state of the simulation. The response is more specifically computed as new internal state values and output data.
  • the input data is most often generated by an HDL test bench that computes stimuli and applies it to the input pins of the device under test (DUT).
  • the actual verification consists of checking whether the output and internal state is in accordance with the expected behavior of the DUT.
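The checking step described above can be sketched as a simple comparison of actual against expected signal values (a hypothetical illustration; all names and values are invented, not taken from the patent):

```python
# Illustrative verification check: compare the DUT's outputs for one
# cycle against the expected behavior. Signal names are assumptions.
def check_response(actual_outputs, expected_outputs):
    """Return the names of mismatching signals; an empty list means
    this cycle's outputs match the expected behavior of the DUT."""
    return [name for name, expected in expected_outputs.items()
            if actual_outputs.get(name) != expected]

# Example: a correct adder cycle produces no mismatches.
actual = {"sum": 5, "carry": 0}
expected = {"sum": 5, "carry": 0}
assert check_response(actual, expected) == []
```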
  • the quality or fidelity of a verification suite may be measured in terms of the probability that there are no undiscovered bugs in the device under test; this probability is related to the number of verification cycles that a design has been subjected to. Simulating a description of an electronic circuit, which by nature is parallel, in software running on a serial machine like a general-purpose workstation is very inefficient and time consuming.
  • US 6,480,988, disclosing a communication mechanism between combinatory logic units and a run-time controller, generally consists of a mapping process mapping hardware description language (HDL) into a system of "gates" and connections, or "look-up tables", which system is simulated in a 2-state simulator. Assertions, i.e. concise high-level descriptions of required or illegal behavior of the design description, are synthesized into a hardware description language prior to being mapped.
  • An object of the present invention is therefore to provide a system and method solving the problems identified in the prior art, that is, in particular, enabling accelerated simulation of designs which operate with a 4-state valued output and compiling assertions and functional coverage together with hardware description language.
  • An object of the present invention is to provide a system and method for functional verification of electronic integrated circuit design, which system and method operates in a parallel mode using hardware-assisted simulation that relies on parallel special purpose processing.
  • a particular advantage of the present invention is the provision of fast and reliable functional verification, which is instrumental in reducing the risk of bugs being uncovered at a late stage, while reducing the cost of fixing bugs.
  • a particular feature of the present invention is that the system and method enable simulation of assertions and functional coverage incorporated into a hardware description type language.
  • a system for functional verification of an electronic integrated circuit design input comprising: a server adapted to compile said design input into a first code and a second code, a host system connected to said server and operable to execute said first code generated by said server, and a simulation engine connected to said host system through a communication channel and operable to execute said second code generated by said server, and wherein said simulation engine comprises a hardware board comprising a link unit interconnecting a controller unit, storage unit and compute units in a link unit network, wherein each of said compute units comprises a plurality of simulation processing units, and wherein said server is adapted to partition said second code into subgroups and to map each of said subgroups onto one of said simulation processing units.
  • a method for functional verification of an electronic integrated circuit design input comprising: receiving said design input by means of a server, partitioning said design input into a first part and a second part by means of said server, compiling said first part thereby generating a first code to be executed on a host system, partitioning said second part into sub-groups and mapping each of said sub-groups to one of a plurality of simulation processing units of a simulation engine, and compiling said partitioned second part thereby generating a second code to be executed on each of said plurality of simulation processing units of said simulation engine by means of said server.
  • the method according to the second aspect of the present invention may incorporate any features of the system according to the first aspect of the present invention.
  • a simulation processing unit for receiving mapping of a part of a hardware description language and comprising a data path operable to perform calculations on operands, a control block reading instructions from an instruction RAM, which instructions then orchestrate the operation of said data path, a data RAM serving as storage for intermediate values and memory for said data path, and a network interface unit enabling communication to and from other simulation processing units.
  • the unique characteristic of the simulation processing unit design according to the third aspect of the present invention is its special-purpose instruction set optimized for computing the functional hardware description language of a device under test with exactly the same result as a traditional software simulator would have computed.
  • the simulation processing unit according to the third aspect of the present invention may incorporate any features of the system according to the first aspect and the method according to the second aspect of the present invention.
  • figure 1 shows a functional verification system in accordance with a first embodiment of the present invention
  • figure 2 shows a partitioning of the user design and allocation to various components of the functional verification system
  • FIG. 3 shows organization of a simulation processing unit (SPU) according to the first embodiment of the present invention
  • FIG 4 shows a PCI board according to the first embodiment of the present invention. Note only one side of the PCB is shown;
  • FIG. 5 shows organization of a link chip and associated components and interfaces according to the first embodiment of the present invention
  • FIG. 6 shows a verification chassis according to the second embodiment of the present invention, which chassis includes a simulation engine and host system;
  • figure 7 shows an on-chip full crossbar network topology according to the first embodiment of the present invention
  • figure 8 shows a network interface unit (NIU), handling off-chip communication according to the first embodiment of the present invention
  • figure 9 shows the on-chip full crossbar network topology with integrated off-chip communication network interface units according to the first embodiment of the present invention
  • FIG. 10 shows an interconnect network topology, fully interconnecting all compute chips according to the first embodiment of the present invention. Note only connections for one compute chip are shown;
  • FIG. 11 shows an interconnect network topology, partially interconnecting all compute chips according to a second embodiment of the present invention. Note only one side of the PCB is shown;
  • FIG 12 shows an example of a timing diagram of a message passing protocol. In this example four messages are sent;
  • figure 13 shows a bi-directional inter-chip connection method enhanced with message-end signaling
  • figure 14 shows an interconnect topology according to the second embodiment of the present invention interconnecting the PCBs in a complete simulation engine
  • figure 15 shows an IP integration FPGA board.
  • figure 16 shows a flowchart of a method for verification of user HDL in accordance with the first embodiment of the present invention
  • figure 17 shows utilization of related applications and drivers during a runtime section of the method according to the first embodiment of the present invention
  • figure 18 shows a flowchart of the runtime section of the method according to the first embodiment of the present invention
  • the present invention provides a mechanism for efficient functional simulation of an integrated circuit described in a Hardware Description Language (HDL) .
  • the HDL verification system is a massively parallel computation system comprising a host system and a simulation engine.
  • the system provides simulation speeds orders of magnitude faster than what may be achieved using conventional software simulation without compromising the functional behavior of the user HDL description. This ability to preserve the exact functional HDL behavior in a hardware-accelerated environment is a key part of the present invention.
  • the simulation engine is capable of simulating instructions that bear great resemblance to the operators that are present in the user HDL description.
  • the simulation processing units (SPUs) that are part of the simulation engine are configured to compute on vector operands, i.e. operands that are wider than a single bit.
  • each bit position of an operand vector may also take on an X or Z value signifying an unknown or an unconnected signal value, respectively.
  • These expanded bit states are often referred to as 4-state values.
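The 4-state value system described above can be sketched as follows (a hypothetical Python illustration; the patent does not prescribe any particular encoding, and the function names are invented):

```python
# 4-state logic sketch: each bit position is 0, 1, X (unknown) or
# Z (unconnected), as described in the text above.
ZERO, ONE, X, Z = "0", "1", "X", "Z"

def and4(a, b):
    """4-state AND: a definite 0 dominates; X/Z involvement yields X."""
    if a == ZERO or b == ZERO:
        return ZERO              # 0 AND anything is 0, even 0 AND X
    if a == ONE and b == ONE:
        return ONE
    return X                     # any unknown/unconnected input -> unknown

def vand(va, vb):
    """Element-wise 4-state AND over vector operands (one value per bit)."""
    return [and4(a, b) for a, b in zip(va, vb)]

# ANDing with all-ones preserves 0 and 1 but turns X and Z into X.
print(vand(list("01XZ"), list("1111")))   # ['0', '1', 'X', 'X']
```

A 2-state simulator would have to collapse X and Z to 0 or 1, which is exactly the information loss the patent attributes to prior-art approaches.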
  • FIG. 1 shows a verification system designated in entirety by reference numeral 100.
  • the verification system 100 comprises the following main components: a host system 102 comprising one or more hosts 104a, b, and c, capable of interacting with the user or users and acting as a computational platform for simulating parts of the design.
  • the one or more hosts 104a, b,c may be a single workstation, special purpose processing platform, or any combination thereof.
  • the second component of the verification system 100 is a special purpose simulation engine 106.
  • the host system 102 and the simulation engine 106 are interconnected using a communication channel 108, such as a bus.
  • the bus 108 may be implemented using any communication technology known to a person skilled in the art.
  • the communication channel may be implemented using a standard
  • In the host system 102, one or more general-purpose CPU systems interact with the simulation engine 106 hardware.
  • the host system 102 is a general purpose PC equipped with one or more special purpose PCI boards establishing the communication channel 108 to the simulation engine 106.
  • the host system 102 performs tasks that are not directly supported by the simulation engine 106, such as file support functions, LAN communication or other parts of the HDL description for the DUT that are not supported by the simulation engine 106.
  • the simulation engine 106 comprises a plurality of printed circuit boards (PCB) 110 comprising a multiplicity of compute chips 112 soldered on each printed circuit board (PCB) 110.
  • the plurality of PCBs 110 are connected to a backplane 114.
  • Each compute chip 112 comprises a multitude of special purpose simulation processing units (SPUs) arranged in a hierarchical manner and optimized for simulating the behavior of a digital design described in a HDL language.
  • the SPUs are implemented in field-programmable gate array chips known as FPGAs.
  • Using FPGAs has the advantage of being able to upgrade the verification system 100 in the field by loading new firmware into the simulation engine's 106 FPGAs; the disadvantage is smaller capacity (i.e. fewer SPUs) and lower frequency of operation.
  • the SPUs are implemented in application-specific integrated circuits known as ASICs; this allows for higher capacity and higher frequency of operation, but the SPU implementation inside each chip is fixed.
  • Each SPU is implemented as a traditional full Harvard architecture with pipelined operation; this architecture has for decades been the preferred choice for high-performance general-purpose CPU designs.
  • the unique characteristic of the SPU design according to the present invention is, however, its special-purpose instruction set optimized for computing the functional HDL description of the DUT with exactly the same result as a traditional software simulator would have computed.
  • Prior art relies on converting the HDL description to look-up tables (LUTs) or very simple primitive operations known as "gates" in order to achieve the accelerated simulation speed; a person skilled in the art will know that essential information present in the source HDL description might be lost during such conversion.
  • several hosts 104a, b, c may connect to the simulation engine 106 through communication channels 108 thus enabling multiple simulations, and hence multiple users, to simultaneously utilize the accelerated HDL simulation resources provided by the simulation engine 106.
  • the interface between the communication channel 108 and host system 102 may be implemented as one or more proprietary PCI plug-in boards equipped with special hardware components optimized for the purpose of establishing a fast and efficient link between the simulation engine 106 and the individual hosts 104a, b, c in the host system 102.
  • the communication channel 108 may be one or more high-speed serial connections.
  • the multi-user capability provided by the present invention is an important feature since it allows better utilization of the expensive simulation engine 106 hardware.
  • the individual PCBs 110 are not necessarily confined by the PCI standard; the PCBs may for example be made larger in order to increase capacity and efficiency of the simulation engine.
  • the verification system 100 comprises two distinct computation components, namely the host system 102 and the HDL simulation engine 106.
  • the user design in the form of an HDL description of an integrated circuit and related test benches is partitioned into two distinct parts.
  • one part 202 comprises components that are running on the host system 102; these may be, but are not restricted to, test bench components.
  • Another part 204 comprises components that are running on the simulation engine 106, such components will often resemble the actual circuit or design under test (DUT), but may also comprise test bench components.
  • since the DUT in the common case corresponds to an integrated circuit that is going to be fabricated in real silicon and packaged using commercially available packaging technologies, the total pin count for the DUT will be limited to a relatively small number.
  • pins are represented using a number of inputs 206 that connect the host system to the DUT and a number of outputs 208 that connect the DUT to the host system; in addition, these pins are represented using a number of inout ports 210, i.e. bidirectional ports.
  • FIG 3 shows a SPU designated in entirety by reference numeral 300 and comprising a data path 302 performing the computational operations necessary to correctly simulate the behavior of the DUT in accordance with the source HDL description of the DUT.
  • the data path 302 is capable of operating directly on what is known as 4-state logic values: logical 0, logical 1, unknown X and unconnected Z.
  • the ability to operate directly on 4-state logic values is a central part of the present invention and forms the foundation, together with the unique instructions of the SPU 300, for accurately simulating the HDL source for the DUT.
  • Another unique feature of the data path 302 of the present invention is its ability to operate on multi-bit values also known as vectors.
  • Vector operations in the data path 302 allow the SPU 300 to perform more complex computations such as addition and shifting in a single or a few operations, thus avoiding the need to break down these operations into simple gate primitives or look-up tables (LUTs); the vector operation support is also central to ensuring correct functional simulation in accordance with the HDL source. Analysis has shown that the number of vector operations needed to simulate the behavior of a given digital design is roughly one-tenth the number of primitive single-bit gate operations needed to do the same. This fact contributes significantly to the achievable level of acceleration obtainable with the present invention.
  • the maximum vector length according to the first embodiment of the present invention, which is supported by the data path 302, is 8 bits wide. If, during translation of the source HDL, vectors wider than 8 bits are encountered, these are broken down into the 8-bit data types supported by the data path 302; this step preserves all properties of the original wide vectors.
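The vector-splitting step described above can be illustrated as follows (a sketch under the assumption that a wide vector is simply sliced into chunks of the data-path width; the function name is invented):

```python
# Hypothetical sketch of breaking a wide 4-state vector into 8-bit
# slices, as the translation step above is said to do. Every original
# bit value (including X and Z) survives the split unchanged.
WIDTH = 8   # inherent data-path vector width in the first embodiment

def split_vector(bits):
    """Split a list of 4-state bit values into chunks of at most
    WIDTH bits, preserving order and every bit value."""
    return [bits[i:i + WIDTH] for i in range(0, len(bits), WIDTH)]

# A 20-bit vector becomes three slices of 8, 8 and 4 bits.
v = list("0110100111XZ01101001")
chunks = split_vector(v)
assert [len(c) for c in chunks] == [8, 8, 4]
assert sum(chunks, []) == v        # no information lost in the split
```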
  • the data paths may be implemented with inherent vector widths optimally suited to the specific DUT under consideration, thereby achieving more optimal utilization of the simulation engine 106 for the given simulation task.
  • Each SPU, such as shown in figure 3, comprises a control block 304 reading instructions from an instruction RAM 306, which instructions then orchestrate the operation of the data path 302.
  • the instruction RAM 306 may be located inside the compute chips 112 or it may be located in memory banks outside the compute chips 112.
  • the operands for and results from the data path 302 are read and written to a data RAM 308 serving as storage for intermediate values and memory elements known as flip-flops and latches.
  • the instruction RAM 306 and data RAM 308 are implemented using available RAM blocks present inside a FPGA.
  • each SPU comprises a network interface unit (NIU) 310, which enables and establishes communication to and from other SPUs in the verification system 100 using a general network 312.
  • a given operation encoded in the instruction word may take one or more of its operands from the NIU 310, and a given operation may send its result to the NIU 310.
  • NIU 310 comprises buffers to effectively handle incoming and outgoing traffic.
  • the NIU 310 stalls the data path 302 until the requested data value has been received.
  • This stall functionality of the NIU 310 enables the simulation engine to dynamically react to various conditions also known as events, for example, one event happening in one SPU might result in several other events and computations in other SPUs or in the host system 102.
  • Events comprise, but are not confined to, the rising or falling edge of a clock, synchronizing semaphores between multiple clock domains or completion of specific computations.
  • the verification system 100 provides means for multiple users to simultaneously access the simulation engine 106.
  • multiple host systems 102 may be connected to the simulation engine 106 via individual communication channels 116a, 116b and 116c, thus enabling a number of users corresponding to the number of hosts 104a, 104b and 104c to simultaneously and completely independently use the simulation engine 106.
  • When multiple users are using the simulation resources provided by the simulation engine 106, the available SPUs must be allocated to the users according to certain priorities.
  • this administrative task is performed automatically by a special simulation engine server 120 that, based on certain dynamic criteria, allocates simulation engine resources to the individual users.
  • the simulation engine server 120 according to the first embodiment of the present invention connects to hosts 104a, 104b, and 104c through an exterior network 122 such as a wired or wireless local area network, metropolitan area network, wide area network, inter-network, or any combination thereof.
  • the server 120 connects to the hosts 104a, 104b, and 104c through dedicated lines or a combination of dedicated lines and the exterior network 122.
  • the simulation engine server 120 is comprised in one of the hosts 104a, 104b, or 104c.
  • the accelerated simulation in the present invention allows for a certain level of flexibility when allocating resources to a given user.
  • the instruction set based method of simulation means that more instructions may be generated for each SPU, thus resulting in fewer allocated SPUs for a given simulation job, at the cost of slower simulation speed.
  • a given simulation job may optimally be mapped to one hundred SPUs with one hundred instructions in each instruction RAM 306. But that same job might just as well be mapped to, for example, fifty SPUs with two hundred instructions in each instruction RAM 306; the result is a two-fold reduction in simulation speed.
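The trade-off in this example can be expressed numerically (a sketch under the simplifying assumption that simulation time scales linearly with instructions per SPU; all names are invented):

```python
# Sketch of the SPU-count vs. instruction-count trade-off described
# above: the total work is fixed, so halving the SPUs doubles the
# instructions per SPU and halves the relative simulation speed.
TOTAL_OPS = 100 * 100   # e.g. a job needing 10,000 instructions in total

def allocation(spus):
    """Return (instructions per SPU, relative speed) for a given
    number of allocated SPUs, normalized to the 100-SPU optimum."""
    instrs_per_spu = TOTAL_OPS // spus
    relative_speed = 100 / instrs_per_spu   # 100 instrs/SPU -> speed 1.0
    return instrs_per_spu, relative_speed

print(allocation(100))   # (100, 1.0)  - the optimal mapping
print(allocation(50))    # (200, 0.5)  - half the SPUs, half the speed
```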
  • This flexibility allows the simulation engine server 120 to more easily allocate resources among multiple users in a fair fashion.
  • FIG 4 shows a PCB according to the second embodiment of the present invention, which PCB is designated in entirety by reference numeral 400.
  • the PCB size is dictated by the specifications in the PCI standard thus the number of compute chips 402 depends on physical constraints, cost etc. In this case sixteen compute chips 402 comprising SPUs may be soldered on the PCB 400. Eight compute chips 402 are mounted on each side of the PCB 400.
  • the PCB 400 further comprises two special link chips 404, which orchestrate communication between the compute chips 402, a PCI controller chip 406, on-board RAM blocks 408, and a connector 410 to the backplane, shown in figure 1 as reference numeral 114, and/or neighboring PCBs.
  • a PCI interface 412 establishes a communication channel, shown in figure 1 as reference numeral 108, to the host system by plugging into available PCI slots in the host system.
  • the electrical details of the PCI interface 412 are managed by the PCI controller chip 406, which may be any commonly available PCI controller chip.
  • the PCI controller chip 406 presents a simple interface to the link chip 404.
  • the link chip 404 establishes the link to the host system via PCI controller 406 and to the on-board RAM blocks 408 through NIUs 502a and 502b hooked up to the general hierarchical network, shown in figure 3 as reference numeral 312, connecting all SPUs.
  • Each of these special NIUs 502a and 502b is associated with special hardware 504 and 506, respectively, which hardware is used for controlling a specific resource associated with a particular connected NIU.
  • the resource associated with regular NIUs, shown in figure 3 as reference numeral 310, is computation, but the resource associated with the NIUs 502a is large storage capacity and the resource associated with NIUs 502b is host system communication.
  • a link chip network 508 locally present inside the link chip 404 joins the local PCB network between the compute chips on the particular PCB on which the given link chip is soldered with the global PCB network hooking several PCBs together.
  • the link chip 404 acts as a gateway to other PCBs through connector 410 via backplane 114, as well as a provider of storage and host communication capabilities.
  • the RAM blocks 408 serve several purposes. It is common that digital circuits comprise several RAMs and ROMs as parts of the HDL description, and these components may only poorly be simulated by the regular SPUs. As a consequence, the software tool chain and method will identify these components and assign them to the available RAM blocks present on each PCB.
  • the NIUs 502a associated with these RAM blocks 408 integrate into the link chip network 508 exactly as if they were part of a regular SPU. Reception of values from link chip network 508 may then e.g. represent an address in the RAM block 408 or a value to be written to a particular location in the RAM block 408, and transmissions to the link chip network 508 may represent values read from the RAM block 408.
  • Another purpose of the RAM blocks 408 is to hold input stimuli to the DUT and collect results and debug information produced during simulation of the DUT. Certain types of digital designs require very large data sets in order to verify the operation of the DUT; when these data sets are stored in RAM blocks 408, the communication channel, shown in figure 1 as reference numeral 108, to the host system is not burdened by this traffic. During simulation very substantial quantities of result data may be produced, depending on the specific nature of the DUT and on the level of debugging requested by the user. Therefore it is preferred to store this data locally in the simulation engine as an alternative to sending all this data to the host system through the limited capacity of the communication channel 108.
  • a third purpose of the RAM blocks 408 is to collect statistical information regarding specific metrics of the DUT operation.
  • An example of one such metric may be to count the number of times a specific finite state machine (FSM) has entered a specific state or to log the distribution of values seen on a specific bus during simulation.
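The FSM-state counting metric mentioned above can be sketched like this (a hypothetical illustration; the state names and the trace are invented, and a real implementation would accumulate these counters in the RAM blocks 408 during simulation):

```python
# Functional-coverage sketch: count how many times an FSM entered
# each state over a simulated trace, one observed state per cycle.
from collections import Counter

def collect_state_coverage(state_trace):
    """Return a per-state entry count for the observed FSM trace."""
    return Counter(state_trace)

trace = ["IDLE", "RUN", "RUN", "DONE", "IDLE", "RUN"]
cov = collect_state_coverage(trace)
print(cov["RUN"])          # 3 - the RUN state was entered three times
print(sorted(cov))         # ['DONE', 'IDLE', 'RUN'] - states actually hit
```

The same pattern extends to the other metric named in the text, logging the distribution of values seen on a bus: the trace would then be bus values rather than state names.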
  • Such metrics are generally known as functional coverage and constitute an important part of modern verification strategies for large digital designs.
  • the unique accelerated simulation employed in the present invention is ideally suited for more high-level extensions such as functional coverage.
  • One reason for this is the ability to handle 4-state values, since this prevents faulty coverage of values that do not have a defined value.
  • Since the SPUs are built around an instruction set, specialized instructions may easily be added without affecting the overall operation of the system 100. This makes the solution much more flexible and responsive to new requirements and trends in the industry.
  • the RAM blocks 408 may be implemented as SRAMs or SDRAMs or any combination thereof. Depending on the actual DUT memory components being modeled by the RAM blocks 408, it may be beneficial to dynamically duplicate parts of the contents of these large off-chip RAM blocks 408 in local but smaller on-chip RAMs inside the link chip 404. For example, a large ROM table might in the common case only be accessed in a few locations, and therefore dynamically buffering these commonly accessed locations in internal on-chip RAMs will give better response times and thus higher performance of the system 100. Such techniques are commonly known as caching and are usually only seen in high-end general-purpose CPU systems.
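The caching idea described above can be sketched as follows (all names are assumptions; a real link chip would implement this in hardware with a fixed replacement policy, whereas this sketch uses a naive oldest-entry eviction):

```python
# Minimal cache sketch mirroring the described idea: frequently
# accessed off-chip RAM locations are mirrored in a small on-chip
# store so repeated reads respond faster.
class CachedRAM:
    def __init__(self, backing, cache_size=4):
        self.backing = backing        # large, slow off-chip RAM model
        self.cache = {}               # small, fast on-chip mirror
        self.cache_size = cache_size
        self.hits = self.misses = 0

    def read(self, addr):
        if addr in self.cache:
            self.hits += 1            # fast path: served on-chip
            return self.cache[addr]
        self.misses += 1
        value = self.backing[addr]    # slow path: off-chip access
        if len(self.cache) >= self.cache_size:
            self.cache.pop(next(iter(self.cache)))   # evict oldest entry
        self.cache[addr] = value
        return value

# A ROM table accessed repeatedly in one location: only the first
# read goes off-chip, matching the described buffering behaviour.
rom = {addr: addr * 2 for addr in range(1024)}
ram = CachedRAM(rom)
for _ in range(10):
    ram.read(5)
print(ram.hits, ram.misses)   # 9 1
```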
  • the big advantage of using caches for this purpose in the present invention is that the software tool chain does not have to worry about which parts of the DUT memory components to model in the large but slow-responding off-chip RAM blocks 408 and which parts to model in the small but fast-responding on-chip RAMs; during the course of simulation the most frequently accessed data locations are automatically brought into the on-chip RAMs, thus optimizing the system performance.
  • FIG. 6 shows a verification system according to a second embodiment of the present invention designated in entirety by reference numeral 600.
  • a host system 602 and a simulation engine 606 are enclosed in the same physical enclosure 604.
  • the simulation engine 606 comprises a plurality of PCBs 608.
  • the plurality of PCBs 608 is connected to a proprietary backplane board 610, to neighboring PCBs through connector 616, or to both.
  • the individual PCBs 608 are connected to the motherboard 612 of the host system 602 through PCI connectors 614 and furthermore the PCBs are connected together via connectors 616 to neighboring PCBs and/or connectors 618 to the backplane board 610 establishing the required interconnections between the PCBs.
  • the verification system 600 is just one of many possible configurations; depending on the required system performance and system cost there may be more or fewer PCBs, and the backplane board may not be needed in certain configurations or it might be replaced by simple wiring or connectors between PCBs.
  • the NIUs distributed throughout the simulation engine 106 are logically tied together in a general network, shown in figure 3 as reference numeral 312.
  • the network architecture used in the present invention is based on message passing. Two operations are formally defined for message exchange: SEND and RECEIVE. Once a SEND operation has been accepted the general network 312 guarantees that the message will eventually reach its destination, but the transmission time for the message is generally unknown. At the destination NIU any incoming message is buffered until the message is requested by a RECEIVE operation, and if the message has not yet arrived when requested by the RECEIVE the SPU is stalled.
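The SEND and RECEIVE semantics described above may be sketched as follows; this is an illustrative model only (the STALL sentinel and class names are ours), not the actual NIU implementation:

```python
from collections import deque

STALL = object()   # marker: the receiving SPU would stall until a message arrives

class GeneralNetwork:
    """Sketch of the message-passing semantics: delivery is guaranteed once a
    SEND is accepted, and incoming messages are buffered at the destination."""

    def __init__(self):
        self.inboxes = {}                        # unique NIU address -> buffered messages

    def send(self, dest, payload=None):
        # Once accepted, the message will eventually reach its destination NIU.
        self.inboxes.setdefault(dest, deque()).append(payload)

    def receive(self, dest):
        inbox = self.inboxes.get(dest)
        if not inbox:
            return STALL                         # message not yet arrived: SPU stalls
        return inbox.popleft()                   # oldest buffered message first

net = GeneralNetwork()
assert net.receive(5) is STALL                   # RECEIVE before the message arrives
net.send(5, payload=0b1010)                      # message = destination + payload
net.send(5)                                      # pure event: receiver ignores payload
assert net.receive(5) == 0b1010
```

Note that each message carries its destination address, and an event-only message may carry an empty payload, matching the description below.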
  • every message must contain the destination of the message as well as a payload constituting the data value to be transferred; this means that every NIU in the general network 312 has a unique address. If the message is used solely as an event indication the receiver ignores the payload.
  • the logical behavior of the general network 312 described above is entirely decoupled from the actual physical implementation of the general network 312. This is very useful as the software tool chain may stay the same for different physical implementations and configurations of the simulation engine 106 hardware.
  • In order to achieve the highest simulation performance the physical implementation of the general network 312 must provide an effective medium for transmitting messages.
  • Two parameters define the effectiveness of the general network 312: transmission delay and transmission bandwidth. The former quantifies how long it takes to send a message from any given source to any given destination and the latter quantifies the amount of traffic the general network 312 may sustain at any given intersection of the general network 312.
  • the physical network 312 is organized in a hierarchy with three levels, each having a local network topology optimized for that particular level.
  • the SPUs are interconnected inside each of the compute chips 112 hosting the SPUs.
  • these compute chips 112 are connected via printed circuit traces on each of the PCBs on which these chips are soldered (the local PCB network), and finally these PCBs are interconnected, thus forming the highest level of the network (the global PCB network).
  • Inside each compute chip the SPUs are connected in what is commonly known as a full crossbar, shown in figure 7 as reference numeral 700.
  • This network topology is rather expensive in terms of hardware resource requirements, but the full crossbar 700 offers excellent network performance.
  • Messages are injected into the full crossbar, and collected from the full crossbar, using the NIUs 706 that are part of each SPU. Inside the full crossbar each message is directed towards its destination NIU using a plurality of chained switches 704a, 704b, 704c and 704n.
  • the chained switches may be implemented using equivalent components known to a person skilled in the art.
  • off-chip communication NIUs, shown in figure 8 as reference numeral 800, are integrated into the on-chip crossbar 700; this has the big advantage that the SPUs do not need to be equipped with special means of off-chip communication.
  • Figure 9 shows the on-chip crossbar 700 with integrated off-chip communication NIUs 800.
  • On each PCB the compute chips comprising the SPUs must be connected both to each other and to the link chips.
  • each PCB comprises 16 compute chips and two link chips; these chips are distributed evenly on both sides of the PCB as shown in figure 4.
  • FIG. 10 shows the connections of a local PCB network 1000 between the chips according to the first embodiment of the present invention.
  • the compute chips designated in entirety by reference numeral 1002 and comprising the SPUs are connected in what is known as a fully connected topology. That is, each of the chips has a direct link to any other chip.
  • Using this topology eliminates the need for hub functionality in the compute chips 1002 and it increases the efficiency of the local PCB network since the pins of any given compute chip are not burdened by relaying messages for other chips, as is the case with a hub-based system.
  • the available bandwidth in relation to the available physical pins on the chips 1002 is higher in a fully connected topology. Since messages are passed directly between the chips 1002 there is no need to encode the full address for the destination, as the destination SPU is known to reside in the receiving chip; this saves precious pin resources on the chips 1002.
  • the fully connected topology requires more interfaces than e.g. the topology shown in figure 11 and therefore fewer pins may be allocated per interface. This means that it generally requires more cycles to transfer a message from one chip to another, but this incurred message delay is still smaller than the delay associated with hub functionality.
  • Figure 10 shows the fully connected topology when used to hook up chips on the same PCB (for clarity, only the connections for one chip 1008 are shown).
  • This particular embodiment comprises 16 compute chips 1002 and two link chips 1004 and 1006; each compute chip is connected to the 15 other compute chips and to the link chip 1004 located on the same side of the PCB as itself.
  • This local PCB network 1000 topology requires 16 bi-directional interfaces on each compute chip and 8 for each of the link chips 1004 and 1006.
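The interface counts quoted above follow directly from the topology. A small Python check (the function name is ours) under the stated configuration of 16 compute chips split evenly over two PCB sides:

```python
def interface_counts(compute_chips=16, sides=2):
    """Interface counts for the fully connected local PCB network: each
    compute chip links to every other compute chip plus the link chip on
    its own side of the PCB; each link chip serves its side's chips."""
    per_compute = (compute_chips - 1) + 1        # 15 peer links + 1 link-chip link
    per_link = compute_chips // sides            # one link per compute chip per side
    return per_compute, per_link

# Matches the 16 bi-directional interfaces per compute chip and 8 per link chip.
assert interface_counts() == (16, 8)
```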
  • FIG. 11 shows the connections of a local PCB network 1100 between chips according to the second embodiment of the present invention.
  • the connections are symmetrical for both sides.
  • the illustrated local PCB network 1100 efficiently connects the link chip 1108 and the compute chips, designated in entirety by reference numeral 1110 and comprising the SPUs. Any chip may reach any other chip either directly or through at most one other chip which then acts as a hub for the given communication channel.
  • This network topology requires 8 bi-directional interfaces on each compute chip and 4 for each of the link chips 1108.
  • the global PCB network may according to the first embodiment of the present invention link PCBs together in a fully connected topology.
  • Each of the PCBs has a direct path to any other PCB.
  • This topology has the advantage of simplicity since no hub functionality is required, but the number of physical wires might be high. To bring down the physical wire connections required by this topology high speed asynchronous differential serial signaling is used.
  • the message-based network used in the present invention is ideally suited for this kind of connection since there are no requirements with regards to latency, as long as the message eventually arrives at its designated destination.
  • the global PCB network topology may utilise RocketIO™ transceivers available in the Xilinx Virtex-II Pro FPGAs to implement the connections.
  • the physical connections may be either a number of twisted pair wires or a dedicated backplane such as shown in figure 1 as reference numeral 114. Due to the fact that the connections are asynchronous the PCBs may have independent clocking circuits and thus special synchronization requirements are not needed, as long as the PCBs are clocked at approximately the same frequency.
  • Analyses have shown that while the computational weight of vector results is high, the number of single-bit results is also high; indeed, the number of single-bit results is twice as high as the number of vector results. This shows that distribution of single-bit results among the SPUs is common and should be optimized if possible.
  • the present invention optimizes this by allowing messages with a payload of only one bit or zero bits (an event message) to be transmitted in fewer cycles than other messages.
  • Figure 12 shows one example of the associated method in use.
  • messages with single- or zero-bit payload may be transmitted in one cycle while all other messages may be transmitted in two cycles.
  • the described messaging is made available via a special MESEND signal 1202 which, when asserted, indicates the last cycle of a particular message transmission.
  • Figure 12 shows a two-cycle message 1204 followed by two single-cycle messages 1206 and 1208 followed by a two-cycle message 1210.
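The cycle counts illustrated in figure 12 may be sketched as follows; the function below is an illustrative model of the framing (names ours), not the actual hardware:

```python
def frame_messages(payload_widths):
    """Sketch of the MESEND framing: single- or zero-bit payloads take one
    cycle, all other messages take two; MESEND is asserted on each message's
    last cycle."""
    mesend = []
    for width in payload_widths:
        cycles = 1 if width <= 1 else 2
        mesend += [False] * (cycles - 1) + [True]   # assert MESEND on the final cycle
    return mesend

# A two-cycle message, two single-cycle messages, then a two-cycle message,
# as in the figure 12 example.
assert frame_messages([32, 1, 0, 32]) == [False, True, True, True, False, True]
```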
  • Figure 13 shows a bi-directional connection 1302 between two chips 1304 and 1306 with the MESEND signal comprised.
  • a traditional ring topology connects a plurality of PCBs 1402 together, and in order to obtain the required low latency and high bandwidth multiple rings are implemented in parallel. Letting the rings run through the PCBs in different order satisfies the latency requirement.
  • In figure 14, eight PCBs 1402 are shown connected in this manner.
  • a functional model for the IP block is typically a cycle accurate (or cycle close) behavioral representation of the IP block that is delivered in a compiled form targeting one or more platforms, for example Linux.
  • One of the reasons for IP providers to choose this form of deliverable is to protect their IP.
  • these functional models are linked into the part of the simulation executable that is running on the host system 102 in much the same way as it happens for conventional software simulators.
  • IP blocks delivered as HDL source code or encrypted HDL source code may be compiled to the simulation engine 106 just like the other parts of the user's DUT.
  • the latter requires access to a decryption method suitable for the particular encryption scheme used; if such access is not possible, the encrypted HDL source will be linked into the part of the simulation executable that is running on the host system 102 in much the same way as it happens for conventional software simulators.
  • An IP block may also be delivered as a gate netlist targeting one or more gate libraries.
  • This gate netlist is a description of the IP block where the functionality has been broken down to the individual physical primitives that will be implemented in silicon.
  • the gate library comprises a functional HDL model for each of these gate primitives. With the HDL models for each gate in the IP block it is possible to compile the netlist to the simulation engine 106 in the same way as the other parts of the DUT.
  • FIG. 15 shows the first embodiment according to the present invention of such an FPGA PCB 1500.
  • the central components of this FPGA PCB 1500 are the FPGA 1502 in which the IP block is implemented using standard FPGA synthesis tools, a link chip 1504 and a connector 1506 to the backplane, as shown in figure 1 as reference numeral 114, and/or neighboring PCBs.
  • the link chip 1504 is responsible for handling the interface between the pins on the FPGA 1502 and the message-based network via connector 1506.
  • through its connector the FPGA PCB 1500 looks exactly like a normal PCB with connector 410, and the FPGA PCB 1500 may therefore easily be integrated into the simulation engine 106.
  • the FPGA PCB may be larger or smaller, or it may comprise more FPGAs and external RAM blocks etc. depending on configuration and cost etc.
  • the basic property of the FPGA is to provide a platform for integrating synthesized IP blocks into a hardware- accelerated simulation environment.
  • the FPGA PCB comprises a special socket for connecting packaged chips. This allows the user to integrate a third party hard IP block in the form of a packaged chip into the simulation environment.
  • FIG. 16 shows a method according to the first embodiment of the present invention designated in entirety by reference numeral 1600.
  • the method 1600 commences in step 1602, during which the user design is input to a server, such as shown in figure 1 as reference numeral 120.
  • in step 1604 the user design comprising hardware description language (HDL), assertions and functional coverage is analyzed and partitioned into two parts.
  • a first part corresponds to portions of the input to be run on the host system, while a second part corresponds to portions of the input to be run on the simulation engine.
  • the partitioning in step 1604 is done while preserving the dependencies present in the HDL.
  • the HDL may be specified by any of the commonly used hardware description languages such as: Verilog, VHDL, SystemVerilog, SystemC, or any combination thereof.
  • a first code is generated for the first part of the HDL, assertions and functional coverage.
  • This code may resemble HDL, which serves as input to an existing software simulation tool.
  • the generated code may comprise calls to a synchronization mechanism, which synchronizes the processes run on the host system with the processes run on the simulation engine .
  • during steps 1604, 1606, 1608 and 1610 (generally referred to as compile time) a second code is generated for the second part of the HDL, assertions and functional coverage.
  • in step 1608 the second part of the HDL, assertions and functional coverage is analyzed and further partitioned into subcomponents corresponding to unique clock domains, since the second part of the HDL, assertions and functional coverage may comprise multiple clock domains.
  • in step 1612 the server translates the second part of the HDL, assertions and functional coverage into a topological representation comprising executable primitives.
  • the translation is done such that the functional properties of the second part of the HDL, assertions and functional coverage are preserved in the topological representation.
  • a number of optimization steps are optionally applied. For example, one or more of the following optimization passes may be performed: common sub-expression elimination, dead code elimination, constant propagation, algebraic simplification and interconnect simplification.
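One of the listed passes, constant propagation combined with algebraic simplification, may be sketched on a toy expression tree; the node encoding below is ours and purely illustrative:

```python
def constant_fold(node):
    """Sketch of constant propagation / algebraic simplification on a tiny
    expression tree. Nodes are ('const', v), ('sig', name) or
    ('and'/'or', lhs, rhs) with 1-bit values."""
    if node[0] in ('const', 'sig'):
        return node
    op, lhs, rhs = node[0], constant_fold(node[1]), constant_fold(node[2])
    if lhs[0] == 'const' and rhs[0] == 'const':
        value = (lhs[1] & rhs[1]) if op == 'and' else (lhs[1] | rhs[1])
        return ('const', value)              # fold fully constant sub-expressions
    if op == 'and' and ('const', 0) in (lhs, rhs):
        return ('const', 0)                  # algebraic simplification: x & 0 == 0
    return (op, lhs, rhs)

# a & (1 & 0) folds away entirely, even though 'a' is an unknown signal.
expr = ('and', ('sig', 'a'), ('and', ('const', 1), ('const', 0)))
assert constant_fold(expr) == ('const', 0)
```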
  • As an example of the interconnect simplification technique: if a and b are 32-bit busses and c is the concatenation of a and b, then extracting the least significant 16 bits of b is simpler than, and equivalent to, extracting the least significant 16 bits of c.
  • the operations of the second part of the HDL, assertions and functional coverage are partitioned into a number of subgroups. Each subgroup is mapped for execution on a specific simulation processing unit.
  • in step 1610 the second code is generated based on the second part of the HDL, assertions and functional coverage, which second code is to run on the simulation engine.
  • the compilation procedure is implemented such that only the parts that have changed since the last valid compilation run, and possibly closely related portions of the HDL, assertions and functional coverage, are recompiled.
  • This compilation time saving technique is commonly referred to as incremental compilation.
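Incremental compilation may be sketched as a hash-based change check; the function names and the use of SHA-256 are our illustrative choices, not part of the patent:

```python
import hashlib

def incremental_compile(sources, cache):
    """Sketch of incremental compilation: only modules whose source hash
    changed since the last valid compilation run are recompiled."""
    recompiled = []
    for name, text in sources.items():
        digest = hashlib.sha256(text.encode()).hexdigest()
        if cache.get(name) != digest:
            recompiled.append(name)          # (re)compile only this module
            cache[name] = digest
    return recompiled

cache = {}
srcs = {'alu': 'module alu ...', 'fifo': 'module fifo ...'}
assert incremental_compile(srcs, cache) == ['alu', 'fifo']   # first run: compile all
srcs['fifo'] = 'module fifo ... // fixed'
assert incremental_compile(srcs, cache) == ['fifo']          # only the changed module
```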
  • the method 1600 further comprises step 1618, during which the host system runs the first code, and step 1620, during which the simulation engine runs the second code.
  • This section of the method 1600 is referred to as runtime.
  • the method 1600 allows for the host system and simulation engine to communicate with each other during steps 1618 and 1620.
  • the host simulation process 1702, i.e. the process of simulating code generated from the first part of the HDL, assertions and functional coverage, communicates with the simulation engine processes 1704, i.e. the process of simulating code generated from the second part of the HDL, assertions and functional coverage, using drivers comprising a simulation engine application-programming interface (API) 1706 that makes a number of functions, such as input and output transfer from the host system to the simulation engine, available to the host simulation process
  • a communication channel driver 1708 implementing communication primitives corresponding to the chosen bus technology
  • a simulation engine driver 1710 providing runtime control of the simulation engine processes 1704.
  • Figure 18 shows the runtime section of the method 1600 according to the first embodiment of the present invention.
  • the runtime section designated in entirety by reference numeral 1800 starts in step 1802.
  • the host system evaluates initialization code.
  • the simulation loop 1810 begins and cycles repeatedly until no more simulation events are to be computed, in which case the simulation process is terminated.
  • in step 1806 the host system code is evaluated. After a number of computation steps this evaluation is completed. If a clock event is detected in step 1812, then the inputs related to this clock domain are sent to the simulation engine during step 1814. If, on the other hand, no clock event is detected in step 1812, then control is forwarded to step 1816, during which the host system code is evaluated.
  • in step 1818 the simulation processes that are dependent on the transmitted input signals are started. These processes run on the simulation engine.
  • in step 1820 the values corresponding to output ports are transmitted from the simulation engine to the host system.
  • the host probes to see if an output is ready; if no outputs are ready then control is transferred back to before step 1820, i.e. a wait state. If, on the other hand, an output is ready, then the output value is transferred during step 1824.
  • Step 1822 checks whether there are more outputs that require processing; if not, the method continues to step 1816, and if there are more outputs that require processing the method continues back to step 1820.
  • in step 1816 more host system code is evaluated.
  • in step 1808 it is checked whether a finish condition has been met. If the finish condition has been met the method terminates during step 1826. If, on the other hand, there are remaining events to be computed, the method continues in step 1806.
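The runtime loop of figure 18 may be sketched as follows, with stub host and engine objects standing in for the real components; all names and the toy output computation are illustrative only:

```python
def run_simulation(host, engine, max_events=1000):
    """Sketch of the runtime loop of figure 18: evaluate host code, forward
    clock-domain inputs to the simulation engine, collect outputs, repeat
    until the finish condition is met."""
    host.evaluate_init()                           # initialization code
    for _ in range(max_events):                    # simulation loop 1810
        host.evaluate()                            # step 1806
        if host.clock_event():                     # step 1812
            engine.send_inputs(host.inputs())      # step 1814
            engine.start_dependent_processes()     # step 1818
            while engine.outputs_pending():        # steps 1820/1822
                host.accept(engine.next_output())  # step 1824
        host.evaluate_more()                       # step 1816
        if host.finished():                        # step 1808: finish condition
            return True                            # step 1826: terminate
    return False

class StubHost:
    """Illustrative stand-in for the host simulation process."""
    def __init__(self):
        self.cycles, self.collected = 0, []
    def evaluate_init(self): pass
    def evaluate(self): self.cycles += 1
    def evaluate_more(self): pass
    def clock_event(self): return True
    def inputs(self): return {'clk_domain_0': self.cycles}
    def accept(self, value): self.collected.append(value)
    def finished(self): return self.cycles >= 3

class StubEngine:
    """Illustrative stand-in for the simulation engine processes."""
    def __init__(self):
        self.pending = []
    def send_inputs(self, inputs): self.last = inputs
    def start_dependent_processes(self):
        self.pending = [self.last['clk_domain_0'] * 2]   # toy computed output
    def outputs_pending(self): return bool(self.pending)
    def next_output(self): return self.pending.pop()

host, engine = StubHost(), StubEngine()
assert run_simulation(host, engine) is True
assert host.collected == [2, 4, 6]
```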
  • the simulation engine 106 may comprise a checksum that uniquely defines the current instruction image loaded into the simulation engine 106; that way, whether or not to skip the instruction initialization may be automatically detected.
  • the data RAMs 308 and RAM blocks 408 must be initialized at the beginning of each new simulation and therefore it is desirable to optimize this process.
  • each SPU, such as SPU 300, comprises a small FSM which, on command, initializes all state RAMs with the undefined value. For those RAM blocks 408 that require a defined initial state, the initial memory image is stored in other parts of the RAM blocks 408.
  • a small FSM will then, on command, copy the initialization data to the appropriate locations in the RAM block 408 before simulation starts. Together these mechanisms ensure swift initialization of all RAMs in the simulation engine 106.
  • assertions constitute a concise description of required or illegal functional behavior of the user's HDL design. Assertions are in essence a rigorous way of capturing the design specification, and they are primarily used in conjunction with simulation to validate the behavior of the design. In addition, assertions may be used to provide functional coverage of the user design and verification procedure .
  • the assertions specified using special assertion constructs are compiled by a server, such as shown in figure 1 as reference numeral 120, directly as part of the simulation compilation, by translating assertion constructs to native simulation engine instructions or sequences of such instructions.
  • assertions are described using ordinary HDL constructs such as expressions, combinatorial logic and register assignments.
  • These constructs are compiled by the server and simulated using the basic techniques that were applied, as described above, when compiling a user HDL design (figure 16).
  • assertions specified using special assertion constructs are preprocessed as part of the simulation method and translated by the server into ordinary HDL constructs such as expressions, combinatorial logic and register assignments.
  • These constructs are in turn compiled by the server and simulated using the basic techniques that were applied as described above when compiling a user HDL design.
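The translation of an assertion construct into ordinary HDL constructs may be sketched as below; the emitted text, signal names and sticky-trigger scheme are our illustrative assumptions, not the server's actual output:

```python
def translate_assertion(expr_signal, clock='clk'):
    """Sketch: preprocess an assertion into ordinary HDL constructs -- a
    combinatorial expression plus a register assignment holding a sticky
    trigger value. The output text is illustrative Verilog-style HDL."""
    return '\n'.join([
        f"wire _assert_fail = !({expr_signal});",          # combinatorial check
        f"always @(posedge {clock})",                      # register assignment
        "  if (_assert_fail) _assert_trigger <= 1'b1;",    # sticky trigger value
    ])

# An assertion that grant must never be given without a request:
hdl = translate_assertion("req || !grant")
assert "wire _assert_fail = !(req || !grant);" in hdl
assert "_assert_trigger <= 1'b1;" in hdl
```

The trigger value produced this way corresponds to the trigger value whose handling is described in the following paragraphs.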
  • the trigger value together with a unique assertion identifier may be communicated to the host system using the message-passing network, such as shown in figure 1 as reference numeral 108.
  • Alternatively, the trigger value and related assertion identifier are stored in one of the external RAM blocks 408. Further alternatively, the trigger value and identifier are stored in an internal data RAM location 308.
  • the functional coverage specified using special functional coverage constructs is compiled by a server, shown in figure 1 as reference numeral 120, directly as part of the simulation compilation, by translating the functional coverage constructs to native simulation engine instructions or sequences of such instructions.
  • the storage locations in one or more of the external RAM blocks 408 or internal data RAMs 308 are reserved for collecting coverage information.
  • when an SPU executes a dedicated coverage point SPU instruction, a coverage count is incremented at a storage location specified by the coverage instruction.
  • the coverage instructions are inserted during the compilation process.
  • all coverage locations may be initialized to zero, if specified by the user.
  • the initialization sequence is only executed prior to the first simulation job; in subsequent simulations the coverage counts are incremented based on what was computed in a prior simulation job, such that the coverage values are effectively accumulated.
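The accumulation behaviour described above may be sketched as follows; the class and method names are ours and purely illustrative:

```python
class CoverageStore:
    """Sketch of coverage collection: reserved storage locations hold counts
    that a coverage-point instruction increments; the locations are only
    zeroed before the first job, so later jobs accumulate."""

    def __init__(self):
        self.counts = None                     # reserved storage locations

    def run_job(self, hits, first_job=False):
        if first_job or self.counts is None:
            self.counts = {}                   # initialization: zero once
        for loc in hits:
            # each coverage-point instruction increments its specified location
            self.counts[loc] = self.counts.get(loc, 0) + 1

store = CoverageStore()
store.run_job([0, 2, 2], first_job=True)
store.run_job([2, 3])                          # second job accumulates onto the first
assert store.counts == {0: 1, 2: 3, 3: 1}
```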
  • the values, over time, of the signal wires (also called a signal trace) inside the DUT must be made available to the user. This is commonly done by tracing all or some of the signal values to a file that may then be post-processed by a debugging tool such as a waveform viewer. Due to the typically large number of wires in the DUT and the potentially long run time (i.e. many cycles simulated) the amount of information contained in a signal trace may be very substantial.
  • In the present invention only the contents of the data RAMs 308, or parts thereof, are saved in the trace for each cycle of the simulation.
  • simulation trace data is transferred and saved in the on-board RAM blocks 408 via the general network 312 connecting the SPUs. Due to the fact that the present invention operates on 4-state values and uses an instruction set that closely resembles the original operations present in the HDL source, all the DUT signal values that were not saved explicitly in the trace may be accurately computed in a post-processing step.
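The post-processing step just described may be sketched as follows: only the saved per-cycle state values are stored, and the unsaved combinational signals are recomputed from them. All names and the toy signal functions are illustrative:

```python
def reconstruct_trace(state_trace, comb_funcs):
    """Sketch of trace post-processing: only state contents are saved per
    cycle; combinational signal values not saved in the trace are recomputed
    from the state, as the original HDL operations would produce them."""
    full = []
    for state in state_trace:
        cycle = dict(state)
        for name, fn in comb_funcs.items():
            cycle[name] = fn(state)            # recompute unsaved wire values
        full.append(cycle)
    return full

# Saved per-cycle state (e.g. data RAM contents) for three cycles:
trace = [{'q': 0}, {'q': 1}, {'q': 2}]
# Combinational signals derivable from the state, mirroring the HDL source:
comb = {'q_is_odd': lambda s: s['q'] & 1}
full = reconstruct_trace(trace, comb)
assert [c['q_is_odd'] for c in full] == [0, 1, 0]
```

This is why saving only the data RAM contents per cycle suffices: everything else is derivable.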
  • the user may request the entire trace (located in the RAMs 408) or parts thereof. If, for example, a bug is discovered at the end of the simulation, the user might only want to see the trace for the immediately preceding cycles.
  • the requested parts of the trace data are sent to the host system 102, via any of the communication channels 108, for further post-processing before being presented to the user, for example using a commonly available waveform viewer.
  • the trace may be sent directly to the host system 102 via any of the communication channels 108. Due to the limited bandwidth of the communication channels 116a, 116b and 116c this process will typically slow down simulation, but in some situations this may be acceptable if, for example, the user wants the entire trace anyway. The user determines the method of tracing.


Abstract

This invention relates to a simulation method and system for functional verification of an electronic integrated circuit design. The system (100) comprises a server (120) compiling a design input into a first code and a second code, a host system (102) connected to the server (120) and executing the first code, and a simulation engine (106) connected to the host system (102) through a communication channel (108) and executing the second code. The simulation engine (106) comprises a hardware board (110) comprising a link unit (404) interconnecting a controller unit (406), a storage unit (408) and compute units (112) in a link unit network (508), and each of said compute units (112) comprises a plurality of simulation processing units (300). The server (120) partitions the second code into subgroups and maps each of the subgroups onto one of the simulation processing units (300).

Description

SYSTEM AND METHOD FOR FUNCTIONAL VERIFICATION OF AN ELECTRONIC INTEGRATED CIRCUIT DESIGN
Field of invention
The present invention relates to electronic design automation (EDA) , and more specifically to a simulation method and system for functional verification of an electronic integrated circuit design.
Background of invention
Electronic design automation (EDA) refers to human-assisted, automatic, computer-based tools that are employed when designing or verifying electronic integrated circuits. EDA tools may be special-purpose software running on a workstation or similar computer platform. EDA tools may also be dedicated special-purpose computation devices, built to accelerate or ease the task of verifying, altering, creating or analyzing an electronic design description. In some cases EDA tools have been created as a combination of software and dedicated hardware.
Functional verification is often a substantial and complicated task that must be completed in the process of designing and producing a functionally correct integrated circuit. The term functional verification refers to the task of verifying that the functional behavior of an electronic circuit or description thereof or parts of such a circuit or description thereof is in accordance with a specification of its intended function. The design of an integrated circuit is typically described in a high-level Hardware Description Language (HDL) such as VHDL,
Verilog, SystemC or SystemVerilog. Functional verification is commonly performed by simulating the behavior of the device in a manner where the device is presented with input data representing stimuli as part of a plausible test scenario. A response is then computed based on the HDL description of the design and the input data and the current state of the simulation. The response is more specifically computed as new internal state values and output data. The input data is most often generated by an HDL test bench that computes stimuli and applies it to the input pins of the device under test (DUT). The actual verification consists of checking whether the output and internal state are in accordance with the expected behavior of the DUT.
The quality or fidelity of a verification suite may be measured in terms of the probability that there are no undiscovered bugs in the device under test; this probability is related to the amount of verification cycles that a design has been subjected to. Simulating a description of an electronic circuit, which by nature is parallel, in software running on a serial machine like a general-purpose workstation is very inefficient and time consuming.
Advances in integrated circuit technology have resulted in continuously growing design sizes measured in terms of total transistor count. This trend has resulted in an ever-growing demand for functional verification capacity. It is well known that the cost of and time associated with correcting a bug grows considerably as the design process progresses towards the manufacturing stage or, even worse, beyond the manufacturing stage. Prior art technologies, such as described in International patent application no. WO 03/005212 disclosing a communication system between a plurality of field-programmable gate arrays (FPGAs) constituting a simulation system together with a Kernel, which simulation system utilizes a 4-state value communication bus, and American patent no. US 6,480,988 disclosing a communication mechanism between combinatory logic units and a run-time controller, generally consist of a mapping process mapping hardware description language (HDL) into a system of "gates" and connections, or "look-up tables", which system is simulated in a 2-state simulator. Assertions, i.e. concise high-level descriptions of required or illegal behavior of the design description, are synthesized into a hardware description language prior to being mapped.
Summary of the invention
An object of the present invention is therefore to provide a system and method solving the problems identified in the prior art, that is, in particular, enabling accelerated simulation of designs which operate with a 4-state valued output and compiling assertions and functional coverage together with hardware description language.
An object of the present invention is to provide a system and method for functional verification of electronic integrated circuit design, which system and method operates in a parallel mode using hardware-assisted simulation that relies on parallel special purpose processing.
A particular advantage of the present invention is provision of fast and reliable functional verification, which is instrumental in reducing the risk of uncovering bugs at a late stage, while reducing the cost of fixing bugs.
A particular feature of the present invention is that the system and method enable simulation of assertions and functional coverage incorporated into a hardware description type language.
The above objects, advantage and feature together with numerous other objects, advantages and features, which will become evident from below detailed description, are obtained according to a first aspect of the present invention by a system for functional verification of an electronic integrated circuit design input, and said system comprising: a server adapted to compile said design input into a first code and second code, a host system connected to said server and operable to execute said first code generated by said server, and a simulation engine connected to said host system through a communication channel and operable to execute said second code generated by said server, and wherein said simulation engine comprises a hardware board comprising a link unit interconnecting a controller unit, storage unit and compute units in a link unit network, wherein each of said compute units comprises a plurality of simulation processing units, and wherein said server is adapted to partition said second code into subgroups and to map each of said subgroups on to one of said simulation processing units.
Simulation alone only provides a vehicle for exercising parts of the state space of a design. Assertions on the other hand, provide a concise high-level description of required or illegal behavior of the design description. Moreover, what is generally known as functional coverage may effectively measure the quality of a verification suite, in the sense that the functional coverage measure provides valuable feedback to test engineers about how much of the design has been exercised during a particular verification run. Thus simulation in conjunction with assertions and functional coverage results in high fidelity verification.
The above objects, advantages and features together with numerous other objects, advantages and features, which will become evident from below detailed description, are obtained according to a second aspect of the present invention by a method for functional verification of an electronic integrated circuit design input, and said method comprising: receiving said design input by means of a server, partitioning said design input in a first part and in a second part by means of said server, compiling said first part thereby generating a first code to be executed on a host system, partitioning said second part into sub-groups and mapping each of said sub-groups to one of a plurality of simulation processing units of a simulation engine, and compiling said partitioned second part thereby generating a second code to be executed on each of said plurality of simulation processing units of said simulation engine by means of said server.
The method according to the second aspect of the present invention may incorporate any features of the system according to the first aspect of the present invention. The above objects, advantages and features together with numerous other objects, advantages and features, which will become evident from the detailed description below, are obtained according to a third aspect of the present invention by a simulation processing unit for receiving mapping of a part of a hardware description language and comprising a data path operable to perform calculations on operands, a control block reading instructions from an instruction RAM, which instructions then orchestrate the operation of said data path, a data RAM serving as storage for intermediate values and memory for said data path, and a network interface unit enabling communication to and from other simulation processing units.
The unique characteristic of the simulation processing unit design according to the third aspect of the present invention is its special-purpose instruction set optimized for computing the functional hardware description language of a device under test with exactly the same result as a traditional software simulator would have computed.
The simulation processing unit according to the third aspect of the present invention may incorporate any features of the system according to the first aspect and the method according to the second aspect of the present invention.
Embodiments of the first, second and third aspects of the present invention are further disclosed in dependent claims 2 to 28 and dependent claims 30 to 43.

Brief description of the drawings
The above, as well as additional objects, features and advantages of the present invention, will be better understood through the following illustrative and non-limiting detailed description of preferred embodiments of the present invention, with reference to the appended drawings, wherein:
figure 1, shows a functional verification system in accordance with a first embodiment of the present invention;
figure 2, shows a partitioning of the user design and allocation to various components of the functional verification system;
figure 3, shows organization of a simulation processing unit (SPU) according to the first embodiment of the present invention;
figure 4, shows a PCI board according to the first embodiment of the present invention. Note only one side of the PCB is shown;
figure 5, shows organization of a link chip and associated components and interfaces according to the first embodiment of the present invention;
figure 6, shows a verification chassis according to the second embodiment of the present invention, which chassis including a simulation engine and host system;
figure 7, shows an on-chip full crossbar network topology according to the first embodiment of the present invention;

figure 8, shows a network interface unit (NIU), handling off-chip communication according to the first embodiment of the present invention;
figure 9, shows the on-chip full crossbar network topology with integrated off-chip communication network interface units according to the first embodiment of the present invention;
figure 10, shows an interconnect network topology, fully interconnecting all compute chips according to the first embodiment of the present invention. Note only connections for one compute chip are shown;
figure 11, shows an interconnect network topology, partially interconnecting all compute chips according to a second embodiment of the present invention. Note only one side of the PCB is shown;
figure 12, shows an example of a timing diagram of a message passing protocol. In this example four messages are sent;
figure 13, shows a bi-directional inter-chip connection method enhanced with message-end signaling; and
figure 14, shows an interconnect topology according to the second embodiment of the present invention interconnecting the PCBs in a complete simulation engine;
figure 15, shows an IP integration FPGA board;

figure 16, shows a flowchart of a method for verification of user HDL in accordance with the first embodiment of the present invention;
figure 17, shows utilization of related applications and drivers during a runtime section of the method according to the first embodiment of the present invention;
figure 18, shows a flowchart of the runtime section of the method according to the first embodiment of the present invention.
Detailed description of preferred embodiments
In the following description of the various embodiments, reference is made to the accompanying drawings which form a part hereof, and in which is shown by way of illustration various embodiments in which the invention may be practiced. It is to be understood that other embodiments may be utilized and structural and functional modifications may be made without departing from the scope of the present invention.
The present invention provides a mechanism for efficient functional simulation of an integrated circuit described in a Hardware Description Language (HDL).
The HDL verification system according to the preferred embodiment of the present invention is a massively parallel computation system comprising a host system and a simulation engine. The system provides simulation speeds orders of magnitude faster than what may be achieved using conventional software simulation, without compromising the functional behavior of the user HDL description. This ability to preserve the exact functional HDL behavior in a hardware-accelerated environment is a key part of the present invention.
In the preferred embodiment described below the simulation engine is capable of simulating instructions that closely resemble the operators present in the user HDL description. Moreover, the simulation processing units (SPUs) that are part of the simulation engine are configured to compute on vector operands, i.e. operands that are wider than a single bit. In addition to the ordinary Boolean values 0 and 1, each bit position of an operand vector may also take on an X or Z value, signifying an unknown respectively unconnected signal value. These expanded bit states are often referred to as 4-state values.
It is the unique combination of 4-state value handling and native handling of operations that are akin to the HDL language that enables speedy verification that is in exact accordance with the functional behavior of the source HDL description.
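The 4-state evaluation described above can be illustrated with a small sketch. The Python code below is purely illustrative and not part of the disclosed hardware; the constant and function names are assumptions.

```python
# Illustrative 4-state AND; constant and function names are assumptions,
# not taken from the patent.
ZERO, ONE, X, Z = "0", "1", "x", "z"

def and4(a, b):
    """AND over 4-state values: a controlling 0 forces the result to 0,
    otherwise any x or z involvement makes the result unknown."""
    if a == ZERO or b == ZERO:
        return ZERO
    if a == ONE and b == ONE:
        return ONE
    return X          # x or z on a non-controlled input yields unknown

assert and4(ONE, Z) == X      # unconnected input propagates as unknown
assert and4(ZERO, X) == ZERO  # the forced 0 is still well defined
```

Note that a Z driving a gate input behaves as an unknown, so any result that depends on an X or Z bit is itself unknown, while a controlling 0 still forces a defined output.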
Figure 1 shows a verification system designated in entirety by reference numeral 100. The verification system 100 comprises the following main components: a host system 102 comprising one or more hosts 104a, b and c, capable of interacting with the user or users and acting as a computational platform for simulating parts of the design. The one or more hosts 104a, b, c may be a single workstation, a special purpose processing platform, or any combination thereof.
The second component of the verification system 100 is a special purpose simulation engine 106. The host system 102 and the simulation engine 106 are interconnected using a communication channel 108, such as a bus. The bus 108 may be implemented using any communication technology known to a person skilled in the art. For example, the communication channel may be implemented using a standard PCI bus.
The host system 102 comprises one or more general-purpose CPU systems that interact with the simulation engine 106 hardware. In a first embodiment of the present invention the host system 102 is a general purpose PC equipped with one or more special purpose PCI boards establishing the communication channel 108 to the simulation engine 106. The host system 102 performs tasks that are not directly supported by the simulation engine 106, such as file support functions, LAN communication or other parts of the HDL description for the DUT that are not supported by the simulation engine 106.
The simulation engine 106 comprises a plurality of printed circuit boards (PCBs) 110, each comprising a multiplicity of compute chips 112 soldered onto it. The plurality of PCBs 110 are connected to a backplane 114. Each compute chip 112 comprises a multitude of special purpose simulation processing units (SPUs) arranged in a hierarchical manner and optimized for simulating the behavior of a digital design described in an HDL language.
In a first embodiment of the present invention the SPUs are implemented in field-programmable gate array chips known as FPGAs. Using FPGAs has the advantage of being able to upgrade the verification system 100 in the field by loading new firmware into the simulation engine's 106 FPGAs; the disadvantage is smaller capacity (i.e. fewer SPUs) and a lower frequency of operation.
In a second embodiment of the present invention the SPUs are implemented in application specific integrated circuits known as ASICs; this allows for higher capacity and higher frequency of operation, but the SPU implementation inside each chip is fixed.
Each SPU is implemented as a traditional full Harvard architecture with pipelined operation; this architecture has for decades been the preferred choice for high performance general-purpose CPU designs. The unique characteristic of the SPU design according to the present invention is, however, its special-purpose instruction set optimized for computing the functional HDL description of the DUT with exactly the same result as a traditional software simulator would have computed. Prior art relies on converting the HDL description to look-up tables (LUTs) or very simple primitive operations known as "gates" in order to achieve the accelerated simulation speed; a person skilled in the art will know that essential information present in the source HDL description might be lost during such conversion. Especially when the user is debugging the DUT it is important that the original source HDL description is adhered to as closely as possible by the simulation system. Therefore the ability of the simulation engine 106 according to the present invention to accurately simulate the source HDL is essential.
As shown in figure 1 several hosts 104a, b, c may connect to the simulation engine 106 through communication channels 108 thus enabling multiple simulations, and hence multiple users, to simultaneously utilize the accelerated HDL simulation resources provided by the simulation engine 106. The interface between the communication channel 108 and host system 102 may be implemented as one or more proprietary PCI plug-in boards equipped with special hardware components optimized for the purpose of establishing a fast and efficient link between the simulation engine 106 and the individual hosts 104a, b, c in the host system 102. The communication channel 108 may be one or more high-speed serial connections. The multi-user capability provided by the present invention is an important feature since it allows better utilization of the expensive simulation engine 106 hardware.
In this particular embodiment of the present invention, due to the separate and dedicated physical enclosure of the simulation engine 106, the individual PCBs 110 are not necessarily confined by the PCI standard; the PCBs may for example be made larger in order to increase capacity and efficiency of the simulation engine.
As the verification system 100 comprises two distinct computation components, namely the host system 102 and the HDL simulation engine 106, the user design, in the form of an HDL description of an integrated circuit and related test benches, is partitioned into two distinct parts. As illustrated in figure 2, one part 202 comprises components that run on the host system 102; these may be, but are not restricted to, test bench components. Another part 204 comprises components that run on the simulation engine 106; such components will often resemble the actual circuit or design under test (DUT), but may also comprise test bench components. As the DUT in the common case corresponds to an integrated circuit that is going to be fabricated in real silicon and packaged using commercially available packaging technologies, the total pin count for the DUT will be limited to a relatively small number.
These pins are represented using a number of inputs 206 that connect the host system to the DUT and a number of outputs 208 that connect the DUT to the host system; in addition, these pins are represented using a number of inout ports 210, i.e. bidirectional ports.
Figure 3 shows a SPU designated in entirety by reference numeral 300, comprising a data path 302 performing the computational operations necessary to correctly simulate the behavior of the DUT in accordance with the source HDL description of the DUT. The data path 302 is capable of operating directly on what are known as 4-state logic values: logical 0, logical 1, unknown X and unconnected Z. The ability to operate directly on 4-state logic values is a central part of the present invention and forms the foundation, together with the unique instructions of the SPU 300, for accurately simulating the HDL source for the DUT.
Another unique feature of the data path 302 of the present invention is its ability to operate on multi-bit values, also known as vectors. Vector operations in the data path 302 allow the SPU 300 to perform more complex computations such as addition and shifting in a single or a few operations, thus avoiding the need to break down these operations into simple gate primitives or look-up tables (LUTs); the vector operation support is also central to ensuring correct functional simulation in accordance with the HDL source. Analysis has shown that the number of vector operations needed to simulate the behavior of a given digital design is roughly one-tenth the number of primitive single-bit gate operations needed to do the same. This fact contributes significantly to the achievable level of acceleration obtainable with the present invention. The maximum vector length supported by the data path 302 according to the first embodiment of the present invention is 8 bits. If, during translation of the source HDL, vectors wider than 8 bits are encountered, these are broken down into the 8-bit data types supported by the data path 302; this step preserves all properties of the original wide vectors.
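The decomposition of wide vectors into the 8-bit data types supported by the data path can be sketched as follows. This is an illustrative model only, not the actual translation step of the tool chain; the function name is an assumption.

```python
LANE = 8  # maximum vector width supported by the data path 302

def split_lanes(vec, lane=LANE):
    """Split a wide 4-state vector (an MSB-first string) into lane-sized
    chunks, zero-padding the most significant lane; every bit value,
    including x and z states, is preserved."""
    pad = (-len(vec)) % lane
    vec = "0" * pad + vec
    return [vec[i:i + lane] for i in range(0, len(vec), lane)]

# A 20-bit vector becomes three 8-bit operands for the data path.
lanes = split_lanes("1xz0101010101010101z")
assert lanes == ["00001xz0", "10101010", "1010101z"]
```

Because the x and z bits survive the split unchanged, the lane-by-lane computation can reproduce exactly the 4-state result that a software simulator would produce on the original wide vector.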
If the SPUs are implemented in FPGAs then the data paths may be implemented with inherent vector widths optimally suited to the specific DUT under consideration, thereby achieving more optimal utilization of the simulation engine 106 for the given simulation task.
Each SPU, such as shown in figure 3, comprises a control block 304 reading instructions from an instruction RAM 306, which instructions then orchestrate the operation of the data path 302. The instruction RAM 306 may be located inside the compute chips 112 or it may be located in memory banks outside the compute chips 112.
The operands for and results from the data path 302 are read and written to a data RAM 308 serving as storage for intermediate values and memory elements known as flip-flops and latches. In the first embodiment of the present invention the instruction RAM 306 and data RAM 308 are implemented using available RAM blocks present inside an FPGA.
Sharing of results among the multitude of SPUs is needed in order to simulate a large digital design. For this purpose each SPU comprises a network interface unit (NIU) 310, which enables and establishes communication to and from other SPUs in the verification system 100 using a general network 312. A given operation encoded in the instruction word may take one or more of its operands from the NIU 310, and a given operation may send its result to the NIU 310. When sending a result to the NIU 310 one or several destination SPUs, as encoded in the instruction word, are also supplied to the NIU 310. The NIU 310 comprises buffers to effectively handle incoming and outgoing traffic. If the data path 302 requests from the NIU 310 a data value which has not been received yet, the NIU 310 stalls the data path 302 until the requested data value has been received. This stall functionality of the NIU 310 enables the simulation engine to dynamically react to various conditions also known as events; for example, one event happening in one SPU might result in several other events and computations in other SPUs or in the host system 102.
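The stall behavior of the NIU 310 can be modeled in software as a blocking receive on a message buffer. The sketch below is a simplified illustration, not the hardware implementation; the class and method names are assumptions.

```python
import queue
import threading

class NIU:
    """Software model of the stall behavior: RECEIVE blocks the data path
    until the requested value arrives from the network. Names are
    illustrative, not taken from the patent."""
    def __init__(self):
        self._inbox = queue.Queue()   # buffer for incoming messages

    def deliver(self, value):
        self._inbox.put(value)        # called from the network side

    def receive(self):
        return self._inbox.get()      # blocks (stalls) until data arrives

niu = NIU()
threading.Timer(0.01, niu.deliver, args=(42,)).start()  # value arrives later
result = niu.receive()   # the modeled data path stalls here, then resumes
assert result == 42
```

The blocking `get` makes the receiving side purely event-driven: computation resumes exactly when the awaited value arrives, mirroring how one event in one SPU can trigger computation in another.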
Events comprise, but are not confined to, the rising or falling edge of a clock, synchronizing semaphores between multiple clock domains or completion of specific computations.
The verification system 100 provides means for multiple users to simultaneously access the simulation engine 106. As shown in figure 1, multiple hosts of the host system 102 may be connected to the simulation engine 106 via individual communication channels 116a, 116b and 116c, thus enabling a number of users corresponding to the number of hosts 104a, 104b and 104c to simultaneously and completely independently use the simulation engine 106.
When multiple users are using the simulation resources provided by the simulation engine 106 the available SPUs must be allocated to the users according to certain priorities. In the first embodiment of the present invention this administrative task is performed automatically by a special simulation engine server 120 that, based on certain dynamic criteria, allocates simulation engine resources to the individual users. The simulation engine server 120 according to the first embodiment of the present invention connects to hosts 104a, 104b and 104c through an exterior network 122 such as a wired or wireless local area network, metropolitan area network, wide area network, inter-network, or any combination thereof. In a second embodiment of the simulation engine server 120, the server 120 connects to the hosts 104a, 104b and 104c through dedicated lines or a combination of dedicated lines and the exterior network 122. Further, in a third embodiment of the verification system 100 the simulation engine server 120 is comprised in one of the hosts 104a, 104b or 104c.
Since the allocation of simulation engine resources is dynamic, a partial recompilation of the user design may be required when the number of available SPUs changes for a given user; this is accomplished by rerunning only the mapping phase and code generation phase of the compilation (described below with reference to figure 16), thus allowing relatively swift adaptation to changes in the simulation engine resources available to a given user. If the simulation engine server 120 decides that too few SPUs are available for a given simulation job, then that job is instead queued until sufficient simulation engine resources are available.
The accelerated simulation in the present invention allows for a certain level of flexibility when allocating resources to a given user. The instruction set based method of simulation means that more instructions may be generated for each SPU, thus resulting in fewer allocated SPUs for a given simulation job, at the cost of slower simulation speed. For example, a given simulation job may optimally be mapped to a hundred SPUs with a hundred instructions in each instruction RAM 306. But that same job might just as well be mapped to, for example, fifty SPUs with two hundred instructions in each instruction RAM 306; the result is a two-fold reduction in simulation speed. This flexibility allows the simulation engine server 120 to more easily allocate resources among multiple users in a fair fashion.
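The trade-off between allocated SPUs and instructions per instruction RAM described above amounts to a simple proportionality, sketched here for illustration; the function and parameter names are assumptions.

```python
import math

def map_job(total_instructions, spus, optimal_spus=100):
    """Sketch of the SPU/speed trade-off: mapping a job onto fewer SPUs
    packs more instructions into each instruction RAM 306 and slows the
    simulation proportionally."""
    per_spu = math.ceil(total_instructions / spus)
    relative_speed = spus / optimal_spus
    return per_spu, relative_speed

assert map_job(10_000, 100) == (100, 1.0)   # the optimal mapping
assert map_job(10_000, 50) == (200, 0.5)    # half the SPUs, half the speed
```

A scheduler can therefore trade SPUs against speed continuously, which is what lets the simulation engine server fit several concurrent jobs onto a fixed pool of SPUs.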
Figure 4 shows a PCB according to the first embodiment of the present invention, which PCB is designated in entirety by reference numeral 400. The PCB size is dictated by the specifications in the PCI standard, thus the number of compute chips 402 depends on physical constraints, cost etc. In this case sixteen compute chips 402 comprising SPUs may be soldered on the PCB 400. Eight compute chips 402 are mounted on each side of the PCB 400. The PCB 400 further comprises two special link chips 404, which orchestrate communication between the compute chips 402, a PCI controller chip 406, on-board RAM blocks 408, and a connector 410 to the backplane, shown in figure 1 as reference numeral 114, and/or neighboring PCBs.
A PCI interface 412 establishes a communication channel, shown in figure 1 as reference numeral 108, to the host system by plugging into available PCI slots in the host system. The electrical details of the PCI interface 412 are managed by the PCI controller chip 406, which may be any commonly available PCI controller chip. The PCI controller chip 406 presents a simple interface to the link chip 404.
The link chip 404, as shown in detail in figure 5, establishes the link to the host system via the PCI controller 406 and to the on-board RAM blocks 408 through NIUs 502a and 502b hooked up to the general hierarchical network, shown in figure 3 as reference numeral 312, connecting all SPUs. Each of these special NIUs 502a and 502b is associated with special hardware 504 and 506, respectively, which hardware is used for controlling the specific resource associated with the particular connected NIU. The resource associated with regular NIUs, shown in figure 3 as reference numeral 310, is computation, but the resource associated with the NIUs 502a is large storage capacity and the resource associated with the NIUs 502b is host system communication. A link chip network 508, locally present inside the link chip 404, connects the local PCB network between the compute chips on the PCB on which the given link chip is soldered to the global PCB network hooking several PCBs together.
Thus the link chip 404 acts as a gateway to other PCBs through connector 410 via backplane 114 as well as provider of storage and host communication capabilities.
The RAM blocks 408 serve several purposes. It is common that digital circuits comprise several RAMs and ROMs as parts of the HDL description, and these components may only poorly be simulated by the regular SPUs. As a consequence the software tool chain and method will identify these components and assign them to the available RAM blocks present on each PCB. The NIUs 502a associated with these RAM blocks 408 integrate into the link chip network 508 exactly as if they were part of a regular SPU. Reception of values from the link chip network 508 may then e.g. represent an address in the RAM block 408 or a value to be written to a particular location in the RAM block 408, and transmissions to the link chip network 508 may represent values read from the RAM block 408.
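The message-driven operation of the RAM blocks 408 can be sketched as follows. The message kinds ("addr", "write", "read") are assumed for illustration only; the patent does not specify a message encoding.

```python
class RamBlockNIU:
    """Model of a RAM block 408 driven purely by network messages, as the
    NIUs 502a integrate into the link chip network. Names and message
    kinds are illustrative assumptions."""
    def __init__(self, size):
        self.mem = [0] * size
        self.addr = 0

    def on_message(self, kind, value=None):
        if kind == "addr":       # latch an address for a later access
            self.addr = value
        elif kind == "write":    # store the payload at the latched address
            self.mem[self.addr] = value
        elif kind == "read":     # in hardware the value would be sent back
            return self.mem[self.addr]

ram = RamBlockNIU(256)
ram.on_message("addr", 17)
ram.on_message("write", 0xAB)
assert ram.on_message("read") == 0xAB
```

From the network's point of view such a RAM block is indistinguishable from an SPU: it simply receives and transmits messages on the link chip network.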
Another purpose of the RAM blocks 408 is to hold input stimuli to the DUT and collect results and debug information produced during simulation of the DUT. Certain types of digital designs require very large data sets in order to verify the operation of the DUT; when these data sets are stored in RAM blocks 408 inside the simulation engine the communication channel, shown in figure 1 as reference numeral 108, to the host system is not burdened by this traffic. During simulation very substantial quantities of result data may be produced depending on the specific nature of the DUT and depending on the level of debugging requested by the user. Therefore it is preferred to store this data locally in the simulation engine as an alternative to sending all this data to the host system through the limited capacity of the communication channel 108.
A third purpose of the RAM blocks 408 is to collect statistical information regarding specific metrics of the DUT operation. An example of one such metric may be to count the number of times a specific finite state machine (FSM) has entered a specific state or to log the distribution of values seen on a specific bus during simulation. Such metrics are generally known as functional coverage and constitute an important part of modern verification strategies for large digital designs. The unique accelerated simulation employed in the present invention is ideally suited for more high-level extensions such as functional coverage. One reason for this is the ability to handle 4-state values, since this prevents faulty coverage of values that do not have a defined value. Also, because the SPUs are built around an instruction set, specialized instructions may easily be added without affecting the overall operation of the system 100. This makes the solution much more flexible and responsive to new requirements and trends in the industry.
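A functional coverage counter that respects 4-state semantics can be sketched as follows. The class and its sampling rule are illustrative; the key point is that values containing x or z bits are never counted as covered.

```python
from collections import Counter

class CoverageCounter:
    """Illustrative functional-coverage counter: samples whose 4-state
    value contains an x or z bit are ignored, so undefined values are
    never reported as covered."""
    def __init__(self):
        self.hits = Counter()

    def sample(self, value):
        if "x" in value or "z" in value:
            return               # skip values with unknown/unconnected bits
        self.hits[value] += 1

cov = CoverageCounter()
for v in ["01", "01", "1x", "10", "zz"]:
    cov.sample(v)
assert cov.hits["01"] == 2 and cov.hits["10"] == 1
assert "1x" not in cov.hits      # unknowns never count as covered
```

A 2-state accelerator would have silently folded the "1x" sample into one of the defined bins, which is exactly the faulty coverage the text warns against.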
The RAM blocks 408 may be implemented as SRAMs or SDRAMs or any combination thereof. Depending on the actual DUT memory components being modeled by the RAM blocks 408 it may be beneficial to dynamically duplicate parts of the contents of these large off-chip RAM blocks 408 in local but smaller on-chip RAMs inside the link chip 404. For example, a large ROM table might in the common case only be accessed in a few locations, and therefore dynamically buffering these commonly accessed locations in internal on-chip RAMs will give better response times and thus higher performance of the system 100. Such techniques are commonly known as caching and are usually only seen in high-end general purpose CPU systems. The big advantage of using caches for this purpose in the present invention is that the software tool chain does not have to worry about which parts of the DUT memory components to model in the large but slow responding off-chip RAM blocks 408 and which parts to model in the small but fast responding on-chip RAMs; during the course of simulation the most frequently accessed data locations are automatically brought into the on-chip RAMs, thus optimizing the system performance.
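The caching scheme described above can be illustrated with a minimal direct-mapped cache model; the placement policy, line count and names are assumptions for illustration only.

```python
class RamCache:
    """Minimal direct-mapped cache in front of a slow off-chip RAM; the
    placement policy and line count are illustrative assumptions."""
    def __init__(self, backing, lines=16):
        self.backing = backing      # the large off-chip RAM block (a list)
        self.lines = lines
        self.tags = [None] * lines  # which address each cache line holds
        self.data = [0] * lines
        self.hits = self.misses = 0

    def read(self, addr):
        line = addr % self.lines    # direct-mapped placement
        if self.tags[line] == addr:
            self.hits += 1
        else:                       # miss: fetch from the off-chip RAM
            self.misses += 1
            self.tags[line] = addr
            self.data[line] = self.backing[addr]
        return self.data[line]

rom = list(range(1024))
cache = RamCache(rom)
for _ in range(100):                # a frequently accessed ROM location
    cache.read(5)
assert cache.misses == 1            # only the first access went off-chip
```

As in the text, the tool chain never decides which locations live on-chip: repeated accesses to the same address populate the cache automatically, and only the first access pays the off-chip latency.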
Figure 6 shows a verification system according to a second embodiment of the present invention, designated in entirety by reference numeral 600. In the verification system 600, a host system 602 and a simulation engine 606 are enclosed in the same physical enclosure 604. The simulation engine 606 comprises a plurality of PCBs 608. The plurality of PCBs 608 is connected to a proprietary backplane board 610, connected to neighboring PCBs through connectors 616, or connected to both the proprietary backplane board 610 and neighboring PCBs through connectors 616.
The individual PCBs 608 are connected to the motherboard 612 of the host system 602 through PCI connectors 614, and furthermore the PCBs are connected together via connectors 616 to neighboring PCBs and/or connectors 618 to the backplane board 610, establishing the required interconnections between the PCBs. The verification system 600 is just one of many possible configurations; depending on the required system performance and system cost there may be more or fewer PCBs, and the backplane board may not be needed in certain configurations or it might be replaced by simple wiring or connectors between the PCBs.
The NIUs distributed throughout the simulation engine 106 are logically tied together in a general network, shown in figure 3 as reference numeral 312. The network architecture used in the present invention is based on message passing. Two operations are formally defined for message exchange: SEND and RECEIVE. Once a SEND operation has been accepted the general network 312 guarantees that the message will eventually reach its destination, but the transmission time for the message is generally unknown. At the destination NIU any incoming message is buffered until the message is requested by a RECEIVE operation, and if the message has not yet arrived when requested by the RECEIVE the SPU is stalled. In order for the general network 312 to work properly every message must contain the destination of the message as well as a payload constituting the data value to be transferred; this means that every NIU in the general network 312 has a unique address. If the message is used solely as an event indication the receiver ignores the payload.
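The logical SEND/RECEIVE model of the general network 312 can be sketched as follows; the addressing scheme and buffer discipline shown are illustrative assumptions.

```python
class GeneralNetwork:
    """Logical model of the message-passing network 312: every NIU has a
    unique address, every message carries its destination and a payload,
    and messages are buffered at the destination until RECEIVEd. Names
    are illustrative."""
    def __init__(self):
        self.inboxes = {}            # unique NIU address -> message buffer

    def register(self, address):
        self.inboxes[address] = []

    def send(self, destination, payload):
        # Once accepted, delivery is guaranteed; transit time is unspecified.
        self.inboxes[destination].append(payload)

    def receive(self, address):
        return self.inboxes[address].pop(0)

net = GeneralNetwork()
net.register("spu_0")
net.register("spu_1")
net.send("spu_1", 0x2A)   # for a pure event the receiver ignores the payload
assert net.receive("spu_1") == 0x2A
```

This model deliberately says nothing about how messages physically travel, which mirrors the point made next: the logical behavior is decoupled from the physical network implementation.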
It is worth noting that the logical behavior of the general network 312 described above is entirely decoupled from the actual physical implementation of the general network 312. This is very useful as the software tool chain may stay the same for different physical implementations and configurations of the simulation engine 106 hardware. In order to achieve the highest simulation performance the physical implementation of the general network 312 must provide an effective medium for transmitting messages. Two parameters define the effectiveness of the general network 312: transmission delay and transmission bandwidth. The former quantifies how long it takes to send a message from any given source to any given destination and the latter quantifies the amount of traffic the general network 312 may sustain at any given intersection of the general network 312.
The physical network 312 according to the first embodiment of the present invention is organized in a hierarchy with three levels, each having a local network topology optimized for that particular level. At the lowest level the SPUs are interconnected inside each of the compute chips 112 hosting the SPUs. At the next level these compute chips 112 are connected via printed circuit traces on each of the PCBs on which these chips are soldered (the local PCB network), and finally these PCBs are interconnected, thus forming the highest level of the network (the global PCB network).
Inside each compute chip the SPUs are connected in what is commonly known as a full crossbar, shown in figure 7 as reference numeral 700. This network topology is rather expensive in terms of hardware resource requirements, but the full crossbar 700 offers excellent network performance. Messages are injected into the full crossbar, and collected from the full crossbar, using the NIUs 706 that are part of each SPU. Inside the full crossbar the messages are directed towards their destination NIU using a plurality of chained switches 704a, 704b, 704c and 704n. The chained switches may be implemented using equivalent components known to a person skilled in the art.
Besides being able to communicate with each other, SPUs on the same compute chip must also be able to communicate with SPUs located in other compute and/or link chips or located on other PCBs. To establish this off-chip communication, dedicated interface NIUs, shown in figure 8 as reference numeral 800, are integrated into the on-chip crossbar 700; this has the big advantage that the SPUs do not need to be equipped with special means of off-chip communication. Figure 9 shows the on-chip crossbar 700 with integrated off-chip communication NIUs 800.
On each PCB the compute chips comprising the SPUs must be connected both to each other and to the link chips. In the first embodiment of the present invention each PCB comprises 16 compute chips and two link chips; these chips are distributed evenly on both sides of the PCB as shown in figure 4.
Figure 10 shows the connections of a local PCB network 1000 between the chips according to the first embodiment of the present invention. The compute chips, designated in entirety by reference numeral 1002 and comprising the SPUs, are connected in what is known as a fully connected topology. That is, each of the chips has a direct link to any other chip. Using this topology eliminates the need for hub functionality in the compute chips 1002 and it increases the efficiency of the local PCB network, since the pins of any given compute chip are not burdened by relaying messages for other chips, as is the case with a hub-based system. In other words, the available bandwidth in relation to the available physical pins on the chips 1002 is higher in a fully connected topology. Since messages are passed directly between the chips 1002 there is no need to encode the full address for the destination SPU. No hub functions are present and therefore the destination SPU is known to reside in the receiving chip; this saves precious pin resources on the chips 1002. However, the fully connected topology requires more interfaces than e.g. the topology shown in figure 11 and therefore fewer pins may be allocated per interface. This means that it generally requires more cycles to transfer a message from one chip to another, but this incurred message delay is still smaller than the delay associated with hub functionality.
Figure 10 shows the fully connected topology when used to hook up chips on the same PCB (for clarity only the connections for one chip 1008 are shown). This particular embodiment comprises 16 compute chips 1002 and two link chips 1004 and 1006; each compute chip is connected to the 15 other compute chips and to the link chip 1004 located on the same side of the PCB as itself. This local PCB network 1000 topology requires 16 bi-directional interfaces on each compute chip and 8 on each of the link chips 1004 and 1006.
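The interface counts above follow directly from the topology and can be checked with simple arithmetic. The following sketch (Python, with invented helper names used only for illustration) reproduces them for this 16-compute-chip, two-link-chip embodiment:

```python
def fully_connected_links(n):
    # Number of distinct bi-directional links among n fully connected chips.
    return n * (n - 1) // 2

compute_chips = 16
# 16 fully connected compute chips require 120 distinct board-level links.
assert fully_connected_links(compute_chips) == 120

# Each compute chip links to the 15 other compute chips plus the
# link chip on its own side of the PCB: 16 interfaces in total.
interfaces_per_compute_chip = (compute_chips - 1) + 1
assert interfaces_per_compute_chip == 16

# Each link chip serves the 8 compute chips on its side of the PCB.
interfaces_per_link_chip = compute_chips // 2
assert interfaces_per_link_chip == 8
```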
Figure 11 shows the connections of a local PCB network 1100 between chips according to the second embodiment of the present invention. For illustrative purposes only one side of the PCB is shown since the connections are symmetrical for both sides. Each of the solid lines 1102 between the chips, designated in entirety by reference numeral 1104, represents a two-way connection, and the dashed lines 1106 represent two-way connections between a chip on one side of the PCB and a chip on the opposite side of the PCB. The illustrated local PCB network 1100 efficiently connects the link chip 1108 and the compute chips, designated in entirety by reference numeral 1110 and comprising the SPUs. Any chip may reach any other chip either directly or through at most one other chip which then acts as a hub for the given communication channel. This network topology requires 8 bi-directional interfaces on each compute chip and 4 on each of the link chips 1108.
At the highest level of the network hierarchy the individual PCBs are connected together in a global PCB network. The global PCB network may according to the first embodiment of the present invention link PCBs together in a fully connected topology. Each of the PCBs has a direct path to any other PCB. This topology has the advantage of simplicity since no hub functionality is required, but the number of physical wires might be high. To bring down the physical wire connections required by this topology, high speed asynchronous differential serial signaling is used. The message-based network used in the present invention is ideally suited for this kind of connection since there are no requirements with regards to latency, as long as the message eventually arrives at its designated destination. The global PCB network topology may utilise
RocketIO™ transceivers available in the Xilinx Virtex-II Pro FPGAs to implement the connections. In order to get more bandwidth, several fully connected network topologies are implemented in parallel, so, instead of just one direct path to each PCB, several connections are made available. Depending on system configuration (number of PCBs etc.), the physical connections may be either a number of twisted pair wires or a dedicated backplane such as shown in figure 1 as reference numeral 114. Due to the fact that the connections are asynchronous the PCBs may have independent clocking circuits, and thus no special synchronization measures are needed, as long as the PCBs are clocked at approximately the same frequency. Analysis has shown that while the computational weight of vector results is high (i.e. many of the computed bits originate from vector operations) the number of single-bit results is also high. As an example, consider a DUT with two single-bit operations and one 8-bit operation. In this example 80% of all the computed bits originate from a vector operation, yet the number of single-bit results is twice as high as the number of vector results. This shows that distribution of single-bit results among the SPUs is common and should be optimized if possible. The present invention optimizes this by allowing messages with a payload of only one bit or zero bits (an event message) to be transmitted in fewer cycles than other messages.
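The arithmetic of the example above can be written out explicitly; the sketch below merely restates the figures from the text:

```python
# Worked version of the example in the text: a DUT with two
# single-bit operations and one 8-bit vector operation.
single_bit_results = 2
vector_results = 1
vector_bits = 8

total_bits = single_bit_results * 1 + vector_results * vector_bits  # 10 bits
vector_fraction = vector_results * vector_bits / total_bits

# 80% of all computed bits originate from the vector operation...
assert vector_fraction == 0.8
# ...yet there are twice as many single-bit results as vector results.
assert single_bit_results == 2 * vector_results
```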
Figure 12 shows one example of the associated method in use. In this particular embodiment of the present invention, messages with a single-bit or zero-bit payload may be transmitted in one cycle while all other messages may be transmitted in two cycles. The described messaging is made available via a special MESEND signal 1202 which, when asserted, indicates the last cycle of a particular message transmission. Figure 12 shows a two-cycle message 1204, followed by two single-cycle messages 1206 and 1208, followed by a two-cycle message 1210.
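The cycle accounting for this embodiment can be sketched as follows; `transfer_cycles` is an invented name used purely to illustrate the rule, not a signal or function of the embodiment:

```python
def transfer_cycles(payload_bits):
    # In this embodiment, single-bit and zero-bit (event) messages
    # are transmitted in one cycle; all other messages take two.
    return 1 if payload_bits <= 1 else 2

# The sequence shown in figure 12: a two-cycle message, two
# single-cycle messages, then another two-cycle message.
assert [transfer_cycles(b) for b in (8, 1, 0, 8)] == [2, 1, 1, 2]
```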
Figure 13 shows a bi-directional connection 1302 between two chips 1304 and 1306, including the MESEND signal.
In the alternative embodiment of the present invention a traditional ring topology connects a plurality of PCBs 1402 together, and in order to obtain the required low latency and high bandwidth, multiple rings are implemented in parallel. Letting the rings run through the PCBs in different orders satisfies the latency requirement. In figure 14, eight PCBs 1402 are connected with two parallel bi-directional ring networks, where one ring jumps from one PCB to the next while the other jumps in increments of three, thereby connecting the PCBs 1402 in a different order, but still in a ring. With this particular ring topology any PCB may be reached either directly or through at most one hub.
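The reachability claim for this two-ring configuration can be verified exhaustively. The sketch below checks that, with ring strides of one and three over eight PCBs, every PCB is reached either directly or through at most one hub:

```python
n = 8               # eight PCBs as in figure 14
strides = (1, 3)    # two parallel bi-directional rings

# Direct neighbours of each PCB over both rings, in both directions.
neighbours = {i: {(i + s) % n for s in strides} |
                 {(i - s) % n for s in strides}
              for i in range(n)}

for src in range(n):
    for dst in range(n):
        if src == dst:
            continue
        direct = dst in neighbours[src]
        one_hub = any(dst in neighbours[h] for h in neighbours[src])
        # Every PCB is reachable directly or via at most one hub.
        assert direct or one_hub
```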
When the user is designing a large chip it is common that several Intellectual Property (IP) blocks are included and become part of the design. The IP blocks are typically bought from merchant IP providers that deliver these blocks in a form of their choosing; typical IP deliverables comprise: functional models, HDL source code, encrypted HDL source code and gate netlists.
A functional model for the IP block is typically a cycle accurate (or cycle close) behavioral representation of the IP block that is delivered in a compiled form targeting one or more platforms, for example Linux. One of the reasons for IP providers to choose this form of deliverable is to protect their IP. In the present invention these functional models are linked into the part of the simulation executable that is running on the host system 102 in much the same way as it happens for conventional software simulators.
IP blocks delivered as HDL source code or encrypted HDL source code may be compiled to the simulation engine 106 just like the other parts of the user's DUT. For the encrypted HDL source this, however, requires access to the decryption method suitable for the particular encryption scheme used. If such access is not possible, the encrypted HDL source will be linked into the part of the simulation executable that is running on the host system 102 in much the same way as it happens for conventional software simulators.
An IP block may also be delivered as a gate netlist targeting one or more gate libraries. This gate netlist is a description of the IP block where the functionality has been broken down to the individual physical primitives that will be implemented in silicon. The gate library comprises a functional HDL model for each of these gate primitives. With the HDL models for each gate in the IP block it is possible to compile the netlist to the simulation engine 106 in the same way as the other parts of the DUT.
Since an IP block is typically delivered in a finished form from the IP provider, debugging the internals of an IP block is seldom required; as such the IP block may be considered a black box, but with a well defined interface. The present invention provides the possibility of implementing one or more IP blocks directly in FPGAs that are integrated into the simulation engine 106 using special FPGA PCBs. Figure 15 shows the first embodiment according to the present invention of such an FPGA PCB 1500. The central components of this FPGA PCB 1500 are the FPGA 1502, in which the IP block is implemented using standard FPGA synthesis tools, a link chip 1504 and a connector 1506 to the backplane, shown in figure 1 as reference numeral 114, and/or neighboring PCBs. The link chip 1504 is responsible for handling the interface between the pins on the FPGA 1502 and the message-based network via connector 1506. At the connector 1506 the FPGA PCB 1500 looks exactly like a normal PCB connector 410 and the FPGA PCB 1500 may therefore easily be integrated into the simulation engine 106. In further embodiments of the present invention the FPGA PCB may be larger or smaller, or it may comprise more FPGAs and external RAM blocks etc. depending on configuration and cost. The basic property of the FPGA is to provide a platform for integrating synthesized IP blocks into a hardware-accelerated simulation environment.
As a further alternative embodiment of the present invention the FPGA PCB comprises a special socket for connecting packaged chips. This allows the user to integrate a third party hard IP block in the form of a packaged chip into the simulation environment.
Figure 16 shows a method according to the first embodiment of the present invention designated in entirety by reference numeral 1600. The method 1600 commences in step 1602, during which the user design is input to a server, such as shown in figure 1 as reference numeral 120. Then during step 1604 the user design comprising hardware description language (HDL), assertions and functional coverage is analyzed and partitioned into two parts. A first part corresponds to portions of the input to be run on the host system, while a second part corresponds to portions of the input to be run on the simulation engine. The partitioning in step 1604 is done while preserving the dependencies present in the HDL.
The HDL may be specified by any of the commonly used hardware description languages such as: Verilog, VHDL, SystemVerilog, SystemC, or any combination thereof.
During step 1606 a first code is generated for the first part of the HDL, assertions and functional coverage. This code may resemble HDL, which serves as input to an existing software simulation tool. The generated code may comprise calls to a synchronization mechanism, which synchronizes the processes run on the host system with the processes run on the simulation engine.
During steps 1604, 1606, 1608 through 1610 (generally referred to as compile time) a second code is generated for the second part of the HDL, assertions and functional coverage.
During step 1608 the second part of the HDL, assertions and functional coverage is analyzed and further partitioned into subcomponents corresponding to unique clock domains, since the second part of the HDL, assertions and functional coverage may comprise multiple clock domains.
During step 1612 the server translates the second part of the HDL, assertions and functional coverage into a topological representation comprising executable primitives. The translation is done such that the functional properties of the second part of the HDL, assertions and functional coverage are preserved in the topological representation.
During step 1614 a number of optimization steps are optionally applied. For example, one or more of the following optimization passes may be performed: common sub-expression elimination, dead code elimination, constant propagation, algebraic simplification and interconnect simplification. As an example, consider the following interconnect simplification technique. If a and b are 32-bit busses and c is the concatenation of a and b, then extracting the least significant 16 bits of b is simpler than, and equivalent to, extracting the least significant 16 bits of c. During step 1616 the second part of the HDL, assertions and functional coverage operations is partitioned into a number of subgroups. Each subgroup is mapped for execution on a specific SPU. During step 1610 the second code is generated based on the second part of the HDL, assertions and functional coverage, which second code is to run on the simulation engine.
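The interconnect simplification example above can be checked directly with integer bit manipulation; the concrete bus values below are arbitrary illustrations:

```python
# c is the concatenation of two 32-bit busses a and b, with a in
# the high half and b in the low half.  Extracting the least
# significant 16 bits of c is equivalent to extracting them from
# b directly, so the concatenation can be dropped from that path.
a, b = 0xDEADBEEF, 0x12345678
c = (a << 32) | b
assert c & 0xFFFF == b & 0xFFFF   # both yield 0x5678
```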
According to the first embodiment of the method 1600 the compilation procedure is implemented such that only the parts that have changed since the last valid compilation run, and possibly closely related portions of the HDL, assertions and functional coverage, are recompiled. This compilation time saving technique is commonly referred to as incremental compilation.
The method 1600 further comprises step 1618, during which the host system runs the first code, and step 1620, during which the simulation engine runs the second code. This section of the method 1600 is referred to as runtime. The method 1600 allows for the host system and simulation engine to communicate with each other during steps 1618 and 1620.
During the runtime section of the method 1600 the host system and simulation engine utilize applications and drivers, which are depicted in figure 17. The host simulation process 1702, i.e. the process of simulating code generated from the first part of the HDL, assertions and functional coverage, communicates with the simulation engine processes 1704, i.e. the processes of simulating code generated from the second part of the HDL, assertions and functional coverage, using drivers comprising: a simulation engine application-programming interface (API) 1706 that makes a number of functions, such as input and output transfer from the host system to the simulation engine, available to the host simulation process 1702; a communication channel driver 1708 implementing communication primitives corresponding to the chosen bus technology; and a simulation engine driver 1710 providing runtime control of the simulation engine processes 1704.
Figure 18 shows the runtime section of the method 1600 according to the first embodiment of the present invention. The runtime section designated in entirety by reference numeral 1800 starts in step 1802. During step 1804 the host system evaluates initialization code. Beginning at step 1806 and bounded by step 1808, the simulation loop 1810 begins and cycles repeatedly until no more simulation events are to be computed, in which case the simulation process is terminated.
During step 1806 the host system code is evaluated. After a number of computation steps this evaluation is completed. If a clock event is detected in step 1812, then the inputs related to this clock domain are sent to the simulation engine during step 1814. On the other hand if no clock event is detected in step 1812, then the control is forwarded to step 1816, during which the host system code is evaluated.
During step 1818 the simulation processes that are dependent on the transmitted input signals are started. These processes run on the simulation engine.
Beginning in step 1820 and bounded by step 1822, the values corresponding to output ports are transmitted from the simulation engine to the host system. During step 1820 the host probes to see if an output is ready; if no outputs are ready, control is transferred back prior to step 1820, i.e. a wait state. If, on the other hand, an output is ready, the output value is transferred during step 1824. Step 1822 checks whether there are more outputs that require processing; if not, the method continues to step 1816; if there are more outputs that require processing, the method continues back to step 1820.
During step 1816 more host system code is evaluated. During step 1808 it is checked whether a finish condition has been met. If the finish condition has been met the method terminates during step 1826. If, on the other hand, there are remaining events to be computed, the method continues in step 1806.
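As an illustration only, the control flow of the simulation loop 1810 can be modelled with a toy host and engine. The class and method names below are invented for this sketch and do not appear in the embodiment; the real host and engine exchange messages over the communication channel rather than Python objects:

```python
class Host:
    # Toy stand-in for the host system side of the runtime loop.
    def __init__(self, cycles):
        self.cycles, self.t = cycles, 0

    def pre_evaluate(self):        # step 1806: evaluate code, detect clock events
        self.t += 1
        return ["clk"]             # one clock domain fires each cycle

    def finish_condition(self):    # step 1808: any events left to compute?
        return self.t >= self.cycles


class Engine:
    # Toy stand-in for the simulation engine side.
    def __init__(self):
        self.outputs = []

    def send_inputs(self, domain):  # step 1814: inputs for this clock domain
        self.outputs.append(domain + "_out")

    def drain(self):                # steps 1820-1824: transfer ready outputs
        out, self.outputs = self.outputs, []
        return out


host, engine = Host(cycles=3), Engine()
trace = []
while not host.finish_condition():        # simulation loop 1810
    for domain in host.pre_evaluate():    # steps 1806/1812
        engine.send_inputs(domain)
        trace += engine.drain()
assert trace == ["clk_out"] * 3
```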
Before simulation starts all instruction RAMs 306 must be initialized with an instruction sequence for the associated SPU 300 and all data RAMs 308 must be initialized to the unknown state X. Furthermore the RAM blocks 408 must be initialized. To avoid implementing special busses etc. for these purposes, the normal message-based interconnect system is used to distribute this information. To distinguish from normal operation, a few extra wires connected to each chip on the PCB might indicate that the initialization process is ongoing.
Initialization of the instruction RAMs 306 is only needed when the DUT is changed and therefore this step may be skipped when possible. The simulation engine 106 may comprise a checksum that uniquely defines the current instruction image loaded into the simulation engine 106; that way, whether or not to skip the instruction initialization may be automatically detected. The data RAMs 308 and RAM blocks 408 must be initialized at the beginning of each new simulation and therefore it is desirable to optimize this process. For this purpose each SPU, such as SPU 300, comprises a small FSM, which, on command, initializes all state RAMs with the undefined value. For those RAM blocks 408 that require re-initialization between each simulation, the initial memory image is stored in other parts of the RAM blocks 408. A small FSM will then, on command, copy the initialization data to the appropriate locations in the RAM block 408 before simulation starts. Together these mechanisms ensure swift initialization of all RAMs in the simulation engine 106.
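The checksum-based skip can be sketched as follows. The patent does not specify a checksum algorithm, so SHA-256 and the function names here are assumptions made only for the sake of illustration:

```python
import hashlib

def instruction_checksum(image):
    # Checksum uniquely identifying an instruction image
    # (SHA-256 chosen arbitrarily for this sketch).
    return hashlib.sha256(image).hexdigest()

def needs_instruction_init(loaded_checksum, new_image):
    # Skip re-loading the instruction RAMs when the engine
    # already holds an identical instruction image.
    return loaded_checksum != instruction_checksum(new_image)

image = b"\x01\x02\x03"
loaded = instruction_checksum(image)
assert not needs_instruction_init(loaded, image)        # unchanged DUT: skip
assert needs_instruction_init(loaded, b"\x01\x02\x04")  # changed DUT: reload
```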
As known to persons skilled in the art, assertions constitute a concise description of required or illegal functional behavior of the user's HDL design. Assertions are in essence a rigorous way of capturing the design specification, and they are primarily used in conjunction with simulation to validate the behavior of the design. In addition, assertions may be used to provide functional coverage of the user design and verification procedure.
According to the preferred embodiment of the present invention, the assertions specified using special assertion constructs, for instance using assertion expression parts of SystemVerilog, are compiled by a server, such as shown in figure 1 as reference numeral 120, directly as part of the simulation compilation, by translating assertion constructs to native simulation engine instructions or sequences of such instructions.
In a first alternative embodiment of the present invention, assertions are described using ordinary HDL constructs such as expressions, combinatorial logic and register assignments.
These constructs are compiled by the server and simulated using the basic techniques that were applied as described above when compiling a user HDL design (figure 16). In a second alternative embodiment of the present invention, assertions specified using special assertion constructs are preprocessed as part of the simulation method and translated by the server into ordinary HDL constructs such as expressions, combinatorial logic and register assignments. These constructs are in turn compiled by the server and simulated using the basic techniques that were applied as described above when compiling a user HDL design.
If an assertion is triggered during simulation, the trigger value together with a unique assertion identifier may be communicated to the host system using the message-passing network, such as shown in figure 1 as reference numeral 108. Alternatively, the trigger value and related assertion identifier are stored in one of the external RAM blocks 408. Further alternatively, the trigger value and identifier are stored in an internal data RAM location 308.
According to the preferred embodiment of the present invention, the functional coverage specified using special functional coverage constructs is compiled by a server, shown in figure 1 as reference numeral 120, directly as part of the simulation compilation, by translating the functional coverage constructs to native simulation engine instructions or sequences of such instructions.
The storage locations in one or more of the external RAM blocks 408 or internal data RAMs 308 are reserved for collecting coverage information. When an SPU executes dedicated coverage point SPU instructions, a coverage count is incremented at a storage location specified by the coverage instruction. The coverage instructions are inserted during the compilation process. In an initialization step, executed prior to starting a simulation, all coverage locations may be initialized to zero, if specified by the user. Alternatively, the initialization sequence is only executed prior to the first simulation job; in subsequent simulations the coverage counts are incremented based on what was computed in a prior simulation job, such that the coverage values are effectively accumulated.
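The coverage counting and cross-job accumulation described above can be sketched as follows; the dictionary and function names are hypothetical stand-ins for the reserved RAM locations and the dedicated coverage-point instruction:

```python
coverage = {}   # storage location -> hit count (stands in for RAM locations)

def execute_coverage_instruction(address):
    # A coverage-point instruction increments the count at the
    # storage location it specifies.
    coverage[address] = coverage.get(address, 0) + 1

# First simulation job hits two coverage points, one of them twice.
for addr in (0x10, 0x20, 0x10):
    execute_coverage_instruction(addr)
assert coverage == {0x10: 2, 0x20: 1}

# A subsequent job skips the zeroing step, so counts accumulate
# on top of what the prior job computed.
execute_coverage_instruction(0x10)
assert coverage[0x10] == 3
```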
When the user is debugging the functionality of the DUT, the values, over time, of the signal wires (also called a signal trace) inside the DUT must be made available to the user. This is commonly done by tracing all or some of the signal values to a file that then may be post-processed by a debugging tool such as a waveform viewer. Due to the typically large number of wires in the DUT and the potentially long run time (i.e. many cycles simulated), the amount of information contained in a signal trace may be very substantial.
According to the first embodiment of the present invention only the contents of the data RAMs 308 or parts thereof are saved in the trace for each cycle of the simulation. During simulation trace data is transferred and saved in the on-board RAM blocks 408 via the general network 312 connecting the SPUs. Due to the fact that the present invention operates on 4-state values and by using an instruction set that closely resembles the original operations present in the HDL source, all the signal values, which were not saved explicitly in the trace, for the DUT, may be accurately computed in a post-processing step. It is important to note that, due to the unique method of simulation employed in the present invention, the resulting trace will be identical to that of a traditional software simulator; such equivalence may never be achieved with similar techniques employed in a gate-based or LUT-based hardware simulation system.
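To illustrate why replaying the original operations on saved 4-state values reproduces the full trace exactly, consider a minimal 4-state AND over the values 0, 1, X (unknown) and Z (high impedance), in the spirit of common HDL semantics. This is an illustrative sketch only, not the instruction set of the simulation engine:

```python
def and4(a, b):
    # 4-state AND: a controlling 0 dominates; only 1 & 1 yields 1;
    # any X or Z operand otherwise makes the result unknown.
    if a == "0" or b == "0":
        return "0"
    if a == "1" and b == "1":
        return "1"
    return "X"

assert and4("0", "X") == "0"   # 0 dominates even an unknown operand
assert and4("1", "Z") == "X"   # high impedance propagates as unknown
assert and4("1", "1") == "1"
```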
When the simulation is complete the user may request the entire trace (located in the RAMs 408) or parts thereof. If, for example, a bug is discovered at the end of the simulation, the user might only want to see the trace for the immediately preceding cycles. The requested parts of the trace data are sent to the host system 102, via any of the communication channels 108, for further post-processing before being presented to the user, for example using a commonly available waveform viewer.
As an alternative to saving the trace in the on-board RAM blocks, the trace may be sent directly to the host system 102 via any of the communication channels 108. Due to the limited bandwidth of a communication channel 116a, 116b and 116c this process will typically slow down simulation, but in some situations this may be acceptable, if, for example, the user wants the entire trace anyway. The user determines the method of tracing.

Claims

1. A system for functional verification of an electronic integrated circuit design input, and said system comprising: a server adapted to compile said design input into a first code and second code, a host system connected to said server and operable to execute said first code generated by said server, and a simulation engine connected to said host system through a communication channel and operable to execute said second code generated by said server, and wherein said simulation engine comprises a hardware board comprising a link unit interconnecting a controller unit, storage unit and compute units in a link unit network, wherein each of said compute units comprises a plurality of simulation processing units, and wherein said server is adapted to partition said second code into subgroups and to map each of said subgroups on to one of said simulation processing units.
2. A system according to claim 1, wherein said design input comprises a hardware description language.
3. A system according to claim 2, wherein said design input further comprises assertions.
4. A system according to any of claims 2 to 3, wherein said design input further comprises functional coverage.
5. A system according to any of claims 1 to 4, wherein said plurality of simulation processing units are operable to compute on vector operands.
6. A system according to any of claims 1 to 5, wherein said plurality of simulation processing units are operable to provide 4-state values.
7. A system according to any of claims 1 to 6, wherein said host system comprises a multiplicity of hosts connecting to said simulation engine through said communication channel.
8. A system according to any of claims 1 to 7, wherein said server is an integral part of said host system and wherein said server connects to said host system through a general hardware board network.
9. A system according to any of claims 1 to 7, wherein said server is a separate part of said host system and wherein said server connects to said host system through an exterior network.
10. A system according to claim 9, wherein said exterior network comprises a wired or wireless dedicated line, local area network, metropolitan area network, wide area network, inter-network, or any combination thereof.
11. A system according to any of claims 1 to 10, wherein said hardware description language comprises VHDL, Verilog, SystemC or SystemVerilog.
12. A system according to any of claims 1 to 11, wherein each of said simulation processing units comprises a data path operable to perform calculations on operands, a control block reading instructions from an instruction RAM, which instructions then orchestrate the operation of said data path, a data RAM serving as storage for intermediate values and memory for said data path, and a network interface unit enabling communication to and from other simulation processing units.
13. A system according to any of claims 1 to 12, wherein said simulation processing units are implemented in field-programmable gate array chips (FPGA).
14. A system according to any of claims 1 to 12, wherein said simulation processing units are implemented in application specific integrated circuits (ASIC).
15. A system according to any of claims 1 to 14, wherein said simulation processing units are implemented as a traditional full Harvard architecture with pipelined operation.
16. A system according to any of claims 1 to 15, wherein said communication channel is a PCI bus and wherein said hardware board is implemented on a printed circuit board comprising a PCI connector for engaging said PCI bus.
17. A system according to any of claims 12 to 16, wherein said instruction RAM and data RAM are implemented using available RAM blocks present inside a FPGA or an ASIC.
18. A system according to any of claims 8 to 17, wherein said link unit interconnects said link unit network and said general hardware board network.
19. A system according to any of claims 1 to 18, wherein said simulation processing units are interconnected inside each of said compute units by an on-chip network configured with a full crossbar, and wherein said simulation processing units of a first compute unit are interconnected with simulation processing units of another compute unit by said link network established by a link network interface unit of said link unit, which link network interface unit is integrated on to the full crossbar of said on-chip network.
20. A system according to any of claims 1 to 19, wherein each of said compute units on said hardware board is interconnected in a local board network configured in a fully connected topology in which each unit has a direct link to any other unit on said hardware board.
21. A system according to any of claims 1 to 19, wherein each of said compute units on said hardware board is interconnected in a local board network configured as a multiple bi-directional ring topology.
22. A system according to any of claims 1 to 21, wherein said system comprises a plurality of said hardware boards interconnected in a global board network configured as a fully connected topology having one or more direct paths from each board to each board operating with high speed asynchronous differential serial signaling.
23. A system according to any of claims 1 to 22, wherein said system further comprises a physical enclosure for housing said host system and said simulation engine, a backplane for connecting a first plurality of hardware boards, a motherboard for connecting through receiving slots a second plurality of hardware boards, and a connector for interconnecting said first and second plurality of hardware boards.
24. A system according to any of claims 1 to 23, wherein said hardware board comprises RAM blocks connecting to said link network through a RAM block network interface unit, said RAM blocks being operable for simulating RAMs and ROMs as parts of a device under test described in said hardware description language and operable for storing input stimuli to said device under test and for collecting results and debug information produced during simulation of said device under test.
25. A system according to claim 24, wherein said RAM blocks are operable to collect statistical information regarding specific metrics of the operation of said device under test.
26. A system according to any of claims 1 to 25, wherein said assertions comprise constructs translated by said server to native simulation engine instructions or sequences of such instructions.
27. A system according to any of claims 1 to 26, wherein said functional coverage comprises constructs translated by said server to native simulation engine instructions or sequences of such instructions.
28. A system according to claim 27, wherein said constructs comprises assertion expression parts, which are preprocessed and translated by the server into ordinary hardware description language constructs such as expressions, combinatorial logic and register assignments.
29. A method for functional verification of an electronic integrated circuit design input, and said method comprising: receiving said design input by means of a server, partitioning said design input into a first part and a second part by means of said server, compiling said first part thereby generating a first code to be executed on a host system, partitioning said second part into sub-groups and mapping each of said sub-groups to one of a plurality of simulation processing units of a simulation engine, compiling said partitioned second part thereby generating a second code to be executed on each of said plurality of simulation processing units of said simulation engine by means of said server.
30. A method according to claim 29, wherein said design input comprises a hardware description language.
31. A method according to claim 30, wherein said design input further comprises assertions.
32. A method according to any of claims 30 to 31, wherein said design input further comprises functional coverage.
33. A method according to any of claims 29 to 32, wherein said plurality of simulation processing units are operable to compute on vector operands.
34. A method according to any of claims 29 to 33, wherein said plurality of simulation processing units are operable to provide 4-state values.
35. A method according to any of claims 29 to 34 further comprises analyzing said second part for identifying clock domains and dividing said second part into subcomponents corresponding to unique clock domains.
36. A method according to any of claims 29 to 35 further comprises translating said second part into a topological representation comprising executable primitives thereby preserving functional properties of the second part in said topological representation.
37. A method according to any of claims 29 to 36 further comprises optimizing said second part by common sub-expression elimination, dead code elimination, constant propagation, algebraic simplification, interconnect simplification, or any combination thereof.
38. A method according to any of claims 29 to 37, wherein said functional coverage comprising collecting statistical information regarding specific metrics of the operation of said device under test by means of a RAM block.
39. A method according to any of claims 29 to 38 further comprising translating said assertions comprising constructs to native simulation engine instructions or sequences of such instructions by means of said server.
40. A method according to claim 39 further comprising preprocessing and translating said constructs comprising assertion expression parts into ordinary hardware description language constructs, such as expressions, combinatorial logic and register assignments, by means of said server.
41. A method according to any of claims 29 to 40 further comprising a runtime section comprising the following steps: evaluation of initialization code by means of said host system, repeating a simulation run until no more simulation events are to be computed, in which case the simulation process is terminated, pre-evaluating host system test bench code, and detecting a clock event, if a clock event is detected, then the inputs related to this clock domain are transmitted to said simulation engine by means of a communication channel, if no clock event is detected, then inputs related to this clock domain are forwarded to post-evaluating host system test bench code.
42. A method according to claim 41 further comprises starting simulation processes, which are dependent on transmitted input signals by means of said simulation engine.
43. A method according to any of claims 41 to 42 further comprises requesting outputs from said simulation engine, if no outputs are ready then said method enters a wait state, if an output is ready, then said output is transferred from said simulation engine to said host system, and comprises checking whether there are more outputs that require processing, if not, then the method continues to said post-evaluating host system test bench code, if there are more outputs that require processing the method returns to requesting outputs from said simulation engine.
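The runtime section of claims 41 to 43 forms a loop: evaluate initialization code, then per simulation cycle pre-evaluate host test bench code, ship inputs to the engine on a clock event, drain the engine's outputs, and post-evaluate. The sketch below uses toy `Engine` and `Testbench` classes as stand-ins; the in-process queue replaces the real communication channel and the "engine" merely sums its inputs:

```python
# Hypothetical sketch of the runtime loop in claims 41-43 with toy
# stand-ins for the host test bench and the simulation engine.

from collections import deque

class Engine:
    def __init__(self):
        self.out = deque()
    def send(self, inputs):              # starts dependent simulation
        self.out.append(sum(inputs))     # processes (toy computation)
    def poll(self):                      # one ready output, or None
        return self.out.popleft() if self.out else None

class Testbench:
    def __init__(self, cycles):
        self.cycles, self.t, self.log = cycles, 0, []
    def has_events(self):                # more simulation events left?
        return self.t < self.cycles
    def pre_evaluate(self):              # host test bench code
        self.t += 1
    def clock_event(self):               # clock edge every cycle here
        return True
    def inputs(self):
        return [self.t, self.t]
    def apply(self, out):                # transfer output to host side
        self.log.append(out)
    def post_evaluate(self):             # host test bench code
        pass

def run_simulation(tb, engine):
    # initialization code would be evaluated here
    while tb.has_events():               # terminate when no events remain
        tb.pre_evaluate()
        if tb.clock_event():             # clock event detected:
            engine.send(tb.inputs())     # transmit this domain's inputs
        while (out := engine.poll()) is not None:
            tb.apply(out)                # process each ready output
        tb.post_evaluate()
    return tb.log

print(run_simulation(Testbench(3), Engine()))  # [2, 4, 6]
```

A real implementation would block in a wait state while the engine has outputs pending but none ready; the toy engine always answers immediately, so that branch is omitted.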
44. A simulation processing unit for receiving mapping of a part of a hardware description language and comprising a data path operable to perform calculations on operands, a control block reading instructions from an instruction RAM, which instructions then orchestrate the operation of said data path, a data RAM serving as storage for intermediate values and memory for said data path, and a network interface unit enabling communication to and from other simulation processing units.
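The structure of claim 44 can be modeled as a small interpreter: a control block fetches instructions from the instruction RAM and orchestrates a data path over the data RAM, while a network interface carries values to other units. The instruction set and field layout below are invented for illustration:

```python
# Hypothetical toy model of the simulation processing unit of
# claim 44.  The network list stands in for the network interface
# unit; a real unit would exchange messages with its peers.

class SimulationProcessingUnit:
    def __init__(self, instruction_ram):
        self.iram = instruction_ram   # instructions for the control block
        self.dram = {}                # intermediate values and memory
        self.network = []             # outbound messages to other units

    def run(self):
        for op, dst, a, b in self.iram:   # control block fetch loop
            if op == "LOAD":              # immediate into data RAM
                self.dram[dst] = a
            elif op == "AND":             # data path calculation
                self.dram[dst] = self.dram[a] & self.dram[b]
            elif op == "SEND":            # network interface unit
                self.network.append((dst, self.dram[a]))

spu = SimulationProcessingUnit([
    ("LOAD", "r0", 0b1100, None),
    ("LOAD", "r1", 0b1010, None),
    ("AND",  "r2", "r0", "r1"),
    ("SEND", 1, "r2", None),          # ship r2 to unit number 1
])
spu.run()
print(spu.dram["r2"], spu.network)    # 8 [(1, 8)]
```

Mapping each sub-group of the partitioned second part (claim 29) onto one such unit lets many of these interpreters run the compiled second code in parallel, exchanging only cross-domain signals over the network interfaces.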
PCT/IB2003/004762 2003-10-28 2003-10-28 System and method for functional verification of an electronic integrated circuit design Ceased WO2005041074A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/IB2003/004762 WO2005041074A1 (en) 2003-10-28 2003-10-28 System and method for functional verification of an electronic integrated circuit design
AU2003274453A AU2003274453A1 (en) 2003-10-28 2003-10-28 System and method for functional verification of an electronic integrated circuit design

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/IB2003/004762 WO2005041074A1 (en) 2003-10-28 2003-10-28 System and method for functional verification of an electronic integrated circuit design

Publications (1)

Publication Number Publication Date
WO2005041074A1 true WO2005041074A1 (en) 2005-05-06

Family

ID=34509321

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2003/004762 Ceased WO2005041074A1 (en) 2003-10-28 2003-10-28 System and method for functional verification of an electronic integrated circuit design

Country Status (2)

Country Link
AU (1) AU2003274453A1 (en)
WO (1) WO2005041074A1 (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5491640A (en) * 1992-05-01 1996-02-13 Vlsi Technology, Inc. Method and apparatus for synthesizing datapaths for integrated circuit design and fabrication
US5933356A (en) * 1990-04-06 1999-08-03 Lsi Logic Corporation Method and system for creating and verifying structural logic model of electronic design from behavioral description, including generation of logic and timing models
US6470482B1 (en) * 1990-04-06 2002-10-22 Lsi Logic Corporation Method and system for creating, deriving and validating structural description of electronic system from higher level, behavior-oriented description, including interactive schematic design and simulation


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7788658B2 (en) 2006-05-31 2010-08-31 International Business Machines Corporation Computer code partitioning for enhanced performance
US9600305B2 (en) 2006-05-31 2017-03-21 International Business Machines Corporation Code partitioning for enhanced performance

Also Published As

Publication number Publication date
AU2003274453A1 (en) 2005-05-11

Similar Documents

Publication Publication Date Title
US6754763B2 (en) Multi-board connection system for use in electronic design automation
US8244512B1 (en) Method and apparatus for simulating a circuit using timing insensitive glitch-free (TIGF) logic
US7512728B2 (en) Inter-chip communication system
US9195784B2 (en) Common shared memory in a verification system
US6785873B1 (en) Emulation system with multiple asynchronous clocks
US7036114B2 (en) Method and apparatus for cycle-based computation
US6651225B1 (en) Dynamic evaluation logic system and method
US10657217B2 (en) Latency test in networking system-on-chip verification
US7424416B1 (en) Interfacing hardware emulation to distributed simulation environments
US9069918B2 (en) System and method implementing full-rate writes for simulation acceleration
US6957318B2 (en) Method and apparatus for controlling a massively parallel processing environment
US7043596B2 (en) Method and apparatus for simulation processor
US11775716B2 (en) High speed, low hardware footprint waveform
Koczor et al. Verification approach based on emulation technology
US11113441B1 (en) Reduce/broadcast computation-enabled switching elements in an emulation network
US10410713B1 (en) Content addressable memory modeling in emulation and prototyping
WO2005041074A1 (en) System and method for functional verification of an electronic integrated circuit design
US6775814B1 (en) Dynamic system configuration for functional design verification
JP2004013227A (en) Simulation device and simulation model generation program
Mello et al. MultiNoC: A multiprocessing system enabled by a network on chip
EP4182832B1 (en) Hybrid switching architecture for serdes communication channels in reconfigurable hardware modeling circuits
US20230169226A1 (en) Method and system for interfacing a testbench to circuit simulation
Sharma PGAS Communication for Heterogeneous Clusters with FPGAs
EP4562529A2 (en) Communication link latency tolerance for hardware assisted verification systems
Banerjee et al. Optimized simulation acceleration with partial testbench evaluation

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: COMMUNICATION UNDER RULE 69 EPC ( EPO FORM 1205A DATED 16/08/06 )

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP