WO2002067137A1

WO2002067137A1 - Vector and scalar signal processing

Info

Publication number: WO2002067137A1
Application number: PCT/US2000/035385
Authority: WO
Inventors: Edward R. Prado; Edward E. Ille; Dean W. Brenner; John P. Prewitt
Original assignee: Honeywell International Inc.
Priority date: 2001-02-01
Filing date: 2001-02-01
Publication date: 2002-08-29

Abstract

Data, such as an image, is supplied to a signal processor that contains a vector signal processor, a scalar processor, and a controller that controls the flow of results between them. Vector processing scripts are stored in a memory unit associated with the vector processor, such as SRAM, and programs used by the scalar processor are stored in an associated memory unit. The vector processor performs vector operations, the controller sends the results to the scalar processor, and the vector processor performs operations on new data while the scalar processor operates.

Description

VECTOR AND SCALAR SIGNAL PROCESSING

Field of the Invention

This invention relates to complex signal processing that uses algorithms combining vector and scalar constructs.

Background

Vector operations, such as Fast Fourier Transforms (FFT) and pixel element manipulations, are array oriented: each data set, i.e., vector, is processed as a vector array. Scalar operations, on the other hand, are non-vector operations, which comprise flow control or decision-making operations. Signal processing algorithms can be partitioned into a series of both vector and scalar operations. Implementing an algorithm on hardware optimized for each type of operation is the most efficient method of execution. A vector processor can be eight to thirty-seven times more efficient in vector operations than a scalar processor. Vector processors could be implemented using common devices such as Field Programmable Gate Arrays (FPGA) or Application Specific Integrated Circuits (ASIC). Scalar operations can be handled using general purpose digital signal processors (GP-DSP). GP-DSPs can be used to perform both vector and scalar operations. Because of the GP-DSP's inefficiency at vector operations, it has sometimes been necessary to use multiple GP-DSPs to obtain the required performance. Because DSPs are flow-control intensive, multiple DSPs require complex software overhead to manage the operation of each DSP. On the other hand, using a vector processor alone for these hybrid applications is not necessarily any more practicable.

Summary of the Invention

An object of the present invention is to provide a more efficient way to perform signal processing by providing an integrated hybrid vector/scalar processor.

According to the present invention, a signal processor comprises a vector processor and a scalar processor whose operation is managed by a controller. The controller uses stored program scripts (algorithms) to schedule the operations of the vector processor and scalar processor. The vector and scalar functions (algorithm components) are individually stored programs associated with each processor.
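The idea of a controller running a stored script that hands each algorithm component to the processor that owns it can be sketched as follows. This is a minimal illustration under assumptions, not the patent's script format: the `(unit, function_name)` tuple layout, the `run_script` helper, and the example functions `scale2` and `keep_gt5` are all hypothetical, and plain dictionaries stand in for the programs stored in each processor's memory.

```python
from typing import Callable, Dict, List, Tuple

Step = Callable[[list], list]

def run_script(script: List[Tuple[str, str]], data: list,
               vector_lib: Dict[str, Step],
               scalar_lib: Dict[str, Step]) -> Tuple[list, List[str]]:
    """Run an ordered script of (unit, function_name) steps.

    Each step's output becomes the next step's input; the log records
    which unit the controller dispatched each step to.
    """
    libs = {"vector": vector_lib, "scalar": scalar_lib}
    log: List[str] = []
    for unit, name in script:
        if unit not in libs:
            raise ValueError(f"unknown processing unit: {unit!r}")
        data = libs[unit][name](data)  # dispatch to that unit's stored program
        log.append(f"{unit}:{name}")
    return data, log

# Hypothetical libraries: a uniform element-wise vector step and a
# data-dependent (decision-making) scalar step.
vector_lib = {"scale2": lambda xs: [2 * x for x in xs]}
scalar_lib = {"keep_gt5": lambda xs: [x for x in xs if x > 5]}
script = [("vector", "scale2"), ("scalar", "keep_gt5")]

result, log = run_script(script, [1, 2, 3, 4], vector_lib, scalar_lib)
print(result)  # [6, 8]
print(log)     # ['vector:scale2', 'scalar:keep_gt5']
```

Reordering or replacing entries in `script` changes the algorithm without touching either function library, which loosely mirrors the reconfiguration-by-redefining-stored-scripts idea described in the text.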

According to the invention, each processor performs its respective operations as commanded by the controller. The order in which these operations are performed depends on the algorithm being implemented. The controller is responsible for ensuring correct algorithm execution and data flow between the two processors. Among the features of the present invention is that it can accommodate many DSPs by redefining the PROM-based algorithms contained in the controller and scalar processor. This is especially useful for so-called "space applications", which often require on-orbit reconfiguration of mission algorithms.

Another feature of the present invention is that it is especially useful for signal processing in such applications as scatterometer/radar, image compression and hyperspectral imaging.

Another feature of the present invention is that it can be constructed using currently available vector and scalar processors.

Other objects, benefits and features of the invention will be apparent from the following discussion of one or more embodiments.

Brief Description of the Drawing

Fig. 1 is a block diagram of a system embodying the present invention.

Fig. 2 illustrates a JPEG vector/scalar data compression process that can be performed with the present invention.

Figs. 3A-3B are a flow chart showing the operations of the two processors in performing the JPEG data compression shown in Fig. 2.

Detailed Description

The system shown in Fig. 1 contains a vector processor 10 and a GP-DSP (general purpose digital signal processor) 12, both controlled by a programmed controller 14, which is a dedicated processor such as an ASIC (application-specific integrated circuit) or an FPGA (field-programmable gate array). A PROM 24 contains the instructions for the controller 14 in the form of controller instructions generated from scripts. The vector processor 10 has its own memory devices 18, in this case SRAMs, as does the DSP 12. In this particular example, an image source 20 provides data to a buffer 22 whose output is connected to the input of the controller 14. The PROM 24, as will be explained, is also used to schedule the operations of the vector processor 10 and the scalar processor 12. The result of the vector and scalar processing is an output from the controller 14 supplied to an output buffer 26. In this specific example the output is a "compressed image".

The Vector Processor

Vector processors are optimized to implement a set of high-level instructions that support pipeline-oriented operations, such as the compression shown in Fig. 2, the input in Fig. 1, or FFT functions, as discussed previously. The Sharp brand 9124 processor, manufactured by Sharp Electronics, and the DSP24 brand processor, manufactured by DSP Architectures, are commercially available examples of these devices and can be used in this invention. These particular vector processing devices support several signal processing functions: time domain processing that includes Fast Fourier Transforms (FFT), Finite Impulse Response (FIR) filtering, vector operations, logical array processing (real and complex), and such tasks as convolution and digital modulation/demodulation. Vector processors typically have multiple complex bidirectional data ports that are highly flexible (any-port-to-any-port routing) and capable of reading and writing data simultaneously. Vector processors can typically be cascaded together to support complex vector operations and significantly increase performance. For example, a single vector processor operating at 50 MHz can perform an FFT operation in 42 μs; if two identical VPs are cascaded together, the same operation can be performed in only 21 μs. A vector processor is "pass-based": a single instruction is implemented as one digital signal processing function on the entire vector array, which is passed through the chip from one port to another, instead of a new instruction being read for each cycle. A GP-DSP, on the other hand, must fetch and decode an instruction for every data entry within the input vector array. Because a vector operation applies the same instruction to every piece of data, repeating that instruction fetch for each element presents significant processing overhead in a GP-DSP compared with performing the same operation in a vector processor.
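The FFT timing figures quoted above can be converted into clock counts. This worked arithmetic assumes, as the example implies, that both cascaded devices run at the same 50 MHz clock; the factor-of-two improvement is specific to this example and is not offered as a general scaling rule.

```latex
\begin{aligned}
N_1 &= T_1 f_{\mathrm{clk}} = (42\times10^{-6}\,\mathrm{s})(50\times10^{6}\,\mathrm{s^{-1}}) = 2100 \text{ cycles},\\
T_2 &= \tfrac{1}{2}T_1 = 21\,\mu\mathrm{s}, \qquad N_2 = T_2 f_{\mathrm{clk}} = 1050 \text{ cycles}.
\end{aligned}
```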
As an example of this per-element overhead, consider the vector addition [a1, a2, a3, a4] + [b1, b2, b3, b4] = [a1+b1, a2+b2, a3+b3, a4+b4]. A scalar processor would use at least four instructions to perform it, which requires both instruction and data fetches and several clock cycles per addition. A vector processor uses one instruction, in this case "add", and performs the operation every clock cycle while the vector data is read in. The operation continues until the controller schedules the vector processor to stop.
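The difference in instruction handling can be sketched as a simple counting model. It is a hypothetical illustration, not a cycle-accurate model of either device: `ExecStats`, `scalar_add`, and `vector_add` are invented names, and only instruction fetch/decode events are counted (data fetches and pipeline latency are ignored).

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ExecStats:
    result: List[int]
    instruction_fetches: int  # instruction fetch/decode events, not clock cycles

def scalar_add(a: List[int], b: List[int]) -> ExecStats:
    """GP-DSP style: one add instruction is fetched and decoded per element."""
    out: List[int] = []
    fetches = 0
    for x, y in zip(a, b):
        fetches += 1
        out.append(x + y)
    return ExecStats(out, fetches)

def vector_add(a: List[int], b: List[int]) -> ExecStats:
    """Pass-based style: the add instruction is set up once, then applied
    to each element as the vector data streams through."""
    out = [x + y for x, y in zip(a, b)]
    return ExecStats(out, instruction_fetches=1)

a, b = [1, 2, 3, 4], [10, 20, 30, 40]
print(scalar_add(a, b))  # ExecStats(result=[11, 22, 33, 44], instruction_fetches=4)
print(vector_add(a, b))  # ExecStats(result=[11, 22, 33, 44], instruction_fetches=1)
```

Both functions produce the same sums; only the instruction count differs, and in this model the gap grows linearly with the vector length.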

The GP-DSP

The GP-DSP 12 provides data flow control to the output buffer and handles the scalar portion of the signal processing. In that respect, note that in Fig. 2 block 26 illustrates vector computations performed on the input pixel block 27. After the computation at block 26 is performed, the scalar process of block 28 is carried out by the scalar processor 12. This JPEG compression algorithm is well known to those skilled in the art to which this invention relates and is illustrative of the type of complex, hybrid processing sometimes needed.

Briefly, the steps in Fig. 2 involve determining the frequency components of an 8 by 8 pixel block by performing a 2-D Discrete Cosine Transform at step 26a, producing output data that is quantized (or decimated) in step 26b, which compresses the data according to a desired compression ratio parameter. The output from the decimation step 26b is subjected to binary encoding at step 28a, producing a binary output stream that is pre-formatted into an Environmental Data Record (EDR) per CCSDS at step 28b. This produces a serial data output that is routed via bus 12a from the scalar processor through the controller 14 to the output buffer 26 as the compressed pixel data.

Focusing on the controller 14, the device provides the interface between the vector processor 10 and the scalar processor 12, allowing both processors 10, 12 to operate independently. This approach allows the scalar processor 12 to operate on the first data set, input at time t1, while the vector processor operates on the next data set, input at time t2.
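The vector stage of Fig. 2 (block 26: 2-D DCT at 26a, then quantization at 26b) can be sketched on one 8x8 block. This is a minimal sketch under stated assumptions: `dct2_8x8` evaluates the JPEG forward-DCT formula directly with a quadruple loop for readability (a vector processor would stream it as fast separable passes), and `quantize` uses a single hypothetical step size to stand in for the "desired compression ratio parameter". Baseline JPEG instead uses an 8x8 quantization table and level-shifts 8-bit samples by 128 first; both are omitted here.

```python
import math
from typing import List

Block = List[List[float]]

def dct2_8x8(block: Block) -> Block:
    """2-D DCT-II of an 8x8 block using the JPEG forward-DCT formula:
    F(u,v) = 1/4 C(u) C(v) sum_x sum_y f(x,y) cos((2x+1)u*pi/16) cos((2y+1)v*pi/16)."""
    n = 8

    def c(k: int) -> float:
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out

def quantize(coeffs: Block, step: float) -> List[List[int]]:
    """Uniform quantization (hypothetical single step size):
    a larger step yields more zero coefficients and higher compression."""
    return [[round(value / step) for value in row] for row in coeffs]

flat = [[10.0] * 8 for _ in range(8)]  # a uniform pixel block
coeffs = dct2_8x8(flat)                # only the DC term, F(0,0) = 80, is nonzero
q = quantize(coeffs, step=16)
print(q[0][0])  # 5
```

For a flat block all energy lands in the DC coefficient, so after quantization every other entry is zero; this is the kind of sparse output the scalar encoding stage then exploits.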

One skilled in this art will understand that high-performance signal processing can be achieved by decomposing algorithms, such as the compression algorithm shown in Fig. 2, into efficiently mapped vector and scalar constructs. Figs. 3A and 3B illustrate this for the image compression algorithm of Fig. 2, showing that once an algorithm has been defined, its vector and scalar operations can be partitioned between the vector and scalar processors 10, 12. A closely coupled, integrated development standard for this allocation of processes should allow the GP-DSP to perform scalar processing while the vector processor is operating on the next data set. In contrast, those familiar with this technology will appreciate that past approaches have required hand-coded vector constructs that had to be manually synchronized with scalar processing constructs. Such an approach mandates multiple development environments, resulting in reduced processing performance for each particular processor architecture. The system shown in Fig. 1 allows scripts to be executed from the SRAMs 18 as well as the PROM 24 so that vector and scalar processing constructs are carried out simultaneously, with a controller scheduling all processing events of each processing element.

Referring to Figs. 3A and 3B, as noted before, these illustrate the application of the invention to one example of a known process requiring vector and scalar signal processing: JPEG image compression. In the first step, S1.1, the first pixel block is applied to the buffer 22. Once the buffer is full, as determined by the controller 14 in step S1.2, the controller initializes the vector processor 10 and its MMUs 10A at step S1.3 to run the first vector computation, block 26, beginning at step S1.4. To run this process, the controller 14 sets up the vector processor 10 using instructions that have been generated from scripts and stored in the PROM 24.
Once it is set up, the controller 14 cues the vector processor to start execution. At step S1.5, the controller 14 queries the vector processor 10 to find out whether step S1.4 is done for the entire image block of "n" bits. If so, the result from block 26a is stored in one of the SRAMs 18 of the vector processor 10 in step S1.6. At step S1.7, the controller 14 directs the vector processor 10 to perform, at step S1.8, the process of block 26b on the data previously stored in step S1.6. In step S1.9, the controller determines whether the decimation step 26b has been performed. An affirmative answer moves the process to step S1.10, in which the controller 14 stores the decimation results in one of the SRAMs of the vector processor 10. With the vector processing steps completed and the results stored, at step S1.11 the controller 14 initializes or cues the scalar processor for blocks 28a, 28b. The scalar operation begins with step S2.1, where the controller 14 ensures that the scalar processor 12 is initialized. At step S2.2, the scalar processor waits until the controller 14 determines that the decimation data is present in a vector processor SRAM. In step S2.3 the controller 14 reads that data to the scalar processor 12 over a bus 12a, whereupon the scalar processor 12 begins the binary encoding of block 28a using a decision-based processing scheme. Once the encoding for block 28a is completed, as determined by the controller 14 in step S2.5, the binary stream 28c is stored in SRAM 18a of the scalar processor 12 in step S2.6. The scalar processor 12 notifies the controller at step S2.7 that the encoding is complete for the first pixel block, and at that point the controller commands the MMU 10A at step S2.8 to empty the SRAM holding the data stored at step S1.10.
At the same time, step S1.1 is repeated, which begins the process of receiving the next image block and performing the vector processing while the scalar processor completes its operations on the earlier image block. Following step S2.8, the scalar processor 12 completes the formatting of block 28b in step S2.9, and in step S2.10 the controller 14 retrieves the formatted data from the scalar processor 12 and outputs it to the buffer 26, where the buffer output 26a is the compressed image of the first pixel block 27.
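The patent describes the binary encoding of block 28a only as "a decision based processing scheme". One common decision-heavy front end, used in baseline JPEG before entropy coding, is a zigzag scan followed by run-length coding of zero coefficients; it is sketched here as a swapped-in illustration, not as the patent's encoder. The `(zeros_skipped, value)` pair format and the `(0, 0)` end-of-block marker are hypothetical simplifications: real JPEG also codes the DC term differentially and Huffman-codes the pairs, which this sketch omits.

```python
from typing import List, Tuple

def zigzag_order(n: int = 8) -> List[Tuple[int, int]]:
    """(row, col) positions in JPEG zigzag order: low frequencies first.
    Along each anti-diagonal s = row + col the direction alternates."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))

def run_length_encode(block: List[List[int]]) -> List[Tuple[int, int]]:
    """Encode nonzero coefficients as (zeros_skipped, value) pairs.
    A final (0, 0) marks end of block; trailing zeros are implied."""
    pairs: List[Tuple[int, int]] = []
    zeros = 0
    for r, c in zigzag_order(len(block)):
        value = block[r][c]
        if value == 0:
            zeros += 1                    # decision: extend the current zero run
        else:
            pairs.append((zeros, value))  # decision: emit run length and value
            zeros = 0
    pairs.append((0, 0))                  # end-of-block marker
    return pairs

q = [[0] * 8 for _ in range(8)]  # a mostly zero quantized block
q[0][0] = 5
q[1][0] = -2
print(run_length_encode(q))  # [(0, 5), (1, -2), (0, 0)]
```

The per-coefficient branching is what makes this stage a poor fit for a pass-based vector processor and a natural fit for the GP-DSP.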

It should be apparent from the above that, while in this particular example (JPEG image compression) the scalar processor waits for the vector processor, in other signal processes the two can operate simultaneously. The partition is a function of the type of algorithm. The controller schedules the start of both vector and scalar processes based on when the next algorithm process can begin and when completion indicators are received from each type of processor.
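The benefit of letting the vector processor start the next data set while the scalar processor finishes the previous one can be estimated with an idealized two-stage pipeline model. This is a sketch under simplifying assumptions: per-block stage times are fixed and hypothetical, buffering between the processors is unlimited, and controller overhead is ignored. In Fig. 3 the vector SRAM is emptied only at step S2.8, so the real overlap may be smaller. `pipeline_schedule` is an invented helper name.

```python
from typing import List, Tuple

def pipeline_schedule(n_blocks: int, t_vector: float,
                      t_scalar: float) -> List[Tuple[float, float, float, float]]:
    """Idealized two-stage schedule, one tuple per data block:
    (vector_start, vector_end, scalar_start, scalar_end).

    The vector processor starts block k+1 as soon as it finishes block k;
    the scalar processor starts block k once that block's vector result
    exists and the scalar processor is free.
    """
    schedule = []
    vec_free = 0.0
    scal_free = 0.0
    for _ in range(n_blocks):
        vec_start = vec_free
        vec_end = vec_start + t_vector
        scal_start = max(vec_end, scal_free)  # wait for data and for the SP
        scal_end = scal_start + t_scalar
        schedule.append((vec_start, vec_end, scal_start, scal_end))
        vec_free = vec_end
        scal_free = scal_end
    return schedule

# Hypothetical stage times in arbitrary units.
sched = pipeline_schedule(4, t_vector=3.0, t_scalar=5.0)
overlapped = sched[-1][3]
sequential = 4 * (3.0 + 5.0)
print(overlapped, sequential)  # 23.0 32.0
```

In this model the total time is t_vector + t_scalar + (n - 1) * max(t_vector, t_scalar), so once the pipeline fills, throughput is set by the slower of the two processors rather than by their sum.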

With the benefit of the previous discussion of the invention, one of ordinary skill in the art may be able to modify the invention and its components and functions in whole or in part without departing from the true scope and spirit of the invention.

Claims

We claim:

1. An integrated circuit signal processing system, characterized by: a first signal processor for performing vector signal processing solutions; a first memory dedicated to said first signal processor for storing specific script operations to carry out a plurality of said vector processing solutions; a second signal processor for performing decisional signal processing routines; a second memory dedicated with said second signal processor for storing said decisional signal processing routines; a controller that receives and transmits instructions to said first and second signal processors according to criteria based on data received by said controller, said criteria including causing the first signal processor to perform vector signal processing operations according to said scripts on a first unit of said data to produce a result and transferring to said second signal processor said result and upon said transfer causing said second signal processor to perform said decisional signal processing operations according on said results and to produce an output of said results for said data; and a programmable memory dedicated to the controller for storing said criteria.

2. The integrated signal processing system described in claim 1, further characterized in that said controller causes the first signal processor to perform said vector signal processing on a second unit of data upon said transfer.

3. The integrated signal processing system described in claim 1, further characterized in that said scripts define vector solutions for each of a plurality of data requiring vector and scalar operations that are supplied to the integrated signal processing system.

4. An integrated circuit signal processing system, characterized by: a first signal processor for performing vector signal processing solutions; a first memory dedicated to said first signal processor for storing specific script operations to carry out a plurality of said vector processing solutions; a second signal processor for performing decisional signal processing routines; a second memory dedicated with said second signal processor for storing said decisional signal processing routines; a controller that receives and transmits instructions to said first and second signal processors according to criteria based on data received by said controller, said criteria causing the first signal processor to perform vector signal processing operations according to said scripts on a first unit of said data to produce a result and causing said second signal processor to perform said decisional signal processing operations on a second unit of said data, the first signal processor and the second signal processor producing an output for said first unit and said second unit of data; and a programmable memory dedicated to the controller for storing said criteria.