
WO2025029644A2 - Processing nodes for signal processing in radio transceivers - Google Patents


Info

Publication number
WO2025029644A2
Authority
WO
WIPO (PCT)
Prior art keywords
data
burst
memory
dcx
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2024/039801
Other languages
French (fr)
Other versions
WO2025029644A3 (en)
Inventor
Vrishbhan Singh SISODIA
Sameep Dave
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Viasat Inc
Original Assignee
Viasat Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Viasat Inc filed Critical Viasat Inc
Publication of WO2025029644A2 publication Critical patent/WO2025029644A2/en
Publication of WO2025029644A3 publication Critical patent/WO2025029644A3/en


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04BTRANSMISSION
    • H04B1/00Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission
    • H04B1/38Transceivers, i.e. devices in which transmitter and receiver form a structural unit and in which at least one part is used for functions of transmitting and receiving
    • H04B1/40Circuits
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163Interprocessor communication
    • G06F15/167Interprocessor communication using a common memory, e.g. mailbox
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04BTRANSMISSION
    • H04B1/00Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission
    • H04B1/0003Software-defined radio [SDR] systems, i.e. systems wherein components typically implemented in hardware, e.g. filters or modulators/demodulators, are implemented using software, e.g. by involving an AD or DA conversion stage such that at least part of the signal processing is performed in the digital domain

Definitions

  • the present disclosure generally relates to processing nodes for signal processing in radio transceivers.
  • a radio transceiver can receive an analog radio-frequency (RF) signal and convert it into a digital signal via analog-to-digital conversion (ADC); similarly, the radio transceiver can convert a digital signal into an analog signal ready for transmission via digital-to-analog conversion (DAC).
  • Digital signal processing includes the processing steps performed on the digital signal (e.g., after processing by the ADC and/or before processing by the DAC).
  • Digital signal processing can be applied to extract, filter, and enhance useful information in the signal. This processing may include processes or operations such as filtering, equalization, modulation, demodulation, channel coding or decoding, error correction, and noise reduction. Other processes that can be performed include frequency conversion (e.g., converting to a baseband frequency or to a carrier frequency), gain control, amplification, and the like.
  • the present disclosure relates to a processing node (PN) that includes a first digital signal processor (DSP) core; a second DSP core; a plurality of extended direct memory access controllers (DCX), each DCX having shared memory space, an input packet interface, and an output packet interface, the input packet interface configured to receive samples from a hardware block separate from the processing node, the shared memory space configured to store the received samples, and the output packet interface configured to transmit samples processed by the first DSP core or the second DSP core to the hardware block; and a PN network interconnect configured to communicably couple the first DSP core, the second DSP core, and the plurality of DCX, each DSP core and DCX coupled to the PN network interconnect through a respective master interface and a respective slave interface, the PN network interconnect further including an SDP master interface and an SDP slave interface each configured to communicate with an SDP network interconnect.
  • the processing node is configured to be integrated into a radio transceiver comprising the hardware block
  • the PN network interconnect further includes a configuration interface configured to enable the processing node to configure the hardware block.
  • the processing node further includes a queue interface configured to transfer commands or data from the first DSP core to the second DSP core and to transfer commands or data from the second DSP core to the first DSP core.
  • the processing node further includes a first queue interface and a second queue interface, the first queue interface configured to transfer commands or data from the first DSP core to the second DSP core, the second queue interface configured to transfer commands or data from the second DSP core to the first DSP core.
  • each DSP core includes a general-purpose input-output (GPIO) port connected to a configuration register and configured to receive input for placement in the configuration register and to transmit data stored in the configuration register.
  • each DSP core is configured to receive interrupt requests through the PN network interface from the hardware block that is separate from the processing node.
  • a first DCX of the plurality of DCX is configured to: receive a plurality of samples, from a first hardware block, to be processed through the input packet interface; temporarily store the plurality of samples in the shared memory space; and transmit the plurality of samples to the first DSP core through the PN network interconnect.
  • the first DSP core or the second DSP core is configured to: program the first DCX to convey the plurality of samples to the first DSP core; place the plurality of samples into an internal memory space of the first DSP core; process the plurality of samples; and place the processed samples into the internal memory space of the first DSP core.
  • the first DCX is further configured to reformat the received plurality of samples.
  • the first DCX is configured to reformat the received plurality of samples by sign extending samples of the received plurality of samples to increase the number of bits for each sample.
  • the first DCX is configured to reformat the received plurality of samples by bit clipping samples of the received plurality of samples to reduce a resolution of each sample.
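To make the two reformatting options above concrete, here is a minimal C sketch (not from the patent; the function names and the assumption of 32-bit target words are illustrative) of sign extending a narrow two's-complement sample and of bit clipping a sample down to a lower resolution.

```c
#include <stdint.h>

/* Hypothetical helpers illustrating the two DCX reformatting options. */

/* Sign extend an n-bit two's-complement sample into a 32-bit word (n_bits <= 32). */
static int32_t sign_extend(uint32_t raw, unsigned n_bits)
{
    uint32_t sign_bit = 1u << (n_bits - 1);
    uint32_t mask = (n_bits < 32) ? ((1u << n_bits) - 1u) : 0xFFFFFFFFu;
    raw &= mask;
    return (raw & sign_bit) ? (int32_t)(raw | ~mask) : (int32_t)raw;
}

/* Bit clip (saturate) a 32-bit sample down to n_bits of resolution (n_bits < 32). */
static int32_t bit_clip(int32_t sample, unsigned n_bits)
{
    int32_t max =  (1 << (n_bits - 1)) - 1;
    int32_t min = -(1 << (n_bits - 1));
    if (sample > max) return max;
    if (sample < min) return min;
    return sample;
}
```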
  • a first DCX of the plurality of DCX is configured to: receive a plurality of samples to be processed through the input packet interface; temporarily store the plurality of samples in the shared memory space; and transmit the plurality of samples to the hardware block separate from the processing node for processing using the output packet interface.
  • the first DCX is configured to: receive the processed plurality of samples from the hardware block; and temporarily store the processed plurality of samples in the shared memory space.
  • the first DSP core and the second DSP core are configured to be used both as separate entities and as a shared dual-core configuration.
  • the first DSP core and the second DSP core each include two processors and the plurality of DCX includes a DCX for each processor of the first DSP core and the second DSP core.
  • the SDP master interface and the SDP slave interface of the PN network interconnect are configured to communicate with a PN network interconnect of a different processing node in the radio transceiver via the SDP network interconnect.
  • the first DSP core, the second DSP core, and each of the plurality of DCX includes a configuration register configured to store data to configure the associated DSP core or DCX.
  • the processing node is configured to be implemented within a demodulator of the radio transceiver. In some embodiments, the processing node is configured to be implemented within a decoder of the radio transceiver. In some embodiments, the processing node is configured to be implemented within a modulator or encoder of a transmitter of the radio transceiver.
  • the present disclosure relates to a signal processing architecture comprising: a software defined physical layer (SDP) network interconnect; a plurality of processing nodes connected to the SDP network interconnect and configured to provide configurable processing power to process receiver and transmitter waveforms in a radio transceiver, each processing node including: a plurality of digital signal processing (DSP) cores; a plurality of extended direct memory access controllers (DCX); and a PN network interconnect connected to the plurality of DSP cores, to the plurality of DCX, and to the SDP network interconnect; a capture memory array (CMA) comprising a plurality of memory banks that are connected to the SDP network interconnect to provide access to the plurality of memory banks for the plurality of processing nodes; and a CPU subsystem connected to the SDP network interconnect, wherein the SDP network interconnect enables communication among each of the plurality of processing nodes, the CMA, and the CPU subsystem to augment processing power and functionality in the
  • one or more of the plurality of processing nodes can be dynamically allocated to provide signal processing power to one or more hardware blocks of the radio transceiver.
  • each processing node of the plurality of processing nodes is configured to interface with one or more individual hardware blocks of both receiver and transmitter signal processing data paths in the radio transceiver.
  • a first processing node of the plurality of processing nodes is implemented in an encoder or modulator of the radio transceiver.
  • a second processing node of the plurality of processing nodes is implemented in a demodulator of the radio transceiver.
  • a third processing node of the plurality of processing nodes is implemented in a decoder of the radio transceiver.
  • the signal processing architecture further includes an external memory connected to the CPU subsystem, the plurality of processing nodes configured to pass data from individual DSP cores to the external memory through the SDP network interconnect.
  • individual processing nodes of the plurality of processing nodes are integrated within different portions of a demodulator.
  • each processing node includes an SDP master interface and an SDP slave interface to the SDP network interconnect
  • the CMA includes a plurality of SDP master interfaces to the SDP network interconnect
  • the CPU subsystem includes an SDP master interface and an SDP slave interface to the SDP network interconnect.
  • the signal processing architecture further includes a second SDP network interconnect connected to the SDP network interconnect; and a second plurality of processing nodes connected to the second SDP network interconnect, each processing node of the second plurality of processing nodes including one or more digital signal processing (DSP) cores, one or more extended direct memory access controllers (DCX), and a PN network interconnect connected to the plurality of DSP cores, to the plurality of DCX, and to the second SDP network interconnect.
  • the present disclosure relates to a method for passing data to a processing node in a signal processing architecture that includes a software defined physical layer (SDP) network interconnect connected to the processing node, the processing node including a digital signal processing (DSP) core, an extended direct memory access controller (DCX), a packet receiver, a packet transmitter, and a PN network interconnect connected to the SDP network interconnect, the method comprising: utilizing a buffer pointer queue to manage available memory, the buffer pointer queue comprising a plurality of buffer pointers that each identify a buffer address in memory that is available for storing data; utilizing a work descriptor queue to manage burst data to be processed, the work descriptor queue comprising a plurality of work descriptors that each identify a buffer address in memory that includes burst data to be processed; responsive to the packet receiver receiving burst data to be processed: retrieving a first buffer pointer from the buffer pointer queue; processing the received burst data;
  • the work descriptor further includes a data header length indicating an amount of storage occupied by a data header associated with the burst data.
  • the work descriptor further includes: a burst start flag indicating that the burst data belongs to a first packet of a burst; and a burst end flag indicating that the burst data belongs to a last packet of a burst.
  • the work descriptor indicates the burst data is a fully contained burst by setting the burst start flag and the burst end flag to true.
  • the method further includes adding the new work descriptor to the work descriptor queue. In some embodiments, the method further includes adding the new buffer pointer to the buffer pointer queue. In some embodiments, the method further includes, responsive to receiving the buffer pointer queue and the work descriptor queue: retrieving a second work descriptor from the work descriptor queue; obtaining data from memory at the buffer address indicated by the second work descriptor; retrieving a second buffer pointer from the buffer pointer queue; processing the retrieved data to generate output processed data; storing the output processed data in memory at the buffer address indicated by the second buffer pointer; outputting a new work descriptor, the new work descriptor including the buffer address indicated by the second buffer pointer; and outputting a new buffer pointer, the new buffer pointer indicating the buffer address indicated by the second work descriptor. In some embodiments, each work descriptor further includes a burst identifier of the burst data to be processed and a burst length indicating an amount of
  • the method further includes monitoring a fill level of the work descriptor queue by the DCX; responsive to determining that the fill level is below a threshold fill level, adding one or more work descriptors to the work descriptor queue from a work descriptor list.
  • the method further includes monitoring a fill level of the buffer pointer queue by the DCX; responsive to determining that the fill level is below a threshold fill level, adding one or more buffer pointers to the buffer pointer queue from a buffer pointer list.
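The buffer pointer / work descriptor flow described in this method can be sketched in C as follows. This is a simplified illustration under assumed names (buffer_pointer_t, work_descriptor_t, and the helper functions) and a fixed-depth ring queue; it omits queue-full handling and the data header fields, and is not the patent's implementation.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical types; field and function names are illustrative only. */
typedef struct { uint32_t buf_addr; } buffer_pointer_t;
typedef struct { uint32_t buf_addr; uint32_t burst_len; uint8_t start, end; } work_descriptor_t;

#define QDEPTH 16u  /* power of 2 so the wrapping index arithmetic stays consistent */
typedef struct { buffer_pointer_t q[QDEPTH]; unsigned head, tail; } bp_queue_t;
typedef struct { work_descriptor_t q[QDEPTH]; unsigned head, tail; } wd_queue_t;

static int  bp_pop(bp_queue_t *q, buffer_pointer_t *out) {
    if (q->head == q->tail) return -1;               /* queue empty */
    *out = q->q[q->head++ % QDEPTH];
    return 0;
}
static void bp_push(bp_queue_t *q, buffer_pointer_t bp) { q->q[q->tail++ % QDEPTH] = bp; }
static int  wd_pop(wd_queue_t *q, work_descriptor_t *out) {
    if (q->head == q->tail) return -1;
    *out = q->q[q->head++ % QDEPTH];
    return 0;
}
static void wd_push(wd_queue_t *q, work_descriptor_t wd) { q->q[q->tail++ % QDEPTH] = wd; }

/* Receive side: take a free buffer, store the burst, publish a work descriptor. */
int on_burst_received(bp_queue_t *free_bufs, wd_queue_t *work,
                      uint8_t *mem, const uint8_t *burst, uint32_t len)
{
    buffer_pointer_t bp;
    if (bp_pop(free_bufs, &bp) != 0) return -1;      /* no buffer available */
    memcpy(mem + bp.buf_addr, burst, len);           /* store burst at the buffer address */
    work_descriptor_t wd = { bp.buf_addr, len, 1, 1 };
    wd_push(work, wd);                               /* hand the burst to the consumer */
    return 0;
}

/* Consumer side: process a burst, then return its buffer to the free pool. */
int on_work_available(bp_queue_t *free_bufs, wd_queue_t *work, uint8_t *mem)
{
    work_descriptor_t wd;
    if (wd_pop(work, &wd) != 0) return -1;
    /* ... process mem[wd.buf_addr .. wd.buf_addr + wd.burst_len) here ... */
    buffer_pointer_t bp = { wd.buf_addr };
    bp_push(free_bufs, bp);                          /* buffer is free again */
    return 0;
}
```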
  • the present disclosure relates to a method for converting between sample streams or symbol streams and messages in a signal processing architecture for storing and processing by a processing node that includes a digital signal processor (DSP) core and an extended direct memory access controller (DCX), the method comprising: receiving a sample stream that includes a burst to be processed by the signal processing architecture; generating a header message including information related to the burst; splitting the sample stream into a plurality of burst messages, a size of each burst message, except for a final burst message corresponding to an end of the burst, corresponding to a buffer size in the DCX, the size of the final burst message being less than or equal to the buffer size in the DCX; generating a footer message including information related to a size of the plurality of burst messages; transferring a burst interface packet to the DCX, the burst interface packet including the header message, the plurality of burst messages, and the footer message.
  • the sample stream is received by a component of the processing node.
  • the method further includes identifying a start flag and an end flag within the sample stream to determine end points of the burst in the sample stream.
  • the method further includes identifying a first start flag and a second start flag within the sample stream to determine end points of the burst in the sample stream, the end points being the first start flag and data preceding but not including the second start flag.
  • each packet of the sample stream has a size in bits equal to a size of a word in the DCX.
  • splitting the sample stream into the plurality of burst messages is responsive to identifying a start of frame indicator in the sample stream.
  • splitting the sample stream into the plurality of burst messages terminates responsive to identifying an end of frame indicator in the sample stream.
  • the header message includes a burst counter that increments responsive to identifying a boundary of the burst.
  • the initial portion of the burst memory packet is sized to be less than or equal to a size of two words in memory of the DCX.
  • reformatting the burst interface packet further includes converting a first word size of data in the burst interface packet to a second word size that is compatible with the DCX, the second word size being greater than the first word size.
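A minimal sketch of the stream-to-message splitting described above, assuming a hypothetical DCX buffer size and illustrative emit_* callbacks for the header, burst, and footer messages; the real message and packet formats are the ones described with reference to FIGS. 12A and 12B, not this code.

```c
#include <stdint.h>
#include <stdio.h>

#define DCX_BUF_SIZE 256u   /* assumed DCX buffer size in samples (illustrative) */

static void emit_header(uint32_t burst_id, uint32_t burst_len) {
    printf("header: burst %u, %u samples\n", burst_id, burst_len);
}
static void emit_burst_msg(const int16_t *samples, uint32_t count) {
    (void)samples;
    printf("burst message: %u samples\n", count);
}
static void emit_footer(uint32_t n_messages, uint32_t total_samples) {
    printf("footer: %u messages, %u samples\n", n_messages, total_samples);
}

/* Split one burst of the sample stream into header + burst messages + footer.
 * Every burst message is DCX_BUF_SIZE samples except possibly the final one. */
void split_burst(uint32_t burst_id, const int16_t *stream, uint32_t burst_len)
{
    emit_header(burst_id, burst_len);
    uint32_t sent = 0, n_msgs = 0;
    while (sent < burst_len) {
        uint32_t chunk = burst_len - sent;
        if (chunk > DCX_BUF_SIZE) chunk = DCX_BUF_SIZE;   /* final message may be shorter */
        emit_burst_msg(stream + sent, chunk);
        sent += chunk;
        n_msgs++;
    }
    emit_footer(n_msgs, sent);
}

int main(void) { int16_t s[600] = {0}; split_burst(7, s, 600); return 0; }
```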
  • the present disclosure relates to a method for accessing memory in a signal processing architecture that includes a capture memory array (CMA), the CMA including a plurality of random access memory (RAM) modules, each RAM module being logically split into a plurality of memory banks that are sequentially arranged, the method comprising: receiving a plurality of requests to access RAM modules in the CMA, each request of the plurality of requests including a memory address in the CMA corresponding to a memory within a particular RAM module; for each request, deriving from the memory address in the request a particular bank of the plurality of banks in the RAM module, the particular bank including the memory address in the request; responsive to determining that two requests of the plurality of requests request access to the same bank in the same RAM module: determining a priority among the two requests; granting access to the requested bank to the request of the two requests with a higher priority; and delaying the request of the two requests with a lower priority by a clock cycle; and for each request that requests consecutive
  • the plurality of banks is low-ordered interleaved and the number of banks of the plurality of banks is a power of 2.
  • the plurality of RAM modules is further divided into a plurality of channels, the number of channels of the plurality of channels is a power of 2.
  • the plurality of channels is high-ordered interleaved.
  • the method further includes, responsive to determining that two requests of the plurality of requests result in requests to access different banks in the same RAM module, granting simultaneous access to the two requests to the respective requested banks.
  • determining the priority comprises assigning a lower priority to the request that most recently accessed the requested RAM module.
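The address-to-bank mapping and the conflict handling described in this method can be illustrated as follows. The bank and channel counts, word size, and channel shift in this C sketch are assumptions chosen only to show low-order interleaving of banks, high-order interleaving of channels, and the rule that the most recent accessor of a module loses a tie.

```c
#include <stdint.h>

/* Illustrative geometry; the real CMA dimensions are not specified here. */
#define N_BANKS       8u    /* power of 2, low-order interleaved   */
#define N_CHANNELS    4u    /* power of 2, high-order interleaved  */
#define WORD_BYTES    8u
#define CHANNEL_SHIFT 20u   /* assumed: channel taken from high address bits */

typedef struct { uint32_t addr; unsigned requester; } mem_request_t;

/* Low-order interleaving: consecutive words fall in consecutive banks. */
static unsigned bank_of(uint32_t addr)    { return (addr / WORD_BYTES) & (N_BANKS - 1u); }
/* High-order interleaving: channel comes from the upper address bits. */
static unsigned channel_of(uint32_t addr) { return (addr >> CHANNEL_SHIFT) & (N_CHANNELS - 1u); }

/* Arbitrate two requests. If they hit the same bank of the same module, the
 * requester that accessed this module most recently loses and is delayed one
 * clock cycle. Returns 0 or 1 (index of the winner), or -1 if no conflict. */
static int arbitrate(const mem_request_t *a, const mem_request_t *b,
                     unsigned last_requester)
{
    if (bank_of(a->addr) != bank_of(b->addr) ||
        channel_of(a->addr) != channel_of(b->addr))
        return -1;  /* different banks: both can be granted in the same cycle */
    return (a->requester == last_requester) ? 1 : 0;
}
```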
  • FIG. 1 illustrates an example radio system with a modem that incorporates a plurality of processing nodes with a plurality of hardware blocks.
  • FIG. 2 illustrates an example processing node that includes a plurality of digital signal processing (DSP) cores and a plurality of extended direct memory access (DMA) controllers (DCX).
  • FIG. 3A illustrates a diagram of an example DCX of a processing node, such as the processing node of FIG. 2.
  • FIG. 3B illustrates a diagram of an example core of a DSP core of a processing node, such as the processing node of FIG. 2.
  • FIGS. 3C and 3D illustrate a detailed diagram of the example DCX of FIG. 3A.
  • FIG. 4 illustrates a diagram of data flow through a processing node, such as the DCX and the DSP core of FIGS. 3A and 3B.
  • FIG. 5A illustrates an example of a shared memory model for DSP cores in a processing node.
  • FIG. 5B illustrates an example of a producer-consumer model for DSP cores in a processing node.
  • FIGS. 6A and 6B illustrate example software-defined physical layer (SDP) architectures.
  • FIG. 7A illustrates an example radio transceiver that includes a CPU subsystem, a demodulator, a high-speed serial module, a decoder module, and a transmit module, the example radio transceiver including an SDP architecture similar to the SDP architecture of FIG. 6B.
  • FIG. 7B illustrates the demodulator back module of FIG. 7A in greater detail to show connections between hardware blocks and a processing node.
  • FIG. 8A illustrates an example of a buffer pointer.
  • FIG. 8B illustrates an example of a work descriptor.
  • FIG. 9 illustrates an example of a component that utilizes buffer pointers and work descriptors to facilitate data processing in an SDP architecture.
  • FIG. 10A illustrates receiving data at a packet receiver through a portion of a data path in an SDP architecture, components of the SDP architecture operating like the component of FIG. 9.
  • FIG. 10B illustrates passing data through a portion of a data path in an SDP architecture to a packet transmitter for transmission, components of the SDP architecture operating like the component of FIG. 9.
  • FIG. 10C illustrates an example of a flow of data for software-based processing using buffer pointers and work descriptors in an SDP architecture, components of the SDP architecture operating like the component of FIG. 9.
  • FIG. 10D illustrates an example of a flow of data that is stored in on-chip memory using buffer pointers and work descriptors in an SDP architecture, components of the SDP architecture operating like the component of FIG. 9.
  • FIG. 11A illustrates an example of a data formatting module configured to receive digitized data from a high-speed serial receive module, similar to the HSS/ADC RX or HSS/SDP RX modules described herein with reference to FIG. 7A.
  • FIG. 11B illustrates an example of a data formatting module configured to receive processed data from a DCX packet transmit interface and to prepare the processed data for a high-speed serial transmit module, similar to the HSS/DAC TX or HSS/SDP TX modules described herein with reference to FIG. 7A.
  • FIGS. 12A and 12B illustrate packet formats for data in the SDP architecture.
  • FIG. 13 illustrates a memory module in a DCX, the memory module including read ports, write ports, a memory arbiter, and memory banks.
  • FIG. 14A illustrates an example memory module of a capture memory array (CMA) that includes a memory bank that is split into multiple channels and banks to provide multiple access capability of the memory simultaneously, the CMA being similar to the CMA described herein with reference to FIGS. 6A, 6B, and 7A.
  • FIG. 14B illustrates the CMA with multiple memory modules where each memory module can be selected based on the bank and the channel derived from a requested address.
  • a radio system can be used to transmit and receive signals in a variety of architectures, such as land-based radio systems, satellite systems, hybrid systems with land- and satellite-based radios, and the like.
  • Signal processing in a radio system involves various operations to manipulate and enhance received and transmitted signals. Signal processing can be accomplished using hardware that is specifically or specially designed to accomplish certain tasks such as modulation, demodulation, encoding or decoding, error correction, noise reduction, and the like.
  • the physical layer (PHY) of a radio communication system encompasses the components and processes involved in the transmission and reception of the physical signals. It deals with the physical characteristics of the transmitted and received signals, including their modulation, coding, and transmission over the physical medium.
  • the physical layer of a radio system encompasses the hardware components, modulation techniques, coding schemes, transmission medium, antennas, and signal conditioning processes involved in transmitting and receiving the physical signals.
  • hardware can be specifically designed to perform certain signal processing functions at the physical layer of the radio system. These can include, for example, application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), digital signal processors (DSPs), and the like.
  • each waveform may have hardware specific to that waveform. This means that a radio system that communicates with multiple satellites with different waveforms may have an ASIC and/or other hardware components specific to each different waveform.
  • a radio system (e.g., a terminal in a satellite system) that includes specifically designed hardware to perform signal processing tasks lacks flexibility to add capabilities or improvements in the future.
  • radio systems and transceivers with a software-defined physical layer (SDP) architecture that incorporate processing nodes with existing hardware blocks.
  • the processing nodes provide a soft-processing design that supports existing functionality and enables future updates to provide additional and/or improved flexibility.
  • the processing nodes may also be able to adapt or to be reconfigured to process new waveforms to enable communication with new or different satellites even when the existing signal processing hardware (e.g., hardware blocks, processing nodes, etc.) is not specifically configured to operate with the new waveforms.
  • the term hardware block can be used to refer to any signal processing component and/or module that may be used in modem designs for radio systems and/or transceivers.
  • the disclosed systems, devices, architectures, and methods that incorporate the disclosed processing nodes support legacy abilities in that the processing nodes are added to existing hardware designs and configured to operate as such. This allows the processing nodes to support existing functionality, provide additional processing power to existing hardware blocks when requested or needed, and to provide expanded capabilities by taking over certain functions performed by existing hardware blocks or by replacing the existing hardware blocks in the processing chain.
  • the disclosed processing nodes can support signal processing during the initial deployment of a terminal and can be fine-tuned over time to provide expanded and/or refined functionality.
  • the disclosed processing nodes allow radio systems, such as terminals, to maintain their core design (e.g., the same hardware blocks performing the same functions) while adding a soft-processing design that can assist in performing the core tasks of the hardware blocks and that can eventually take over and replace the functionality of the hardware blocks.
  • the disclosed processing nodes include a plurality of digital signal processor (DSP) cores.
  • the processing nodes may be configured to interface with existing hardware blocks in the radio system, allowing communication between the processing nodes and the hardware blocks.
  • multiple processing nodes can be added to augment or assist certain hardware blocks to assist with a variety of signal processing tasks.
  • each processing node can be structurally and/or architecturally identical but can be programmed to perform different signal processing tasks according to the hardware blocks with which the processing node is associated or according to the hardware block that the processing node is replacing.
  • the processing nodes can be configured to operate independently and can be added at any location in the radio system where corresponding processing is desirable.
  • the processing nodes can be added to provide software-defined functionality, which can advantageously enable augmented capabilities relative to existing hardware blocks.
  • Each processing node can be configured to provide a control and configuration interface with existing hardware blocks. This allows the hardware blocks to access the processing and memory of the processing nodes.
  • the processing nodes can provide a bypass processing route (e.g., bypassing the hardware block(s)) or an enhanced processing route (e.g., providing additional functionality to the hardware block(s)).
  • hardware blocks can refer to components that typically provide the functionality of the digital receiver/transmitter physical layer in a radio transceiver.
  • a hardware block is a signal processing component of the physical layer in a radio transceiver that is separate from the disclosed processing nodes.
  • software-defined radio (SDR) refers to a radio system in which many traditional hardware components of a radio transceiver are replaced or augmented by software processing.
  • in an SDR, the majority of the signal processing functions are implemented in software, providing flexibility, reconfigurability, and the ability to adapt to different communication standards and protocols.
  • the defining characteristic of an SDR is its ability to perform RF signal processing using software algorithms rather than relying on fixed-function hardware.
  • the disclosed processing nodes provide functionality similar to a software- defined radio in that they provide a software-defined physical layer (SDP).
  • the SDP architecture is a cluster of DSP processors that can be tightly integrated with existing hardware blocks (e.g., modem/codec blocks), becoming part of a radio terminal ASIC.
  • the disclosed SDP architectures provide the flexibility to implement a portion or all of the digital receiver/transmitter physical layer functionality of the ASIC in the processing nodes.
  • the disclosed SDP architectures employ a multiprocessor signal processing architecture coupled with an existing modem design that provides flexibility to the receiver or transmitter waveform processing algorithms while still leaving ample room to accommodate future system-level design changes and updates.
  • the disclosed SDP architectures further enable existing terminals to be compatible with updated or new radio systems, such as next generation satellite systems. As a result, radio systems or terminals that incorporate the disclosed SDP architectures can continue to function with existing systems while being ready to communicate with systems with different characteristics in the future.
  • the disclosed processing nodes are configured to provide supporting circuitry around a DSP core to facilitate signal processing.
  • Each processing node includes a DSP core plus supporting circuitry to provide flexibility at strategic locations inside the encoder, modulator, demodulator, decoder, etc. in a radio system.
  • the disclosed processing nodes can then be implemented at multiple locations inside the radio system rather than having to custom design a module at each location.
  • the disclosed processing nodes are superior for design implementation because a processing node can be synthesized once and then stamped at different locations in the radio system to provide flexible capabilities.
  • FIG. 1 illustrates an example radio system with a modem 100 that incorporates a plurality of processing nodes 110 with a plurality of hardware blocks 104.
  • the modem 100 is configured to receive signals (Rx in), to process the received signals using the processing nodes 110 and the hardware blocks 104, and to output the processed received signals (Rx out).
  • the modem 100 is configured to receive signals for transmission (Tx in), to process the signals for transmission using the processing nodes 110 and the hardware blocks 104, and to output the processed signals for transmission (Tx out).
  • the processing nodes 110 are configured to be implemented in an SDP architecture to operate as part of a software-defined physical layer, as described herein.
  • the hardware blocks 104 include modules and blocks that provide signal processing functionality and may be incorporated in different portions of the signal processing chain.
  • the processing nodes 110 can be implemented as part of the signal processing chain to provide flexible processing capabilities to one or more of the hardware blocks 104 and/or to bypass processing by one or more of the hardware blocks 104.
  • the hardware blocks 104 can be implemented as part of a receive signal processing chain and/or a transmit signal processing chain.
  • the hardware blocks 104 can be implemented in a demodulator block of the modem 100, a transmit block of the modem 100, a decoder block of the modem 100, a CPU subsystem of the modem 100, and the like.
  • the SDP architecture of the modem 100 is configured to flexibly utilize the processing nodes 110 in the signal processing chain.
  • the SDP architecture of the modem 100 can include shared memory (e.g., capture memory arrays (CMAs)) and one or more network interconnects that tie the hardware blocks 104, the processing nodes 110, and the shared memory together.
  • each processing node 110 can be configured to interface with individual hardware blocks 104 of the receiver and/or transmitter signal processing data paths.
  • the processing nodes 110 can be arranged relative to demands and/or requirements of an encoder, decoder, modulator, demodulator, etc.
  • the plurality of processing nodes 110 may enable the SDP architecture to reassign processing resources on demand. Additionally, the plurality of processing nodes 110 may enable passing of data between external resources and the SDP architecture.
  • the plurality of processing nodes 110 can be disposed in different locations of the signal processing chain. This enables a specific processing node of the plurality of processing nodes 110 to be instantiated or included as a local processing feature. For example, one instantiation of a processing node may be included for each of transmitting, demodulator modules, decoder, etc., where each processing node instantiation is programmed to provide particular processing abilities based on its use case.
  • FIG. 2 illustrates an example processing node 210 that includes a plurality of DSP cores 214a, 214b and a plurality of extended direct memory access controllers (DCX) 212a, 212b, 212c, 212d.
  • each DCX of the plurality of DCX 212a-212d can be interchangeable or can provide interchangeable functionality
  • reference to a DCX 212 should be considered to reference an individual DCX of the plurality of DCX 212a-212d.
  • Each DCX 212 provides a direct memory access (DMA) with extended capabilities to move data in and out of the memory, examples of which are described herein. Moving data in and out of the memory of the plurality of DCX 212a-212d can be done in conjunction with processing by the plurality of DSP cores 214a, 214b and/or it can be done in conjunction with processing by an external hardware block. Communication between the plurality of DSP cores 214a, 214b, the plurality of DCX 212a-212d, and/or with external hardware blocks can occur through a processing node (PN) network interconnect 216.
  • Each DCX 212 and DSP core 214 includes one or more interfaces to communicate with the PN network interconnect 216.
  • each DCX 212 and DSP core 214 includes a master interface and a slave interface coupled to the PN network interconnect 216. This allows different components to utilize an individual DCX 212 and/or DSP core 214 as slave components. This also allows an individual DCX 212 and/or DSP core 214 to act as master for different components coupled to the PN network interconnect 216.
  • the PN network interconnect 216 can communicate with a higher-level network interconnect (e.g., an SDP network interconnect, as described herein) to communicate with other processing nodes.
  • the PN network interconnect 216 includes an SDP master interface and an SDP slave interface.
  • the SDP master interface is configured to enable communication with other processing nodes wherein the processing node 210 acts as a master to the other processing nodes.
  • the processing node 210 can send data and/or commands to other processing nodes of an SDP architecture to utilize the processing and/or memory of the other processing nodes through the SDP master interface.
  • a master component includes any component that can perform read or write operations on another component (a slave component).
  • the SDP slave interface is configured to provide communication with other processing nodes where the processing node 210 acts as a slave to the other processing nodes.
  • the processing node 210 can receive data and/or commands from other processing nodes to enable the other processing nodes to utilize the processing and/or memory of the processing node 210 through the SDP slave interface.
  • incoming data from a master processing node on the SDP slave interface can cause the processing node 210 to execute code stored in memory of a DSP core 214.
  • the PN network interconnect 216 also includes one or more configuration ports (or configuration interfaces).
  • a configuration port is connected to a hardware block.
  • the configuration ports can interface with hardware blocks to allow the processing node 210 to configure aspects of the hardware block to be compatible with working with the processing node 210.
  • the configuration ports allow the processing node 210 to write configuration data to the hardware block which can include parameters to enable transferring data to and/or from the hardware block.
  • the configuration data can include any configuration that the hardware block supports.
  • the hardware block can be connected to one or more of the plurality of DCX 212a-212d.
  • a hardware block can send data for processing to the processing node 210 through a packet Rx interface of a DCX 212.
  • the processing node 210 can send processed data to the hardware block through a packet Tx interface of a DCX 212.
  • the configuration ports are used to configure the hardware block to send data to a DCX 212 through the packet Rx interface of the DCX 212 and to receive data from the DCX 212 through the packet Tx interface.
  • the sample interfaces may be FIFO interfaces that allow flexible options to connect to hardware blocks (e.g., communications components) for moving data in and/or out from the data path.
  • using a plurality of DCX to interface with a hardware block may be useful where it is desirable to feed data into a hardware block in parallel, or where it is desirable to have a mix of receive and transmit data into the hardware block. This may also be useful where a hardware block is sub-divided into smaller functional blocks and it may be beneficial to interface with the smaller functional blocks using a dedicated DCX for individual functional blocks.
  • the configuration port allows the hardware block to be configured to operate normally, for example, and it can also be used to read hardware block status, read hardware block configuration, monitor the hardware block configuration, etc.
  • the configuration ports can be used to configure the hardware block to route data to the processing node 210 and to configure the interaction so that the processing node 210 can receive the data (e.g., via a packet interface at a DCX 212).
  • the hardware block can be a decoder block, a demodulator block, etc.
  • the configuration data sent over the configuration interface can be configured to tell the hardware block which data path to follow (e.g., to send data to a particular packet interface).
  • the configuration ports act as a control plane interface with a hardware block and the hardware blocks can connect to respective packet interfaces of the plurality of DCX 212a-212d to transfer data to the processing node 210.
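As one hedged illustration of the configuration-port usage described above, the following C fragment models writing a datapath-select register of a hardware block so that its output is routed to a particular DCX packet Rx interface. The register offsets, names, and the read-back step are illustrative assumptions, not an actual register map.

```c
#include <stdint.h>

/* Hypothetical word offsets in a hardware block's register space, reached
 * through the processing node's configuration port; names are illustrative. */
#define HWBLK_DATAPATH 0u   /* selects which DCX packet Rx interface receives data */
#define HWBLK_STATUS   1u   /* read-only status register */

/* Route the hardware block's output to a given DCX packet Rx interface by
 * writing its datapath-select register over the configuration interface.
 * The caller supplies the mapped base address of the block's registers. */
uint32_t route_to_dcx(volatile uint32_t *cfg_base, uint32_t dcx_index)
{
    cfg_base[HWBLK_DATAPATH] = dcx_index;   /* configuration write */
    return cfg_base[HWBLK_STATUS];          /* read-back to confirm/monitor */
}
```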
  • the processing node 210 can be configured as a dual core node that can connect to different hardware modules.
  • the processing node 210 provides a configuration bus as well as interfaces to receive and to send samples to and from the hardware blocks to become part of the data path and/or to perform captures of data.
  • Individual DCX 212 can be associated with a particular DSP core 214.
  • the PN network interconnect 216 allows high bandwidth data transfers between the plurality of DSP cores 214a, 214b and the plurality of DCX 212a-212d.
  • An individual DCX 212 may also form part of the work descriptor chain, as described in greater detail herein with respect to FIGS. 9, 10A, 10B, 10C, and 10D.
  • the plurality of DCX 212a-212d provide direct memory access (DMA), which allows data to be transferred to or from memory. This may be accomplished using the DSP cores 214 and, advantageously, without involvement from an external processor, which may improve overall processing speed.
  • the DSP cores 214 can be configured to control and/or configure individual DMA and set up data paths that are to be utilized for the desired functionality. DMA transfers a block of data between memory of the DMA and another location, such as external memory, internal memory to a DSP core, or other DCX memory.
  • Each DCX 212 includes a DMA controller to control the activity of accessing memory directly.
  • the processing node 210 includes a first DSP core 214a and a second DSP core 214b.
  • the processing node 210 also includes a plurality of DCX 212a-212d, each DCX 212 having shared memory space (not shown in this figure but examples of which are described herein with reference to FIGS. 3A and 4), an input packet interface, and an output packet interface.
  • the input packet interface is configured to receive samples from a hardware block separate from but coupled to the processing node 210.
  • the shared memory space is configured to store the received samples.
  • the output packet interface is configured to transmit to the hardware block samples processed by the first DSP core 214a or the second DSP core 214b.
  • the input packet interface and the output packet interface of the DCX are configured to provide a flexible plugin interface to enable connecting with any generic streaming interface within the data paths of the hardware blocks.
  • These streaming interfaces can be simple, such as a valid signal and/or an associated data bus. These streaming interfaces can also be more complex with additional start and end of framing signals. These packet interfaces are thus designed to be generic so that adaptation to modules is either minimal or unnecessary to enable such modules to interface with DCX 212.
  • the processing node 210 also includes the PN network interconnect 216 configured to communicably couple the first DSP core 214a, the second DSP core 214b, and the plurality of DCX 212a-212d. Each DSP core 214 and each DCX 212 is coupled to the PN network interconnect 216 through a respective master interface and a respective slave interface.
  • the PN network interconnect 216 further includes an SDP master interface and an SDP slave interface each configured to communicate with an SDP network interconnect (not shown in this figure but examples of which are described herein with reference to FIGS. 6A, 6B, 7A, and 7B).
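The composition just described can be summarized in a schematic C model. The structs below are purely illustrative (field names are assumptions; the two cores and four DCX instances follow the description and FIG. 2) and are not an actual hardware definition.

```c
#include <stdint.h>

/* Schematic, non-authoritative model of the processing node composition. */
typedef struct {
    uint32_t *shared_mem;   /* shared memory space for buffered samples        */
    void     *packet_rx;    /* input packet interface (from a hardware block)  */
    void     *packet_tx;    /* output packet interface (to a hardware block)   */
    void     *nic_master;   /* master port on the PN network interconnect      */
    void     *nic_slave;    /* slave port on the PN network interconnect       */
} dcx_t;

typedef struct {
    void *internal_mem;     /* DSP-core-internal memory used for processing    */
    void *nic_master;
    void *nic_slave;
} dsp_core_t;

typedef struct {
    dsp_core_t cores[2];    /* first and second DSP cores                      */
    dcx_t      dcx[4];      /* plurality of extended DMA controllers           */
    void      *sdp_master;  /* PN interconnect port toward the SDP interconnect */
    void      *sdp_slave;
    void      *config_port; /* configuration interface toward hardware blocks  */
} processing_node_t;
```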
  • the processing node 210 is configured to be integrated into a radio transceiver that includes the hardware block and to interface with the hardware block to provide configurable processing functionality to the radio transceiver.
  • the processing node 210 can include a queue interface configured to transfer commands or data from the first DSP core 214a to the second DSP core 214b and to transfer commands or data from the second DSP core 214b to the first DSP core 214a.
  • the inter-processor queues 218 can be unidirectional (e.g., a first inter-processor queue going from the first DSP core 214a to the second DSP core 214b and a second inter-processor queue going from the second DSP core 214b to the first DSP core 214a), bidirectional (e.g., a single inter-processor queue that can transfer messages between the first and second DSP cores 214a, 214b), or the inter-processor queues 218 can provide bidirectional functionality using a combination of unidirectional queues.
  • the inter-processor queues 218 are configured to provide interprocessor communications or descriptors such as commands and messages.
  • the inter-processor queues 218 comprise FIFO queues that go directly between the plurality of DSP cores 214a, 214b.
  • each DSP core 214 includes a general-purpose input-output (GPIO) port connected to a configuration register and configured to receive input for placement in the configuration register and to transmit data stored in the configuration register.
  • the GPIO port can include a 32-bit input to each DSP core 214 and/or a 32-bit output from each DSP core 214.
  • the GPIO port provides flexibility to the processing node 210.
  • the GPIO port is coupled to one or more configuration registers that allows any master component to write to the 32-bit register of the associated DSP core 214 and allows any slave component to read from the 32-bit register of the associated DSP core 214.
  • the GPIO can be used to output standard values to enable an operator to determine where an error is occurring.
  • the GPIO ports may also be configured to receive hardware events, receive external events from hardware, generate triggers from hardware events, and the like.
  • the plurality of DSP cores 214a, 214b each include a debug and trace port that connects to a CPU subsystem or operating system interface. This allows external data to be used to debug the performance of the plurality of DSP cores 214a, 214b.
  • interrupt requests can provide functionality that could otherwise be provided by the GPIO ports.
  • each DSP core 214 can be configured to receive interrupt requests.
  • the IRQs can be received via dedicated ports or pins (not shown) in the DSP cores 214a, 214b.
  • the IRQs can also be received via dedicated ports in a CPU subsystem, as described herein.
  • the interrupts can be configured via interconnect and/or configuration ports in the DSP cores 214a, 214b and/or CPU subsystem.
  • a polling routine can be implemented on a DSP processor and/or within the CPU subsystem to read the interrupt registers through the interconnect or configuration ports to detect pending IRQs.
  • the plurality of DSP cores 214a, 214b can be configured to implement a multiplexing scheme to receive IRQs to enable the plurality of DSP cores 214a, 214b to filter interrupts. This allows the plurality of DSP cores 214a, 214b to respond to particular interrupts and to ignore other interrupts. Configuration for enabling or disabling interrupts can be performed via communication with the hardware blocks (e.g., through the configuration ports).
  • a configuration can also be implemented at the plurality of DSP cores 214a, 214b to dictate to which interrupts the plurality of DSP cores 214a, 214b respond. For example, many interrupts can be accessed by the plurality of DSP cores 214a, 214b, but the configuration dictates to which interrupts the plurality of DSP cores 214a, 214b respond.
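A small sketch of the interrupt filtering described above, assuming a hypothetical pending/mask register pair per DSP core and a polling routine like the one mentioned earlier; the interrupt sources listed are examples drawn from this disclosure, but the register layout is invented for illustration.

```c
#include <stdint.h>

/* Illustrative interrupt sources; the real interrupt map is not specified here. */
enum { IRQ_ERROR = 0, IRQ_TIMER = 1, IRQ_DMA_DONE = 2, IRQ_PKT_QUEUE_FULL = 3 };

typedef struct {
    uint32_t pending;   /* bit set by hardware when an interrupt is raised   */
    uint32_t mask;      /* bit set by the DSP core to accept that interrupt  */
} irq_regs_t;

/* Polling routine: returns the lowest-numbered pending interrupt that this
 * core has enabled (masked-off interrupts are ignored), or -1 if none. */
int irq_poll(irq_regs_t *regs)
{
    uint32_t active = regs->pending & regs->mask;   /* filter out ignored interrupts */
    for (int i = 0; i < 32; i++) {
        if (active & (1u << i)) {
            regs->pending &= ~(1u << i);            /* acknowledge */
            return i;
        }
    }
    return -1;
}
```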
  • the respective DSP core 214 reads registers of the hardware block using the configuration ports to determine the nature of the interrupt.
  • the interrupts can be used for a variety of purposes including, for example and without limitation, errors, timers, DMA interrupts, transfer notices, packet queue/interface full, packet queue/interface empty, etc.
  • the first DCX 212a is configured to receive a plurality of samples from the hardware block to be processed through the input packet interface. The first DCX 212a is then configured to temporarily store the plurality of samples in the shared memory space of the first DCX 212a. The first DCX 212a is also configured to transmit the plurality of samples to the first DSP core 214a through the PN network interconnect 216.
  • the first DSP core 214a is configured to program the first DCX 212a to convey the plurality of samples to the first DSP core 214a, to place the plurality of samples into an internal memory space of the first DSP core 214a where the first DSP core 214a is configured to process the plurality of samples, and to place the processed samples into the internal memory space of the first DSP core 214a.
  • the first DCX 212a is further configured to reformat the received plurality of samples.
  • reformatting the received plurality of samples includes sign extending samples of the received plurality of samples to increase the number of bits for each sample.
  • reformatting the received plurality of samples includes bit clipping samples of the received plurality of samples to reduce a resolution of each sample.
  • the first DCX 212a is configured to receive a plurality of samples through the input packet interface, to temporarily store the plurality of samples in the shared memory space, and to use the output packet interface to transmit the plurality of samples to the hardware block separate from the processing node 210 for processing.
  • the first DCX 212a can be further configured to receive the processed plurality of samples from the hardware block and to temporarily store the processed plurality of samples in the shared memory space.
  • the processing node 210 can operate in different modes.
  • the processing node 210 can be configured to receive a stream of data from a hardware block.
  • the processing node 210 can be configured to respond to an interrupt that can be used to identify certain data for capture (e.g., to capture data for a particular event).
  • the processing node 210 can operate using a circular buffer.
  • the processing node 210 can include a list of 100 buffers, for example. The processing node 210 returns the data in the top of the list of buffers back to its own queues and can keep overwriting the data so that the buffer includes the last 100 buffers of data. At any point, the processing node 210 can move the data in the buffers into memory for processing and analysis.
  • the processing node 210 can operate in a buffer ID mode where a buffer ID is used to assign data to a particular DSP core. In this mode, a buffer ID that ends in 0, for example, is sent to core 0 of the first DSP core 214a, a buffer ID that ends in 1 is sent to core 1 of the first DSP core 214a, etc.
  • buffer IDs can also be configured to be ignored (e.g., the buffer pointer is sent back to the buffer pointer queue rather than to the work descriptor queue, examples of which are described in greater detail herein with reference to FIGS. 8A-10D).
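A tiny sketch of the buffer ID mode, assuming (purely as an illustration) that the low bit of the buffer ID selects the destination core and that a sentinel value marks IDs to be ignored and returned to the buffer pointer queue.

```c
#include <stdint.h>

#define BUFID_IGNORE 0xFFFFFFFFu   /* assumed sentinel, not from the patent */

/* Returns 0 or 1 to select the destination core, or -1 for "ignore":
 * an ignored buffer pointer goes back to the buffer pointer queue instead
 * of producing a work descriptor. */
int route_buffer(uint32_t buffer_id)
{
    if (buffer_id == BUFID_IGNORE)
        return -1;
    return (int)(buffer_id & 1u);  /* ID ending in 0 -> core 0, ending in 1 -> core 1 */
}
```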
  • the processing node 210 is an example of a processing node that forms a basic building block of the disclosed SDP architectures.
  • Each processing node in the SDP architecture can be configured to include a number of dual core processors (e.g., 2 DSP processors) and may comprise a number of DCX allowing a wide variety of data movement options between memory and the data path.
  • the processing node 210 may also include a shared memory space (for example, in each DCX 212).
  • the plurality of DCX 212a-212d may also have dedicated packet interfaces to transfer samples to and from hardware blocks.
  • the plurality of DSP cores 214a, 214b include a VLIW (Very Long Instruction Word) SIMD (Single Instruction Multiple Data) processor tailored for complex number processing.
  • the plurality of DSP cores 214a, 214b may comprise a 32-way multiplier-accumulator (MAC), which enables the plurality of DSP cores 214a, 214b to do up to 32 parallel MAC operations every clock cycle (or 8 complex multiplication and accumulation operations).
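As a worked illustration of the throughput figure above: one complex multiply-accumulate costs four real multiplications, so a 32-way real MAC array corresponds to 8 complex MAC operations per clock. The C below is a scalar reference for that operation, not the DSP core's actual SIMD code.

```c
#include <stdint.h>

typedef struct { int32_t re, im; } cplx_t;

/* One complex multiply-accumulate, acc += a * b, uses four real multiplies:
 * (a.re + j*a.im)(b.re + j*b.im) = (a.re*b.re - a.im*b.im) + j(a.re*b.im + a.im*b.re).
 * A 32-way real MAC array can therefore perform 32 / 4 = 8 of these per cycle. */
static inline void cmac(cplx_t *acc, cplx_t a, cplx_t b)
{
    acc->re += a.re * b.re - a.im * b.im;
    acc->im += a.re * b.im + a.im * b.re;
}

/* Reference complex dot product; a SIMD DSP core would process several lanes
 * of this loop per clock, per the figure above. */
cplx_t cdot(const cplx_t *x, const cplx_t *y, int n)
{
    cplx_t acc = {0, 0};
    for (int i = 0; i < n; i++)
        cmac(&acc, x[i], y[i]);
    return acc;
}
```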
  • the processing node 210 is configured to be sufficiently flexible so that it can be placed at various locations within a radio transceiver (e.g., encoder, modulator, demodulator, decoder, etc.) as needed.
  • the processing node 210 can be implemented at a particular location and can be programmed to provide functionality based on the location it is implemented instead of requiring specific or custom designs at different locations. This results in a simplified design stage.
  • FIG. 3A illustrates a diagram of an example extended DMA controller or DCX 312 of a processing node, such as the DCX 212 of the processing node 210 of FIG. 2.
  • the DCX 312 may include a packet transmit interface 301 for outputting samples from a processing node.
  • the DCX 312 may also include a packet receive interface 302 for receiving samples from a hardware block for storage and/or processing with the processing node.
  • the DCX 312 also may include one or more configuration registers 303 that can be configured to pass data, messages, and/or commands to and from the DCX 312.
  • the DCX 312 includes one hundred or more configuration registers 303 (e.g., each DMA can have 20 or more associated configuration registers).
  • the DCX 312 also can include a memory arbiter 304 that controls access to DCX memory 308 comprising RAM banks 1 and 2.
  • the DCX 312 also includes two sub-DMA memory modules 306, 307.
  • the DCX 312 can be repeated in a processing node allowing for a configurable set of DMA channels to be chained together, including with DMA channels of other processing nodes and other shared memory (e.g., CMAs).
  • the DCX 312 includes a memory arbiter 304 that regulates read and write operations to DCX memory 308 via a slave interface.
  • the slave interface means that the memory arbiter 304 receives requests to read and/or write to the DCX memory 308 from other processing nodes via the network interconnect.
  • the DCX memory 308 can thus be configured based on application.
  • the DCX memory 308 can be used to buffer samples received from a hardware block (e.g., a hardware data path).
  • the DCX memory 308 can be used as a private scratch memory for one or more of the plurality of DSP cores.
  • the memory arbiter 304 allows a dual read/write interface to the DCX memory 308, as described in greater detail herein with respect to FIG. 13.
  • the DCX 312 can include a DCX single atomic write control (SAWC) channel, or SAWC 305, with a dedicated access port on the network interconnect to allow the DCX 312 to control state machines to perform single atomic writes to a programmable register offset, for returning buffer pointers, and for sending work descriptors to the DSP cores.
  • FIGS. 3C and 3D illustrate additional detail regarding the DCX 312.
  • the DCX 312 includes the SAWC 305 and sub-DMA modules 306, 307, as described.
  • Each sub-DMA module 306, 307 includes a DMA read component 331 and a DMA write component 332 with work descriptor DMA channels 334, reformat FIFOs 333, and a work descriptor controller 339.
  • the SAWC 305 includes a SAWC write component 341 and configuration registers with work descriptor and buffer pointer queues 343.
  • the SAWC write component includes individual write-only DMA channels that are arbitrated in a round-robin fashion. The write-only DMA channels are provided buffer pointers or work descriptors from the various blocks within the DCX 312.
  • the SAWC 305 enables higher quality of service for the single write DMA by letting the network interconnect handle the arbitration between the two sub-DMA modules 306, 307.
  • Arbitration at the NIC level occurs when transactions from any of the AXI masters within the processing node (e.g., individual DMAs, the SAWC, DSP cores, etc.) target a common slave memory internal or external to the processing node.
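  • To make the round-robin behavior of the SAWC write-only channels concrete, a minimal sketch is shown below; the channel count, field names (pending, payload, dest_offset), and function names are illustrative assumptions, not the actual hardware interface.

```c
#include <stdbool.h>
#include <stdint.h>

#define SAWC_NUM_CHANNELS 4        /* assumed channel count for illustration */

typedef struct {
    bool     pending;              /* a buffer pointer or work descriptor is waiting */
    uint64_t payload;              /* the single word to write atomically */
    uint32_t dest_offset;          /* programmable register offset to write to */
} sawc_channel_t;

/* Round-robin arbitration: grant the next pending channel after the last winner,
 * so each write-only channel gets one atomic write per turn. */
static int sawc_arbitrate(const sawc_channel_t ch[SAWC_NUM_CHANNELS], int last_grant)
{
    for (int i = 1; i <= SAWC_NUM_CHANNELS; i++) {
        int idx = (last_grant + i) % SAWC_NUM_CHANNELS;
        if (ch[idx].pending)
            return idx;            /* this channel issues its single atomic write */
    }
    return -1;                     /* nothing pending */
}
```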
  • the DCX 312 uses multiple instances of DMA read and write engines, including two configurable sub-DMA modules 306, 307 that interface with the network interconnect and other interfaces (e.g., packet Rx and Tx interfaces) to facilitate data transfers to other processing nodes and/or hardware blocks.
  • Each sub-DMA module 306, 307 controls a single write channel and a single read channel.
  • Each sub-DMA module 306, 307 includes 4 read channels in the DMA read component 331 and 4 write channels in the DMA write component 332, where each read channel and write channel operate and are configured as a single entity.
  • when the read channel and write channel operate as a single entity to perform a transfer from address A to address B, the read channel performs the read transaction over the network interconnect to address A and temporarily buffers the data in the reformat FIFO, which is then used by the write channel to perform a write transaction to address B over the network interconnect.
  • Read channels can be arbitrated amongst themselves and write channels amongst themselves, providing efficient usage of the single read and single write interface between the DMA and the network interconnect.
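  • The paired operation of a read channel and a write channel through the reformat FIFO can be pictured with the behavioral sketch below; interconnect_read and interconnect_write are placeholders for the actual network-interconnect transactions, and the FIFO depth is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define REFORMAT_FIFO_BYTES 256    /* assumed staging FIFO size */

/* Placeholders for read/write transactions over the network interconnect. */
extern void interconnect_read(uint64_t src, void *buf, size_t len);
extern void interconnect_write(uint64_t dst, const void *buf, size_t len);

/* Move len bytes from address A to address B: the read channel fills the
 * reformat FIFO and the write channel drains it, one chunk at a time. */
static void dma_channel_transfer(uint64_t addr_a, uint64_t addr_b, size_t len)
{
    uint8_t fifo[REFORMAT_FIFO_BYTES];
    size_t done = 0;
    while (done < len) {
        size_t chunk = len - done;
        if (chunk > sizeof(fifo))
            chunk = sizeof(fifo);
        interconnect_read(addr_a + done, fifo, chunk);   /* read transaction to A */
        interconnect_write(addr_b + done, fifo, chunk);  /* write transaction to B */
        done += chunk;
    }
}
```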
  • Each sub-DMA module 306, 307 uses work descriptors from a work descriptor queue and buffer pointers from a buffer pointer queue for managing data transfers (examples of which are described herein with reference to FIGS. 8A-10D).
  • DMA channels can be configured to operate in a normal DMA operation where each channel can be programmed to perform transfers from address A to address B.
  • the parameters of a data transfer can be set using the available configuration registers for that DMA channel.
  • Configuration data can include, for example and without limitation, destination address, source address, memory attributes, interconnect transaction attributes, transfer length, mode of operation, etc.
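  • For illustration only, the per-channel configuration named above might be modeled as a register block such as the following; the field names and widths are assumptions chosen to mirror the list, not the actual register map.

```c
#include <stdint.h>

/* Illustrative per-DMA-channel configuration block (not the actual register map). */
typedef struct {
    uint64_t src_addr;       /* source address (address A) */
    uint64_t dst_addr;       /* destination address (address B) */
    uint32_t transfer_len;   /* transfer length in bytes */
    uint32_t mem_attr;       /* memory attributes */
    uint32_t txn_attr;       /* interconnect transaction attributes */
    uint8_t  mode;           /* mode of operation, e.g., 0 = normal, 1 = WQ mode */
} dcx_dma_channel_cfg_t;
```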
  • DMA channels can be configured to operate in a WQ mode of operation where three of the four channels are provided work descriptors and buffer pointers from the WQ controller 339 in a round-robin fashion so that the three DMA channels can operate back-to-back, increasing the efficiency of transactions across the network interconnect 316.
  • DMA transfers are broken into smaller configurable transaction sizes, which allows a single DMA channel to keep access to the read/write control of the burst for a single transaction. After that, the next DMA waiting in the queue gets access to the interfaces, which proceeds until each DMA channel has completed its transfer.
  • the DCX 312 also includes the packet receive interface 302 and the packet transmit interface 301 which communicate with the memory arbiter 304, as described herein.
  • the DCX 312 also includes a Tx compose component 355 and a Rx parse component 354, examples of which are described herein with reference to the TX compose module 1125 and the RX parse module 1175, respectively.
  • the configuration registers 303 can pass work descriptors and buffer pointers to the two sub-DMA modules 306, 307 and can receive work descriptor queues and buffer pointer queues from the two sub-DMA modules 306, 307.
  • a sub-DMA module 306, 307 can fetch a buffer pointer from a buffer pointer queue and can start a transfer of data upon receiving a work descriptor.
  • Upon completion of the transfer, the sub-DMA module 306, 307 returns the buffer pointer from the received work descriptor (indicating the buffer pointer is available) and provides a new work descriptor with the fetched buffer pointer (indicating the buffer pointer now points to data) to the SAWC 305.
  • a first sub-DMA module 306 is coupled to the network interconnect through a master interface to transfer data to and from memory that is mapped as a slave (which may include memory in other processing nodes and/or other shared memory such as CMAs) on the network interconnect.
  • the second sub-DMA module 307 effectively allows connecting the read and write channels to hardware blocks to get samples to and from the hardware blocks and to read and write directly to the DCX memory 308.
  • Direct access to the DCX memory 308 allows the DCX memory to act as a capture buffer for packets or samples and allows other DMA and DSP cores access to the DCX memory through the network interconnect.
  • FIG. 3B illustrates a diagram of an example DSP core 314 of a processing node, such as the DSP core 214 of the processing node 210 of FIG. 2.
  • the DSP core 314 represents one of a plurality of DSP cores that may be present in a processing node.
  • the DSP core 314 can be one of two DSP cores in a processing node, such as the processing node 210 of FIG. 2.
  • the DSP core 314 includes one or more configuration registers 321 that allow control or command data to come in from external processing nodes, e.g., through a network interconnect.
  • the control or command data includes buffer pointers, work descriptors, and messages.
  • Any external master, or the DSP core 314 itself, can perform a write transaction over the interconnect to write control or command messages via the configuration registers 321.
  • the configuration registers 321 can pass data in the form of FIFO queues 322.
  • the FIFO queues 322 can include data such as a buffer pointer queue, a message queue, and a work descriptor queue that are passed from a different processing node to the DSP core 314.
  • the queues 322 are passed to the DSP processor 324, which includes internal RAM and cache and processes the data in the queues 322.
  • the buffer pointer queue includes 32-bit words whereas the message queue and the work descriptor queue include 64-bit words.
  • the DSP core 314 can receive messages from a different DSP core of the processing node through a FIFO interface 323.
  • the DSP core 314 can also receive interrupts (IRQs) and send FIFO interface messages through queues 325.
  • a processing node includes more than 2 DSP cores and the message queue (the FIFO interface 323) can be used to pass messages directly between the DSP cores.
  • the configuration registers 321 are used to pass messages directly between the DSP cores of a particular processing node as well as between external processing nodes and hardware blocks.
  • the DCX returns buffer pointers back to the DSP core 314 through the configuration registers 321.
  • the configuration registers 321 can be configured to store data that is passed between components of an SDP architecture, as described herein.
  • the configuration registers 321 can be configured to enable the DSP core 314 to communicate with different components, such as a different DSP core of the processing node, a DCX of the processing node, an external processing node (e.g., a processing node different from the processing node that includes the DSP core 314), and/or an external hardware block.
  • FIG. 4 illustrates a diagram of data flow through an example processing node 410.
  • the processing node includes a DSP core 414, similar to the DSP cores 214, 314, and a DCX 412, similar to the DCX 212, 312.
  • the DSP core 414 and the DCX 412 are each connected to a PN network interconnect 416, similar to the PN network interconnect 216.
  • the PN network interconnect 416 enables communication between the DCX 412, the DSP core 414, hardware blocks (e.g., via interrupts), and external processing nodes.
  • the PN network interconnect 416 includes a configuration interface to configure external hardware blocks as described herein, an SDP master interface for sending data to external slave processing nodes, and an SDP slave interface for receiving data from external master processing nodes.
  • the DCX 412 includes one or more configuration registers that communicate with the PN network interconnect 416 to send and receive data between components of the processing node 410 as well as external processing nodes.
  • the DSP core 414 includes one or more configuration registers that communicate with the PN network interconnect 416 to send and receive data between components of the processing node 410 as well as external processing nodes.
  • the one or more configuration registers of the DSP core 414 can be used to pass buffer pointers, work descriptors, and messages to the DSP core 414 from external processing nodes or from other components of the processing node 410.
  • the DSP core 414 includes other FIFO queues that can be used to transfer messages between cores of the processing node 410.
  • Data can be received from an external hardware block (not shown) at a packet receiver of the processing node 410.
  • the data can be samples captured from the hardware block (e.g., modems or communication hardware).
  • the hardware block can be configured for an appropriate or suitable capture interface using the configuration port of the PN network interconnect 416, as described herein.
  • the packet receiver is coupled to the DCX 412.
  • the write port writes the received data to shared memory (a DMA) of the DCX 412 where it is temporarily stored before being processed.
  • the DSP core 414 programs the DCX 412 to convey a set of data for processing to internal memory of the DSP core 414 (e.g., internal data RAM of the DSP core 414).
  • the DSP core 414 configures the data path to be used for sending and receiving samples. This is accomplished by programming the packet receiver to create packets of a configured size and to store the packets in shared memory.
  • a corresponding work descriptor is created with the size and buffer pointer information and forwarded to the assigned work descriptor queue DMA (or WQ-DMA).
  • Upon receiving a work descriptor, the WQ-DMA is configured to move data to DSP core memory and to create a new work descriptor, which is sent to the DSP core work queue FIFO interface through the DSP configuration interface.
  • the WQ-DMA is also configured to release the buffer pointer for the work descriptor it receives from the packet receiver back to the packet receiver’s buffer pointer FIFO for reuse once the data is moved to DSP memory.
  • the DSP core 414 programs the transmit data path to set the work descriptor and buffer pointer flow in the reverse direction to enable sending data to the packet transmitter.
  • the DSP core 414 can receive samples from connected hardware blocks through the configuration interface to the hardware block. This allows for data from the connected hardware blocks to the packet receiver to be placed in DSP memory. This triggers sending a work descriptor to the DSP core 414 without requiring the DSP core 414 to oversee and manage data transfers. Processed data can be temporarily stored in internal memory of the DSP core 414. The processed data can then be passed back to shared memory (a DMA) of the DCX 412 for temporary storage.
  • the shared memory of the DCX 412 can be used to buffer samples for processing.
  • the shared memory of the DCX 412 can be repurposed as additional memory for the DSP core 414.
  • the DCX 412 can be configured to reformat data when moving data through the DCX 412.
  • each of the DMA engines inside the DCX 412 can include data reformatting logic around the FIFO that connects the read and write channel.
  • the reformatting logic can be configured to help with certain operations, such as sign-extending samples from 8 bits to 16 bits, performing bit-clipping to reduce the sample resolution from 16 bits to 8 bits, and so forth.
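  • As a concrete illustration of the two operations mentioned above, the per-sample helpers below sketch one plausible interpretation (bit-clipping is shown as a saturating clip); they are not the actual hardware implementation.

```c
#include <stdint.h>

/* Sign-extend an 8-bit sample to 16 bits. */
static inline int16_t sign_extend_8_to_16(int8_t sample)
{
    return (int16_t)sample;        /* integer promotion performs the sign extension */
}

/* Reduce a 16-bit sample to 8 bits of resolution, saturating at the rails
 * (one possible reading of "bit-clipping"). */
static inline int8_t bit_clip_16_to_8(int16_t sample)
{
    if (sample > INT8_MAX) return INT8_MAX;
    if (sample < INT8_MIN) return INT8_MIN;
    return (int8_t)sample;
}
```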
  • the disclosed SDP architectures enable flexibility in using the DSP cores and support different configurations and ways to provide multi-core capability to the SDP architecture.
  • Each processing node can be configured to be used as a dual-core configuration (e.g., a single processing node) or multiple nodes can be configured to form a larger multi-core functioning body (e.g., multiple processing nodes).
  • each processing node can access the DCX and/or DSP cores of another processing node.
  • the DSP cores can operate in a shared memory model where the cores share access to shared memory while having access to a private memory, and a producer-consumer model where the DSP cores can message each other.
  • FIG. 5A illustrates an example of a shared memory model for DSP cores 514a, 514b in a processing node.
  • the DSP cores 514a, 514b have access to local shared memory 515 (which resides in DCX 512, as described herein with reference to FIG. 4, for example, or any memory that the DSP core 514a, 514b can access through a network interconnect) to allow operating in this mode.
  • the local shared memory 515 can be the shared memory of any of the DCX, a section of CMA marked as shared between DSP cores, or some allocated memory in external DRAM/DDR.
  • the private SRAMs 513a, 513b can be used for scratch data while processing objects or the private SRAMs 513a, 513b can be used as additional memory shared between the cores 514a, 514b. This can be extended to multiple DSP cores in different processing nodes. Through the use of the network interconnect, each DSP core can access shared memory space residing in a particular DCX (that may reside in a different processing node) while each DSP core can access private SRAMs residing in the processing node in which the DSP core resides.
  • FIG. 5B illustrates an example of a producer-consumer model for DSP cores 514a, 514b.
  • the DSP cores 514a, 514b have a queue interface 518 (or a queue can exist in shared memory) and dedicated GPIOs that could be used as message queues within a single processing node. This would allow messaging between the DSP cores 514a, 514b, which could be simple commands or object data.
  • the first DSP core 514a can act as a producer and the second DSP core 514b can act as a consumer with the queue interface 518 acting as the queue of tasks filled by the first DSP core 514a (the producer) and popped by the second DSP core 514b (the consumer).
  • This can be extended to multiple DSP cores in different processing nodes.
  • additional memory-mapped message queues can be used to connect the DSP cores of different processing nodes, the memory-mapped message queues similar to the message queues between DSP cores of the same processing node.
  • the additional memory-mapped message queues transfer data using an SDP network interconnect (e.g., using configuration registers as described herein).
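  • A minimal sketch of such a producer-consumer exchange between two DSP cores is shown below, assuming a simple shared-memory ring buffer; the queue depth and 64-bit task encoding are illustrative, and memory-ordering/barrier details are omitted for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_DEPTH 16u                      /* illustrative depth (power of two) */

typedef struct {
    volatile uint64_t slots[QUEUE_DEPTH];    /* task words: simple commands or object pointers */
    volatile uint32_t head;                  /* advanced by the producer core */
    volatile uint32_t tail;                  /* advanced by the consumer core */
} msg_queue_t;

/* Producer core (e.g., DSP core 514a): push a task if there is room. */
static bool queue_push(msg_queue_t *q, uint64_t task)
{
    if (q->head - q->tail >= QUEUE_DEPTH)
        return false;                        /* queue full */
    q->slots[q->head % QUEUE_DEPTH] = task;
    q->head++;
    return true;
}

/* Consumer core (e.g., DSP core 514b): pop the next task if one is pending. */
static bool queue_pop(msg_queue_t *q, uint64_t *task)
{
    if (q->tail == q->head)
        return false;                        /* queue empty */
    *task = q->slots[q->tail % QUEUE_DEPTH];
    q->tail++;
    return true;
}
```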
  • FIGS. 6A and 6B illustrate example SDP architectures 600a, 600b.
  • the SDP architecture 600a includes processing nodes 610a-610f coupled to an SDP network interconnect 640 via respective SDP master interfaces and SDP slave interfaces.
  • the SDP architecture 600a also includes shared memory in the form of a capture memory array 630 (or CMA) that is coupled to the SDP network interconnect 640 via an SDP master interface and an SDP slave interface.
  • the SDP architecture 600a also includes a CPU subsystem 650 (e.g., a Linux application layer) that is coupled to the SDP network interconnect 640 via an SDP master interface and an SDP slave interface.
  • each bidirectional arrow coupling a corresponding component of the SDP architecture 600 to the SDP network interconnect 640 may represent one of the SDP master and slave interfaces, such as one SDP master and one SDP slave interface for each component.
  • the processing nodes 610 are configured to communicate with one another via the SDP network interconnect 640, as described in greater detail herein. This enables an individual processing node 610 of the SDP architecture 600a to use a different processing node 610 of the SDP architecture 600a for memory and/or processing.
  • the processing nodes 610a-610f can be implemented in different portions of a radio transceiver or system. In such instances, the processing nodes 610a-610f can be configured to communicate with each other and to utilize the memory and processing capabilities of other processing nodes via the SDP network interconnect 640.
  • the capture memory array 630 is a memory array that allows on-chip storage and that can be used for capturing samples for processing, for providing a scratch area, and for storing lookups for processing.
  • the capture memory array 630 comprises a continuous address space made up of multiple banks of SRAM.
  • the capture memory array 630 can be designed using interleaved single-port RAMs to function as a read/write memory with a handshake-based interface for an area/complexity efficient design.
  • the capture memory array 630 may comprise a set of large memory banks connected to the data path allowing low-latency and high-bandwidth parallel access to the processors without having to go out to an off-chip memory.
  • the SDP architecture may further include a capture memory array (CMA) and switches.
  • the CMA can be a common location where data can be stored internal to the modem 100, allowing storage of data such as data samples, intermediate processing results, and so forth.
  • the modem 100 can be implemented as a single chip or multiple chips.
  • the SDP architecture 600b illustrates an architecture in which the network interconnect is split into multiple segments, a first SDP network interconnect 640a and a second SDP network interconnect 640b, to which the individual processing nodes 610a-610f, capture memory array 630, and CPU subsystem 650 are connected.
  • one processing node (e.g., the first processing node 610a) can connect through one of the network interconnect segments while the rest of the processing nodes 610b-610f can connect through the first SDP network interconnect 640a or second SDP network interconnect 640b (or a collection of network interconnect segments).
  • Different arrangements of the processing nodes 610, the CPU subsystem 650, and the capture memory array 630 and the SDP network interconnects 640 may similarly exist.
  • the SDP architectures 600 are configured to provide flexibility to receiver or transmitter waveform processing algorithms while leaving room to accommodate future system level design changes and updates.
  • the SDP architectures 600 also enable software-based signal processing on chip.
  • each processing node 610a-610f can interface with individual hardware modules of both receiver and transmitter signal processing data paths.
  • the SDP architectures 600 enable DSP processing power to be instantiated at specific key or desired locations in a radio system, such as the encoder, modulator, demodulator, and decoder, as well as allow sufficient connectivity to reassign processing resources for different locations.
  • the SDP architectures 600 also advantageously provide sufficient connectivity to enable passing data between the processing nodes 610a-610f as well as with external resources like external memory.
  • the SDP architectures 600 enable customizable signal processing to provide flexibility in modem design to facilitate different types of signal processing for different applications.
  • the SDP architectures 600 include the SDP network interconnect 640 and a plurality of processing nodes 610a-610f connected to the SDP network interconnect 640, the plurality of processing nodes 610a-610f configured to provide configurable processing power to process receiver and transmitter waveforms in a radio transceiver.
  • each processing node 610 may include a plurality of digital signal processing (DSP) cores; a plurality of extended direct memory access controllers (DCX); and a PN network interconnect connected to the plurality of DSP cores, to the plurality of DCX, and to the SDP network interconnect 640.
  • the SDP architectures 600 may include a capture memory array 630 comprising a plurality of memory banks that are connected to the SDP network interconnect 640 to provide access to the plurality of memory banks for the plurality of processing nodes 610.
  • the SDP architectures 600 may also include a CPU subsystem 650 connected to the SDP network interconnect 640.
  • the SDP network interconnect 640 enables communication among each of the plurality of processing nodes 610, the capture memory array 630, and the CPU subsystem 650 to augment processing power and functionality in the radio transceiver.
  • one or more of the plurality of processing nodes 610 can be dynamically allocated to provide signal processing power to one or more hardware blocks of the radio transceiver. This allows for dynamic allocation of processing power to a hardware block.
  • each processing node of the plurality of processing nodes 610 can be configured to interface with one or more individual hardware blocks of both receiver and transmitter signal processing data paths in the radio transceiver. This allows multiple hardware blocks to access a single processing node to provide flexible storage and memory functionality.
  • external off-chip memory can be further connected to the CPU subsystem 650.
  • the plurality of processing nodes 610 can be configured to pass data from individual DSP cores to the external memory through the SDP network interconnect 640.
  • the capture memory array 630 includes a plurality of SDP master interfaces to the SDP network interconnect 640.
  • the CPU subsystem 650 includes an SDP master interface and an SDP slave interface to the SDP network interconnect 640.
  • FIG. 7A illustrates an example radio transceiver 700 that includes a CPU subsystem 750 (similar to the CPU subsystem 650), a demodulator 760 divided into multiple demodulator blocks 760a-760d, a high-speed serial or HSS module 770, a decoder module 780, and a transmit module 790.
  • the radio transceiver 700 includes an SDP architecture similar to the SDP architecture 600b of FIG. 6B.
  • the CPU subsystem 750 includes a network interconnect switch (NIC switch) and a cache-coherent network (CCN or bus interconnect) and is coupled to external memory 755 (e.g., DDR memory).
  • the network interconnect switch (NIC switch) is coupled to SDP network interconnect 2 (SDP NIC 2).
  • the SDP architecture (e.g., the SDP network interconnects, the processing nodes, and the CMA described herein with reference to FIG. 6B) may be represented by and accessible via the SDP NIC 2 as shown.
  • the demodulator 760 can be divided into a number of modules 760a-760d, with each module having one or more components such as SDP network interconnects, CMAs, and/or hardware blocks (or HWBs). Other components of the radio transceiver can include hardware blocks as well.
  • the hardware blocks can be configured to provide a number of signal processing capabilities such as signal conditioning, channelization, down-sampling, filtering, equalizing, despreading, descrambling, etc.
  • the hardware blocks can send data to and receive data from the processing nodes, as described herein.
  • the HSS module 770 is configured to convert received analog signals into digital signals for processing using HSS/ADC blocks, which include two read channels (read channel 0 and read channel 1) that are processed in parallel.
  • the HSS module 770 is further configured to pass signals directly between an external chip or system that is part of the software defined physical layer (the external chip or system referred to as SDP in the figure, which refers to dedicated SDP interfaces connected to a high-speed serial (HSS) module of an ASIC) and the first processing node of the demodulator module 760a via the HSS/SDP RX 1 block and the HSS/SDP TX 1 block, and between the SDP and the sixth processing node of the transmit module 790 via the HSS/SDP RX 2 block and the HSS/SDP TX 2 block.
  • the SDP interfaces connect directly to the packet receiver and packet transmitter within the processing node. This enables receiving ADC samples directly into the processing node and sending DAC samples directly from the processing node to the HSS module 770.
  • the HSS module 770 is further configured to convert digital signals to analog signals for transmission using the HSS/DAC TX block that is coupled to the transmit module 790 (transmit channel 0).
  • Signals digitized by the HSS/ADCs are passed to the demodulator modules 760a-760c and then to the decoder module 780.
  • Digital signals for transmission are passed to the transmit module 790 and then to the HSS/DAC TX block of the HSS module 770.
  • the processing nodes are in communication with one another via the SDP network interconnects 740a, 740b (SDP NIC 1 740a and SDP NIC 2 740b). Furthermore, the processing nodes have access to the CMAs via the SDP network interconnects 740a, 740b.
  • Each processing node can be dynamically included in the signal processing data flow as described herein. This allows the modules 750, 760, 770, 780 and their hardware blocks to utilize flexible memory and processing provided by the processing nodes.
  • the processing nodes can be configured to access data processed by the hardware blocks through DCX interfaces, as described herein.
  • the processing nodes can configure the hardware blocks to pass data to the processing nodes using configuration ports, as described herein.
  • the transmit module 790 includes a processing node (PN 6) that is coupled to the SDP NIC 1 740a via an SDP master interface and an SDP slave interface.
  • the processing node PN 6 is coupled to the MOD block and the ENC block via packet interfaces and configuration ports, similar to those described herein with reference to FIGS. 2, 3A, and 4.
  • the processing node PN 6 can configure the MOD block and/or the ENC block to send data to the processing node PN 6 and to receive processed data from the processing node PN 6. In some instances, this can be done to enhance the signal processing provided by the MOD block and/or the ENC block. In certain instances, this can be done to bypass the MOD block and/or the ENC block.
  • the radio transceiver 700 can dynamically assign a first DCX of a first processing node to be a master to write data to shared on-chip memory (e.g., CMAs), external memory, and/or the memory of another DCX and a second DCX of a second processing node to be a master to read the result of processing the data written by the first DCX.
  • the placement of the processing nodes can be configured to enhance particular signal processing elements in the radio transceiver 700.
  • a processing node can be placed in the demodulator front module to aid in demodulating a signal and the processing node may be used to enhance or replace (e.g., bypass) the DSC and/or CDM hardware blocks.
  • the placement of the CMAs can be modified from what is illustrated here as well as the placement and configuration of the SDP network interconnects and processing nodes.
  • the placement of these components can be based on a number of factors including signal processing performance, layout of the chip, connectivity of the block to the other designs, latency, as well as fabrication complexity and cost.
  • the radio transceiver 700 is illustrated as providing parallel data paths to process two waveforms in parallel; however, it is to be understood that a single data path may be used, or more than two data paths may be implemented in parallel.
  • the NIC switch and CCN of the CPU subsystem 750 allow the processing nodes to access external memory. However, this may be more expensive in terms of speed, so the radio transceiver 700 also advantageously provides on-chip memory in the form of the CMAs. This may also advantageously help to interface with software running on an external processor.
  • FIG. 7B illustrates the demodulator module 760c of FIG. 7A in greater detail to show connections between hardware blocks and a processing node 710 (e.g., PN 4 of FIG. 7A) as well as between the processing node 710 and the SDP NIC 1 740a implemented within the demodulator module 760b.
  • the processing node 710 is similar to the processing node 210 described herein with reference to FIG. 2.
  • a PN network interconnect 716 of the processing node 710 is coupled to an SDP interconnect 740 of the radio transceiver 700 (e.g., SDP NIC 1 of FIG. 7A) to integrate the processing node 710 into the radio transceiver 700.
  • the hardware blocks of the demodulator module 760c include a first hardware block 1 761a for receive channel 0, a first hardware block 2 762a for receive channel 0, a second hardware block 1 761b for receive channel 1, and a second hardware block 2 762b for receive channel 1.
  • Each hardware block of the demodulator module 760c can be configured using respective configuration ports of the processing node 710.
  • Each hardware block 761a, 761b, 762a, 762b is coupled to a respective DCX 712a-712d of the processing node 710 via a packet transmit interface and a packet receive interface.
  • each hardware block 761a, 761b, 762a, 762b is coupled to the PN network interconnect 716 via a respective configuration port.
  • data can be passed between the processing node 710 and the hardware blocks (blocks 761a, 761b and blocks 762a, 762b) using the packet transmit/receive interfaces of corresponding DCX 712 of the processing node 710.
  • the first hardware block 1 761a can send data to the processing node 710 via the packet transmit interface of a first DCX 712a
  • the first hardware block 2 762a can send data to the processing node 710 via the packet transmit interface of a second DCX 712b
  • the second hardware block 2 762b can send data to the processing node 710 via the packet transmit interface of a third DCX 712c
  • the second hardware block 1 761b can send data to the processing node 710 via the packet transmit interface of a fourth DCX 712d of the processing node 710.
  • processed data can be passed back to a particular hardware block from the processing node 710 using the packet receive interface of a corresponding DCX 712a-712d.
  • the first hardware block 1 761a can receive processed data from the processing node 710 via the packet receive interface of the first DCX 712a
  • the first hardware block 2 762a can receive processed data from the processing node 710 via the packet receive interface of the second DCX 712b
  • the second hardware block 2 762b can receive processed data from the processing node 710 via the packet receive interface of the third DCX 712c
  • the second hardware block 1 761b can receive processed data from the processing node 710 via the packet receive interface of the fourth DCX 712d of the processing node 710.
  • the processing node 710 can provide additional and/or alternative processing for the demodulator module 760c.
  • the remaining processing nodes of the radio transceiver 700 can have similar connections and provide similar functionality for the other modules and hardware blocks of the radio transceiver 700 of FIG. 7A.
  • the different processing nodes are configured to seamlessly pass data back and forth (e.g., with each other, memory, hardware blocks, etc.).
  • the disclosed SDP architectures can utilize work descriptors and buffer pointers as part of the data flows. By using work descriptors and buffer pointers as described herein, the SDP architectures can connect different data transfer pieces across the architecture to form a highly configurable data path.
  • Each hardware interface and DCX acts as an interchangeable building block that can be connected to any other hardware interface, DCX, or DSP core within the SDP architecture using a straightforward approach: each component receives a work descriptor that points to the location in memory of the data that needs to be processed and has some pre-allocated memory, indicated in a buffer pointer, for the component to use for the processed data.
  • Once the component is finished processing the data, it forwards the work descriptor containing a pointer to the location in memory where the newly processed data resides to the next component in the processing chain. As memory is freed up, the location of the newly freed memory is returned to a buffer pointer queue for use by other components in the SDP architecture.
  • memory can be freed up when data is transmitted to a hardware block, moved to a new location in internal or external memory, and/or when data is processed and the result is saved in a new location in memory.
  • data can be passed from one component to the next until it reaches a final destination. Interconnecting data across the SDP architecture in this manner enables connecting virtually any two hardware interfaces, DCX, and/or DSP core to form a desired data path.
  • FIG. 8A illustrates an example of a buffer pointer 811.
  • the buffer pointer 811 can include, for example, a buffer address, a buffer pointer identifier, and reserved bits. As shown, the buffer pointer 811 may have a length of 32 bits, though other lengths (larger or smaller) may be used as appropriate.
  • the buffer pointer 811 identifies the starting address of a pre-allocated location in memory that is available for a processing node, or a component of a processing node, to utilize for storing data, the location in memory corresponding to the buffer address of the buffer pointer 811.
  • FIG. 8B illustrates an example of a work descriptor 821.
  • the work descriptor 821 can include, for example, a buffer address, a buffer pointer identifier (or buffer pool ID, BPID), reserved bits, a burst identifier (burst ID [1:0]), a length, a burst start flag, and a burst end flag.
  • the work descriptor 821 is a pointer to a data set for a processing node or a component in the processing node. In some embodiments, the work descriptor 821 can have a length of 64 bits.
  • the work descriptor 821 is used to point to a location in memory that includes data (e.g., a burst) ready to be analyzed or processed, the location in memory corresponding to the buffer address of the work descriptor 821.
  • the length indicated by the work descriptor 821 can be used to identify the amount of memory used to store the data at the buffer address.
  • Additional identifiers and flags of the work descriptor 821 can be used to facilitate analysis and processing of the data. Examples of identifiers or flags include message type, DSP ID, message count, etc.
  • the burst start and burst end flags in the work descriptor 821 are flags that get tagged by a packet-receiver in a DCX when receiving data from hardware blocks that use or rely on any kind of framing. These flags allow software running on a DSP core to identify a set of descriptors that make up that frame without having to parse the packets that are saved in memory.
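  • For reference, the two control words might be modeled as shown below; the exact bit positions and widths are assumptions based only on the fields named above (a 24-bit address, a 4-bit BPID, a 2-bit burst ID, and so on), not the definitive layouts.

```c
#include <stdint.h>

/* Illustrative 32-bit buffer pointer (field widths are assumptions). */
typedef struct {
    uint32_t buffer_addr : 24;    /* starting address of a pre-allocated buffer */
    uint32_t bpid        : 4;     /* buffer pointer / buffer pool identifier */
    uint32_t reserved    : 4;
} buffer_pointer_t;

/* Illustrative 64-bit work descriptor pointing at burst data to be processed. */
typedef struct {
    uint64_t buffer_addr : 24;    /* location of the data to be processed */
    uint64_t bpid        : 4;     /* identifier of the buffer holding the data */
    uint64_t length      : 16;    /* amount of memory occupied by the data */
    uint64_t burst_id    : 2;     /* burst ID [1:0] */
    uint64_t burst_start : 1;     /* data belongs to the first packet of a burst */
    uint64_t burst_end   : 1;     /* data belongs to the last packet of a burst */
    uint64_t reserved    : 16;
} work_descriptor_t;
```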
  • FIG. 9 illustrates an example of a component 900 that utilizes buffer pointers and work descriptors to facilitate data processing in an SDP architecture.
  • the component 900 can operate as a consumer and/or producer in a consumer/producer model.
  • the component 900 receives a work descriptor queue 920, or a list of work descriptors, and a buffer pointer queue 910, or a list of buffer pointers, and can output a work descriptor 921 and a buffer pointer 911.
  • the buffer pointer queue 910 and the work descriptor queue 920 are each maintained by a queue controller or other component separate from the component 900.
  • the buffer pointer queue 910 and the work descriptor queue 920 are each available to the component 900 but not managed by the component 900. This allows the component 900 to pop an element from a particular queue, removing that element from the respective queue so that it does not appear as available for another consumer/producer in a signal processing chain.
  • the component 900 can take the next work descriptor from the work descriptor queue 920 to identify data to be processed (causing the work descriptor to be removed from the work descriptor queue 920) and can take the next buffer pointer from the buffer pointer queue 910 to identify a location in memory that is available to store data (causing the buffer pointer to be removed from the buffer pointer queue 910).
  • the component 900 processes the data at the buffer address indicated in the work descriptor to generate processed data and stores the processed data at the buffer address indicated in the buffer pointer.
  • the component 900 then creates a new work descriptor that includes the buffer address where the newly processed data is stored as well as the length of the data stored at the buffer address and outputs the work descriptor 921 to the queue controller that manages the work descriptor queue 920 and/or to a next consumer/producer in a chain.
  • the component 900 can also create a new buffer pointer that includes the buffer address where the data that was just processed was stored and can output that buffer pointer 911 to the queue controller that manages the buffer pointer queue and/or to a next consumer/producer in a chain.
  • the memory location with stored data corresponding to the buffer address in the work descriptor prior to processing is now indicated as available memory in the returned buffer pointer 911 whereas the available memory prior to processing is now indicated as holding newly processed data in the returned work descriptor 921.
  • data can be passed between components of a processing node and/or between processing nodes of an SDP architecture.
  • the component 900 is configured to retrieve data from the location pointed to by the work descriptor from the work descriptor queue 920. The component 900 then processes the retrieved data. The component 900 uses the buffer pointer from the buffer pointer queue 910 to store the processed data. The component 900 then forms a new work descriptor 921 based on the location of the stored, processed data. The component 900 then outputs the buffer pointer comprising the buffer address from the work descriptor to a pre-configured location to be reused again. The component 900 then sends the work descriptor 921 to the next consumer in the chain. Additional examples of using buffer pointers and work descriptors are described in greater detail herein with reference to FIGS. 10A-10D.
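  • Putting those steps together, the consumer/producer behavior of the component 900 can be sketched as a single processing step; the queue and processing hooks (wq_pop, bpq_pop, process, and so on) are placeholders standing in for whatever the component actually does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t addr; uint16_t len; } wd_t;   /* simplified work descriptor */
typedef struct { uint32_t addr; } bp_t;                  /* simplified buffer pointer */

/* Placeholder hooks (assumed, not from the disclosure). */
extern bool   wq_pop(wd_t *wd);            /* take the next work descriptor */
extern bool   bpq_pop(bp_t *bp);           /* take the next free buffer pointer */
extern size_t process(uint32_t src, uint16_t len, uint32_t dst); /* do the work */
extern void   wq_push_next(wd_t wd);       /* forward a descriptor to the next consumer */
extern void   bpq_return(bp_t bp);         /* return a freed buffer to the pool */

static void component_step(void)
{
    wd_t in;
    bp_t out;
    if (!wq_pop(&in) || !bpq_pop(&out))
        return;                             /* nothing to do, or no free buffer */

    /* Process the data at the input buffer and store the result in the output buffer. */
    size_t out_len = process(in.addr, in.len, out.addr);

    /* The new work descriptor points at the freshly produced data... */
    wd_t next = { .addr = out.addr, .len = (uint16_t)out_len };
    wq_push_next(next);

    /* ...and the now-consumed input buffer is returned to the free pool. */
    bp_t freed = { .addr = in.addr };
    bpq_return(freed);
}
```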
  • the component 900 does not receive a work descriptor queue 920 but instead receives data for storage (e.g., a packet receiver of a processing node receiving packets from a hardware block).
  • the work descriptor queue 920 is not provided to the component 900 and the component 900 acts solely as a producer.
  • the component 900 stores the received data in the buffer addresses of the buffer pointer queue 910 and outputs work descriptors 921 indicating the storage location of the received data. In such embodiments, the component 900 does not output the buffer pointer 911 because no memory was made available in the process.
  • the component 900 does not receive a buffer pointer queue 910 but instead receives a work descriptor queue 920 with data for transmitting out of the processing node (e.g., a packet transmitter of a processing node transmitting packets to a hardware block).
  • the buffer pointer queue 910 is not provided to the component 900 and the component 900 acts solely as a consumer.
  • the component 900 takes the data from the buffer addresses indicated in the work descriptors in the work descriptor queue 920, transmits the data, and outputs buffer pointers to a buffer pointer queue manager, the buffer pointers corresponding to the buffer addresses where the transmitted data was stored, indicating that those locations in memory are now available for storage.
  • the component 900 does not output the work descriptor 921 because no data was processed and stored back in memory.
  • the component 900 operating with the buffer pointer queue 910 and the work descriptor queue 920 can get data from a location pointed to by a work descriptor in the work descriptor queue 920 and use a buffer address identified in a buffer pointer of the buffer pointer queue 910 for its results.
  • the component 900 may then form a new work descriptor 921 based on the output generated and stored in the buffer pointer queue 910, where the new work descriptor 921 includes the buffer pointer identified in the buffer pointer queue 910 and used for the results generated.
  • the component 900 may then output the buffer pointer 911 to a pre-configured location for reuse by a different component operating on data in a location pointed to by a subsequent work descriptor.
  • the component 900 sends the new work descriptor 921 to the next consumer in a processing chain.
  • the disclosed buffer pointers and work descriptors may be used in various control plane interfaces.
  • buffer pointers and work descriptors can be used in an interface that streams data samples out from a memory to a hardware component.
  • the work descriptor may point to a data packet in memory that contains samples that need to be streamed out to the hardware component, such that the list of work descriptors corresponds to a playlist of data packets to be streamed to the hardware component.
  • FIGS. 10A, 10B, 10C, and 10D illustrate examples of passing data through a portion of a data path in an SDP architecture 1000, components of the SDP architecture 1000 operating like the component of FIG. 9.
  • FIG. 10A illustrates receiving data at a packet receiver 1030 through a portion of a data path in the SDP architecture 1000.
  • the packet receiver 1030 receives samples from a hardware block, for example.
  • the packet receiver 1030 acts as a producer in the consumer/producer model, as described herein with reference to FIG. 9, and takes in a buffer pointer queue (BP) along with the received data samples.
  • the packet receiver 1030 takes the received samples and puts them in buffer addresses identified as being available for data storage in the buffer pointer queue (BP).
  • the buffer pointer queue is stored in the DCX memory to which the packet receiver 1030 has access. Once placed in memory, the packet receiver 1030 generates respective work descriptors for inclusion in a work descriptor queue (WQ). This is passed on to a work descriptor queue DMA (WQ DMA 1031) that manages the work descriptor queue (WQ). In some instances, work descriptors placed in the shared memory of a DCX are accessible such that other DCX and/or DSP cores can access those work descriptors.
  • the buffer pointers can be tied to a DMA 1020 that monitors a fill level of the buffer pointer queue (BP). Responsive to determining that the fill level of the buffer pointer queue (BP) (e.g., the BP of the WQ DMA 1031 which is in a FIFO) falls below a threshold (e.g., FIFO threshold), the DMA 1020 may input one or more subsequent buffer pointers from the pre-allocated buffer pointer list 1011 to refill the buffer pointer queue (BP) to keep the buffer pointer queue filled to a sufficient level.
  • when the WQ DMA 1031 determines that there is space in the work descriptor list 1012 (a FIFO queue), the WQ DMA 1031 sends a work descriptor to be added to the work descriptor list 1012. If memory is freed up in this operation, the WQ DMA 1031 sends a buffer pointer corresponding to the freed-up memory to the buffer pointer queue associated with the packet receiver 1030. If the buffer pointer queue falls below the FIFO threshold, the DMA 1020 sends one or more buffer pointers from the pre-allocated buffer pointer list 1011 to the buffer pointer queue associated with the WQ DMA 1031.
  • a spare DMA channel in the DCX (e.g., see FIG. 3C) can be used as the DMA 1020.
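  • The refill behavior of the DMA 1020 can be summarized as below; the threshold value and the helper functions are placeholders for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define BP_FIFO_THRESHOLD 4u       /* assumed refill threshold */

extern uint32_t bp_fifo_fill_level(void);     /* current depth of the buffer pointer FIFO */
extern bool     bp_list_next(uint32_t *bp);   /* next entry of the pre-allocated list */
extern void     bp_fifo_push(uint32_t bp);    /* add a buffer pointer to the FIFO */

/* Top up the buffer pointer FIFO whenever it drops below the configured threshold,
 * drawing from the pre-allocated buffer pointer list. */
static void refill_buffer_pointer_fifo(void)
{
    while (bp_fifo_fill_level() < BP_FIFO_THRESHOLD) {
        uint32_t bp;
        if (!bp_list_next(&bp))
            break;                            /* pre-allocated list exhausted */
        bp_fifo_push(bp);
    }
}
```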
  • FIG. 10B illustrates passing data through a portion of a data path in the SDP architecture 1000 to a packet transmitter 1040 for transmission.
  • a WQ DMA 1041 has an associated work descriptor queue (WQ) and it uses the WQ to send work descriptors to the packet transmitter 1040.
  • the packet transmitter 1040 sends data to a hardware block, for example, based on the data identified in the work descriptors of the work descriptor queues (WQ) associated with the packet transmitter 1040. Once data is sent, the memory location that contained that data is freed up and the packet transmitter 1040 generates a buffer pointer that is sent to a buffer pointer queue (BP) associated with the WQ DMA 1041.
  • a list of work descriptors may be associated with a DMA 1050 that monitors a fill level of the work descriptor queue (WQ) associated with the WQ DMA 1041.
  • the DMA 1050 may input one or more subsequent work descriptors from the work descriptor list 1021 to refill the work queue WQ to keep the work queue filled to a sufficient level.
  • the WQ DMA 1041 obtains the work descriptor in the WQ and reads it out of the FIFO memory to move into an allocated buffer (pointed to by a buffer pointer identified by the work descriptor) and creates a new work descriptor for the next component (e.g., the packet transmitter 1040).
  • the new work descriptor is stored in the work queue (WQ) leading to the packet transmitter 1040.
  • the packet transmitter 1040 may then operate similarly. When it has an entry in its work queue, the packet transmitter 1040 uses the work descriptor therein to identify and readout the corresponding packet.
  • the packet transmitter 1040 may have a state machine that reads packets out according to the work descriptor and transmits them to corresponding components before the memory in the WQ is freed up and is provided to the WQ DMA 1041 for reuse as a buffer location.
  • the buffer pointer and the work descriptor are used to exchange information regarding when buffer space is available for the next packet and the work descriptor list 1021 is the list of items to be streamed out by the packet transmitter 1040.
  • the buffer pointer FIFO effectively operates to control the flow of data.
  • FIG. 10C illustrates an example of a flow of data for software-based processing using buffer pointers and work descriptors in the SDP architecture 1000.
  • samples are received by the packet receiver 1030 of a processing node and are forwarded to a DSP core 1032 for processing using work descriptor queues as described herein.
  • the processed samples are then returned to the packet transmitter 1040 for transmission.
  • This process essentially combines the data paths described herein with reference to FIGS. 10A and 10B without the buffer pointer list 1011 and work descriptor list 1021 and associated DMAs 1020, 1050.
  • the DSP core 1032 when the DSP core 1032 processes the data, it returns the buffer pointer to the buffer pointer queue associated with the WQ DMA 1031 and sends the work descriptor to the work descriptor queue associated with the WQ DMA 1041 to enable the data to be queued for transmission by the packet transmitter 1040.
  • FIG. 10D illustrates an example of a flow of data that is stored in on- chip memory using buffer pointers and work descriptors in the SDP architecture 1000.
  • samples are received at the packet receiver 1030 and put in a work descriptor queue associated with the WQ DMA 1031 which forwards the work descriptor to the work descriptor queue associated with the DSP core 1032.
  • the processed data is sent to the work descriptor queue associated with a WQ DMA 1033 where it can be stored directly in on-chip storage such as a CMA.
  • the processed data can be sent to a work descriptor queue associated with a next consumer 1034 in the signal processing chain where it can be processed and moved along a signal processing data path.
  • the work descriptor queue can be stored in a CMA until it is needed by the next consumer 1034.
  • the respective WQ DMAs can be configured to move data from memory A to memory B.
  • the WQ DMAs create a work descriptor with a pointer to memory B.
  • the WQ DMAs also create a buffer pointer to memory A.
  • the SDP architecture 1000 performs a method for passing data between components such as a processing node or a hardware block where the method includes utilizing a buffer pointer queue to manage available memory, the buffer pointer queue comprising a plurality of buffer pointers that each identify a buffer address in memory that is available for storing data.
  • the method also includes utilizing a work descriptor queue to manage packets or samples to be processed, the work descriptor queue comprising a plurality of work descriptors that each identify a buffer address in memory that includes burst data to be processed.
  • the method includes retrieving a first buffer pointer from the buffer pointer queue, processing the received burst data, storing the processed burst data in memory at the buffer address identified by the first buffer pointer, and outputting a new work descriptor, the new work descriptor including the buffer address identified by the first buffer pointer.
  • the method also includes, responsive to the work descriptor queue having processed data to be transmitted, retrieving a first work descriptor from the work descriptor queue, obtaining the processed data from memory at the buffer address identified by the first work descriptor, releasing the buffer pointer that was associated with the work descriptor, the released buffer pointer corresponding to the buffer address identified by the first work descriptor, and transmitting the processed data by the packet transmitter.
  • the work descriptor further includes a length indicating the total length of the packet.
  • the work descriptor may further include a burst start flag indicating that the burst data belongs to the first packet of a burst and a burst end flag indicating that the burst data belongs to a last packet of a burst.
  • the work descriptor may also indicate that the burst data is a fully contained burst by setting the burst start flag and the burst end flag to true.
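  • The burst flags let software gather all descriptors belonging to one frame without parsing the stored packets; a minimal sketch follows, assuming descriptor fields like those in the illustrative layout shown earlier and placeholder queue/frame helpers.

```c
#include <stdbool.h>

/* Minimal descriptor view for this sketch. */
typedef struct {
    unsigned burst_start : 1;
    unsigned burst_end   : 1;
} wd_flags_t;

extern bool next_descriptor(wd_flags_t *wd);     /* pop the next descriptor from the queue */
extern void add_to_frame(const wd_flags_t *wd);  /* record a descriptor as part of the frame */
extern void frame_complete(void);                /* hand the assembled frame to software */

/* Collect descriptors from burst start to burst end; a descriptor with both flags
 * set represents a fully contained burst (a single-descriptor frame). */
static void collect_frame(void)
{
    wd_flags_t wd;
    bool in_frame = false;
    while (next_descriptor(&wd)) {
        if (wd.burst_start)
            in_frame = true;
        if (in_frame)
            add_to_frame(&wd);
        if (wd.burst_end) {
            frame_complete();
            return;
        }
    }
}
```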
  • the SDP architecture can be further configured to add the new work descriptor to the work descriptor queue and/or to add the new buffer pointer to the buffer pointer queue.
  • the SDP architecture is further configured to, responsive to receiving the buffer pointer queue and the work descriptor queue, retrieve a second work descriptor from the work descriptor queue; obtain data from memory at the buffer address indicated by the second work descriptor; retrieve a second buffer pointer from the buffer pointer queue; process the retrieved data to generate output processed data; store the output processed data in memory at the buffer address indicated by the second buffer pointer; output a new work descriptor, the new work descriptor including the buffer address indicated by the second buffer pointer; and output a new buffer pointer, the new buffer pointer indicating the buffer address indicated by the second work descriptor.
  • Each work descriptor may also further include a burst identifier of the burst data to be processed and a burst length indicating an amount of storage occupied by the burst data to be processed.
  • the SDP architecture can also be configured to monitor a fill level of the work descriptor queue by the DCX and, responsive to determining that the fill level is below a threshold fill level, add one or more work descriptors to the work descriptor queue from a work descriptor list.
  • the SDP architecture can also be configured to monitor a fill level of the buffer pointer queue by the DCX and, responsive to determining that the fill level is below a threshold fill level, add one or more buffer pointers to the buffer pointer queue from a buffer pointer list.
  • data can be formatted in an efficient and consistent manner across the architecture.
  • incoming RF signals can be digitized and then formatted according to the disclosed data formatting.
  • data can be passed through the disclosed SDP architectures using the work descriptors and buffer pointers disclosed herein.
  • the disclosed data formatting modules are configured to create an adaptation layer between streaming and non-streaming or packetized types of data interfaces.
  • the adaptation layer enables connectivity to the disclosed SDP architectures and processing nodes and is computationally efficient.
  • Data in the SDP architecture may enter as a burst comprising a stream of samples or symbols. Given a burst of sample streams, the burst may be broken up into multiple messages, and the multiple messages may be transferred to a DCX for storage and/or processing by a DSP core.
  • the maximum message size can be configured by a user and a burst will always produce messages with maximum message size except the last message. The consideration for determining maximum message size is two-fold: convenience of DMA transfer size and frequency of message header transfer which may require frequent updates such as “current frequency offset” or “current symbol timing.”
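  • Splitting a burst into messages can be sketched as follows, with max_msg_size standing in for the user-configured maximum; every message is full-size except possibly the last, and the burst start/end indications follow the framing flags described elsewhere herein.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder for handing one message (with framing flags) to the DCX. */
extern void emit_message(const uint8_t *data, size_t len, int burst_start, int burst_end);

/* Break a burst of samples into messages of at most max_msg_size bytes. */
static void split_burst(const uint8_t *burst, size_t burst_len, size_t max_msg_size)
{
    size_t offset = 0;
    while (offset < burst_len) {
        size_t len = burst_len - offset;
        if (len > max_msg_size)
            len = max_msg_size;
        emit_message(burst + offset, len,
                     offset == 0,                   /* first message of the burst */
                     offset + len == burst_len);    /* last message of the burst */
        offset += len;
    }
}
```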
  • sample streams received from other locations within the architecture can be formatted to conform to a target data format, referred to herein as a streaming mode.
  • the streaming mode is used when the sample streams are provided on chip (e.g., from a high-speed serial interface such as the one described herein with reference to FIG. 7A).
  • sample streams can be communicated using streaming or nonstreaming modes. This enables connecting with another FPGA or ASIC with the DCX- based designs described herein such that the DCX can directly send DCX-configured packets through the high-speed serial interface with its associated packet header metadata.
  • Sample streams or data received through a different interface or source can be assumed to be already formatted using the target data formatting and that data can be received and processed in a non-streaming mode.
  • Software can configure components of the SDP architecture to accept sample streams (e.g., samples or symbols) from an interface and send sample streams to an interface when operating in the streaming mode.
  • the software may configure these components to split received sample streams into sizes corresponding to buffer or memory sizes (e.g., sizes identified in the buffer pointers and work descriptors).
  • the component creates or determines start and end markers based on the known buffer or memory sizes. As a result, the component breaks the incoming sample streams into smaller segments and numbers the segments to enable the other components of the SDP architecture to manage and process the data using work descriptors and buffer pointers, as described herein.
  • the components of the SDP architecture when the components of the SDP architecture receive packets with determined start and end points, the components can be configured to identify these points to control when to capture data, for example, using pre-defined boundaries.
  • the components when transmitting, can be configured to either pass start and end markers or to send valid data signals, depending at least in part on downstream use.
  • the disclosed data formatting modules provide an adaptation layer that is configured to process continuous or streaming data as well as data with fixed sizes or data that is packetized.
  • the adaptation layer allows the SDP architecture to handle streaming data as well as packetized data by packaging both types of data streams into a common data structure.
  • the disclosed adaptation layer can support user-defined burst data, which may be a frame or timeslot with an arbitrary length, because the burst boundaries can be preserved during the data formatting process.
  • FIG. 11 A illustrates an example of a data formatting module 1100 configured to receive digitized data from a high-speed serial receive module 1110, similar to the HSS/ADC RX or HSS/SDP RX modules described herein with reference to FIG. 7A.
  • streaming data is received from the HSS/ADC RX module and non-streaming data is received from the HSS/SDP RX module.
  • the data formatting module 1100 is part of a processing node, such as the processing node 210 described herein with reference to FIG. 2.
  • the data formatting module 1100 provides the disclosed adaptation layer to enable storage and processing of streaming and non-streaming data in the SDP architecture.
  • the data formatting module 1100 formats data in a way that allows processing nodes and hardware blocks to communicate data back and forth.
  • the high-speed serial receive module 1110 sends digitized data to a streaming mode component 1120 to format the data using a TX compose module 1125.
  • the TX compose module 1125 is configured to compose the data to be suitable for transmission to a DCX packet receive interface 1140.
  • the TX compose module 1125 receives Rx data (e.g., Rx data 1, Rx data 2) and a valid flag to indicate that the data is part of a valid burst.
  • the TX compose module 1125 receives a ready signal or data from the DCX packet receive interface 1140 as well as configuration data from the DCX packet receive interface 1140, similar to the data sent over the configuration ports described herein.
  • the data received from the high-speed serial receive module 1110 includes I samples and Q samples.
  • the TX compose module 1125 is configured to package the received data into the format represented in FIG. 12A. Once received by the DCX packet receive interface 1140, the data is stored in memory in the form represented in FIG. 12B. The TX compose module 1125 passes this data to the DCX packet receive interface 1140 as well as an identified start of frame (sof) identifier, an end of frame (eof) identifier, the valid flag received from the high-speed serial receive module 1110, and a status flag.
  • a non-streaming mode component 1130 is configured to provide a data reformat / FIFO module 1135 to convert the word size of the data to the word size expected by the DCX packet receive interface 1140; a sketch of this width conversion is shown below.
  • the expected word size is 128 bits.
  • the non-streaming mode component 1130 can also be configured to receive a ready flag from the DCX packet receive interface 1140.
  • the non-streaming mode component 1130 can be configured to receive and pass on an identified start of frame (sof) identifier, an end of frame (eof) identifier, and the valid flag received from the high-speed serial receive module 1110.
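The following is a minimal sketch of the word-size conversion performed by a data reformat / FIFO stage such as the module 1135 above, assuming 32-bit input words packed into the 128-bit words expected by the DCX packet receive interface. The 32-bit input width and all type and function names are assumptions for illustration, not details from the disclosure.

```c
#include <stdint.h>
#include <string.h>

/* Packs narrower input words into one 128-bit DCX word (four 32-bit lanes). */
typedef struct {
    uint32_t lanes[4];   /* one 128-bit DCX word = four 32-bit input words */
    int      fill;       /* number of lanes currently occupied             */
} dcx_word_packer;

/* Returns 1 and writes a complete 128-bit word to out[] when one is ready. */
static int pack_input_word(dcx_word_packer *p, uint32_t in, uint32_t out[4])
{
    p->lanes[p->fill++] = in;
    if (p->fill == 4) {
        memcpy(out, p->lanes, sizeof(p->lanes));
        p->fill = 0;
        return 1;
    }
    return 0;
}
```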
  • the operating mode can be switched between the streaming mode and the non-streaming mode using the mode flag.
  • the sample/symbol data can be passed to other parts of the SDP architecture or to other parts of the processing node that includes the DCX packet receive interface 1140.
  • the TX compose module 1125 is configured to format sample/symbol streams into messages to be passed to the DCX, which then passes the data to the SDP architecture.
  • the TX compose module 1125 is configured to break up the burst into multiple messages and to transfer the multiple messages to the DCX via the DCX packet receive interface 1140.
  • the TX compose module 1125 is configured to produce messages with a configured maximum message size except the last message.
  • FIG. 11B illustrates an example of a data formatting module 1150 configured to receive processed data from a DCX packet transmit interface 1160 and to prepare the processed data for a high-speed serial transmit module 1190, similar to the HSS/DAC TX or HSS/SDP TX modules described herein with reference to FIG. 7A.
  • a streaming mode component 1170 includes an RX parse module 1175 that is configured to process messages received from the DCX packet transmit interface 1160 and to recover the sample/symbol stream therefrom.
  • the RX parse module 1175 is configured to receive data in the format represented in FIG. 12B and convert it to the format represented in FIG. 12A.
  • the recovered sample/symbol stream is then passed to the high-speed serial transmit module 1190, such as the HSS/DAC TX module described herein with reference to FIG. 7A.
  • a non-streaming mode component 1180 includes a data reformat/FIFO module 1185 that is configured to receive messages from the DCX packet transmit interface 1160 and to reformat the data according to an expected message size.
  • the FIFO module 1185 is configured to take the data coming from the DCX packet transmit interface 1160 and convert the data to the bit-width used by the high-speed serial interface. Otherwise, the non-streaming mode component 1180 does not alter the data because the high-speed serial transmit module 1190 is configured to receive data formatted according to the format of FIG. 12B.
  • the non-streaming mode data can then be forwarded to the high-speed serial transmit module 1190, such as an HSS/SDP TX module described herein with reference to FIG. 7A.
  • FIGS. 12A and 12B illustrate packet formats for data in the SDP architecture.
  • FIG. 12A illustrates the packet format at the DCX packet receive interface 1140 and the DCX packet transmit interface 1160, as described herein, and
  • FIG. 12B illustrates the packet format provided by the DCX packet receive interface 1140 and stored in the DCX of a processing node, or received by the DCX packet transmit interface 1160 where it is converted back to the packet format in FIG. 12A.
  • packets are received for a burst, the burst having a header, payload data, and a footer.
  • the data is reformatted by determining the number of payload messages in the burst, including that information in the footer, and then moving the header and footer together to form the first two words of the burst data, as represented in FIG. 12B.
  • a component reading in the burst data knows that it must read the first two words of the data; from those first two words, the component knows how many payload messages to read in order to read in the entire burst. This provides an efficient method of reading in burst data compared to burst data that is packaged with an unknown number of payload messages.
  • the word size is 128 bits.
  • the combined header and footer is stored in the first 256 bits of the burst data.
  • the adaptation layer allows a buffer pointer to be created for a buffer that can store the entire burst because an exact size in memory for the burst data can be determined after reading the header/footer combination. For example, once a component receives a footer, a work descriptor can be generated that contains the buffer pointer to the burst data, which includes the header and footer combination and the length of the payload data.
  • the formatted data includes a burst ID that includes the lower 2 bits of a burst counter.
  • the formatted data includes a burst start flag that indicates the packet belongs to a start of a burst.
  • the formatted data includes a burst end flag that indicates the packet belongs to an end of a burst. In some embodiments, if a packet includes both a burst start flag and a burst end flag, the packet can be considered to fully contain the burst data.
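The following sketch illustrates the burst memory packet layout and the two-word metadata read described in the preceding items. Only the details stated in the text are relied upon (two 128-bit metadata words holding the combined header/footer, a payload-message count, a 2-bit burst ID, and start/end flags); the specific bit positions, field packing, and names are assumptions for illustration.

```c
#include <stdint.h>

typedef struct { uint64_t w[2]; } word128;   /* one 128-bit memory word */

/* Burst as stored in DCX memory: combined header/footer first, payload after. */
typedef struct {
    word128 meta[2];     /* combined header + footer (256 bits total) */
    word128 payload[];   /* payload messages follow the metadata      */
} burst_memory_packet;

/* Hypothetical accessors for fields packed into the metadata words. */
static unsigned burst_id(const burst_memory_packet *b)      { return  b->meta[0].w[0]       & 0x3; }
static unsigned burst_start(const burst_memory_packet *b)   { return (b->meta[0].w[0] >> 2) & 0x1; }
static unsigned burst_end(const burst_memory_packet *b)     { return (b->meta[0].w[0] >> 3) & 0x1; }
static unsigned payload_words(const burst_memory_packet *b) { return (unsigned)(b->meta[1].w[0] & 0xFFFF); }

/* One two-word metadata read tells the reader exactly how many payload words
 * to fetch in order to consume the entire burst. */
static void read_burst(const burst_memory_packet *b,
                       void (*consume)(const word128 *w))
{
    unsigned n = payload_words(b);
    for (unsigned i = 0; i < n; i++)
        consume(&b->payload[i]);
}
```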
  • the high-speed serial receive module 1110 is configured to break down data coming in, and the streaming mode component 1120 (via the TX compose module 1125) is configured to create packets based on the data structure illustrated in FIG. 12A.
  • the data is divided into messages (e.g., 128-bit words).
  • the high-speed serial receive module 1110 generates a header (e.g., 192 bits) that is then followed by one or more payload messages (e.g., each being 128 bits).
  • the high-speed serial receive module 1110 can generate a footer (e.g., 64 bits) that is included after the one or more payload messages.
  • a burst counter can be used and can increment each time a new burst is received (e.g., each time a new start flag is encountered).
  • a segment counter can be used and can increment for each payload message that is included in the burst data structure.
  • Each burst can be arbitrarily sized (e.g., 4 packets, 2 packets, 10 packets, then 1 packet, etc.).
  • the segment counter can be reset each time a new burst is seen.
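The following short sketch illustrates the counter behavior described in the items above: the burst counter increments when a new burst (start flag) is seen, the segment counter increments per payload message and resets at each new burst, and the lower 2 bits of the burst counter serve as the burst ID carried in the header. The state structure and function names are assumptions made for illustration.

```c
#include <stdint.h>

typedef struct {
    uint32_t burst_counter;     /* increments on each new burst            */
    uint32_t segment_counter;   /* increments per payload message in burst */
} framer_state;

static void on_burst_start(framer_state *s)
{
    s->burst_counter++;         /* new start flag encountered              */
    s->segment_counter = 0;     /* segment numbering restarts per burst    */
}

static uint32_t on_payload_message(framer_state *s)
{
    return s->segment_counter++;        /* number assigned to this segment */
}

static uint32_t header_burst_id(const framer_state *s)
{
    return s->burst_counter & 0x3;      /* lower 2 bits of burst counter   */
}
```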
  • the adaptation layer, represented by the data formatting module 1100, is then configured to convert the data to the data structure illustrated in FIG. 12A. That is, the RX parse module 1175 is configured to receive packets in the format represented in FIG. 12A and break them back down into a sample stream and an associated valid signal. This is the data structure used to store the data in the DCX and/or elsewhere in the SDP architecture.
  • the data structure includes a combined header and footer in the first two words (e.g., 256 bits). This size can be set based on the characteristics of the memory of the DCX. This allows the components to do a two-word read to read all metadata associated with a burst. Consequently, the component knows exactly how much data to read from memory to read in all the data for a burst.
  • the RX parse module 1175 provides a pull interface, and the high-speed serial transmit module 1190 can be configured to throttle how fast it pulls data from this interface.
  • Messages in the SDP architecture can be transmitted a word at a time (e.g., 128 bits at a time).
  • the data structures can have a width of one word and a depth of anywhere up to 2048, for example.
  • the maximum message depth can be configured in the SDP architecture.
  • the depth can be 16, 32, 64, 128, 2048.
  • the maximum depth includes 2 header rows, payload rows, and 1 dead cycle after the end of the burst data.
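  • For example, assuming a configured maximum depth of 2048 rows, 2 rows are consumed by the header/footer metadata and 1 row by the dead cycle after the end of the burst data, which would leave up to 2045 rows for payload; this figure is inferred from the stated accounting rather than given explicitly in the disclosure.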
  • the adaptation layer can perform a method for converting between sample streams or symbols streams and messages in a signal processing architecture for storing and processing by a processing node that includes a digital signal processor (DSP) core and an extended direct memory access controller (DCX).
  • the method includes receiving a sample stream that includes a burst to be processed by the signal processing architecture.
  • the method also includes generating a header message including information related to the burst.
  • the method also includes splitting the sample stream into a plurality of burst messages, a size of each burst message, except for a final burst message corresponding to an end of the burst, corresponding to a buffer size in the DCX, the size of the final burst message being less than or equal to the buffer size in the DCX.
  • the method also includes generating a footer message including information related to a size of the plurality of burst messages.
  • the method also includes transferring a burst interface packet to the DCX, the burst interface packet including the header message, the plurality of burst messages, and the footer message.
  • the method also includes reformatting the burst interface packet into a burst memory packet for storage in the DCX, the burst memory packet including the header message and the footer message in an initial portion of the burst memory packet and the plurality of burst messages following the initial portion of the burst memory packet.
  • the initial portion of the burst memory packet indicates the number of burst messages in the burst memory packet.
  • the method of the adaptation layer can also identify a start flag and an end flag within the sample stream to determine end points of the burst in the sample stream.
  • the method of the adaptation layer can also identify a first start flag and a second start flag within the sample stream to determine end points of the burst in the sample stream, the end points being the first start flag and data preceding but not including the second start flag.
  • splitting the sample stream into the plurality of burst messages is responsive to identifying a start of frame indicator in the sample stream. In some embodiments, splitting the sample stream into the plurality of burst messages terminates responsive to identifying an end of frame indicator in the sample stream.
  • the adaptation layer in a non-streaming mode, is configured to convert a first word size of data in the burst interface packet to a second word size that is compatible with the DCX, the second word size being greater than the first word size.
  • SDP memory (e.g., shared memory in a DCX, CMA, or other on-chip memory) may comprise memory read/write logic that not only performs read/write interleaving across multiple banks of memory (e.g., RAM) to provide very high read/write throughput but also provides flexibility with respect to data format or resolution changes in the data written into and read out of the memory.
  • FIG. 13 illustrates a memory module 1300 in a DCX, the memory module 1300 including read ports 1301, write ports 1303, a memory arbiter 1305, and memory banks 1307.
  • the memory module 1300 is split into multiple channels and banks to provide multiple access into the memory simultaneously.
  • the channels can be high-order interleaved and the banks can be low-order interleaved where each memory module can be selected based on the bank and the channel derived from the address requested.
  • the memory module 1300 can be a single port memory configured to allow either a read or a write in a single clock cycle. By interleaving several of these memory modules, a higher read and write bandwidth can be achieved because each of these memory modules can be either read from or written to simultaneously.
  • the memory module 1300 can be configured and operated in a way that effectively creates a multi-port memory, e.g., a quad-port memory that allows four simultaneous reads or writes every clock cycle.
  • the read ports 1301 and/or write ports 1303 can be configured to access memory sequentially. Single random reads and writes can be accomplished in the memory module 1300. However, for large data transfers the requested address can be configured to increase sequentially every clock cycle to achieve high data transfer rates.
  • the memory arbiter 1305 is configured to manage access to the memory banks 1307 in the event simultaneous access is requested. In some embodiments, the memory arbiter 1305 can determine access priority by assigning a higher priority to the request that is received from a hardware block of the signal processing architecture. In some embodiments, the memory arbiter 1305 can determine access priority by assigning a lower priority to the component that accessed the memory module 1300 most recently. The other requests are delayed for a clock cycle before determining access once more.
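The following sketch illustrates one way the channel/bank address decode and the arbitration policy described in the items above could be expressed: channels are selected from high-order address bits, banks from low-order address bits, and when two requesters collide on the same bank the requester that was granted access most recently receives the lower priority. The bit widths, the number of banks and channels, and the two-requester model are assumptions for illustration only.

```c
#include <stdint.h>

#define BANK_BITS     2   /* 4 low-order interleaved banks (assumed)     */
#define CHANNEL_BITS  1   /* 2 high-order interleaved channels (assumed) */
#define ADDR_BITS    16   /* total word-address width (assumed)          */

static unsigned bank_of(uint32_t addr)
{
    return addr & ((1u << BANK_BITS) - 1);                  /* low-order bits */
}

static unsigned channel_of(uint32_t addr)
{
    return (addr >> (ADDR_BITS - CHANNEL_BITS)) & ((1u << CHANNEL_BITS) - 1);
}

/* Two requests proceed in parallel unless they map to the same bank/channel. */
static int collide(uint32_t addr_a, uint32_t addr_b)
{
    return bank_of(addr_a) == bank_of(addr_b) &&
           channel_of(addr_a) == channel_of(addr_b);
}

/* On a collision, grant the requester (0 or 1) that did NOT access most
 * recently; the other request is delayed by one clock cycle and retried. */
static int arbitrate(int last_granted)
{
    return last_granted == 0 ? 1 : 0;
}
```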
  • FIG. 14A illustrates an example memory module 1400 of a capture memory array (CMA) 1410 that includes a memory bank 1407 that is split into multiple channels and banks to provide simultaneous multiple-access capability of the memory, the CMA 1410 being similar to the CMA described herein with reference to FIGS. 6A, 6B, and 7A.
  • the channels may be high-order interleaved and the banks may be low-order interleaved.
  • the memory module 1400 also includes a read port 1401, a write port 1403, and a memory arbiter 1405.
  • the memory arbiter 1405 acts similarly to the memory arbiter 1305 described herein with reference to FIG. 13.
  • FIG. 14B illustrates the CMA 1410 with multiple memory modules 1400 where each memory module 1400 can be selected based on the bank and the channel derived from a requested address.
  • Each memory bank 1407 may be made up of 8 smaller single port RAMs. These RAMs may be arranged in groups of 4 that are interleaved on the lower address bits, making each consecutive access go to a separate RAM. Additionally, these two groups of 4 RAMs may make up the upper half and the lower half of the memory region. Each of these memories may use handshake signals to indicate the process (read/write) requesting access. When access requests go to the same RAM, the memory arbiter 1405 can be used to determine which request to delay. When access requests go to separate RAMs, the read and write processes can be performed in parallel.
  • arbitration logic may be used to determine when a last access to the RAM occurred and decide who (e.g., whether the read or the write) should get access. This may give both read and write processes fair access to the RAM in case one of the processes tries to access the same RAM consecutively, thereby avoiding lockup conditions.
  • the memory module may be split into multiple channels and banks to provide multiple access into the memory module simultaneously.
  • the channels may basically be high-order interleaved and the banks may be low-order interleaved, where each memory module may be selected based on the bank and the channel derived from the address requested.
  • Each memory bank 1407 may be a single port memory allowing either a read or write process in a single clock cycle. However, due to interleaving of several memory modules, a higher read write bandwidth may be achieved as different memory modules can be either read from or written to simultaneously.
  • the memory module 1400 may create a quad-port memory that allows four simultaneous reads or writes every clock cycle.
  • the ports may be used for large data transfers and hence may access memory sequentially, for example. Single random reads and writes may also be allowed, though for large data transfers the address requested may increase sequentially every clock cycle to achieve high data transfer rates.
  • the memory module 1400 can have an interface configured to allow requests from a network interconnect to read from and write into the CMA 1410.
  • the memory module 1400 may also have a particular handshake-based interface to allow DMA to directly access the memory bank 1407.
  • the CMA 1410 may contain memory banks that allow storing a number of data samples. This memory may be continuous in the memory map. Each memory bank may be a dual port memory and may be designed using multiple interleaved single port memories and an arbiter, as described herein.
  • a memory arbiter, such as the memory arbiter 1305 or the memory arbiter 1405, performs a method for controlling access to memory in a signal processing architecture.
  • the memory can be part of on-chip memory, such as a capture memory array (CMA), or part of a DCX.
  • the memory includes a plurality of random access memory (RAM) modules, each RAM module being logically split into a plurality of memory banks that are sequentially arranged.
  • the method includes receiving a plurality of requests to access RAM modules in the memory, each request of the plurality of requests including a memory address in the memory corresponding to a memory bank within a particular RAM module.
  • the method also includes, for each request, deriving from the memory address in the request a particular bank of the plurality of banks in the RAM module, the particular bank including the memory address in the request.
  • the method also includes, responsive to determining that two requests of the plurality of requests request access to the same bank in the same RAM module, determining a priority among the two requests; granting access to the requested bank to the request of the two requests with a higher priority; and delaying the request of the two requests with a lower priority by a clock cycle.
  • the method also includes, for each request that requests consecutive access to the memory, granting access to a bank of the plurality of memory banks that is sequentially after the bank in the request.
  • the plurality of banks is low-ordered interleaved and the number of banks of the plurality of banks is a power of 2.
  • the plurality of RAM modules is further divided into a plurality of channels, the number of channels of the plurality of channels is a power of 2. In such embodiments, the plurality of channels can be high-ordered interleaved.
  • the memory arbiter is further configured to, responsive to determining that two requests of the plurality of requests request access to different banks in the same RAM module, grant the two requests simultaneous access to the respective requested banks. In some embodiments, determining the priority comprises assigning a lower priority to the request that most recently accessed the requested RAM module.
  • the order of the steps and/or phases can be rearranged and certain steps and/or phases may be omitted entirely.
  • the methods described herein are to be understood to be open-ended, such that additional steps and/or phases to those shown and described herein can also be performed.
  • Computer software can comprise computer executable code stored in a computer readable medium (e.g., non-transitory computer readable medium) that, when executed, performs the functions described herein.
  • computer-executable code is executed by one or more general purpose computer processors.
  • any feature or function that can be implemented using software to be executed on a general-purpose computer can also be implemented using a different combination of hardware, software, or firmware.
  • such a module can be implemented completely in hardware using a combination of integrated circuits.
  • such a feature or function can be implemented completely or partially using specialized computers designed to perform the particular functions described herein rather than by general purpose computers.
  • Multiple distributed computing devices can be substituted for any one computing device described herein.
  • the functions of the one computing device are distributed (e.g., over a network) such that some functions are performed on each of the distributed computing devices.
  • any such computer program instructions may be loaded onto one or more computers, including without limitation a general-purpose computer or special purpose computer, or other programmable processing apparatus to produce a machine, such that the computer program instructions which execute on the computer(s) or other programmable processing device(s) implement the functions specified in the equations, algorithms, and/or flowcharts. It will also be understood that each equation, algorithm, and/or block in flowchart illustrations, and combinations thereof, may be implemented by special purpose hardware-based computer systems which perform the specified functions or steps, or combinations of special purpose hardware and computer-readable program code logic means.
  • computer program instructions such as embodied in computer-readable program code logic, may also be stored in a computer readable memory (e.g., a non-transitory computer readable medium) that can direct one or more computers or other programmable processing devices to function in a particular manner, such that the instructions stored in the computer-readable memory implement the function(s) specified in the block(s) of the flowchart(s).
  • the computer program instructions may also be loaded onto one or more computers or other programmable computing devices to cause a series of operational steps to be performed on the one or more computers or other programmable computing devices to produce a computer-implemented process such that the instructions which execute on the computer or other programmable processing apparatus provide steps for implementing the functions specified in the equation(s), algorithm(s), and/or block(s) of the flowchart(s).
  • the computer system may, in some cases, include multiple distinct computers or computing devices (e.g., physical servers, workstations, storage arrays, etc.) that communicate and interoperate over a network to perform the described functions.
  • Each such computing device typically includes a processor (or multiple processors) that executes program instructions or modules stored in a memory or other non-transitory computer-readable storage medium or device.
  • the various functions disclosed herein may be embodied in such program instructions, although some or all of the disclosed functions may alternatively be implemented in application-specific circuitry (e.g., ASICs or FPGAs) of the computer system. Where the computer system includes multiple computing devices, these devices may, but need not, be co-located.
  • the results of the disclosed methods and tasks may be persistently stored by transforming physical storage devices, such as solid state memory chips and/or magnetic disks, into a different state.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Transceivers (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

Radio systems and transceivers are described with a software defined physical layer (SDP) architecture that incorporates processing nodes with existing hardware blocks. The processing nodes provide a soft-processing design that supports existing functionality and also enables future updates to provide additional and/or improved flexibility. The processing nodes can be added to existing hardware designs to support existing functionality, to provide additional processing power to existing hardware blocks when requested or needed, and to provide expanded capabilities by taking over certain functions performed by existing hardware blocks or by replacing the existing hardware blocks in the processing chain. The processing nodes allow radio systems to maintain their core design while adding a soft-processing design that can assist in performing the core tasks of the hardware blocks and that can eventually take over and replace the functionality of the hardware blocks.

Description

PROCESSING NODES FOR SIGNAL PROCESSING IN RADIO TRANSCEIVERS
CROSS-REFERENCE TO RELATED APPLICATION(S)
[0001] This application claims the benefit of priority to U.S. Prov. App. No. 63/516,434 filed July 28, 2023 and entitled PROCESSING NODES FOR SIGNAL PROCESSING IN RADIO TRANSCEIVERS, the entire contents of which is incorporated by reference in its entirety for all purposes.
BACKGROUND
Field
[0002] The present disclosure generally relates to processing nodes for signal processing in radio transceivers.
Description of Related Art
[0003] Signal processing in a radio transceiver involves various operations to manipulate and enhance received and transmitted signals. A radio transceiver can receive an analog radio-frequency (RF) signal and convert it into a digital signal via analog to digital conversion (ADC) and relatedly, the radio transceiver can convert a digital signal ready for transmission via digital to analog conversion (DAC). Digital signal processing includes the processing steps performed on the digital signal (e.g., after processing by the ADC and/or before processing by the DAC). Digital signal processing can be applied to extract, filter, and enhance useful information in the signal. This processing may include processes or operations such as filtering, equalization, modulation, demodulation, channel coding or decoding, error correction, and noise reduction. Other processes that can be performed include frequency conversion (e.g., converting to a baseband frequency or to a carrier frequency), gain control, amplification, and the like.
SUMMARY
[0004] According to a number of implementations, the present disclosure relates to a processing node (PN) that includes a first digital signal processor (DSP) core; a second DSP core; a plurality of extended direct memory access controllers (DCX), each DCX having shared memory space, an input packet interface, and an output packet interface, the input packet interface configured to receive samples from a hardware block separate from the processing node, the shared memory space configured to store the received samples, and the output packet interface configured to transmit samples processed by the first DSP core or the second DSP core to the hardware block; and a PN network interconnect configured to communicably couple the first DSP core, the second DSP core, and the plurality of DCX, each DSP core and DCX coupled to the PN network interconnect through a respective master interface and a respective slave interface, the PN network interconnect further including an SDP master interface and an SDP slave interface each configured to communicate with an SDP network interconnect. The processing node is configured to be integrated into a radio transceiver comprising the hardware block and to interface with the hardware block to provide configurable processing functionality to the radio transceiver.
[0005] In some embodiments, the PN network interconnect further includes a configuration interface configured to enable the processing node to configure the hardware block. In some embodiments, the processing node further includes a queue interface configured to transfer commands or data from the first DSP core to the second DSP core and to transfer commands or data from the second DSP core to the first DSP core. In some embodiments, the processing node further includes a first queue interface and a second queue interface, the first queue interface configured to transfer commands or data from the first DSP core to the second DSP core, the second queue interface configured to transfer commands or data from the second DSP core to the first DSP core.
[0006] In some embodiments, each DSP core includes a general-purpose input-output (GPIO) port connected to a configuration register and configured to receive input for placement in the configuration register and to transmit data stored in the configuration register. In some embodiments, each DSP core is configured to receive interrupt requests through the PN network interface from the hardware block that is separate from the processing node. [0007] In some embodiments, a first DCX of the plurality of DCX is configured to: receive a plurality of samples, from a first hardware block, to be processed through the input packet interface; temporarily store the plurality of samples in the shared memory space; and transmit the plurality of samples to the first DSP core through the PN network interconnect. In some embodiments, the first DSP core or the second DSP core is configured to: program the first DCX to convey the plurality of samples to the first DSP core; place the plurality of samples into an internal memory space of the first DSP core; process the plurality of samples; and place the processed samples into the internal memory space of the first DSP core. In some embodiments, the first DCX is further configured to reformat the received plurality of samples. In some embodiments, the first DCX is configured to reformat the received plurality of samples by sign extending samples of the received plurality of samples to increase the number of bits for each sample. In some embodiments, the first DCX is configured to reformat the received plurality of samples by bit clipping samples of the received plurality of samples to reduce a resolution of each sample.
[0008] In some embodiments, a first DCX of the plurality of DCX is configured to: receive a plurality of samples to be processed through the input packet interface; temporarily store the plurality of samples in the shared memory space; and transmit the plurality of samples to the hardware block separate from the processing node for processing using the output packet interface. In some embodiments, the first DCX is configured to: receive the processed plurality of samples from the hardware block; and temporarily store the processed plurality of samples in the shared memory space.
[0009] In some embodiments, the first DSP core and the second DSP core are configured to be used both as separate entities and as a shared dual-core configuration. In some embodiments, the first DSP core and the second DSP core each include two processors and the plurality of DCX includes a DCX for each processor of the first DSP core and the second DSP core. In some embodiments, the SDP master interface and the SDP slave interface of the PN network interconnect are configured to communicate with a PN network interconnect of a different processing node in the radio transceiver via the SDP network interconnect. [0010] In some embodiments, the first DSP core, the second DSP core, and each of the plurality of DCX includes a configuration register configured to store data to configure the associated DSP core or DCX. In some embodiments, the processing node is configured to be implemented within a demodulator of the radio transceiver. In some embodiments, the processing node is configured to be implemented within a decoder of the radio transceiver. In some embodiments, the processing node is configured to be implemented within a modulator or encoder of a transmitter of the radio transceiver.
[0011] According to a number of implementations, the present disclosure relates to a signal processing architecture comprising: a software defined physical layer (SDP) network interconnect; a plurality of processing nodes connected to the SDP network interconnect and configured to provide configurable processing power to process receiver and transmitter waveforms in a radio transceiver, each processing node including: a plurality of digital signal processing (DSP) cores; a plurality of extended direct memory access controllers (DCX); and a PN network interconnect connected to the plurality of DSP cores, to the plurality of DCX, and to the SDP network interconnect; a capture memory array (CMA) comprising a plurality of memory banks that are connected to the SDP network interconnect to provide access to the plurality of memory banks for the plurality of processing nodes; and a CPU subsystem connected to the SDP network interconnect, wherein the SDP network interconnect enables communication among each of the plurality of processing nodes, the CMA, and the CPU subsystem to augment processing power and functionality in the radio transceiver.
[0012] In some embodiments, one or more of the plurality of processing nodes can be dynamically allocated to provide signal processing power to one or more hardware blocks of the radio transceiver. In some embodiments, each processing node of the plurality of processing nodes is configured to interface with one or more individual hardware blocks of both receiver and transmitter signal processing data paths in the radio transceiver.
[0013] In some embodiments, a first processing node of the plurality of processing nodes is implemented in an encoder or modulator of the radio transceiver. In some embodiments, a second processing node of the plurality of processing nodes is implemented in a demodulator of the radio transceiver. In some embodiments, a third processing node of the plurality of processing nodes is implemented in a decoder of the radio transceiver.
[0014] In some embodiments, the signal processing architecture further includes an external memory connected to the CPU subsystem, the plurality of processing nodes configured to pass data from individual DSP cores to the external memory through the SDP network interconnect. In some embodiments, individual processing nodes of the plurality of processing nodes are integrated within different portions of a demodulator. In some embodiments, each processing node includes an SDP master interface and an SDP slave interface to the SDP network interconnect, the CMA includes a plurality of SDP master interfaces to the SDP network interconnect, and the CPU subsystem includes an SDP master interface and an SDP slave interface to the SDP network interconnect. In some embodiments, the signal processing architecture further includes a second SDP network interconnect connected to the SDP network interconnect; and a second plurality of processing nodes connected to the second SDP network interconnect, each processing node of the second plurality of processing nodes including one or more digital signal processing (DSP) cores, one or more extended direct memory access controllers (DCX), and a PN network interconnect connected to the plurality of DSP cores, to the plurality of DCX, and to the second SDP network interconnect.
[0015] According to a number of implementations, the present disclosure relates to a method for passing data to a processing node in a signal processing architecture that includes a software defined physical layer (SDP) network interconnect connected to the processing node, the processing node including a digital signal processing (DSP) core, an extended direct memory access controller (DCX), a packet receiver, a packet transmitter, and a PN network interconnect connected to the SDP network interconnect, the method comprising: utilizing a buffer pointer queue to manage available memory, the buffer pointer queue comprising a plurality of buffer pointers that each identify a buffer address in memory that is available for storing data; utilizing a work descriptor queue to manage burst data to be processed, the work descriptor queue comprising a plurality of work descriptors that each identify a buffer address in memory that includes burst data to be processed; responsive to the packet receiver receiving burst data to be processed: retrieving a first buffer pointer from the buffer pointer queue; processing the received burst data; storing the processed burst data in memory at the buffer address identified by the first buffer pointer; outputting a new work descriptor, the new work descriptor including the buffer address identified by the first buffer pointer; and responsive to the work descriptor queue having processed data to be transmitted: retrieving a first work descriptor from the work descriptor queue; obtaining the processed data from memory at the buffer address identified by the first work descriptor; outputting a new buffer pointer, the new buffer pointer corresponding to the buffer address identified by the first work descriptor; and transmitting the processed data by the packet transmitter.
[0016] In some embodiments, the work descriptor further includes a data header length indicating an amount of storage occupied by a data header associated with the burst data. In some embodiments, the work descriptor further includes: a burst start flag indicating that the burst data belongs to a first packet of a burst; and a burst end flag indicating that the burst data belongs to a last packet of a burst. In some embodiments, the work descriptor indicates the burst data is a fully contained burst by setting the burst start flag and the burst end flag to true.
[0017] In some embodiments, the method further includes adding the new work descriptor to the work descriptor queue. In some embodiments, the method further includes adding the new buffer pointer to the buffer pointer queue. In some embodiments, the method further includes, responsive to receiving the buffer pointer queue and the work descriptor queue: retrieving a second work descriptor from the work descriptor queue; obtaining data from memory at the buffer address indicated by the second work descriptor; retrieving a second buffer pointer from the buffer pointer queue; processing the retrieved data to generate output processed data; storing the output processed data in memory at the buffer address indicated by the second buffer pointer; outputting a new work descriptor, the new work descriptor including the buffer address indicated by the second buffer pointer; and outputting a new buffer pointer, the new buffer pointer indicating the buffer address indicated by the second work descriptor. In some embodiments, each work descriptor further includes a burst identifier of the burst data to be processed and a burst length indicating an amount of storage occupied by the burst data to be processed.
[0018] In some embodiments, the method further includes monitoring a fill level of the work descriptor queue by the DCX; responsive to determining that the fill level is below a threshold fill level, adding one or more work descriptors to the work descriptor queue from a work descriptor list.
[0019] In some embodiments, the method further includes monitoring a fill level of the buffer pointer queue by the DCX; responsive to determining that the fill level is below a threshold fill level, adding one or more buffer pointers to the buffer pointer queue from a buffer pointer list.
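As a rough illustration of the buffer-pointer/work-descriptor flow summarized in paragraphs [0015]-[0019], the following sketch shows a receive path that consumes a buffer pointer and produces a work descriptor, and a transmit path that consumes a work descriptor and returns a buffer pointer. The queue primitives, structure fields, and helper functions are assumptions made for the sketch, not the disclosed interfaces.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t addr; uint32_t size; } buffer_pointer;

typedef struct {
    uint32_t addr;         /* buffer address holding the burst data        */
    uint32_t length;       /* amount of storage occupied by the burst data */
    uint8_t  burst_start;  /* burst data belongs to first packet of burst  */
    uint8_t  burst_end;    /* burst data belongs to last packet of burst   */
} work_descriptor;

/* Hypothetical queue and memory primitives assumed to exist elsewhere. */
extern int  bp_pop(buffer_pointer *bp);
extern void bp_push(const buffer_pointer *bp);
extern int  wd_pop(work_descriptor *wd);
extern void wd_push(const work_descriptor *wd);
extern void mem_write(uint32_t addr, const void *src, size_t len);
extern void mem_read(uint32_t addr, void *dst, size_t len);
extern void phy_transmit(const void *data, size_t len);

/* Receive side: store the burst at a free buffer and queue a work descriptor. */
static void on_burst_received(const void *data, size_t len, int start, int end)
{
    buffer_pointer bp;
    if (!bp_pop(&bp) || len > bp.size)
        return;                                   /* no suitable free buffer */
    mem_write(bp.addr, data, len);
    work_descriptor wd = { bp.addr, (uint32_t)len,
                           (uint8_t)start, (uint8_t)end };
    wd_push(&wd);                                 /* hand burst to next stage */
}

/* Transmit side: fetch the burst, send it, and return the buffer for reuse. */
static void on_transmit_ready(void *scratch, size_t scratch_len)
{
    work_descriptor wd;
    if (!wd_pop(&wd) || wd.length > scratch_len)
        return;
    mem_read(wd.addr, scratch, wd.length);
    phy_transmit(scratch, wd.length);
    buffer_pointer bp = { wd.addr, wd.length };   /* a real design would keep
                                                     the original buffer size */
    bp_push(&bp);
}
```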
[0020] According to a number of implementations, the present disclosure relates to a method for converting between sample streams or symbols streams and messages in a signal processing architecture for storing and processing by a processing node that includes a digital signal processor (DSP) core and an extended direct memory access controller (DCX), the method comprising: receiving a sample stream that includes a burst to be processed by the signal processing architecture; generating a header message including information related to the burst; splitting the sample stream into a plurality of burst messages, a size of each burst message, except for a final burst message corresponding to an end of the burst, corresponding to a buffer size in the DCX, the size of the final burst message being less than or equal to the buffer size in the DCX; generating a footer message including information related to a size of the plurality of burst messages; transferring a burst interface packet to the DCX, the burst interface packet including the header message, the plurality of burst messages, and the footer message; and reformatting the burst interface packet into a burst memory packet for storage in the DCX, the burst memory packet including the header message and the footer message in an initial portion of the burst memory packet and the plurality of burst messages following the initial portion of the burst memory packet, wherein the initial portion of the burst memory packet indicates the number of burst messages in the burst memory packet. [0021] In some embodiments, the sample stream is received by a component of the processing node. In some embodiments, the method further includes identifying a start flag and an end flag within the sample stream to determine end points of the burst in the sample stream. In some embodiments, the method further includes identifying a first start flag and a second start flag within the sample stream to determine end points of the burst in the sample stream, the end points being the first start flag and data preceding but not including the second start flag. In some embodiments, each packet of the sample stream has a size in bits equal to a size of a word in the DCX. In some embodiments, splitting the sample stream into the plurality of burst messages is responsive to identifying a start of frame indicator in the sample stream. In some embodiments, splitting the sample stream into the plurality of burst messages terminates responsive to identifying an end of frame indicator in the sample stream. In some embodiments, the header message includes a burst counter that increments responsive to identifying a boundary of the burst. In some embodiments, the initial portion of the burst memory packet is sized to be less than or equal to a size of two words in memory of the DCX. In some embodiments, reformatting the burst interface packet further includes converting a first word size of data in the burst interface packet to a second word size that is compatible with the DCX, the second word size being greater than the first word size.
[0022] According to a number of implementations, the present disclosure relates to a method for accessing memory in a signal processing architecture that includes a capture memory array (CMA), the CMA including a plurality of random access memory (RAM) modules, each RAM module being logically split into a plurality of memory banks that are sequentially arranged, the method comprising: receiving a plurality of requests to access RAM modules in the CMA, each request of the plurality of requests including a memory address in the CMA corresponding to a memory bank within a particular RAM module; for each request, deriving from the memory address in the request a particular bank of the plurality of banks in the RAM module, the particular bank including the memory address in the request; responsive to determining that two requests of the plurality of requests request access to the same bank in the same RAM module: determining a priority among the two requests; granting access to the requested bank to the request of the two requests with a higher priority; and delaying the request of the two requests with a lower priority by a clock cycle; and for each request that requests consecutive access to the CMA, granting access to a bank of the plurality of memory banks that is sequentially after the bank in the request.
[0023] In some embodiments, the plurality of banks is low-ordered interleaved and the number of banks of the plurality of banks is a power of 2. In some embodiments, the plurality of RAM modules is further divided into a plurality of channels, the number of channels of the plurality of channels is a power of 2. In some embodiments, the plurality of channels is high-ordered interleaved.
[0024] In some embodiments, the method further includes, responsive to determining that two requests of the plurality of requests request access to different banks in the same RAM module, granting simultaneous access to the two requests to the respective requested banks. In some embodiments, determining the priority comprises assigning a lower priority to the request that most recently accessed the requested RAM module.
[0025] For purposes of summarizing the disclosure, certain aspects, advantages and novel features have been described herein. It is to be understood that not necessarily all such advantages may be achieved in accordance with any particular embodiment. Thus, the disclosed embodiments may be carried out in a manner that achieves or optimizes one advantage or group of advantages as taught herein without necessarily achieving other advantages as may be taught or suggested herein.
BRIEF DESCRIPTION OF THE DRAWINGS
[0026] FIG. 1 illustrates an example radio system with a modem that incorporates a plurality of processing nodes with a plurality of hardware blocks.
[0027] FIG. 2 illustrates an example processing node that includes a plurality of digital signal processing (DSP) cores and a plurality of extended direct memory access (DMA) controllers (DCX).
[0028] FIG. 3A illustrates a diagram of an example DCX of a processing node, such as the processing node of FIG. 2.
[0029] FIG. 3B illustrates a diagram of an example core of a DSP core of a processing node, such as the processing node of FIG. 2.
[0030] FIGS. 3C and 3D illustrate a detailed diagram of the example DCX of FIG. 3A.
[0031] FIG. 4 illustrates a diagram of data flow through a processing node, such as the DCX and the DSP core of FIGS. 3A and 3B.
[0032] FIG. 5A illustrates an example of a shared memory model for DSP cores in a processing node.
[0033] FIG. 5B illustrates an example of a producer-consumer model for DSP cores in a processing node.
[0034] FIGS. 6A and 6B illustrate example software-defined physical layer (SDP) architectures.
[0035] FIG. 7A illustrates an example radio transceiver that includes a CPU subsystem, a demodulator, a high-speed serial module, a decoder module, and a transmit module, the example radio transceiver including an SDP architecture similar to the SDP architecture of FIG. 6B.
[0036] FIG. 7B illustrates the demodulator back module of FIG. 7A in greater detail to show connections between hardware blocks and a processing node.
[0037] FIG. 8A illustrates an example of a buffer pointer.
[0038] FIG. 8B illustrates an example of a work descriptor.
[0039] FIG. 9 illustrates an example of a component that utilizes buffer pointers and work descriptors to facilitate data processing in an SDP architecture.
[0040] FIG. 10A illustrates receiving data at a packet receiver through a portion of a data path in an SDP architecture, components of the SDP architecture operating like the component of FIG. 9.
[0041] FIG. 10B illustrates passing data through a portion of a data path in an SDP architecture to a packet transmitter for transmission, components of the SDP architecture operating like the component of FIG. 9.
[0042] FIG. 10C illustrates an example of a flow of data for software-based processing using buffer pointers and work descriptors in an SDP architecture, components of the SDP architecture operating like the component of FIG. 9.
[0043] FIG. 10D illustrates an example of a flow of data that is stored in on-chip memory using buffer pointers and work descriptors in an SDP architecture, components of the SDP architecture operating like the component of FIG. 9.
[0044] FIG. 11A illustrates an example of a data formatting module configured to receive digitized data from a high-speed serial receive module, similar to the HSS/ADC RX or HSS/SDP RX modules described herein with reference to FIG. 7A.
[0045] FIG. 11B illustrates an example of a data formatting module configured to receive processed data from a DCX packet transmit interface and to prepare the processed data for a high-speed serial transmit module, similar to the HSS/DAC TX or HSS/SDP TX modules described herein with reference to FIG. 7A.
[0046] FIGS. 12A and 12B illustrate packet formats for data in the SDP architecture.
[0047] FIG. 13 illustrates a memory module in a DCX, the memory module including read ports, write ports, a memory arbiter, and memory banks.
[0048] FIG. 14A illustrates an example memory module of a capture memory array (CMA) that includes a memory bank that is split into multiple channels and banks to provide multiple access capability of the memory simultaneously, the CMA being similar to the CMA described herein with reference to FIGS. 6A, 6B, and 7A.
[0049] FIG. 14B illustrates the CMA with multiple memory modules where each memory module can be selected based on the bank and the channel derived from a requested address.
DETAILED DESCRIPTION OF SOME EMBODIMENTS
[0050] The headings provided herein, if any, are for convenience only and do not necessarily affect the scope or meaning of the claimed invention.
Overview
[0051] A radio system can be used to transmit and receive signals in a variety of architectures, such as land-based radio systems, satellite systems, hybrid systems with land- and satellite-based radios, and the like. Signal processing in a radio system involves various operations to manipulate and enhance received and transmitted signals. Signal processing can be accomplished using hardware that is specifically or specially designed to accomplish certain tasks such as modulation, demodulation, encoding or decoding, error correction, noise reduction, and the like. The physical layer (PHY) of a radio communication system encompasses the components and processes involved in the transmission and reception of the physical signals. It deals with the physical characteristics of the transmitted and received signals, including their modulation, coding, and transmission over the physical medium. The physical layer of a radio system encompasses the hardware components, modulation techniques, coding schemes, transmission medium, antennas, and signal conditioning processes involved in transmitting and receiving the physical signals.
[0052] In certain radio systems, hardware can be specifically designed to perform certain signal processing functions at the physical layer of the radio system. These can include, for example, application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), digital signal processors (DSPs), and the like. In radio systems that deal with different waveforms, such as in a satellite system that communicates with different satellites, each waveform may have hardware specific to that waveform. This means that a radio system that communicates with multiple satellites with different waveforms may have an ASIC and/or other hardware components specific to each different waveform. In addition, a radio system (e.g., a terminal in a satellite system) that includes specifically designed hardware to perform signal processing tasks lacks flexibility to add capabilities or improvements in the future.
[0053] Accordingly, described herein are radio systems and transceivers with a software-defined physical layer (SDP) architecture that incorporate processing nodes with existing hardware blocks. The processing nodes provide a soft-processing design that supports existing functionality and enables future updates to provide additional and/or improved flexibility. The processing nodes may also be able to adapt or to be reconfigured to process new waveforms to enable communication with new or different satellites even when the existing signal processing hardware (e.g., hardware blocks, processing nodes, etc.) is not specifically configured to operate with the new waveforms. As used herein, the term hardware block can be used to refer to any signal processing component and/or module that may be used in modem designs for radio systems and/or transceivers.
[0054] The disclosed systems, devices, architectures, and methods that incorporate the disclosed processing nodes support legacy abilities in that the processing nodes are added to existing hardware designs and configured to operate as such. This allows the processing nodes to support existing functionality, provide additional processing power to existing hardware blocks when requested or needed, and to provide expanded capabilities by taking over certain functions performed by existing hardware blocks or by replacing the existing hardware blocks in the processing chain. The disclosed processing nodes can support signal processing during the initial deployment of a terminal and can be fine-tuned over time to provide expanded and/or refined functionality. Thus, the disclosed processing nodes allow radio systems, such as terminals, to maintain their core design (e.g., the same hardware blocks performing the same functions) while adding a soft-processing design that can assist in performing the core tasks of the hardware blocks and that can eventually take over and replace the functionality of the hardware blocks.
[0055] In some implementations, the disclosed processing nodes include a plurality of digital signal processor (DSP) cores. The processing nodes may be configured to interface with existing hardware blocks in the radio system, allowing communication between the processing nodes and the hardware blocks. In the radio system, multiple processing nodes can be added to augment or assist certain hardware blocks with a variety of signal processing tasks. In certain implementations, each processing node can be structurally and/or architecturally identical but can be programmed to perform different signal processing tasks according to the hardware blocks with which the processing node is associated or according to the hardware block that the processing node is replacing. The processing nodes can be configured to operate independently and can be added at any location in the radio system where corresponding processing is desirable. The processing nodes can be added to provide software-defined functionality, which can advantageously enable augmented capabilities relative to existing hardware blocks.
[0056] Each processing node can be configured to provide a control and configuration interface with existing hardware blocks. This allows the hardware blocks to access the processing and memory of the processing nodes. The processing nodes can provide a bypass processing route (e.g., bypassing the hardware block(s)) or an enhanced processing route (e.g., providing additional functionality to the hardware block(s)). As used herein, hardware blocks can refer to components that typically provide the functionality of the digital receiver/transmitter physical layer in a radio transceiver. In some implementations, a hardware block is a signal processing component of the physical layer in a radio transceiver that is separate from the disclosed processing nodes.
[0057] Some radio systems employ a software-defined radio (SDR). SDR refers to a radio system in which many traditional hardware components of a radio transceiver are replaced or augmented by software processing. In an SDR, the majority of the signal processing functions are implemented in software, providing flexibility, reconfigurability, and the ability to adapt to different communication standards and protocols. The defining characteristic of an SDR is its ability to perform RF signal processing using software algorithms rather than relying on fixed-function hardware. The disclosed processing nodes provide functionality similar to a software-defined radio in that they provide a software-defined physical layer (SDP).
[0058] Thus, as disclosed herein, the SDP architecture is a cluster of DSP processors that can be tightly integrated with existing hardware blocks (e.g., modem/codec blocks), becoming part of a radio terminal ASIC. The disclosed SDP architectures provide the flexibility to implement a portion or all of the digital receiver/transmitter physical layer functionality of the ASIC in the processing nodes. The disclosed SDP architectures employ a multiprocessor signal processing architecture coupled with an existing modem design that provides flexibility to the receiver or transmitter waveform processing algorithms while still leaving ample room to accommodate future system-level design changes and updates. The disclosed SDP architectures further enable existing terminals to be compatible with updated or new radio systems, such as next generation satellite systems. As a result, radio systems or terminals that incorporate the disclosed SDP architectures can continue to function with existing systems while being ready to communicate with systems with different characteristics in the future.
[0059] The disclosed processing nodes are configured to provide supporting circuitry around a DSP core to facilitate signal processing. Each processing node includes a DSP core plus supporting circuitry to provide flexibility at strategic locations inside the encoder, modulator, demodulator, decoder, etc. in a radio system. The disclosed processing nodes can then be implemented at multiple locations inside the radio system rather than having to custom design a module at each location. The disclosed processing nodes are superior for design implementation because a processing node can be synthesized once and then stamped at different locations in the radio system to provide flexible capabilities.
[0060] FIG. 1 illustrates an example radio system with a modem 100 that incorporates a plurality of processing nodes 110 with a plurality of hardware blocks 104. The modem 100 is configured to receive signals (Rx in), to process the received signals using the processing nodes 110 and the hardware blocks 104, and to output the processed received signals (Rx out). In addition, the modem 100 is configured to receive signals for transmission (Tx in), to process the signals for transmission using the processing nodes 110 and the hardware blocks 104, and to output the processed signals for transmission (Tx out).
[0061] The processing nodes 110 are configured to be implemented in an SDP architecture to operate as part of a software-defined physical layer, as described herein. The hardware blocks 104 include modules and blocks that provide signal processing functionality and may be incorporated in different portions of the signal processing chain. The processing nodes 110 can be implemented as part of the signal processing chain to provide flexible processing capabilities to one or more of the hardware blocks 104 and/or to bypass processing by one or more of the hardware blocks 104. The hardware blocks 104 can be implemented as part of a receive signal processing chain and/or a transmit signal processing chain. The hardware blocks 104 can be implemented in a demodulator block of the modem 100, a transmit block of the modem 100, a decoder block of the modem 100, a CPU subsystem of the modem 100, and the like.
[0062] The SDP architecture of the modem 100 is configured to flexibly utilize the processing nodes 110 in the signal processing chain. In addition, the SDP architecture of the modem 100 can include shared memory (e.g., capture memory arrays (CMAs)) and one or more network interconnects that tie the hardware blocks 104, the processing nodes 110, and the shared memory together. In addition to being a centrally connected multi-core system, each processing node 110 can be configured to interface with individual hardware blocks 104 of the receiver and/or transmitter signal processing data paths. The processing nodes 110 can be arranged relative to demands and/or requirements of an encoder, decoder, modulator, demodulator, etc. The plurality of processing nodes 110 may enable the SDP architecture to reassign processing resources on demand. Additionally, the plurality of processing nodes 110 may enable passing of data between external resources and the SDP architecture.
[0063] The plurality of processing nodes 110 can be disposed in different locations of the signal processing chain. This enables a specific processing node of the plurality of processing nodes 110 to be instantiated or included as a local processing feature. For example, one instantiation of a processing node may be included for each of the transmit, demodulator, and decoder modules, etc., where each processing node instantiation is programmed to provide particular processing abilities based on its use case.
Processing Nodes
[0064] FIG. 2 illustrates an example processing node 210 that includes a plurality of DSP cores 214a, 214b and a plurality of extended direct memory access controllers (DCX) 212a, 212b, 212c, 212d. Because each DSP core of the plurality of DSP cores 214a, 214b can be interchangeable or can provide interchangeable functionality, reference to a DSP core 214 should be considered to reference an individual DSP core of the plurality of DSP cores 214a, 214b. Similarly, because each DCX of the plurality of DCX 212a-212d can be interchangeable or can provide interchangeable functionality, reference to a DCX 212 should be considered to reference an individual DCX of the plurality of DCX 212a-212d. Each DCX 212 provides a direct memory access (DMA) with extended capabilities to move data in and out of the memory, examples of which are described herein. Moving data in and out of the memory of the plurality of DCX 212a-212d can be done in conjunction with processing by the plurality of DSP cores 214a, 214b and/or it can be done in conjunction with processing by an external hardware block. Communication between the plurality of DSP cores 214a, 214b, the plurality of DCX 212a-212d, and/or with external hardware blocks can occur through a processing node (PN) network interconnect 216.
[0065] Each DCX 212 and DSP core 214 includes one or more interfaces to communicate with the PN network interconnect 216. In some embodiments, each DCX 212 and DSP core 214 includes a master interface and a slave interface coupled to the PN network interconnect 216. This allows different components to utilize an individual DCX 212 and/or DSP core 214 as slave components. This also allows an individual DCX 212 and/or DSP core 214 to act as master for different components coupled to the PN network interconnect 216.
[0066] The PN network interconnect 216 can communicate with a higher- level network interconnect (e.g., an SDP network interconnect, as described herein) to communicate with other processing nodes. To interface with the SDP interconnect, the PN network interconnect 216 includes an SDP master interface and an SDP slave interface. The SDP master interface is configured to enable communication with other processing nodes wherein the processing node 210 acts as a master to the other processing nodes. In other words, the processing node 210 can send data and/or commands to other processing nodes of an SDP architecture to utilize the processing and/or memory of the other processing nodes through the SDP master interface. A master component includes any component that can perform read or write operations on another component (a slave component). Similarly, the SDP slave interface is configured to provide communication with other processing nodes where the processing node 210 acts as a slave to the other processing nodes. In other words, the processing node 210 can receive data and/or commands from other processing nodes to enable the other processing nodes to utilize the processing and/or memory of the processing node 210 through the SDP slave interface. As an example, incoming data from a master processing node on the SDP slave interface can cause the processing node 210 to execute code stored in memory of a DSP core 214.
[0067] The PN network interconnect 216 also includes one or more configuration ports (or configuration interfaces). A configuration port is connected to a hardware block. The configuration ports can interface with hardware blocks to allow the processing node 210 to configure aspects of the hardware block to be compatible with working with the processing node 210. For example, the configuration ports allow the processing node 210 to write configuration data to the hardware block which can include parameters to enable transferring data to and/or from the hardware block. The configuration data can include any configuration that the hardware block supports.
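By way of a non-limiting illustration, the following C sketch shows how firmware might use a configuration port to write configuration data to a hardware block and read back its status; the base address, register offsets, and bit meanings (e.g., HWB_REG_CTRL) are hypothetical and are not taken from the figures.

    #include <stdint.h>

    /* Hypothetical memory-mapped configuration registers of a hardware block,
     * reachable through a configuration port of the processing node. */
    #define HWB_CFG_BASE      0x40010000u
    #define HWB_REG_CTRL      0x00u  /* enable / data-path routing          */
    #define HWB_REG_PKT_SIZE  0x04u  /* samples per packet sent to the DCX  */
    #define HWB_REG_STATUS    0x08u  /* read-only status word               */

    static inline void hwb_write(uint32_t offset, uint32_t value)
    {
        *(volatile uint32_t *)(HWB_CFG_BASE + offset) = value;
    }

    static inline uint32_t hwb_read(uint32_t offset)
    {
        return *(volatile uint32_t *)(HWB_CFG_BASE + offset);
    }

    /* Route the hardware block's output toward the DCX packet Rx interface
     * and read back its status to confirm the new configuration took effect. */
    void hwb_route_to_processing_node(uint32_t samples_per_packet)
    {
        hwb_write(HWB_REG_PKT_SIZE, samples_per_packet);
        hwb_write(HWB_REG_CTRL, 0x1u);   /* hypothetical "route to PN" bit */
        (void)hwb_read(HWB_REG_STATUS);  /* status/monitoring read         */
    }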
[0068] The hardware block can be connected to one or more of the plurality of DCX 212a-212d. For example, a hardware block can send data for processing to the processing node 210 through a packet Rx interface of a DCX 212. As another example, the processing node 210 can send processed data to the hardware block through a packet Tx interface of a DCX 212. In some implementations, the configuration ports are used to configure the hardware block to send data to a DCX 212 through the packet Rx interface of the DCX 212 and to receive data from the DCX 212 through the packet Tx interface. The sample interfaces (or the packet Rx/Tx interfaces) may be FIFO interfaces that allow flexible options to connect to hardware blocks (e.g., communications components) for moving data in and/or out from the data path. As another example, where there are multiple interfaces within a hardware block and it is desirable to tap data from the hardware block in parallel, it may be beneficial to use a plurality of DCX. Similarly, using a plurality of DCX to interface with a hardware block may be useful where it is desirable to feed data into a hardware block in parallel, or where it is desirable to have a mix of receive and transmit data into the hardware block. This may also be useful where a hardware block is sub-divided into smaller functional blocks and it may be beneficial to interface with the smaller functional blocks using a dedicated DCX for individual functional blocks.
[0069] The configuration port allows the hardware block to be configured to operate normally, for example, and it can also be used to read hardware block status, to read the hardware block configuration, to monitor the hardware block configuration, etc. In addition, if it is desirable to pass data from the hardware block to the processing node 210, the configuration ports can be used to configure the hardware block to route data to the processing node 210 and to configure the interaction so that the processing node 210 can receive the data (e.g., via a packet interface at a DCX 212). The hardware block can be a decoder block, a demodulator block, etc. The configuration data sent over the configuration interface can be configured to tell the hardware block which data path to follow (e.g., to send data to a particular packet interface). Thus, the configuration ports act as a control plane interface with a hardware block and the hardware blocks can connect to respective packet interfaces of the plurality of DCX 212a-212d to transfer data to the processing node 210.
[0070] Accordingly, the processing node 210 can be configured as a dual core node that can connect to different hardware modules. The processing node 210 provides a configuration bus as well as interfaces to receive and to send samples to and from the hardware blocks to become part of the data path and/or to perform captures of data. Individual DCX 212 can be associated with a particular DSP core 214. The PN network interconnect 216 allows high bandwidth data transfers between the plurality of DSP cores 214a, 214b and the plurality of DCX 212a-212d. An individual DCX 212 may also form part of the work descriptor chain, as described in greater detail herein with respect to FIGS. 9, 10A, 10B, 10C, and 10D.
[0071] The plurality of DCX 212a-212d provide direct memory access (DMA) which allows data to be transferred to or from memory. This may be accomplished using the DSP cores 214 and may advantageously be accomplished without involvement from an external processor, which may improve overall processing speed. The DSP cores 214 can be configured to control and/or configure individual DMA and set up data paths that are to be utilized for the desired functionality. DMA transfers a block of data between memory of the DMA and another location, such as external memory, internal memory to a DSP core, or other DCX memory. Each DCX 212 includes a DMA controller to control the activity of accessing memory directly.
[0072] In some implementations, the processing node 210 includes a first DSP core 214a and a second DSP core 214b. The processing node 210 also includes a plurality of DCX 212a-212d, each DCX 212 having shared memory space (not shown in this figure but examples of which are described herein with reference to FIGS. 3A and 4), an input packet interface, and an output packet interface. The input packet interface is configured to receive samples from a hardware block separate from but coupled to the processing node 210. The shared memory space is configured to store the received samples. The output packet interface is configured to transmit to the hardware block samples processed by the first DSP core 214a or the second DSP core 214b. The input packet interface and the output packet interface of the DCX are configured to provide a flexible plugin interface to enable connecting with any generic streaming interface within the data paths of the hardware blocks. These streaming interfaces can be simple, such as a valid signal and/or an associated data bus. These streaming interfaces can also be more complex with additional start and end of framing signals. These packet interfaces are thus designed to be generic so that adaptation to modules is either minimal or unnecessary to enable such modules to interface with DCX 212.
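As a non-limiting illustration of the kind of generic streaming interface described above, the following C sketch models a single interface beat (a valid flag, a data word, and optional framing markers) and the collection of one frame into a buffer; the type and function names (e.g., stream_beat_t, capture_frame) are hypothetical and only model the software view of such an interface.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical software model of one beat on a generic streaming interface:
     * at minimum a valid flag and a data word; framed variants add start-of-frame
     * and end-of-frame markers. */
    typedef struct {
        bool     valid;
        bool     sof;   /* start of frame (optional framing signal) */
        bool     eof;   /* end of frame (optional framing signal)   */
        uint32_t data;
    } stream_beat_t;

    /* Collect one frame's worth of valid beats into a buffer; returns the number
     * of words captured.  next_beat() stands in for reads from the packet Rx
     * interface of a DCX. */
    size_t capture_frame(stream_beat_t (*next_beat)(void), uint32_t *buf, size_t max_words)
    {
        size_t n = 0;
        for (;;) {
            stream_beat_t b = next_beat();
            if (!b.valid)
                continue;                 /* no data this cycle */
            if (n < max_words)
                buf[n++] = b.data;
            if (b.eof || n == max_words)
                return n;
        }
    }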
[0073] The processing node 210 also includes the PN network interconnect 216 configured to communicably couple the first DSP core 214a, the second DSP core 214b, and the plurality of DCX 212a-212d. Each DSP core 214 and each DCX 212 is coupled to the PN network interconnect 216 through a respective master interface and a respective slave interface. The PN network interconnect 216 further includes an SDP master interface and an SDP slave interface each configured to communicate with an SDP network interconnect (not shown in this figure but examples of which are described herein with reference to FIGS. 6A, 6B, 7A, and 7B). The processing node 210 is configured to be integrated into a radio transceiver that includes the hardware block and to interface with the hardware block to provide configurable processing functionality to the radio transceiver.
[0074] The processing node 210 can include a queue interface configured to transfer commands or data from the first DSP core 214a to the second DSP core 214b and to transfer commands or data from the second DSP core 214b to the first DSP core 214a. The inter-processor queues 218 can be unidirectional (e.g., a first inter-processor queue going from the first DSP core 214a to the second DSP core 214b and a second inter-processor queue going from the second DSP core 214b to the first DSP core 214a), bidirectional (e.g., a single inter-processor queue that can transfer messages between the first and second DSP cores 214a, 214b), or the inter-processor queues 218 can provide bidirectional functionality using a combination of unidirectional queues. The inter-processor queues 218 are configured to provide inter-processor communications or descriptors such as commands and messages. This can be done to synchronize operation between the plurality of DSP cores 214a, 214b and/or to inform the plurality of DSP cores 214a, 214b of the location of data for processing. The inter-processor queues 218 comprise FIFO queues that go directly between the plurality of DSP cores 214a, 214b.
[0075] In some implementations, each DSP core 214 includes a general- purpose input-output (GPIO) port connected to a configuration register and configured to receive input for placement in the configuration register and to transmit data stored in the configuration register. For example, the GPIO port can include a 32-bit input to each DSP core 214 and/or a 32-bit output from each DSP core 214. The GPIO port provides flexibility to the processing node 210. In some embodiments, the GPIO port is coupled to one or more configuration registers that allows any master component to write to the 32-bit register of the associated DSP core 214 and allows any slave component to read from the 32-bit register of the associated DSP core 214. This may be advantageous for debugging firmware and/or to troubleshoot because the GPIO can be used to output standard values to enable an operator to determine where an error is occurring. The GPIO ports may also be configured to receive hardware events, receive external events from hardware, generate triggers from hardware events, and the like. In some embodiments, the plurality of DSP cores 214a, 214b each include a debug and trace port that connects to a CPU subsystem or operating system interface. This allows external data to be used to debug the performance of the plurality of DSP cores 214a, 214b.
[0076] In some embodiments, interrupt requests (IRQs) can provide functionality that could otherwise be provided by the GPIO ports. For example, each DSP core 214 can be configured to receive interrupt requests. The IRQs can be received via dedicated ports or pins (not shown) in the DSP cores 214a, 214b. In some instances, the IRQs can also be received via dedicated ports in a CPU subsystem, as described herein. The interrupts can be configured via interconnect and/or configuration ports in the DSP cores 214a, 214b and/or CPU subsystem. In some instances, in addition to receiving IRQs via dedicated ports or pins, a polling routine can be implemented on a DSP processor and/or within the CPU subsystem to read the interrupt registers through the interconnect or configuration ports to detect pending IRQs. The plurality of DSP cores 214a, 214b can be configured to implement a multiplexing scheme to receive IRQs to enable the plurality of DSP cores 214a, 214b to filter interrupts. This allows the plurality of DSP cores 214a, 214b to respond to particular interrupts and to ignore other interrupts. Configuration for enabling or disabling interrupts can be performed via communication with the hardware blocks (e.g., through the configuration ports). This can be used to configure the hardware blocks to generate interrupts for certain configured events. A configuration can also be implemented at the plurality of DSP cores 214a, 214b to dictate to which interrupts the plurality of DSP cores 214a, 214b responds. For example, many interrupts can be accessed by the plurality of DSP cores 214a, 214b, but the configuration dictates to which interrupts the plurality of DSP cores 214a, 214b responds. When a DSP core 214 receives an interrupt to which it is configured to respond, the respective DSP core 214 reads registers of the hardware block using the configuration ports to determine the nature of the interrupt. The interrupts can be used for a variety of purposes including, for example and without limitation, errors, timers, DMA interrupts, transfer notices, packet queue/interface full, packet queue/interface empty, etc.
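As a non-limiting illustration of the polling alternative described above, the following C sketch reads a hypothetical interrupt-status register over the configuration port and reacts only to the interrupts selected by a mask; the register addresses, bit assignments, and write-1-to-clear behavior are assumptions rather than the actual register map of any hardware block.

    #include <stdint.h>

    /* Hypothetical interrupt-status register of a hardware block, reachable
     * over the interconnect or configuration port. */
    #define HWB_IRQ_STATUS_ADDR  0x40010020u

    #define IRQ_DMA_DONE  (1u << 0)
    #define IRQ_PKT_FULL  (1u << 2)
    #define IRQ_ERROR     (1u << 7)

    static inline uint32_t reg_read(uint32_t addr)            { return *(volatile uint32_t *)addr; }
    static inline void     reg_write(uint32_t addr, uint32_t v) { *(volatile uint32_t *)addr = v; }

    /* Polling routine: the core only reacts to the interrupts selected by `mask`
     * and ignores the rest, mirroring the per-core filtering described above. */
    void poll_irqs(uint32_t mask)
    {
        uint32_t pending = reg_read(HWB_IRQ_STATUS_ADDR) & mask;

        if (pending & IRQ_DMA_DONE) { /* e.g. advance the work-descriptor chain */ }
        if (pending & IRQ_PKT_FULL) { /* e.g. drain the packet Rx interface     */ }
        if (pending & IRQ_ERROR)    { /* e.g. read status registers for details */ }

        /* Write-1-to-clear is assumed here; the actual clearing scheme is
         * specific to the hardware block. */
        if (pending)
            reg_write(HWB_IRQ_STATUS_ADDR, pending);
    }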
[0077] In some embodiments, the first DCX 212a is configured to receive a plurality of samples from the hardware block to be processed through the input packet interface. The first DCX 212a is then configured to temporarily store the plurality of samples in the shared memory space of the first DCX 212a. The first DCX 212a is also configured to transmit the plurality of samples to the first DSP core 214a through the PN network interconnect 216. The first DSP core 214a is configured to program the first DCX 212a to convey the plurality of samples to the first DSP core 214a, to place the plurality of samples into an internal memory space of the first DSP core 214a where the first DSP core 214a is configured to process the plurality of samples, and to place the processed samples into the internal memory space of the first DSP core 214a. In some embodiments, the first DCX 212a is further configured to reformat the received plurality of samples. In some embodiments, reformatting the received plurality of samples includes sign extending samples of the received plurality of samples to increase the number of bits for each sample. In some embodiments, reformatting the received plurality of samples includes bit clipping samples of the received plurality of samples to reduce a resolution of each sample.
[0078] In some embodiments, the first DCX 212a is configured to receive a plurality of samples through the input packet interface, to temporarily store the plurality of samples in the shared memory space, and to use the output packet interface to transmit the plurality of samples to the hardware block separate from the processing node 210 for processing. The first DCX 212a can be further configured to receive the processed plurality of samples from the hardware block and to temporarily store the processed plurality of samples in the shared memory space.
[0079] The processing node 210 can operate in different modes. For example, the processing node 210 can be configured to receive a stream of data from a hardware block. As another example, the processing node 210 can be configured to respond to an interrupt that can be used to identify certain data for capture (e.g., to capture data for a particular event). As another example, the processing node 210 can operate using a circular buffer. The processing node 210 can include a list of 100 buffers, for example. The processing node 210 returns the data in the top of the list of buffers back to its own queues and can keep overwriting the data so that the buffer includes the last 100 buffers of data. At any point, the processing node 210 can move the data in the buffers into memory for processing and analysis. That is, the buffer pointers corresponding to the last 100 buffers of data can be sent to the work descriptor queue rather than keeping the buffer pointers in the buffer pointer queue, examples of which are described in greater detail herein with reference to FIGS. 8A-10D. As another example, the processing node 210 can operate in a buffer ID mode where a buffer ID is used to assign data to a particular DSP core. In this mode, a buffer ID that ends in 0, for example, is sent to core 0 (the first DSP core 214a), a buffer ID that ends in 1 is sent to core 1 (the second DSP core 214b), etc. In this mode, certain buffer IDs can also be configured to be ignored (e.g., the buffer pointer is sent back to the buffer pointer queue rather than to the work descriptor queue, examples of which are described in greater detail herein with reference to FIGS. 8A-10D).
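As a non-limiting illustration of the buffer ID mode described above, the following C sketch dispatches work to one of the two DSP cores based on the low bit of the buffer ID and returns ignored buffer IDs to the buffer pointer queue; the queue-access functions are stand-ins for the FIFO writes of FIGS. 8A-10D and their names are hypothetical.

    #include <stdint.h>

    /* Hypothetical stand-ins for the work-descriptor and buffer-pointer FIFOs. */
    void push_work_descriptor_core0(uint32_t buf_ptr, uint32_t len);
    void push_work_descriptor_core1(uint32_t buf_ptr, uint32_t len);
    void return_buffer_pointer(uint32_t buf_ptr);

    /* Buffer-ID mode: the low bit of the buffer ID selects the destination DSP
     * core; IDs configured to be ignored are returned directly to the buffer
     * pointer queue instead of generating work. */
    void dispatch_by_buffer_id(uint32_t buffer_id, uint32_t buf_ptr, uint32_t len,
                               uint32_t ignore_mask)
    {
        if (ignore_mask & (1u << (buffer_id & 0x1fu))) {
            return_buffer_pointer(buf_ptr);   /* no work descriptor generated */
            return;
        }
        if ((buffer_id & 0x1u) == 0)
            push_work_descriptor_core0(buf_ptr, len);
        else
            push_work_descriptor_core1(buf_ptr, len);
    }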
[0080] The processing node 210 is an example of a processing node that forms a basic building block of the disclosed SDP architectures. Each processing node in the SDP architecture can be configured to include a number of dual-core processors (e.g., 2 DSP processors) and may comprise a number of DCX allowing a wide variety of data movement options between memory and the data path. The processing node 210 may also include a shared memory space (for example, in each DCX 212). The plurality of DCX 212a-212d may also have dedicated packet interfaces to transfer samples to and from hardware blocks.
[0081] In some embodiments, the plurality of DSP cores 214a, 214b include a VLIW (Very Long Instruction Word) SIMD (Single Instruction Multiple Data) processor tailored for complex number processing. In some embodiments, the plurality of DSP cores 214a, 214b may comprise a 32-way multiplier-accumulator (MAC), which enables the plurality of DSP cores 214a, 214b to do up to 32 parallel MAC operations every clock cycle (or 8 complex multiplication and accumulation operations).
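As a non-limiting illustration, the following scalar C reference shows the complex multiply-accumulate that such a SIMD core could vectorize; each complex MAC maps to four real multiply-accumulate operations, so eight complex MACs per clock cycle corresponds to the 32-way real MAC noted above. The function and type names are illustrative only.

    #include <stdint.h>

    typedef struct { int16_t re; int16_t im; } cint16_t;

    /* Scalar reference for a block of complex multiply-accumulates spread across
     * eight accumulator lanes, the kind of inner loop an 8-lane complex MAC
     * could execute one iteration group of per clock cycle. */
    void cmac_block(const cint16_t *a, const cint16_t *b,
                    int32_t *acc_re, int32_t *acc_im, int n)
    {
        for (int i = 0; i < n; i++) {
            int lane = i % 8;  /* eight complex lanes */
            acc_re[lane] += (int32_t)a[i].re * b[i].re - (int32_t)a[i].im * b[i].im;
            acc_im[lane] += (int32_t)a[i].re * b[i].im + (int32_t)a[i].im * b[i].re;
        }
    }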
[0082] The processing node 210 is configured to be sufficiently flexible so that it can be placed at various locations within a radio transceiver (e.g., encoder, modulator, demodulator, decoder, etc.) as needed. By having a versatile design, the processing node 210 can be implemented at a particular location and can be programmed to provide functionality based on the location it is implemented instead of requiring specific or custom designs at different locations. This results in a simplified design stage.
[0083] FIG. 3A illustrates a diagram of an example extended DMA controller or DCX 312 of a processing node, such as the DCX 212 of the processing node 210 of FIG. 2. The DCX 312 may include a packet transmit interface 301 for outputting samples from a processing node. The DCX 312 may also include a packet receive interface 302 for receiving samples from a hardware block for storage and/or processing with the processing node. The DCX 312 also may include one or more configuration registers 303 that can be configured to pass data, messages, and/or commands to and from the DCX 312. In some implementations, the DCX 312 includes one hundred or more configuration registers 303 (e.g., each DMA can have 20 or more associated configuration registers). The DCX 312 also can include a memory arbiter 304 that controls access to DCX memory 308 comprising RAM banks 1 and 2. The DCX 312 also includes two sub-DMA memory modules 306, 307. Advantageously, the DCX 312 can be repeated in a processing node allowing for a configurable set of DMA channels to be chained together, including with DMA channels of other processing nodes and other shared memory (e.g., CMAs).
[0084] The DCX 312 includes a memory arbiter 304 that regulates read and write operations to DCX memory 308 via a slave interface. The slave interface means that the memory arbiter 304 receives requests to read and/or write to the DCX memory 308 from other processing nodes via the network interconnect. The DCX memory 308 can thus be configured based on application. For example, the DCX memory 308 can be used to buffer samples received from a hardware block (e.g., a hardware data path). As another example, the DCX memory 308 can be used as a private scratch memory for one or more of the plurality of DSP cores. The memory arbiter 304 allows a dual read/write interface to the DCX memory 308, as described in greater detail herein with respect to FIG. 13.
[0085] The DCX 312 can include a DCX single atomic write control (SAWC) channel, or SAWC 305, with a dedicated access port on the network interconnect to allow the DCX 312 to control state machines to perform single atomic writes to a programmable register offset, for returning buffer pointers, and for sending work descriptors to the DSP cores. FIGS. 3C and 3D illustrate additional detail regarding the DCX 312.
[0086] The DCX 312 includes the SAWC 305 and sub-DMA modules 306, 307, as described. Each sub-DMA module 306, 307 includes a DMA read component 331 and a DMA write component 332 with work descriptor DMA channels 334, reformat FIFOs 333, and a work descriptor controller 339. The SAWC 305 includes a SAWC write component 341 and configuration registers with work descriptor and buffer pointer queues 343. The SAWC write component 341 includes individual write-only DMA channels that are arbitrated in a round-robin fashion. The write-only DMA channels are provided buffer pointers or work descriptors from the various blocks within the DCX 312. The SAWC 305 enables higher quality of service for the single write DMA by letting the network interconnect handle the arbitration between the two sub-DMA modules 306, 307. Arbitration at the NIC level occurs where transactions from any of the AXI masters within the processing node (e.g., individual DMAs, SAWC, DSP cores, etc.) are attempting to be sent to a common slave memory internal or external to the processing node. The DCX 312 uses multiple instances of DMA read and write engines, including two configurable sub-DMA modules 306, 307 that interface with the network interconnect and other interfaces (e.g., packet Rx and Tx interfaces) to facilitate data transfers to other processing nodes and/or hardware blocks. Each sub-DMA module 306, 307 controls a single write interface and a single read interface to the network interconnect. Each sub-DMA module 306, 307 includes 4 read channels in the DMA read component 331 and 4 write channels in the DMA write component 332, where each read channel and write channel operate and are configured as a single entity. For example, when performing a transfer from address A to address B, the read channel performs the read transaction over the interconnect to address A and temporarily buffers data in the reformat FIFO, which is then used by the write channel to perform a write transaction to address B over the network interconnect. Read channels can be arbitrated amongst themselves and write channels amongst themselves, providing efficient usage of the single read and single write interface between the DMA and the network interconnect. This allows read and write operations to be queued up back-to-back to increase efficiency in bandwidth utilization out of the interface. Each sub-DMA module 306, 307 uses work descriptors from a work descriptor queue and buffer pointers from a buffer pointer queue for managing data transfers (examples of which are described herein with reference to FIGS. 8A-10D).
[0087] In some instances, DMA channels can be configured to operate in a normal DMA operation where each channel can be programmed to perform transfers from address A to address B. The configuration related to the transfer of data can be configured using the available configuration registers for that DMA channel. Configuration data can include, for example and without limitation, destination address, source address, memory attributes, interconnect transaction attributes, transfer length, mode of operation, etc. In certain instances, DMA channels can be configured to operate in a WQ mode of operation where three of the four channels are provided work descriptors and buffer pointers from the WQ controller 339 in a round robin fashion so that the three DMA channels can operate back-to-back increasing the efficiency of transactions across the network interconnect 316. Typically, DMA transfers are broken into smaller configurable transaction sizes, which allows a single DMA channel to keep access to the read/write control of the burst for a single transaction. After that, the next DMA waiting in the queue gets access to the interfaces, which proceeds until each DMA channel has completed its transfer.
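As a non-limiting illustration of the normal (non-WQ) mode of operation described above, the following C sketch programs one DMA channel for a transfer from address A to address B; the register layout, field names, and burst size are hypothetical and do not represent the actual DCX register map.

    #include <stdint.h>

    /* Hypothetical per-channel DMA configuration block; field names are
     * illustrative, not the actual configuration registers of a DCX channel. */
    typedef struct {
        uint64_t src_addr;        /* address A                                 */
        uint64_t dst_addr;        /* address B                                 */
        uint32_t length_bytes;    /* total transfer length                     */
        uint32_t burst_bytes;     /* smaller transaction size used per burst   */
        uint32_t mode;            /* 0 = normal operation, 1 = WQ mode         */
        uint32_t attributes;      /* memory / interconnect transaction attrs   */
        volatile uint32_t start;  /* write 1 to kick off the transfer          */
        volatile uint32_t done;   /* set by hardware on completion             */
    } dma_channel_cfg_t;

    /* Program a single channel for a normal A-to-B transfer.  Breaking the
     * transfer into bursts lets channels waiting in the queue interleave their
     * transactions back-to-back on the shared read/write interface. */
    void dma_start_copy(volatile dma_channel_cfg_t *ch,
                        uint64_t src, uint64_t dst, uint32_t len)
    {
        ch->src_addr     = src;
        ch->dst_addr     = dst;
        ch->length_bytes = len;
        ch->burst_bytes  = 256;   /* example burst size */
        ch->mode         = 0;     /* normal DMA operation */
        ch->attributes   = 0;
        ch->start        = 1;
        while (!ch->done) { /* poll for completion; an interrupt could be used */ }
    }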
[0088] The DCX 312 also includes the packet receive interface 302 and the packet transmit interface 301 which communicate with the memory arbiter 304, as described herein. The DCX 312 also includes a Tx compose component 355 and an Rx parse component 354, examples of which are described herein with reference to the TX compose module 1125 and the RX parse module 1175, respectively.
[0089] In some implementations, the configuration registers 303 can pass work descriptors and buffer pointers to the two sub-DMA modules 306, 307 and can receive work descriptor queues and buffer pointer queues from the two sub-DMA modules 306, 307. For example, a sub-DMA module 306, 307 can fetch a buffer pointer from a buffer pointer queue and can start a transfer of data upon receiving a work descriptor. Upon completion of the transfer, the sub-DMA module 306, 307 returns the buffer pointer from the received work descriptor (indicating the buffer pointer is available) and provides a new work descriptor with the fetched buffer pointer (indicating the buffer pointer now points to data) to the SAWC 305.
[0090] A first sub-DMA module 306 is coupled to the network interconnect through a master interface to transfer data to and from memory that is mapped as a slave (which may include memory in other processing nodes and/or other shared memory such as CMAs) on the network interconnect. The second sub-DMA module 307 effectively allows connecting the read and write channels to hardware blocks to get samples to and from the hardware blocks and to read and write directly to the DCX memory 308. Direct access to the DCX memory 308 allows the DCX memory to act as a capture buffer for packets or samples and allows other DMA and DSP cores access to the DCX memory through the network interconnect.
[0091] FIG. 3B illustrates a diagram of an example DSP core 314 of a processing node, such as the DSP core 214 of the processing node 210 of FIG. 2. The DSP core 314 represents one of a plurality of DSP cores that may be present in a processing node. For example, the DSP core 314 can be one of two DSP cores in a processing node, such as the processing node 210 of FIG. 2. The DSP core 314 includes one or more configuration registers 321 that allow control or command data to come in from external processing nodes, e.g., through a network interconnect. The control or command data includes buffer pointers, work descriptors, and messages. Any external master, or the DSP core 314 itself, can perform a write transaction over the interconnect to write control or command messages via the configuration registers 321. In some embodiments, there are dedicated configuration registers for each of the queues shown in the figure. In such embodiments, a write to those specific configuration registers pushes the work descriptor or message into the FIFO, which the DSP core 314 can then read. The configuration registers 321 can pass data in the form of FIFO queues 322. The FIFO queues 322 can include data such as a buffer pointer queue, a message queue, and a work descriptor queue that are passed from a different processing node to the DSP core 314. The queues 322 are passed to the DSP processor 324, which includes internal RAM and cache and processes the data in the queues 322. In some implementations, the buffer pointer queue includes 32-bit words whereas the message queue and the work descriptor queue include 64-bit words.
[0092] In addition, the DSP core 314 can receive messages from a different DSP core of the processing node through a FIFO interface 323. The DSP core 314 can also receive interrupts (IRQs) and send FIFO interface messages through queues 325. In some embodiments, a processing node includes more than 2 DSP cores and the message queue (the FIFO interface 323) can be used to pass messages directly between the DSP cores. In some embodiments, the configuration registers 321 are used to pass messages directly between the DSP cores of a particular processing node as well as between external processing nodes and hardware blocks. In some implementations, the DCX returns buffer pointers back to the DSP core 314 through the configuration registers 321.
[0093] The configuration registers 321 can be configured to store data that is passed between components of an SDP architecture, as described herein. The configuration registers 321 can be configured to enable the DSP core 314 to communicate with different components, such as a different DSP core of the processing node, a DCX of the processing node, an external processing node (e.g., a processing node different from the processing node that includes the DSP core 314), and/or an external hardware block.
Example Data Flow in a Processing Node
[0094] FIG. 4 illustrates a diagram of data flow through an example processing node 410. The processing node includes a DSP core 414, similar to the DSP cores 214, 314, and a DCX 412, similar to the DCX 212, 312. The DSP core 414 and the DCX 412 are each connected to a PN network interconnect 416, similar to the PN network interconnect 216. The PN network interconnect 416 enables communication between the DCX 412, the DSP core 414, hardware blocks (e.g., via interrupts), and external processing nodes. The PN network interconnect 416 includes a configuration interface to configure external hardware blocks as described herein, an SDP master interface for sending data to external slave processing nodes, and an SDP slave interface for receiving data from external master processing nodes. The DCX 412 includes one or more configuration registers that communicate with the PN network interconnect 416 to send and receive data between components of the processing node 410 as well as external processing nodes. Similarly, the DSP core 414 includes one or more configuration registers that communicate with the PN network interconnect 416 to send and receive data between components of the processing node 410 as well as external processing nodes. In addition, the one or more configuration registers of the DSP core 414 can be used to pass buffer pointers, work descriptors, and messages to the DSP core 414 from external processing nodes or from other components of the processing node 410. In some embodiments, the DSP core 414 includes other FIFO queues that can be used to transfer messages between cores of the processing node 410.

[0095] Data can be received from an external hardware block (not shown) at a packet receiver of the processing node 410. The data can be samples captured from the hardware block (e.g., modems or communication hardware). The hardware block can be configured for an appropriate or suitable capture interface using the configuration port of the PN network interconnect 416, as described herein. The packet receiver is coupled to the DCX 412. Data flows in from the packet receiver to a write port of a memory arbiter of the DCX 412. The write port writes the received data to shared memory (a DMA) of the DCX 412 where it is temporarily stored before being processed. When ready, the DSP core 414 programs the DCX 412 to convey a set of data for processing to internal memory of the DSP core 414 (e.g., internal data RAM of the DSP core 414). Typically, the DSP core 414 configures the data path to be used for sending and receiving samples. This is accomplished by programming the packet receiver to create packets of a configured size and to store the packets in shared memory. Next, a corresponding work descriptor is created with the size and buffer pointer information and forwarded to the assigned work descriptor queue DMA (or WQ-DMA). Upon receiving a work descriptor, the WQ-DMA is configured to move data to DSP core memory and to create a new work descriptor which is sent to the DSP core work queue FIFO interface through the DSP configuration interface. The WQ-DMA is also configured to release the buffer pointer for the work descriptor it receives from the packet receiver back to the packet receiver's buffer pointer FIFO for reuse once the data is moved to DSP memory. Similarly, the DSP core 414 programs the transmit data path to set the work descriptor and buffer pointer flow in the reverse direction to enable sending data to the packet transmitter.
With this configured, the DSP core 414 can receive samples from connected hardware blocks, which are enabled to send samples through the configuration interface to the hardware block. This allows data arriving from the connected hardware blocks at the packet receiver to be placed in DSP memory. This triggers sending a work descriptor to the DSP core 414 without requiring the DSP core 414 to oversee and manage data transfers. Processed data can be temporarily stored in internal memory of the DSP core 414. The processed data can then be passed back to shared memory (a DMA) of the DCX 412 for temporary storage. When the data is ready to be transferred to an external hardware block after processing by the processing node 410, the data is transferred back to a read port of the memory arbiter of the DCX 412 where it is read by the packet transmitter and transmitted to the external hardware block. Thus, the shared memory of the DCX 412 can be used to buffer samples for processing. In some implementations, the shared memory of the DCX 412 can be repurposed as additional memory for the DSP core 414.
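As a non-limiting illustration of the receive data path described above for FIG. 4, the following C sketch models one service step of a WQ-DMA: it consumes a work descriptor from the packet receiver, moves the data to DSP memory, releases the buffer pointer for reuse, and pushes a new work descriptor to the DSP work queue. The descriptor layout and function names are hypothetical stand-ins for the hardware queues and DMA copy, not an actual driver API.

    #include <stdint.h>

    /* Hypothetical 64-bit work descriptor layout: a buffer pointer into shared
     * memory plus the size of the data it points to. */
    typedef struct {
        uint32_t buf_ptr;   /* where the packet/samples currently reside */
        uint32_t length;    /* number of bytes (or samples) to process   */
    } work_desc_t;

    /* Stand-ins for the hardware FIFOs and the DMA copy of FIG. 4. */
    work_desc_t wq_dma_pop_descriptor(void);               /* from the packet receiver */
    void        wq_dma_copy(uint32_t dst, uint32_t src, uint32_t len);
    void        release_buffer_pointer(uint32_t buf_ptr);  /* back to packet Rx FIFO   */
    void        push_to_dsp_work_queue(work_desc_t wd);    /* DSP work queue FIFO      */

    /* One step of the configured receive path: move the data into DSP memory,
     * recycle the shared-memory buffer, and hand the DSP a descriptor that now
     * points at its own internal memory. */
    void wq_dma_service_once(uint32_t dsp_dst_addr)
    {
        work_desc_t in = wq_dma_pop_descriptor();

        wq_dma_copy(dsp_dst_addr, in.buf_ptr, in.length);
        release_buffer_pointer(in.buf_ptr);

        work_desc_t out = { .buf_ptr = dsp_dst_addr, .length = in.length };
        push_to_dsp_work_queue(out);
    }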
[0096] In addition to moving samples in/out from the memory, the DCX 412 can be configured to reformat data when moving data through the DCX 412. For example, each of the DMA engines inside the DCX 412 can include data reformatting logic around the FIFO that connects the read and write channel. The reformatting logic can be configured to help with certain operations, like sign-extend samples from 8 bits to 16 bits, perform bit-clipping to reduce the sample resolution from 16 bits to 8 bits, and so forth.
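As a non-limiting illustration, the following C sketch shows the two reformatting operations mentioned above in scalar form; whether bit clipping saturates or truncates, and the exact widths involved, are implementation details, so the saturating variant shown here is only one plausible interpretation.

    #include <stddef.h>
    #include <stdint.h>

    /* Sign-extend 8-bit samples to 16 bits, the kind of reformatting the DMA
     * can apply while moving data through its read/write FIFO. */
    void reformat_sign_extend_8to16(const int8_t *src, int16_t *dst, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            dst[i] = (int16_t)src[i];      /* C sign-extends on the conversion */
    }

    /* Bit-clip 16-bit samples down to 8 bits to reduce sample resolution;
     * shown here as saturation to the int8_t range. */
    void reformat_clip_16to8(const int16_t *src, int8_t *dst, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            int16_t s = src[i];
            if (s >  127) s =  127;
            if (s < -128) s = -128;
            dst[i] = (int8_t)s;
        }
    }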
Example SDP Architectures
[0097] The disclosed SDP architectures enable flexibility in using the DSP cores and support different configurations and ways to provide multi-core capability to the SDP architecture. Each processing node can be configured to be used as a dual-core configuration or multiple nodes can be configured to form a larger multicore functioning body. Through the use of the network interconnects, each processing node can access the DCX and/or DSP cores of another processing node. In the dual-core configuration (e.g., a single processing node) and in the multicore configuration (e.g., multiple processing nodes), the DSP cores can operate in a shared memory model where the cores share access to shared memory while having access to a private memory, and in a producer-consumer model where the DSP cores can message each other.
[0098] FIG. 5A illustrates an example of a shared memory model for DSP cores 514a, 514b in a processing node. In the shared memory model, the DSP cores 514a, 514b have access to local shared memory 515 (which resides in DCX 512, as described herein with reference to FIG. 4, for example, or any memory that the DSP core 514a, 514b can access through a network interconnect) to allow operating in this mode. The local shared memory 515 can be the shared memory of any of the DCX, a section of CMA marked as shared between DSP cores, or some allocated memory in external DRAM/DDR. The private SRAMs 513a, 513b can be used for scratch data while processing objects or the private SRAMs 513a, 513b can be used as additional memory shared between the cores 514a, 514b. This can be extended to multiple DSP cores in different processing nodes. Through the use of the network interconnect, each DSP core can access shared memory space residing in a particular DCX (that may reside in a different processing node) while each DSP core can access private SRAMs residing in the processing node in which the DSP core resides.
[0099] FIG. 5B illustrates an example of a producer-consumer model for DSP cores 514a, 514b. The DSP cores 514a, 514b have a queue interface 518 (or a queue can exist in shared memory) and dedicated GPIOs that could be used as message queues within a single processing node. This would allow messaging between the DSP cores 514a, 514b, which could be simple commands or object data. If the queue interface 518 is unidirectional, the first DSP core 514a can act as a producer and the second DSP core 514b can act as a consumer with the queue interface 518 acting as the queue of tasks filled by the first DSP core 514a (the producer) and popped by the second DSP core 514b (the consumer). This can be extended to multiple DSP cores in different processing nodes. For communication and synchronization between multiple processing nodes, additional memory-mapped message queues can be used to connect the DSP cores of different processing nodes, the memory-mapped message queues similar to the message queues between DSP cores of the same processing node. In some implementations, the additional memory-mapped message queues transfer data using an SDP network interconnect (e.g., using configuration registers as described herein).
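As a non-limiting illustration of the producer-consumer model, the following C sketch implements a minimal single-producer/single-consumer queue of the kind that could back the queue interface 518; the depth, the 64-bit message format, and the omission of memory barriers and cache-management details are simplifying assumptions.

    #include <stdbool.h>
    #include <stdint.h>

    #define QUEUE_DEPTH 16u   /* must be a power of two for the wraparound math */

    /* Minimal single-producer/single-consumer FIFO: one core (the producer)
     * only writes head, the other core (the consumer) only writes tail. */
    typedef struct {
        volatile uint32_t head;
        volatile uint32_t tail;
        uint64_t          msg[QUEUE_DEPTH];
    } ipc_queue_t;

    bool ipc_push(ipc_queue_t *q, uint64_t m)   /* producer side */
    {
        if (q->head - q->tail == QUEUE_DEPTH)
            return false;                       /* queue full */
        q->msg[q->head % QUEUE_DEPTH] = m;
        q->head++;
        return true;
    }

    bool ipc_pop(ipc_queue_t *q, uint64_t *m)   /* consumer side */
    {
        if (q->head == q->tail)
            return false;                       /* queue empty */
        *m = q->msg[q->tail % QUEUE_DEPTH];
        q->tail++;
        return true;
    }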
[0100] FIGS. 6A and 6B illustrate example SDP architectures 600a, 600b. As illustrated in FIG. 6A, the SDP architecture 600a includes processing nodes 610a-610f coupled to an SDP network interconnect 640 via respective SDP master interfaces and SDP slave interfaces. The SDP architecture 600a also includes shared memory in the form of a capture memory array 630 (or CMA) that is coupled to the SDP network interconnect 640 via an SDP master interface and an SDP slave interface. The SDP architecture 600a also includes a CPU subsystem 650 (e.g., a Linux application layer) that is coupled to the SDP network interconnect 640 via an SDP master interface and an SDP slave interface. In some embodiments, each bidirectional arrow coupling a corresponding component of the SDP architecture 600 to the SDP network interconnect 640 may represent one of the SDP master and slave interfaces, such as one SDP master and one SDP slave interface for each component. The processing nodes 610 are configured to communicate with one another via the SDP network interconnect 640, as described in greater detail herein. This enables an individual processing node 610 of the SDP architecture 600a to use a different processing node 610 of the SDP architecture 600a for memory and/or processing. As described herein, the processing nodes 610a-610f can be implemented in different portions of a radio transceiver or system. In such instances, the processing nodes 610a-610f can be configured to communicate with each other and to utilize the memory and processing capabilities of other processing nodes via the SDP network interconnect 640.
[0101] The capture memory array 630 is a memory array that allows on-chip storage and that can be used for capturing samples for processing, for providing a scratch area, and for storing lookups for processing. The capture memory array 630 comprises continuous address space which is made up of multiple banks of SRAM. In some implementations, the capture memory array 630 can be designed using interleaved single port RAMs to function as a read/write memory with handshake-based interface for an area/complexity efficient design. The capture memory array 630 may comprise a set of large memory banks connected to the data path allowing low-latency and high bandwidth parallel access to the processors without having to go out to an off-chip memory.
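As a non-limiting illustration of how a continuous address space can be interleaved across single port RAM banks, the following C sketch maps a byte address to a bank index and an offset within that bank; the bank count and word size are arbitrary example values, not the actual CMA geometry.

    #include <stdint.h>

    #define CMA_NUM_BANKS   8u   /* example value; not the actual bank count  */
    #define CMA_WORD_BYTES  8u   /* example word width                        */

    /* Consecutive words land in different banks, so parallel requesters rarely
     * collide on the same single-port bank. */
    static inline uint32_t cma_bank(uint64_t byte_addr)
    {
        return (uint32_t)((byte_addr / CMA_WORD_BYTES) % CMA_NUM_BANKS);
    }

    /* Word index within the selected bank. */
    static inline uint64_t cma_bank_offset(uint64_t byte_addr)
    {
        return (byte_addr / CMA_WORD_BYTES) / CMA_NUM_BANKS;
    }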
[0102] As described herein, the SDP architecture may further include a capture memory array (CMA) and switches. The CMA can be a common location where data can be stored internal to the modem 100, allowing storage of data such as data samples, intermediate processing results, and so forth. In some implementations, the modem 100 can be implemented as a single chip or multiple chips.

[0103] With reference to FIG. 6B, the SDP architecture 600b illustrates an architecture in which the network interconnect is split into multiple segments, a first SDP network interconnect 640a and a second SDP network interconnect 640b, to which the individual processing nodes 610a-610f, capture memory array 630, and CPU subsystem 650 are connected. In some embodiments, one processing node (e.g., the first processing node 610a) can be connected directly to the CPU subsystem 650, and the rest of the processing nodes 610b-610f can connect through the first SDP network interconnect 640a or second SDP network interconnect 640b (or a collection of network interconnect segments). Different arrangements of the processing nodes 610, the CPU subsystem 650, the capture memory array 630, and the SDP network interconnects 640 may similarly exist.
[0104] The SDP architectures 600 are configured to provide flexibility to receiver or transmitter waveform processing algorithms while leaving room to accommodate future system level design changes and updates. The SDP architectures 600 also enable software-based signal processing on chip. In the SDP architectures 600, each processing node 610a-610f can interface with individual hardware modules of both receiver and transmitter signal processing data paths. Thus, the SDP architectures 600 enable DSP processing power to be instantiated at specific key or desired locations in a radio system, such as the encoder, modulator, demodulator, and decoder, as well as allow sufficient connectivity to reassign processing resources for different locations. The SDP architectures 600 also advantageously provide sufficient connectivity to enable passing data between the processing nodes 610a-610f as well as with external resources like external memory. Furthermore, because the processing nodes 610a-610f can be programmed, the SDP architectures 600 enable customizable signal processing to provide flexibility in modem design to facilitate different types of signal processing for different applications.
[0105] Thus, the SDP architectures 600 include the SDP network interconnect 640 and a plurality of processing nodes 610a-610f connected to the SDP network interconnect 640, the plurality of processing nodes 610a-610f configured to provide configurable processing power to process receiver and transmitter waveforms in a radio transceiver. As described herein, each processing node 610 may include a plurality of digital signal processing (DSP) cores; a plurality of extended direct memory access controllers (DCX); and a PN network interconnect connected to the plurality of DSP cores, to the plurality of DCX, and to the SDP network interconnect 640. The SDP architectures 600 may include a capture memory array 630 comprising a plurality of memory banks that are connected to the SDP network interconnect 640 to provide access to the plurality of memory banks for the plurality of processing nodes 610. The SDP architectures 600 may also include a CPU subsystem 650 connected to the SDP network interconnect 640. The SDP network interconnect 640 enables communication among each of the plurality of processing nodes 610, the capture memory array 630, and the CPU subsystem 650 to augment processing power and functionality in the radio transceiver.
[0106] In some embodiments, one or more of the plurality of processing nodes 610 can be dynamically allocated to provide signal processing power to one or more hardware blocks of the radio transceiver. This allows for dynamic allocation of processing power to a hardware block. In some embodiments, each processing node of the plurality of processing nodes 610 can be configured to interface with one or more individual hardware blocks of both receiver and transmitter signal processing data paths in the radio transceiver. This allows multiple hardware blocks to access a single processing node to provide flexible storage and memory functionality.
[0107] In some embodiments, external off-chip memory can be further connected to the CPU subsystem 650. In such embodiments, the plurality of processing nodes 610 can be configured to pass data from individual DSP cores to the external memory through the SDP network interconnect 640.
[0108] In some embodiments, the capture memory array 630 includes a plurality of SDP master interfaces to the SDP network interconnect 640. In some embodiments, the CPU subsystem 650 includes an SDP master interface and an SDP slave interface to the SDP network interconnect 640.
[0109] FIG. 7A illustrates an example radio transceiver 700 that includes a CPU subsystem 750 (similar to the CPU subsystem 650), a demodulator 760 divided into multiple demodulator blocks 760a-760d, a high-speed serial or HSS module 770, a decoder module 780, and a transmit module 790. The radio transceiver 700 includes an SDP architecture similar to the SDP architecture 600b of FIG. 6B. The CPU subsystem 750 includes a network interconnect switch (NIC switch) and a cache-coherent network (CCN or bus interconnect) and is coupled to external memory 755 (e.g., DDR memory). The network interconnect switch (NIC switch) is coupled to SDP network interconnect 2 (SDP NIC 2). The SDP architecture (e.g., the SDP network interconnects, the processing nodes, and the CMA described herein with reference to FIG. 6B) may be represented by and accessible via the SDP NIC 2 as shown.
[0110] The demodulator 760 can be divided into a number of modules 760a-760d, with each module having one or more components such as SDP network interconnects, CMAs, and/or hardware blocks (or HWBs). Other components of the radio transceiver can include hardware blocks as well. The hardware blocks can be configured to provide a number of signal processing capabilities such as signal conditioning, channelization, down-sampling, filtering, equalizing, despreading, descrambling, etc. The hardware blocks can send data to and receive data from the processing nodes, as described herein.
[0111] The HSS module 770 is configured to convert received analog signals into digital signals for processing using HSS/ADC blocks, which include two read channels (read channel 0 and read channel 1) that are processed in parallel. The HSS module 770 is further configured to pass signals directly between an external chip or system that is part of the software defined physical layer (the external chip or system referred to as SDP in the figure, which refers to dedicated SDP interfaces connected to a high-speed serial (HSS) module of an ASIC) and the first processing node of the demodulator module 760a via the HSS/SDP RX 1 block and the HSS/SDP TX 1 block and between the SDP and the sixth processing node of the transmit module 790 via the HSS/SDP RX 2 block and the HSS/SDP TX 2 block. In some embodiments, the SDP interfaces connect directly to the packet receiver and packet transmitter within the processing node. This enables receiving ADC samples directly into the processing node and sending DAC samples directly from the processing node to the HSS module 770. The HSS module 770 is further configured to convert digital signals to analog signals for transmission using the HSS/DAC TX block that is coupled to the transmit module 790 (transmit channel 0).
[0112] Signals digitized by the HSS/ADCs are passed to the demodulator modules 760a-760c and then to the decoder module 780. Digital signals for transmission are passed to the transmit module 790 and then to the HSS/DAC TX block of the HSS module 770.
[0113] The processing nodes are in communication with one another via the SDP network interconnects 740a, 740b (SDP NIC 1 740a and SDP NIC 2 740b). Furthermore, the processing nodes have access to the CMAs via the SDP network interconnects 740a, 740b.
[0114] Each processing node can be dynamically included in the signal processing data flow as described herein. This allows the modules 750, 760, 770, 780 and their hardware blocks to utilize flexible memory and processing provided by the processing nodes. In addition, the processing nodes can be configured to access data processed by the hardware blocks through DCX interfaces, as described herein. The processing nodes can configure the hardware blocks to pass data to the processing nodes using configuration ports, as described herein.
[0115] By way of example, the transmit module 790 includes a processing node (PN 6) that is coupled to the SDP NIC 1 740a via an SDP master interface and an SDP slave interface. In addition, although not shown explicitly in the figure, the processing node PN 6 is coupled to the MOD block and the ENC block via packet interfaces and configuration ports, similar to those described herein with reference to FIGS. 2, 3A, and 4. The processing node PN 6 can configure the MOD block and/or the ENC block to send data to the processing node PN 6 and to receive processed data from the processing node PN 6. In some instances, this can be done to enhance the signal processing provided by the MOD block and/or the ENC block. In certain instances, this can be done to bypass the MOD block and/or the ENC block.
[0116] Advantageously, the radio transceiver 700 can dynamically assign a first DCX of a first processing node to be a master to write data to shared on-chip memory (e.g., CMAs), external memory, and/or the memory of another DCX and a second DCX of a second processing node to be a master to read the result of processing the data written by the first DCX. The placement of the processing nodes can be configured to enhance particular signal processing elements in the radio transceiver 700. For example, a processing node can be placed in the demodulator front module to aid in demodulating a signal and the processing node may be used to enhance or replace (e.g., bypass) the DSC and/or CDM hardware blocks. The placement of the CMAs can be modified from what is illustrated here as well as the placement and configuration of the SDP network interconnects and processing nodes. The placement of these components can be based on a number of factors including signal processing performance, layout of the chip, connectivity of the block to the other designs, latency, as well as fabrication complexity and cost. The radio transceiver 700 is illustrated as providing parallel data paths to process two waveforms in parallel; however, it is to be understood that a single data path may be used, or more than two data paths may be implemented in parallel. Advantageously, the NIC switch and CCN of the CPU subsystem 750 allow the processing nodes to access external memory. However, this may be more expensive in terms of speed, so the radio transceiver 700 also advantageously provides on-chip memory in the form of the CMAs. This may also advantageously help to interface with software running on an external processor.
[0117] FIG. 7B illustrates the demodulator module 760c of FIG. 7A in greater detail to show connections between hardware blocks and a processing node 710 (e.g., PN 4 of FIG. 7A) as well as between the processing node 710 and the SDP NIC 1 740a implemented within the demodulator module 760b. The processing node 710 is similar to the processing node 210 described herein with reference to FIG. 2. A PN network interconnect 716 of the processing node 710 is coupled to an SDP interconnect 740 of the radio transceiver 700 (e.g., SDP NIC 1 of FIG. 7A) to integrate the processing node 710 into the radio transceiver 700. The hardware blocks of the demodulator module 760c include a first hardware block 1 761a for receive channel 0, a first hardware block 2 762a for receive channel 0, a second hardware block 1 761b for receive channel 1, and a second hardware block 2 762b for receive channel 1. Each hardware block of the demodulator module 760c can be configured using respective configuration ports of the processing node 710. Each hardware block 761a, 761b, 762a, 762b is coupled to a respective DCX 712a-712d of the processing node 710 via a packet transmit interface and a packet receive interface. In addition, each hardware block 761a, 761b, 762a, 762b is coupled to the PN network interconnect 716 via a respective configuration port.
[0118] As introduced above, data can be passed between the processing node 710 and the hardware blocks (blocks 761a, 761b and blocks 762a, 762b) using the packet transmit/receive interfaces of corresponding DCX 712 of the processing node 710. For example, the first hardware block 1 761a can send data to the processing node 710 via the packet transmit interface of a first DCX 712a, the first hardware block 2 762a can send data to the processing node 710 via the packet transmit interface of a second DCX 712b, the second hardware block 2 762b can send data to the processing node 710 via the packet transmit interface of a third DCX 712c, and the second hardware block 1 761b can send data to the processing node 710 via the packet transmit interface of a fourth DCX 712d of the processing node 710. Similarly, after processing by a first DSP core 714a and/or a second DSP core 714b, processed data can be passed back to a particular hardware block from the processing node 710 using the packet receive interface of a corresponding DCX 712a-712d. For example, the first hardware block 1 761a can receive processed data from the processing node 710 via the packet receive interface of the first DCX 712a, the first hardware block 2 762a can receive processed data from the processing node 710 via the packet receive interface of the second DCX 712b, the second hardware block 2 762b can receive processed data from the processing node 710 via the packet receive interface of the third DCX 712c, and the second hardware block 1 761b can receive processed data from the processing node 710 via the packet receive interface of the fourth DCX 712d of the processing node 710. In this way, the processing node 710 can provide additional and/or alternative processing for the demodulator module 760c. The remaining processing nodes of the radio transceiver 700 can have similar connections and provide similar functionality for the other modules and hardware blocks of the radio transceiver 700 of FIG. 7A.
Work descriptors and buffer pointers
[0119] As described herein, to make the disclosed SDP architectures operate as desired, the different processing nodes are configured to seamlessly pass data back and forth (e.g., with each other, memory, hardware blocks, etc.). To accomplish this, the disclosed SDP architectures can utilize work descriptors and buffer pointers as part of the data flows. By using work descriptors and buffer pointers as described herein, the SDP architectures can connect different data transfer pieces across the architecture to form a highly configurable data path. The disclosed technology enables the use of each hardware interface and DCX as an interchangeable building block that can be connected to any other hardware interface, DCX, or DSP core within the SDP architecture using a straightforward approach where each component receives a work descriptor that points to the location in memory of the data that needs to be processed and has some pre-allocated memory indicated in a buffer pointer for the component to use for the processed data. Once the component is finished processing the data, it forwards the work descriptor containing a pointer to the location in memory where the newly processed data resides to the next component in the processing chain. As memory is freed up, the location of the newly freed up memory is returned to a buffer pointer queue for use by other components in the SDP architecture. By way of example, memory can be freed up when data is transmitted to a hardware block, moved to a new location in internal or external memory, and/or when data is processed and the result is saved in a new location in memory. Using this approach, data can be passed from one component to the next until it reaches a final destination. Interconnecting data across the SDP architecture in this manner enables connecting virtually any two hardware interfaces, DCX, and/or DSP core to form a desired data path.
[0120] FIG. 8A illustrates an example of a buffer pointer 811. The buffer pointer 811 can include, for example, a buffer address, a buffer pointer identifier, and reserved bits. As shown, the buffer pointer 811 may have a length of 32 bits, though other lengths (larger or smaller) may be used as appropriate. The buffer pointer 811 identifies the starting address of a pre-allocated location in memory that is available for a processing node, or a component of a processing node, to utilize for storing data, the location in memory corresponding to the buffer address of the buffer pointer 811.
[0121] FIG. 8B illustrates an example of a work descriptor 821. The work descriptor 821 can include, for example, a buffer address, a buffer pointer identifier, reserved bits, a burst identifier, a length, a burst start flag, and a burst end flag. The work descriptor 821 is a pointer to a data set for a processing node or a component in the processing node. In some embodiments, the work descriptor 821 can have a length of 64 bits. The work descriptor 821 is used to point to a location in memory that includes data (e.g., a burst) ready to be analyzed or processed, the location in memory corresponding to the buffer address of the work descriptor 821. In addition, the length indicated by the work descriptor 821 can be used to identify the amount of memory used to store the data at the buffer address. Additional identifiers and flags of the work descriptor 821 can be used to facilitate analysis and processing of the data. Examples of identifiers or flags include message type, DSP ID, message count, etc. In some embodiments, BPID (buffer pool id) [2:0] and burst ID [1:0] in the work descriptor 821 can be used to distribute buffer pointer descriptors and work descriptors in a round robin fashion. In some implementations, the burst start and burst end flags in the work descriptor 821 are flags that get tagged by a packet-receiver in a DCX when receiving data from hardware blocks that use or rely on any kind of framing. These flags allow software running on a DSP core to identify a set of descriptors that make up that frame without having to parse the packets that are saved in memory.
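By way of illustration, the following sketch in C shows one possible bit-level layout of the buffer pointer of FIG. 8A and the work descriptor of FIG. 8B. Only the overall 32-bit and 64-bit sizes, the BPID[2:0] and burst ID[1:0] widths, and the burst start/end flags come from the description above; the remaining field widths and bit positions are assumptions made for this example.

    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        uint32_t buffer_addr : 24; /* assumed width: start address of the pre-allocated buffer */
        uint32_t bpid        : 3;  /* buffer pool identifier, BPID[2:0] */
        uint32_t reserved    : 5;  /* assumed width */
    } buffer_pointer_t;

    typedef struct {
        uint64_t buffer_addr : 24; /* assumed width: address of the burst data in memory */
        uint64_t bpid        : 3;  /* buffer pool identifier, BPID[2:0] */
        uint64_t burst_id    : 2;  /* burst ID[1:0] */
        uint64_t length      : 16; /* assumed width: amount of memory used at buffer_addr */
        uint64_t burst_start : 1;  /* data belongs to the first packet of a burst */
        uint64_t burst_end   : 1;  /* data belongs to the last packet of a burst */
        uint64_t reserved    : 17; /* assumed width */
    } work_descriptor_t;

    int main(void) {
        work_descriptor_t wd = { .buffer_addr = 0x1000, .bpid = 2, .burst_id = 1,
                                 .length = 512, .burst_start = 1, .burst_end = 1 };
        /* both flags set: the descriptor points at a fully contained burst */
        printf("burst at 0x%x, %u bytes, fully contained = %d\n",
               (unsigned)wd.buffer_addr, (unsigned)wd.length,
               (int)(wd.burst_start && wd.burst_end));
        return 0;
    }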
[0122] FIG. 9 illustrates an example of a component 900 that utilizes buffer pointers and work descriptors to facilitate data processing in an SDP architecture. The component 900 can operate as a consumer and/or producer in a consumer/producer model. The component 900 receives a work descriptor queue 920, or a list of work descriptors, and a buffer pointer queue 910, or a list of buffer pointers, and can output a work descriptor 921 and a buffer pointer 911. The buffer pointer queue 910 and the work descriptor queue 920 are each maintained by a queue controller or other component separate from the component 900. Thus, the buffer pointer queue 910 and the work descriptor queue 920 are each available to the component 900 but not managed by the component 900. This allows the component 900 to pop an element from a particular queue, removing that element from the respective queue so that it does not appear as available for another consumer/producer in a signal processing chain.

[0123] The component 900 can take the next work descriptor from the work descriptor queue 920 to identify data to be processed (causing the work descriptor to be removed from the work descriptor queue 920) and can take the next buffer pointer from the buffer pointer queue 910 to identify a location in memory that is available to store data (causing the buffer pointer to be removed from the buffer pointer queue 910). The component 900 processes the data at the buffer address indicated in the work descriptor to generate processed data and stores the processed data at the buffer address indicated in the buffer pointer. The component 900 then creates a new work descriptor that includes the buffer address where the newly processed data is stored as well as the length of the data stored at the buffer address and outputs the work descriptor 921 to the queue controller that manages the work descriptor queue 920 and/or to a next consumer/producer in a chain. The component 900 can also create a new buffer pointer that includes the buffer address where the data that was just processed was stored and can output that buffer pointer 911 to the queue controller that manages the buffer pointer queue and/or to a next consumer/producer in a chain. In other words, the memory location with stored data corresponding to the buffer address in the work descriptor prior to processing is now indicated as available memory in the returned buffer pointer 911, whereas the available memory prior to processing is now indicated as holding newly processed data in the returned work descriptor 921. In this way, data can be passed between components of a processing node and/or between processing nodes of an SDP architecture.
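A minimal sketch in C of one step of such a component is shown below. The queue-controller helpers (wq_pop, bpq_pop, wq_push_next, bpq_return) and the generic process() routine are hypothetical names assumed to be provided by the surrounding node firmware and are not part of the disclosure.

    #include <stdint.h>

    typedef struct { uint32_t addr; } buf_ptr_t;
    typedef struct { uint32_t addr; uint32_t len; } work_desc_t;

    /* hypothetical helpers assumed to be provided by a queue controller */
    extern int  wq_pop(work_desc_t *wd);              /* pops (removes) the next work descriptor */
    extern int  bpq_pop(buf_ptr_t *bp);               /* pops (removes) the next free buffer pointer */
    extern void wq_push_next(const work_desc_t *wd);  /* to the queue controller or next consumer */
    extern void bpq_return(const buf_ptr_t *bp);      /* freed memory is made available for reuse */
    extern uint32_t process(uint32_t src_addr, uint32_t len, uint32_t dst_addr); /* returns output length */

    void component_step(void) {
        work_desc_t in, out;
        buf_ptr_t   dst;

        if (!wq_pop(&in) || !bpq_pop(&dst))
            return;                               /* no work to do or no free memory available */

        uint32_t out_len = process(in.addr, in.len, dst.addr);

        out.addr = dst.addr;                      /* newly processed data now lives here */
        out.len  = out_len;
        wq_push_next(&out);                       /* forward to the next consumer/producer */

        buf_ptr_t freed = { .addr = in.addr };    /* the consumed input buffer is now free */
        bpq_return(&freed);
    }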
[0124] The component 900 is configured to retrieve data from the location pointed to by the work descriptor from the work descriptor queue 920. The component 900 then processes the retrieved data. The component 900 uses the buffer pointer from the buffer pointer queue 910 to store the processed data. The component 900 then forms a new work descriptor 921 based on the location of the stored, processed data. The component 900 then outputs the buffer pointer comprising the buffer address from the work descriptor to a pre-configured location to be reused again. The component 900 then sends the work descriptor 921 to the next consumer in the chain. Additional examples of using buffer pointers and work descriptors are described in greater detail herein with reference to FIGS. 10A-10D.

[0125] In some embodiments, the component 900 does not receive a work descriptor queue 920 but instead receives data for storage (e.g., a packet receiver of a processing node receiving packets from a hardware block). In such embodiments, the work descriptor queue 920 is not provided to the component 900 and the component 900 acts solely as a producer. The component 900 stores the received data at the buffer addresses identified in the buffer pointer queue 910 and outputs work descriptors 921 indicating the storage location of the received data. In such embodiments, the component 900 does not output the buffer pointer 911 because no memory was made available in the process.
[0126] Similarly, in some embodiments, the component 900 does not receive a buffer pointer queue 910 but instead receives a work descriptor queue 920 with data for transmitting out of the processing node (e.g., a packet transmitter of a processing node transmitting packets to a hardware block). In such embodiments, the buffer pointer queue 910 is not provided to the component 900 and the component 900 acts solely as a consumer. The component 900 takes the data from the buffer addresses indicated in the work descriptors in the work descriptor queue 920, transmits the data, and outputs buffer pointers to a buffer pointer queue manager, the buffer pointers corresponding to the buffer addresses where the transmitted data was stored, indicating that those locations in memory are now available for storage. In such embodiments, the component 900 does not output the work descriptor 921 because no data was processed and stored back in memory.
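The producer-only and consumer-only variants described in the two preceding paragraphs can be sketched in C as follows; the helper routines (bpq_pop, bpq_return, wq_pop, wq_push_next, mem_write, hw_transmit) are hypothetical names assumed to be provided by the surrounding node firmware.

    #include <stdint.h>

    typedef struct { uint32_t addr; } buf_ptr_t;
    typedef struct { uint32_t addr; uint32_t len; } work_desc_t;

    /* hypothetical helpers; assumed to exist elsewhere in the node */
    extern int  bpq_pop(buf_ptr_t *bp);
    extern void bpq_return(const buf_ptr_t *bp);
    extern int  wq_pop(work_desc_t *wd);
    extern void wq_push_next(const work_desc_t *wd);
    extern void mem_write(uint32_t addr, const void *src, uint32_t len);
    extern void hw_transmit(uint32_t addr, uint32_t len);

    /* producer only: a packet receiver consumes free buffers and emits work descriptors */
    void packet_receiver_step(const void *pkt, uint32_t len) {
        buf_ptr_t bp;
        if (!bpq_pop(&bp))
            return;                                   /* no free buffer: apply back-pressure */
        mem_write(bp.addr, pkt, len);                 /* store the received packet */
        work_desc_t wd = { .addr = bp.addr, .len = len };
        wq_push_next(&wd);                            /* no buffer pointer is output here */
    }

    /* consumer only: a packet transmitter consumes work descriptors and frees buffers */
    void packet_transmitter_step(void) {
        work_desc_t wd;
        if (!wq_pop(&wd))
            return;
        hw_transmit(wd.addr, wd.len);                 /* data leaves the processing node */
        buf_ptr_t bp = { .addr = wd.addr };
        bpq_return(&bp);                              /* no work descriptor is output here */
    }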
[0127] By way of example, the component 900 operating with the buffer pointer queue 910 and the work descriptor queue 920 can get data from a location pointed to by a work descriptor in the work descriptor queue 920 and use a buffer address identified in a buffer pointer of the buffer pointer queue 910 for its results. The component 900 may then form a new work descriptor 921 based on the output generated and stored at the buffer address taken from the buffer pointer queue 910, where the new work descriptor 921 includes the buffer pointer identified in the buffer pointer queue 910 and used for the generated results. The component 900 may then output the buffer pointer 911 to a pre-configured location for reuse by a different component operating on data in a location pointed to by a subsequent work descriptor. The component 900 sends the new work descriptor 921 to the next consumer in a processing chain. The disclosed buffer pointers and work descriptors may be used in various control plane interfaces. For example, buffer pointers and work descriptors can be used in an interface that streams data samples out from a memory to a hardware component. The work descriptor may point to a data packet in memory that contains samples that need to be streamed out to the hardware component, such that the list of work descriptors corresponds to a playlist of data packets to be streamed to the hardware component.
[0128] FIGS. 10A, 10B, 10C, and 10D illustrate examples of passing data through a portion of a data path in an SDP architecture 1000, components of the SDP architecture 1000 operating like the component of FIG. 9. FIG. 10A illustrates receiving data at a packet receiver 1030 through a portion of a data path in the SDP architecture 1000. The packet receiver 1030 receives samples from a hardware block, for example. The packet receiver 1030 acts as a producer in the consumer/producer model, as described herein with reference to FIG. 9, and takes in a buffer pointer queue (BP) along with the received data samples. The packet receiver 1030 takes the received samples and puts them in buffer addresses identified as being available for data storage in the buffer pointer queue (BP). The buffer pointer queue is stored in the DCX memory to which the packet receiver 1030 has access. Once the samples are placed in memory, the packet receiver 1030 generates respective work descriptors for inclusion in a work descriptor queue (WQ). This is passed on to a work descriptor queue DMA (WQ DMA 1031) that manages the work descriptor queue (WQ). In some instances, work descriptors placed in the shared memory of a DCX are accessible such that other DCX and/or DSP cores can access those work descriptors.
[0129] In addition, the buffer pointers can be tied to a DMA 1020 that monitors a fill level of the buffer pointer queue (BP). Responsive to determining that the fill level of the buffer pointer queue (BP) (e.g., the BP of the WQ DMA 1031, which is in a FIFO) falls below a threshold (e.g., FIFO threshold), the DMA 1020 may input one or more subsequent buffer pointers from the pre-allocated buffer pointer list 1011 to refill the buffer pointer queue (BP) to keep the buffer pointer queue filled to a sufficient level. When the WQ DMA 1031 determines that there is space in the work descriptor list 1012 (a FIFO queue), the WQ DMA 1031 sends a work descriptor to be added to the work descriptor list 1012. If memory is freed up in this operation, the WQ DMA 1031 sends a buffer pointer corresponding to the freed-up memory to the buffer pointer queue associated with the packet receiver 1030. If the buffer pointer queue falls below the FIFO threshold, the DMA 1020 sends one or more buffer pointers from the pre-allocated buffer pointer list 1011 to the buffer pointer queue associated with the WQ DMA 1031. In some implementations, a spare DMA channel in the DCX (e.g., see FIG. 3C) can be used to do flow control operation and to oversee the filling of the buffer pointer queue and/or the work descriptor queue based on a configurable threshold. This spare DMA channel in the DCX can be used as the DMA 1020.
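A minimal sketch in C of the threshold-based refill performed by the DMA 1020 is shown below; the FIFO depth, the data structures, and the polling style are assumptions made for illustration.

    #include <stdint.h>

    #define BP_FIFO_DEPTH 64u

    typedef struct { uint32_t addr; } buf_ptr_t;

    typedef struct {
        buf_ptr_t entries[BP_FIFO_DEPTH];
        uint32_t  count;                 /* current fill level */
        uint32_t  threshold;             /* configurable FIFO threshold */
    } bp_fifo_t;

    typedef struct {
        const buf_ptr_t *entries;        /* pre-allocated buffer pointer list */
        uint32_t next;                   /* next list entry to hand out */
        uint32_t total;                  /* number of entries in the list */
    } bp_list_t;

    /* One pass of the flow-control DMA: if the FIFO fill level has dropped below
     * the threshold, move buffer pointers from the pre-allocated list into the
     * FIFO until it is back at the threshold (or the list is exhausted). */
    void bp_fifo_refill(bp_fifo_t *fifo, bp_list_t *list) {
        if (fifo->count >= fifo->threshold)
            return;
        while (fifo->count < fifo->threshold &&
               fifo->count < BP_FIFO_DEPTH &&
               list->next < list->total) {
            fifo->entries[fifo->count++] = list->entries[list->next++];
        }
    }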
[0130] FIG. 10B illustrates passing data through a portion of a data path in the SDP architecture 1000 to a packet transmitter 1040 for transmission. A WQ DMA 1041 has an associated work descriptor queue (WQ) and it uses the WQ to send work descriptors to the packet transmitter 1040. The packet transmitter 1040 sends data to a hardware block, for example, based on the data identified in the work descriptors of the work descriptor queue (WQ) associated with the packet transmitter 1040. Once data is sent, the memory location that contained that data is freed up and the packet transmitter 1040 generates a buffer pointer that is sent to a buffer pointer queue (BP) associated with the WQ DMA 1041.
[0131] In addition, a list of work descriptors may be associated with a DMA 1050 that monitors a fill level of the work descriptor queue (WQ) associated with the WQ DMA 1041. Whenever the work descriptor queue (for example, the WQ of the WQ DMA 1041, which is in a FIFO) falls below a threshold (e.g., FIFO threshold), the DMA 1050 may input one or more subsequent work descriptors from the work descriptor list 1021 to refill the work descriptor queue (WQ) and keep it filled to a sufficient level.
[0132] When there is a buffer pointer available in the BP FIFO, the WQ DMA 1041 obtains the work descriptor in the WQ and reads it out of the FIFO memory to move into an allocated buffer (pointed to by a buffer pointer identified by the work descriptor) and creates a new work descriptor for the next component (e.g., the packet transmitter 1040). The new work descriptor is stored in the work queue (WQ) leading to the packet transmitter 1040. The packet transmitter 1040 may then operate similarly. When it has an entry in its work queue, the packet transmitter 1040 uses the work descriptor therein to identify and read out the corresponding packet. The packet transmitter 1040 may have a state machine that reads packets out according to the work descriptor and transmits them to corresponding components before the memory in the WQ is freed up and is provided to the WQ DMA 1041 for reuse as a buffer location. Thus, the buffer pointer and the work descriptor are used to exchange information regarding when buffer space is available for the next packet, and the work descriptor list 1021 is the list of items to be streamed out by the packet transmitter 1040. Thus, the buffer pointer FIFO effectively operates to control the flow of data. Once all buffer pointers are used and sent to the packet transmitter 1040, the WQ DMA 1041 is configured to stop and wait for the packet transmitter 1040 to return one of the freed buffer pointers. At this point, the WQ DMA 1041 resumes the transfer of work descriptors.
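The stop-and-wait behavior of the WQ DMA 1041 can be sketched in C as follows, assuming hypothetical helpers for the buffer pointer FIFO, the work descriptor queue, and the memory copy; the point being illustrated is that a work descriptor is only forwarded when a freed buffer pointer is available.

    #include <stdint.h>

    typedef struct { uint32_t addr; } buf_ptr_t;
    typedef struct { uint32_t addr; uint32_t len; } work_desc_t;

    /* hypothetical helpers; assumed to exist in the surrounding firmware */
    extern int  bp_fifo_pop(buf_ptr_t *bp);             /* freed buffers returned by the transmitter */
    extern void bp_fifo_push(const buf_ptr_t *bp);
    extern int  wq_pop(work_desc_t *wd);                /* descriptors queued for transmission */
    extern void mem_copy(uint32_t dst, uint32_t src, uint32_t len);
    extern void wq_push_to_transmitter(const work_desc_t *wd);

    void wq_dma_step(void) {
        buf_ptr_t dst;
        if (!bp_fifo_pop(&dst))
            return;                                     /* all buffers in flight: stop and wait */

        work_desc_t in;
        if (!wq_pop(&in)) {
            bp_fifo_push(&dst);                         /* nothing queued: return the buffer pointer */
            return;
        }

        mem_copy(dst.addr, in.addr, in.len);            /* move the packet into the allocated buffer */

        work_desc_t out = { .addr = dst.addr, .len = in.len };
        wq_push_to_transmitter(&out);                   /* transmitter frees dst.addr once sent */
    }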
[0133] FIG. 10C illustrates an example of a flow of data for software-based processing using buffer pointers and work descriptors in the SDP architecture 1000. In this example, samples are received by the packet receiver 1030 of a processing node and are forwarded to a DSP core 1032 for processing using work descriptor queues as described herein. The processed samples are then returned to the packet transmitter 1040 for transmission. This process essentially combines the data paths described herein with reference to FIGS. 10A and 10B without the buffer pointer list 1011 and work descriptor list 1021 and associated DMAs 1020, 1050. Thus, when the DSP core 1032 processes the data, it returns the buffer pointer to the buffer pointer queue associated with the WQ DMA 1031 and sends the work descriptor to the work descriptor queue associated with the WQ DMA 1041 to enable the data to be queued for transmission by the packet transmitter 1040.
[0134] FIG. 10D illustrates an example of a flow of data that is stored in on-chip memory using buffer pointers and work descriptors in the SDP architecture 1000. In this example, samples are received at the packet receiver 1030 and put in a work descriptor queue associated with the WQ DMA 1031 which forwards the work descriptor to the work descriptor queue associated with the DSP core 1032. Once processed by the DSP core 1032, the processed data is sent to the work descriptor queue associated with a WQ DMA 1033 where it can be stored directly in on-chip storage such as a CMA. In addition, the processed data can be sent to a work descriptor queue associated with a next consumer 1034 in the signal processing chain where it can be processed and moved along a signal processing data path. In some embodiments, the work descriptor queue can be stored in a CMA until it is needed by the next consumer 1034.
[0135] In each of these examples, the respective WQ DMAs can be configured to move data from memory A to memory B. As a result, the WQ DMAs create a work descriptor with a pointer to memory B. In some implementations, the WQ DMAs also create a buffer pointer to memory A.
[0136] In some embodiments, the SDP architecture 1000 performs a method for passing data between components such as a processing node or a hardware block where the method includes utilizing a buffer pointer queue to manage available memory, the buffer pointer queue comprising a plurality of buffer pointers that each identify a buffer address in memory that is available for storing data. The method also includes utilizing a work descriptor queue to manage packets or samples to be processed, the work descriptor queue comprising a plurality of work descriptors that each identify a buffer address in memory that includes burst data to be processed. Responsive to the packet receiver receiving burst data to be processed, the method includes retrieving a first buffer pointer from the buffer pointer queue, processing the received burst data, storing the processed burst data in memory at the buffer address identified by the first buffer pointer, and outputting a new work descriptor, the new work descriptor including the buffer address identified by the first buffer pointer. The method also includes, responsive to the work descriptor queue having processed data to be transmitted, retrieving a first work descriptor from the work descriptor queue, obtaining the processed data from memory at the buffer address identified by the first work descriptor, releasing the buffer pointer that was associated with the work descriptor, the released buffer pointer corresponding to the buffer address identified by the first work descriptor, and transmitting the processed data by the packet transmitter.
[0137] In some embodiments, the work descriptor further includes a length indicating the total length of the packet. The work descriptor may also further include a burst start flag indicating that the burst data belongs to a first packet of a burst and a burst end flag indicating that the burst data belongs to a last packet of a burst. The work descriptor may also indicate that the burst data is a fully contained burst by setting the burst start flag and the burst end flag to true.
[0138] The SDP architecture can be further configured to add the new work descriptor to the work descriptor queue and/or to add the new buffer pointer to the buffer pointer queue.
[0139] In some embodiments, the SDP architecture is further configured to, responsive to receiving the buffer pointer queue and the work descriptor queue, retrieve a second work descriptor from the work descriptor queue; obtain data from memory at the buffer address indicated by the second work descriptor; retrieve a second buffer pointer from the buffer pointer queue; process the retrieved data to generate output processed data; store the output processed data in memory at the buffer address indicated by the second buffer pointer; output a new work descriptor, the new work descriptor including the buffer address indicated by the second buffer pointer; and output a new buffer pointer, the new buffer pointer indicating the buffer address indicated by the second work descriptor. Each work descriptor may also further include a burst identifier of the burst data to be processed and a burst length indicating an amount of storage occupied by the burst data to be processed.
[0140] The SDP architecture can also be configured to monitor a fill level of the work descriptor queue by the DCX and, responsive to determining that the fill level is below a threshold fill level, add one or more work descriptors to the work descriptor queue from a work descriptor list. The SDP architecture can also be configured to monitor a fill level of the buffer pointer queue by the DCX and, responsive to determining that the fill level is below a threshold fill level, add one or more buffer pointers to the buffer pointer queue from a buffer pointer list.
SDP Streaming to Packetized Interface Glue
[0141] To be able to seamlessly pass data between components in the disclosed SDP architectures, data can be formatted in an efficient and consistent manner across the architecture. To do so, incoming RF signals can be digitized and then formatted according to the disclosed data formatting. Once formatted, data can be passed through the disclosed SDP architectures using the work descriptors and buffer pointers disclosed herein. The disclosed data formatting modules are configured to create an adaptation layer between streaming and non-streaming or packetized types of data interfaces. The adaptation layer enables connectivity to the disclosed SDP architectures and processing nodes and is computationally efficient.
[0142] Data in the SDP architecture may enter as a burst comprising a stream of samples or symbols. Given a burst of sample streams, the burst may be broken up into multiple messages, and the multiple messages may be transferred to a DCX for storage and/or processing by a DSP core. In some embodiments, the maximum message size can be configured by a user, and a burst will always produce messages with the maximum message size except for the last message. The consideration for determining the maximum message size is two-fold: the convenience of the DMA transfer size and the frequency of message header transfers, which may be needed to carry frequently updated parameters such as a “current frequency offset” or a “current symbol timing.”
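A minimal sketch in C of this splitting rule, with illustrative sizes rather than values taken from the disclosure, is shown below.

    #include <stdint.h>
    #include <stdio.h>

    /* Split a burst into messages of at most max_msg_bytes; every message carries
     * the maximum size except the last, which carries whatever remains. */
    static uint32_t split_burst(uint32_t burst_bytes, uint32_t max_msg_bytes,
                                uint32_t *last_msg_bytes) {
        uint32_t n_full = burst_bytes / max_msg_bytes;
        uint32_t rem    = burst_bytes % max_msg_bytes;
        if (rem == 0 && n_full > 0) {
            *last_msg_bytes = max_msg_bytes;   /* burst divides evenly: last message is full size */
            return n_full;
        }
        *last_msg_bytes = rem;
        return n_full + 1;
    }

    int main(void) {
        uint32_t last = 0;
        uint32_t n = split_burst(5000u, 2048u, &last);   /* illustrative sizes */
        printf("%u messages, last message carries %u bytes\n", n, last);
        return 0;
    }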
[0143] In the SDP architecture, sample streams received from other locations within the architecture (e.g., on chip) can be formatted to conform to a target data format, referred to herein as a streaming mode. The streaming mode is used when the sample streams are provided on chip (e.g., from a high-speed serial interface such as the one described herein with reference to FIG. 7A). On the high-speed serial interface, sample streams can be communicated using streaming or non-streaming modes. This enables connecting with another FPGA or ASIC with the DCX-based designs described herein such that the DCX can directly send DCX-configured packets through the high-speed serial interface with its associated packet header metadata. Sample streams or data received through a different interface or source can be assumed to be already formatted using the target data formatting and that data can be received and processed in a non-streaming mode.
[0144] Software can configure components of the SDP architecture to accept sample streams (e.g., samples or symbols) from an interface and send sample streams to an interface when operating in the streaming mode. The software may configure these components to split received sample streams into sizes corresponding to buffer or memory sizes (e.g., sizes identified in the buffer pointers and work descriptors). In some embodiments, when a component of the SDP architecture receives a sample stream, the component creates or determines start and end markers based on the known buffer or memory sizes. As a result, the component breaks the incoming sample streams into smaller segments and numbers the segments to enable the other components of the SDP architecture to manage and process the data using work descriptors and buffer pointers, as described herein.
[0145] In some implementations, when the components of the SDP architecture receive packets with determined start and end points, the components can be configured to identify these points to control when to capture data, for example, using pre-defined boundaries. When transmitting, the components can be configured to either pass start and end markers or to send valid data signals, depending at least in part on downstream use.
[0146] Thus, to facilitate flexibility in the disclosed SDP architecture, e.g., to implement DSP processing help in various locations in the signal processing data path, a consistent data formatting is implemented so that the interfaces between the components (e.g., processing nodes and hardware blocks) can exchange data in a known and advantageous format. Accordingly, the disclosed data formatting modules provide an adaptation layer that is configured to process continuous or streaming data as well as data with fixed sizes or data that is packetized. The adaptation layer allows the SDP architecture to handle streaming data as well as packetized data by packaging both types of data streams into a common data structure. Advantageously, the disclosed adaptation layer can support user-defined burst data, which may be a frame or timeslot with an arbitrary length, because the burst boundaries can be preserved during the data formatting process.
[0147] FIG. 11A illustrates an example of a data formatting module 1100 configured to receive digitized data from a high-speed serial receive module 1110, similar to the HSS/ADC RX or HSS/SDP RX modules described herein with reference to FIG. 7A. In some embodiments, streaming data is received from the HSS/ADC RX module and non-streaming data is received from the HSS/SDP RX module. The data formatting module 1100 is part of a processing node, such as the processing node 210 described herein with reference to FIG. 2. The data formatting module 1100 provides the disclosed adaptation layer to enable storage and processing of streaming and non-streaming data in the SDP architecture. In particular, the data formatting module 1100 formats data in a way that allows processing nodes and hardware blocks to communicate data back and forth.
[0148] In a streaming mode, the high-speed serial receive module 1110 sends digitized data to a streaming mode component 1120 to format the data using a TX compose module 1125. The TX compose module 1125 is configured to compose the data to be suitable for transmission to a DCX packet receive interface 1140. The TX compose module 1125 receives Rx data (e.g., Rx data 1, Rx data 2) and a valid flag to indicate that the data is part of a valid burst. In addition, the TX compose module 1125 receives a ready signal or data from the DCX packet receive interface 1140 as well as configuration data from the DCX packet receive interface 1140, similar to the data sent over the configuration ports described herein. In some embodiments, the data received from the high-speed serial receive module 1110 includes I samples and Q samples.
[0149] The TX compose module 1125 is configured to package the received data into the format represented in FIG. 12A. Once received by the DCX packet receive interface 1140, the data is stored in memory in the form represented in FIG. 12B. The TX compose module 1125 passes this data to the DCX packet receive interface 1140 as well as an identified start of frame (sof) identifier, an end of frame (eof) identifier, the valid flag received from the high-speed serial receive module 1110, and a status flag.
[0150] In a non-streaming mode, the RX data from the high-speed serial receive module 1110, such as the HSS/SDP RX module, is assumed to be already formatted in the targeted data format (represented in FIG. 12B). Thus, a non-streaming mode component 1130 is configured to provide a data reformat/FIFO module 1135 to convert the word size of the data to the word size expected by the DCX packet receive interface 1140. In some embodiments, the expected word size is 128 bits. The non-streaming mode component 1130 can also be configured to receive a ready flag from the DCX packet receive interface 1140. In addition, the non-streaming mode component 1130 can be configured to receive and pass on an identified start of frame (sof) identifier, an end of frame (eof) identifier, and the valid flag received from the high-speed serial receive module 1110. The operating mode can be switched between the streaming mode and the non-streaming mode using the mode flag.
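A minimal sketch in C of such a word-size conversion is shown below; the 128-bit output word comes from the description above, while the 32-bit input word width and the zero-padding of a partial final word are assumptions made for the example.

    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    #define DCX_WORD_BYTES 16u   /* 128-bit word expected by the DCX packet receive interface */

    /* Pack a stream of 32-bit input words into 128-bit output words; a partial
     * final word is zero-padded.  Returns the number of 128-bit words produced.
     * The caller provides out with room for (n_in + 3) / 4 words. */
    size_t repack_32_to_128(const uint32_t *in, size_t n_in,
                            uint8_t (*out)[DCX_WORD_BYTES]) {
        size_t out_words = 0, filled = 0;
        for (size_t i = 0; i < n_in; i++) {
            memcpy(&out[out_words][filled], &in[i], sizeof(uint32_t));
            filled += sizeof(uint32_t);
            if (filled == DCX_WORD_BYTES) {
                filled = 0;
                out_words++;
            }
        }
        if (filled != 0) {
            memset(&out[out_words][filled], 0, DCX_WORD_BYTES - filled);
            out_words++;
        }
        return out_words;
    }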
[0151] Once received by the DCX packet receive interface 1140, the sample/symbol data can be passed to other parts of the SDP architecture or to other parts of the processing node that includes the DCX packet receive interface 1140. The TX compose module 1125 is configured to format sample/symbol streams into messages to be passed to the DCX, which then passes the data to the SDP architecture. The TX compose module 1125 is configured to break up the burst into multiple messages and to transfer the multiple messages to the DCX via the DCX packet receive interface 1140. The TX compose module 1125 is configured to produce messages with a configured maximum message size, except for the last message.
[0152] FIG. 11B illustrates an example of a data formatting module 1150 configured to receive processed data from a DCX packet transmit interface 1160 and to prepare the processed data for a high-speed serial transmit module 1190, similar to the HSS/DAC TX or HSS/SDP TX modules described herein with reference to FIG. 7A. A streaming mode component 1170 includes an RX parse module 1175 that is configured to process messages received from the DCX packet transmit interface 1160 and to recover the sample/symbol stream therefrom. The RX parse module 1175 is configured to receive data in the format represented in FIG. 12B and convert it to the format represented in FIG. 12A. The recovered sample/symbol stream is then passed to the high-speed serial transmit module 1190, such as the HSS/DAC TX module described herein with reference to FIG. 7A.
[0153] A non-streaming mode component 1180 includes a data reformat/FIFO module 1185 that is configured to receive messages from the DCX packet transmit interface 1160 and to reformat the data according to an expected message size. In some implementations, the FIFO module 1185 is configured to take the data coming from the DCX packet transmit interface 1160 and convert the data to the bit-width used by the high-speed serial interface. Otherwise, the non-streaming mode component 1180 does not alter the data because the high-speed serial transmit module 1190 is configured to receive data formatted according to the format of FIG. 12B. The non-streaming mode data can then be forwarded to the high-speed serial transmit module 1190, such as an HSS/SDP TX module described herein with reference to FIG. 7A.
[0154] FIGS. 12A and 12B illustrate packet formats for data in the SDP architecture. FIG. 12A illustrates the packet format at the DCX packet receive interface 1140 and the DCX packet transmit interface 1160, as described herein, and FIG. 12B illustrates the packet format provided by the DCX packet receive interface 1140 and stored in the DCX of a processing node, or received by the DCX packet transmit interface 1160 where it is converted to the packet format of FIG. 12A.
[0155] With reference to FIG. 12A, packets are received for a burst, the burst having a header, payload data, and a footer. The data is reformatted by determining the number of payload messages in the burst, including that information in the footer, and then moving the header and footer together to form the first two words of the burst data, as represented in FIG. 12B. Advantageously, a component reading in the burst data knows that it must read the first two words of the data, and from the first two words the component will know how many payload messages it must read to take in the entire burst. This provides an efficient method of reading in burst data compared to burst data that is packaged with an unknown number of payload messages. In some embodiments, the word size is 128 bits. In such embodiments, the combined header and footer is stored in the first 256 bits of the burst data.
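A minimal sketch in C of this reordering is shown below, using the 128-bit word size noted above and the 192-bit header and 64-bit footer sizes described further below; these sizes are taken as given for illustration, and together the header and footer occupy exactly the first two words of the burst memory packet.

    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    #define WORD_BYTES   16u   /* 128-bit word */
    #define HEADER_BYTES 24u   /* 192-bit header */
    #define FOOTER_BYTES  8u   /* 64-bit footer carrying the payload message count */

    /* Reorder a burst interface packet (header, payload messages, footer) into a
     * burst memory packet in which the header and footer occupy the first two
     * 128-bit words, followed by the payload messages.  out must provide room
     * for (2 + n_payload) words; the function returns the number of words written. */
    size_t to_burst_memory_packet(const uint8_t *header, const uint8_t *footer,
                                  const uint8_t *payload, size_t n_payload,
                                  uint8_t *out) {
        memcpy(out, header, HEADER_BYTES);                    /* 192 bits ... */
        memcpy(out + HEADER_BYTES, footer, FOOTER_BYTES);     /* ... plus 64 bits = first two words */
        memcpy(out + 2u * WORD_BYTES, payload, n_payload * WORD_BYTES);
        return 2u + n_payload;
    }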
[0156] The adaptation layer allows a buffer pointer to be created for a buffer that can store the entire burst because an exact size in memory for the burst data can be determined after reading the header/footer combination. For example, once a component receives a footer, a work descriptor can be generated that contains the buffer pointer to the burst data, which includes the header and footer combination, and the length of the payload data. In some embodiments, the formatted data includes a burst ID that includes the lower 2 bits of a burst counter. In some embodiments, the formatted data includes a burst start flag that indicates the packet belongs to a start of a burst. In some embodiments, the formatted data includes a burst end flag that indicates the packet belongs to an end of a burst. In some embodiments, if a packet includes both a burst start flag and a burst end flag, the packet can be considered to fully contain the burst data.
[0157] In some embodiments, the high-speed serial receive module 1110 is configured to break down data coming in and the streaming mode component 1120 (via the TX compose module 1125) is configured to create packets based on the data structure illustrated in FIG. 12A. In some embodiments, the data is divided into messages (e.g., 128-bit words). The high-speed serial receive module 1110 generates a header (e.g., 192 bits) that is then followed by one or more payload messages (e.g., each being 128 bits). After the payload data, the high-speed serial receive module 1110 can generate a footer (e.g., 64 bits) that is included after the one or more payload messages. A burst counter can be used and can increment each time a new burst is received (e.g., each time a new start flag is encountered). A segment counter can be used and can increment for each payload message that is included in the burst data structure. Each burst can be arbitrarily sized (e.g., 4 packets, 2 packets, 10 packets, then 1 packet, etc.). The segment counter can be reset each time a new burst is seen.
[0158] The adaptation layer, represented by the data formatting module 1100, is then configured to convert the data to the data structure illustrated in FIG. 12B. That is, the RX parse module 1175 is configured to receive packets in the format represented in FIG. 12A and break them down to a valid signal. This is the data structure used to store the data in the DCX and/or elsewhere in the SDP architecture. The data structure includes a combined header and footer in the first two words (e.g., 256 bits). This size can be set based on the characteristics of the memory of the DCX. This allows the components to do a two-word read to read all metadata associated with a burst. Consequently, the component knows exactly how much data to read from memory to read in all the data for a burst. In addition, the RX parse module 1175 provides a pull interface and the high-speed serial transmit module 1190 can be configured to throttle how fast it pulls data from this interface.
[0159] Messages in the SDP architecture can be transmitted a word at a time (e.g., 128 bits at a time). The data structures can have a width of one word and a depth of anywhere up to 2048, for example. The maximum message depth can be configured in the SDP architecture. For example, the depth can be 16, 32, 64, 128, 2048. The maximum depth includes 2 header rows, payload rows, and 1 dead cycle after the end of the burst data.
[0160] In some embodiments, the adaptation layer can perform a method for converting between sample streams or symbols streams and messages in a signal processing architecture for storing and processing by a processing node that includes a digital signal processor (DSP) core and an extended direct memory access controller (DCX). The method includes receiving a sample stream that includes a burst to be processed by the signal processing architecture. The method also includes generating a header message including information related to the burst. The method also includes splitting the sample stream into a plurality of burst messages, a size of each burst message, except for a final burst message corresponding to an end of the burst, corresponding to a buffer size in the DCX, the size of the final burst message being less than or equal to the buffer size in the DCX. The method also includes generating a footer message including information related to a size of the plurality of burst messages. The method also includes transferring a burst interface packet to the DCX, the burst interface packet including the header message, the plurality of burst messages, and the footer message. The method also includes reformatting the burst interface packet into a burst memory packet for storage in the DCX, the burst memory packet including the header message and the footer message in an initial portion of the burst memory packet and the plurality of burst messages following the initial portion of the burst memory packet. The initial portion of the burst memory packet indicates the number of burst messages in the burst memory packet.
[0161] The method of the adaptation layer can also identify a start flag and an end flag within the sample stream to determine end points of the burst in the sample stream. The method of the adaptation layer can also identify a first start flag and a second start flag within the sample stream to determine end points of the burst in the sample stream, the end points being the first start flag and data preceding but not including the second start flag.
[0162] In some embodiments, splitting the sample stream into the plurality of burst messages is responsive to identifying a start of frame indicator in the sample stream. In some embodiments, splitting the sample stream into the plurality of burst messages terminates responsive to identifying an end of frame indicator in the sample stream.
[0163] In some embodiments, in a non-streaming mode, the adaptation layer is configured to convert a first word size of data in the burst interface packet to a second word size that is compatible with the DCX, the second word size being greater than the first word size.
Data Storage Management and Usage
[0164] To improve the flexibility provided by the SDP architecture, memory access in the SDP architecture can be enhanced. Enhancing memory access can reduce the likelihood that memory access is a bottleneck in terms of the actual amount of storage and speed of access of that storage. Memory in the SDP architecture can be split into banks and/or channels to provide multiple access capability of the memory simultaneously. SDP memory (e.g., shared memory in a DCX, CMA, or other on-chip memory) may comprise memory read/write logic that not only performs read/write interleaving across multiple banks of memory (e.g., RAM) to provide very high read/write throughput but also provides flexibility with respect to data format or resolution changes in the data written into and read out of the memory. For the SDP memory to successfully provide the flexibility to support the communication functions (e.g., modem functions), it is important that memory access processes do not become a bottleneck in terms of the actual amount of storage (helped by data reformatting) and speed of access (interleaving across multiple banks).
[0165] FIG. 13 illustrates a memory module 1300 in a DCX, the memory module 1300 including read ports 1301, write ports 1303, a memory arbiter 1305, and memory banks 1307. The memory module 1300 is split into multiple channels and banks to provide multiple access into the memory simultaneously. The channels can be high-order interleaved and the banks can be low-order interleaved where each memory module can be selected based on the bank and the channel derived from the address requested.
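By way of illustration, the following sketch in C derives the channel, bank, and offset from a requested address, with the banks taken from the low-order bits and the channel from the high-order bits; the specific bit widths are assumptions, not values from the disclosure.

    #include <stdint.h>
    #include <stdio.h>

    #define BANK_BITS    2u    /* assumed: 4 banks, low-order interleaved */
    #define CHANNEL_BITS 1u    /* assumed: 2 channels, high-order interleaved */
    #define ADDR_BITS    20u   /* assumed total word-address width */

    typedef struct { uint32_t channel, bank, offset; } mem_select_t;

    /* Derive the channel (high-order bits), bank (low-order bits), and offset
     * within the selected memory from a requested word address. */
    static mem_select_t decode_address(uint32_t addr) {
        mem_select_t s;
        s.bank    = addr & ((1u << BANK_BITS) - 1u);
        s.channel = addr >> (ADDR_BITS - CHANNEL_BITS);
        s.offset  = (addr >> BANK_BITS) &
                    ((1u << (ADDR_BITS - BANK_BITS - CHANNEL_BITS)) - 1u);
        return s;
    }

    int main(void) {
        /* consecutive addresses map to different banks, so sequential transfers
         * can proceed in parallel across the interleaved memories */
        for (uint32_t addr = 0; addr < 4; addr++) {
            mem_select_t s = decode_address(addr);
            printf("addr %u -> channel %u, bank %u, offset %u\n",
                   addr, s.channel, s.bank, s.offset);
        }
        return 0;
    }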
[0166] The memory module 1300 can be a single port memory configured to allow either a read or a write in a single clock cycle. By interleaving several of these memory modules, a higher read and write bandwidth can be achieved because each of these memory modules can be either read from or written to simultaneously. The memory module 1300 can be configured and operated in a way that effectively creates a multi-port memory, e.g., a quad-port memory that allows four simultaneous reads or writes every clock cycle. For large data transfers, the read ports 1301 and/or write ports 1303 can be configured to access memory sequentially. Single random reads and writes can be accomplished in the memory module 1300. However, for large data transfers the requested address can be configured to increase sequentially every clock cycle to achieve high data transfer rates.
[0167] The memory arbiter 1305 is configured to manage access to the memory banks 1307 in the event simultaneous access is requested. In some embodiments, the memory arbiter 1305 can determine access priority by assigning a higher priority to the request that is received from a hardware block of the signal processing architecture. In some embodiments, the memory arbiter 1305 can determine access priority by assigning a lower priority to the component that accessed the memory module 1300 most recently. The other requests are delayed for a clock cycle before determining access once more.
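The arbitration rules described above can be sketched in C as follows; the request structure and the two-requester interface are assumptions made for the example.

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct {
        bool     valid;
        bool     from_hw_block;   /* request originates from a hardware block of the architecture */
        uint32_t requester_id;
    } mem_req_t;

    /* Decide which of two conflicting requests (targeting the same bank) is
     * granted this clock cycle; the loser is retried on the next cycle.
     * Returns 0 or 1 for the granted request, or -1 if neither is valid. */
    int arbitrate(const mem_req_t *a, const mem_req_t *b, uint32_t last_granted_id) {
        if (a->valid && !b->valid) return 0;
        if (!a->valid && b->valid) return 1;
        if (!a->valid && !b->valid) return -1;

        /* a request from a hardware block is assigned the higher priority */
        if (a->from_hw_block != b->from_hw_block)
            return a->from_hw_block ? 0 : 1;

        /* otherwise the requester that accessed the memory most recently is
         * assigned the lower priority, so the other side gets its turn */
        return (a->requester_id == last_granted_id) ? 1 : 0;
    }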
[0168] FIG. 14A illustrates an example memory module 1400 of a capture memory array 1410 (CMA) that includes a memory bank 1407 that is split into multiple channels and banks to provide multiple access capability of the memory simultaneously, the CMA 1410 being similar to the CMA described herein with reference to FIGS. 6A, 6B, and 7A. The channels may be high-order interleaved and the banks may be low-order interleaved. The memory module 1400 also includes a read port 1401, a write port 1403, and a memory arbiter 1405. The memory arbiter 1405 acts similarly to the memory arbiter 1305 described herein with reference to FIG. 13. FIG. 14B illustrates the CMA 1410 with multiple memory modules 1400 where each memory module 1400 can be selected based on the bank and the channel derived from a requested address.
[0169] Each memory bank 1407 may be made up of 8 smaller single port RAMs. These RAMs may be arranged in groups of 4 that are interleaved on the lower address bits, making each consecutive access go to a separate RAM. Additionally, these two groups of 4 RAMs may make up the upper half and the lower half of the memory region. Each of these memories may use handshake signals to indicate the process (read/write) requesting access. When access requests go to the same RAM, the memory arbiter 1405 can be used to determine which request to delay. When access requests go to separate RAMs, the read and write processes can be performed in parallel. When read and write requests are received at the same RAM, arbitration logic may be used to determine when a last access to the RAM occurred and decide who (e.g., whether the read or the write) should get access. This may give both read and write processes fair access to the RAM in case one of the processes tries to access the same RAM consecutively, thereby avoiding lockup conditions.
[0170] The memory module may be split into multiple channels and banks to provide multiple access into the memory module simultaneously. The channels may be high-order interleaved and the banks may be low-order interleaved, where each memory module may be selected based on the bank and the channel derived from the address requested.
[0171] Each memory bank 1407 may be a single port memory allowing either a read or write process in a single clock cycle. However, due to interleaving of several memory modules, a higher read/write bandwidth may be achieved as different memory modules can be either read from or written to simultaneously.
[0172] In some implementations, the memory module 1400 may create a quad-port memory that allows four simultaneous reads or writes every clock cycle. The ports may be used for large data transfers and hence may access memory sequentially, for example. Single random reads and writes may be allowed; for large data transfers, though, the address requested may increase sequentially every clock cycle to achieve high data transfer rates. The memory module 1400 can be configured to have an interface configured to allow requests from a network interconnect to be able to read from and write into the CMA 1410. The memory module 1400 may also have a particular handshake-based interface to allow DMA to directly access the memory bank 1407.
[0173] In some embodiments, the CMA 1410 may contain memory banks that allow storing a number of data samples. This memory may be continuous in the memory map. Each memory bank may be a dual port memory and may be designed using multiple interleaved single port memories and an arbiter, as described herein.
[0174] In some embodiments, a memory arbiter, such as the memory arbiter 1305 or the memory arbiter 1405, performs a method for controlling access to memory in a signal processing architecture. The memory can be part of on-chip memory, such as a capture memory array (CMA), or part of a DCX. The memory includes a plurality of random access memory (RAM) modules, each RAM module being logically split into a plurality of memory banks that are sequentially arranged. The method includes receiving a plurality of requests to access RAM modules in the memory, each request of the plurality of requests including a memory address in the memory corresponding to a memory bank within a particular RAM module. The method also includes, for each request, deriving from the memory address in the request a particular bank of the plurality of banks in the RAM module, the particular bank including the memory address in the request. The method also includes, responsive to determining that two requests of the plurality of requests request access to the same bank in the same RAM module, determining a priority among the two requests; granting access to the requested bank to the request of the two requests with a higher priority; and delaying the request of the two requests with a lower priority by a clock cycle. The method also includes, for each request that requests consecutive access to the memory, granting access to a bank of the plurality of memory banks that is sequentially after the bank in the request.
[0175] In some embodiments, the plurality of banks is low-order interleaved and the number of banks of the plurality of banks is a power of 2. In some embodiments, the plurality of RAM modules is further divided into a plurality of channels, the number of channels of the plurality of channels being a power of 2. In such embodiments, the plurality of channels can be high-order interleaved.
[0176] In some embodiments, the memory arbiter is further configured to, responsive to determining that two requests of the plurality of requests are requests to access different banks in the same RAM module, grant simultaneous access to the two requests to the respective requested banks. In some embodiments, determining the priority comprises assigning a lower priority to the request that most recently accessed the requested RAM module.
Additional Embodiments and Terminology
[0177] The present disclosure describes various features, no single one of which is solely responsible for the benefits described herein. It will be understood that various features described herein may be combined, modified, or omitted, as would be apparent to one of ordinary skill. Other combinations and sub-combinations than those specifically described herein will be apparent to one of ordinary skill and are intended to form a part of this disclosure. Various methods are described herein in connection with various flowchart steps and/or phases. It will be understood that in many cases, certain steps and/or phases may be combined together such that multiple steps and/or phases shown in the flowcharts can be performed as a single step and/or phase. Also, certain steps and/or phases can be broken into additional sub-components to be performed separately. In some instances, the order of the steps and/or phases can be rearranged and certain steps and/or phases may be omitted entirely. Also, the methods described herein are to be understood to be open-ended, such that additional steps and/or phases to those shown and described herein can also be performed.
[0178] Some aspects of the systems and methods described herein can advantageously be implemented using, for example, computer software, hardware, firmware, or any combination of computer software, hardware, and firmware. Computer software can comprise computer executable code stored in a computer readable medium (e.g., non-transitory computer readable medium) that, when executed, performs the functions described herein. In some embodiments, computer-executable code is executed by one or more general purpose computer processors. A skilled artisan will appreciate, in light of this disclosure, that any feature or function that can be implemented using software to be executed on a general-purpose computer can also be implemented using a different combination of hardware, software, or firmware. For example, such a module can be implemented completely in hardware using a combination of integrated circuits. Alternatively or additionally, such a feature or function can be implemented completely or partially using specialized computers designed to perform the particular functions described herein rather than by general purpose computers.
[0179] Multiple distributed computing devices can be substituted for any one computing device described herein. In such distributed embodiments, the functions of the one computing device are distributed (e.g., over a network) such that some functions are performed on each of the distributed computing devices.
[0180] Some embodiments may be described with reference to equations, algorithms, and/or flowchart illustrations. These methods may be implemented using computer program instructions executable on one or more computers. These methods may also be implemented as computer program products either separately, or as a component of an apparatus or system. In this regard, each equation, algorithm, block, or step of a flowchart, and combinations thereof, may be implemented by hardware, firmware, and/or software including one or more computer program instructions embodied in computer-readable program code logic. As will be appreciated, any such computer program instructions may be loaded onto one or more computers, including without limitation a general-purpose computer or special purpose computer, or other programmable processing apparatus to produce a machine, such that the computer program instructions which execute on the computer(s) or other programmable processing device(s) implement the functions specified in the equations, algorithms, and/or flowcharts. It will also be understood that each equation, algorithm, and/or block in flowchart illustrations, and combinations thereof, may be implemented by special purpose hardware-based computer systems which perform the specified functions or steps, or combinations of special purpose hardware and computer-readable program code logic means.
[0181] Furthermore, computer program instructions, such as embodied in computer-readable program code logic, may also be stored in a computer readable memory (e.g., a non-transitory computer readable medium) that can direct one or more computers or other programmable processing devices to function in a particular manner, such that the instructions stored in the computer-readable memory implement the function(s) specified in the block(s) of the flowchart(s). The computer program instructions may also be loaded onto one or more computers or other programmable computing devices to cause a series of operational steps to be performed on the one or more computers or other programmable computing devices to produce a computer-implemented process such that the instructions which execute on the computer or other programmable processing apparatus provide steps for implementing the functions specified in the equation(s), algorithm(s), and/or block(s) of the flowchart(s).
[0182] Some or all of the methods and tasks described herein may be performed and fully automated by a computer system. The computer system may, in some cases, include multiple distinct computers or computing devices (e.g., physical servers, workstations, storage arrays, etc.) that communicate and interoperate over a network to perform the described functions. Each such computing device typically includes a processor (or multiple processors) that executes program instructions or modules stored in a memory or other non-transitory computer-readable storage medium or device. The various functions disclosed herein may be embodied in such program instructions, although some or all of the disclosed functions may alternatively be implemented in application-specific circuitry (e.g., ASICs or FPGAs) of the computer system. Where the computer system includes multiple computing devices, these devices may, but need not, be co-located. The results of the disclosed methods and tasks may be persistently stored by transforming physical storage devices, such as solid state memory chips and/or magnetic disks, into a different state.
[0183] Unless the context clearly requires otherwise, throughout the description and the claims, the words “comprise,” “comprising,” and the like are to be construed in an inclusive sense, as opposed to an exclusive or exhaustive sense; that is to say, in the sense of “including, but not limited to.” The word “coupled,” as generally used herein, refers to two or more elements that may be either directly connected, or connected by way of one or more intermediate elements. Additionally, the words “herein,” “above,” “below,” and words of similar import, when used in this application, shall refer to this application as a whole and not to any particular portions of this application. Where the context permits, words in the above Detailed Description using the singular or plural number may also include the plural or singular number respectively. The word “or,” in reference to a list of two or more items, covers all of the following interpretations of the word: any of the items in the list, all of the items in the list, and any combination of the items in the list. The word “exemplary” is used exclusively herein to mean “serving as an example, instance, or illustration.” Any implementation described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other implementations.
[0184] The disclosure is not intended to be limited to the implementations shown herein. Various modifications to the implementations described in this disclosure may be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other implementations without departing from the spirit or scope of this disclosure. The teachings of the invention provided herein can be applied to other methods and systems, and are not limited to the methods and systems described above, and elements and acts of the various embodiments described above can be combined to provide further embodiments. Accordingly, the novel methods and systems described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the disclosure. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the disclosure.

Claims

WHAT IS CLAIMED IS:
1. A processing node (PN) comprising:
a first digital signal processor (DSP) core;
a second DSP core;
a plurality of extended direct memory access controllers (DCX), each DCX having shared memory space, an input packet interface, and an output packet interface, the input packet interface configured to receive samples from a hardware block separate from the processing node, the shared memory space configured to store the received samples, and the output packet interface configured to transmit samples processed by the first DSP core or the second DSP core to the hardware block; and
a PN network interconnect configured to communicably couple the first DSP core, the second DSP core, and the plurality of DCX, each DSP core and DCX coupled to the PN network interconnect through a respective master interface and a respective slave interface, the PN network interconnect further including an SDP master interface and an SDP slave interface each configured to communicate with an SDP network interconnect,
wherein the processing node is configured to be integrated into a radio transceiver comprising the hardware block and to interface with the hardware block to provide configurable processing functionality to the radio transceiver.
2. The processing node of claim 1, wherein the PN network interconnect further includes a configuration interface configured to enable the processing node to configure the hardware block.
3. The processing node of claim 1 further comprising a queue interface configured to transfer commands or data from the first DSP core to the second DSP core and to transfer commands or data from the second DSP core to the first DSP core.
4. The processing node of claim 1 further comprising a first queue interface and a second queue interface, the first queue interface configured to transfer commands or data from the first DSP core to the second DSP core, the second queue interface configured to transfer commands or data from the second DSP core to the first DSP core.
5. The processing node of claim 1, wherein each DSP core includes a general-purpose input-output (GPIO) port connected to a configuration register and configured to receive input for placement in the configuration register and to transmit data stored in the configuration register.
6. The processing node of claim 1, wherein each DSP core is configured to receive interrupt requests through the PN network interface from the hardware block that is separate from the processing node.
7. The processing node of claim 1, wherein a first DCX of the plurality of DCX is configured to: receive a plurality of samples, from a first hardware block, to be processed through the input packet interface; temporarily store the plurality of samples in the shared memory space; and transmit the plurality of samples to the first DSP core through the PN network interconnect.
8. The processing node of claim 7, wherein the first DSP core or the second DSP core is configured to: program the first DCX to convey the plurality of samples to the first DSP core; place the plurality of samples into an internal memory space of the first DSP core; process the plurality of samples; and place the processed samples into the internal memory space of the first DSP core.
9. The processing node of claim 7, wherein the first DCX is further configured to reformat the received plurality of samples.
10. The processing node of claim 9, wherein the first DCX is configured to reformat the received plurality of samples by sign extending samples of the received plurality of samples to increase the number of bits for each sample.
11. The processing node of claim 9, wherein the first DCX is configured to reformat the received plurality of samples by bit clipping samples of the received plurality of samples to reduce a resolution of each sample.
12. The processing node of claim 1, wherein a first DCX of the plurality of DCX is configured to: receive a plurality of samples to be processed through the input packet interface; temporarily store the plurality of samples in the shared memory space; and transmit the plurality of samples to the hardware block separate from the processing node for processing using the output packet interface.
13. The processing node of claim 12, wherein the first DCX is configured to: receive the processed plurality of samples from the hardware block; and temporarily store the processed plurality of samples in the shared memory space.
14. The processing node of claim 1, wherein the first DSP core and the second DSP core are configured to be used both as separate entities and as a shared dual-core configuration.
15. The processing node of claim 1, wherein the first DSP core and the second DSP core each include two processors and the plurality of DCX includes a DCX for each processor of the first DSP core and the second DSP core.
16. The processing node of claim 1, wherein the SDP master interface and the SDP slave interface of the PN network interconnect are configured to communicate with a PN network interconnect of a different processing node in the radio transceiver via the SDP network interconnect.
17. The processing node of claim 1, wherein the first DSP core, the second DSP core, and each of the plurality of DCX includes a configuration register configured to store data to configure the associated DSP core or DCX.
18. The processing node of claim 1, wherein the processing node is configured to be implemented within a demodulator of the radio transceiver.
19. The processing node of claim 1, wherein the processing node is configured to be implemented within a decoder of the radio transceiver.
20. The processing node of claim 1, wherein the processing node is configured to be implemented within a modulator or encoder of a transmitter of the radio transceiver.
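Purely as an editorial illustration outside the claims, the following Python sketch models the sample reformatting recited in claims 9-11: sign-extending narrow samples into a wider word and bit-clipping samples to a lower resolution. The bit widths, the saturating truncation, and the example values are assumptions chosen for readability, not features recited in the claims.

```python
def sign_extend(sample: int, in_bits: int, out_bits: int) -> int:
    """Sign-extend a two's-complement sample from in_bits to out_bits (claim 10)."""
    mask = (1 << in_bits) - 1
    value = sample & mask
    if value & (1 << (in_bits - 1)):      # sign bit set in the narrow representation
        value -= 1 << in_bits             # reinterpret as a negative integer
    # Numerically unchanged; the sample now fits a wider out_bits storage field.
    assert -(1 << (out_bits - 1)) <= value < (1 << (out_bits - 1))
    return value


def bit_clip(sample: int, in_bits: int, out_bits: int) -> int:
    """Drop low-order bits to reduce resolution, with saturation (claim 11).
    Truncation rather than rounding is an assumption made for illustration."""
    clipped = sample >> (in_bits - out_bits)
    lo, hi = -(1 << (out_bits - 1)), (1 << (out_bits - 1)) - 1
    return max(lo, min(hi, clipped))


# Example: extend a 12-bit ADC sample to a 16-bit word, and separately clip a
# 16-bit sample down to 8-bit resolution (sample values chosen arbitrarily).
wide = sign_extend(0xF8C, 12, 16)   # 0xF8C is -116 in 12-bit two's complement
narrow = bit_clip(-12345, 16, 8)    # -49 after dropping the 8 least significant bits
```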
21. A signal processing architecture comprising:
a software defined physical layer (SDP) network interconnect;
a plurality of processing nodes connected to the SDP network interconnect and configured to provide configurable processing power to process receiver and transmitter waveforms in a radio transceiver, each processing node including: a plurality of digital signal processing (DSP) cores; a plurality of extended direct memory access controllers (DCX); and a PN network interconnect connected to the plurality of DSP cores, to the plurality of DCX, and to the SDP network interconnect;
a capture memory array (CMA) comprising a plurality of memory banks that are connected to the SDP network interconnect to provide access to the plurality of memory banks for the plurality of processing nodes; and
a CPU subsystem connected to the SDP network interconnect,
wherein the SDP network interconnect enables communication among each of the plurality of processing nodes, the CMA, and the CPU subsystem to augment processing power and functionality in the radio transceiver.
22. The signal processing architecture of claim 21, wherein one or more of the plurality of processing nodes can be dynamically allocated to provide signal processing power to one or more hardware blocks of the radio transceiver.
23. The signal processing architecture of claim 21, wherein each processing node of the plurality of processing nodes is configured to interface with one or more individual hardware blocks of both receiver and transmitter signal processing data paths in the radio transceiver.
24. The signal processing architecture of claim 21, wherein a first processing node of the plurality of processing nodes is implemented in an encoder or modulator of the radio transceiver.
25. The signal processing architecture of claim 24, wherein a second processing node of the plurality of processing nodes is implemented in a demodulator of the radio transceiver.
26. The signal processing architecture of claim 25, wherein a third processing node of the plurality of processing nodes is implemented in a decoder of the radio transceiver.
27. The signal processing architecture of claim 21 further comprising an external memory connected to the CPU subsystem, the plurality of processing nodes configured to pass data from individual DSP cores to the external memory through the SDP network interconnect.
28. The signal processing architecture of claim 21, wherein individual processing nodes of the plurality of processing nodes are integrated within different portions of a demodulator.
29. The signal processing architecture of claim 21, wherein each processing node includes an SDP master interface and an SDP slave interface to the SDP network interconnect, the CMA includes a plurality of SDP master interfaces to the SDP network interconnect, and the CPU subsystem includes an SDP master interface and an SDP slave interface to the SDP network interconnect.
30. The signal processing architecture of claim 21 further comprising: a second SDP network interconnect connected to the SDP network interconnect; and a second plurality of processing nodes connected to the second SDP network interconnect, each processing node of the second plurality of processing nodes including one or more digital signal processing (DSP) cores, one or more extended direct memory access controllers (DCX), and a PN network interconnect connected to the plurality of DSP cores, to the plurality of DCX, and to the second SDP network interconnect.
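As a non-limiting illustration of the architecture recited in claims 21-30, the following Python sketch models a pool of processing nodes attached to an SDP interconnect, a capture memory array, and a CPU subsystem, together with a simple dynamic-allocation policy in the spirit of claim 22. All names, counts, and the first-free allocation policy are assumptions, not elements of the claims.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProcessingNode:
    name: str
    dsp_cores: int = 2                      # plurality of DSP cores (claim 21)
    dcx_count: int = 4                      # plurality of DCX (claim 21)
    assigned_block: Optional[str] = None    # hardware block currently served, if any


@dataclass
class SdpFabric:
    """Toy model of the SDP network interconnect and its attached components."""
    nodes: List[ProcessingNode] = field(default_factory=list)
    cma_banks: int = 16                     # capture memory array banks (value assumed)
    cpu_subsystem: str = "cpu0"

    def allocate(self, hardware_block: str) -> Optional[ProcessingNode]:
        """Dynamically allocate a free processing node to a hardware block (claim 22)."""
        for node in self.nodes:
            if node.assigned_block is None:
                node.assigned_block = hardware_block
                return node
        return None                         # no free node available right now


fabric = SdpFabric(nodes=[ProcessingNode("pn_demod"), ProcessingNode("pn_decoder")])
fabric.allocate("demodulator")              # first free node now serves the demodulator
```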
31. A method for passing data to a processing node in a signal processing architecture that includes a software defined physical layer (SDP) network interconnect connected to the processing node, the processing node including a digital signal processing (DSP) core, an extended direct memory access controller (DCX), a packet receiver, a packet transmitter, and a PN network interconnect connected to the SDP network interconnect, the method comprising:
utilizing a buffer pointer queue to manage available memory, the buffer pointer queue comprising a plurality of buffer pointers that each identify a buffer address in memory that is available for storing data;
utilizing a work descriptor queue to manage burst data to be processed, the work descriptor queue comprising a plurality of work descriptors that each identify a buffer address in memory that includes burst data to be processed;
responsive to the packet receiver receiving burst data to be processed: retrieving a first buffer pointer from the buffer pointer queue; processing the received burst data; storing the processed burst data in memory at the buffer address identified by the first buffer pointer; and outputting a new work descriptor, the new work descriptor including the buffer address identified by the first buffer pointer; and
responsive to the work descriptor queue having processed data to be transmitted: retrieving a first work descriptor from the work descriptor queue; obtaining the processed data from memory at the buffer address identified by the first work descriptor; outputting a new buffer pointer, the new buffer pointer corresponding to the buffer address identified by the first work descriptor; and transmitting the processed data by the packet transmitter.
32. The method of claim 31, wherein the work descriptor further includes a data header length indicating an amount of storage occupied by a data header associated with the burst data.
33. The method of claim 32, wherein the work descriptor further includes: a burst start flag indicating that the burst data belongs to a first packet of a burst; and a burst end flag indicating that the burst data belongs to a last packet of a burst.
34. The method of claim 33, wherein the work descriptor indicates the burst data is a fully contained burst by setting the burst start flag and the burst end flag to true.
35. The method of claim 31 further comprising adding the new work descriptor to the work descriptor queue.
36. The method of claim 31 further comprising adding the new buffer pointer to the buffer pointer queue.
37. The method of claim 31 further comprising, responsive to receiving the buffer pointer queue and the work descriptor queue: retrieving a second work descriptor from the work descriptor queue; obtaining data from memory at the buffer address indicated by the second work descriptor; retrieving a second buffer pointer from the buffer pointer queue; processing the retrieved data to generate output processed data; storing the output processed data in memory at the buffer address indicated by the second buffer pointer; outputting a new work descriptor, the new work descriptor including the buffer address indicated by the second buffer pointer; and outputting a new buffer pointer, the new buffer pointer indicating the buffer address indicated by the second work descriptor.
38. The method of claim 37, wherein each work descriptor further includes a burst identifier of the burst data to be processed and a burst length indicating an amount of storage occupied by the burst data to be processed.
39. The method of claim 31 further comprising: monitoring a fill level of the work descriptor queue by the DCX; and responsive to determining that the fill level is below a threshold fill level, adding one or more work descriptors to the work descriptor queue from a work descriptor list.
40. The method of claim 31 further comprising: monitoring a fill level of the buffer pointer queue by the DCX; and responsive to determining that the fill level is below a threshold fill level, adding one or more buffer pointers to the buffer pointer queue from a buffer pointer list.
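As an editorial illustration of the queue handshake recited in claims 31-40 (not part of the claims), the following Python sketch models a buffer pointer queue and a work descriptor queue: the receive path consumes a free buffer pointer and publishes a work descriptor, and the transmit path consumes a work descriptor and recycles its buffer pointer. The field names, queue depths, and refill threshold are assumptions chosen for readability.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class WorkDescriptor:
    buffer_addr: int           # buffer address in memory holding burst data (claim 31)
    burst_len: int = 0         # storage occupied by the burst data (claim 38)
    burst_id: int = 0          # identifier of the burst (claim 38)
    burst_start: bool = False  # data belongs to the first packet of a burst (claim 33)
    burst_end: bool = False    # data belongs to the last packet of a burst (claim 33)


class QueuePair:
    """Software model of the buffer-pointer / work-descriptor handshake."""

    def __init__(self, free_buffer_addrs):
        self.buffer_ptr_q = deque(free_buffer_addrs)  # addresses available for new data
        self.work_desc_q = deque()                    # descriptors of data awaiting a consumer
        self.memory = {}                              # addr -> bytes, stands in for shared memory

    def on_receive(self, burst: bytes, burst_id: int) -> None:
        """Receive path of claim 31: store an incoming burst and publish a descriptor."""
        addr = self.buffer_ptr_q.popleft()            # retrieve a free buffer pointer
        self.memory[addr] = burst                     # store the processed burst data
        self.work_desc_q.append(WorkDescriptor(addr, len(burst), burst_id,
                                               burst_start=True, burst_end=True))

    def on_transmit(self) -> bytes:
        """Transmit path of claim 31: drain one descriptor and recycle its buffer."""
        desc = self.work_desc_q.popleft()             # retrieve a work descriptor
        data = self.memory.pop(desc.buffer_addr)      # obtain the processed data
        self.buffer_ptr_q.append(desc.buffer_addr)    # buffer address becomes available again
        return data                                   # handed to the packet transmitter

    def refill_buffers(self, spare_addrs: list, threshold: int = 2) -> None:
        """Fill-level monitoring of claim 40 (the threshold value is an assumption)."""
        while len(self.buffer_ptr_q) < threshold and spare_addrs:
            self.buffer_ptr_q.append(spare_addrs.pop(0))


qp = QueuePair(free_buffer_addrs=[0x000, 0x100, 0x200])
qp.on_receive(b"\x01\x02\x03\x04", burst_id=7)  # receiver stores a fully contained burst
payload = qp.on_transmit()                      # transmitter later drains it
```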
41. A method for converting between sample streams or symbol streams and messages in a signal processing architecture for storing and processing by a processing node that includes a digital signal processor (DSP) core and an extended direct memory access controller (DCX), the method comprising:
receiving a sample stream that includes a burst to be processed by the signal processing architecture;
generating a header message including information related to the burst;
splitting the sample stream into a plurality of burst messages, a size of each burst message, except for a final burst message corresponding to an end of the burst, corresponding to a buffer size in the DCX, the size of the final burst message being less than or equal to the buffer size in the DCX;
generating a footer message including information related to a size of the plurality of burst messages;
transferring a burst interface packet to the DCX, the burst interface packet including the header message, the plurality of burst messages, and the footer message; and
reformatting the burst interface packet into a burst memory packet for storage in the DCX, the burst memory packet including the header message and the footer message in an initial portion of the burst memory packet and the plurality of burst messages following the initial portion of the burst memory packet, wherein the initial portion of the burst memory packet indicates the number of burst messages in the burst memory packet.
42. The method of claim 41, wherein the sample stream is received by a component of the processing node.
43. The method of claim 41 further comprising identifying a start flag and an end flag within the sample stream to determine end points of the burst in the sample stream.
44. The method of claim 41 further comprising identifying a first start flag and a second start flag within the sample stream to determine end points of the burst in the sample stream, the end points being the first start flag and data preceding but not including the second start flag.
45. The method of claim 41, wherein each packet of the sample stream has a size in bits equal to a size of a word in the DCX.
46. The method of claim 41, wherein splitting the sample stream into the plurality of burst messages is responsive to identifying a start of frame indicator in the sample stream.
47. The method of claim 41, wherein splitting the sample stream into the plurality of burst messages terminates responsive to identifying an end of frame indicator in the sample stream.
48. The method of claim 41, wherein the header message includes a burst counter that increments responsive to identifying a boundary of the burst.
49. The method of claim 41, wherein the initial portion of the burst memory packet is sized to be less than or equal to a size of two words in memory of the DCX.
50. The method of claim 41, wherein reformatting the burst interface packet further includes converting a first word size of data in the burst interface packet to a second word size that is compatible with the DCX, the second word size being greater than the first word size.
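The following sketch, offered only to illustrate claims 41-50, shows one plausible way a burst could be split into a header message, fixed-size burst messages, and a footer message, and then reformatted into a burst memory packet whose initial portion indicates the number of burst messages. The buffer size, the packed field layout of the initial portion, and all names are assumptions rather than requirements of the claims.

```python
import struct
from dataclasses import dataclass
from typing import List

DCX_BUFFER_SIZE = 256  # bytes per burst message; an illustrative value, not from the claims


@dataclass
class BurstInterfacePacket:
    header: dict           # information related to the burst (claim 41)
    messages: List[bytes]  # burst messages, each at most DCX_BUFFER_SIZE bytes
    footer: dict           # information related to the size of the burst messages


def split_burst(sample_stream: bytes, burst_id: int) -> BurstInterfacePacket:
    """Split one burst into a header message, fixed-size burst messages, and a footer message."""
    messages = [sample_stream[i:i + DCX_BUFFER_SIZE]
                for i in range(0, len(sample_stream), DCX_BUFFER_SIZE)]
    header = {"burst_id": burst_id, "burst_bytes": len(sample_stream)}
    footer = {"message_count": len(messages),
              "final_message_bytes": len(messages[-1]) if messages else 0}
    return BurstInterfacePacket(header, messages, footer)


def to_burst_memory_packet(pkt: BurstInterfacePacket) -> List[bytes]:
    """Reformat for DCX storage: the initial portion carries the header and footer fields,
    including the number of burst messages, and the burst messages follow (claim 41)."""
    initial = struct.pack("<III", pkt.header["burst_id"],
                          pkt.header["burst_bytes"], pkt.footer["message_count"])
    return [initial] + pkt.messages


stream = bytes(range(256)) * 3 + b"\xaa" * 40   # an 808-byte burst, values arbitrary
packet = split_burst(stream, burst_id=12)       # four messages: 256 + 256 + 256 + 40 bytes
mem_packet = to_burst_memory_packet(packet)     # header/footer words first, then messages
```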
51. A method for accessing memory in a signal processing architecture that includes a capture memory array (CMA), the CMA including a plurality of random access memory (RAM) modules, each RAM module being logically split into a plurality of memory banks that are sequentially arranged, the method comprising:
receiving a plurality of requests to access RAM modules in the CMA, each request of the plurality of requests including a memory address in the CMA corresponding to a memory within a particular RAM module;
for each request, deriving from the memory address in the request a particular bank of the plurality of banks in the RAM module, the particular bank including the memory address in the request;
responsive to determining that two requests of the plurality of requests request access to the same bank in the same RAM module: determining a priority among the two requests; granting access to the requested bank to the request of the two requests with a higher priority; and delaying the request of the two requests with a lower priority by a clock cycle; and
for each request that requests consecutive access to the CMA, granting access to a bank of the plurality of memory banks that is sequentially after the bank in the request.
52. The method of claim 51, wherein the plurality of banks is low-ordered interleaved and the number of banks of the plurality of banks is a power of 2.
53. The method of claim 51, wherein the plurality of RAM modules is further divided into a plurality of channels, the number of channels of the plurality of channels being a power of 2.
54. The method of claim 53, wherein the plurality of channels is high-ordered interleaved.
55. The method of claim 51 further comprising, responsive to determining that two requests of the plurality of requests request access to different banks in the same RAM module, granting simultaneous access to the two requests to the respective requested banks.
56. The method of claim 51, wherein determining the priority comprises assigning a lower priority to the request that most recently accessed the requested RAM module.
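To illustrate the bank arbitration of claims 51-56 (as an editorial sketch, not claim language), the Python model below derives a bank from a memory address by low-order interleaving and resolves one arbitration cycle within a single RAM module, delaying the lower-priority request on a bank conflict and granting conflict-free requests simultaneously. The bank count, word size, and the concrete tie-breaking implementation are assumptions consistent with, but not dictated by, the claims.

```python
NUM_BANKS = 8    # power of two, as in claim 52 (specific value assumed for illustration)
WORD_BYTES = 4   # addressable word size; an assumption


def bank_of(address: int) -> int:
    """Low-order interleaving: consecutive word addresses map to consecutive banks (claim 52)."""
    return (address // WORD_BYTES) % NUM_BANKS


def arbitrate(requests, last_granted):
    """One clock cycle of bank arbitration for a single RAM module (claims 51, 55, 56).

    requests: list of (requester_id, address); last_granted: requester that most recently
    accessed this module, which receives the lower priority (claim 56).
    Returns (granted, delayed) lists of requester ids.
    """
    granted, delayed, busy_banks = [], [], set()
    # Requesters other than the most recent accessor are considered first.
    ordered = sorted(requests, key=lambda req: req[0] == last_granted)
    for req_id, addr in ordered:
        bank = bank_of(addr)
        if bank in busy_banks:
            delayed.append(req_id)     # conflicting request waits one clock cycle (claim 51)
        else:
            busy_banks.add(bank)
            granted.append(req_id)     # different banks are served simultaneously (claim 55)
    return granted, delayed


# A requester streaming consecutive words naturally walks across sequential banks (claim 51):
addresses = [0x1000 + WORD_BYTES * i for i in range(4)]
assert [bank_of(a) for a in addresses] == [0, 1, 2, 3]

# Two requesters hitting the same bank: the most recent accessor ("pn0") is delayed.
granted, delayed = arbitrate([("pn0", 0x1000), ("pn1", 0x1020)], last_granted="pn0")
assert granted == ["pn1"] and delayed == ["pn0"]
```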

