WO2024217773A1

WO2024217773A1 - Processing unit for a parity check code decoder

Info

Publication number: WO2024217773A1
Application number: PCT/EP2024/056228
Authority: WO
Inventors: Alain Rolle; Jose LAFFITTE
Original assignee: Cactussoft Belgium
Current assignee: Cactussoft Belgium
Priority date: 2023-04-18
Filing date: 2024-03-08
Publication date: 2024-10-24
Anticipated expiration: 2025-10-18

Abstract

A check node processing unit determines reliability messages for decoding a signal encoded with a parity check code and comprises : • - a plurality of processing elements with a plurality of input messages, each processing element comprising a first and a second input for a respective input message and an output to generate, based on the input messages, an output message being a reliability message or an intermediate message, • - data links interconnecting processing elements to form a 3D lattice circuit, whereby a first and a second dimension of the lattice circuit define a plane wherein output messages of a first subset of interconnected processing elements (1201) form in the plane a step pattern having a direction and wherein output messages of a second subset of interconnected processing elements (1211) form in that plane a step pattern having the opposite direction of the first step pattern, wherein each input message is fed twice to the 3D lattice circuit, once via a first data link along the first dimension and once via a second data link along the second dimension, wherein interconnected processing elements of a third subset of interconnected processing elements (1221) are each arranged to receive along a third dimension of the lattice circuit a first intermediate message output by another processing element and to output a corresponding reliability message based on the first intermediate message. Further, a decoder is disclosed comprising such a check node processing unit.

Description

Processing Unit for a Parity Check Code Decoder

Field of the invention

[0001] The present invention is generally related to the field of digital communications, and in particular to methods and devices for decoding a signal encoded using an error correcting code, in particular a parity check code.

Background of the invention

[0002] The goal of digital communication systems is to transmit digital data from one location or subsystem to another location or subsystem, either in an error-free way or with an acceptably low error rate. As shown in Fig.l, data may be transmitted over a variety of communications channels in a wide variety of communication systems: for example, magnetic media, wired, wireless, fibre, copper, without being limited thereto.

[0003] Fig.l represents a general block scheme illustrating various communication systems. A communication channel 199 communicatively couples a first communication device 110 (with a transmitter 112 comprising an encoder 114 and a receiver 116 comprising a decoder 118) situated at one end of the communication channel 199 to a second communication device 120 (containing a transmitter 126 having an encoder 128 and a receiver 122 having a decoder 124) at the other end of the communication channel. In some cases either of the communication devices 110 and 120 may only include a transmitter or a receiver. There are several different types of media by which the communication channel 199 may be implemented (e.g., a satellite communication channel 130 using satellite dishes 132 and 134, a wireless communication channel 140 using towers 142 and 144 and/or local antennae 152 and 154, a wired communication channel 150, and/or a fibre optic communication channel 160 using an electrical to optical (E/O) interface 162 and an optical to electrical (O/E) interface 164)). In addition, more than one type of media may be implemented and interfaced simultaneously thereby forming the communication channel 199.

[0004] To reduce transmission errors that may have occurred within a communication system, for example due to the presence of noise and/or interference, error correction and channel coding schemes are often employed. Generally, these error correction and channel coding schemes involve the use of an encoder at the transmitter and a decoder at the receiver. As commonly known, coding consists in adding redundant data to the original data, the redundant data enabling the detection and/or the correction of errors.

[0005] Error correcting codes are implemented in a plethora of devices and systems applied not only in data communication but for example also in storage. Exemplary applications comprise voice and multimedia transmission for example in wireless ad-hoc networks (e.g., standardized in Wi-Fi 802.11) in radio communication systems (e.g., in 3G, 4G/LTE, 5G and beyond etc...), in optical communication systems, satellite communication systems and in digital video broadcasting (e.g., standardized in DVB-C2, DVB-S2X, and DVB-T).

[0006] Linear error correcting codes are advantageous in that they require a lower implementation complexity than non-linear error correcting codes. Exemplary linear error correcting codes comprise convolutional codes and linear block codes such as Hamming codes, Reed-Solomon codes, Turbo codes, polar codes, BCH codes, High Density Parity Check (HDPC) codes, Low-Density Parity Check (LDPC) codes and Moderate Density Parity Check (MDPC) codes.

[0007] The present invention is concerned with parity check decoding of linear block codes. LDPC codes are throughout this description used as exemplary parity check codes because they are well known and their decoding process is extensively described in the literature. However, the present invention is applicable to the decoding of all linear block codes for which a parity check matrix exists. LDPC codes are very efficient, capacity approaching forward error correcting codes that are being adopted in an increasing number of communication standards (e.g., IEEE 802.3an, IEEE 802. lln, 802.20, DVB-S2, DVB-S2X). According to the 3^rd Generation Partnership Project (3GPP) TS 38.212, LDPC is recommended for the Fifth-Generation (5G) New Radio (NR) shared channels due to its high throughput, low latency, low decoding complexity and rate compatibility. Relevant application domains include magnetic recording, wireless communication and high-speed data transmission over copper and optical fibre.

[0008] Decoding of signals encoded using linear block codes in general, and parity check codes in particular, can be performed using iterative message passing algorithms. Message passing algorithms are based on exchanging messages representative of the encoded data between check node processing units and variable node processing units associated with a bipartite graph representation of the used code. A bipartite graph is made up of two entities, variable nodes and check nodes, connected to each other via a set of edges.

[0009] Fig.2 illustrates a parity check code bipartite graph 500. In the art, a parity check bipartite graph is also sometimes referred to as a Tanner graph. For instance, an LDPC code may be viewed as a code having a binary parity check matrix H wherein nearly all matrix elements are zeroes. For example, H=(hi,j) of size MxN may be viewed as being a parity check matrix of an LDPC code with block length N. Likewise, a HDPC code may be viewed as a code having a binary parity check matrix H wherein matrix elements may be 'ones' or 'zeroes' with about similar probability. Polar codes and BCH codes are good examples of what may be considered an HDPC code. [0010] As parity check codes are linear block codes, the set of all codewords X E C spans the null space of a parity check matrix, H:

H x^T = 0, /x C (1)

For LDPC codes, H is a sparse binary matrix of dimension MxN. Each row of H corresponds to a parity check and a set element hj, different from 0 indicates that data symbol i participates in parity check j. Each column of H corresponds to a codeword symbol. For each codeword x there are N symbols of which M (< N) are parity symbols. For HDPC codes, this still applies, but the matrix H is then not as sparse as the parity check matrix of an LDPC code.

[0011] The row and column weights are defined as the number of set elements in a given row or column of H, respectively. The set elements of H are chosen to satisfy the performance requirements of the code. The number of l's in the i-th column of the parity check matrix H may be denoted as d_v(i), and the number of l's in the j-th row of the parity check matrix may be denoted as d_c(j). If d_v(i)=d_v, for all i, and d_c(j)=d_c for all j, then an LDPC code is called a (d_v,d_c) regular LDPC code; otherwise the LDPC code is called an irregular LDPC code. In principle the same observations can be made with respect to a HDPC code.

[0012] A regular LDPC or HDPC code can be represented as a bipartite graph 500 by its parity check matrix with (see Fig. ) left side nodes representing variables of the code bits (also called the "variable nodes" (or "bit nodes") 510 in a bit decoding approach to decoding parity check coded signals), while the right side nodes (alternatively called the "check nodes") 520 represent check equations. The bipartite graph 500 of the LDPC code represented by H may be defined by N variable nodes (e.g., N bit nodes) and M check nodes. Every variable node / of the N variable nodes 510 has exactly d_v(i)=d_v edges (an example edge is shown using reference numeral 530) connecting the bit node Vi 512 to one or more of the check nodes (within the M check nodes). The edge 530 is in Fig.2 specifically shown as connecting from the bit node Vi 512, to the check node c, 522. The number of edges d_v can be referred to as the degree of a variable node i. Analogously, every check node j of the M check nodes 520 has exactly d_c(j )=d_c edges 524, connecting this node to one or more of the variable nodes (or bit nodes) 510. The number of edges d_c may be referred to as the degree of the check node j.

[0013] An edge 530 between a variable node Vi (or bit node bi) 512 and check node Cj 522 may be defined by e=(i, j). Given an edge e=(i, j), the nodes of the edge may alternatively be denoted as e=(v(e), c(e)) (or e=(b(e), c(e))). Alternatively, the edges in the graph correspond to the set elements of H where a set element hji indicates that an edge connects a variable bit node i with parity check node j- [0014] Given a variable node Vi (or bit node bi), one may define the set of edges emitting from the node Vi by E_v(i)={e |v(e)=i} (or by Eb(i)={e | b(e)=i}); these edges are referred to as bit edges, and the messages corresponding to these bit edges are referred to as bit edge messages or bit node messages. [0015] Given a check node Cj, one may define the set of edges emitting from the node Cj by

E_c(j)={e |c(e)=j}; these edges are referred to as check edges, and the messages corresponding to these check edges are referred to as check edge messages or check node messages.

[0016] Generally, any code that can be represented by a bipartite graph may be characterized as a graph code. This includes HDPC codes, regular LDPC codes and irregular LDPC codes. The degree of each set of nodes within a parity check code may be chosen according to some distribution.

[0017] As already mentioned, parity check decoding processing is performed using an iterative decoding approach in which messages (e.g., check edge messages and bit edge messages (variable edge messages)) are passed back and forth when performing check node processing (sometimes also referred to as check engine processing) in a check node processing unit and bit node processing (sometimes alternatively referred to as bit engine processing) in a bit node processing unit. This message passing decoding processing as it is sometimes named, operates on a graph representation of the code, e.g., a parity check bipartite graph. This is also sometimes named belief propagation decoding processing.

[0018] A belief propagation algorithm consists of iteratively updating the probability value of each bit using the parity check equations that the bit participates in. This algorithm is also sometimes referred to as message-passing decoding (or message passing decoding processing) because intrinsic information is passed as messages between the check nodes and the bit nodes. The check nodes correspond to rows in the parity check matrix, whereas the bit nodes correspond to the columns. Thus, an iteration of the belief propagation algorithm consists of check node updates on all the rows followed by bit node updates on all the columns.

[0019] More details are now provided on implementing belief propagation decoding functionality in a decoder. Reference is made to Fig.3, showing a scheme of a receiver structure comprising a decoder configured to decode, in this example, an LDPC coded signal having an m-bit signal sequence. Note that the figure stays applicable for a HDPC code or any parity check code. A continuous-time signal 601 is received from a communication channel, which can be any type of channel including, but not limited to, a wireline communication channel, a wireless communication channel, a fibre optic communication channel, a read channel of a HDD, or other type of communication channel capable of carrying a continuous-time signal coded using a parity check code.

[0020] The continuous time signal typically contains a mixture of so-called a-priori known reference symbols (sometimes called pilot symbols) and unknown user symbols, related to the user traffic bits. In single-carrier signals these reference symbols occupy time slots and in OFDM signals the reference symbols occupy time-frequency resource elements. The purpose of these reference symbols is to aid the receiver with channel estimation and equalization, as well as synchronization. The reference symbols represent overhead, because any portion of time and or spectrum occupied by them cannot be used to convey user bits. As such reference symbols reduce the bandwidth efficiency.

[0021] In the receiver, an analog front-end (AFE) 610 is operable to perform initial processing on the continuous-time signal (e.g., by performing one or more of filtering (analog and/or digital filtering), gain adjustment, equalisation, etc.) and digital sampling thereby obtaining a discrete-time signal 611. This discrete-time signal 611 can alternatively be referred to as a digital signal, a baseband signal, or other appropriate terminology known in the art. Oftentimes, the discrete-time signal 611 is partitioned into I, Q (In-phase, Quadrature) values of the signal.

[0022] A metric generator 620 is operable to receive the discrete-time signal 611 (e.g., which can include the I, Q values thereof) and to calculate the bit metrics and/or log likelihood ratios (LLRs)

621 that correspond to the received values within the discrete-time signal 611. In some implementations, the calculation of these bit metrics/LLRs symbol metrics 621 is a two-step process, in which the metric generator 620 firstly is operable to calculate symbol metrics corresponding to the symbols of the discrete-time signal 611, and then the metric generator secondly is operable to employ the symbol metrics to decompose those symbol metrics into the bit metrics/LLRs 621. In a typical receiver 600 the bit metrics/LLRs 621 obtained from signal-conditioned symbol metrics are then employed by a bit engine 630, i.e. a variable node processing unit, to initialize the bit edge messages that are employed when performing iterative decoding processing 635 (e.g., as performed by the bit engine 630 and a check engine 640) of the parity check coded signal. Check engine is another name to refer to a check node processing unit, the terms are in this description used interchangeably.

[0023] The initialization of the bit edge messages for each variable node i with the value of the log-likelihood ratio (LLR) i of the corresponding received symbol yi is defined as follows :

[0024] Also, at the bit nodes, a bit engine 630 is operable to compute the corresponding soft information of the bits (e.g., shown as soft information 632) using the most recently updated bit edge messages. However, it is common for multiple decoding iterations to be performed, so the initialized bit edge messages are passed to the check engine 640 where, during a first decoding iteration, the check node processing unit 640 employs the initialized bit edge messages to update check edge messages. [0025] At each check node, in some conventional parity check decoders, the decoding processing forms a parity check result (XOR) on the sign of the incoming messages. This is done by finding the sign of each outgoing message as the XOR of the sign of the corresponding incoming message with the parity check result.

[0026] Thereafter, the bit engine 630 (i.e. the variable node processing unit) receives the updated edge messages (e.g., shown as check edge message 641) from the check engine 640 and employs them to update the bit edge messages. Also, the bit engine 630 is operable to employ the bit metrics/LLRs 621 received from the metric generator 620 when performing the updating of the bit edge messages in accordance with belief propagation decoding. Based on the updated check edge messages 641 passed back to the bit nodes, the bit engine 630 also calculates the soft information 632 of the bits using the bit metrics/LLRs 621 and the current iteration values of the check edge messages. At each bit node, the calculation of the soft information, also known as a posteriori probabilities (APP), involves forming the sum of the LLR of the received symbol with the incoming messages i, from the check node (i.e. the check edge messages 641). The decoded bit Xi is given by the sign of this summation Qi, which is sometimes called the soft information corresponding to that logical bit Xj. Each outgoing variable node message i for the next decoder iteration is computed by subtracting the corresponding incoming message from the summation. To continue with the iterative decoding processing 635, these bit edge messages 631, after being updated, are then passed to the check node processing unit 640.

[0027] Another decoding iteration can be performed, in that, at the check nodes, the check engine 640 receives these updated bit edge messages 631 sent from the bit nodes (from the bit engine 630) and updates the check edge messages accordingly. These updated check edge messages 641 are then passed back to the bit nodes (to the bit engine 630) where the soft information 632 of the bits is calculated using the bit metrics/LLRs 621 and the current iteration values of the check edge messages. Thereafter, using the just calculated soft information 632 of the bits, the bit engine 630 can again update the bit edge messages using the previous values of the check edge messages (from the just preceding iteration). The iterative processing 635 continues between the bit nodes and the check nodes according to the LDPC code bipartite graph employed to encode the signal being decoded.

[0028] The iterative decoding processing steps, performed by the bit node engine 630 and the check node engine 640, are repeated until a stopping criterion is met as shown by reference numeral 661 (e.g., after a predetermined or adaptively determined number of iterations have been performed, after all syndromes of the parity check code are all equal to zero (i.e., all of the parity checks are satisfied), and/or another stopping criterion is met). This is equivalent to stopping the decoding when the current estimate of the codeword x satisfies the relationship HX^T=0.

[0029] As mentioned, soft information 632 can be generated within the bit engine 630 during each of the decoding iterations. In some implementations (like the one in Fig.2), this soft information 632 is provided to a hard limiter 650 where hard decisions are made, and that hard information (i.e., hard/best estimate 651) may be provided to a syndrome calculator 660 to determine whether the syndromes of the parity check code are all equal to zero. In other words, the syndrome calculator 660 determines whether each syndrome associated with the code is equal to zero, based on the current codeword estimate.

[0030] When the syndromes are not equal to zero, the iterative decoding processing 635 can continue by appropriately updating and passing the bit edge messages and the check edge messages between the bit engine 630 and the check engine 640, respectively. After the iterative decoding processing steps have been performed, the hard/best estimates 651 of the bits are output based on the soft information 632.

[0031] Fig.4 illustrates an example of message passing decoding processing or belief propagation decoding processing 700 and the relationships between check node, bit nodes, and rows and columns of a low-density parity check matrix H, which in this case contains 4 rows and 8 columns. The rows of H correspond to the check nodes of the parity check code's bipartite graph representing in this particular example an LDPC code, and the columns of H to the bit nodes (or variable nodes) of the parity check code's bipartite graph. As can be seen on the left hand side of the figure, the low-density parity check matrix includes non-null elements (i.e., value of "1") and null elements (i.e., value of "0"). For each decoding iteration, edge messages are passed between processing units (i.e., one or more bit engines to one or more check engines, and vice versa) in accordance with the edge connectivity of the bipartite graph. The non-null elements in the parity check matrix correspond to the edges of the bipartite graph that selectively interconnect certain check nodes to certain bit nodes.

[0032] While the mathematics applied in the message passing decoding approach contains hyperbolic and logarithmic functions, in conventional belief propagation decoder hardware implementations these functions are typically approximated by look-up tables (LUTs). The number of bits required in a fixed point look up table is determined by the required coding performance, speed of decoder convergence, and whether an error floor must be suppressed. However, as explained in detail later, there is a need for a decoder whose outputs are differentiable with respect to its inputs in order to support high performant non-data aided signal conditioning algorithms that reduce reference symbol overhead. The usage of Look-Up-Tables (LUTs) that implement hyperbolic and logarithmic functions precludes said differentiability. Hence, there is a need for a decoder circuit addressing this problem.

[0033] Recent advances in processing capabilities of general purpose Central Processing Units

(CPUs) and Graphical Processing Units (GPUs) have allowed for an increasing interest in using belief propagation decoders as implementation platform for receiver structures. Nowadays, the GPU and the CPU directly compete with the more traditionally used Field Programmable Gate Arrays (FPGAs) and custom developed Application Specific Integrated Circuits (ASICs), sometimes referred to as Very Large Scale Integrated circuits (VLSI), in the arena of so called Software Defined Radio (SDR) implementations of receivers and decoders embedded therein.

[0034] The largest computational complexity involved during the decoding process executed by a belief propagation decoder stems from the computations performed at the check node processing units in the aforementioned message passing algorithm. One of the greatest hurdles and impediments in designing effective devices and/or communication devices that can decode parity check coded signals is the typically large area and memory required to store and manage all of the updated variable edge messages and check edge messages that are updated and employed during iterative decoding processing (e.g., when storing and passing the check edges messages and the variable edges messages back and forth between a check node processing unit and a variable node processing unit, respectively). Moreover, the non-linear functions that need to be evaluated at the heart of the check node computations are costly in terms of memory look-up cycles. Exactly these non-linear functions are typically implemented or approximated in a way that is efficient in terms of VLSI area and speed of execution, however precluding the criterium of differentiability.

[0035] Fig.5 shows a conventional fully parallel LDPC decoder implementation in VLSI, wherein the bit node processing units (8002) implement summation and subtraction operations associated with the variable nodes, and wherein the check node processing units (8001) implement reliability message computations. To perform each decoder iteration, the variable and check node functional units are used once and messages are exchanged between them along the routed message wires. This allows the variable nodes or the check nodes to be updated in a block-parallel manner enabling a large number of decoder iterations to be performed during a block period as well as very high throughput. Very little control logic is needed for the parallel architecture when compared to the hardware sharing architecture (see below) because the parity check code graph is directly instantiated by the interconnection of the functional units. The major drawbacks with the parallel decoder architecture are however the relatively large chip area and the inability to support multiple block sizes and code rates, that is, multiple different parity check matrices, on the same core. [0036] Traditional fully parallel VLSI implementations of decoders such as shown in Fig.5 are constrained by the complexity of the physical interconnect 8010 required to establish the (Tanner) graph connectivity of the code as per the parity check matrix H and, hence, do not scale well for moderate to large code lengths. Long on-chip interconnect wires present implementation challenges in terms of placement, routing, and buffer-insertion to achieve clock circuit timing closure.

[0037] On the other hand, in serial architectures, sometimes called hardware sharing architectures, as shown in Fig.6, check node computations 1106 and bit node computations 1103 are distributed among a number of processing units that communicate through memory instead of a complex interconnect, as in Fig.5. Such architectures require a substantial memory overhead that amounts to four times the Hamming weight (i.e., the sum of all the non-zero elements) of the paritycheck matrix. Each processing unit requires separate read networks 1102 and write networks 1105 with complex control to access check message memory 1101 and bit message memory 1104, and hence it is practical to use only a small number of function units compared to the parallel case, resulting in a low- throughput decoder.

[0038] The throughput of a hardware sharing architecture is limited by the need for the processing units to be reused and the memory fabric accessed multiple times to perform each decoder iteration. Using multiple memories to achieve the required memory bandwidth is in general difficult because the essentially random or unstructured nature of the code graph resists a memory architecture that allows both the variable node and check node messages to be addressed efficiently.

[0039] In "Architecture generique de decodeur de codes LDPC (F. Guilloud, Ph.D thesis, ENST,

July 2004) iterative decoding algorithms and their hardware implementations are discussed. Amongst other things, a generic node processor is shown wherein output messages are computed using a systolic trellis-based implementation. Unused and dummy nodes occur in the realisation, which is not optimal in terms of efficiency.

[0040] The paper “High-Throughput LDPC decoders" (M. Mansour et al., IEEE Trans, on VLSI

Systems, vol.11, no.6, Dec.2003, pp. 976-996) presents a solution based on a turbo decoding message passing algorithm. The algorithm can be adapted for use with irregular codes. The proposed algorithm yields throughput and memory advantages in the decoder. A forward-backward processing is performed to process the output messages. There is no mention in the paper of requirements related to differentiability of the receiver structure.

[0041] In “Decoder-First Code Design" (E. Boutillon et al., Int'l Symp. on Turbo Codes and

Related Topics, Sept.2000, pp. 459-462) an approach is presented wherein one starts from an efficient hardware structure to construct an LDPC code that adequately fits that structure. One drawback of the proposed architecture is the outermost stages may comprise unused and dummy logic blocks when parity checks are carried out. This was done in order to obtain a regular structure. Such a solution is clearly not optimal in terms of efficiency.

[0042] Hence, there is a need for a parity check decoder design where one or more of the abovementioned limitations and drawbacks are avoided or overcome.

Summary of the invention

[0043] It is an object of embodiments of the present invention to provide for a check node processing unit for an iterative parity check code decoder with reduced requirements in terms of memory and allowing for an improved data throughput compared to prior art solutions. It is also an object of the invention to provide for a check node processing unit whose reliability message outputs are readily differentiable with respect to its inputs in a hardware efficient manner. It is a further object of the invention to provide for a check node processing unit that can, as a byproduct, output a so called soft-syndrome signal which is also readily differentiable with respect to its inputs. It is also an object of the invention to provide for a parity check code decoder comprising at least one such a check node processing unit. A further object is to provide a method to decode a signal encoded with a parity check code by means of a parity check code decoder as described. Yet a further object is to provide a method whereby the soft-syndrome outputs generated by the parity check code decoder can be combined into a so-called loss function that is differentiable with respect to all the decoder inputs as well as with respect to parameters used by signal conditioning blocks included in the analog front end and the metric generator.

[0044] The above objective is accomplished by the solution according to the present invention.

[0045] In a first aspect the invention relates to a check node processing unit for use in decoding a signal encoded with a parity check code, said parity check code associated with a parity check matrix and representable by a graph with a plurality of check nodes and a plurality of bit nodes. The check node processing unit is configured to determine reliability messages to be exchanged between the check nodes and the bit nodes when decoding the signal. The check node processing unit comprises :

- a plurality of processing elements with a plurality of input messages, each processing element of said plurality comprising a first and a second input to receive a first and a second input message of said plurality, respectively, and an output to generate, based on the first and said second input message, an output message being a reliability message or an intermediate message,

- a plurality of data links interconnecting processing elements of the plurality to form a three- dimensional lattice circuit, whereby a first and a second dimension of the lattice circuit define a plane wherein output messages of a first subset of interconnected processing elements form in the plane a step pattern having a direction and wherein output messages of a second subset of interconnected processing elements form in the plane a step pattern having an opposite direction of the first step pattern, wherein each input message of the plurality is fed twice to the three-dimensional lattice circuit, once via a first data link of the plurality along the first dimension and once via a second data link along the second dimension, wherein interconnected processing elements of a third subset of interconnected processing elements are each arranged to receive along a third dimension of the lattice circuit at least a first intermediate message output by another processing element and to output a corresponding reliability message based on at least the first intermediate message, and wherein at least one processing element of the plurality comprises:

- a plurality of multiplier circuits and adder circuits arranged to process the first or the second input of the at least one processing element,

- a function evaluator arranged to operate on intermediate signals from the processing with the plurality of multiplier circuits and adder circuits, whereby multiplier circuits and adder circuits of the plurality are also arranged to process outputs of the function evaluator and wherein the function evaluator is a rectifier circuit.

[0046] The proposed solution indeed allows for performing the decoding operation in an efficient way. The plurality of processing element in the check node processing unit form a systolic array wherein computations for check node processing are performed locally, using small registers, thereby avoiding large memory requirements. Another advantage of the proposed solution is that one obtains high throughput. The check node processing unit can be used with any type of parity check code, for example a LDPC or a HDPC code. The check node processing unit can be used in parity check decoders with various types of architecture, e.g., fully parallel or partially parallel or serial. Each processing element receives two input messages and produces one output message. The latter is either a reliability message or an intermediate message which is fed to another processing element of the check node processing unit. An important characteristic of the check node processing unit is that the data links interconnecting the processing elements form a 3D lattice circuit. Two dimensions of the lattice circuit define a plane and the processing elements positioned in that plane receive one input message via data link along one dimension of the plane and one input message via a link along the other dimension. With respect to their output messages these processing elements can be divided in two subsets, whereby the output messages of processing elements of one subset form a step pattern in a certain direction (e.g. ascending or descending) and the output messages of processing elements of the other subset form a step pattern in the inverse direction (e.g. descending if the output messages of the first subset form an ascending pattern). Further there is a third subset of processing elements which receive their two inputs along the third dimension of the lattice circuit and output a reliability message. At least one of the processing elements, and preferably all, are provided with multiplier circuits and adder circuits and a function evaluator to evaluate a differentiable function and operating on intermediate signals from the processing with the plurality of multiplier circuits and adder circuits. In order to support hardware-efficient differentiability the function evaluator is implemented as a rectifier circuit.

[0047] Advantageously, the multiplication factors and addition terms of the plurality of multiplier circuits and adder circuits take values such that the computation of the processing element approximates closely at least one algorithm of the group comprising {Sum-Product algorithm, Min-Sum algorithm, Normalized Min-Sum algorithm, Offset Min-Sum algorithm}. However, the differentiability of the circuit also allows for these multiplication factors and addition terms to be determined by a stochastic gradient descent based machine learning method and as such a novel entirely unique algorithm may be produced either during offline training or online, in real-time during reception of an actual live signal.

[0048] In one embodiment the check node processing unit further comprises a processing element arranged to produce a syndrome. In some embodiments the syndrome may be a soft syndrome that supports the generation of a so-called loss-function emanating from the decoder, consisting of contributions from individual soft-syndromes. Such a loss function would then be differentiable in a hardware-efficient manner with respect to all the decoder inputs as well as with respect to parameters used by signal conditioning blocks included in the analog front end and the metric generator, as well as differentiable with respect to the multiplication factors and addition terms used by the plurality of multiplier circuits and adder circuits of the check node processing unit. The differentiability can be exploited by means of a stochastic gradient descent algorithm in order to optimize one or more mentioned parameters of the analog front end, metric generator and or processing element, so achieving an optimally tuned receiver functionality with a reduced need for Reference Symbol insertion in the signal. In order to support early stopping of the decoder iterations, the check node processing unit can optionally comprise an extra sign function evaluator to transform a soft syndrome into a hard syndrome that controls the early stopping mechanism.

[0049] In another embodiment the check node processing unit further comprises an input selection block adapted to select between log likelihood ratios and variable node messages as the input messages.

[0050] In some embodiments the check node processing unit is arranged to determine the reliability messages from input messages received at the first input of the processing elements and to determine a syndrome from input messages received at the second input of the processing elements. [0051] In another aspect the invention relates to a parity check code decoder comprising one or more check node processing unit as previously described.

[0052] In a preferred embodiment the parity check code decoder comprises computation means for deriving from a plurality of bit node messages resulting from a current iteration a set of input messages for the one or more check node processing units for use in a next decoding iteration and for deriving from the reliability messages output by the one or more check node processing units a set of bit node messages for the next decoding iteration.

[0053] In one embodiment the computation means comprises a unit for deriving the set of input messages and a different unit for deriving the set of bit node messages.

[0054] In another embodiment also a set of input metrics is considered when deriving the set of bit node messages and the different unit is split up in a plurality of subunits.

[0055] In an advantageous embodiment the parity check code decoder is implemented in VLSI.

[0056] In another aspect the invention relates to a receiver structure comprising a parity check code decoder as described above.

[0057] In another aspect the invention relates to a method to decode a signal encoded with a parity check code, with a check node processing unit as described above, said parity check code associated with a parity check matrix and representable by a graph with a plurality of check nodes and a plurality of bit nodes. The method comprises :

- receiving a plurality of input messages to be fed to a three-dimensional lattice circuit comprising a plurality of processing elements interconnected via data links, whereby a first and a second dimension of said lattice circuit define a plane wherein output messages of a first subset of interconnected processing elements form in the plane a step pattern having a direction and wherein output messages of a second subset of interconnected processing elements form in the plane a step pattern having an opposite direction of the first step pattern,

- applying to each processing element of said plurality a first and a second input message, whereby each input message of the plurality is fed twice to the three-dimensional lattice circuit, once via a first data link along the first dimension and once via a second data link along the second dimension, whereby interconnected processing elements of a third subset of interconnected processing elements each receive along a third dimension of the lattice circuit at least a first intermediate message output by another processing element of the plurality,

- in at least one processing element of the plurality processing said first or second input message using a plurality of multiplier circuits and adder circuits, and applying resulting intermediate signals to a function evaluator implemented as a rectifier circuit, whereby multiplier circuits and adder circuits of the plurality are also arranged to process outputs of the function evaluator, - generating at the output of each processing element, based on the applied first and second input message, either a reliability message or an intermediate message.

[0058] For purposes of summarizing the invention and the advantages achieved over the prior art, certain objects and advantages of the invention have been described herein above. Of course, it is to be understood that not necessarily all such objects or advantages may be achieved in accordance with any particular embodiment of the invention. Thus, for example, those skilled in the art will recognize that the invention may be embodied or carried out in a manner that achieves or optimizes one advantage or group of advantages as taught herein without necessarily achieving other objects or advantages as may be taught or suggested herein.

[0059] The above and other aspects of the invention will be apparent from and elucidated with reference to the embodiment(s) described hereinafter.

Brief description of the drawings

[0060] The invention will now be described further, by way of example, with reference to the accompanying drawings, wherein like reference numerals refer to like elements in the various figures. [0061] Fig.l illustrates a scheme of a communication system as considered in the present invention.

[0062] Fig. illustrates a bipartite graph associated to a parity check code.

[0063] Fig.3 illustrates in general a parity check code decoding functionality in a receiver of a communication system.

[0064] Fig.4 illustrates message passing decoding processing between check nodes and bit nodes.

[0065] Fig.5 illustrates a fully parallel message passing parity check code decoder as known in the art.

[0066] Fig.6 illustrates a message passing parity check code decoder with a hardware sharing architecture (prior art).

[0067] Fig.7 illustrates a communication system comprising a receiver structure provided with an LDPC decoder as in the present invention.

[0068] Fig.8 illustrates an embodiment of a check node processing unit according to the present invention.

[0069] Fig.9 illustrates an embodiment of a check node processing unit according to the present invention. [0070] Fig.10 illustrates an embodiment of a single processing element of the check node processing unit of the invention.

[0071] Fig.11 illustrates an embodiment of a parity check code decoder, which is particularly well suited for relatively small code block sizes.

[0072] Fig.12 illustrates how matrix multiplication is used for combined signal interconnect and variable node message computation in the parity check code decoder of Fig.11.

[0073] Fig.13 illustrates an embodiment of a parity check code decoder, which is particularly well suited for medium to large code block sizes.

[0074] Fig.14 illustrates how matrix multiplication is used for combined signal interconnect and variable node message computation, with smaller matrices combined with traditional wire interconnect as in Fig.13.

Detailed description of illustrative embodiments

[0075] The present invention will be described with respect to particular embodiments and with reference to certain drawings but the invention is not limited thereto but only by the claims.

[0076] Furthermore, the terms first, second and the like in the description and in the claims, are used for distinguishing between similar elements and not necessarily for describing a sequence, either temporally, spatially, in ranking or in any other manner. It is to be understood that the terms so used are interchangeable under appropriate circumstances and that the embodiments of the invention described herein are capable of operation in other sequences than described or illustrated herein.

[0077] It is to be noticed that the term "comprising", used in the claims, should not be interpreted as being restricted to the means listed thereafter; it does not exclude other elements or steps. It is thus to be interpreted as specifying the presence of the stated features, integers, steps or components as referred to, but does not preclude the presence or addition of one or more other features, integers, steps or components, or groups thereof. Thus, the scope of the expression "a device comprising means A and B" should not be limited to devices consisting only of components A and B. It means that with respect to the present invention, the only relevant components of the device are A and B.

[0078] Reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment, but may. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner, as would be apparent to one of ordinary skill in the art from this disclosure, in one or more embodiments.

[0079] Similarly it should be appreciated that in the description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.

[0080] Furthermore, while some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention, and form different embodiments, as would be understood by those in the art. For example, in the following claims, any of the claimed embodiments can be used in any combination.

[0081] It should be noted that the use of particular terminology when describing certain features or aspects of the invention should not be taken to imply that the terminology is being redefined herein to be restricted to include any specific characteristics of the features or aspects of the invention with which that terminology is associated.

[0082] In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In other instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

[0083] It was already explained in the background section that when designing a message passing decoder for a parity check code, one of the key hardware implementation challenges relates to the management of the large number of messages to be exchanged during each decoder iteration. Therefore, the present invention proposes in one aspect a check node processing unit for a parity check decoder which allows for a reduction in the memory requirement compared to the prior art solutions (where e.g., a large memory such as Random Access Memory (RAM) is oftentimes used to store the check edge messages and bit edge messages). The solution of the invention also provides for a high throughput. In the belief propagation decoding architecture disclosed here the interconnect challenge presented by the need for message passing of the reliability messages (i.e. the check node messages) from the check nodes to the bit nodes, as well as the computational tasks related to obtaining the bit node messages, are implemented by means of a mixture of wire interconnects and distributed matrix multiplication operations according to the design principles of a systolic system, sometimes called systolic array. More specifically, the check node processing unit according to the invention comprised in a parity check code decoder employs a systolic array having appropriately placed registers and only localized linear matrix multiplications and non-linear mathematical functions operate to perform the appropriate computations and interconnections of the check node messages and bit node messages for check node processing and bit node processing, respectively. In this invention the parity check decoding is performed by a circuit that generates soft information which can assist other receiver functions such as equalisation, channel estimation and synchronization in order to reduce overhead associated to known reference symbols that are traditionally inserted in the transmitted waveform. In some embodiments, the solution presented herein capitalizes on the fact that a parity check matrix H of the linear block code is composed of sub-matrices. For example, sub-matrix-based processing can then be employed when processing the parity check matrix. The fact of H being composed of submatrices can be exploited to provide for improved architectures and efficiencies.

[0084] The processing unit as proposed is equally applicable in decoding parity check coded signals whose parity check matrix does not include sub-matrices. In other words, the decoding structure presented herein can accommodate any form of parity check matrix H (regular low density, irregular low density or high density such as is the case with BCH codes, polar codes, Reed Solomon codes, Hamming codes etc).

[0085] The check node processing unit presented herein leverages a systolic array architecture to increase data throughput and reduce memory storage requirements. Routing congestion and critical path latency are also improved in the design. The decoding means presented herein can also be applied to LDPC codes operating on a low-density parity check matrix consisting of circularly shifted identity matrix sub-blocks. In some embodiments, the entire low-density parity check matrix H is broken into square sub-matrices such that each sub-matrix consists of either a CSI (Cyclic Shifted Identity) matrix or a null matrix (all zeroes). A CSI sub-matrix is generated using cyclic shifting of an identity matrix (e.g., a CSI sub-matrix can be generated by performing cyclic shifting when starting with an original sub-matrix which is an identity matrix with all main diagonal elements having a value of "1", and all non-diagonal elements "0"). In case the low-density parity check matrix employed to decode an LDPC coded signal includes sub-matrices having CSI format, the CSI offset per sub-matrix should be known. The CSI offset is the cyclic shifting required from an original identity matrix needed to generate the CSI matrix of interest. LDPC codes whose low-density parity check matrices H have such a structure (e.g., including CSI sub-matrices, etc.) are found for example in the IEEE 802. lln draft 2.02 standard, the IEEE 802.16e standard and the 5G NR standard.

[0086] The embodiments and approaches described herein are applicable regardless of the overall parity check code decoder architecture. This architecture may for example be a fully parallel, partially parallel, or serial in architecture/hardware implementation. In a fully parallel implementation, there is one check node processing unit per check node. In a serial or hybrid pa ra llel/seria I implementation there can be less check node processing units than the amount of check nodes. A scheduling scheme then is needed to let the check node processing unit compute alternatingly the check node messages for the different various check nodes.

[0087] The check node processing unit presented herein represents a circuit that produces reliability outputs (904) on the one hand and a soft syndrome (903) on the other hand. Both types of outputs are differentiable in a hardware efficient manner with respect to the inputs of the check node. This property allows for the construction of a loss function output of the LDPC decoder, which can be used by a stochastic gradient descent algorithm in order to optimize any or all mentioned parameters of the analog front end, metric generator and or Processing element PE, achieving an optimally tuned receiver functionality with a reduced need for Reference Symbol insertion in the signal, thereby increasing bandwidth efficiency and reducing bit error probability at the output of the receiver.

[0088] The parity check code decoder as presented herein is advantageously a part of a receiver structure of a communication system, for example as depicted in Fig.7. Referring to the communication system 200 of Fig.7, at a transmitting end of a communication channel 299, information bits 201 are provided to a transmitter 297 operable to perform encoding of these information bits 201 using an encoder and symbol mapper 220 (which may be viewed as being distinct functional blocks 222 and 224, respectively) thereby generating a sequence of discrete-valued modulation symbols 203 that is provided to a transmit driver 230. The transmit driver uses a DAC (Digital to Analog Converter) 232 to generate a continuous-time transmit signal 204 and a transmit filter 234 to generate a filtered, continuous-time transmit signal 205 that substantially comports with the communication channel 299. At a receiving end of the communication channel 299, a continuous-time receive signal 206 is provided to an Analog Front End 260 that includes a receive filter 262 (that generates a filtered, continuous-time receive signal 207) and an Analog to Digital Converter 264 (that generates discrete-time receive signals 208). A metric generator 270 calculates symbol metrics 209 that are employed by a decoder 280 to make best estimates of the discrete-valued modulation symbols and information bits encoded therein 210. The symbol metrics are typically obtained by the metric generator 270 after so-called signal conditioning, comprising e.g. DC offset removal, channel equalization, removal of residual carrier frequency offset and carrier phase error, as well as removal of symbol timing error. In many cases this signal conditioning process is leveraging the presence of the aforementioned a-priori known reference symbols in the waveform in order to achieve optimal signal conditioning results, at the cost of reduced bandwidth efficiency due to reference symbol overhead. Signal conditioning algorithms that make use of known reference symbols are sometimes called data-aided algorithms, whereas signal conditioning algorithms that do not make use of a-priori known reference symbols are sometimes called non-data- aided algorithms. Typically, the accuracy of conventional data-aided algorithms is much better than non-data aided algorithms. In any event these algorithms have parameters that must be adjusted in order to achieve optimal performance of the receiver. The check node processing unit according to the invention, comprised in a parity check code decoder (280) arranged to provide a loss function based on soft syndromes (903), facilitates optimisation of these algorithm parameters without needing access to reference symbols in the signal.

[0089] The specific parity check decoder circuit of this invention is designed such that it can provide specific output information that can be used by the signal conditioning function of the receiver such that the need to incorporate known reference symbols in the signal can be significantly reduced, and the bandwidth efficiency increased accordingly. As such the circuit of this invention supports in a computationally efficient manner a class of high performant non-data-aided algorithms for signal conditioning that derives from the domain of machine learning and artificial intelligence. One criterium to be satisfied for the decoder circuit in order to achieve this is that the output of the mathematical operations implemented by the decoder circuit have to be differentiable with respect to their inputs, such that stochastic gradient descent (SGD) methods are applicable, as explained further below.

[0090] Aside from Additive White Gaussian Noise (AWGN) and Signal to Noise Ratio (SNR) specific characteristics of the signal at the receiver input could be more or less pronounced effects of Rayleigh fading, frequency selectivity, phase noise, interference type e.g., static frequency clean wave (CW), modulated interfere^ sweeping CW interfere^ and interference strength. These characteristics are not known in advance and require a flexible decoder and receiver means, such as the one presented herein, that can adapt to the real-time characteristics at hand, such as to reduce the amount of required message passing iterations needed for error correction, which again reduces power consumption, latency and decoder throughput.

[0091] As already mentioned, the check node processing unit and the parity check code decoder presented herein are based on a systolic system. A systolic system or systolic array is a network of processors that rhythmically compute and pass data through the system. A systolic array is an interconnected matrix of individual signal processing units or "cells", where the cells process individual elements of an input matrix and exchange processed output to perform an overall operation. A systolic array has the characteristic features of modularity, regularity, local interconnection, high degree of pipelining, and highly synchronized multiprocessing. Since a large number of cells drawn from a small set of cell types with bounded degrees are connected with neighbouring cells in a regular fashion, systolic arrays are easily realized in contemporary VLSI technology, which is known to be very efficient when it comes to power consumption required for a given computational task.

[0092] The generic systolic array architecture of these VLSI circuits is inspired by the systolic flow of blood through the human body. Examples of existing general purpose VLSI devices that have a systolic architecture are Graphical Processing Units (GPUs). A more recent addition to this family is the so-called Tensor Processing Unit (TPU).

[0093] In systolic arrays, once data is taken out of the memory, it can be used effectively at each cell of the array it passes while being pumped from cell to cell along the array, thus avoiding the classic memory access bottleneck problem commonly incurred in Von Neumann machines. The novel decoding means presented herein is specifically designed to the principles of systolic systems in order to leverage advantages of memory efficiency and power efficiency. Hence, the decoder circuit of the invention is not only differentiable in support of non-data aided signal conditioning, but also offers a hardware and interconnect-efficient architecture with minimal memory throughput required thanks to a systolic design.

[0094] More specifically, the decoder according to this invention employs such a systolic array for the memory and power efficient computation of reliability messages (i.e. check node messages), variable node messages and syndromes. This is considerably different from existing RAM-based decoders which contain a RAM for every bit node as well as for every check node. The systolic array based approach presented herein also differs from the fully parallel architecture where every connection from check node to bit node as well as from bit node to check node is instantiated with a wire and register.

[0095] In the systolic array based processing parity check code decoder described herein, a check node processing unit is operable to compute all reliability messages associated with a check node, and directly connect them to the variable nodes (also called bit nodes) associated with this check node, according to the parity check matrix H, without a need to retrieve the inputs from memory or registers, nor a need for storing the outputs to memory or registers (as is the case in prior art approaches). The check node processing unit comprises a lattice of locally connected processing elements.

[0096] Unlike conventional parallel architectures employing a lattice of processing elements, the systolic array based processing unit is characterized by a regular data flow. Typically, two or more data streams flow through the cells of the systolic arrays with various speeds and directions. Data items (here input messages) from different streams interact with each other and trigger computations in the processing elements where they meet. A systolic array is in general an n-dimensional structural pipeline with local communication between the processing elements. In this invention it is a 3-dimensional structure. Thus, a systolic array simultaneously exploits both pipelining and parallelism in the decoding algorithm.

[0097] Fig.8 illustrates an example embodiment of a systolic check node processing unit that produces (in parallel), for one given row index j of the parity check matrix H, all reliability messages 904 (also named check node messages) Aji to the various variable nodes Vi, starting from the bit node (variable node) messages 910 denoted as Q.ij, whereas indices / correspond to non-zero columns in row j of H. In some embodiments, as for example in Fig.8, an input selector 920 may be provided, configured to select the input messages 911 required for the computation and corresponding to the variable node messages 910 rather than the log likelihood ratio metrics (LLRs) 901. In the example of Fig.8, the degree of the check node is 7, which means that row j of parity check matrix H has exactly 7 non-zero entries and that the processing unit receives 7 inputs and produces at its output 7 reliability messages (to 7 bit nodes).

[0098] Fig.9 offers a three-dimensional view on the lattice circuit that interconnects by means of data links the plurality of processing elements, and their inputs and outputs, within the lattice circuit 930 shown in the example processing unit of Fig.8. In Fig.9 identical copies of input messages (bit node messages) Qij, Q₂j, through Q₇j are shown to be fed twice, once horizontally along the Y-axis of the lattice and once vertically along the x-axis of the lattice. The output reliability messages (check node messages) Xji, Xj2 through Xj? are fed out of the lattice along the z-axis of the 3D structure. Fig.9 clearly shows there is both an ascending ladder formed by a first subset of processing elements 1201 and a descending ladder structure (i.e., a ladder structure in the opposite direction as the ladder built by outputs of the first subset) formed by a second subset of processing elements 1211. The two ladder structures interconnect the input messages and processing elements in the XY-plane of the three- dimensional lattice circuit. Fig.9 also shows a final cross-interconnection of those ladders, along the z- axis, to the processing elements 1221 and the outputs, i.e. the reliability messages. The subset of processing elements in the z-dimension are labelled 1221. Each processing element in the lattice accepts exactly two inputs and produces one output. For example, processing elements in the XY-plane receive one input message via a data link along the first dimension and one input message via a data link along the second dimension. The output of a processing element is either a reliability message (see for example nodes 'opl' and 'op7' in Fig.9) or an intermediate message which is used as an input message for another processing element. As such, Fig.9 clearly illustrates that the proposed decoder exhibits all the properties of a systolic array and inherits the aforementioned benefits from the systolic array (low latency, reduced power consumption high throughput and less memory bottlenecks). [0099] From the three-dimensional lattice structure shown in Fig.8 or, more clearly, in Fig.9 it can now also be understood that in this efficient parity check decoder architecture, the processing unit that calculates the reliability messages for a check node of degree d_v comprises exactly N_pe = 3 x (d_v - 2) processing elements 902. In the embodiment depicted in Fig.8, where d_v = 7, this corresponds to exactly N_pe = 3 x (7-2) = 15 processing elements 902. In contrast to the papers mentioned in the background section no unused or dummy processing elements are required to be present in the lattice structure. Note that in Fig.9, in order not to overload the figure, the optional extra processing elements required to compute the soft syndrome are not shown.

[0100] In another embodiment of the check node processing unit a syndrome output 903 is produced on top of the reliability messages ij,. This requires one extra identical processing element 905 (see Fig.8), optionally followed by a means to compute the sign function 925, where the sign() function in this description is defined as : Sign(x) = 0 if x > 0 and = 1 if x< 0. Note that in the embodiment depicted in Fig.8 this optional sign () function is also shown. Embodiments including a syndrome output comprise N_pe = 3 x (d_v - 2) + 1 processing elements for a check node of degree d_v.

[0101] In one particular embodiment the syndrome output takes on, for example, a binary value 0 if the parity check corresponding to row m of parity check matrix H is satisfied when evaluated on the input messages of the check node processing unit, or a binary value 1 if that parity check is not satisfied. This requires that the likelihood ratio metrics (LLRs) 901 rather than to the variable node messages 910 be fed to the syndrome computation. Possibly this may be achieved by means of the input selector 920, if present, that selects input messages 911. In other embodiments there may be no need for syndrome computation by the check node processing unit (for instance because a nonsyndrome related iteration stopping criterion is used). In that case there is no need for an input selector 920 in the design of the check node processing unit. In yet another embodiment it may be preferred to output a so-called soft syndrome that exhibits both a sign and a magnitude. In yet another embodiment it may be preferred to calculate the soft or hard syndrome on the basis of the variable node messages 910, Qjj., rather than on the basis of the updated LLR inputs that serve as the input for hard decisions. In such embodiment there is no need to have an input selector 920 and the syndrome output may always be available along with the reliability output messages i, .

[0102] The inventors have realized that a receiver 298 as a whole (and the decoder 280 as a part of it) next to its ordinary purpose of outputting user traffic bits, can also be regarded as a predictive circuit that estimates (for every parity check equation j) the conditional probability that a given parity check equation j is satisfied. This probability is conditioned on a given received signal 206. A soft- syndrome can be regarded as the natural logarithm of this conditional probability that the parity check equation j was satisfied. In the language of machine learning, one can say a soft syndrome is a kind of 'logit' corresponding to a to be predicted 'label' given by parity check bit Cj. In the transmitter, prior to corruption introduced by the channel 299, all check bit Cj equal zero regardless the unknown user bits involved in the parity check. So, this alternative interpretation of a receiver as a binary classification prediction circuit, has the special property whereby correct prediction on the basis of soft syndromes always results in a predicted label that equals zero.

[0103] In a preferred embodiment, the hard or soft syndrome outputs of the individual check node processing units are fed to a summation circuit which outputs a so called syndrome-sum which expresses a likelihood that the code word at the output of the current iteration indeed satisfies the relationship Hx^T=0. Let zj,_m be the soft syndrome output (903) according to Fig.8 (wherein the optional sign function 925 is understood to be omitted), obtained in the final decoder iteration, of the j^th parity check equation of the m^th codeword in a batch of codewords processed by the decoder, then the following types of summation circuits are all understood to loosely fall under the term 'syndrome-sum':

S = ± log sigmoid^Zj ^) j m whereby the amount of codewords in a batch may also equal 1 (leading to a reduced summation) and whereby the second and third form are more sophisticated versions of the first form that deal with large values of Zj,_m in a more standard way. Alternative embodiments of the syndrome sum can be conceived whereby the summation also takes into account the soft-syndrome outputs corresponding to parity check nodes in earlier iterations, rather than only the ones corresponding to the final iteration. Those skilled in the art of machine learning and artificial intelligence will appreciate that all these types of soft-syndrome sums S represent a binary cross-entropy loss function between the parity check bits Cj that are the 'labels' and the 'logits' Zj, whereby the 'correct' check node labels are a priori known to be zero. The loss functions based on the soft syndrome sum are differentiable with respect to the inputs of the decoder 280 (and therefore differentiable w.r.t. the parameters of functions 270, 262 etc that affect the input metrics of the decoder, but also advantageously w.r.t. the parameters that define the processing element shown in Fig.10). The inventors have recognized that a receiver that self-learns and adapts to minimize this loss presented by the sum of the soft syndromes as computed by the processing element, automatically also minimizes the user bit errors at the receiver output.

[0104] Fig.10 illustrates a preferred embodiment of a single processing element used as a building block of the lattice circuit of the check node processing unit of Fig.9. As already indicated previously, it has exactly two inputs 1001 and 1002 and produces exactly one output 1040. The inputs are either LLR values or variable node messages. The output is either an intermediate message to be applied to another processing element of the lattice circuit or a reliability message Xji. The processing element includes adder circuits 1020 and multiplier circuits 1030, as well as a function evaluator 1010. In the processing element according to the invention, the function evaluator returns the element-wise output of a rectifier function f(x), where f(x) = x if x > 0, f(x) = 0 otherwise, for all elements of the input vector to the function evaluator. This rectifier function may be implemented as a low complexity VLSI circuit that evaluates the sign bit of its input and returns an output equal to its input when that sign bit is 0, or returns a zero valued output otherwise.

[0105] In order to illustrate that the processing element of Fig.10 is indeed suitable for belief propagation, those skilled in the art may readily evaluate that an example embodiment whereby multiplication factors a_mn and Ck in Fig.10 take on the following values:

while the adder inputs take on values bi = b₂ = b₃ = b = d = 0, will lead to a correctly functioning LDPC decoder, closely approximating the BER performance of a min-sum algorithm. In other embodiments, the processing element may use other values for the multiplication factors and terms used in the addition. Advantageously these other values are learnt online through the application of a stochastic gradient descent function that operates in order to minimize the loss function given by the soft- syndrome sum discussed above. In contrast to the present invention, the traditional check node unit computation supporting the belief propagation algorithm known in the art as the Sum Product Algorithm (SPA) whereby Xji is computed according to tanh (^)) (4),

and a soft-syndrome z, is typically computed according to (5), whereby the product is taken over all indices k as opposed to (4)

Zj = 2ta.nh~ (Hfc tanh (^)) (5),

[0106] The processing element illustrated in the example of Fig.10 has a computational structure that brings along numerical stability advantages over the various reliability message computation methods known in the art, which exhibit undesired properties such as the fact that atanh(tanh(x)) does not equal x when x is quantized and fed through look-up table (LUT) that tabulate the functions tanh() and atanh(). Also, importantly, whereas (4) and (5) are differentiable functions from a mathematical standpoint, it is unpractical and unstable to compute their gradient in a hardware efficient manner. Popular existing methods that replace these functions by Look-Up Tables in VLSI are certainly not differentiable. In contrast, the processing element of the invention illustrated in Fig.10 has an optimal computational structure with multipliers, adders and rectifier functions only, whereby the rectifier function f(x) = x when x >= 0 and f(x) = 0 elsewhere. These are the very same standard mathematical operations used readily at massive scale in neural networks which require hardware efficient and fast differentiation in support of so-called backward propagation of gradients. It is noted that in the context of neural networks the industry accepted way to define the derivative of f(x), which is called f'(x), as follows: f'(x) = 1 when x >= 0 and f'(x) = 0 elsewhere. As such one can readily see the extreme benefit of using this standard function inside the processing element of this invention, with an eye to the differentiability of the receiver including the decoder.

[0107] It is known in the art that the SPA algorithm which uses formulas (4) and (5) can achieve near-optimal decoding performance, but it is complex. The Min-Sum Algorithm (MSA) is a reduced complexity algorithm with min-sum approximation, but the outputs from check nodes in MSA decoding are overestimated compared to Sum Product Algorithm (SPA). Normalized Min-Sum Algorithm (NMSA) and Offset Min-Sum Algorithm (OMSA) aim to optimize MSA. Layered message passing algorithm can be combined with SPA, MSA, NMSA and OMSA, which leads to Layered Sum Product Algorithm (LSPA), Layered Min-Sum Algorithm (LMSA), Layered Normalized Min-Sum Algorithm (LNMSA) and Layered Offset Min-Sum Algorithm (LOMSA), because layered message passing algorithm can accelerate the convergence process. In contrast to the processing element presented in this invention, none of the MSA, NMSA, OMSA, LNMSA are readily differentiable in a hardware efficient way, precluding the selflearning aspect of the receiver that is a main objective facilitated by the proposed processing element. [0108] The outputs from variable nodes in MSA decoding are overestimated compared to SPA due to the approximation (simplification) of the check node computation used in SPA. There are several ad-hoc methods to optimize Min-Sum Algorithms (MSA) to make this approximation more accurate. The two most popular methods are Normalized Min-Sum Algorithm (NMSA) and Offset Min-Sum Algorithm (OMSA). The idea behind NMSA and OMSA is to reduce the magnitude of variable node outputs.

[0109] The parity check code decoding means presented herein can self-learn, through gradient descent based on the soft-syndrome sum loss function, to closely approximate any of these message computation and scheduling schemes and offers a less ad-hoc, smooth hybrid approach that matches the characteristics of the signal at the input of the receiver in real time in order to reduce the amount of required message passing iterations needed for error correction, which reduces power consumption, latency and decoder throughput.

[0110] A processing element as in Fig.10 or any variant embodiment of it can be implemented efficiently in VLSI as a systolic array such that the overall lattice that interconnects these processing elements to form the check node processing unit of Fig.8 (also shown, differently, in Fig.9), is again a systolic array.

[0111] Parity check codes are essentially random in nature (and hence, require highly irregular hardware interconnectivity), which goes against efficient VLSI implementation paradigms that call for regular and structured design.

[0112] Fig.11 illustrates an embodiment of the proposed interconnect architecture for a parity check code decoder, particularly well-suited for relatively small code block sizes. All updated bit node messages Q.ij from the (k-l)^th iteration are conceptually grouped into a placeholder vector 1301, symbolically represented as the one-dimensional vector [q] k-i- A first Matrix Multiplication Unit MMU 1302, which itself may be implemented in VLSI as a systolic lattice, computes an output vector 1303, symbolically represented as the one-dimensional vector [v]k, according to the matrix multiplication:

[v]k = [A] . [q]k-i.

[0113] In one embodiment, each row of matrix A contains exactly one non-zero element in accordance with the interconnect scheme defined by the parity check matrix H. Note that the matrix A does not equal the matrix H, it is constructed on the basis of matrix H though. In some embodiments, this non-zero element in any given row of matrix A may take on the unity value. In other embodiments, the non-zero element may take on other values proportional to the confidence level attributed to certain bit node messages. This confidence level may for instance be a function of the length of the cycles in the Tanner graph local to the variable node or its proximity to known synchronization symbols in the received symbol stream.

[0114] The matrix A is constructed such as to ensure that the right groupings of scaled bit node messages are presented as input messages to the various check node processing units 1304, which are implemented internally according to the three-dimensional lattice architecture described herein and shown in the examples of Fig.8 or Fig.9. These processing units 1304 compute all reliability messages (check node messages) which are conceptually grouped into a placeholder vector 1305, symbolically represented as the one-dimensional vector [ k. Another placeholder vector 1306, symbolically represented as [w]k is conceptually defined as the concatenation of [ k with the vector 1309 of input metrics called LLR or Xi elsewhere in this disclosure. A second matrix multiplication unit MMU_B 1307 then is responsible for computing the vector 1308 of bit node messages, symbolically represented by [q]k, ready for feeding the next, k^th iteration, according to the equation [q]k = [B] . [w]k, wherein each row of matrix B now typically contains as many non-zero elements as required to realise both the required message interconnect and the variable message computations. In some embodiments the non-zero elements of the matrix B assume values exactly equal to 1, as is the case if the aim is to exactly implement formula (3). In other embodiments non-zero weights can be attributed to different reliability messages input to the variable node computation. In some embodiments the matrix B includes sufficient rows to allow for the simultaneous computation of both the a posteriori probability messages Qi (soft information messages) and the variable node messages.

[0115] Fig.12 illustrates the use of matrix multiplication for combined signal interconnect

(routing) and variable node message computation according to formula (3), for a hypothetical very small parity check matrix 1701. The Matrix Multiplication Unit 1705 performs a matrix multiplication of the input [w] by a matrix [B] in order to produce the output [q]. The subvector that contains variable node messages for bit index 4 is highlighted in dashed rectangle 1706. Rows below the solid line 1710 are optionally present in embodiments where the Matrix Multiplication Unit also computes the soft information messages Qi, such as 1708 which highlights Q4.

[0116] In an embodiment a matrix multiplication circuit is provided for combined signal interconnect (routing) and variable node message computation wherein the matrix multiplication means is realized according to systolic array principles. In such an embodiment the matrix multi plication unit reuses inputs many times as part of producing the output. Each input value is read once but used for many different operations without storing it back to a register. Wires only connect spatially adjacent Arithmetic Logic Units (ALUs), which makes them short and energy-efficient. The ALUs perform only multiplications and additions in fixed patterns, which simplifies their design.

[0117] Fig.13 shows an interconnect structure according to an embodiment of the proposed parity check code decoding means, which is typically better suited for larger code block lengths. Any existing sub-structure (CSI or other sub-structure) of the parity check matrix is thereby advantageously used to implement an interconnect and variable node message computation scheme that can be seen as a hybrid between the approach of Fig.11 and the parity check matrix defined wire-based interconnect of Fig.5. Indeed, in Fig.13 the variable node messages are re-ordered 'by wire' according to the parity check matrix H in block 1502, yielding an output vector 1503, symbolically represented as the vector [v]k, slices or portions of which are presented to the various check node processing units 1504, which again are implemented internally according to the 3D lattice architecture described herein and shown in the examples of Fig.8 or Fig.9. These processing units 1504 compute all reliability messages which are conceptually grouped into a placeholder vector 1505, symbolically represented as the one-dimensional vector [ k. Another placeholder vector 1506, symbolically represented as [w]k is conceptually defined as the concatenation of the check node messages [ ij]k with the vector 1509 of input metrics called LLR or Xi elsewhere in this disclosure. The elements of the placeholder vector 1506 are suitably reorganized to present to the input of a plurality of smaller matrix multiplication units with weights [B]i, such that the vectors [q]ki represent the updated edges associated with variable node Vi of the Tanner graph in the k^th iteration. [0118] Fig.14 illustrates the difference in approach, for the same hypothetical small parity check matrix as shown in Fig.12. The same variable node messages pertaining to bit 4 are highlighted, but in this embodiment they are computed by a smaller matrix multiplication unit 1825, which multiplies a smaller input vector [w] 1810 by a smaller matrix [B]₄ in order to produce the output [q]₄. An interconnect network 1805 routes a selected subset of the information in vector [w] to this MMU 1825. Again the rows in matrix [B]₄ below the line 1820 are only present in embodiments that not only compute the updated variable node messages Q _J but also the updated soft information Q₄.

[0119] It should be noted that while Fig.12 and Fig.14 show matrices which only contain zeros and ones, the interconnect network and variable node message processing unit presented herein fully support non-zero values other than exactly unity, that is to say values that can take on any value (e.g., 8 bit integer, 16 bit integer or even floating point). As such the parity check decoder can support using other formulas than formula (3), as it is known that attributing different weights to different messages, as a function of the local cycle lengths in the Tanner graph of the parity check code, can significantly reduce the number of iterations required to correctly decode a received codeword. This is especially relevant for High Density Parity Check (HDPC) codes such as BCH codes, Hamming codes, polar codes, Reed Solomon codes etc but also for LDPC codes with short block lengths and small graph cycle lengths. [0120] For good decoding performance it is important that the lengths of cycles in the graph be as long as possible. Short cycles, such as a length 4 cycle, can possibly degrade the performance of the message passing decoding approach to decode a parity check coded signal. The decoding means presented herein can self-learn, through gradient descent based on the soft-syndrome sum loss function, to individually and independently adjust the addition terms and multiplication factors in Fig.10 to change the reliability messages sent by parity check nodes to variable nodes, such that if a message is less reliable since it is produced by a parity check node with a large number of small cycles in its local neighbourhood, this message will be attenuated properly. The parameters of the systolic parity check code decoder of the invention (e.g., multiplicative weights in the processing element of Fig.10, function type of the function evaluator in Fig.10, matrix weights for the combined signal interconnect (routing) and variable node message computation shown in Fig.11 or 12) can be tuned either manually (analytically) or through machine learning in order to minimize the number of message passing iterations required to achieve a certain bit error probability at the output of the decoder. Such parameter tuning may either happen offline while the receiver is not receiving any data or online while the receiver is receiving data.

[0121] The systolic parity check code decoder of the invention can exist in embodiments that mix approaches as shown in Fig.12 and Fig.14 (Fig.11 and Fig.13 conversely), in order to match optimally the particular dimensions and capabilities of pre-existing systolic matrix multiplication units on a VLSI chip such as, for example, a GPU or a TPU.

[0122] Sometimes, it is desirable to have a receiver, coupled to a communication channel, that can decode different parity check coded signals such that each coded signal is generated in accordance with a different parity check matrix H. For example, sometimes a communication device needs to receive and decode a first signal having a corresponding first parity check matrix, Hl, a second signal that has a corresponding second parity check matrix. H2, and so on. This is common in communication links that exploit so called Adaptive Coding and Modulation (ACM) in response to dynamic fading events such as rain fades.

[0123] Accordingly, in the parity check code decoder presented herein, the matrix elements of matrices [A] in 1302 and [B] in 1307 (Fig.11), as well as the matrix elements of matrices [B]i in 1507 (Fig.13) can be configured in accordance with decoding of a first coded signal and subsequently be dynamically reconfigured to take on values in accordance with decoding of a second coded signal. Over the last decade, systolic matrix multiplication implementations in VLSI, especially for sparse matrices, have become very area and power efficient. The dynamic reconfigurable interconnect mechanism presented herein, combines the tasks of flexible message passing interconnect with the task of calculating the variable node messages by leveraging fast, low latency matrix multiplication circuitry. As such it naturally supports Adaptive Coding and Modulation while avoiding spending VLSI area on unused processing nodes (e.g., unused bit engines and/or check engines).

[0124] Another of the many benefits and advantages provided inherently by the systolic array based decoder presented herein for decoding parity check coded signals is that it can be tuned to meet precisely the performance requirements of the target application. Specifically, the architecture allows any number of bit nodes and check nodes to perform the processing in parallel. This is in contrast to for example state of the art semi-parallel parity check code decoders which limit the bit nodes and check nodes to one each per sub-block of the parity check matrix H.

[0125] In another aspect the invention relates to a method for processing a parity check coded signal. An illustrative embodiment of such a method is described here. The method initially involves receiving and processing a continuous-time signal. This receiving and processing of the continuous-time signal may also involve performing any necessary down-conversion of a first continuous-time signal thereby generating a second continuous-time signal. Any frequency conversion that may need to be performed may possibly be performed by direct conversion from carrier frequency to a baseband frequency. This frequency conversion may alternatively be performed via an IF (Intermediate Frequency). The received continuous-time signal is typically brought down in frequency to a baseband continuous-time signal when performing the method. Also, certain types of gain adjustment and/or gain control may be applied to the received continuous-time signal.

[0126] The method also involves sampling the first (or second) continuous-time signal thereby generating a discrete-time signal and extracting l,Q (In-phase, Quadrature) components there from. This sampling may be performed using an ADC (Analog to Digital Converter) or equivalent means to generate the discrete-time signal from the appropriately down-converted (and potentially also filtered, gain adjusted, etc.) received continuous-time signal. The l,Q components of the individual samples of the discrete time signal are also extracted within this step. The method then involves demodulating the l,Q components and can involve performing symbol mapping of the l,Q components (e.g., to a constellation shape having a mapping of the constellation points therein) thereby generating a sequence of discrete-valued modulation symbols.

[0127] A next step of the method involves performing updating of edge messages until a stopping condition is met (e.g., for a predetermined number of iterations, until all syndromes are equal to zero, or until some other stopping criterion is met). This step may be viewed as performing the parity check code decoding in accordance with any of the various embodiments described above. This message passing decoding generally involves bit engine processing for updating bit edge messages (variable edge messages) as well as check engine processing for updating check edge messages. In addition, the parity check code decoding of the method involves employing systolic check node processing units comprising each a three-dimensional lattice circuit interconnecting identical processing elements, and systolic matrix multiplication units that simultaneously realize interconnect circuitry as well as variable message computation, during bit engine processing and check engine processing.

[0128] After the stopping condition has been met, the method involves making hard decisions based on soft information corresponding to most recently updated bit edge messages. The method ultimately involves outputting a best estimate of the coded bits (the codeword, or code block) (that includes the information bits) that has been extracted from the received continuous-time signal.

[0129] One of average skill in the art will also recognize that the functional building blocks, and other illustrative blocks, modules and components herein, can be implemented as illustrated or by discrete components, application specific integrated circuits, processors executing appropriate Software and the like or any combination thereof. Moreover, although described in detail for purposes of clarity and understanding by way of the aforementioned embodiments, the present invention is not limited to such embodiments. It will be obvious to one of average skill in the art that various changes and modifications may be practiced within the spirit and scope of the invention, as limited only by the scope of the appended claims. [0130] While the invention has been illustrated and described in detail in the drawings and foregoing description, such illustration and description are to be considered illustrative or exemplary and not restrictive. The foregoing description details certain embodiments of the invention. It will be appreciated, however, that no matter how detailed the foregoing appears in text, the invention may be practiced in many ways. The invention is not limited to the disclosed embodiments.

[0131] Other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a study of the drawings, the disclosure and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. A single processor or other unit may fulfil the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage. A computer program may be stored/distributed on a suitable medium, such as an optical storage medium or a solid-state medium supplied together with or as part of other hardware, but may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems. Any reference signs in the claims should not be construed as limiting the scope.

Claims

1. A check node processing unit for use in decoding a signal encoded with a parity check code, said parity check code associated with a parity check matrix and representable by a graph with a plurality of check nodes and a plurality of bit nodes, said check node processing unit configured to determine reliability messages to be exchanged between said check nodes and said bit nodes when decoding said signal, said check node processing unit comprising :

- a plurality of processing elements (1201, 1211, 1221) with a plurality of input messages, each processing element of said plurality comprising a first and a second input to receive a first and a second input message of said plurality, respectively, and an output to generate, based on said first and said second input message, an output message being a reliability message or an intermediate message,

- a plurality of data links interconnecting processing elements of said plurality to form a three- dimensional lattice circuit, whereby a first and a second dimension of said lattice circuit define a plane wherein output messages of a first subset of interconnected processing elements (1201) form in said plane a step pattern having a direction and wherein output messages of a second subset of interconnected processing elements (1211) form in said plane a step pattern having an opposite direction of said first step pattern, wherein each input message of said plurality is fed twice to said three-dimensional lattice circuit, once via a first data link of said plurality along said first dimension and once via a second data link along said second dimension, wherein interconnected processing elements of a third subset of interconnected processing elements (1221) are each arranged to receive along a third dimension of said lattice circuit at least a first intermediate message output by another processing element of said plurality and to output a corresponding reliability message based on at least said first intermediate message, and wherein at least one processing element of said plurality comprises :

- a plurality of multiplier circuits (1030) and adder circuits (1020) arranged to process said first or said second input message of said at least one processing element,

- a function evaluator (1010) arranged to operate on intermediate signals resultingfrom said processing using multiplier circuits and adder circuits of said plurality, whereby multiplier circuits and adder circuits of said plurality are also arranged to process outputs of said function evaluator and whereby said function evaluator is implemented as a rectifier circuit.

2. Check node processing unit as in claim 1, further comprising a processing element arranged to produce a soft syndrome.

3. Check node processing unit as in any of claims 2 to 4, comprising a one-sign function evaluator to transform said soft syndrome into a hard syndrome.

4. Check node processing unit as in any of the previous claims, further comprising an input selection block adapted to select between log likelihood ratios and variable node messages as said input messages.

5. Check node processing unit as in claim 4, arranged to determine said reliability messages from input messages received at said first input of said processing elements and to determine a syndrome from input messages received at said second input of said processing elements.

6 Parity check code decoder comprising one or more check node processing unit as in any of the previous claims.

7. Parity check code decoder as in claim 6, comprising a circuit to generate a soft-syndrome sum over at least one soft syndrome output.

8. Parity check code decoder as in claim 7, arranged to run a stochastic gradient descent algorithm minimizing a loss function based on said soft-syndrome sum.

9. Parity check code decoder as in claim 8, comprising computation means (1302, 1307, 1507) for deriving from a plurality of bit node messages resulting from a current iteration a set of input messages for said one or more check node processing units for use in a next decoding iteration and for deriving from said reliability messages output by said one or more check node processing units a set of bit node messages for the next decoding iteration.

10. Parity check code decoder as in claim 9, wherein said computation means comprises a unit for deriving said set of input messages and a different unit for deriving said set of bit node messages.

11. Parity check code decoder as in claim 10, wherein also a set of input metrics is considered when deriving said set of bit node messages and wherein said different unit is split up in a plurality of subunits.

12. Parity check code decoder as in any of claims 8 to 11, implemented in VLSI.

13. Receiver structure comprising a parity check code decoder as in any of claims 8 to 12.

14. Method to decode a signal encoded with a parity check code, with a check node processing unit as in any of claims 1 to 5, said parity check code associated with a parity check matrix and representable by a graph with a plurality of check nodes and a plurality of bit nodes, the method comprising :

- receiving a plurality of input messages to be fed to a three-dimensional lattice circuit comprising a plurality of processing elements interconnected via data links, whereby a first and a second dimension of said lattice circuit define a plane wherein output messages of a first subset of interconnected processing elements (1201) form in said plane a step pattern having a direction and wherein output messages of a second subset of interconnected processing elements (1211) form in said plane a step pattern having an opposite direction of said first step pattern, - applying to each processing element of said plurality a first and a second input message, whereby each input message of said plurality is fed twice to said three-dimensional lattice circuit, once via a first data link along said first dimension and once via a second data link along said second dimension, whereby interconnected processing elements of a third subset of interconnected processing elements (1221) each receive along a third dimension of said lattice circuit at least a first intermediate message output by another processing element of said plurality,

- generating at the output of each processing element, based on the applied first and second input message, either a reliability message or an intermediate message.