
WO2023031632A1 - Encoder, decoder and communication system and method for conveying sequences of correlated data items from an information source across a communication channel using joint source and channel coding, and method of training an encoder neural network and decoder neural network for use in a communication system - Google Patents


Info

Publication number
WO2023031632A1
Authority
WO
WIPO (PCT)
Prior art keywords
encoder
item
interpolation
decoder
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/GB2022/052266
Other languages
French (fr)
Inventor
Deniz GUNDUZ
David Burth KURKA
Tze-Yang TUNG
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ip2ipo Innovations Ltd
Original Assignee
Imperial College Innovations Ltd
Application filed by Imperial College Innovations Ltd filed Critical Imperial College Innovations Ltd
Publication of WO2023031632A1 publication Critical patent/WO2023031632A1/en
Anticipated expiration (legal status: Critical)
Current legal status: Ceased (Critical)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 1/00 Arrangements for detecting or preventing errors in the information received
    • H04L 1/004 Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L 1/0041 Arrangements at the transmitter end
    • H04L 1/0042 Encoding specially adapted to other signal generation operation, e.g. in order to reduce transmit distortions, jitter, or to improve signal shape
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/092 Reinforcement learning
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M 13/00 Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M 13/03 Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
    • H03M 13/05 Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits
    • H03M 13/11 Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits using multiple parity bits
    • H03M 13/1102 Codes on graphs and decoding on graphs, e.g. low-density parity check [LDPC] codes
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M 13/00 Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M 13/63 Joint error correction and other techniques
    • H03M 13/6312 Error control coding in combination with data compression
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M 13/00 Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M 13/65 Purpose and implementation aspects
    • H03M 13/6597 Implementations using analogue techniques for coding or decoding, e.g. analogue Viterbi decoder
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 1/00 Arrangements for detecting or preventing errors in the information received
    • H04L 1/004 Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L 1/0045 Arrangements at the receiver end
    • H04L 1/0047 Decoding adapted to other signal detection operation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 1/00 Arrangements for detecting or preventing errors in the information received
    • H04L 1/0001 Systems modifying transmission characteristics according to link quality, e.g. power backoff
    • H04L 1/0014 Systems modifying transmission characteristics according to link quality, e.g. power backoff by adapting the source coding

Definitions

  • The present application relates to an encoder, a decoder and a communication system comprising a transmitter and a receiver incorporating the encoder and decoder, for conveying sequences of correlated data items from an information source across a communication channel using joint source and channel coding.
  • In embodiments, the encoder and decoder are neural networks, and in embodiments the information source is a video and the correlated data items are video frames.
  • An aim of a data communication system is to send data from an information source at a transmitter over a communication channel efficiently and reliably, at as high a rate and with as few errors as achievable in view of the channel noise, so that a faithful representation of the original information source can be recovered at a receiver.
  • Information sources providing sequences of correlated data items which share similarities and encapsulate data redundancy from one item in the sequence to the next can represent a significant data payload for transmission between transmitters and receivers over communications channels.
  • An example is video content: a sequence of video frames containing images that are typically heavily correlated over time as the video develops.
  • For instance, a video of a largely static scene, such as from a security camera, remains largely unchanged from one video frame to the next.
  • Video transmission makes up around 80% of traffic on the Internet by volume, and the data burden on transmitters and receivers to transmit video data and other correlated sequences of data correctly and efficiently over communication channels is high.
  • Most digital communication systems today include a source encoder and separate channel encoder at a transmitter and a source decoder and separate channel decoder at a receiver.
  • the symbols of source data are first digitally compressed into bits by the source encoder.
  • the goal in source coding is to encode the sequence of source symbols into a coded representation of data elements to reduce the redundancy in the original sequence of source symbols.
  • In lossless compression, redundancy must be removed in such a way that the original information source can still be reconstructed exactly from the coded representation, while lossy compression allows a certain amount of degradation in the reconstructed version under some specified distortion measure, for example squared error.
  • H.264/MPEG is an example of a lossy source compression standard widely used in practice. Compressing the information source using a source encoder before transmission means that fewer resources are required for that transmission.
  • the output of the source encoder is then provided to a channel encoder.
  • the goal of the channel encoder is to encode the compressed data representation in a structured way using a suitable Error Correction Code (ECC) by adding redundancy such that even if some of these bits are distorted or lost due to noise over the channel, the receiver can still recover the original sequence of bits reliably.
  • the amount of redundancy that is added depends on the statistical properties of the underlying communication channel and the target Bit Error Rate (BER).
  • the modulator converts the bits into signals that can be transmitted over the communication medium.
  • the transmitted waveform is specified by its In-Phase (I) and Quadrature (Q) components, and a modulator typically has a discrete set of pre-specified I and Q values, called a constellation; each group of coded information bits is mapped to a single point in this constellation.
  • Example modulation schemes include phase shift keying (PSK) and quadrature amplitude modulation (QAM).
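To make the constellation-mapping step concrete, here is a minimal illustrative sketch in Python. The QPSK (4-QAM) alphabet and the Gray bit mapping below are assumptions chosen for the example, not values taken from the disclosure:

```python
import numpy as np

# Illustrative unit-energy QPSK constellation: each pair of bits selects one
# of four I/Q points (Gray-mapped; the mapping is an assumption for example only).
QPSK = {
    (0, 0): (1 + 1j) / np.sqrt(2),
    (0, 1): (-1 + 1j) / np.sqrt(2),
    (1, 1): (-1 - 1j) / np.sqrt(2),
    (1, 0): (1 - 1j) / np.sqrt(2),
}

def modulate(bits):
    """Map a flat bit sequence to complex baseband symbols (I + jQ)."""
    assert len(bits) % 2 == 0, "QPSK consumes bits two at a time"
    return np.array([QPSK[(bits[i], bits[i + 1])]
                     for i in range(0, len(bits), 2)])

symbols = modulate([0, 0, 1, 1, 1, 0])  # -> three constellation points
```

Higher-order schemes such as 16-QAM work the same way, with more points per symbol carrying more bits at the cost of reduced noise margin.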
  • the receiver receives and demodulates (for example, by coherent demodulation) a sequence of noisy symbols, where the noise has been added by the communication channel.
  • noisy demodulated symbols are then mapped to sequences of data elements by a channel decoder.
  • the decoded data elements are then passed to the source decoder, which decodes them to reconstruct a representation of the original input source symbols and thereby the information source.
  • the source encoder and decoder are designed jointly, as are the channel encoder and decoder, but the source encoder/decoder and channel encoder/decoder are designed and operate separately to perform very different functions.
  • the main advantage of separate source and channel coding is the modularity it provides. This means that the same channel encoder and decoder can be used in conjunction with any source encoder and decoder.
  • a channel encoder can encode data elements for transmission over a channel irrespective of the data elements or the information source from which they have been derived.
  • the source encoder and decoder can be operated in conjunction with any channel encoder and decoder to transmit the encoded source symbols over a communication channel.
  • a source encoder can encode data elements for subsequent coding by the channel encoder independently of which channel encoder is used.
  • This separate source and channel coding design provides modularity and allows independent optimisation of each component, which was theoretically shown by Shannon (Shannon, 1948) to be optimal for point-to-point communication over static channel conditions in the asymptotic infinite blocklength regime.
  • However, the limits of separation-based designs are beginning to show. In such scenarios, the compression delay and the feedback necessary to track the constantly varying instantaneous channel condition are challenging.
  • the theoretical optimality of separation for communication utilising infinite blocklengths with unlimited delay and complexity becomes less relevant for low-latency systems that require short blocklengths and low complexity operations.
  • The cliff effect occurs when the channel condition deteriorates below that anticipated by the channel encoder: the source information is lost completely, leading to a cliff-edge deterioration in system performance.
  • As an example of the cliff-edge effect, if the bit error rate in the transmission of a video over a communications channel exceeds the maximum error rate at which the channel decoder can decode the received signal, the transmission drops out completely, which can present significant challenges for live-streamed video, such as from a drone to a base station. This can lead to discontinuous reception of the transmitted video, and can force the source encoder to encode the video at ever more lossy compression levels and lower resolutions.
  • Effective and efficient transmission of sequences of correlated data items, such as video, over noisy communications channels, allowing continuous reception while keeping errors in reception to a minimum, is therefore desirable in view of the particular performance requirements and volumes of this type of data to be transmitted.
  • the present disclosure provides an encoder, a decoder and communication system for conveying data from an information source across a communication channel using joint source and channel coding, comprising a transmitter and a receiver.
  • the communication system comprises a transmitter having an encoder for encoding input data from an information source for transmission of a transformed version of the input data across a communication channel.
  • the encoder has a key item encoder neural network for encoding data items as key items independent of any other data item in the sequence, and an interpolation item encoder neural network for encoding data items as interpolation items using data representing the input data item and at least one previous data item in the sequence.
  • the communication system also comprises a receiver having a decoder including complementary key item decoder neural network and interpolation item decoder neural network.
  • the receiver is for receiving and demodulating a noise-affected version of the encoder output vectors based on the signal transmitted across the communication channel and passing them to the decoder for reconstructing the sequence of correlated data items.
  • a training method is also disclosed in which the key item encoder neural network and interpolation item encoder neural network are trained together with the respective complementary key item decoder neural network and interpolation item decoder neural network, to minimise an objective function characterising a reconstruction error between input-output pairs of training data items.
  • the present disclosure provides an encoder for use in a transmitter of a communications system for conveying sequences of correlated data items from an information source across a communications channel using joint source and channel coding.
  • the encoder comprises a key item encoder neural network for encoding data items selected from the sequence to serve as a key item that can be directly reconstructed by the decoder, the key item encoding being based on the input data item and being independent of any other data item in the sequence.
  • the encoder further comprises an interpolation item encoder neural network for encoding data items selected from the sequence to serve as an interpolation item that can be reconstructed by the decoder using interpolation, the interpolation item encoding using data representing the input data item and at least one previous data item in the sequence.
  • the key item encoder neural network and interpolation item encoder neural network have hidden layers of connecting nodes with weights that, in use, map vectors of values received at nodes in an encoder input layer thereof to encoder output vectors provided at nodes of an encoder output layer thereof, the encoder output vectors being used for providing values in a signal space for modulating a carrier signal for transmission of a transformed version of the data items across a communication channel.
  • the key item encoder neural network and interpolation item encoder neural network have in the communications system respective complementary key item decoder neural network and interpolation item decoder neural network for receiving a noise-affected version of the encoder output vector from a receiver receiving and demodulating the signal transmitted across the communication channel and reconstructing the input vector to generate a representation of the input data item.
  • the connecting node weights of the key item encoder neural network and interpolation item encoder neural network have been trained together with the respective complementary key item decoder neural network and interpolation item decoder neural network, to minimise an objective function characterising a reconstruction error between input-output pairs of training data items.
  • the present disclosure provides a decoder for use in a receiver of a communications system for conveying sequences of correlated data items from an information source across a communications channel using joint source and channel coding.
  • the decoder comprises a key item decoder neural network for decoding data items from the sequence indicated as key items, the key item decoding being based on a noise-affected version of an encoder output vector generated at a transmitter by a complementary key item encoder neural network to encode the data item based on the input data item independent of any other data item in the sequence, the key item decoder neural network being configured for reconstructing the input vector to generate a representation of the input data item directly from the noise-affected version of the encoder output vector and independently of any other data item in the sequence.
  • the decoder further comprises an interpolation item decoder neural network for decoding data items indicated as interpolation items, the interpolation item decoding being based on data representing at least one previous data item in the sequence and a noise-affected version of an encoder output vector generated at a transmitter by a complementary interpolation item encoder neural network to encode the data item based on data representing the input data item and at least one previous data item in the sequence, the noise-affected version of the encoder output vector having been received and demodulated at the receiver based on the signal transmitted across the communication channel, the interpolation item decoder neural network being configured for reconstructing the input vector by interpolation from reconstructions of representations of other data items in the sequence to generate a representation of the input data item.
  • the key item decoder neural network and interpolation item decoder neural network have hidden layers of connecting nodes with weights that, in use, map vectors of values received at nodes in a decoder input layer thereof to decoder output vectors provided at nodes of a decoder output layer thereof, the decoder output vectors providing a reconstruction of the encoder input vector to generate a representation of the input data item.
  • the connecting node weights of the key item decoder neural network and interpolation item decoder neural network have been trained together with the respective complementary key item encoder neural network and interpolation item encoder neural network, to minimise an objective function characterising a reconstruction error between input-output pairs of training data.
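As a hedged illustration of such a complementary key item encoder/decoder pair, the following PyTorch sketch shows an encoder mapping a frame to a power-normalised channel vector and a decoder mapping the noise-affected vector back to a frame. The layer sizes, the convolutional architecture and the power normalisation are assumptions for the example, not the patent's design:

```python
import torch
import torch.nn as nn

class KeyItemEncoder(nn.Module):
    """Maps a frame x in R^{3 x H x W} to a power-normalised channel vector z."""
    def __init__(self, channel_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(64 * 4 * 4, channel_dim),
        )

    def forward(self, x):
        z = self.net(x)
        # Normalise each vector so its average power per channel use is one.
        return z / z.norm(dim=1, keepdim=True) * z.shape[1] ** 0.5

class KeyItemDecoder(nn.Module):
    """Reconstructs a frame from the noise-affected channel vector."""
    def __init__(self, channel_dim=256, h=64, w=64):
        super().__init__()
        self.h, self.w = h, w
        self.net = nn.Sequential(nn.Linear(channel_dim, 3 * h * w), nn.Sigmoid())

    def forward(self, z_noisy):
        return self.net(z_noisy).view(-1, 3, self.h, self.w)
```

The interpolation item networks would have the same input/output contract but additionally consume data representing previous items, as described above.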
  • the present disclosure provides a communication system for conveying sequences of correlated data items from an information source across a communication channel using joint source and channel coding.
  • the communication system comprises a transmitter including an encoder and a receiver including a decoder.
  • the encoder comprises a key item encoder neural network for encoding data items selected from the sequence to serve as a key item that can be directly reconstructed by the decoder, the key item encoding being based on the input data item and being independent of any other data item in the sequence.
  • the encoder further comprises an interpolation item encoder neural network for encoding data items selected from the sequence to serve as an interpolation item that can be reconstructed by the decoder using interpolation, the interpolation item encoding using data representing the input data item and at least one previous data item in the sequence.
  • the key item encoder neural network and interpolation item encoder neural network have hidden layers of connecting nodes with weights that, in use, map vectors of values received at nodes in an encoder input layer thereof to encoder output vectors provided at nodes of an encoder output layer thereof, the encoder output vectors being used for providing values in a signal space for modulating a carrier signal for transmission of a transformed version of the data items across a communication channel.
  • the transmitter is configured for transmitting signals over the communication channel based on signal values of the encoder output vectors of the key item encoder neural network and interpolation item encoder neural network.
  • the receiver is configured for receiving and demodulating a noise-affected version of the encoder output vectors based on the signal transmitted across the communication channel and passing them to the decoder for reconstructing the sequence of correlated data items.
  • the decoder comprises a key item decoder neural network for decoding data items from the sequence indicated as key items, the key item decoding being based on a noise-affected version of an encoder output vector generated at a transmitter, the key item decoder neural network being configured for reconstructing the input vector to generate a representation of the input data item directly from the noise-affected version of the encoder output vector and independently of any other data item in the sequence.
  • the decoder further comprises an interpolation item decoder neural network for decoding data items indicated as interpolation items, the interpolation item decoding being based on data representing at least one previous data item in the sequence and a noise-affected version of an encoder output vector generated at a transmitter.
  • the interpolation item decoder neural network is configured for reconstructing the input vector by interpolation from reconstructions of representations of other data items in the sequence to generate a representation of the input data item.
  • the key item decoder neural network and interpolation item decoder neural network have hidden layers of connecting nodes with weights that, in use, map vectors of values received at nodes in a decoder input layer thereof to decoder output vectors provided at nodes of a decoder output layer thereof, the decoder output vectors providing a reconstruction of the encoder input vector to generate a representation of the input data item.
  • the present disclosure provides a method for conveying sequences of correlated data items from an information source across a communications channel using joint source and channel coding.
  • the method comprises, at a transmitter, for each data item in the sequence: selecting data items from the sequence of data items to serve as key items and interpolation items; encoding data items to serve as key items using a key item encoder neural network, the key item encoding being based on the input data item and being independent of any other data item in the sequence; encoding data items to serve as interpolation items using an interpolation item encoder neural network, the interpolation item encoding using data representing the input data item and at least one previous data item in the sequence.
  • the key item encoder neural network and interpolation item encoder neural network have hidden layers of connecting nodes with weights that, in use, map vectors of values received at nodes in an encoder input layer thereof to encoder output vectors provided at nodes of an encoder output layer thereof, the encoder output vectors being used for providing values in a signal space for modulating a carrier signal for transmission of a transformed version of the data items across a communication channel.
  • the method further comprises, at the transmitter, transmitting signals over a communication channel based on signal values of encoder output vectors of the key item encoder neural network and interpolation item encoder neural network.
  • the method further comprises, at a receiver: receiving and demodulating a noise-affected version of the encoder output vectors based on the signal transmitted across the communication channel; decoding data items from the sequence indicated as key items using a key item decoder neural network based on a noise-affected version of the encoder output vector for the data item, the key item decoder neural network being configured for reconstructing the input vector to generate a representation of the input data item directly from the noise-affected version of the encoder output vector and independently of any other data item in the sequence; and decoding data items from the sequence indicated as interpolation items using an interpolation item decoder neural network based on data representing at least one previous data item in the sequence and the noise-affected version of an encoder output vector for the data item, the interpolation item decoder neural network being configured for reconstructing the input vector by interpolation from reconstructions of representations of other data items in the sequence to generate a representation of the input data item.
  • the key item decoder neural network and interpolation item decoder neural network have hidden layers of connecting nodes with weights that, in use, map vectors of values received at nodes in a decoder input layer thereof to decoder output vectors provided at nodes of a decoder output layer thereof, the decoder output vectors providing a reconstruction of the encoder input vector to generate a representation of the input data item.
  • the connecting node weights of the key item encoder neural network and interpolation item encoder neural network have been trained together with the respective complementary key item decoder neural network and interpolation item decoder neural network, to minimise an objective function characterising a reconstruction error between input-output pairs of training data items.
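A hedged end-to-end sketch of this run-time flow follows. The fixed group-of-pictures pattern, the simulated AWGN channel, and the function signatures of the four networks are assumptions for illustration only:

```python
import torch

def transmit_sequence(frames, key_enc, interp_enc, key_dec, interp_dec,
                      gop=4, snr_db=10.0):
    """Send a frame sequence: every `gop`-th frame as a key item, the rest
    as interpolation items conditioned on the previous reconstruction."""
    recon, prev = [], None
    for t, x in enumerate(frames):
        is_key = t % gop == 0
        # Key items are coded independently; interpolation items also see
        # the previous reconstruction (concatenated along the channel dim).
        z = key_enc(x) if is_key else interp_enc(torch.cat([x, prev], dim=1))
        sigma = 10 ** (-snr_db / 20)            # AWGN std for unit-power symbols
        z_noisy = z + sigma * torch.randn_like(z)   # simulated noisy channel
        x_hat = key_dec(z_noisy) if is_key else interp_dec(z_noisy, prev)
        recon.append(x_hat)
        prev = x_hat                            # decoder-side history
    return recon
```

In a real system the channel is the physical medium rather than added tensor noise; the noise injection here stands in for it so the whole loop is differentiable during training.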
  • the present disclosure provides a computer readable medium comprising one or more instructions which when executed cause at least one of: a transmitter; and a receiver; to operate in accordance with the above-described method.
  • In this way, a machine-learned method for conveying sequences of correlated data items from an information source across a communication channel using joint source and channel coding, which optimises the reconstructed quality end-to-end, is achieved. This method deviates from the separation-based designs by optimising a single encoder and decoder, which jointly provide the same or better performance compared to expert-designed, modular systems.
  • In embodiments, the encoder is configured to encode the sequences of correlated data items from an information source for transmission across the communications channel as streaming media. In embodiments, the encoder is configured to encode the sequences of correlated data items from an information source into a static media file.
  • In embodiments, the sequences of correlated data items are a series of image frames providing a video. In embodiments, the correlated data items are each represented by a 3D matrix with a depth based on the colour channels, a height based on the height of the frame and a width based on the width of the frame. In embodiments, the encoder input layers of the key item encoder neural network and/or the interpolation item encoder neural network are configured to receive video frames as input vectors.
  • the encoder is configured to create a coded and compressed representation of the input data as an encoder output vector for transmission across the communications channel, and the decoder is configured to decode the noise-affected version of the encoder output vector back into an uncompressed reconstruction of the input data.
  • In embodiments, the quantity of information included in the output vector of the key item encoder neural network and/or the interpolation item encoder neural network is smaller than that of the input vector.
  • the interpolation encoder input layer is configured such that the interpolation item encoding uses data received at the input layer thereof representing the input data item in input data space and at least one previous data item in the sequence in input data space.
  • the data representing at least one previous data item in the sequence used by the interpolation item encoder to encode interpolation items comprises: the motion representation information representing the relative motion between the data item and at least one other data item in the sequence; and the residual information between the data item and a motion compensated version of at least one other data item in the sequence using the motion representation information in respect of that data item.
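For illustration, here is a hedged sketch of computing such motion representation and residual information with dense optical flow. OpenCV's Farnebäck method is one possible choice (the disclosure does not prescribe it), and the warping convention is a simplification:

```python
import cv2
import numpy as np

def motion_and_residual(prev_gray, curr_gray):
    """Return (flow, residual): dense flow from the current frame back to the
    previous one, and the residual after motion-compensating the previous frame."""
    flow = cv2.calcOpticalFlowFarneback(curr_gray, prev_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = curr_gray.shape
    gx, gy = np.meshgrid(np.arange(w), np.arange(h))
    # Sample the previous frame where the flow says each current pixel came from.
    map_x = (gx + flow[..., 0]).astype(np.float32)
    map_y = (gy + flow[..., 1]).astype(np.float32)
    warped_prev = cv2.remap(prev_gray, map_x, map_y, cv2.INTER_LINEAR)
    residual = curr_gray.astype(np.int16) - warped_prev.astype(np.int16)
    return flow, residual
```

The interpolation item encoder would then consume the flow field and residual (rather than the raw frame alone), which is the decomposition the bullet above describes.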
  • the interpolation encoder input layer is configured such that the interpolation item encoding uses data received at the input layer thereof representing: the input data item in the latent space defined by the output of the key item encoder neural network; and at least one previous data item in the sequence in the latent space defined by the output of the key item encoder neural network.
  • the data representing the input data item used by the interpolation item encoder to encode interpolation items comprises: the key item encoder output vector encoded for the data item by the key item encoder neural network; and wherein the data representing at least one previous data item in the sequence used by the interpolation item encoder to encode interpolation items comprises an encoder output vector transmitted by the encoder for at least one previous data item in the sequence.
  • the data representing at least one previous data item in the sequence used by the interpolation item decoder to decode interpolation items comprises a noise-affected version of an encoder output vector or a reconstruction of the encoder input vector providing a representation of the input data item for at least one previous data item in the sequence.
  • the data representing at least one previous data item in the sequence used by the interpolation item encoder to encode interpolation items further comprises data representing at least one subsequent data item in the sequence.
  • the encoder and decoder further comprise a static control module configured to select data items from the sequence of data items as key items and interpolation items according to a fixed order specified by a predetermined group of items, the static control module being further configured to use the key item encoder and decoder to encode and decode data items selected as key items, and to use the interpolation item encoder and decoder to encode and decode data items selected as interpolation items.
  • the encoder further comprises a dynamic control module having a dynamic decision agent configured to dynamically choose whether the data item is to serve as a key item or an interpolation item.
  • the dynamic decision agent is configured to dynamically choose whether the data item is to serve as a key item or an interpolation item based at least on one or more of: the current data item; the number of data items transmitted since the last key item; a current average channel utilisation; and a channel utilisation constraint.
  • the dynamic decision agent is configured to dynamically choose whether the data item is to serve as a key item or an interpolation item so that the average channel utilisation is below the channel utilisation constraint.
  • the dynamic control module is configured to: select, based on a decision output by decision agent for the data item, whether the data item is to serve as a key item or an interpolation item; if the data item is selected to serve as a key item, use the key item encoder to encode the data item in the sequence to provide a key item encoder output vector for the item, the encoder being configured for transmitting the key item encoder output vector on the communications channel.
  • the dynamic control module is further configured to: if the data item is selected to serve as an interpolation item, use the interpolation encoder to encode the data item to provide an interpolation item encoder output vector for the item, the encoder being configured for transmitting the interpolation item encoder output vector on the communications channel.
  • the dynamic decision agent is configured to generate mapping data indicating, for the sequence of data items, which data items are key data items and which data items are interpolation data items, for transmission across the communications channel and for use by the decoder to determine whether the received noise-affected version of an encoder output vector should be decoded by the key item decoder neural network or the interpolation item decoder neural network.
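A hedged sketch of one such decision rule follows. In the disclosure the agent may be learned (e.g. by reinforcement learning, per the classification); the hand-written policy and thresholds below are illustrative stand-ins only:

```python
def choose_item_type(frames_since_key, avg_channel_use, channel_budget,
                     max_interp_run=8):
    """Return 'key' or 'interp' for the current data item.

    Key items cost more channel uses but stop error propagation; interpolation
    items are cheap but compound errors. So bound the interpolation run length,
    and spend on a key item whenever average channel use is safely under budget.
    """
    if frames_since_key >= max_interp_run:
        return "key"                  # bound the interpolation run length
    if avg_channel_use < 0.8 * channel_budget:
        return "key"                  # spare budget: spend it on robustness
    return "interp"                   # tight budget: cheap interpolation
```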
  • In embodiments, the encoder output layers of the interpolation item encoder neural network and the decoder input layers of the interpolation item decoder neural network are divided into ordered blocks, and the neural networks are trained such that the interpolation item encoder neural network encodes information in descending order of importance across increasing blocks of nodes, and such that, for interpolation items, the decoder reconstructs an increasingly refined representation of the input data as more blocks are received in the noise-affected version of an encoder output vector.
  • the communication system further comprises a bandwidth allocation module configured to determine, for each data item in the sequence selected to serve as an interpolation item, a number of blocks of the interpolation encoder output layer to be transmitted over the communication channel, so as to allocate the available bandwidth in the communications channel to the transmission of interpolation items.
  • the bandwidth allocation module is further configured to determine the number of blocks of the interpolation encoder output layer to be transmitted over the communication channel to seek to minimise the reconstruction error between the representation of the input data reconstructed at the decoder and the input data encoded at the encoder.
  • the bandwidth allocation module is configured to determine the number of blocks of the interpolation encoder output layer to be transmitted over the communication channel based on at least motion representation information determined to represent the relative motion between the data item and at least one other data item in the sequence. In embodiments, the bandwidth allocation module is configured to determine a number of blocks of the interpolation encoder output layer to be transmitted over the communication channel, so as to seek to optimally allocate the available bandwidth in the communications channel between a group, a set or the whole sequence of data items to be transmitted.
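As a hedged sketch of one possible allocation policy, the proportional-to-motion heuristic below gives more blocks to interpolation items with larger average motion magnitude. The rule, its minimum-block floor and its tie-breaking are illustrative assumptions, not the trained module of the disclosure:

```python
import numpy as np

def allocate_blocks(motion_mags, total_blocks, min_blocks=1):
    """Split a block budget across a group's interpolation items in proportion
    to each item's mean motion magnitude (more motion -> more bandwidth).
    Assumes total_blocks >= min_blocks * len(motion_mags)."""
    mags = np.asarray(motion_mags, dtype=float) + 1e-8
    blocks = np.maximum(min_blocks,
                        np.floor(total_blocks * mags / mags.sum())).astype(int)
    # Hand any blocks left over to the items with the most motion.
    for i in np.argsort(-mags):
        if blocks.sum() >= total_blocks:
            break
        blocks[i] += 1
    return blocks
```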
  • the interpolation encoder neural network is configured to: maintain and update an internal state as successive interpolation items of a group of consecutive interpolation items are encoded by the interpolation encoder neural network; and after successive interpolation items have been encoded into the internal state, to provide the internal state as the interpolation encoder output vector for providing values in a signal space for modulating a carrier signal for transmission of a transformed version of the group of consecutive interpolation data items across the communication channel.
  • the encoder neural network is configured to output an encoder output vector for transmission for each key item and each group of consecutive interpolation items between key items.
  • the interpolation decoder neural network is configured to: for a group of consecutive interpolation items, recursively decode the noise-affected version of the encoder output vector received from a receiver to thereby reconstruct the encoder input vectors of successive interpolation items to generate a representation of the input data items of the group of consecutive interpolation items.
  • the interpolation encoder neural network and the interpolation decoder neural network are both provided by a recurrent neural network, optionally a Long Short-Term Memory (LSTM) network.
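A hedged sketch of such a recurrent interpolation encoder, assuming PyTorch's `LSTMCell` and per-item feature vectors (both illustrative assumptions; the disclosure only requires a maintained, updated internal state):

```python
import torch
import torch.nn as nn

class RecurrentInterpEncoder(nn.Module):
    """Folds a group of consecutive interpolation items into an LSTM state,
    then emits that state as a single channel vector for the whole group."""
    def __init__(self, feat_dim=512, channel_dim=256):
        super().__init__()
        self.lstm = nn.LSTMCell(feat_dim, channel_dim)

    def forward(self, item_features):
        h = c = None
        for f in item_features:               # one feature vector per item
            if h is None:
                h = f.new_zeros(f.shape[0], self.lstm.hidden_size)
                c = torch.zeros_like(h)
            h, c = self.lstm(f, (h, c))       # update the internal state
        return h                              # internal state -> output vector
```

The complementary decoder would unroll recursively from the received noisy vector, emitting one reconstruction per item in the group, mirroring the bullet above.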
  • the encoder output vectors provide values in a signal space that represent in-phase and quadrature components for modulation of the carrier signal for transmission over the communication channel.
  • the encoder output vectors provide values defining a probability distribution sampleable to provide values in a signal space that represent in-phase and quadrature components for modulation of the carrier signal for transmission over the communication channel.
  • the encoder output vectors provide values in the signal space that are assigned exactly to symbol values of a predetermined finite set S of symbols of a predefined alphabet of carrier modulation signal values transmittable by the transmitter over the communication channel.
  • the predefined alphabet is a fixed, predefined constellation of symbols for digitally modulating the carrier signal to encode the input data for transmission over the communication channel.
  • the encoder output vectors provide values corresponding to a predetermined finite set of symbols of an existing channel encoder and decoder scheme for transmission of data over the communication channel.
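A hedged sketch of constraining learned outputs to such a finite alphabet by nearest-symbol projection follows. The straight-through gradient trick used to keep the projection trainable is a common practice assumed here for illustration, not something stated in the disclosure, and the unnormalised 16-QAM alphabet is likewise illustrative:

```python
import torch

def project_to_constellation(z, constellation):
    """Snap complex encoder outputs to the nearest symbol of a fixed alphabet.
    The straight-through trick keeps gradients flowing during training."""
    dists = (z.unsqueeze(-1) - constellation).abs()   # |z_i - s| for every s
    nearest = constellation[dists.argmin(dim=-1)]
    return z + (nearest - z).detach()  # forward: nearest symbol; backward: identity

# An illustrative (unnormalised) 16-QAM alphabet of I/Q points.
levels = torch.tensor([-3.0, -1.0, 1.0, 3.0])
qam16 = torch.complex(levels.repeat(4), levels.repeat_interleave(4))
```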
  • the transformation applied by the communication channel may, in embodiments, also include an existing channel code.
  • the encoder and decoder may learn an optimum mapping of the input information source to inputs of an existing channel code of the communications channel that reduces reconstruction errors at the output of the decoder neural network. Although acting as an outer code in these embodiments, this learned coding of the encoder and decoder is still optimised based on the characteristics of the communication channel to reduce reconstruction errors, even though in these alternative embodiments the communication channel includes an existing channel code.
  • the present disclosure provides a method of training an encoder and a decoder for use in a communication system in accordance with the above aspects and embodiments of the present disclosure for conveying sequences of correlated data items from an information source across a communication channel using joint source and channel coding.
  • the method comprises: for input-output pairs of a set of training data items from the information source passed to the encoder, determining an objective function characterising a reconstruction error between input-output pairs of training data from the information source passed to the encoder and the representation of the input data reconstructed at the decoder; and using an appropriate optimisation algorithm operating on the objective function, updating the connecting node weights of the hidden layers of the key item encoder neural network, interpolation item encoder neural network, key item decoder neural network and interpolation item decoder neural network to seek to minimise the objective function.
  • the encoder neural networks and decoder neural networks have been trained together using training data in which a model of the communication channel is used to estimate channel noise and add it to the transmitted signal values to generate a noise-affected version of the vector of signal values in the input-output pairs of training data.
  • the encoder output layers of the interpolation item encoder neural network and the decoder input layers of the interpolation item decoder neural network are divided into ordered blocks, wherein the encoder output vector passed to the decoder input layer for each input-output pair during training is truncated to a random number of the ordered blocks, such that, following training, the interpolation item encoder neural network encodes information in descending order of importance across increasing blocks of nodes, and such that, for interpolation items, the decoder reconstructs an increasingly refined representation of the input data as more blocks are received in the noise-affected version of an encoder output vector.
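A hedged sketch of one such training step, assuming PyTorch, an AWGN channel model, mean-squared-error distortion, and the random block truncation described above. The decoder signature (taking the number of kept blocks) is an assumption for the example:

```python
import torch
import torch.nn.functional as F

def train_step(encoder, decoder, x, optimizer, snr_db=10.0, n_blocks=8):
    """One end-to-end update: encode, keep a random number of ordered blocks,
    pass them through a simulated AWGN channel, decode and backpropagate."""
    z = encoder(x)                                    # (batch, channel_dim)
    block = z.shape[1] // n_blocks
    keep = torch.randint(1, n_blocks + 1, (1,)).item()
    z_kept = z[:, : keep * block]                     # random block truncation
    sigma = 10 ** (-snr_db / 20)                      # AWGN channel model
    z_noisy = z_kept + sigma * torch.randn_like(z_kept)
    x_hat = decoder(z_noisy, keep)                    # decoder told block count
    loss = F.mse_loss(x_hat, x)                       # reconstruction distortion
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the gradient flows through the simulated channel, the encoder and decoder weights are optimised jointly against the channel statistics, which is the essence of the training method above.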
  • the present disclosure provides a computer readable medium comprising one or more instructions, which when executed cause a computing device to operate the above-described methods of training an encoder and a decoder for use in the above-described communication systems for conveying sequences of correlated data items from an information source across a communication channel using joint source and channel coding.
  • Figure 1 shows a communication system for conveying sequences of correlated data items, such as video, from an information source across a communications channel using joint source and channel coding in accordance with an example of the present disclosure
  • Figure 2 shows the encoding, transmission and reconstruction of sequences of correlated data items in the form of video frames from an information source using a communication system in accordance with an example of the present disclosure
  • Figure 3 shows an example run time method for the transmitter and the encoder in accordance with an example of the present disclosure
  • Figure 4 shows an example run time method for the receiver and the decoder in accordance with an example of the present disclosure
  • Figure 5 shows an example training time method for the neural networks of the encoder and decoder in accordance with an example of the present disclosure
  • Figure 6 shows a structure of a communication system in accordance with an example of the present disclosure, showing the use of a key item encoder and decoder;
  • Figure 19 shows a performance comparison of the performance envelope of another example of the communication system of Figure 6 and the performance of separate source coding by H.265 (i.e. MPEG-H Part 2) and channel coding by a low-density parity-check (LDPC) code at different code rates, for encoding, transmitting and reconstructing other example sequences of correlated data items over a channel having AWGN at various signal-to-noise ratios (SNRs);
  • Figure 20 shows a visual comparison of reconstructed frames of an example video encoded and transmitted across a channel having additive white Gaussian noise at 13 dB, 3 dB and -4 dB, by an example of the communication system of Figure 6 trained at different SNRs and by separate source coding by H.264 (i.e. MPEG-4 AVC) and channel coding;
  • Figure 21 shows a performance comparison of another example of the communication system of Figure 6 and the performance of separate source coding by H.264/H.265 and channel coding by a rate-3/4 low-density parity-check (LDPC) code with 16-QAM modulation, for encoding, transmitting and reconstructing other example sequences of correlated data items over a channel having AWGN at a signal-to-noise ratio (SNR) of 20 dB, for different bandwidth compression rates; and
  • Figure 22 shows a performance comparison of the performance envelope of another example of the communication system of Figure 6, showing the difference in performance of the system having a uniform bandwidth allocation to the frames in a group of pictures, a non-uniform bandwidth allocation in accordance with a pre-determined heuristic, and an optimal bandwidth allocation based on embodiments of the present disclosure in which a bandwidth allocation module is used.
  • the terms “A or B,” “at least one of A and/or B,” or “one or more of A and/or B” may include all possible combinations of A and B.
  • “A or B,” “at least one of A and B,” “at least one of A or B” may indicate all of (1) including at least one A, (2) including at least one B, or (3) including at least one A and at least one B.
  • the terms “first” and “second” may modify various components regardless of importance and do not limit the components. These terms are only used to distinguish one component from another. For example, a first user device and a second user device may indicate different user devices from each other regardless of the order or importance of the devices.
  • a first component may be denoted a second component, and vice versa without departing from the scope of the disclosure.
  • when an element (e.g., a first element) is referred to as being coupled or connected with/to another element (e.g., a second element), it can be coupled or connected with/to the other element directly or via a third element.
  • the term “processor configured (or set) to perform A, B, and C” may mean a general-purpose processor (e.g., a CPU or application processor) that may perform the operations by executing one or more software programs stored in a memory device, or a dedicated processor (e.g., an embedded processor) for performing the operations.
  • the terms as used herein are provided merely to describe some embodiments thereof, but not to limit the scope of other embodiments of the disclosure.
  • Figure 1 shows a communication system 100 comprising a transmitter 110 for conveying sequences of correlated data items, such as video, from an information source 111 across a communication channel 120 to a receiver 130 using joint source and channel coding in accordance with an example of the present disclosure.
  • Figure 2 shows the encoding, transmission and reconstruction of sequences of correlated data items in the form of video frames from an information source 111 using the communication system 100.
  • the transmitter 110 and receiver 130 may each be part of respective electronic devices for transmitting or receiving sequences of correlated data items, such as video.
  • the electronic device coupled to the transmitter 110 or receiver 130 may be a smartphone, a tablet, a personal computer such as a desktop computer, a laptop computer, a netbook computer, a workstation, a server, a wearable device such as a smart watch, smart glasses, a head-mounted device or smart clothes, an airborne or land drone, a robot or other autonomous device such as industrial or home robots, a security control panel, a gaming console, a security camera, a microphone, or an Internet of Things device for sensing or monitoring, such as a smart meter, various sensors, an electric or gas meter, a medical device such as a portable medical measuring device, a blood sugar measuring device, a heartbeat measuring device, or a body temperature measuring device, a navigation device, a global positioning system (GPS) receiver, an event data recorder (EDR), a flight data recorder (FDR), avionics, or point-of-sale devices.
  • the electronic device may also be a base station or relay in a radio communication system, which may be capable of operating in accordance with one or more communications standards, such as the wireless communication standards 802.11xx for WiFi maintained by the Institute of Electrical and Electronics Engineers (IEEE), and the 3G, LTE and NR standards for cellular communications maintained by the 3rd Generation Partnership Project (3GPP), or any other radio transceiver for receiving signals transmitted across the communications channel, and decoding them for onward transmission, for example on the Internet.
  • the transmitter 110 includes an information source 111, at least one processor 112, memory 113 and a carrier modulator 118 coupled to an antenna 119 for transmitting data over communication channel 120.
  • the information source 111 is a source of data items to be transmitted over the communication channel 120 by the transmitter 110.
  • the information source 111 is a source of data provided as a sequence of correlated data items $x_1, x_2, x_3, \dots, x_n$ in which the correlation is manifested as some degree of redundancy in the data in adjacent items in the sequence.
  • the data items $x_1, x_2, x_3, \dots, x_n$ may, for example, be frames of a video in which the pixel data presented in consecutive video frames may be correlated in location and brightness.
  • In the example information source 111 shown in Figure 2, which shows a video captured by a security camera of a largely static scene of a harbour under constant illumination, there may be a significant amount of redundancy from one video frame to the next.
  • Where the video captures a moving item in a scene, such as a moving boat, or a moving scene caught by a panning camera, differences between one frame and the next may be analysable by optical flow analysis.
  • the information source 111 is not limited to being a source of video data, and the present disclosure is intended to be applicable to sources of any suitable sequences of correlated data items, where the correlation may occur in time or space, or both, or along any other suitable dimension over which the data items are correlated.
  • the information source 111 may be a source of sensor data from one or more sensors sensing one or more physical characteristics of a system that vary over time or location within the physical system in such a way that the data items are correlated.
  • the data items may be captured in steps of equal or unequal intervals along the dimension in which they are correlated.
  • the information source 111 is any information source suitable for arranging as a sequence of source symbols or fundamental data elements (for example, bits).
  • the information source 111 is a video source that provides the sequence of data items $x_1, x_2, x_3, \dots, x_n$ as video frames.
  • the correlated data items $x_1, x_2, x_3, \dots, x_n$ may each be represented by a 3D matrix with a depth based on the colour channels (normally 3 channels for RGB), a height, $H$, based on the height of the frame and a width, $W$, based on the width of the frame, i.e. $x_n \in \mathbb{R}^{H \times W \times 3}$.
  • the information source 111 may generate the correlated data items locally to the transmitter (such as a video captured by a camera coupled to the transmitter, such as in the electronic device of which the transmitter is a part) or it may be a source of data stored locally to the transmitter that was generated elsewhere, remotely from the transmitter 110.
  • the encoding and transmission of the data items from the information source 111 may be performed asynchronously with the time at which the data items were generated, or it may be performed live or in real time, with the encoding being performed largely contemporaneously to the generation of the data items.
  • the information source 111 may provide the data items for encoding and transmission as a static media file that is encoded and transmitted and then reconstructed and stored at the receiver 130 where it can be viewed in a player or conveyed further.
  • the information source 111 may also provide the data items for encoding and transmission as a stream of video frames for encoding and transmission on the fly, which are then to be reconstructed at the receiver 130 where it can be viewed in a player or conveyed further for replay as a streaming video, in which case the received video stream may or may not be stored locally at the receiver to allow subsequent asynchronous replay.
  • the information source 111 may store or generate ‘raw’ or ‘uncompressed’ data directly or indirectly representative of characteristics of the information source, to allow faithful reproduction of the information source 111 by a given combination of data processing hardware appropriately configured, for example by software or firmware.
  • the data items may be pre-processed before being passed to the encoder 115 through an initial form of encoding which may already compress the data items. This does not preclude the encoder of the present disclosure learning a further, optimal joint source-channel coding for the communication channel 120 to minimise reconstruction errors.
  • the data items may represent segments of the data provided by the information source 111. For example, rather than each data item representing an individual video frame, the video frames may be divided into blocks or segments, with each block being represented by a separate sequence of data items.
  • the processor 112 executes instructions that can be loaded into memory 113.
  • the processor 112 can include any suitable number(s) and type(s) of processors or other devices in any suitable arrangement.
  • Example types of processor 112 include microprocessors, microcontrollers, digital signal processors, field programmable gate arrays and application specific integrated circuits.
  • the memory 113 may be provided by any structure(s) capable of storing and facilitating retrieval of information (such as data, program code, and/or other suitable information on a temporary or permanent basis).
  • the memory 113 can represent a random access memory or any other suitable volatile or non-volatile storage device(s).
  • the memory 113 may also contain one or more components or devices supporting longer-term storage of data, such as a read-only memory, hard drive, flash memory, or optical disc, which may store software code for loading into the memory 113 at runtime.
  • the processor 112 and memory 113 provide a Runtime Environment (RTE) 114 in which instructions or code loaded into the memory 113 can be executed by the processor to generate instances of software modules in the Runtime Environment 114.
  • the memory 113 comprises instructions which, when executed by the one or more processors 112, cause the one or more processors 112 to instantiate an encoder 115 in the RTE 114.
  • the encoder 115 includes a key item encoder neural network 115k and an interpolation item encoder neural network 115i for encoding data items selected from the sequence of correlated data items to serve as key items and interpolation items respectively.
  • the encoder 115 may include a motion and residual module 115m for deriving interpolation information in the input data space for encoding interpolation items, for example using optical flow analysis in the case of video data.
  • the encoder 115 may also include a bandwidth allocation module 115b for determining the bandwidth to be allocated to the transmission of the interpolation items.
  • the encoder 115 may also include an encoder control module 115c to control the operation of the key item encoder neural network 115k and an interpolation item encoder neural network 115i to encode data items from the correlated sequence for transmission.
  • the encoder 115 may be configurable by instructions stored in memory 113 and implemented in RTE 114 to carry out the runtime methods described in relation to Figure 3, Figures 6-9 and 11, and Figures 12-15 for encoding sequences of input data items $x_1, x_2, x_3, \dots, x_n$ from information source 111 into sequences of encoder output vectors $z_1, z_2, z_3, \dots, z_n$ used for providing values in a signal space for modulating a carrier signal for transmission of a transformed version of the data items across a communication channel 120, the signal values provided from the encoder output vectors $z_1, z_2, z_3, \dots, z_n$ representing a transformed version of the input data items.
  • the encoder 115 is configured to receive the data items $x_1, x_2, x_3, \dots, x_n$, in the example the video frames, as input vectors for providing to input layers of the key item encoder neural network 115k and/or the interpolation item encoder neural network 115i. Once the encoder 115 has encoded data items from the sequence of correlated data items into encoder output vectors $z_1, z_2, z_3, \dots, z_n$, the encoder output vectors are passed to the carrier modulator 118.
  • the encoder output vectors $z_1, z_2, z_3, \dots, z_n$ are used for providing values in a signal space for modulating, by the carrier modulator 118, a carrier signal for transmission of a transformed version of the data items across the communication channel 120.
• the carrier modulator 118 may operate, in use, to directly encode the in-phase (I) and quadrature (Q) components of one or more carriers or subcarriers with signal values provided to the carrier modulator 118 by the encoder 115 using an appropriate modulation technique to provide a channel input signal 118i for transmission by antenna 119 across the communication channel 120.
  • a suitable multiplexing technique such as orthogonal Frequency-Division Multiplexing (OFDM) may be used.
  • the carriers encoding the encoder output vectors z 1, z 2, z 3, ... z n in the channel input signal 118i are then transmitted by the antenna 119 onto the communication channel 120.
  • the encoder 115 may be configured to output encoder output vectors z 1, z 2, z 3, ... z n that may provide values defining a probability distribution sampleable to provide values in a signal space that represent in-phase and quadrature components for modulation of the carrier signal for transmission over the communication channel.
  • the key item encoder neural network 115k and/or the interpolation item encoder neural network 115i may be configured as variational autoencoders.
  • the carrier modulator 118 and antenna 119 may be of conventional construction and may be configured to encode the carriers/subcarriers with signal values of complex IQ representations.
  • the carrier modulator 118 may be configured to freely modulate the carriers/subcarriers with any IQ signal value within the signal space passed to it.
  • the encoder output vectors provide values in the signal space that are assigned exactly to symbol values of a predetermined finite set S of symbols of a predefined alphabet of carrier modulation signal values transmittable by the transmitter over the communication channel.
  • the predefined alphabet is a fixed, predefined constellation of symbols for digitally modulating the carrier signal to encode the input data for transmission over the communication channel.
  • the carrier modulator 118 may be configured to only be able to modulate the carriers/subcarriers with IQ values corresponding to one or more finite, fixed sets or ‘constellations’ of symbols such as by quadrature amplitude modulation (QAM) or binary phase-shift keying (BPSK).
  • the carrier modulator 118 and antenna 119 may be compatible with the 5G New Radio standard such that the transmittable symbols of IQ values are mapped to the 16-QAM, 64-QAM or 256-QAM constellations.
  • the carrier modulator 118 and antenna 119 may be hard-wired to work only with these symbols, and they may not be able to transmit signal values or symbols that are not within these standard constellation sets.
  • the encoder 115 may be configured to learn the optimum encoding within the available constellation of transmittable IQ signal values.
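For concreteness, the following sketch (Python/NumPy; the function names, shapes and nearest-symbol rule are illustrative assumptions, not the disclosed training method) shows continuous encoder outputs being snapped to the nearest symbols of a unit-power 16-QAM constellation:

```python
import numpy as np

def qam16_constellation():
    """Build a unit-average-power 16-QAM constellation as complex symbols."""
    levels = np.array([-3.0, -1.0, 1.0, 3.0])
    points = np.array([complex(i, q) for i in levels for q in levels])
    return points / np.sqrt((np.abs(points) ** 2).mean())  # normalise average power to 1

def map_to_constellation(z, constellation):
    """Snap each continuous IQ value in the encoder output to its nearest symbol."""
    # z: 1-D array of complex encoder outputs; result has the same shape
    distances = np.abs(z[:, None] - constellation[None, :])
    return constellation[distances.argmin(axis=1)]

z = np.array([0.2 + 0.9j, -1.1 - 0.3j])          # example encoder outputs
symbols = map_to_constellation(z, qam16_constellation())
```

A hard nearest-symbol mapping like this is not differentiable, so during end-to-end training a soft assignment or straight-through approximation would typically stand in for the argmin.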
  • the communication channel 120 may be used to convey information from one or more such transmitters 110 to one or more such receivers 130.
  • the communication channel 120 may be a physical connection, e.g., a wire, or a wireless connection such as a radio channel as in the example shown in Figure 1.
• the communication channel 120, including the noise associated with such a channel, is modelled and defined by its characteristics and statistical properties.
  • Channel characteristics can be identified by comparing the input and output of the channel, the output of which is likely to be a randomly distorted version of the input.
  • the distortion indicates channel statistics such as additive noise, or other imperfections in the communication medium such as fading or synchronization errors between the transmitter 110 and the receiver 130.
  • Channel characteristics include the distribution model of the channel noise, slow fading and fast fading.
  • Common channel models include binary symmetric channel and additive white Gaussian noise (AWGN) channel.
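As a concrete illustration of the AWGN model referred to above, channel noise might be simulated as follows (a NumPy sketch; the function name and SNR parameterisation are assumptions):

```python
import numpy as np

def awgn_channel(z, snr_db, rng=None):
    """Add complex white Gaussian noise to a signal at the given SNR (in dB)."""
    rng = rng or np.random.default_rng()
    snr = 10 ** (snr_db / 10)
    signal_power = np.mean(np.abs(z) ** 2)
    noise_power = signal_power / snr
    # complex noise: half the power in each of the I and Q components
    noise = np.sqrt(noise_power / 2) * (
        rng.standard_normal(z.shape) + 1j * rng.standard_normal(z.shape)
    )
    return z + noise
```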
  • the receiver 130 includes at least one processor 132, memory 133 and a carrier demodulator 138 coupled to an antenna 139 for receiving data over communication channel 120.
• a bus system (not shown) may be provided which supports communication between the at least one processor 132, memory 133, carrier demodulator 138 and antenna 139.
  • the receiver 130 thus includes an information sink 131 to which the reconstructed representation of the input data decoded by the decoder neural network 135 is provided.
  • the processor 132 executes instructions that can be loaded into memory 133, and in use provide a Runtime Environment (RTE) 134 in which instructions or code loaded into the memory 133 can be executed by the processor to generate instances of software modules in the Runtime Environment 134.
  • the memory 133 comprises instructions which, when executed by the one or more processors 132, cause the one or more processors 132 to instantiate a decoder 135.
  • the antenna 139 of the receiver 130 receives as a channel output 138o from the communications channel 120 a noise-affected version of the encoder output vectors z 1, z 2, z 3, ... z n transmitted by the antenna 119 of the transmitter 110, the noise having been added by the communication channel 120.
• the carrier demodulator 138 demodulates these noisy versions of the encoder output vectors z 1, z 2, z 3, ... z n , for example by coherent demodulation, and passes them to the decoder 135 in the RTE 134.
• noisy demodulated versions of the encoder output vectors z 1, z 2, z 3, ... z n are then mapped by the decoder 135 to a reconstructed representation of the originally input sequence of data items x 1, x 2, x 3, ... x n , which is passed to the information sink 131 at which a reconstruction of the information source 111 is collected for viewing, storing or conveying further.
• the information sink 131 collects a decoded reconstruction of the video frames x 1, x 2, x 3, ... x n passed to the encoder 115 and transmitted over the communications channel 120.
  • the decoder 135 includes a key item decoder neural network 135k and an interpolation item decoder neural network 135i for decoding data items indicated as key items and interpolation items respectively.
  • the decoder 135 may include a motion and residual module 135m for reconstructing interpolation data items in the input data space using decoded interpolation information provided by interpolation item decoder neural network 135i.
• the decoder 135 may also include a decoder control module 135c to control the operation of the key item decoder neural network 135k and the interpolation item decoder neural network 135i to decode data items from the correlated sequence for provision to information sink 131 at which the reconstructed representation of the input data items from the information source 111 is collected.
  • the decoder 135 may be configurable by instructions stored in memory 133 and implemented in RTE 134 to carry out the runtime methods described in relation to Figure 4, Figures 6-8, 10 and 11, and Figures 12-14 and 16 for decoding sequences of noise-affected version of the encoder output vectors z 1, z 2, z 3, ... z n received over the communications channel 120 to a reconstructed representation of the originally input sequence of data items x 1, x 2, x 3, ... x n .
  • the encoder control module 115c may be configured for passing data items from the information source 111 to the key item encoder neural network 115k or the interpolation item encoder neural network 115i for encoding.
• the encoder control module 115c may be configurable to operate as a static control module, which may pass data items from the information source 111 to the key item encoder neural network 115k or the interpolation item encoder neural network 115i based on a fixed order specified by a predetermined group of items, such as a fixed group of pictures for a video encoding scheme (e.g. every 7th item may be a key item, with the intervening items all being interpolation items).
• the encoder control module 115c may also be configurable to operate instead as a dynamic control module, which may pass data items from the information source 111 to the key item encoder neural network 115k or the interpolation item encoder neural network 115i based on a dynamically assigned order specified by, for example, a decision agent implemented as a Markov Decision Process (a sketch of both selection modes is given below).
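A minimal sketch contrasting the two control modes might look as follows (Python; the gop_size default, the string labels and the agent.decide interface are hypothetical illustrations rather than the disclosed implementation):

```python
def select_item_type_static(t, gop_size=7):
    """Fixed group-of-pictures pattern: every gop_size-th item is a key item."""
    return "key" if t % gop_size == 0 else "interpolation"

def select_item_type_dynamic(x_t, items_since_key, avg_channel_use, use_limit, agent):
    """Dynamic pattern: a decision agent (e.g. an MDP policy) chooses per item,
    subject to keeping the average channel utilisation below the constraint."""
    if avg_channel_use >= use_limit:
        return "interpolation"                   # conserve bandwidth when over budget
    return agent.decide(x_t, items_since_key)    # hypothetical agent interface
```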
  • the decoder control module 135c may be configured for passing the noise-affected version of the encoder output vectors z 1, z 2, z 3, ... z n received over the communications channel 120 to either the key item decoder neural network 135k or the interpolation item decoder neural network 135i for decoding based on whether the respective data item is indicated as a key item or an interpolation item.
  • the passing of the noise-affected version of the encoder output vectors z 1, z 2, z 3, ... z n received over the communications channel 120 may be based on a fixed order specified by a predetermined group of items.
  • the receiver 130 may receive separate signalling from the transmitter 110 indicating the sequence of key items and interpolation items based on the dynamic operation of the encoder control module 115c.
  • the key item encoder neural network 115k and key item decoder neural network 135k are formed as a complementary pair which may be configured as an autoencoder for encoding and decoding data items in the sequence selected as key items independent of any other data item in the sequence.
  • the key item encoder neural network 115k is for encoding data items selected from the sequence to serve as key items that can be directly reconstructed by key item decoder neural network 135k, the encoding and decoding of key items being independent of any other data item in the sequence.
  • the interpolation item encoder neural network 115i and the interpolation item decoder neural network 135i are formed as a complementary pair which may be configured as an autoencoder, a recurrent neural network, a long short-term memory, or any other suitable neural network configuration, for encoding and decoding data items in the sequence selected as interpolation items by interpolation at least in relation to a previous data item in the sequence.
  • the interpolation item encoder neural network 115i is for encoding data items selected from the sequence to serve as interpolation items that can be reconstructed by the interpolation item decoder neural network 135i, and other components of the decoder 135 as needed, using interpolation, the encoding and decoding of interpolation items using data representing the input data item and at least one previous data item in the sequence.
  • Neural networks are machine learning models that employ multiple layers of nonlinear units (known as artificial “neurons”) to generate an output from an input. Neural networks may be composed of several layers, each layer formed from nodes. Neural networks can have one or more hidden layers in addition to the input layer and the output layer.
  • each layer uses a set of parameters, which are optimized during the training stage.
  • each layer comprises a set of nodes, the nodes having learnable biases and their inputs having learnable weights. Learning algorithms can automatically tune the weights and biases of nodes of a neural network to optimise the output in order to minimise an objective function using an optimisation algorithm such as gradient descent or stochastic gradient descent.
  • the key item encoder neural network 115k has an input layer having nodes for receiving input data x 1, x 2, x 3, ... x n for encoding representative of input data items from the information source.
  • the interpolation item encoder neural network 115i also has an input layer having nodes for receiving input data for encoding.
  • the data input to the input layer of the interpolation item encoder neural network 115i depends on the neural network architecture, and whether the interpolation item encoder neural network 115i encodes interpolation information in the input data space or the latent space.
• the input layer of the interpolation item encoder neural network 115i may receive input data for encoding representative of input data items from the information source, relating to the current and at least the previous input data item in the sequence (i.e. x n and x n-1 ).
  • the input layer of the interpolation item encoder neural network 115i may receive the output vector z n of the key item encoder neural network 115k for current and at least the previous input data item in the sequence (i.e. z n and z n-1 ).
  • the key item encoder neural network 115k and interpolation item encoder neural network 115i have respective encoder output layers that output encoder output vectors z 1, z 2, z 3, ... z n that are used for providing values in a signal space for modulating, by the carrier modulator 118, a carrier signal for transmission by antenna 119 over communications channel 120.
  • the key item encoder neural network 115k and interpolation item encoder neural network 115i have hidden layers of connecting nodes with weights that, in use, map vectors of values received at nodes in the encoder input layer thereof to the encoder output vectors such that the transmitter 110 transmits a transformed version z 1, z 2, z 3, ... z n of the input data items x 1, x 2, x 3, ... x n across the communication channel 120.
  • the key item decoder neural network 135k and interpolation item decoder neural network 135i have hidden layers of connecting nodes with weights that, in use, map vectors of values received at nodes in a decoder input layer thereof to decoder output vectors provided at nodes of an decoder output layer thereof, the decoder output vectors providing a reconstruction of the encoder input vector to generate a representation of the input data item.
  • the connecting node weights of the key item encoder neural network 115k and interpolation item encoder neural network 115i have been trained together with the respective complementary key item decoder neural network 135k and interpolation item decoder neural network 135i, to minimise an objective function characterising a reconstruction error between input-output pairs of training data items.
  • the training of the connecting node weights may be performed using an appropriate optimization algorithm operating on the objective function.
• the input data from the information source 111, such as the image or video, transmitted by the transmitter 110 can be received and decoded at the receiver 130 to allow a reconstructed representation of the original input image or video to be generated at information sink 131.
  • the key item encoder neural network 115k and interpolation item encoder neural network 115i may be configured to create a coded and compressed representation of the input data as an encoder output vector for transmission across the communications channel.
  • the encoder output vectors may contain less information than the encoder input vectors for each data item.
  • the encoder 115 may be configured such that the bandwidth allocation module 115b may interoperate with the key item encoder neural network 115k, interpolation item encoder neural network 115i and encoder control module 115c to encode the data items selected as interpolation items in such a way that an available bandwidth in the communication channel 120, or data budget, is shared between the data items so that the data items are compressed, for example such that a channel use constraint is met.
  • the bandwidth allocation module 115b may be provided by a neural network configured, for example by reinforcement learning, to select, for each interpolation item in a group of pictures, a number of blocks of the interpolation item encoder output vector for transmission, where the interpolation item encoder neural network 115i has been trained to encode increasing information with an increased number of encoder output vector blocks.
• the bandwidth allocation module 115b may be configured to work with the encoder control module 115c to dynamically select successive data items as interpolation items to be encoded together into the same recurrently updated encoder output vector for transmission, such that successive interpolation items are encoded together in a compressed transmission for recurrent decoding.
• the decoder may be configured to decode the noise-affected version of the encoder output vector back into an uncompressed reconstruction of the input data.
  • the quantity of information included in the output vector of the key item encoder neural network and/or the interpolation item encoder neural network is smaller than the input vector.
• the communication channel 120 may be one that also includes an existing channel encoder and decoder scheme, in which the signal space of the channel input may be the predetermined finite set of symbols of the channel code (which could be bit values) for modulating a signal carrier for providing input signals in the alphabet in the input signal space for the communication channel 120.
  • the transformation applied by the communication channel 120 may, in embodiments, also include an existing channel code.
  • the signal space may be a message alphabet for an existing channel code by which the input signals to the given communications channel are modulated.
• the encoder output vector will be mapped into the message alphabet of the corresponding channel code (rather than, for example, the raw IQ values transmittable by the transmitter).
• the noise-affected version of the encoder input vector input to the decoder neural network 135 may correspond to the hard-decoded message of the existing channel decoder.
• the encoder neural network 115 and decoder neural network 135 may learn an optimum mapping of the input information source 111 to inputs of an existing channel code of the communications channel 120 that reduces reconstruction errors at the output 131 of the decoder neural network 135.
  • this learned coding of the encoder neural network 115 and decoder neural network 135 is still optimised based on the characteristics of the communication channel 120 to reduce reconstruction errors, even though in these alternative embodiments the communication channel 120 includes an existing channel code. This is unlike existing modular source codes which are defined independently of the random transformation applied by any channel. [0089] Reference will now be made to Figures 3 and 4 which set out in more detail how the transmitter 110 and receiver 130, and the trained neural networks of the encoder 115 and decoder 135, operate to transmit data from information source 111 across communication channel 120 by joint source and channel coding.
• the data items may be received at the encoder control module 115c which may select each data item in the sequence as a key item or an interpolation item, according to a static or dynamic allocation of a group of items, and then pass it to the key item encoder neural network 115k or interpolation item encoder neural network 115i as encoder input vectors x 1, x 2, x 3, ... x n for processing accordingly.
  • the encoder control module 115c may be a static control module configured to select data items from the sequence of data items as key items and interpolation items according to a fixed order specified by a predetermined group of items.
  • the encoder control module 115c may be a dynamic control module having a dynamic decision agent configured to dynamically choose whether the input data item x t is to serve as a key item or an interpolation item.
• the dynamic decision agent may be configured to dynamically choose whether the input data item x t is to serve as a key item or an interpolation item based at least on one or more of: the current data item x t ; the number of data items transmitted since the last key item; a current average channel utilisation; and a channel utilisation constraint.
  • the dynamic decision agent may be configured to dynamically choose whether the input data item x t is to serve as a key item or an interpolation item so that the average channel utilisation is below the channel utilisation constraint.
  • the dynamic decision agent may be configured to generate data mapping, for the sequence of data items, which data items are key data items and which data items are interpolation data items. This mapping is for transmission across the communications channel 120 and for use by the decoder 135 to determine whether the received noise-affected version of an encoder output vector z t should be decoded by the key item decoder neural network 135k or the interpolation item decoder neural network 135i.
• the encoder 115 may determine, statically or dynamically, whether the data item x t is selected as a key item or an interpolation item. [0094] If the data item is selected as a key item, in step 303, the data item x t is passed to the key item encoder neural network 115k where it is encoded to a latent encoder output vector z t based on the input data item x t and being independent of any other data item in the sequence. As a key item, the data item x t can be directly reconstructed by the decoder from the noise-affected version of the encoder output vector z t alone.
  • the interpolation item encoder neural network 115i is used to encode to a latent encoder output vector z t representative of the input data item x t .
  • the encoding by the interpolation item encoder neural network 115i may be performed using data representing the input data item x t and at least one previous data item in the sequence x t-1 .
  • the encoding by the interpolation item encoder neural network 115i may be performed also using data representing the input data item x t and at least one subsequent data item in the sequence x t+1 .
  • the input data item x t , and other data items used in the encoding the latent encoder output vector z t by the interpolation item encoder neural network 115i may be pre-processed before being passed to the interpolation item encoder neural network 115i to provide representative data to facilitate the encoding of interpolation information to allow the reconstruction of the input data item x t at the decoder by interpolation from reconstructions of representations of other data items in the sequence.
  • the input data item x t may be pre-processed by a motion and residual module 115m of the encoder 115 to generate one or more of: motion representation information representing the relative motion between the data item and at least one other data item in the sequence; and the residual information between the data item and a motion compensated version of at least one other data item in the sequence using the motion representation information in respect of that data item.
  • the encoding of a latent encoder output vector z t for interpolation items by the interpolation item encoder neural network 115i may use data received at the input layer thereof representing the input data item in input data space and at least one previous data item in the sequence in input data space.
  • the input data space is the pixel space of the video.
  • the motion representation information may therefore be optical flow information for the input video frame x t produced from an optical flow analysis of the sequence of video frames.
  • the input data item x t may be pre-processed by the key item encoder neural network 115k to encode the data item x t into a latent space vector z t .
• the input layer of the interpolation item encoder neural network 115i may be configured such that the interpolation item encoding uses data representing the input data item in the latent space defined by the output of the key item encoder neural network 115k (i.e. z t ), and at least the previous data item in the latent space (i.e. z t-1 ).
  • the output of the interpolation item encoder neural network 115i may also be a vector in the latent space z, the vector z t being representative of interpolation information in the latent space z for the data item x t . Encoding the interpolation information in the latent space z in this way may be more efficient and effective than encoding the interpolation information in the input data space of x.
• once the data item x t is encoded by the key item encoder neural network 115k or the interpolation item encoder neural network 115i into a latent space vector z t , it is passed in step 305 to the carrier modulator 118.
• the latent space vector z t has values usable for providing values in a signal space for modulating, by the carrier modulator 118, a carrier signal or one or more subcarriers with a transformed version of the data item x t .
• once the carrier signal has been encoded, it is transmitted across communication channel 120 using antenna 119.
  • the bandwidth allocation module 115b may control the number of output blocks of each latent space vector z t that are transmitted to share out the available bandwidth or bit budget, for example to meet an average channel use condition, or based on an allocation of bandwidth for the transmission.
  • the bandwidth allocation module 115b may be configured to determine the number of blocks of the interpolation encoder output layer to be transmitted to minimise the reconstruction error between the representation of the input data reconstructed at the decoder and the input data encoded at the encoder. In embodiments, the bandwidth allocation module 115b may be configured to determine the number of blocks of the interpolation encoder output layer to be transmitted based on at least motion representation information determined to represent the relative motion between the data item x t and at least one other data item in the sequence (e.g. x t-1 ).
  • the bandwidth allocation module is configured to determine a number of blocks of the interpolation encoder output layer to be transmitted, so as to seek to optimally allocate the available bandwidth in the communications channel between a group, a set or the whole sequence of data items to be transmitted (e.g. x 1, x 2, x 3, ... x n ).
• the interpolation item encoder neural network 115i is provided by a recurrent neural network (RNN) such as an LSTM.
  • the bandwidth allocation module 115b may work together with the encoder control module 115c to encode successive interpolation items into the cell of the RNN for transmission in a single latent space vector z t for recursive decoding by the decoder 135.
  • the encoder neural network 115i is operated by the bandwidth allocation module 115b working together with the encoder control module 115c to maintain and update an internal cell state thereof as successive interpolation items of a group of consecutive interpolation items are encoded by the interpolation encoder neural network 115i.
  • the encoder 115 is configured to provide the internal state as the interpolation encoder output vector z t for providing values in a signal space for modulating the carrier signal for transmission of a transformed version of the group of consecutive interpolation data items across the communication channel 120. That is, the encoder 115 is configured to output an encoder output vector z t for transmission for each key item and each group of consecutive interpolation items between key items.
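As an illustrative sketch of such recurrent encoding of a group of consecutive interpolation items into a single output vector (PyTorch; the class name and dimensions are assumptions):

```python
import torch
import torch.nn as nn

class RecurrentInterpolationEncoder(nn.Module):
    """Minimal sketch: successive interpolation items update an LSTM cell state,
    and the final hidden state is sent as a single latent vector for the group."""

    def __init__(self, item_dim=256, latent_dim=64):
        super().__init__()
        self.cell = nn.LSTMCell(item_dim, latent_dim)

    def forward(self, items):
        # items: list of (batch, item_dim) tensors for consecutive interpolation items
        h = c = torch.zeros(items[0].shape[0], self.cell.hidden_size)
        for x_t in items:
            h, c = self.cell(x_t, (h, c))   # the cell state accumulates the group
        return h                            # one output vector z_t for the whole group

encoder = RecurrentInterpolationEncoder()
group = [torch.randn(1, 256) for _ in range(3)]  # three consecutive interpolation items
z = encoder(group)                               # shape (1, 64)
```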
• the transmission by the transmitter 110 of the carrier signals modulated using the latent space vectors z 1,2,3,...n may be in sequence as each latent space vector is encoded for each time step as the data item x t for that time step is generated or received, for example on the fly in the event of streaming data.
  • the latent space vectors z 1,2,3,...n may all be encoded first before being transmitted in sequence by the transmitter.
• the transmitter process 300 is completed.
• the run time method 400 for the receiver 130 and the decoder 135 starts in step 401 in which the antenna 139 receives a carrier signal from communications channel 120 and passes it to carrier demodulator 138, which demodulates the carrier signal to recover noise-affected versions of the encoder output vectors z 1,2,3,...n as they are received, and passes them to the decoder control module 135c.
• the decoder control module 135c determines whether the noise-affected version of encoder output vector z t for a given time step t encodes the input data item x t for that time step as a key item or an interpolation item. In embodiments where the ordering of the group of items is static and follows a pre-defined order, the decoder control module 135c may determine whether the received vector is representative of a key item or an interpolation item based on that order. In embodiments where the ordering of the data items as key or interpolation items is assigned dynamically, the decoder control module 135c may determine whether the received vector is representative of a key item or an interpolation item based on mapping data received from the transmitter 110.
• in step 403, the vector is passed to the key item decoder neural network 135k where it is directly decoded to provide a reconstruction of the input vector x t independently of any other data item in the sequence.
  • the key item decoder neural network 135k generates a representation of the input data items of x 1,2,3,...n indicated as key items directly from the relevant noise-affected version of the encoder input vectors received at the receiver.
• in step 404, the vector is passed to the interpolation item decoder neural network 135i where it is decoded to provide a reconstruction of the input vector x t based on data representing at least one previous data item x t-1 in the sequence and the noise-affected version of the encoder output vector z t for the data item.
  • the data representing at least one previous data item x t-1 in the sequence used by the interpolation item decoder neural network 135i to decode interpolation items may comprise a reconstruction of the encoder input vector providing a representation of the input data item x t-1 for at least the previous data item in the sequence.
• the interpolation item decoder neural network 135i decodes the noise-affected version of the encoder output vector z t to provide an estimate of the motion representation information representing the relative motion between the data item x t and the at least one other data item in the sequence (e.g. x t-1 ).
  • the interpolation item decoder neural network 135i decodes the noise-affected version of the encoder output vector z t to provide an estimate of the residual information between the data item x t and a motion compensated version of the at least one other data item in the sequence (e.g. x t-1 ) using the motion representation information in respect of that data item generated at the motion and residual module 115m and encoded by the interpolation item encoder neural network 115i.
• the data representing at least one previous data item x t-1 in the sequence used by the interpolation item decoder neural network 135i to decode interpolation items may comprise a noise-affected version of an encoder output vector received for the previous data item.
• the interpolation item decoder neural network 135i provides a reconstruction of the input vector x t by decoding the noise-affected version of the encoder output vector z t based on that vector and the vector for the previous data item. That is, the current and previous vectors are used recursively as inputs by the recurrent neural network to update the cell state and provide the reconstruction as an output.
• the vector for the previous data item corresponds to a representation of the previous data item x t-1 in the latent space, as represented by an encoding of the reconstruction using the key item encoder neural network 115k.
  • the decoder 135 may store a software module in memory 133 for instantiating the key item encoder neural network 115k locally in RTE 134.
• the reconstruction of the encoder input vector providing a representation of the input data item x t-1 is obtained at the decoder 135.
• once the reconstruction of the encoder input vector x t has been decoded by the key item decoder neural network 135k or the interpolation item decoder neural network 135i, it is passed in step 405 to the information sink 131 at which the information source is reconstructed.
  • the reconstruction of the information source generated in the information sink 131 may be stored locally, for example in memory 133, for local reproduction at a later stage, or it may be reproduced contemporaneously without being stored permanently locally (for example in the case of streaming media).
  • the reconstruction of the information source generated in the information sink 131 may also be conveyed onward for reproduction elsewhere, for example using the Internet.
  • a training time process 500 for optimising the weights of the key item encoder and decoder neural network pair, 115k and 135k, and the interpolation item encoder and decoder neural network pair, 115i and 135i, to minimise reconstruction errors will now be described with reference to Figure 5.
  • the weights of the key item encoder and decoder neural network pair, 115k and 135k, and the interpolation item encoder and decoder neural network pair, 115i and 135i are jointly optimized end-to-end in an unsupervised manner by passing training data sample vectors as inputs through the communication system 100 (or a simulation thereof using a channel model to add noise) and receiving its reconstruction vector in a forward pass of training data through the neural networks.
• the training data sample vectors are received (individually or in batches) and passed through the communication system to obtain the reconstruction vectors, forming input-output pairs of a set of training data in respect of a training data information source 111.
• the input-output pairs of vectors of training data may be calculated empirically, by the transmitter 110, in the forward pass, encoding and transmitting the encoder output vector representation of the input vector of training data across the communication channel 120, where the signal values are subsequently received as the noise-affected vector decoded by receiver 130 to the reconstruction vector.
  • the key item encoder and decoder neural network pair, 115k and 135k, and the interpolation item encoder and decoder neural network pair, 115i and 135i can be optimised to take into account the noise in the channel through training based on empirical data capturing the effects of channel noise on the transmission.
• the input-output pairs of vectors of training data may be generated using a model of the communication channel 120 to estimate channel noise and add it to the transmitted encoder output vector, generating a simulation of the noise-affected vector for subsequent decoding and reconstruction of the output training data vector by the decoder neural networks 135k and 135i.
  • a channel model can be adopted that simulates the practical channel experienced in the operational regime.
  • the channel model can be any model that simulates an arbitrary transformation of the encoder output vector transmitted by the transmitter 110.
• the training process may perform batchwise optimisation across groups of input-output pairs, such as using gradient descent to determine the error gradient from the forward pass and compute an update to the weights.
• stochastic gradient descent may be used, in which the error is determined and weights updated for each input-output pair of vectors of training data, before the next pair of vectors of the training data is processed using the updated weights.
  • an objective function is determined characterising a reconstruction error between the input-output pairs of vectors of training data.
• the reconstruction error for the objective function is characterised using the mean squared error (MSE) loss between the input vectors and their reconstructions, calculated as $L_{\mathrm{MSE}} = \frac{1}{N} \sum_{t=1}^{N} \lVert x_t - \hat{x}_t \rVert_2^2$.
  • Other objective functions characterising the reconstruction error may be used.
  • the method further comprises, in steps 505 and 507 which may be performed together, using an appropriate optimisation algorithm operating on the objective function, updating the connecting node weights of the hidden layers of the key item encoder and decoder neural network pair, 115k and 135k, and the interpolation item encoder and decoder neural network pair, 115i and 135i, to seek to minimise the objective function.
  • the gradient descent optimisation algorithm is used to seek to minimize the objective function by using a differential of the objective function to determine the gradient and the direction towards a minimum value for the objective function.
  • the gradient descent algorithm operates on the objective function based on a differential of at least the key item encoder and decoder neural network pair, 115k and 135k, for training data items that are key items, and the interpolation item encoder and decoder neural network pair, 115i and 135i, for training data items that are interpolation items.
  • the gradient of the objective function can be efficiently calculated with respect to the weights in the key item encoder and decoder neural network pair, 115k and 135k, and the interpolation item encoder and decoder neural network pair, 115i and 135i, for example by unstacking the elementary functions used to compute the forward pass, and by repeatedly applying the chain rule to autodifferentiate them and determine the gradient with respect to the weights in the key item encoder and decoder neural network pair, 115k and 135k, and the interpolation item encoder and decoder neural network pair, 115i and 135i, by backpropagation.
  • the connecting node weights of the hidden layers of the key item encoder and decoder neural network pair, 115k and 135k, and the interpolation item encoder and decoder neural network pair, 115i and 135i are updated to seek to minimise the objective function.
• this is achieved in the gradient descent optimisation method by using the determined gradient to estimate an update to the weights of the key item encoder and decoder neural network pair, 115k and 135k, and the interpolation item encoder and decoder neural network pair, 115i and 135i, that is expected to step the objective function towards a minimum, where the local gradient is zero.
  • the weights of the key item encoder and decoder neural network pair, 115k and 135k, and the interpolation item encoder and decoder neural network pair, 115i and 135i are updated and, in step 509, it is checked whether there are more samples of training data in the training set.
  • the process 500 returns to step 501 and the next batch or training sample is received and the optimisation method is carried out again to further optimise the weights of the neural networks. If training over the training set is complete, the process 500 ends and a trained key item encoder and decoder neural network pair, 115k and 135k, and interpolation item encoder and decoder neural network pair, 115i and 135i, are provided for use in an operational communication system 100 for transmitting input data over a communication channel 120.
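For illustration, one end-to-end training step of the kind described above might be sketched as follows (PyTorch; the differentiable AWGN helper and module interfaces are assumptions, with encoder and decoder being any modules mapping items to latent symbols and back):

```python
import torch
import torch.nn as nn

def awgn(z, snr_db):
    """Differentiable AWGN channel model: noise is added in the forward pass and
    gradients flow through the addition back to the encoder in backpropagation."""
    noise_power = z.detach().pow(2).mean() / (10 ** (snr_db / 10))
    return z + noise_power.sqrt() * torch.randn_like(z)

def train_step(encoder, decoder, x, optimizer, snr_db=10.0):
    """One end-to-end step on an input-output pair, minimising the MSE objective."""
    z = encoder(x)                # encoder output vector (latent channel symbols)
    z_noisy = awgn(z, snr_db)     # simulated channel in the forward pass
    x_hat = decoder(z_noisy)      # reconstruction of the input item
    loss = nn.functional.mse_loss(x_hat, x)
    optimizer.zero_grad()
    loss.backward()               # backpropagate through the channel model
    optimizer.step()
    return loss.item()
```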
• the encoder and decoder blocks are built as artificial neural networks with learnable parameters so that the transformation from data to latent representation (code) and back can be learned directly from data.
• if the constellation symbols S transmittable by the transmitter are predefined, as is the case when using standard communication hardware and protocols, or if an existing channel code is used, these pre-existing codes act as constraints for the optimisation and the objective function.
• if a channel model is used in the forward pass of the training process, rather than training data being generated empirically, the channel model can be included directly in the backward pass of the optimisation algorithm. If the channel model used is differentiable, it can be used directly in the backpropagation stage. If it is not differentiable, a generative adversarial network (GAN) may be used to learn a differentiable representation of the channel model.
  • the key item encoder and decoder neural network pair, 115k and 135k, and the interpolation item encoder and decoder neural network pair, 115i and 135i can be optimised to take into account the noise in the channel through training based on a theoretical noise model of the communication channel.
• the key item encoder and decoder neural network pair, 115k and 135k, and the interpolation item encoder and decoder neural network pair, 115i and 135i can be trained together using training data in which a model of the communication channel is used to estimate channel noise and add it to the transmitted signal values to generate the noise-affected version of the vector of signal values in the input-output pairs of training data.
  • the objective function may characterise and optimise against further constraints and characteristics of the communication system 100, such as to obtain an average power in the symbols transmitted across the communication system 100, so as to ensure the learned coding system satisfies an average power constraint.
• where the interpolation item encoder and decoder neural network pair, 115i and 135i, are to encode a descending ordering of information in increasing blocks of nodes, such that, for interpolation items, the decoder reconstructs an increasingly refined representation of the input data as more blocks are received in the noise-affected version of an encoder output vector, the training process 500 can be adapted as follows.
  • the encoder output layers of the interpolation item encoder neural network 115i and the decoder input layers of the interpolation item decoder neural network 135i are divided into ordered blocks.
• the encoder output vector passed to the decoder input layer for each input-output pair during training is selected to have a random number of the ordered blocks, such that, following training, the interpolation item encoder neural network 115i encodes a descending ordering of information in increasing blocks of nodes, and such that, for interpolation items, the decoder 135 reconstructs an increasingly refined representation of the input data with increasing blocks received in the noise-affected version of the encoder output vector, as sketched below.
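One way the random block selection during training could be realised is sketched below (PyTorch; the block layout and function name are assumptions):

```python
import torch

def mask_random_blocks(z, num_blocks):
    """Keep a random number of leading blocks of the encoder output vector and
    zero the rest, so that earlier blocks learn to carry the most important
    information and the decoder can refine with each extra block received."""
    batch, dim = z.shape
    block_size = dim // num_blocks
    keep = int(torch.randint(1, num_blocks + 1, (1,)))  # keep 1..num_blocks blocks
    mask = torch.zeros_like(z)
    mask[:, : keep * block_size] = 1.0
    return z * mask

z = torch.randn(4, 64)                # batch of encoder output vectors
z_partial = mask_random_blocks(z, 8)  # a random prefix of 8-value blocks survives
```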
  • the communication system 100 in this embodiment operates a static control module allocating data items according to a fixed group of pictures.
  • the communication system 100 in this embodiment determines interpolation information for encoding interpolation data items in a pixel space of the input data items.
• the communication system 100 in this embodiment includes an interpolation encoder neural network 115i having an output layer of nodes divided into blocks such that it encodes a descending ordering of information in increasing blocks of nodes, and a bandwidth allocation module 115b for selecting a number of blocks of an encoder output vector for interpolation items to share or allocate bandwidth between interpolation items in the group of pictures.
  • this arrangement has been shown to outperform existing separate source coding and channel coding schemes in terms of reduced reconstruction errors at the decoder across a wide range of different channel conditions.
• the key item encoder neural network 115k, parameterised by θ and mapping a frame to a complex latent vector representing the In-phase (I) and Quadrature (Q) components of a complex channel symbol, is then defined as the mapping $f_\theta : \mathbb{R}^{H \times W \times C} \to \mathbb{C}^k$, $z_t = f_\theta(x_t)$. This is achieved by pairing consecutive real values at the output of the neural network.
• the values in the complex latent vector may first be power normalised to meet a power constraint (see the sketch below), and in step 904 these values are then directly sent through the communication channel 120.
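The power normalisation step might, for example, be implemented as follows (PyTorch sketch; the average-power parameterisation is an assumption):

```python
import torch

def power_normalise(z, avg_power=1.0):
    """Scale a complex latent vector so its average symbol power meets the constraint."""
    current = z.abs().pow(2).mean()
    return z * torch.sqrt(torch.as_tensor(avg_power) / current)

z = torch.randn(128, dtype=torch.cfloat)  # complex latent vector (paired I/Q values)
z_tx = power_normalise(z)                 # mean |symbol|^2 is now 1.0
```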
• the key item decoder neural network 135k, parameterised by φ, maps the noisy latent vector received and demodulated at the receiver in step 1001 and passed to the key item decoder neural network 135k in step 1002 back to the original frame domain in step 1003, and is defined as the mapping $g_\phi : \mathbb{C}^k \to \mathbb{R}^{H \times W \times C}$, $\hat{x}_t = g_\phi(\hat{z}_t)$.
• the key item encoder neural network 115k and key item decoder neural network 135k are then trained together using the method generally described in relation to Figure 5, with the mean squared error $L(\theta, \phi) = \mathbb{E}\lVert x_t - \hat{x}_t \rVert_2^2$ as the loss function, to optimise the weights of the hidden layers thereof to minimise the reconstruction error.
  • a diagram of the key item encoder neural network 115k and key item decoder neural network 135k architecture is shown in Figure 7.
  • the notation kxsycz is used to signify kernel size x, stride y and z kernels.
  • the GDN layer refers to Generalised Divisive Normalisation, which is effective in density modelling and compression of images.
  • the network is fully convolutional, therefore it can accept input of any height (H) and width (W).
• the separate interpolation item encoder neural network 115i is used to encode motion representation and residual information determined in step 905 by motion and residual module 115m.
  • the architecture of the interpolation item encoder neural network 115i and interpolation item decoder neural network 135i is shown in Figure 8.
• the motion representation is generated by motion and residual module 115m in respect of two other frames in the sequence by an optical flow estimator to generate the optical flow and residual information with respect to two frames referred to as anchor frames.
• the motion and residual module 115m determines, as shown in Figure 8, a motion-compensated anchor frame according to the optical flow, producing an approximation of the current frame using the determined optical flow.
• the motion and residual module 115m determines the residual error in the optical flow interpolation as $r_t = x_t - \bar{x}_t$, where $\bar{x}_t$ is the motion-compensated approximation of the frame.
• the residual represents information not captured by optical flow, such as occlusion/disocclusion and camera movements.
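For illustration, motion compensation of an anchor frame by a dense optical flow field, and the corresponding residual, might be computed as follows (PyTorch sketch; a zero flow field stands in for the output of an optical flow estimator such as PWC-Net):

```python
import torch
import torch.nn.functional as F

def warp(frame, flow):
    """Motion-compensate a frame with a dense optical flow field via bilinear sampling.
    frame: (B, C, H, W); flow: (B, 2, H, W) pixel displacements (x, y)."""
    b, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0) + flow  # target coords
    # normalise to [-1, 1] as required by grid_sample
    grid_x = 2.0 * grid[:, 0] / (w - 1) - 1.0
    grid_y = 2.0 * grid[:, 1] / (h - 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)  # (B, H, W, 2)
    return F.grid_sample(frame, grid, align_corners=True)

anchor = torch.rand(1, 3, 64, 64)
flow = torch.zeros(1, 2, 64, 64)          # zero flow: warp is then the identity
current = torch.rand(1, 3, 64, 64)
residual = current - warp(anchor, flow)   # information the optical flow cannot capture
```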
  • a pre-trained PWC-Net can be used in the motion and residual module 115m.
• the interpolation item encoder neural network 115i, parameterised by its own set of weights, defines the mapping of interpolation data items into the latent space.
• the interpolation item encoder neural network 115i is thus used to encode the data item based on the data item's optical flows and residuals.
  • a bandwidth allocation module 115b may be used to select a number of blocks of an encoder output vector for the interpolation item to share or allocate bandwidth between interpolation items in the group of pictures.
• the values in the complex latent vector may first be power normalised to meet a power constraint, and in step 908, the transmitter transmits the signal values of encoder output vector over communications channel 120.
• the interpolation item decoder neural network 135i, parameterised by its own set of weights, defines the inverse mapping from the received noisy latent vector to estimates of the optical flow and residual information, where a mask applied to slices in the third dimension of the latent vector selects the blocks that were transmitted.
• the decoder motion and residual module 135m reconstructs the frame by blending the motion-compensated anchor frames with a decoded mask and adding the decoded residual, for example as $\hat{x}_t = m \odot \bar{x}_t^{(1)} + (1 - m) \odot \bar{x}_t^{(2)} + \hat{r}_t$, where $\odot$ refers to element-wise multiplication.
  • the reconstructed frames generated in step 1003 by the key item decoder neural network 135k and in steps 1004 and 1005 by the interpolation item decoder neural network 135i and decoder motion and residual module 135m are then passed to information sink 131 at which the reconstruction of the data source 111 is stored.
  • the architecture of the interpolation item encoder neural network 115i and interpolation item decoder neural network 135i is functionally the same as for the key item encoder neural network 115k and key item decoder neural network 135k.
  • the interpolation item encoder neural network 115i is trained together with the interpolation item decoder neural network 135i using the method generally described in relation to Figure 5 and the mean-squared error as the loss function, to optimise the weights of the hidden layers thereof to minimise a reconstruction error.
• the bandwidth allocation module 115b is trained as a separate neural network, parameterised by its own set of weights, having the architecture shown in Figure 11. Given a particular channel use constraint k per GoP, reinforcement learning (RL) is utilised to learn the optimal bandwidth allocation policy for each frame in a GoP, based on the frames themselves, that maximises the video quality.
• with the joint source-channel encoders having the neural network architecture of the key item encoder and decoder neural network pair, 115k and 135k, and the interpolation item encoder and decoder neural network pair, 115i and 135i, encoded video frames can be successively refined by sending increasingly more information.
• this training process leads to a descending ordering of information across the ordered blocks of the encoder output vector, from the first block to the last.
• when fewer blocks are received, the reconstruction of the data item at the decoder 135 includes less information; when more blocks are received, the reconstruction includes more information.
  • the number of blocks of the encoded latent vector generated by the encoder neural networks 115 can be selected for transmission to vary the bandwidth used to transmit the encoding of each data item. This may be based on an assessment of a relative amount of information needed to minimise reconstruction errors while meeting a channel use condition or a bit budget for the GoP.
  • the neural network parameterised by ⁇ of the bandwidth allocator module 115b is trained using reinforcement learning to allocate the available bandwidth to each of the frames in a GoP, using only the frames themselves, such that the loss metric is minimised.
• the nth GoP in a video is defined as $\mathrm{GoP}_n = \{x_{(n-1)N+1}, \ldots, x_{nN}\}$, where $N$ is the number of frames in a GoP.
• the action set A consists of all the ways to allocate the available bandwidth k to each frame in the GoP. Since we are concerned with maximising the visual quality of the final video, the reward is defined as the quality of the video reconstructed under the chosen allocation. Deep Q-learning is used to learn the optimal allocation policy, where the network seeks to approximate the optimal Q function. Here S represents the set of all states (i.e. all GoPs in a video).
• the purpose of the Q function is to map each state and action pair to a Q value, which represents the total discounted reward from step n given the state and action pair.
• the Q function is defined as the mapping $Q(s_n, a_n) = \mathbb{E}\big[\sum_{k=0}^{\infty} \gamma^k r_{n+k}\big]$, where $0 \le \gamma \le 1$ is the discount factor, which is chosen close to 1 when aiming to optimise the average reward.
• the purpose of the deep neural network of the bandwidth allocator module 115b is to approximate the Q function. To that end, the mean-squared error loss function L is used and gradient descent is performed to update the weights of the network, as sketched below.
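A single deep Q-learning update of this kind might be sketched as follows (PyTorch; the target network, tensor shapes and optimiser are illustrative assumptions):

```python
import torch
import torch.nn as nn

def dqn_update(q_net, target_net, optimizer, state, action, reward, next_state, gamma=0.99):
    """One deep Q-learning step: regress Q(s, a) towards r + gamma * max_a' Q_target(s', a')."""
    # state: (B, state_dim); action: (B,) int64 indices into the allocation action set
    q_value = q_net(state).gather(1, action.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = reward + gamma * target_net(next_state).max(dim=1).values
    loss = nn.functional.mse_loss(q_value, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```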
• Figure 17 plots the performance of each trained model for a bandwidth compression ratio of 0.031 at each evaluation CSNR as given, in the top pane (a), by the measured peak signal-to-noise ratio (PSNR) indicative of the reconstruction quality of the video frames at the receiver 130, and in the bottom pane (b), by the measured multiscale structural similarity index measure (MS-SSIM) indicative of the similarity at different scales between the video from the information source 111 and the video reconstructed at the information sink 131.
• MS-SSIM has been shown to perform better in approximating human visual perception than the more simplistic structural similarity index (SSIM) on different subjective image and video databases.
• the bandwidth compression rate for the separate source and channel coding models is chosen to be at a level that achieves equivalent performance to the best performing joint source and channel coding model at the highest evaluation CSNR of 15dB, in order to compare the best achievable reconstruction performance by both joint- and separate-coding models as the channel noise increases and the evaluation CSNR decreases.
• the bandwidth compression rate for the separate source and channel coding models that achieves the same peak performance as the best joint source and channel coding model at an evaluation CSNR of 15dB is lower than the bandwidth compression rate of the similarly performing joint source and channel coding model. This indicates that, for the same peak performance, the joint source and channel coding model achieves a higher bandwidth compression ratio, meaning less data is transmitted to achieve the same reconstruction performance.
• the performance of the separation-coding does not improve as the channel condition improves above the cliff threshold CSNR either, meaning that, for better channel conditions above the cliff edge threshold, no improvement in quality is observable.
  • a cliff edge deterioration of the H.264 scheme is observed. This cliff edge in performance is simply not seen in the trained joint source channel coding models of the communication system 100 of the present disclosure.
  • the overall performance of the trained joint source and channel coding models of the communication system 100 is better than the best available H.264 and LDPC codes, as the best performing trained joint source and channel coding model beats the best available separation-coding H.264 with LDPC coding scheme for all evaluation CSNR channel noise levels. This is the case in both the PSNR and MS-SSIM metrics, suggesting the superior compression capability of the communication system 100 over separation-based schemes.
  • the trained joint source channel coding models of the communication system 100 consistently outperform the best performing conventional separation codes for reconstruction performance and compression rates, for all channel noise conditions. Further, because the encoder neural networks directly map the source inputs to the channel outputs, and the decoder neural networks directly map the noisy-received channel outputs to the reconstruction of the source inputs, the trained joint source channel coding models of the communication system 100 were consistently three orders of magnitude faster in terms of end-to-end encoding/decoding speed, compared to the best performing separate coding schemes, further reducing latency of transmission.
  • the current channel condition needs to be monitored and the weights of the encoder and decoder neural networks need to be adjusted to the weights trained to match the channel condition. That is, weights are chosen to correspond to a training condition in which the channel noise or SNR matched the estimate of the current channel condition.
  • Figure 19 shows a performance comparison of the performance envelope of another example of the communication system of Figure 6 and the performance of separate source coding by H.265 (i.e. MPEG-H Part 2) and channel coding by LDPC at different code rates for encoding, transmitting and reconstructing other example sequences of correlated data items over a channel having AWGN at various signal to noise (SNR) ratios.
  • the trained joint source and channel encoder and decoder as disclosed herein outperforms H.264 by 0.46 dB in PSNR and by 0.0081 in MS-SSIM for SNR_AWGN ∈ [13, 20] dB, and by 3.07 dB in PSNR and 0.0485 in MS-SSIM for SNR_AWGN ∈ [3, 6] dB.
  • the trained joint source and channel encoder and decoder as disclosed herein falls short of H.265 by 3.22 dB in PSNR, but outperforms it by 0.0006 in MS-SSIM, for SNR_AWGN ∈ [13, 20] dB.
  • the trained joint source and channel encoder and decoder as disclosed herein can be extremely efficient in practice using optimised hardware and libraries, more so than separation-based methods.
  • Figure 21 shows a performance comparison of a trained joint source and channel encoder and decoder as disclosed herein against separate source coding by H.264/H.265 and channel coding by a rate-3/4 low-density parity-check (LDPC) code with 16QAM modulation, for encoding, transmitting and reconstructing other example sequences of correlated data items over a channel having AWGN at a signal-to-noise ratio (SNR) of 20 dB, for different bandwidth compression rates.
  • the bandwidth allocation module needs to be retrained with a different action set.
  • the joint source and channel encoder and decoder as disclosed herein beats H.264 with LDPC coding for all the bandwidth compression ratios tested, in terms of both the PSNR and MS-SSIM metrics. It also beats H.265 on the MS-SSIM metric, as shown in Fig. 6b, although, again, it falls short of H.265 in terms of the PSNR metric.
  • Figure 22 shows a comparison of the performance envelope of another example of the trained joint source and channel encoder and decoder as disclosed herein, showing the difference in performance between a uniform bandwidth allocation to the frames in a group of pictures, a non-uniform bandwidth allocation in accordance with a predetermined heuristic, and an optimised bandwidth allocation based on embodiments of the present disclosure in which a bandwidth allocation module is used.
  • for the optimised bandwidth allocation, the results obtained by using the allocation network are compared with those of uniform allocation (i.e. each frame having the same bandwidth allocation) and with a heuristic bandwidth allocation policy.
  • a dynamic control module 115c allocating data items according to a dynamic decision agent, implemented as a Markov Decision Process (MDP).
  • the communication system 100 in this embodiment determines interpolation information for encoding interpolation data items in a latent space of the encoder output vectors encoded by the key item encoder neural network 115k. Further still, the communication system 100 includes an interpolation encoder neural network 115i configured as a recurrent neural network (RNN), more specifically a Long Short-Term Memory (LSTM), in which the internal cell state is updated to recurrently encode successive interpolation items until the next key frame, which is then normalized to the power constraint and transmitted as the output vector for recurrent decoding at the decoder 135.
  • a bandwidth allocation module 115b of the encoder 115 in this embodiment is implemented as the dynamic control module 115c for selecting whether or not successive items are to be encoded together into the cell state of the interpolation encoder neural network 115i, so that successive interpolation items share the bandwidth needed to transmit the encoder output vector.
  • this arrangement provides enhanced capabilities for efficiently encoding streaming media, such as live video, for transmission in such a way that reconstruction performance is maintained across a wide range of different channel conditions while the bandwidth used can be kept low.
  • although the interpolation item encoder neural network 115i and interpolation item decoder neural network 135i are configured as RNNs, in particular LSTMs, they can be provided by any suitable function that can learn to recursively encode and decode successive items in a sequence into a state for transmission and recursive decoding, such as a transformer architecture (a minimal sketch of the recurrent encoding step follows below).
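As illustration only, the following minimal PyTorch-style sketch shows the recurrent encoding step described above: an LSTM cell folds each new latent vector into its internal state, and the state is power-normalised to form the code word. The class name and the latent/code dimensions are assumptions for the sketch, not the disclosed architecture.

```python
import torch
import torch.nn as nn

class InterpolationEncoder(nn.Module):
    """Minimal sketch of a recurrent interpolation item encoder (cf. 115i)."""
    def __init__(self, latent_dim: int = 256, code_dim: int = 128):
        super().__init__()
        self.cell = nn.LSTMCell(latent_dim, code_dim)

    def forward(self, z_t, state=None):
        h_t, c_t = self.cell(z_t, state)   # fold the current latent into the cell state
        k = h_t.shape[-1]
        code = h_t * (k ** 0.5) / h_t.norm(dim=-1, keepdim=True)  # power normalise
        return code, (h_t, c_t)

enc = InterpolationEncoder()
state = None
for _ in range(3):                         # three consecutive interpolation items
    z_t = torch.randn(1, 256)              # latent from the key item encoder (cf. 115k)
    code, state = enc(z_t, state)          # one code word summarises the run so far
```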
  • the fixed GoP formulation, in which a group of N frames is considered jointly for compression, is forgone; instead, the encoder control module 115c addresses the question of which items to allocate as key items and as interpolation items dynamically, using a dynamic decision agent.
  • the dynamic decision agent is implemented in the embodiment as an infinite horizon Markov decision process (MDP).
  • a sequence of video frames received in step 1501 of encoder process 1500 as a sequence of input data items from information source 111.
  • the sequence of video frames may be a stream of video frames, for example being recorded live by a security camera or a drone.
  • each frame x_t is first transformed into a latent space vector, denoted z_t, via the key item encoder neural network 115k.
  • the complementary key item decoder neural network 135k is similarly defined, with an architecture mirroring that of the key item encoder neural network 115k, and performs the opposite operation.
  • the bandwidth allocation module 115b, working with the dynamic decision agent implemented as an MDP by dynamic control module 115c, dynamically determines whether the data item should serve as a key item or an interpolation item.
  • the MDP state at time step t is defined as a tuple whose elements include k, the number of frames since the last key frame, together with information about the current frame such as motion information (e.g. optical flow vectors); a toy sketch of such a keyframe decision follows below.
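Purely for illustration, the following toy sketch shows a keyframe decision over such a state. The state fields and the thresholds are assumptions; the disclosure trains a decision agent rather than hand-coding a rule like this.

```python
from dataclasses import dataclass

@dataclass
class MdpState:
    frames_since_key: int      # k in the text
    motion_magnitude: float    # e.g. mean optical-flow magnitude (assumed feature)
    avg_channel_use: float     # running average channel utilisation
    channel_budget: float      # channel use constraint B

def choose_key_item(s: MdpState) -> bool:
    """Toy stand-in for the learned policy: pick a key frame on large motion or
    after a long run of interpolation items, if the channel budget allows it."""
    over_budget = s.avg_channel_use >= s.channel_budget
    return (not over_budget) and (s.motion_magnitude > 0.5 or s.frames_since_key >= 8)

print(choose_key_item(MdpState(9, 0.1, 0.4, 1.0)))  # True: long run since last key
```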
  • step 1505 the code word c_t for transmission at time step t is taken to be equal to the latent code z_t output by the key item encoder neural network 115k.
  • step 1506 a secondary encoder network is utilised as the interpolation item encoder neural network 115i.
  • This neural network, and its counterpart interpolation item decoder neural network 135i, have an architecture and mode of operation that is different to the embodiment described above in Figures 6-11.
  • the interpolation item encoder neural network 115i in this embodiment is a recurrent neural network (RNN), more specifically a Long Short-Term Memory (LSTM), the architecture of which is shown in Figure 13, that takes a tuple at its input layer and maps it to a code word c_t.
  • RNN recurrent neural network
  • LSTM Long Short-Term Memory
  • the interpolation item encoder neural network 115i may encode into c_t multiple data items successively selected as interpolation items.
  • the final set of codewords C_t is constructed from the sequence of keyframe decisions made by the bandwidth allocation module 115b and the dynamic decision agent, together with the corresponding codewords.
  • the codeword c_t is flushed and stored in the set of codewords C_t to be transmitted, and the latent vector of the new keyframe, z_{t+1}, is also appended to C_{t+1} as it is a keyframe. This implies that, if a frame is chosen to be a key frame, its codeword is independent of all other codewords; if a frame is not chosen to be a key frame, its codeword depends on the previous codeword (a minimal sketch of this construction follows below).
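The following is a minimal sketch of that codeword set construction, under assumed stand-ins for the trained networks: key_encode maps a frame to its latent z_t, and interp_update folds z_t into the pending codeword (the LSTM state update of the interpolation encoder). The function names and the toy lambdas in the usage example are illustrative assumptions.

```python
import torch

def build_codeword_set(frames, is_key, key_encode, interp_update):
    codewords, pending, state = [], None, None
    for x_t, key in zip(frames, is_key):
        z_t = key_encode(x_t)
        if key:
            if pending is not None:          # flush the pending interpolation codeword
                codewords.append(pending)
                pending, state = None, None
            codewords.append(z_t)            # key codeword: independent of all others
        else:
            pending, state = interp_update(z_t, state)  # depends on previous codeword
    if pending is not None:
        codewords.append(pending)
    return codewords

frames = [torch.randn(1, 256) for _ in range(5)]
keys = [True, False, False, True, False]
out = build_codeword_set(frames, keys, lambda x: x,
                         lambda z, s: (z if s is None else 0.5 * (z + s), z))
print(len(out))  # 4 codewords: two key items and two interpolation runs
```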
  • interpolation between data items is now done implicitly by the interpolation item encoder neural network 115i in the latent space.
  • the benefit of interpolation in the latent space is that the interpolation item encoder neural network 115i may be able to transform the values in the pixel domain to a latent space where interpolation can be done more compactly.
  • optical flow interpolation, which occurs in the input space (pixel) domain, does not capture occlusion/disocclusion.
  • the residual needs to be computed and transmitted to account for this type of information.
  • optical flow treats each frame as a 2D plane in which the pixels are simply moved to obtain subsequent frames, without accounting for the fact that the scene itself may be 3D, so that objects can appear or disappear as objects in front of them move relative to the camera.
  • the function can be thought of as a mapping to a space with more dimensions (greater degrees of freedom) that describes the various types of motion information (optical flow, residual), and owing to the greater degrees of freedom, the interpolation can be done by translating the values in each dimension.
  • one dimension may describe x-axis movement, another may describe the occlusion of objects, and so on.
  • the loop of step 1507, which passes the next interpolation data item x_{t+1} to the key item encoder and then to the interpolation item encoder to update the internal cell state of the interpolation item encoder neural network, means that all those consecutive non-key frames will be represented by a single code word c_t.
  • the interpolation item encoder neural network 115i can thus be seen as a codeword updater that takes in new information about the current frame through the latent vector z_t and updates the previous code word c_{t-1} to obtain the new code word c_t.
  • the encoder control module 115c may store a map of key frame allocations as a binary vector describing which frames are key frames. This information may be sent by the transmitter 110 as side information using conventional digital modulation and channel coding. If the set of codewords transmitted to the receiver 130 by steps 1505 and 1509 at time step t is C_t, where |C_t| is the number of codewords in the set, the bandwidth allocation module 115b may set a channel use constraint B such that the average channel utilisation remains below B (a brief budget-check sketch follows below).
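As illustration only, the sketch below checks such a channel use constraint: the average number of channel uses per frame is tracked and compared against the budget B. The accounting (symbols per codeword divided by frames so far) and the example numbers are assumptions for the sketch.

```python
def within_budget(codeword_lengths, frames_sent, budget_B):
    """True if the average channel utilisation per frame stays below B."""
    avg_use = sum(codeword_lengths) / max(frames_sent, 1)
    return avg_use < budget_B

# Binary key frame allocation map, sent as side information (1 = key frame).
keyframe_map = [1, 0, 0, 1, 0, 0, 0, 1]
print(within_budget([128, 128, 96], frames_sent=8, budget_B=64))  # True: 44 < 64
```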
  • the codeword set C_t may then be power normalised and transmitted by the carrier modulator 118 and antenna 119 across the channel 120. It should be noted that the process 1500, as set out more fully in Algorithm 1 of Figure 14a, does not wait until a certain time t to send the codewords; rather, whenever a new codeword is appended to the set at time t, it is transmitted as soon as it becomes available. Turning now to the receiver process 1600 shown in Figure 16, and as set out more fully in Algorithm 2 of Figure 14b, in step 1601 the receiver receives from the communications channel 120 and demodulates a set of noisy codewords and the key frame allocation map m_t.
  • the decoder 135 at the receiver 130 follows a similar process for decoding to that of the encoder 115 at the transmitter 110.
  • the decoder control module 135c passes the codeword to the key item decoder neural network 135k, where in step 1603 it decodes and recovers a reconstruction of the encoder input vector x_t to generate a representation of the input data item.
  • the decoder control module 135c passes the codeword to the interpolation item decoder neural network 135i, defined as a function which takes a tuple as input and, in step 1605, decodes the noisy detected codeword by mapping it to an estimate of the frame latent vector. As can be seen from the input, the mapping in step 1605 is based on the received signal values in the codeword and a latent representation of the previous data item.
  • in step 1606, the estimate of the latent vector for time step t decoded by the interpolation item decoder neural network 135i in step 1605 is passed to the key item decoder neural network 135k, as in step 1604, to decode the latent vector and recover a reconstruction of the encoder input vector x_t, interpolated from the reconstruction of the previous encoder input vector x_{t-1}, to generate a representation of the input data item for provision to the information sink 131.
  • the process performed by the decoder 135 using the key item decoder neural network 135k, the interpolation item decoder neural network 135i, and the version of the key item encoder neural network 115k function stored locally at the receiver 130, is set out in Algorithm 2 shown in Figure 14b.
  • the interpolation item decoder neural network 135i in essence provides a decoder process that recursively unpacks the codeword by conditioning on the latent vector from the previous time step.
  • the unpacking function is also performed by an LSTM module.
  • the internal state h_t represents the current state of the unpacked codeword y_t, and the input of the LSTM, x_t, represents the current latent vector (a decoder-side sketch follows below).
  • This recursive encoding and decoding process for successive interpolation items is illustrated in Figure 12 for increasing time steps.
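As illustration only, the following sketch mirrors the recursive unpacking described above: an LSTM cell whose hidden state tracks the unpacked codeword and which emits an estimate of each frame latent in turn. The dimensions, the read-out layer and the way the hidden state is seeded with the received codeword are assumptions for the sketch.

```python
import torch
import torch.nn as nn

class InterpolationDecoder(nn.Module):
    """Minimal sketch of a recursive interpolation item decoder (cf. 135i)."""
    def __init__(self, latent_dim: int = 256, code_dim: int = 128):
        super().__init__()
        self.cell = nn.LSTMCell(latent_dim, code_dim)
        self.readout = nn.Linear(code_dim, latent_dim)  # state -> latent estimate

    def forward(self, z_prev, state):
        h_t, c_t = self.cell(z_prev, state)   # condition on the previous latent
        z_hat = self.readout(h_t)             # estimate of the current frame latent
        return z_hat, (h_t, c_t)

dec = InterpolationDecoder()
y_noisy = torch.randn(1, 128)                 # noisy received codeword
state = (y_noisy, torch.zeros(1, 128))        # seed the hidden state with the codeword
z_prev = torch.randn(1, 256)                  # latent of the previously decoded frame
for _ in range(3):                            # unpack three interpolation items
    z_prev, state = dec(z_prev, state)        # each estimate then goes to decoder 135k
```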
  • the training of the key item encoder and decoder neural network pair, 115k and 135k, and the interpolation item encoder and decoder neural network pair, 115i and 135i, is such that the neural networks are trained together using the method generally described in relation to Figure 5 and the mean-squared error as the loss function, to optimise the weights of the hidden layers thereof to minimise a reconstruction error.
  • an AWGN noise model for the channel may be used during training (a minimal end-to-end training sketch under this assumption follows below).
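A minimal end-to-end training sketch, assuming an AWGN channel model and the mean-squared error loss as in the text, is given below. The toy convolutional encoder/decoder are placeholders for the key and interpolation network pairs, not the disclosed architectures, and the dimensions are illustrative.

```python
import torch
import torch.nn as nn

def awgn(z, snr_db: float):
    """Differentiable AWGN channel: unit-power signal plus scaled noise."""
    sigma = 10 ** (-snr_db / 20)
    return z + sigma * torch.randn_like(z)

encoder = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                        nn.Conv2d(16, 8, 3, stride=2, padding=1))
decoder = nn.Sequential(nn.ConvTranspose2d(8, 16, 4, stride=2, padding=1), nn.ReLU(),
                        nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-4)

for _ in range(10):                        # a few toy steps
    x = torch.rand(4, 3, 32, 32)           # stand-in for a batch of frames
    z = encoder(x)
    z = z / z.flatten(1).norm(dim=1).view(-1, 1, 1, 1) * (z[0].numel() ** 0.5)
    x_hat = decoder(awgn(z, snr_db=10.0))  # noise is added inside the graph
    loss = nn.functional.mse_loss(x_hat, x)
    opt.zero_grad(); loss.backward(); opt.step()
```

Because the noise is added inside the computation graph, gradients of the reconstruction loss propagate through the channel model back to the encoder weights, which is what makes the end-to-end optimisation possible.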
  • the examples above indicate a software-driven implementation of components of the invention by a general-purpose processor, such as a CPU core, based on program logic or instructions stored in a memory.
  • certain components of the invention may be partly embedded as pre-configured electronic systems or embedded controllers and circuits embodied as programmable logic devices, using, for example, application-specific integrated circuits (ASICs) or Field-programmable gate arrays (FPGAs), which may be partly configured by embedded software or firmware.
  • ASICs application-specific integrated circuits
  • FPGAs Field-programmable gate arrays
  • the communication channel 120 should be understood as any transformation from the channel input space to the channel output space that includes a random transformation due to the channel.
  • the reference to the noise-affected version of the vector of signal values z received at the decoder should be understood to indicate that the input to the decoder is a vector of values correlated with the transmitted vector z of signal values (which is itself correlated with the input data x from the information source), as transformed by the communication channel 120, whether that transformation is ‘noise’ or another channel transformation.
  • the communication channel 120 should be understood as encompassing any channel that applies a random transformation to the channel output space.
  • the communication channel 120 may be one that also includes an existing channel encoder and decoder scheme, in which case the signal space of the channel input may be the predetermined finite set of symbols of the channel code (which could be bit values) for modulating a signal carrier to provide input signals in the alphabet of the input signal space for the communication channel.
  • the transformation applied by the communication channel 120 may, in embodiments, also include an existing channel code.
  • the encoder and decoder pairs may be configured to learn a mapping to a predefined alphabet of symbols corresponding to a message alphabet for an existing channel code by which the input signals to the given communications channel are modulated.
  • the encoder output vectors z output from the encoder 115 will be mapped into the message alphabet of the corresponding channel code (rather than, for example, the raw IQ values transmittable by the transmitter over the communication channel 120, as in the embodiments described above); a toy sketch of such an alphabet mapping is given below.
  • the noise-affected channel output ẑ input to the decoder neural network 135 may correspond to the decoded message of the existing channel decoder.
  • the encoder neural network 115 and decoder neural network 135 may learn an optimum mapping of the input information source 111 to inputs of an existing channel code of the communications channel 120 that reduces reconstruction errors at the output 131 of the decoder neural network 135.
  • this learned coding of the encoder neural network 115 and decoder neural network 135 is still optimised end-to-end based on the characteristics of the communication channel 120 to reduce reconstruction errors, even though in these alternative embodiments the communication channel 120 includes an existing channel code. This is unlike existing modular source codes which are defined independently of the random transformation applied by any channel.
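As a toy illustration of mapping continuous encoder outputs onto a fixed symbol alphabet, the sketch below quantises each (I, Q) pair to the nearest point of an assumed 4-point constellation; the alphabet and the nearest-neighbour rule are illustrative assumptions, not the disclosed mapping.

```python
import torch

# Assumed 4-point alphabet (QPSK-like I/Q pairs); not from the disclosure.
ALPHABET = torch.tensor([[1., 1.], [1., -1.], [-1., 1.], [-1., -1.]])

def map_to_alphabet(z_iq):
    """Map each continuous (I, Q) pair to its nearest constellation symbol."""
    d = torch.cdist(z_iq, ALPHABET)        # pairwise distances to symbols
    return ALPHABET[d.argmin(dim=1)]

z = torch.randn(6, 2)                      # continuous encoder outputs as I/Q pairs
print(map_to_alphabet(z))                  # discrete symbols for the channel code
```

In end-to-end training, such a hard quantisation step would typically need a differentiable surrogate (e.g. a straight-through estimator), since nearest-neighbour assignment has zero gradient almost everywhere.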

Abstract

A communication system is disclosed for conveying sequences of correlated data items from an information source across a communication channel using joint source and channel coding. The communication system comprises a transmitter having an encoder for encoding input data from an information source for transmission of a transformed version of the input data across a communication channel. The encoder has a key item encoder neural network for encoding data items as key items independent of any other data item in the sequence, and an interpolation item encoder neural network for encoding data items as interpolation items using data representing the input data item and at least one previous data item in the sequence. The communication system also comprises a receiver having a decoder including complementary key item decoder neural network and interpolation item decoder neural network. The receiver is for receiving and demodulating a noise-affected version of the encoder output vectors based on the signal transmitted across the communication channel and passing them to the decoder for reconstructing the sequence of correlated data items. A training method is also disclosed in which the key item encoder neural network and interpolation item encoder neural network are trained together with the respective complementary key item decoder neural network and interpolation item decoder neural network, to minimise an objective function characterising a reconstruction error between input-output pairs of training data items.

Description

ENCODER, DECODER AND COMMUNICATION SYSTEM AND METHOD FOR CONVEYING SEQUENCES OF CORRELATED DATA ITEMS FROM AN INFORMATION SOURCE ACROSS A COMMUNICATION CHANNEL USING JOINT SOURCE AND CHANNEL CODING, AND METHOD OF TRAINING AN ENCODER NEURAL NETWORK AND DECODER NEURAL NETWORK FOR USE IN A COMMUNICATION SYSTEM [0001] This present application relates to an encoder, decoder and communication system comprising a transmitter and receiver incorporating the encoder and decoder for conveying sequences of correlated data items from an information source across a communication channel using joint source and channel coding. In particular, the encoder and decoder are neural networks and in embodiments the information source is a video and the correlated data items are video frames. BACKGROUND [0002] An aim of a data communication system is to efficiently and reliably send data from an information source over a communication channel from a transmitter at as high a rate as possible with as few errors as achievable in view of the channel noise, to enable a faithful representation of the original information source to be recovered at a receiver. [0003] Information sources providing sequences of correlated data items which share similarities and encapsulate data redundancy from one item in the sequence to the next can represent a significant data payload for transmission between transmitters and receivers over communications channels. For example, video content as a sequence of video frames containing images that are typically heavily correlated over time as the video develops. For example, a video of a largely static scene such as from a security camera remains largely unchanged from one video frame to the next. As of 2021, video transmission makes up around 80% of traffic on the Internet by volume, and the data burden on transmitters and receivers to correctly and efficiently transmit video data and other correlated sequences of data over communication channels is high. [0004] Most digital communication systems today include a source encoder and separate channel encoder at a transmitter and a source decoder and separate channel decoder at a receiver. [0005] In digital communication systems, to transmit data from the information source over a communication channel, the symbols of source data are first digitally compressed into bits by the source encoder. The goal in source coding is to encode the sequence of source symbols into a coded representation of data elements to reduce the redundancy in the original sequence of source symbols. In lossless compression one has to remove redundancy such that the original information source can still be reconstructed as the original version from the coded representation, while lossy compression allows a certain amount of degradation in the reconstructed version under some specified distortion measure, for example squared error. For videos, H264/MPEG is an example of a lossy source compression standards widely used in practice. Compressing the information source using a source encoder before transmission means that fewer resources are required for that transmission. [0006] Once the data from the information source has been encoded to compress it down in size, to transfer this representation over a communication channel, the output of the source encoder is then provided to a channel encoder. 
The goal of the channel encoder is to encode the compressed data representation in a structured way using a suitable Error Correction Code (ECC) by adding redundancy such that even if some of these bits are distorted or lost due to noise over the channel, the receiver can still recover the original sequence of bits reliably. The amount of redundancy that is added depends on the statistical properties of the underlying communication channel and the target Bit Error Rate (BER). Generally, such channel coding schemes using Forward Error Correction (FEC) provide for a faithful recovery of the transmitted data elements (such as a compressed data source) despite the noise in the channel. However, as channel noise increases, BER will increase drastically and, when the channel noise is too high, the signal transmission will drop out completely, meaning the transmitted data cannot be recovered. There are many different channel coding techniques in practice that provide various complexity and performance trade-offs. Turbo codes and Low-density parity-check (LDPC) codes are examples of ECCs that are commonly used in modern communication systems such as WiMAX and fourth generation Long-Term Evolution (LTE) mobile communications. [0007] The coded bits at the output of the channel encoder are transmitted over the channel using a modulator. The modulator converts the bits into signals that can be transmitted over the communication medium. For example, in wireless systems using Quadrature Modulation of two out-of-phase amplitude modulated carrier signals, the transmitted waveform is specified by its In-Phase (I) and Quadrature (Q) components, and a modulator typically has a discrete set of pre-specified I and Q values, called a constellation, and each group of coded information bits are mapped to a single point in this constellation. Example modulation schemes include phase shift keying (PSK) and quadrature amplitude modulation (QAM). [0008] The receiver receives and demodulates (for example, by coherent demodulation) a sequence of noisy symbols, where the noise has been added by the communication channel. These noisy demodulated symbols are then mapped to sequences of data elements by a channel decoder. The decoded data elements are then passed to the source decoder, which decodes these data elements to try to reconstruct a representation of the original input source symbols to reconstruct the information source. [0009] Naturally, the source encoder and decoder are designed jointly, as are the channel encoder and decoder, but the source encoder/decoder and channel encoder/decoder are designed and operate separately to perform very different functions. [0010] The main advantage of separate source and channel coding is the modularity it provides. This means that the same channel encoder and decoder can be used in conjunction with any source encoder and decoder. That is, as long as the source encoder outputs data elements that can be encoded by the channel encoder, it does not matter if these bits come from an image compressor or a video encoder. Thus, a channel encoder can encode data elements for transmission over a channel irrespective of the data elements or the information source from which they have been derived. [0011] Similarly, the source encoder and decoder can be operated in conjunction with any channel encoder and decoder to transmit the encoded source symbols over a communication channel. 
Thus, a source encoder can encode data elements for subsequent coding by the channel encoder independently of which channel encoder is used.
[0012] The transmission of sequences of correlated data items, such as video, in this way is particularly onerous and can have significant implications and requirements for quality, latency and error rate, in particular for continuous transmission and reception of continuous flows of such data, as in video streams.
[0013] For wireless video transmission, the problem is broken down into two core components: a source encoder that converts the video into a sequence of bits of the shortest possible length, from which a reconstruction of the original video sequence is possible within an allowable distortion; and a channel encoder that introduces redundancies such that the source encoded information is protected against channel distortions and interference. This separate source and channel coding design provides modularity and allows independent optimisation of each component, which was theoretically shown by Shannon (Shannon, 1948) to be optimal for point-to-point communication over static channel conditions in the asymptotic infinite blocklength regime.
[0014] However, as more communication-intensive paradigms emerge, such as wireless virtual reality (VR) and drone-based surveillance systems, which have ultra-low latency requirements and unpredictable channel conditions, the limits of the separation-based designs are beginning to show. In such scenarios, the compression delay and the feedback necessary to track the instantaneous channel condition under constant variation are challenging. Also, the theoretical optimality of separation for communication utilising infinite blocklengths with unlimited delay and complexity becomes less relevant for low-latency systems that require short blocklengths and low-complexity operations. Moreover, separation-based communication leads to what is known as the cliff effect: when the channel condition deteriorates below the condition that the channel encoder had anticipated, the source information is lost completely, leading to a cliff edge deterioration of the system performance. As a result, most current systems operate at a much more conservative data transmission rate than that suggested by the instantaneous channel capacity, and employ additional error correction mechanisms through automatic repeat request (ARQ).
[0015] For example, due to the cliff edge effect, if the bit error rate in the transmission of a video over a communications channel exceeds the maximum error rate at which the channel decoder is able to decode the received signal, the transmission drops out completely, which can present significant challenges for live streamed video, such as from a drone to a base station. This can lead to discontinuous reception of transmitted video, and can force the source encoder to encode the video at ever more lossy compression levels and lower resolutions.
[0016] Effective and efficient transmission of sequences of correlated data items, such as video, over noisy communications channels, allowing continuous reception while keeping errors in reception to a minimum, is therefore desirable in view of the particular performance requirements and volumes of this type of data to be transmitted.
[0017] It is in the above context that the present disclosure has been devised. 
BRIEF SUMMARY OF THE DISCLOSURE [0018] Recently, new alternatives for the design of wireless communication systems have been proposed. For example, machine learning (ML) approaches to encoding an information source to channel symbols for transmission across a communication channel have been proposed. By using ML in this way, new encoding schemes can be discovered and freely produced that optimise the efficient transmission of an information source across a noisy communication channel, without being limited to existing source or channel coding paradigms, often outperforming these legacy handcrafted approaches. Such new approaches to encoding information sources are optimised using ML but do not in any way take into account any correlation in the information source, for example, between sequences of frames of video data. [0019] Thus, viewed from one aspect, the present disclosure provides an encoder, a decoder and communication system for conveying data from an information source across a communication channel using joint source and channel coding, comprising a transmitter and a receiver. The communication system comprises a transmitter having an encoder for encoding input data from an information source for transmission of a transformed version of the input data across a communication channel. The encoder has a key item encoder neural network for encoding data items as key items independent of any other data item in the sequence, and an interpolation item encoder neural network for encoding data items as interpolation items using data representing the input data item and at least one previous data item in the sequence. The communication system also comprises a receiver having a decoder including complementary key item decoder neural network and interpolation item decoder neural network. The receiver is for receiving and demodulating a noise-affected version of the encoder output vectors based on the signal transmitted across the communication channel and passing them to the decoder for reconstructing the sequence of correlated data items. A training method is also disclosed in which the key item encoder neural network and interpolation item encoder neural network are trained together with the respective complementary key item decoder neural network and interpolation item decoder neural network, to minimise an objective function characterising a reconstruction error between input-output pairs of training data items. [0020] Viewed from another aspect, the present disclosure provides an encoder for use in a transmitter of a communications system for conveying sequences of correlated data items from an information source across a communications channel using joint source and channel coding. The encoder comprises a key item encoder neural network for encoding data items selected from the sequence to serve as a key item that can be directly reconstructed by the decoder, the key item encoding being based on the input data item and being independent of any other data item in the sequence. The encoder further comprises an interpolation item encoder neural network for encoding data items selected from the sequence to serve as an interpolation item that can be reconstructed by the decoder using interpolation, the interpolation item encoding using data representing the input data item and at least one previous data item in the sequence. 
The key item encoder neural network and interpolation item encoder neural network have hidden layers of connecting nodes with weights that, in use, map vectors of values received at nodes in an encoder input layer thereof to encoder output vectors provided at nodes of an encoder output layer thereof, the encoder output vectors being used for providing values in a signal space for modulating a carrier signal for transmission of a transformed version of the data items across a communication channel. The key item encoder neural network and interpolation item encoder neural network have in the communications system respective complementary key item decoder neural network and interpolation item decoder neural network for receiving a noise-affected version of the encoder output vector from a receiver receiving and demodulating the signal transmitted across the communication channel and reconstructing the input vector to generate a representation of the input data item. The connecting node weights of the key item encoder neural network and interpolation item encoder neural network have been trained together with the respective complementary key item decoder neural network and interpolation item decoder neural network, to minimise an objective function characterising a reconstruction error between input-output pairs of training data items. [0021] Viewed from another aspect, the present disclosure provides a decoder for use in a receiver of a communications system for conveying sequences of correlated data items from an information source across a communications channel using joint source and channel coding. The decoder comprises a key item decoder neural network for decoding data items from the sequence indicated as key items, the key item decoding being based on a noise-affected version of an encoder output vector generated at a transmitter by a complementary key item encoder neural network to encode the data item based on the input data item independent of any other data item in the sequence, the key item decoder neural network being configured for reconstructing the input vector to generate a representation of the input data item directly from the noise-affected version of the encoder input vector and independently of any other data item in the sequence. The decoder further comprises an interpolation item decoder neural network for decoding data items indicated as interpolation items, the key item decoding being based on data representing at least one previous data item in the sequence and a noise-affected version of an encoder output vector generated at a transmitter by a complementary interpolation item encoder neural network to encode the data item based on data representing the input data item and at least one previous data item in the sequence, the noise-affected version of the encoder output vector having been received and demodulated at the receiver based on the signal transmitted across the communication channel, the key item decoder neural network being configured for reconstructing the input vector by interpolation from reconstructions of representations of other data items in the sequence to generate a representation of the input data item. 
The key item decoder neural network and interpolation item decoder neural network have hidden layers of connecting nodes with weights that, in use, map vectors of values received at nodes in a decoder input layer thereof to decoder output vectors provided at nodes of an decoder output layer thereof, the decoder output vectors providing a reconstruction of the encoder input vector to generate a representation of the input data item. The connecting node weights of the key item decoder neural network and interpolation item decoder neural network have been trained together with the respective complementary key item encoder neural network and interpolation item encoder neural network, to minimise an objective function characterising a reconstruction error between input-output pairs of training data. [0022] Viewed from another aspect, the present disclosure provides a communication system for conveying sequences of correlated data items from an information source across a communication channel using joint source and channel coding. The communication system comprises a transmitter including an encoder and a receiver including a decoder. The encoder comprises a key item encoder neural network for encoding data items selected from the sequence to serve as a key item that can be directly reconstructed by the decoder, the key item encoding being based on the input data item and being independent of any other data item in the sequence. The encoder further comprises an interpolation item encoder neural network for encoding data items selected from the sequence to serve as an interpolation item that can be reconstructed by the decoder using interpolation, the interpolation item encoding using data representing the input data item and at least one previous data item in the sequence. The key item encoder neural network and interpolation item encoder neural network have hidden layers of connecting nodes with weights that, in use, map vectors of values received at nodes in an encoder input layer thereof to encoder output vectors provided at nodes of an encoder output layer thereof, the encoder output vectors being used for providing values in a signal space for modulating a carrier signal for transmission of a transformed version of the data items across a communication channel. The transmitter is configured for transmitting signals over the communication channel based on signal values of the encoder output vectors of the key item encoder neural network and interpolation item encoder neural network. The receiver is configured for receiving and demodulating a noise-affected version of the encoder output vectors based on the signal transmitted across the communication channel and passing them to the decoder for reconstructing the sequence of correlated data items. The decoder comprises a key item decoder neural network for decoding data items from the sequence indicated as key items, the key item decoding being based on a noise-affected version of an encoder output vector generated at a transmitter, the key item decoder neural network being configured for reconstructing the input vector to generate a representation of the input data item directly from the noise-affected version of the encoder input vector and independently of any other data item in the sequence. 
The decoder further comprises an interpolation item decoder neural network for decoding data items indicated as interpolation items, the key item decoding being based on data representing at least one previous data item in the sequence and a noise-affected version of an encoder output vector generated at a transmitter. The interpolation item decoder neural network is configured for reconstructing the input vector by interpolation from reconstructions of representations of other data items in the sequence to generate a representation of the input data item. The key item decoder neural network and interpolation item decoder neural network have hidden layers of connecting nodes with weights that, in use, map vectors of values received at nodes in a decoder input layer thereof to decoder output vectors provided at nodes of an decoder output layer thereof, the decoder output vectors providing a reconstruction of the encoder input vector to generate a representation of the input data item. The connecting node weights of the key item encoder neural network and interpolation item encoder neural network have been trained together with the connecting node weights of the respective complementary key item decoder neural network and interpolation item decoder neural network, to minimise an objective function characterising a reconstruction error between input-output pairs of training data items. [0023] Viewed from another aspect, the present disclosure provides a method for conveying sequences of correlated data items from an information source across a communications channel using joint source and channel coding. The method comprises, at a transmitter, for each data item in the sequence: selecting data items from the sequence of data items to serve as key items and interpolation items; encoding data items to serve as key items using a key item encoder neural network, the key item encoding being based on the input data item and being independent of any other data item in the sequence; encoding data items to serve as interpolation items using an interpolation item encoder neural network, the interpolation item encoding using data representing the input data item and at least one previous data item in the sequence. The key item encoder neural network and interpolation item encoder neural network have hidden layers of connecting nodes with weights that, in use, map vectors of values received at nodes in an encoder input layer thereof to encoder output vectors provided at nodes of an encoder output layer thereof, the encoder output vectors being used for providing values in a signal space for modulating a carrier signal for transmission of a transformed version of the data items across a communication channel. The method further comprises, at the transmitter, transmitting signals over a communication channel based on signal values of encoder output vectors of the key item encoder neural network and interpolation item encoder neural network. 
The method further comprises, at a receiver: receiving and demodulating a noise-affected version of the encoder output vectors based on the signal transmitted across the communication channel; decoding data items from the sequence indicated as key items using a key item decoder neural network based on a noise-affected version of the encoder output vector for the data item, the key item decoder neural network being configured for reconstructing the input vector to generate a representation of the input data item directly from the noise-affected version of the encoder input vector and independently of any other data item in the sequence; and decoding data items from the sequence indicated as interpolation items using an interpolation item decoder neural network based on data representing at least one previous data item in the sequence and the noise-affected version of an encoder output vector for the data item, the interpolation item decoder neural network being configured for reconstructing the input vector by interpolation from reconstructions of representations of other data items in the sequence to generate a representation of the input data item. The key item decoder neural network and interpolation item decoder neural network have hidden layers of connecting nodes with weights that, in use, map vectors of values received at nodes in a decoder input layer thereof to decoder output vectors provided at nodes of a decoder output layer thereof, the decoder output vectors providing a reconstruction of the encoder input vector to generate a representation of the input data item. The connecting node weights of the key item encoder neural network and interpolation item encoder neural network have been trained together with the respective complementary key item decoder neural network and interpolation item decoder neural network, to minimise an objective function characterising a reconstruction error between input-output pairs of training data items.
[0024] Viewed from another aspect, the present disclosure provides a computer readable medium comprising one or more instructions which when executed cause at least one of: a transmitter; and a receiver; to operate in accordance with the above-described method.
[0025] In accordance with these aspects of the present disclosure, a machine-learned method for conveying sequences of correlated data items from an information source across a communication channel using joint source and channel coding, which optimises the reconstructed quality end-to-end, is achieved. This method deviates from the separation-based designs by optimising a single encoder and decoder, which jointly provide the same or better performance compared to expert-designed, modular systems.
[0026] In embodiments, the encoder is configured to encode the sequences of correlated data items from an information source for transmission across the communications channel as a streaming media. In embodiments, the encoder is configured to encode the sequences of correlated data items from an information source into a static media file.
[0027] In embodiments, the sequences of correlated data items are a series of image frames providing a video. In embodiments, the correlated data items are each represented by a 3D matrix with a depth based on the colour channels, a height based on the height of the frame and a width based on the width of the frame. 
In embodiments, the encoder input layers of the key item encoder neural network and/or the interpolation item encoder neural network are configured to receive video frames as input vectors. [0028] In embodiments, the encoder is configured to create a coded and compressed representation of the input data as an encoder output vector for transmission across the communications channel, and the decoder is configured to decode the noise-affected version of the encoder output vector back into an uncompressed reconstruction of the input data. In embodiments, the quantity of information included in the output vector of the key item encoder neural network and/or the interpolation item encoder neural network is smaller than the input vector. [0029] In embodiments, the interpolation encoder input layer is configured such that the interpolation item encoding uses data received at the input layer thereof representing the input data item in input data space and at least one previous data item in the sequence in input data space. In embodiments, the data representing at least one previous data item in the sequence used by the interpolation item encoder to encode interpolation items comprises: the motion representation information representing the relative motion between the data item and at least one other data item in the sequence; and the residual information between the data item and a motion compensated version of at least one other data item in the sequence using the motion representation information in respect of that data item. [0030] In other embodiments, the interpolation encoder input layer is configured such that the interpolation item encoding uses data received at the input layer thereof representing: the input data item in the latent space defined by the output of the key item encoder neural network; and at least one previous data item in the sequence in the latent space defined by the output of the key item encoder neural network. In embodiments, the data representing the input data item used by the interpolation item encoder to encode interpolation items comprises: the key item encoder output vector encoded for the data item by the key item encoder neural network; and wherein the data representing at least one previous data item in the sequence used by the interpolation item encoder to encode interpolation items comprises an encoder output vector transmitted by the encoder for at least one previous data item in the sequence. In embodiments, the data representing at least one previous data item in the sequence used by the interpolation item decoder to decode interpolation items comprises a noise-affected version of an encoder output vector or a reconstruction of the encoder input vector providing a representation of the input data item for at least one previous data item in the sequence. [0031] In embodiments, the data representing at least one previous data item in the sequence used by the interpolation item encoder to encode interpolation items further comprises data representing at least one subsequent data item in the sequence. 
[0032] In embodiments, the encoder and decoder further comprise a static control module configured to select data items from the sequence of data items as key items and interpolation items according to a fixed order specified by a predetermined group of items, the static control module being further configured to use the key item encoder and decoder to encode and decode data items selected as key items, and to use the interpolation item encoder and decoder to encode and decode data items selected as interpolation items. [0033] In other embodiments, the encoder further comprises a dynamic control module having a dynamic decision agent configured to dynamically choose whether the data item is to serve as a key item or an interpolation item. In embodiments, the dynamic decision agent is configured to dynamically choose whether the data item is to serve as a key item or an interpolation item based at least on one or more of: the current data item; the number of data items transmitted since last key item; a current average channel utilisation; and a channel utilisation constraint. In embodiments, the dynamic decision agent is configured to dynamically choose whether the data item is to serve as a key item or an interpolation item so that the average channel utilisation is below the channel utilisation constraint. In embodiments, the dynamic control module is configured to: select, based on a decision output by decision agent for the data item, whether the data item is to serve as a key item or an interpolation item; if the data item is selected to serve as a key item, use the key item encoder to encode the data item in the sequence to provide a key item encoder output vector for the item, the encoder being configured for transmitting the key item encoder output vector on the communications channel. In embodiments, the dynamic control module is further configured to: if the data item is selected to serve as an interpolation item, use the interpolation encoder to encode the data item to provide an interpolation item encoder output vector for the item, the encoder being configured for transmitting the interpolation item encoder output vector on the communications channel. In embodiments, the dynamic decision agent is configured to generate data mapping for the sequence of data items, which data items are key data items and which data items are interpolation data items, for transmission across the communications channel and for use by the decoder to determine whether the received noise-affected version of an encoder output vector should be decoded by the key item decoder neural network or the interpolation item decoder neural network. [0034] In embodiments, the encoder output layers of the interpolation item encoder neural network and the decoder input layers of the interpolation item decoder neural network are divided into ordered blocks, and the neural networks are trained such that the interpolation item encoder neural network encodes descending ordering of information in increasing blocks of nodes, and such that, for interpolation items, the decoder reconstructs an increasingly refined representation of the input data with increasing blocks received in the noise-affected version of an encoder output vector. 
In embodiments, the communication system further comprises a bandwidth allocation module configured to determine, for each data item in the sequence selected to serve as an interpolation item, a number of blocks of the interpolation encoder output layer to be transmitted over the communication channel, so as to allocate the available bandwidth in the communications channel to the transmission of interpolation items. In embodiments, the bandwidth allocation module is further configured to determine the number of blocks of the interpolation encoder output layer to be transmitted over the communication channel to seek to minimise the reconstruction error between the representation of the input data reconstructed at the decoder and the input data encoded at the encoder. In embodiments, the bandwidth allocation module is configured to determine the number of blocks of the interpolation encoder output layer to be transmitted over the communication channel based on at least motion representation information determined to represent the relative motion between the data item and at least one other data item in the sequence. In embodiments, the bandwidth allocation module is configured to determine a number of blocks of the interpolation encoder output layer to be transmitted over the communication channel, so as to seek to optimally allocate the available bandwidth in the communications channel between a group, a set or the whole sequence of data items to be transmitted. [0035] In other embodiments, the interpolation encoder neural network is configured to: maintain and update an internal state as successive interpolation items of a group of consecutive interpolation items are encoded by the interpolation encoder neural network; and after successive interpolation items have been encoded into the internal state, to provide the internal state as the interpolation encoder output vector for providing values in a signal space for modulating a carrier signal for transmission of a transformed version of the group of consecutive interpolation data items across the communication channel. In embodiments, the encoder neural network is configured to output an encoder output vector for transmission for each key item and each group of consecutive interpolation items between key items. In embodiments, the interpolation decoder neural network is configured to: for a group of consecutive interpolation items, recursively decode the noise-affected version of the encoder output vector received from a receiver to thereby reconstruct the encoder input vectors of successive interpolation items to generate a representation of the input data items of the group of consecutive interpolation items. In embodiments, the interpolation encoder neural network and the interpolation decoder neural network are both provided by a recurrent neural network, optionally a Long Short-Term Memory (LSTM) network. [0036] In embodiments, the encoder output vectors provide values in a signal space that represent in-phase and quadrature components for modulation of the carrier signal for transmission over the communication channel. In other embodiments, the encoder output vectors provide values defining a probability distribution sampleable to provide values in a signal space that represent in-phase and quadrature components for modulation of the carrier signal for transmission over the communication channel. 
In embodiments, the encoder output vectors provide values in the signal space that are assigned exactly to symbol values of a predetermined finite set S of symbols of a predefined alphabet of carrier modulation signal values transmittable by the transmitter over the communication channel. In embodiments, the predefined alphabet is a fixed, predefined constellation of symbols for digitally modulating the carrier signal to encode the input data for transmission over the communication channel. [0037] In other embodiments, the encoder output vectors provide values corresponding to a predetermined finite set of symbols of an existing channel encoder and decoder scheme for transmission of data over the communication channel. Thus, besides random noise applied by the communication channel, the transformation applied by the communication channel may, in embodiments, also include an existing channel code. Thus, in these embodiments, the encoder and decoder may learn an optimum mapping of the input information source to inputs of an existing channel code of the communications channel that reduces reconstruction errors at the output of the decoder neural network. Although acting as an outer code in these embodiments, this learned coding of the encoder and decoder is still optimised based on the characteristics of the communication channel to reduce reconstruction errors, even though in these alternative embodiments the communication channel includes an existing channel code. This is unlike existing modular source codes which are defined independently of the random transformation applied by any channel. [0038] Viewed from another aspect, the present disclosure provides a method of training an encoder and a decoder for use in a communication system in accordance with the above aspects and embodiments of the present disclosure for conveying sequences of correlated data items from an information source across a communication channel using joint source and channel coding. The method comprises: for input-output pairs of a set of training data items from the information source passed to the encoder, determining an objective function characterising a reconstruction error between input-output pairs of training data from the information source passed to the encoder and the representation of the input data reconstructed at the decoder; and using an appropriate optimisation algorithm operating on the objective function, updating the connecting node weights of the hidden layers of the key item encoder neural network, interpolation item encoder neural network, key item decoder neural network and interpolation item decoder neural network to seek to minimise the objective function. [0039] In embodiments, the encoder neural networks and decoder neural networks have been trained together using training data in which a model of the communication channel is used to estimate channel noise and add it to the transmitted signal values to generate a noise-affected version of the vector of signal values in the input-output pairs of training data. 
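As illustration only, the following sketch shows the shape of such joint training: the four networks share one reconstruction objective and one optimiser, with a channel model applied between encoding and decoding inside the training loop. The tiny linear stand-ins and their dimensions are assumptions for the sketch, not the disclosed architectures.

```python
import itertools
import torch
import torch.nn as nn

# Key/interpolation encoder-decoder pairs as toy stand-ins (assumed shapes).
key_enc, key_dec = nn.Linear(64, 16), nn.Linear(16, 64)
int_enc, int_dec = nn.Linear(16, 16), nn.Linear(16, 16)

params = itertools.chain(key_enc.parameters(), key_dec.parameters(),
                         int_enc.parameters(), int_dec.parameters())
opt = torch.optim.Adam(params, lr=1e-4)

x_t = torch.rand(8, 64)                            # input item of a training pair
z = int_enc(key_enc(x_t))                          # encode via both stages
z_noisy = z + 0.1 * torch.randn_like(z)            # channel model inside the loop
x_hat = key_dec(int_dec(z_noisy))                  # reconstruct the input item
loss = nn.functional.mse_loss(x_hat, x_t)          # reconstruction objective
opt.zero_grad(); loss.backward(); opt.step()       # one update of all four networks
```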
[0040] In embodiments, the encoder output layers of the interpolation item encoder neural network and the decoder input layers of the interpolation item decoder neural network are divided into ordered blocks, wherein the encoder output vector passed to the decoder input layer for each input-output pair during training is selected to have a random number of the ordered blocks, such that, following training, the interpolation item encoder neural network encodes descending ordering of information in increasing blocks of nodes, and such that, for interpolation items, the decoder reconstructs an increasingly refined representation of the input data with increasing blocks received in the noise-affected version of an encoder output vector. [0041] Viewed from another aspect, the present disclosure provides a computer readable medium comprising one or more instructions, which when executed cause a computing device to operate the above-described methods of training an encoder and a decoder for use in the above-described communication systems for conveying sequences of correlated data items from an information source across a communication channel using joint source and channel coding. [0042] It will be appreciated from the foregoing disclosure and the following detailed description of the examples that certain features and implementations described as being optional in relation to any given aspect of the disclosure set out above should be understood by the reader as being disclosed also in combination with the other aspects of the present disclosure, where applicable. Similarly, it will be appreciated that any attendant advantages described in relation to any given aspect of the disclosure set out above should be understood by the reader as being disclosed as advantages of the other aspects of the present disclosure, where applicable. That is, the description of optional features and advantages in relation to a specific aspect of the disclosure above is not limiting, and it should be understood that the disclosures of these optional features and advantages are intended to relate to all aspects of the disclosure in combination, where such combination is applicable. 
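Purely as an illustration of the training procedure of paragraphs [0038] and [0040], the sketch below shows one possible training step in which a random prefix of the ordered blocks is kept, reusing the awgn_channel helper from the previous sketch. The module interfaces (an encoder returning block-structured output, a decoder taking a num_blocks argument), the block count and the choice of mean squared error as the objective are all assumptions, not features of the disclosure.

```python
import torch

NUM_BLOCKS = 8  # assumed number of ordered blocks in the encoder output layer

def train_step(encoder, decoder, optimiser, x, snr_db):
    """One training iteration with random truncation of the ordered blocks.

    Keeping only a random prefix of the blocks at each step trains the
    encoder to place the most important information in the earliest
    blocks, so that the decoder reconstructs an increasingly refined
    representation as more blocks of the noise-affected output arrive.
    """
    z = encoder(x)                                      # (batch, NUM_BLOCKS, block_dim)
    k = torch.randint(1, NUM_BLOCKS + 1, (1,)).item()   # random number of blocks kept
    mask = torch.zeros_like(z)
    mask[:, :k, :] = 1.0

    z_hat = awgn_channel(z * mask, snr_db)              # channel model from the sketch above
    z_hat = z_hat * mask                                # untransmitted blocks stay exactly zero
    x_hat = decoder(z_hat, num_blocks=k)                # decoder told how many blocks arrived

    loss = torch.nn.functional.mse_loss(x_hat, x)       # reconstruction error objective
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
    return loss.item()
```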
BRIEF DESCRIPTION OF THE DRAWINGS [0043] Embodiments of the invention are further described hereinafter with reference to the accompanying drawings, in which:
Figure 1 shows a communication system for conveying sequences of correlated data items, such as video, from an information source across a communications channel using joint source and channel coding in accordance with an example of the present disclosure;
Figure 2 shows the encoding, transmission and reconstruction of sequences of correlated data items in the form of video frames from an information source using a communication system in accordance with an example of the present disclosure;
Figure 3 shows an example run time method for the transmitter and the encoder in accordance with an example of the present disclosure;
Figure 4 shows an example run time method for the receiver and the decoder in accordance with an example of the present disclosure;
Figure 5 shows an example training time method for the neural networks of the encoder and decoder in accordance with an example of the present disclosure;
Figure 6 shows a structure of a communication system in accordance with an example of the present disclosure, showing the use of a key item encoder and decoder and an interpolation item encoder and decoder encoding the interpolation information in input data space;
Figure 7 shows an architecture of a key item encoder and decoder in use in the example communication system of Figure 6 for encoding, transmitting and reconstructing data items selected as key items;
Figure 8 shows an architecture of an interpolation item encoder and decoder and motion and residual modules in use in the example communication system of Figure 6 for encoding, transmitting and reconstructing data items selected as interpolation items;
Figure 9 shows an example run time method for the transmitter and the encoder neural network in accordance with the example communication system of Figure 6;
Figure 10 shows an example run time method for the receiver and the decoder neural network in accordance with the example communication system of Figure 6;
Figure 11 shows an architecture of a bandwidth allocation neural network in use in the example communication system of Figure 6 for determining a number of blocks of the interpolation item encoder output vector for transmission for the items of a group of items;
Figure 12 shows a structure of a communication system in accordance with an example of the present disclosure, showing the use of a key item encoder and decoder and an interpolation item encoder and decoder encoding the interpolation information in latent space;
Figure 13 shows an architecture of an interpolation item encoder and decoder for use in the example communication system of Figure 12 for encoding, transmitting and reconstructing data items selected as interpolation items;
Figure 14 shows example algorithms for controlling the encoding and decoding of data items for use in the example communication system of Figure 12;
Figure 15 shows an example run time method for the transmitter and the encoder neural network in accordance with the example communication system of Figure 12;
Figure 16 shows an example run time method for the receiver and the decoder neural network in accordance with the example communication system of Figure 12;
Figure 17 shows a performance of an example of the communication system of Figure 6 for encoding, transmitting and reconstructing example sequences of correlated data items at various channel signal-to-noise (CSNR) ratios;
Figure 18 shows a performance comparison of the performance envelope of the example of the communication system of Figure 6 as shown in Figure 17 and the performance of separate source coding by H.264 (i.e. MPEG-4 Part 10) and channel coding by a low-density parity-check (LDPC) code at different code rates for encoding, transmitting and reconstructing the example sequences of correlated data items at various channel signal-to-noise (CSNR) ratios;
Figure 19 shows a performance comparison of the performance envelope of another example of the communication system of Figure 6 and the performance of separate source coding by H.265 (i.e. MPEG-H Part 2) and channel coding by a low-density parity-check (LDPC) code at different code rates for encoding, transmitting and reconstructing other example sequences of correlated data items over a channel having AWGN at various signal-to-noise (SNR) ratios;
Figure 20 shows a visual comparison of reconstructed frames of an example video encoded and transmitted across a channel having additive white Gaussian noise at 13dB, 3dB and -4dB, by an example of the communication system of Figure 6 trained at different SNRs and by separate source coding by H.264 (i.e. MPEG-4 Part 10) and channel coding by a low-density parity-check (LDPC) code using different channel code schemes;
Figure 21 shows a performance comparison of another example of the communication system of Figure 6 and the performance of separate source coding by H.264/H.265 and channel coding by a low-density parity-check (LDPC) 3/4 16QAM code for encoding, transmitting and reconstructing other example sequences of correlated data items over a channel having AWGN at a signal-to-noise (SNR) ratio of 20dB for different bandwidth compression rates; and
Figure 22 shows a performance comparison of the performance envelope of another example of the communication system of Figure 6, showing the difference in performance of the system having a uniform bandwidth allocation to the frames in a group of pictures, a non-uniform bandwidth allocation in accordance with a pre-determined heuristic, and an optimal bandwidth allocation based on embodiments of the present disclosure in which a bandwidth allocation module is used.
DETAILED DESCRIPTION
[0044] Hereinafter, embodiments of the disclosure are described with reference to the accompanying drawings. However, it should be appreciated that the disclosure is not limited to the embodiments, and all changes and/or equivalents or replacements thereto also belong to the scope of the disclosure. The same or similar reference denotations may be used to refer to the same or similar elements throughout the specification and the drawings. [0045] As used herein, the terms “have,” “may have,” “include,” or “may include” a feature (e.g., a number, function, operation, or a component such as a part) indicate the existence of the feature and do not exclude the existence of other features. [0046] As used herein, the terms “A or B,” “at least one of A and/or B,” or “one or more of A and/or B” may include all possible combinations of A and B. For example, “A or B,” “at least one of A and B,” “at least one of A or B” may indicate all of (1) including at least one A, (2) including at least one B, or (3) including at least one A and at least one B.
[0047] As used herein, the terms “first” and “second” may modify various components regardless of importance and do not limit the components. These terms are only used to distinguish one component from another. For example, a first user device and a second user device may indicate different user devices from each other regardless of the order or importance of the devices. For example, a first component may be denoted a second component, and vice versa without departing from the scope of the disclosure. [0048] It will be understood that when an element (e.g., a first element) is referred to as being (operatively or communicatively) “coupled with/to,” or “connected with/to” another element (e.g., a second element), it can be coupled or connected with/to the other element directly or via a third element. In contrast, it will be understood that when an element (e.g., a first element) is referred to as being “directly coupled with/to” or “directly connected with/to” another element (e.g., a second element), no other element (e.g., a third element) intervenes between the element and the other element. [0049] As used herein, the terms “configured (or set) to” may be interchangeably used with the terms “suitable for,” “having the capacity to,” “designed to,” “adapted to,” “made to,” or “capable of” depending on circumstances. The term “configured (or set) to” does not essentially mean “specifically designed in hardware to.” Rather, the term “configured to” may mean that a device can perform an operation together with another device or parts. [0050] For example, the term “processor configured (or set) to perform A, B, and C” may mean a generic-purpose processor (e.g., a CPU or application processor) that may perform the operations by executing one or more software programs stored in a memory device or a dedicated processor (e.g., an embedded processor) for performing the operations. [0051] The terms as used herein are provided merely to describe some embodiments thereof, but not to limit the scope of other embodiments of the disclosure. It is to be understood that the singular forms “a,” “'an,” and “the” include plural references unless the context clearly dictates otherwise. All terms including technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the embodiments of the disclosure belong. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein. In some cases, the terms defined herein may be interpreted to exclude embodiments of the disclosure. [0052] As used throughout the Figures, features or method steps are shown outlined in broken lines to indicate that such features or method steps are optional features for provision in some embodiments, but which are not provided in all embodiments to implement aspects of the disclosure. That is, aspects of the disclosure do not require these optional features to be included, or steps to be performed, and they are merely included in illustrative embodiments to provide further optional implementation details. [0053] Reference will now be made to Figure 1 and Figure 2. 
Figure 1 shows a communication system 100 comprising a transmitter 110 for conveying sequences of correlated data items, such as video, from an information source 111 across a communication channel 120 to a receiver 130 using joint source and channel coding in accordance with an example of the present disclosure. Figure 2 shows the encoding, transmission and reconstruction of sequences of correlated data items in the form of video frames from an information source 111 using the communication system 100. [0054] The transmitter 110 and receiver 130 may each be part of respective electronic devices for transmitting or receiving sequences of correlated data items, such as video. For example, the electronic device coupled to the transmitter 110 or receiver 130 may be a smartphone, a tablet, a personal computer such as a desktop computer, a laptop computer, a netbook computer, a workstation, a server, a wearable device such as a smart watch, smart glasses, a head-mounted device or smart clothes, an airborne or land drone, a robot or other autonomous device such as industrial or home robots, a security control panel, a gaming console, a security camera, a microphone, or an Internet of Things device for sensing or monitoring, such as a smart meter, various sensors, an electric or gas meter, a medical device such as a portable medical measuring device, a blood sugar measuring device, a heartbeat measuring device, or a body temperature measuring device, a navigation device, a global positioning system (GPS) receiver, an event data recorder (EDR), a flight data recorder (FDR), avionics, or point of sale devices. The electronic device may also be a base station or relay in a radio communication system, which may be capable of operating in accordance with one or more communications standards, such as the wireless communication standards 802.11xx for WiFi maintained by the Institute of Electrical and Electronics Engineers (IEEE), and the 3G, LTE and NR standards for cellular communications maintained by the 3rd Generation Partnership Project (3GPP), or any other radio transceiver for receiving signals transmitted across the communications channel, and decoding them for onward transmission, for example on the Internet. [0055] The transmitter 110 includes an information source 111, at least one processor 112, memory 113 and a carrier modulator 118 coupled to an antenna 119 for transmitting data over communication channel 120. A bus system (not shown) may be provided which supports communication between the at least one processor 112, memory 113, carrier modulator 118 and antenna 119. [0056] The information source 111 is a source of data items to be transmitted over the communication channel 120 by the transmitter 110. The information source 111 is a source of data provided as a sequence of correlated data items x1, x2, x3, … xn in which the correlation is manifested as some degree of redundancy in the data in adjacent items in the sequence. The data items x1, x2, x3, … xn may, for example, be frames of a video in which the pixel data presented in consecutive video frames may be correlated in location and brightness. In the example information source 111 shown in Figure 2, which shows a video captured by a security camera of a largely static scene of a harbour under constant illumination, there may be a significant amount of redundancy from one video frame to the next.
Where the video captures a moving item in a scene, such as a moving boat in the scene, or a moving scene caught by a panning camera, differences between one frame and the next may be analysable by optical flow analysis. The information source 111 is not limited to being a source of video data, and the present disclosure is intended to be applicable to sources of any suitable sequences of correlated data items, where the correlation may occur in time or space, or both, or along any other suitable dimension over which the data items are correlated. For example, the information source 111 may be a source of sensor data from one or more sensors sensing one or more physical characteristics of a system that vary over time or location within the physical system in such a way that the data items are correlated. The data items may be captured in steps of equal or unequal intervals along the dimension in which they are correlated. [0057] The information source 111 is any information source suitable for arranging as a sequence of source symbols or fundamental data elements (for example, bits). For example, where the information source 111 is a video source that provides the sequence of data items x1, x2, x3, … xn as video frames, the correlated data items x1, x2, x3, … xn may each be represented by a 3D matrix with a depth based on the colour channels (normally 3 channels for RGB), a height, H, based on the height of the frame and a width, W, based on the width of the frame, i.e. xn ∈ ℝH×W×3. [0058] The information source 111 may generate the correlated data items locally to the transmitter (such as a video captured by a camera coupled to the transmitter, such as in the electronic device of which the transmitter is a part) or it may be a source of data stored locally to the transmitter that was generated elsewhere, remotely from the transmitter 110. The encoding and transmission of the data items from the information source 111 may be performed asynchronously with the time at which the data items were generated, or it may be performed live or in real time, with the encoding being performed largely contemporaneously to the generation of the data items. Either way, the information source 111 may provide the data items for encoding and transmission as a static media file that is encoded and transmitted and then reconstructed and stored at the receiver 130 where it can be viewed in a player or conveyed further. The information source 111 may also provide the data items for encoding and transmission as a stream of video frames for encoding and transmission on the fly, which are then to be reconstructed at the receiver 130 where it can be viewed in a player or conveyed further for replay as a streaming video, in which case the received video stream may or may not be stored locally at the receiver to allow subsequent asynchronous replay. [0059] The information source 111 may store or generate ‘raw’ or ‘uncompressed’ data directly or indirectly representative of characteristics of the information source, to allow faithful reproduction of the information source 111 by a given combination of data processing hardware appropriately configured, for example by software or firmware. Alternatively, the data items may be pre-processed before being passed to the encoder 115 through an initial form encoding which may already compress the data items. 
This does not preclude the encoder of the present disclosure learning a further optimal joint source channel coding for the communication channel 120 to minimise reconstruction errors. Further still, the data items may represent segments of the data provided by the information source 111. For example, rather than each data item representing an individual video frame, the video frames may be divided into blocks or segments, with each block being represented by a separate sequence of data items. [0060] The processor 112 executes instructions that can be loaded into memory 113. The processor 112 can include any suitable number(s) and type(s) of processors or other devices in any suitable arrangement. Example types of processor 112 include microprocessors, microcontrollers, digital signal processors, field programmable gate arrays and application specific integrated circuits. [0061] The memory 113 may be provided by any structure(s) capable of storing and facilitating retrieval of information (such as data, program code, and/or other suitable information on a temporary or permanent basis). The memory 113 can represent a random access memory or any other suitable volatile or non-volatile storage device(s). The memory 113 may also contain one or more components or devices supporting longer-term storage of data, such as a read-only memory, hard drive, flash memory, or optical disc, which may store software code for loading into the memory 113 at runtime. In use, the processor 112 and memory 113 provide a Runtime Environment (RTE) 114 in which instructions or code loaded into the memory 113 can be executed by the processor to generate instances of software modules in the Runtime Environment 114. [0062] The memory 113 comprises instructions which, when executed by the one or more processors 112, cause the one or more processors 112 to instantiate an encoder 115 in the RTE 114. The encoder 115 includes a key item encoder neural network 115k and an interpolation item encoder neural network 115i for encoding data items selected from the sequence of correlated data items to serve as key items and interpolation items respectively. The encoder 115 may include a motion and residual module 115m for deriving interpolation information in the input data space for encoding interpolation items, for example using optical flow analysis in the case of video data. The encoder 115 may also include a bandwidth allocation module 115b for determining the bandwidth to be allocated to the transmission of the interpolation items. The encoder 115 may also include an encoder control module 115c to control the operation of the key item encoder neural network 115k and the interpolation item encoder neural network 115i to encode data items from the correlated sequence for transmission. [0063] By implementing these component functional modules, the encoder 115 may be configurable by instructions stored in memory 113 and implemented in RTE 114 to carry out the runtime methods described in relation to Figure 3, Figures 6-9 and 11, and Figures 12-15 for encoding sequences of input data items x1, x2, x3, … xn from information source 111 to sequences of encoder output vectors z1, z2, z3, … zn being used for providing values in a signal space for modulating a carrier signal for transmission of a transformed version of the data items across a communication channel 120, the signal values provided from the encoder output vectors z1, z2, z3, … zn representing a transformed version of the input data items.
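By way of a concrete, non-limiting illustration of such an encoder mapping, a minimal convolutional key item encoder might be sketched as follows; the class name, layer counts, kernel sizes and activations are placeholders assumed for this sketch, not features of the disclosure.

```python
import torch
import torch.nn as nn

class KeyItemEncoder(nn.Module):
    """Illustrative convolutional key item encoder.

    Maps a video frame x_t (a 3-channel H x W image, held here in the
    channels-first layout PyTorch uses) to a much shorter latent vector
    z_t whose values can be used as channel input symbols.
    """
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2, padding=2),
            nn.PReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2, padding=2),
            nn.PReLU(),
            nn.Conv2d(64, 16, kernel_size=5, stride=2, padding=2),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 3, H, W) -> z: (batch, k) with k << 3*H*W,
        # giving the bandwidth compression described in the text.
        return self.net(x).flatten(start_dim=1)
```

The strided convolutions reduce the spatial resolution at each stage, so the flattened output vector is far shorter than the input frame, consistent with the coded and compressed representation described above.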
[0064] In more detail, in embodiments, the encoder 115 is configured to receive the data items x1, x2, x3, … xn, in the example the video frames, as input vectors for providing to input layers of the key item encoder neural network 115k and/or the interpolation item encoder neural network 115i. Once the encoder 115 has encoded data items from the sequence of correlated data items into encoder output vectors z1, z2, z3, … zn, the encoder output vectors z1, z2, z3, … zn are passed to the carrier modulator 118. The encoder output vectors z1, z2, z3, … zn are used for providing values in a signal space for modulating, by the carrier modulator 118, a carrier signal for transmission of a transformed version of the data items across a communication channel 120. The carrier modulator 118 may operate to in use directly encode the in-phase (I) and quadrature (Q) components of one or more carriers or subcarriers with signal values provided to the carrier modulator 118 by the encoder 115 using an appropriate modulation technique to provide a channel input signal 118i for transmission by antenna 119 across the communication channel 120. Where multiple carriers or subcarriers are encoded simultaneously, a suitable multiplexing technique such as orthogonal Frequency-Division Multiplexing (OFDM) may be used. As shown in Figure 2, the carriers encoding the encoder output vectors z1, z2, z3, … zn in the channel input signal 118i are then transmitted by the antenna 119 onto the communication channel 120. In other embodiments, the encoder 115 may be configured to output encoder output vectors z1, z2, z3, … zn that may provide values defining a probability distribution sampleable to provide values in a signal space that represent in-phase and quadrature components for modulation of the carrier signal for transmission over the communication channel. For example, the key item encoder neural network 115k and/or the interpolation item encoder neural network 115i may be configured as variational autoencoders. [0065] The carrier modulator 118 and antenna 119 may be of conventional construction and may be configured to encode the carriers/subcarriers with signal values of complex IQ representations. The carrier modulator 118 may be configured to freely modulate the carriers/subcarriers with any IQ signal value within the signal space passed to it. [0066] In other embodiments, the encoder output vectors provide values in the signal space that are assigned exactly to symbol values of a predetermined finite set S of symbols of a predefined alphabet of carrier modulation signal values transmittable by the transmitter over the communication channel. In embodiments, the predefined alphabet is a fixed, predefined constellation of symbols for digitally modulating the carrier signal to encode the input data for transmission over the communication channel. For example, the carrier modulator 118 may be configured to only be able to modulate the carriers/subcarriers with IQ values corresponding to one or more finite, fixed sets or ‘constellations’ of symbols such as by quadrature amplitude modulation (QAM) or binary phase-shift keying (BPSK). For example, the carrier modulator 118 and antenna 119 may be compatible with the 5G New Radio standard such that the transmittable symbols of IQ values are mapped to the 16-QAM, 64-QAM or 256-QAM constellations. 
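As a rough sketch of such a constrained modulator interface, the following maps arbitrary complex encoder outputs to the nearest points of a unit-average-power 16-QAM constellation. Training through the hard assignment would require a differentiable surrogate such as a straight-through estimator; that, along with the function names used here, is an implementation assumption beyond what the disclosure specifies.

```python
import torch

def make_16qam() -> torch.Tensor:
    """Unit-average-power 16-QAM constellation as complex points."""
    levels = torch.tensor([-3.0, -1.0, 1.0, 3.0])
    points = torch.cartesian_prod(levels, levels)
    const = torch.complex(points[:, 0], points[:, 1])
    return const / const.abs().pow(2).mean().sqrt()

def map_to_constellation(z: torch.Tensor, const: torch.Tensor) -> torch.Tensor:
    """Assign each complex channel symbol to its nearest constellation point.

    z holds the I and Q values produced by the encoder output layer,
    interpreted as complex numbers; the result contains only symbols
    the modulator is permitted to transmit.
    """
    # Pairwise distances: (num_symbols, constellation_size).
    d = (z.unsqueeze(-1) - const).abs()
    idx = d.argmin(dim=-1)
    return const[idx]
```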
The carrier modulator 118 and antenna 119 may be hard-wired to work only with these symbols, and they may not be able to transmit signal values or symbols that are not within these standard constellation sets. The encoder 115 may be configured to learn the optimum encoding within the available constellation of transmittable IQ signal values. [0067] The communication channel 120 may be used to convey information from one or more such transmitters 110 to one or more such receivers 130. The communication channel 120 may be a physical connection, e.g., a wire, or a wireless connection such as a radio channel as in the example shown in Figure 1. There is an upper limit to the performance of a communication system 100 which depends on the system specified. In addition, there is also a specific upper limit for all communication systems which no system can exceed. This fundamental upper limit is an upper bound to the maximum achievable rate of reliable communication over a noisy channel and is known as Shannon’s capacity. [0068] The communication channel 120, including the noise associated with such a channel, is modelled and defined by its characteristics and statistical properties. Channel characteristics can be identified by comparing the input and output of the channel, the output of which is likely to be a randomly distorted version of the input. The distortion indicates channel statistics such as additive noise, or other imperfections in the communication medium such as fading or synchronization errors between the transmitter 110 and the receiver 130. Channel characteristics include the distribution model of the channel noise, slow fading and fast fading. Common channel models include the binary symmetric channel and the additive white Gaussian noise (AWGN) channel. [0069] The receiver 130 includes at least one processor 132, memory 133 and a carrier demodulator 138 coupled to an antenna 139 for receiving data over communication channel 120. A bus system (not shown) may be provided which supports communication between the at least one processor 132, memory 133, carrier demodulator 138 and antenna 139. The receiver 130 thus includes an information sink 131 to which the reconstructed representation of the input data decoded by the decoder neural network 135 is provided. [0070] Similarly to the processor 112 and memory 113 of the transmitter 110, in the receiver 130, the processor 132 executes instructions that can be loaded into memory 133, and in use provides a Runtime Environment (RTE) 134 in which instructions or code loaded into the memory 133 can be executed by the processor to generate instances of software modules in the Runtime Environment 134. The memory 133 comprises instructions which, when executed by the one or more processors 132, cause the one or more processors 132 to instantiate a decoder 135. [0071] The antenna 139 of the receiver 130 receives as a channel output 138o from the communications channel 120 noise-affected versions ẑ1, ẑ2, ẑ3, … ẑn
of the encoder output vectors z1, z2, z3, … zn transmitted by the antenna 119 of the transmitter 110, the noise having been added by the communication channel 120. The carrier demodulator 138 demodulates the received channel output to recover these noisy versions ẑ1, ẑ2, ẑ3, … ẑn
of the encoder output vectors z1, z2, z3, … zn, for example, by coherent demodulation, and passes them to the decoder 135 in the RTE 134. These noisy demodulated versions
ẑ1, ẑ2, ẑ3, … ẑn
of the encoder output vectors z1, z2, z3, … zn are then mapped by the decoder 135 to a reconstructed representation
x̂1, x̂2, x̂3, … x̂n
of the originally input sequence of data items x1, x2, x3, … xn where they are passed to the information sink 131 at which a reconstruction of the information source 111 is collected for viewing, storing or conveying further. As can be seen in Figure 2, the information sink 131 collects a decoded reconstruction x̂1, x̂2, x̂3, … x̂n of the video frames x1, x2, x3, … xn passed to the encoder 115 and transmitted over the communications channel 120. [0072] In detail, the decoder 135 includes a key item decoder neural network 135k and an interpolation item decoder neural network 135i for decoding data items indicated as key items and interpolation items respectively. The decoder 135 may include a motion and residual module 135m for reconstructing interpolation data items in the input data space using decoded interpolation information provided by the interpolation item decoder neural network 135i. The decoder 135 may also include a decoder control module 135c to control the operation of the key item decoder neural network 135k and the interpolation item decoder neural network 135i to decode data items from the correlated sequence for provision to the information sink 131 at which the reconstructed representation of the input data items from the information source 111 is collected. [0073] By implementing these component functional modules, the decoder 135 may be configurable by instructions stored in memory 133 and implemented in RTE 134 to carry out the runtime methods described in relation to Figure 4, Figures 6-8, 10 and 11, and Figures 12-14 and 16 for decoding sequences of noise-affected versions ẑ1, ẑ2, ẑ3, … ẑn
of the encoder output vectors z1, z2, z3, … zn received over the communications channel 120 to a reconstructed representation
x̂1, x̂2, x̂3, … x̂n
of the originally input sequence of data items x1, x2, x3, … xn. [0074] The encoder control module 115c may be configured for passing data items from the information source 111 to the key item encoder neural network 115k or the interpolation item encoder neural network 115i for encoding. The encoder control module 115c may be configurable to operate as a static control module, which may pass data items from the information source 111 to the key item encoder neural network 115k or the interpolation item encoder neural network 115i based on a fixed order specified by a predetermined group of items, such as a fixed group of pictures for a video encoding scheme (e.g. every 7th item may be a key item, with the intervening items all being interpolation items). The encoder control module 115c may also be configurable to operate instead as a dynamic control module, which may pass data items from the information source 111 to the key item encoder neural network 115k or the interpolation item encoder neural network 115i based on a dynamically assigned order specified by, for example, a decision agent implemented as a Markov Decision Process. [0075] The decoder control module 135c may be configured for passing the noise-affected versions ẑ1, ẑ2, ẑ3, … ẑn
of the encoder output vectors z1, z2, z3, … zn received over the communications channel 120 to either the key item decoder neural network 135k or the interpolation item decoder neural network 135i for decoding based on whether the respective data item is indicated as a key item or an interpolation item. Where a fixed group of items structure is used, the passing of the noise-affected versions ẑ1, ẑ2, ẑ3, … ẑn of the encoder output
vectors z1, z2, z3, … zn received over the communications channel 120 may be based on a fixed order specified by a predetermined group of items. Where a dynamic group of items is used, the receiver 130 may receive separate signalling from the transmitter 110 indicating the sequence of key items and interpolation items based on the dynamic operation of the encoder control module 115c. [0076] The key item encoder neural network 115k and key item decoder neural network 135k are formed as a complementary pair which may be configured as an autoencoder for encoding and decoding data items in the sequence selected as key items independent of any other data item in the sequence. That is, the key item encoder neural network 115k is for encoding data items selected from the sequence to serve as key items that can be directly reconstructed by key item decoder neural network 135k, the encoding and decoding of key items being independent of any other data item in the sequence. [0077] Similarly, the interpolation item encoder neural network 115i and the interpolation item decoder neural network 135i are formed as a complementary pair which may be configured as an autoencoder, a recurrent neural network, a long short-term memory, or any other suitable neural network configuration, for encoding and decoding data items in the sequence selected as interpolation items by interpolation at least in relation to a previous data item in the sequence. That is, the interpolation item encoder neural network 115i is for encoding data items selected from the sequence to serve as interpolation items that can be reconstructed by the interpolation item decoder neural network 135i, and other components of the decoder 135 as needed, using interpolation, the encoding and decoding of interpolation items using data representing the input data item and at least one previous data item in the sequence. [0078] Neural networks are machine learning models that employ multiple layers of nonlinear units (known as artificial “neurons”) to generate an output from an input. Neural networks may be composed of several layers, each layer formed from nodes. Neural networks can have one or more hidden layers in addition to the input layer and the output layer. The output of each layer is used as the input to the next layer (the next hidden layer or the output layer) in the network. Each layer generates an output from its input using a set of parameters, which are optimized during the training stage. For example, each layer comprises a set of nodes, the nodes having learnable biases and their inputs having learnable weights. Learning algorithms can automatically tune the weights and biases of nodes of a neural network to optimise the output in order to minimise an objective function using an optimisation algorithm such as gradient descent or stochastic gradient descent. [0079] The key item encoder neural network 115k has an input layer having nodes for receiving input data x1, x2, x3, … xn for encoding representative of input data items from the information source. [0080] The interpolation item encoder neural network 115i also has an input layer having nodes for receiving input data for encoding. The data input to the input layer of the interpolation item encoder neural network 115i depends on the neural network architecture, and whether the interpolation item encoder neural network 115i encodes interpolation information in the input data space or the latent space. 
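The two cases are elaborated in paragraph [0081] immediately below; purely for illustration, the corresponding encoder inputs might be assembled as in the following sketch, in which the function and argument names and the concatenation layout are assumptions.

```python
import torch

def interp_encoder_input_pixel_space(x_t, x_prev, motion, residual):
    """Input-data-space variant: the current frame is presented together
    with the previous frame and the motion/residual information produced
    by the motion and residual module (e.g. optical flow for video)."""
    # All tensors assumed (batch, channels, H, W); concatenate channel-wise.
    return torch.cat([x_t, x_prev, motion, residual], dim=1)

def interp_encoder_input_latent_space(z_t, z_prev):
    """Latent-space variant: the key item encoder first maps the current
    and previous frames to latent vectors z_t and z_{t-1}, and the
    interpolation encoder works on those instead of raw pixels."""
    return torch.cat([z_t, z_prev], dim=-1)
```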
[0081] That is, where the interpolation happens in the input data space, the input layer of the interpolation item encoder neural network 115i may receive input data xn for encoding representative of input data items from the information source (relating to the current and at least the previous input data item, i.e. xn and xn-1), as well as motion and residual information generated by the motion and residual module 115m. In other arrangements, where the interpolation happens in the latent space, the input layer of the interpolation item encoder neural network 115i may receive the output vector zn of the key item encoder neural network 115k for current and at least the previous input data item in the sequence (i.e. zn and zn-1). [0082] The key item encoder neural network 115k and interpolation item encoder neural network 115i have respective encoder output layers that output encoder output vectors z1, z2, z3, … zn that are used for providing values in a signal space for modulating, by the carrier modulator 118, a carrier signal for transmission by antenna 119 over communications channel 120. [0083] The key item encoder neural network 115k and interpolation item encoder neural network 115i have hidden layers of connecting nodes with weights that, in use, map vectors of values received at nodes in the encoder input layer thereof to the encoder output vectors such that the transmitter 110 transmits a transformed version z1, z2, z3, … zn of the input data items x1, x2, x3, … xn across the communication channel 120. [0084] Similarly, the key item decoder neural network 135k and interpolation item decoder neural network 135i have hidden layers of connecting nodes with weights that, in use, map vectors of values
ẑ1, ẑ2, ẑ3, … ẑn
received at nodes in a decoder input layer thereof to decoder output vectors
x̂1, x̂2, x̂3, … x̂n provided at nodes of a decoder output layer thereof, the decoder output vectors
x̂1, x̂2, x̂3, … x̂n
providing a reconstruction of the encoder input vector to generate a representation of the input data item. [0085] In accordance with the present disclosure, the connecting node weights of the key item encoder neural network 115k and interpolation item encoder neural network 115i have been trained together with the respective complementary key item decoder neural network 135k and interpolation item decoder neural network 135i, to minimise an objective function characterising a reconstruction error between input-output pairs of training data items. The training of the connecting node weights may be performed using an appropriate optimization algorithm operating on the objective function. [0086] In this way, the input data from the information source 111 (such as the image or video) encoded and transmitted by the transmitter 110 can be received and decoded at the receiver 130 to allow a reconstructed representation of the original input image or video to be generated at information sink 131. [0087] The key item encoder neural network 115k and interpolation item encoder neural network 115i may be configured to create a coded and compressed representation of the input data as an encoder output vector for transmission across the communications channel. For example, the encoder output vectors may contain less information than the encoder input vectors for each data item. Further, the encoder 115 may be configured such that the bandwidth allocation module 115b may interoperate with the key item encoder neural network 115k, interpolation item encoder neural network 115i and encoder control module 115c to encode the data items selected as interpolation items in such a way that an available bandwidth in the communication channel 120, or data budget, is shared between the data items so that the data items are compressed, for example such that a channel use constraint is met. In embodiments, the bandwidth allocation module 115b may be provided by a neural network configured, for example by reinforcement learning, to select, for each interpolation item in a group of pictures, a number of blocks of the interpolation item encoder output vector for transmission, where the interpolation item encoder neural network 115i has been trained to encode increasing information with an increased number of encoder output vector blocks. In other embodiments, where the interpolation item encoder neural network 115i is provided as a recurrent neural network (RNN), such as a long short-term memory (LSTM), that encodes successive interpolation items into an internal cell state thereof for transmission as an encoder output vector, the bandwidth allocation module 115b may be configured to work with the encoder control module 115c to dynamically select successive data items as interpolation items to be encoded together into the same recurrently updated encoder output vector for transmission, such that successive interpolation items are encoded together in a compressed transmission for recurrent decoding. Similarly then, the decoder may be configured to decode the noise-affected version ẑ
of the encoder output vector back into an uncompressed reconstruction
x̂
of the input data. In embodiments, the quantity of information included in the output vector of the key item encoder neural network and/or the interpolation item encoder neural network is smaller than the input vector. [0088] Although the examples described above disclose the output of the encoder 115 being a vector of signal values in IQ space transmittable by the transmitter over the communication channel 120, in other embodiments the communication channel 120 may be one that also includes an existing channel encoder and decoder scheme, in which the signal space of the channel input may be the predetermined finite set of symbols of the channel code (which could be bit values) for modulating a signal carrier for providing input signals in the alphabet in the input signal space for the communication channel 120. Thus, besides random noise applied by the communication channel, the transformation applied by the communication channel 120 may, in embodiments, also include an existing channel code. Thus in these embodiments, the signal space may be a message alphabet for an existing channel code by which the input signals to the given communications channel are modulated. In this case, the encoder output vector will be mapped into the message alphabet of the corresponding channel code (rather than, for example, the raw IQ values transmittable by the transmitter). The noise-affected version of the encoder input vector input to the decoder neural network 135 may correspond to the hard-decoded message of the existing channel decoder. In this respect, in these embodiments, the encoder neural network 115 and decoder neural network 135 may learn an optimum mapping of the input information source 111 to inputs of an existing channel code of the communications channel 120 that reduces reconstruction errors at the output 131 of the decoder neural network 135. Although acting as an outer code in these embodiments, this learned coding of the encoder neural network 115 and decoder neural network 135 is still optimised based on the characteristics of the communication channel 120 to reduce reconstruction errors, even though in these alternative embodiments the communication channel 120 includes an existing channel code. This is unlike existing modular source codes which are defined independently of the random transformation applied by any channel. [0089] Reference will now be made to Figures 3 and 4 which set out in more detail how the transmitter 110 and receiver 130, and the trained neural networks of the encoder 115 and decoder 135, operate to transmit data from information source 111 across communication channel 120 by joint source and channel coding. [0090] As shown in Figure 3, the run time method 300 for the transmitter 110 and the encoder 115 starts in step 301 in which a sequence of correlated data items for intervals t = 1, 2, 3 … n, such as frames of a video, is received from an information source 111 for passing to the key item encoder neural network 115k or interpolation item encoder neural network 115i as encoder input vectors x1, x2, x3, … xn for encoding. In examples, the data items may be received at the encoder control module 115c which may select each data item in the sequence as a key item or an interpolation item, according to a static or dynamic allocation of a group of items, and then pass them to the key item encoder neural network 115k or interpolation item encoder neural network 115i as encoder input vectors x1, x2, x3, … xn for processing accordingly.
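Paragraphs [0091] and [0092] below describe static and dynamic control in more detail; as a rough illustration of the static case only, the dispatch of steps 301 to 304 might be sketched as follows, where the group-of-pictures length, the function names and the encoder interfaces are assumptions (the interpolation encoder is shown taking only the previous frame as side information, which is a simplification).

```python
GOP_SIZE = 7  # assumed group-of-pictures length: 1 key item + 6 interpolation items

def select_item_type(t: int) -> str:
    """Static encoder control: every GOP_SIZE-th item is a key item,
    the items in between are interpolation items."""
    return "key" if t % GOP_SIZE == 0 else "interpolation"

def encode_sequence(frames, key_encoder, interp_encoder):
    """Dispatch each frame to the appropriate encoder, mirroring the
    selection between steps 303 and 304 of the run time method of
    Figure 3."""
    outputs, prev = [], None
    for t, x_t in enumerate(frames):
        if select_item_type(t) == "key":
            outputs.append(key_encoder(x_t))
        else:
            outputs.append(interp_encoder(x_t, prev))
        prev = x_t
    return outputs
```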
[0091] In embodiments, the encoder control module 115c may be a static control module configured to select data items from the sequence of data items as key items and interpolation items according to a fixed order specified by a predetermined group of items. [0092] Alternatively, or in addition, the encoder control module 115c may be a dynamic control module having a dynamic decision agent configured to dynamically choose whether the input data item xt is to serve as a key item or an interpolation item. The dynamic decision agent may be configured to dynamically choose whether the input data item xt is to serve as a key item or an interpolation item based at least on one or more of: the current data item xt; the number of data items transmitted since last key item; a current average channel utilisation; and a channel utilisation constraint. The dynamic decision agent may be configured to dynamically choose whether the input data item xt is to serve as a key item or an interpolation item so that the average channel utilisation is below the channel utilisation constraint. To facilitate decoding by the decoder 135 where the ordering of key and interpolation items is not fixed, and is instead dynamically assigned, the dynamic decision agent may be configured to generate data mapping, for the sequence of data items, which data items are key data items and which data items are interpolation data items. This mapping is for transmission across the communications channel 120 and for use by the decoder 135 to determine whether the received noise-affected version of an encoder output vector zt should be decoded by the key item decoder neural network 135k or the interpolation item decoder neural network 135i. [0093] Thus, in step 302, the encoder 115, or in embodiments more specifically the encoder control module 115c, may determine, statically or dynamically, whether the data item xt is selected as a key item or an interpolation item. [0094] If the data item is selected as a key item, in step 303, the data item xt is passed to the key item encoder neural network 115k where it is encoded to a latent encoder output vector zt based on the input data item xt and being independent of any other data item in the sequence. As a key item, the data item xt can be directly reconstructed by the decoder from the noise- affected version
ẑt
of the encoder output vector zt alone. [0095] If the data item is selected as an interpolation item, in step 304, the interpolation item encoder neural network 115i is used to encode the data item to a latent encoder output vector zt representative of the input data item xt. The encoding by the interpolation item encoder neural network 115i may be performed using data representing the input data item xt and at least one previous data item in the sequence xt-1. The encoding by the interpolation item encoder neural network 115i may also be performed using data representing the input data item xt and at least one subsequent data item in the sequence xt+1. In this respect, the input data item xt, and other data items used in the encoding of the latent encoder output vector zt by the interpolation item encoder neural network 115i, may be pre-processed before being passed to the interpolation item encoder neural network 115i to provide representative data to facilitate the encoding of interpolation information to allow the reconstruction of the input data item xt at the decoder by interpolation from reconstructions of representations of other data items in the sequence. [0096] For example, before being passed to the interpolation item encoder neural network 115i, if the interpolation information for the data item xt is being evaluated in the input data space, the input data item xt may be pre-processed by a motion and residual module 115m of the encoder 115 to generate one or more of: motion representation information representing the relative motion between the data item and at least one other data item in the sequence; and the residual information between the data item and a motion compensated version of at least one other data item in the sequence using the motion representation information in respect of that data item. In this respect, the encoding of a latent encoder output vector zt for interpolation items by the interpolation item encoder neural network 115i may use data received at the input layer thereof representing the input data item in input data space and at least one previous data item in the sequence in input data space. For video frames, the input data space is the pixel space of the video. The motion representation information may therefore be optical flow information for the input video frame xt produced from an optical flow analysis of the sequence of video frames. [0097] In other examples, before being passed to the interpolation item encoder neural network 115i, if the interpolation information for the data item xt is being evaluated in the latent space, the input data item xt may be pre-processed by the key item encoder neural network 115k to encode the data item xt into a latent space vector zt. In this respect, the input layer of the interpolation item encoder neural network 115i may be configured such that the interpolation item encoding uses data representing: the input data item in the latent space defined by the output of the key item encoder neural network 115k (i.e. the vector zt output therefrom), and at least one previous data item in the sequence in the latent space defined by the output of the key item encoder neural network 115k (i.e. the vector zt-1 output therefrom for the previous data item xt-1). The output of the interpolation item encoder neural network 115i may also be a vector in the latent space z, the vector zt being representative of interpolation information in the latent space z for the data item xt.
Encoding the interpolation information in the latent space z in this way may be more efficient and effective than encoding the interpolation information in the input data space of x. [0098] Once the data item xt is encoded by the key item encoder neural network 115k or the interpolation item encoder neural network 115i into a latent space vector zt, it is passed in step 305 to the carrier modulator 118. The latent space vector zt has values usable for providing values in a signal space for modulating, by the carrier modulator 118, a carrier signal or one or more subcarriers with a transformed version of the data item xt. Once the carrier signal has been encoded it is transmitted across communication channel 120 using antenna 119. [0099] For the latent space vectors z1,2,3,…n provided by the interpolation item encoder neural network 115i, in embodiments where the encoder output layer of the interpolation item encoder neural network 115i is divided into blocks and is trained to encode descending ordering of information in increasing blocks of output nodes, and with increasing blocks in the latent space vector, the bandwidth allocation module 115b may control the number of output blocks of each latent space vector zt that are transmitted to share out the available bandwidth or bit budget, for example to meet an average channel use condition, or based on an allocation of bandwidth for the transmission. In embodiments, the bandwidth allocation module 115b may be configured to determine the number of blocks of the interpolation encoder output layer to be transmitted to minimise the reconstruction error between the representation of the input data reconstructed at the decoder and the input data encoded at the encoder. In embodiments, the bandwidth allocation module 115b may be configured to determine the number of blocks of the interpolation encoder output layer to be transmitted based on at least motion representation information determined to represent the relative motion between the data item xt and at least one other data item in the sequence (e.g. xt-1). In embodiments, the bandwidth allocation module is configured to determine a number of blocks of the interpolation encoder output layer to be transmitted, so as to seek to optimally allocate the available bandwidth in the communications channel between a group, a set or the whole sequence of data items to be transmitted (e.g. x1, x2, x3, … xn). [00100] On the other hand, in embodiments where the interpolation item encoder neural network 115i is provided by a recurrent neural network (RNN) such as an LSTM, the bandwidth allocation module 115b may work together with the encoder control module 115c to encode successive interpolation items into the cell of the RNN for transmission in a single latent space vector zt for recursive decoding by the decoder 135. That is, the encoder neural network 115i is operated by the bandwidth allocation module 115b working together with the encoder control module 115c to maintain and update an internal cell state thereof as successive interpolation items of a group of consecutive interpolation items are encoded by the interpolation encoder neural network 115i.
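One possible realisation of such a recurrent interpolation encoder is sketched below using an LSTM cell; emitting the concatenated hidden and cell state as the group's single output vector is an assumption of this sketch, as is the PyTorch interface and the assumption that each item has already been reduced to a feature vector.

```python
import torch
import torch.nn as nn

class RecurrentInterpolationEncoder(nn.Module):
    """Sketch of an LSTM-based interpolation item encoder.

    Successive interpolation items of a group are folded into the
    internal (hidden and cell) state; after the last item of the group,
    the internal state itself is emitted as the single encoder output
    vector z_t for the whole group.
    """
    def __init__(self, feature_dim: int, state_dim: int):
        super().__init__()
        self.cell = nn.LSTMCell(feature_dim, state_dim)

    def forward(self, group):
        # group: non-empty list of (batch, feature_dim) tensors,
        # one feature vector per interpolation item in the group.
        h = c = None
        for feat in group:
            h, c = self.cell(feat, None if h is None else (h, c))
        # The recurrently updated internal state becomes the output
        # vector for the whole group: (batch, 2 * state_dim).
        return torch.cat([h, c], dim=-1)
```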
After successive interpolation items have been encoded into the internal state, the encoder 115 is configured to provide the internal state as the interpolation encoder output vector zt for providing values in a signal space for modulating the carrier signal for transmission of a transformed version of the group of consecutive interpolation data items across the communication channel 120. That is, the encoder 115 is configured to output an encoder output vector zt for transmission for each key item and each group of consecutive interpolation items between key items. In this way, an available bandwidth or bit budget can be shared between the input data items, for example based on a dynamic decision in a Markov Decision Process of whether or not the current data item should be a key item or an interpolation item. [00101] The transmission by the transmitter 110 of the carrier signals modulated using the latent space vectors z1,2,3,…n may be in sequence as each latent space vector is encoded for each time step as the data item xt for that time step is generated or received, for example on the fly in the event of streaming data. On the other hand, where the data items x1,2,3,…n are previously generated and stored in a static media file, the latent space vectors z1,2,3,…n may all be encoded first before being transmitted in sequence by the transmitter. [00102] After the signal values from the encoded latent space vectors z1,2,3,…n have all been transmitted, the transmitter process 300 is completed. [00103] Turning now to Figure 4, the run time method 400 for the receiver 130 and the decoder 135 starts in step 401 in which the antenna 139 receives a carrier signal from communications channel 120 and passes it to carrier demodulator 138 which demodulates the carrier signal to recover the noise-affected versions ẑ1,2,3,…n of the encoder output vectors z1,2,3,…n as they are received, and passes them to the decoder control module 135c. [00104] In step 402, the decoder control module 135c determines whether the noise-affected version of encoder output vector zt for a given time step t encodes the input data item xt for that time step as a key item or an interpolation item. In embodiments where the ordering of the group of items is static and follows a pre-defined order, the decoder control module 135c may determine whether the received
Figure imgf000034_0002
̂ is representative of a key item or an interpolation item based on that order. In embodiments where the ordering of the data items as key or interpolation items is assigned dynamically, the decoder control module 135c may determine whether the received
Figure imgf000034_0003
is representative of a key item or an interpolation item based on mapping data received from the encoder 110. [00105] If, in step 402, the noise-affected version
Figure imgf000034_0004
of encoder output vector zt is indicated as a key item, in step 403, the vector
Figure imgf000034_0007
is passed to the key item decoder neural network 135k where it is directly decoded to provide a reconstruction
Figure imgf000034_0005
of the input vector xt independently of any other data item in the sequence. In this way, the key item decoder neural network 135k generates a representation of the input data items of x1,2,3,…n indicated as key items directly from the relevant noise-affected version of the encoder input vectors received at
Figure imgf000034_0006
the receiver. [00106] If, in step 402, the noise-affected version
Figure imgf000034_0010
of encoder output vector zt is indicated as an interpolation item, in step 404, the vector
Figure imgf000034_0012
is passed to the interpolation item decoder neural network 135i where it is decoded to provide a reconstruction
Figure imgf000034_0011
of the input vector xt based on data representing at least one previous data item xt-1 in the sequence and the noise- affected version of the encoder output vector zt for the data item.
Figure imgf000034_0013
[00107] To provide a reconstruction
Figure imgf000034_0008
of the input vector xt , in embodiments where the interpolation is performed in the input data space, the data representing at least one previous data item xt-1 in the sequence used by the interpolation item decoder neural network 135i to decode interpolation items may comprise a reconstruction
Figure imgf000034_0009
of the encoder input vector providing a representation of the input data item xt-1 for at least the previous data item in the sequence. In embodiments, the interpolation item decoder neural network 135i decodes the noise-affected version of the encoder output vector zt to provide an estimate of the motion
Figure imgf000034_0014
representation information representing the relative motion between the data item xt and the at least one other data item in the sequence (e.g. xt-1) generated at the motion and residual module 115m and encoded by the interpolation item encoder neural network 115i. In embodiments, the interpolation item decoder neural network 135i decodes the noise-affected version of the encoder output vector zt to provide an estimate of the residual information between the data item xt and a motion compensated version of the at least one other data item in the sequence (e.g. xt-1) using the motion representation information in respect of that data item generated at the motion and residual module 115m and encoded by the interpolation item encoder neural network 115i. [00108] To provide a reconstruction of the input vector xt , in embodiments where the
Figure imgf000035_0011
interpolation is performed in the latent space, the data representing at least one previous data item xt-1 in the sequence used by the interpolation item decoder neural network 135i to decode interpolation items may comprise a noise-affected version of an encoder output vector In embodiments, the interpolation item decoder neural network 135i provides a
Figure imgf000035_0001
reconstruction
Figure imgf000035_0002
of the input vector xt by decodes the noise-affected version
Figure imgf000035_0008
of the encoder output vector zt based on the vector and the vector for the previous data item. That is,
Figure imgf000035_0012
Figure imgf000035_0004
the vector and
Figure imgf000035_0003
are used recursively as inputs by the recurrent neural network to update the cell state and provide the reconstruction as an output. Specifically, the vector
Figure imgf000035_0005
Figure imgf000035_0007
corresponds to a representation of the previous data item xt-1 in the latent space as represented by an encoding of the reconstruction
Figure imgf000035_0006
using the key item encoder neural network 115k. For this purpose, the decoder 135 may store a software module in memory 133 for instantiating the key item encoder neural network 115k locally in RTE 134. [00109] In embodiments, the reconstruction
Figure imgf000035_0009
of the encoder input vector providing a representation of the input data item xt-1 is obtained at the [00110] Once the reconstruction
Figure imgf000035_0010
of the encoder input vector xt is decoded by the key item decoder neural network 135k or the interpolation item decoder neural network 135i it is passed in step 405 to the information sink 131 at which the information source is reconstructed. The reconstruction of the information source generated in the information sink 131 may be stored locally, for example in memory 133, for local reproduction at a later stage, or it may be reproduced contemporaneously without being stored permanently locally (for example in the case of streaming media). The reconstruction of the information source generated in the information sink 131 may also be conveyed onward for reproduction elsewhere, for example using the Internet. [00111] Regarding training the neural networks, a training time process 500 for optimising the weights of the key item encoder and decoder neural network pair, 115k and 135k, and the interpolation item encoder and decoder neural network pair, 115i and 135i, to minimise reconstruction errors will now be described with reference to Figure 5. [00112] Once all the neural networks of the communication system 100 have been designed and initialised with suitable initial encoder and decoder weights and parameters, the weights of the key item encoder and decoder neural network pair, 115k and 135k, and the interpolation item encoder and decoder neural network pair, 115i and 135i, are jointly optimized end-to-end in an unsupervised manner by passing training data sample vectors
Figure imgf000036_0001
as inputs through the communication system 100 (or a simulation thereof using a channel model to add noise) and receiving its reconstruction vector
Figure imgf000036_0002
in a forward pass of training data through the neural networks. That is, in step 501 the vectors
Figure imgf000036_0003
is received (individually or in batches) and are passed through the communication system to obtain the reconstruction vectors
Figure imgf000036_0004
to form input- output pairs of a set of training data in respect of a training data information source 111. [00113] In examples, in the training phase, the input-output pairs of vectors
Figure imgf000036_0005
of training data may be calculated empirically, by the transmitter 110, in the forward pass, encoding and transmitting the encoder output vector
Figure imgf000036_0006
representation of the input vector
Figure imgf000036_0007
of training data across the communication channel 120 where the signal values are subsequently received as the noise-affected vector
Figure imgf000036_0009
decoded by receiver 130 to the reconstruction vector
Figure imgf000036_0008
In this way, the key item encoder and decoder neural network pair, 115k and 135k, and the interpolation item encoder and decoder neural network pair, 115i and 135i, can be optimised to take into account the noise in the channel through training based on empirical data capturing the effects of channel noise on the transmission. [00114] In other examples, as shown in the Figure 5 example, in the training phase, the input- output pairs of vectors
Figure imgf000036_0010
of training data may be generated using a model of the communication channel 120 to estimate channel noise and add it to the transmitted encoder output vector to generate simulation of the noise-affected vector
Figure imgf000036_0011
subsequent decoding and reconstructing of the output training data vector by the decoder neural networks 135k and
Figure imgf000036_0012
135i. In these examples, a channel model can be adopted that simulates the practical channel experienced in the operational regime. For simplicity, a complex additive white Gaussian noise (AWGN) channel model can be adopted, which produces the channel output where
Figure imgf000036_0013
Figure imgf000036_0014
is a vector containing elements drawn from zero-mean Gaussian distribution of variance σ2. However, in general, the channel model can be any model that simulates an arbitrary transformation of the encoder output vector
Figure imgf000036_0017
transmitted by the transmitter 110. [00115] The training process may perform batchwise optimisation across groups of input- output pairs, such as using gradient descent to find a gradient error in the forward pass and determine an update to the weights. In other examples, stochastic gradient descent may be used in which the error is determined and weights updated for each input-output pair of vectors
Figure imgf000036_0016
of training data, before the next of pair of vectors of the training data are determined
Figure imgf000036_0015
using the updated weights. [00116] When a batch or a single input-output pair of vectors
Figure imgf000037_0002
of training data have been received to optimise and update the weights in the training process, in step 503, an objective function is determined characterising a reconstruction error between the input-output pairs of vectors of training data. In the example, as shown in Figure 5, the reconstruction error for
Figure imgf000037_0003
the objective function is characterised using the Mean Squared Error loss between
Figure imgf000037_0004
calculated by:
Figure imgf000037_0001
[00117] Other objective functions characterising the reconstruction error may be used. [00118] Once the objective function has been calculated for the input-output pair of vectors
Figure imgf000037_0005
of training data, the method further comprises, in steps 505 and 507 which may be performed together, using an appropriate optimisation algorithm operating on the objective function, updating the connecting node weights of the hidden layers of the key item encoder and decoder neural network pair, 115k and 135k, and the interpolation item encoder and decoder neural network pair, 115i and 135i, to seek to minimise the objective function. [00119] In step 505, the gradient descent optimisation algorithm is used to seek to minimize the objective function by using a differential of the objective function to determine the gradient and the direction towards a minimum value for the objective function. Thus, in a backward pass through the communication system, the gradient descent algorithm operates on the objective function based on a differential of at least the key item encoder and decoder neural network pair, 115k and 135k, for training data items that are key items, and the interpolation item encoder and decoder neural network pair, 115i and 135i, for training data items that are interpolation items. For example, using backpropagation, the gradient of the objective function can be efficiently calculated with respect to the weights in the key item encoder and decoder neural network pair, 115k and 135k, and the interpolation item encoder and decoder neural network pair, 115i and 135i, for example by unstacking the elementary functions used to compute the forward pass, and by repeatedly applying the chain rule to autodifferentiate them and determine the gradient with respect to the weights in the key item encoder and decoder neural network pair, 115k and 135k, and the interpolation item encoder and decoder neural network pair, 115i and 135i, by backpropagation. [00120] Once the gradient of the object function is determined with respect to the weights in the key item encoder and decoder neural network pair, 115k and 135k, and the interpolation item encoder and decoder neural network pair, 115i and 135i, for the training data, in step 507, the connecting node weights of the hidden layers of the key item encoder and decoder neural network pair, 115k and 135k, and the interpolation item encoder and decoder neural network pair, 115i and 135i, are updated to seek to minimise the objective function. In examples, this is achieved in the gradient descent optimisation method by the using the determined gradient to estimate an update to the weights of the key item encoder and decoder neural network pair, 115k and 135k, and the interpolation item encoder and decoder neural network pair, 115i and 135i, that is expected to step the objective function towards a minimum, where the local gradient is zero. [00121] Once the estimate of the update to the weights is determined by an optimisation method, the weights of the key item encoder and decoder neural network pair, 115k and 135k, and the interpolation item encoder and decoder neural network pair, 115i and 135i, are updated and, in step 509, it is checked whether there are more samples of training data in the training set. If there are, the process 500 returns to step 501 and the next batch or training sample is received and the optimisation method is carried out again to further optimise the weights of the neural networks. 
If training over the training set is complete, the process 500 ends and a trained key item encoder and decoder neural network pair, 115k and 135k, and interpolation item encoder and decoder neural network pair, 115i and 135i, are provided for use in an operational communication system 100 for transmitting input data over a communication channel 120. [00122] For the optimization process, as the encoder and decoder blocks are built as artificial neural networks with learnable parameters so that the transformation from/to data to latent representation (code) can be learned directly from data. If the constellation symbols ^ transmittable by the transmitter are predefined, as it is the case when using standard communication hardware and protocols, the or if an existing channel code is used, these pre- existing codes act as constraints for the optimization and the objective function. [00123] If a channel model is used in the forward pass of the training process, rather than empirically generating training data, the channel model can be included directly in the backward pass in the optimisation algorithm. If the channel model used is differentiable, it can be used directly in the backpropagation stage. If it is not differentiable, a generative adversarial network (GAN) may be used to learn a differentiable representation of the channel model. In this way, the key item encoder and decoder neural network pair, 115k and 135k, and the interpolation item encoder and decoder neural network pair, 115i and 135i, can be optimised to take into account the noise in the channel through training based on a theoretical noise model of the communication channel. Thus the key item encoder and decoder neural network pair, 115k and 135k, and the interpolation item encoder and decoder neural network pair, 115i and 135i, can be trained together using training data in which a model of the communication channel is used to estimate channel noise and add it to the transmitted signal values
Figure imgf000039_0003
to generate the noise- affected version
Figure imgf000039_0004
of the vector of signal values in the input-output pairs of training data. [00124] It should be noted that, in other examples, the objective function may characterise and optimise against further constraints and characteristics of the communication system 100, such as to obtain an average power in the symbols transmitted across the communication system 100, so as to ensure the learned coding system satisfies an average power constraint. [00125] For example, where the interpolation item encoder and decoder neural network pair, 115i and 135i, are to encode descending ordering of information in increasing blocks of nodes, such that, for interpolation items, the decoder reconstructs an increasingly refined representation of the input data with increasing blocks received in the noise-affected version of an encoder output vector, the training process 500 can be adapted. Here, the encoder output layers of the interpolation item encoder neural network 115i and the decoder input layers of the interpolation item decoder neural network 135i are divided into ordered blocks. At training time the encoder output vector
Figure imgf000039_0001
passed to the decoder input layer for each input-output pair during training is selected to have a random number of the ordered blocks, such that, following training, the interpolation item encoder neural network 115i encodes descending ordering of information in increasing blocks of nodes, and such that, for interpolation items, the decoder 135 reconstructs an increasingly refined representation of the input data with increasing blocks received in the noise-affected version of the encoder output vector.
Figure imgf000039_0002
[00126] Referring now to Figures 6, 7, 8, 9, 10 and 11, a detailed example embodiment of a communication system 100 for transmitting video across a communications channel will now be described. The communication system 100 in this embodiment operates a static control module allocating data items according to a fixed group of pictures. The communication system 100 in this embodiment determines interpolation information for encoding interpolation data items in a pixel space of the input data items. Further still, the communication system 100 in this embodiment includes an interpolation encoder neural network 115b having an output layer of nodes divided into blocks such that it encodes descending ordering of information in increasing blocks of nodes, and a bandwidth allocation module 115b for selecting a number of blocks of an encoder output vector for interpolation items to share or allocate bandwidth between interpolation items in the group of pictures. As will be seen, this arrangement has been shown to outperform existing separate source coding and channel coding schemes in terms of reduced reconstruction errors at the decoder across a wide range of different channel conditions. [00127] It should be noted that, in relation to Figures 6, 7, 8, 9, 10 and 11, features with like reference numbers represent the same features as those described in relation to Figures 1 and 2, and should be understood accordingly, and so a detailed description of those features may be omitted in the following. [00128] Consider a group of N frames
Figure imgf000040_0001
from a video sequence. This is called a group of pictures (GoP) and Xn refers to the nth GoP in a video sequence received in step 901 from an information source. A static control module 115c in step 902 determines that thefirst
Figure imgf000040_0002
and last
Figure imgf000040_0003
frames are the key frames in the fixed GoP structure, and these are compressed and transmittedfirst using key item encoder neural network 115k. [00129] The key item encoder neural network 115k, parameterised by θ, and mapping a frame to a complex latent vector
Figure imgf000040_0004
representing the In-phase (I) and Quadrature (Q) components of a complex channel symbol, is then defined as:
Figure imgf000040_0005
[00130] This is achieved by pairing consecutive real values at the output of the neural network. [00131] The values in the complex latent vector
Figure imgf000040_0006
mayfirst be power normalised to meet a power constraint, and in step 904 these values are then directly sent through the communication channel 120. [00132] To simulate the channel 120 during training, an Additive White Gaussian Noise (AWGN) channel model is used, defined as:
Figure imgf000040_0007
where
Figure imgf000040_0013
is the Complex Gaussian distribution with zero mean and covariance matrix
Figure imgf000040_0011
being the identity matrix). Consequently, the key item decoder neural network 135k parameterised by
Figure imgf000040_0012
that maps the noisy latent vector
Figure imgf000040_0008
received and demodulated at the receiver in step 1001 and passed to the key item decoder neural network 135k in step 1002 to decode it back to the original frame domain
Figure imgf000040_0010
in step 1003 is defined as:
Figure imgf000040_0009
[00133] The key item encoder neural network 115k and key item decoder neural network 135k are then trained together using the method generally described in relation to Figure 5 using the mean-squared error as the loss function, defined as follows, to optimise the weights of the hidden layers thereof to minimise a reconstruction error.
Figure imgf000040_0014
[00134] A diagram of the key item encoder neural network 115k and key item decoder neural network 135k architecture is shown in Figure 7. The notation kxsycz is used to signify kernel size x, stride y and z kernels. The GDN layer refers to Generalised Divisive Normalisation, which is effective in density modelling and compression of images. The network is fully convolutional, therefore it can accept input of any height (H) and width (W). [00135] For the interpolation frames in between
Figure imgf000041_0001
are passed in step 902 the separate interpolation item encoder neural network 115i specified as
Figure imgf000041_0002
is used to encode motion representation and residual information determined in step 905 by motion and residual module 115m. The architecture of the interpolation item encoder neural network 115i and interpolation item decoder neural network 135i is shown in Figure 8. [00136] The motion representation is generated by motion and residual module 115m in respect of two other frames in the sequence by an optical flow estimator to generate the optical flow and residual information with respect to two frames
Figure imgf000041_0003
Figure imgf000041_0004
Figure imgf000041_0005
referred to as anchor frames. The anchor frames may be the previous and next frames in the sequence, such that t = 1. For this, let
Figure imgf000041_0006
be the opticalflow vectors that represent the motion information from frame
Figure imgf000041_0007
, and likewise for
Figure imgf000041_0008
The motion and residual module 115m then determines
Figure imgf000041_0010
as shown in Figure 8 to be a motion compensated anchor frame according to the opticalflow to produce an approximation
Figure imgf000041_0015
Figure imgf000041_0009
of frame using the determined optical flow. Then to determine the residual
Figure imgf000041_0011
for the
Figure imgf000041_0012
from the motion compensated anchor frame the motion and residual module 115m
Figure imgf000041_0013
determines the residual error in the optical flow interpolation:
Figure imgf000041_0014
[00137] The residual represents information not captured by opticalflow, such as occlusion/disocclusion and camera movements. To estimate the opticalflow, a pre-trained PWC-Net can be used in the motion and residual module 115m. [00138] Given all of this information, the interpolation item encoder neural network 115i
Figure imgf000041_0016
parameterised by φ- defines the mapping for interpolation data items into the latent space:
Figure imgf000041_0017
[00139] In step 906, the interpolation item encoder neural network 115i
Figure imgf000041_0018
is thus used to encode the data item
Figure imgf000041_0021
based on the data items optical flows and
Figure imgf000041_0020
Figure imgf000041_0019
residuals
Figure imgf000041_0022
In step 907, as described later, if the interpolation item encoder neural network 115i encodes descending ordering of information in increasing blocks of nodes, a bandwidth allocation module 115b may be used to select a number of blocks of an encoder output vector for the interpolation item to share or allocate bandwidth between interpolation items in the group of pictures. [00140] The values in the complex latent vector mayfirst be power normalised to meet a
Figure imgf000042_0001
power constraint, and in step 908, the transmitter transmits the signal values of encoder output vector over communications channel 120. [00141] The interpolation item decoder neural network 135i parameterised by
Figure imgf000042_0003
maps the noisy latent vector
Figure imgf000042_0002
received and demodulated at the receiver in step 1001 for the interpolation items and passed to the interpolation item decoder neural network 135k in step 1002 to decode it in step 1004 back to provide an estimate of the opticalflow, residual and a mask. That is, as can be seen in Figure 8, the interpolation item decoder neural network 135i parameterised by
Figure imgf000042_0004
defines the mapping:
Figure imgf000042_0008
where the mask and
Figure imgf000042_0006
a slice in the third dimension of
Figure imgf000042_0005
Figure imgf000042_0007
satisfies:
Figure imgf000042_0009
[00142] After the estimate of the opticalflow, residual and a mask are decoded, as can be seen in Figure 8 the decoder motion and residual module 135m reconstructs the frame by:
Figure imgf000042_0010
[00143] where ∗ refers to element-wise multiplication. The reconstructed frames
Figure imgf000042_0011
generated in step 1003 by the key item decoder neural network 135k and in steps 1004 and 1005 by the interpolation item decoder neural network 135i and decoder motion and residual module 135m are then passed to information sink 131 at which the reconstruction of the data source 111 is stored. Once all N frames of the GoP are reconstructed, the last frame
Figure imgf000042_0013
of the current GoP becomes thefirst key frame of the next GoP, and the same process is repeated.
Figure imgf000042_0012
[00144] The architecture of the interpolation item encoder neural network 115i and interpolation item decoder neural network 135i is functionally the same as for the key item encoder neural network 115k and key item decoder neural network 135k. Thus, similarly to the key item encoder and decoder pair as set out above, the interpolation item encoder neural network 115i is trained together with the interpolation item decoder neural network 135i using the
Figure imgf000042_0014
method generally described in relation to Figure 5 and the mean-squared error as the loss function, to optimise the weights of the hidden layers thereof to minimise a reconstruction error. Again, an AWGN noise model for the channel may be used. [00145] To allocate the available bandwidth, the bandwidth allocation module 115b is trained as a separate neural network parameterised by ψ having the architecture as shown in
Figure imgf000043_0001
Figure 11. Given a particular channel use constraint k per each GoP, reinforcement learning (RL) is utilised to learn the optimal bandwidth allocation policy for each frame in a GoP based on the frames themselves, that maximises the video quality. The bandwidth compression ratio is then defined as
Figure imgf000043_0002
[00146] As explained above in relation to Figure 5, the joint source-channel encoders having the neural network architecture of the key item encoder and decoder neural network pair, 115k and 135k, and the interpolation item encoder and decoder neural network pair, 115i and 135i, encoded video frames can be successively refined by sending increasingly more information.
This is achieved by dividing the latent vector into M equal sized blocks
Figure imgf000043_0003
Figure imgf000043_0004
while randomly varying the length L of the latent code selected for
Figure imgf000043_0005
transmission and decoding in each batch of input-output pairs
Figure imgf000043_0006
during training. This training process leads to the descending ordering of information from
Figure imgf000043_0007
to Thus, by selecting fewer blocks of the latent vector
Figure imgf000043_0008
for transmission, less
Figure imgf000043_0014
information encoding the input data item is transmitted, and the reconstruction of the data
Figure imgf000043_0009
item in at the decoder 130 includes less information. Similarly, by selecting more blocks of the latent vector for transmission, more information encoding the input data item is
Figure imgf000043_0013
Figure imgf000043_0012
transmitted, and the reconstruction of the data item in
Figure imgf000043_0010
at the decoder 130 includes more information. In this way, the number of blocks of the encoded latent vector
Figure imgf000043_0011
generated by the encoder neural networks 115 can be selected for transmission to vary the bandwidth used to transmit the encoding of each data item. This may be based on an assessment of a relative amount of information needed to minimise reconstruction errors while meeting a channel use condition or a bit budget for the GoP.
[00147] In the context of video interpolation, if the interpolation frame in consideration is
Figure imgf000043_0016
exactly the same with respect to the anchor frames that it is being interpolated from,
Figure imgf000043_0015
then no information needs to be transmitted as the pixels can simply be copied from the anchor frame to create the frame. On the other hand, if there is significant differences with respect to the anchor frames, then much more information will have to be sent in order to accurately interpolate the frame. Therefore, the neural network parameterised by ψ of the bandwidth
Figure imgf000043_0018
allocator module 115b is trained using reinforcement learning to allocate the available bandwidth to each of the frames in a GoP, using only the frames themselves, such that the
Figure imgf000043_0017
loss metric is minimised. [00148] That is, the nth GoP in a video is defined as where
Figure imgf000044_0002
Figure imgf000044_0001
[00149] The action set A consists of all the ways to allocate the available bandwidth k to each frame in the GoP. Since we are concerned with maximising the visual quality of thefinal video, we define [00150] the reward as
Figure imgf000044_0004
Figure imgf000044_0003
[00151] Deep Q-learning is used to learn the optimal allocation policy, where the network
Figure imgf000044_0006
seeks to approximate Here S represents the set of all states (i.e. all GoPs in a
Figure imgf000044_0005
video). The purpose of the Q function is to map each state and action pair to a Q value, which represents the total discounted reward from step n given the state and action pair The
Figure imgf000044_0007
Q function is defined as mapping:
Figure imgf000044_0008
where 0 ≤ γ ≤1 is the discount factor, which is chosen close to 1 when aiming to optimize the average reward. [00152] As indicated above, the purpose of the of deep neural network
Figure imgf000044_0010
of the bandwidth allocator module 115b is to approximate To that end, the mean-squared error loss
Figure imgf000044_0009
function L is used and gradient descent is performed to update the weights of the network as follows.
Figure imgf000044_0011
where
Figure imgf000045_0001
is the learning rate and B is a batch of data points containing sets of
Figure imgf000045_0003
Figure imgf000045_0002
[00153] The end-to-end objective to find the actions
Figure imgf000045_0007
selecting the optimum number of transmission blocks for each
Figure imgf000045_0008
for a GoP to minimise reconstruction errors for the available bandwidth can be separated into the following two optimisation problems:
Figure imgf000045_0004
and
Figure imgf000045_0005
where
Figure imgf000045_0006
and is the allocation of transmission blocks for each given to fr
Figure imgf000045_0009
Figure imgf000045_0010
ame
Figure imgf000045_0011
[00154] Upon initialisation of the communication system 100, the first frame
Figure imgf000045_0012
is sent using full bandwidth k, and thereafter, the neural network
Figure imgf000045_0013
of the bandwidth allocator module 115b is used to determine the optimal bandwidth allocation for the remaining N − 1 frames in the GoP such that the number of blocks of the encoder output vectors selected in step 907 for
Figure imgf000045_0015
Figure imgf000045_0014
transmission in step 908 is optimal. [00155] The performance of the joint source and channel coding communication system 100 described above in relation to Figures 6, 7, 8, 9, 10 and 11 has been measured for encoding, transmitting and reconstructing sequences of correlated data items at various contrast signal to noise (CSNR) ratios in a software-defined prototype of the communication system 100, and the results can be seen in Figure 17. In particular, the communication system 100 was trained optimally for encoding, transmitting and reconstructing sequences of correlated data items to minimise reconstruction errors at training CSNR levels of -5, 0, 5, 10 and 15 dB, and each trained model was then evaluated under test channel conditions of the same CSNR levels. [00156] Figure 17 plots the performance of each trained model for a bandwidth compression ratio of 0.031 at each evaluation CSNR as given, in the top pane (a), by the measured peak signal to noise (PSNR) ratio indicative of the reconstruction quality of the video frames at the receiver 130, and in the bottom pane (b), by the measured multiscale structural similarity index measure (MS-SSIM) indicative of the similarity at different scales between the video from the information source 111 and the video reconstructed at the information sink 131. MS-SSIM has been shown to perform better in approximating the human visual perception than the more simplistic structural similarity index (SSIM) on different subjective image and video databases [00157] As can be seen from Figure 17, the performance of the trained models in the communication system 100 improves as the training CSNR for the model increases and for each individual trained model, its performance also increases as the evaluation CSNR increases. [00158] This is expected as the channel noise corrupts the joint source and channel coding encoded symbols directly and the greater the noise power, the more difficult it is for the decoder 135 to correctly decode the noisy symbols. Most importantly, as can be seen in Figure 17,the trained models of the communication system 100 do not suffer from the cliff effect. We can see that, as the evaluation CSNR decreases (indicating greater and greater noise in the channel), the performance of each model gracefully degrades, as opposed to the cliff edge drop off that separation-based coding designs suffer from. This is because, in the communication system 100, the channel noise is allowed to directly distort the information being transmitted, which allows the cliff-edge effect to be avoided. [00159] In contrast, the cliff effect can clearly be seen in Figures 18, which plots the envelope of the performance for the different trained joint source channel coding models of the communication system 100 shown in Figure 17 against the same measured performance metrics of a conventional separate source coding by H.264 (i.e. MPEG-4 Part 10) and channel coding by a low-density parity-check (LDPC) code at different code rates. The bandwidth compression rate for the separate source and channel coding models is chosen to be at a level the achieves equivalent performance to the best performing joint source and channel coding model at the highest evaluation CSER of 15dB, in order to compare the best achievable reconstruction performance by both joint- and separate- coding models as the channel noise increases and the evaluation CSNR decreases. 
However, it should be noted that the bandwidth compression rate for the separate source and channel coding models that achieves the same peak performance to the best joint source and channel coding model at an evaluation CSNR of 15dB is lower than the bandwidth compression rate similarly performing joint source and channel coding model. This indicates that, for the same peak performance, the joint source and channel coding model achieves a higher bandwidth compression ratio, meaning less data is transmitted to achieve the same reconstruction performance. [00160] Regarding the cliff effect for the separation-based schemes, as can be seen from Figure 18, at every LDPC code rate, there exists an evaluation CSNR at which the performance of the separation-based coding scheme deteriorates rapidly, and above which CSRN the performance does not improve. This is due to the fact that the LDPC code rate is insufficient for the channel condition below that cliff edge evaluation CSNR threshold, and this would be observed as a signal drop out, with no received signal being decoded. This can cause significant problems when the channel conditions are variable giving poor reliability in transmission. Further, due to the pre-applied compression, the performance of the separation- coding does not improve as the channel condition improves above the cliff threshold CSNR either, meaning that, for better channel conditions above the cliff edge threshold, no improvement in quality is observable. As such, in separation coding, a cliff edge deterioration of the H.264 scheme is observed. This cliff edge in performance is simply not seen in the trained joint source channel coding models of the communication system 100 of the present disclosure. [00161] In fact, the overall performance of the trained joint source and channel coding models of the communication system 100 is better than the best available H.264 and LDPC codes, as the best performing trained joint source and channel coding model beats the best available separation-coding H.264 with LDPC coding scheme for all evaluation CSNR channel noise levels. This is the case in both the PSNR and MS-SSIM metrics, suggesting the superior compression capability of the communication system 100 over separation-based schemes. Overall, when assessed at the same bandwidth compression ratios the communication system 100 is 3.98dB and 6.07dB better on average in PSNR than H.264 with LDPC coding for ρ = 0:031 and ρ = 0:018, respectively. [00162] Thus from Figure 18 it can be seen that the trained joint source channel coding models of the communication system 100 consistently outperform the best performing conventional separation codes for reconstruction performance and compression rates, for all channel noise conditions. Further, because the encoder neural networks directly map the source inputs to the channel outputs, and the decoder neural networks directly map the noisy-received channel outputs to the reconstruction of the source inputs, the trained joint source channel coding models of the communication system 100 were consistently three orders of magnitude faster in terms of end-to-end encoding/decoding speed, compared to the best performing separate coding schemes, further reducing latency of transmission. 
[00163] In order to operate the joint source channel coding encoder and decoder at the performance envelope, the current channel condition needs to be monitored and the weights of the encoder and decoder neural networks need to be adjusted to the weights trained to match the channel condition. That is, weights are chosen to correspond to a training condition in which the channel noise or SNR matched the estimate of the current channel condition. In practice, to determine an accurate estimate (SNREST) of the current channel SNR that corresponds to the actual current channel SNR (SNRAWGN), the selection of the trained weights is adjusted such that the performance of the joint source channel coding encoder and decoder meets the rate- distortion curve of the performance envelope as closely as possible (i.e. such that SNREST = SNRAWGN is found). [00164] Figure 19 shows a performance comparison of the performance envelope of another example of the communication system of Figure 6 and the performance of separate source coding by H.265 (i.e. MPEG-H Part 2) and channel coding by LDPC at different code rates for encoding, transmitting and reconstructing other example sequences of correlated data items over a channel having AWGN at various signal to noise (SNR) ratios. [00165] As can be seen in the top pane of Figure 19, in terms of the PSNR metric, separation coding using H.265 and certain LDPC codings can outperform the performance envelope of the trained joint source and channel encoder and decoder as disclosed herein at higher PSNR values. However, a performance comparison is made using the more perceptually aligned MS- SSIM metric, as shown in the bottom pane of Figure 19, it can be seen that the performance envelope of the trained joint source and channel encoder and decoder as disclosed herein can outperform separation-based transmission with H.265. It should also be evident from Figure 19 that in the very low SNR regime (i.e., SNRAWGN < −1 dB), H.265 was unable to meet the compression rate required, and therefore did not produce any results in that range. The trained joint source and channel encoder and decoder as disclosed herein on the other hand, did not have this problem, and results were produced even at low SNR. Further optimization of the network architecture of the trained joint source and channel encoder and decoder as disclosed herein should bring the PSNR performance on a par with or better than H.265 for higher SNR values as well. [00166] Figure 20 shows a visual comparison of reconstructed frames of an example video encoded and transmitted across a channel having additive white Gaussian noise at an SNR of -4dB, 3dB and 13dB (from left to right), by an example of the communication system of Figure 6 trained at different SNRs (top pane, i.e. SNRTrain = -1dB, 6dB, 13dB, from left to right) and by a separate source coding by H.264 (i.e. MPEG-4 Part 10) and channel coding by a low-density parity-check (LDPC) code using different channel coding schemes (bottom pane, i.e.1/2 BPSK, 3/4 QPSK, 3/416QAM, from left to right). [00167] As can be seen, at SNRAWGN = 13 dB, the visual qualities of the videos produced by H.264 and the trained joint source and channel encoder and decoder as disclosed herein are similar. However, at SNRAWGN = 3 dB, the video produced by H.264 starts to look very pixelated, while the trained joint source and channel encoder and decoder as disclosed herein is still able to retain a smooth looking frame. 
At SNRAWGN = -4 dB, the capacity of the channel is too low for H.264 to compress the video sufficiently, therefore the output is simply black, while the trained joint source and channel encoder and decoder as disclosed herein is still able to achieve a reasonable video quality despite the very low channel SNR. [00168] In these tests, on average, in the AWGN case and a bandwidth compression rate of ρ = 0.031, the trained joint source and channel encoder and decoder as disclosed herein outperforms H.264 by 0.46 dB in PSNR and by 0.0081 in MS-SSIM for SNRAWGN ∈ [13, 20] dB, by 3.07 dB in PSNR and by 0.0485 in MS-SSIM for SNRAWGN ∈ [3, 6] dB. The trained joint source and channel encoder and decoder as disclosed herein falls short of H.265 by 3.22 dB in PSNR, but outperforms it by 0.0006 in MS-SSIM for SNRAWGN ∈ [13, 20] dB. Similarly, it is 0.61 dB worse than H.265 in PSNR but outperforms it by 0.0069 in MS-SSIM for SNRAWGN ∈ [3, 6] dB. [00169] With respect to complexity, using the NVIDIA TensorRT framework to optimize the inference time of the trained joint source and channel encoder and decoder as disclosed herein, it was found that the average inference time is approximately 26 ms. On the other hand, only the encoding time of H.264 took on average 24 ms, using hardware acceleration on the Intel i9- 9900K CPU. H.265 is even slower, at 92 ms. Therefore, the trained joint source and channel encoder and decoder as disclosed herein can be extremely efficient in practice using optimized hardware and library, more so than separation-based methods. [00170] Next, in Figure 21 a performance comparison is shown of a trained joint source and channel encoder and decoder as disclosed herein and the performance of separate source coding by H.264/H.265 and channel coding by a low-density parity-check (LDPC) 3/416QAM code for encoding, transmitting and reconstructing other example sequences of correlated data items over a channel having AWGN at a signal to noise (SNR) ratio of 20dB for different bandwidth compression rates ρ. By decreasing the bandwidth compression ratio ρ, the compression of the video is increased. To change ρ, the retraining of the joint source and channel encoder and decoder is not required. Rather, the bandwidth allocation module needs to be retrained with a different action set. As shown in Figure 21, we see that the joint source and channel encoder and decoder as disclosed herein beats H.264 with LDPC coding for all the bandwidth compression ratios tested in terms of both the PSNR and MS-SSIM metrics. It also beats H.265 using the MS-SSIM metric as shown in Fig.6b, although again, it falls short of H.265 in terms of the PSNR metric. [00171] Regarding the performance improvements that can be achieved by optimising the allocation of bandwidth to different frames in the Group of Pictures using the bandwidth allocation module and methods described herein, we refer to Figure 22. [00172] Figure 22 shows a performance comparison of the performance envelope of another example of the trained joint source and channel encoder and decoder as disclosed herein, showing the difference in performance of the system having a uniform bandwidth allocation to the frames in a group of pictures, a non-uniform bandwidth allocation in accordance with a pre- determined heuristic, and an optimal bandwidth allocation based on embodiments of the present disclosure in which a bandwidth allocation module is used. 
[00173] For comparison with the optimised bandwidth allocation, the results obtained by using the allocation network
Figure imgf000050_0003
is compared with that of uniform allocation (i.e. each frame having the same bandwidth allocation and with a heuristic bandwidth allocation policy. For the heuristic bandwidth allocation policy, 50% of the available bandwidth is allocated to the key frame and the remaining 50% of the available bandwidth is allocated to interpolation frames based on the magnitude of their optical flow (SSF) with respect to the reference frames. The intuition behind this heuristic policy is that the greater the magnitude of the optical flow, the more pixel warping is needed to interpolate the frame from its reference frames. Therefore, more bandwidth should be allocated to such frames. Further, since the reconstruction quality of the key frame affects the reconstruction quality of the remaining frames in the Group of Pictures, half of the bandwidth is allocated to it. In Figure 22, it can be seen that there is a clear and significant improvement in performance over both uniform and heuristic allocation when using the optimised bandwidth allocation provided by the trained allocation network
Figure imgf000050_0001
operated by the bandwidth allocation module as described herein. [00174] Overall, the optimised trained bandwidth allocation network improves upon the
Figure imgf000050_0002
uniform allocation policy by 0.35dB in PSNR and by 0.0025 in MS-SSIM, for ρ = 0.031. It also improves upon the heuristic allocation policy by 0.25 dB in PSNR and by 0.0015 in MS-SSIM, for the same ρ. [00175] Referring now to Figures 12 to 16, another detailed example embodiment of a communication system 100 for transmitting video across a communications channel will now be described. The communication system 100 in this embodiment operates a dynamic control module 115c allocating data items according to a dynamic decision agent, implemented as a Markov Decision Process (MDP). The communication system 100 in this embodiment determines interpolation information for encoding interpolation data items in a latent space of the encoder output vectors encoded by the key item encoder neural network 115k. Further still, the communication system 100 includes an interpolation encoder neural network 115i configured as a recurrent neural network (RNN), more specifically a Long Short-Term Memory (LSTM), in which the internal cell state is updated to recurrently encode successive interpolation items until the next key frame, which is then normalized to the power constraint and transmitted as the output vector for recurrent decoding at the decoder 135. In this way, a bandwidth allocation module 115b of the encoder 115 in this embodiment is implemented as the dynamic control module 115c for selecting whether or not successive items are for encoding together into the cell state of the interpolation encoder neural network 115i to share the bandwidth needed to transmit the encoder output vector between successive interpolation items. As well as providing a similar performance improvement compared to separate source and channel coding schemes as the previous embodiment, this arrangement provides enhanced capabilities for efficiently encoding streaming media, such as live video, for transmission in such a way that the reconstruction performance is maintained high across a wide range of different channel conditions but the bandwidth used can be low. While in this embodiment, the interpolation item encoder neural network 115i and interpolation item decoder neural network 135i are configured as RNNs, in particular LSTMs, these can be provided by any suitable function that can learn to recursively encode and decode successive items in a sequence into a state for transmission and recursive decoding, such as a transformer architecture. [00176] Thus, in this embodiment, thefixed GoP formulation, where a group of N frames are considered jointly for compression, are forgone, and instead the encoder control module 115c addresses the question of which items to allocate as key items and as interpolation items dynamically, using a dynamic decision agent. In particular, the dynamic decision agent is implemented in the embodiment as an infinite horizon Markov decision process (MDP). [00177] That is, as shown in Figures 12 and 15, consider a sequence of video frames,
Figure imgf000051_0001
received in step 1501 of encoder process 1500 as a sequence of input data items from information source 111. The sequence of video frames,
Figure imgf000051_0002
may be a stream of video frames, for example being recorded live by a security camera or a drone. In step 1503 each frame xt isfirst transformed into a latent space vector via the key item encoder neural
Figure imgf000051_0003
network 115k, again denoted as
Figure imgf000051_0004
Figure imgf000051_0005
[00178] The complimentary key item decoder neural network is also similarly defined to have an architecture similar to denoted as that performs the opposite
Figure imgf000051_0007
Figure imgf000051_0006
operation as
Figure imgf000051_0008
[00179] Then, at step 1504, the bandwidth allocation module 115b working with the dynamic decision agent implemented as an MDP by dynamic control module 115c, dynamically determines whether the data item should serve as a key item or an interpolation item. [00180] For the dynamic decision agent, the MDP state at time step t is defined as a tuple
Figure imgf000051_0010
Figure imgf000051_0009
where k is the number of frames since the last key frame. Then the MDP implements in the dynamic control module 115c an agent, who takes input at time step t the current state st and outputs a binary decision that states whether the current frame xt should be a key frame or not. That is, let the agent be a function where 0 implies the current
Figure imgf000052_0001
frame is not a key frame and 1 implies that it is. While in this embodiment, the state st = (zt , k) is considered for simplicity, this may not necessarily be the case, and st can in other embodiments include additional information such as motion information (e.g. opticalflow vectors) and residual information, as in the case in the embodiment described above. [00181] If the agent decides that xt should be a key frame, then in step 1505 the code word ct for transmission at time step t is taken to be equal to the latent code zt output by the key item encoder neural network 115k. [00182] On the other hand, if the agent decides that xt should not be a key frame, then in step 1506, a secondary encoder network is utilised as the interpolation item encoder neural network 115i. This neural network, and its counterpart interpolation item decoder neural network 135i, have an architecture and mode of operation that is different to the embodiment described above in Figures 6-11. Specifically, the interpolation item encoder neural network 115i in this embodiment is a recurrent neural network (RNN), more specifically a Long Short-Term Memory (LSTM), the architecture of which is shown in Figure 13, that takes at its input
Figure imgf000052_0004
layer a tuple and maps it to a code word ct.
Figure imgf000052_0002
[00183] That is, for the encoder 115,
Figure imgf000052_0003
[00184] As will be explained in more detail below, in steps 1507 and 1508, the interpolation item encoder neural network 115i may encode into ct multiple data items successively selected as interpolation items. [00185] Thus, thefinal set of codewords Ct is constructed from the sequence of keyframe decisions by the bandwidth allocation module 115b and dynamic
Figure imgf000052_0005
decision agent and codewords
Figure imgf000052_0007
such that
Figure imgf000052_0006
[00186] Thus, in the process 1500, if a new keyframe is initialised at time t + 1, then the codeword ct isflushed and stored in the set of codewords Ct to be transmitted and the latent vector of the new keyframe zt+1, is also appended in Ct+1 as it is a keyframe. This implies, if a frame is chosen to be a key frame, its codeword is independent of all other codewords; if a frame is not chosen to be a key frame, then its codeword is dependent on the previous codeword. This is similar to motion representation interpolation used in the embodiment described in relation to Figures 6-11 above, except rather than interpolating in the input data (i.e. pixel) domain, in this embodiment, interpolation between data items is now done implicitly by the interpolation item encoder neural network 115i in the latent space. The benefit of
Figure imgf000053_0001
interpolation in the latent space is that the interpolation item encoder neural network 115i may be able to transform the values in the pixel domain to a latent space where interpolation can be done more compactly. For example, in motion representation interpolation used in the embodiment described in relation to Figures 6-11 above, optical flow interpolation, which occurs in the input space (pixel) domain, does not capture occlusion/ disocculusion. As a result, the residual needs to be computed and transmitted to account for this type of information. This is due to the fact that opticalflow treats each frame as a 2D plane where the pixels are simply moved to obtain subsequent frames, without accounting for the fact that the scene itself may be 3D and therefore objects can appear as a result of objects in front of it moving in front of the camera. [00187] In contrast, in the latent space interpolation approach used in the interpolation item encoder neural network 115i in the present embodiment, the function
Figure imgf000053_0002
can be thought
Figure imgf000053_0003
of as a mapping to a space with higher dimensions (greater degree of freedom) that describes the various types of motion information (opticalflow, residual), and due to the greater degree of freedom, the interpolation can be done by translating the values in each dimension. For example, one dimension can be describing the x-axis movement, another can describe the occlusion of objects... etc. [00188] As indicated above, in the present embodiment, if there are consecutive non-key frames assigned, the loop of step 1507 to pass the next interpolation data item xt+1, to the key and then the interpolation item encoders to update internal state cell of the interpolation item encoder neural network, means that all those consecutive non-key frames will be represented by a single code word ct. The interpolation item encoder neural network 115i can thus be
Figure imgf000053_0004
seen as a codeword updater, that takes in new information about the current frame through the latent vector zt and updates the previous code word ct-1 to obtain the new code word ct. [00189] This is done using a recurrent neural network (RNN), such as a long-short term memory (LSTM) module as shown in Figure 13. That is, as shown in Figure 13, the internal state of the LSTM module (ℎ^) represents the current codeword ct, while the input to the LSTM at time t (i.e. xt), is the latent vector of the current frame zt. This architecture is only exemplary, and other neural network architectures can be used to perform the function of
Figure imgf000053_0005
[00190] To facilitate the decoding of the codewords by the decoder on receipt thereof, the encoder control module 115c may store a map of key frame allocations as a binary vector mt, in which each entry indicates whether the corresponding frame in the sequence is a key frame. This information may be sent by the transmitter 110 as side information using conventional digital modulation and channel coding.
[00191] If the set of codewords transmitted to the receiver 130 by steps 1505 and 1509 at time step t is Ct, where Kt = |Ct| is the number of codewords in the set, the bandwidth allocation module 115b may set a channel use constraint B, such that the average channel utilisation is below B. That is, writing |c| for the number of channel uses occupied by a codeword c,

    (1/t) · Σ_{c ∈ Ct} |c| ≤ B.
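Purely by way of illustration, the channel use accounting of paragraph [00191] can be sketched as follows. The function names and the example numbers are assumptions made here for illustration; the allocation policy actually used by the bandwidth allocation module 115b is not limited to this sketch.

    def average_channel_use(codeword_lengths, t):
        """Average number of channel uses per frame after t frames."""
        return sum(codeword_lengths) / t

    def within_budget(codeword_lengths, t, B):
        # The bandwidth allocation module keeps this average below the constraint B.
        return average_channel_use(codeword_lengths, t) <= B

    # Example: three codewords occupying 96, 32 and 32 channel uses over 8 frames,
    # against a budget of B = 24 channel uses per frame on average.
    print(within_budget([96, 32, 32], t=8, B=24))  # True: 160 / 8 = 20 <= 24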
[00192] The codeword set Ct may then be power normalised and transmitted by the carrier modulator 118 and antenna 119 across the channel 120. It should be noted that the process 1500, as set out more definitely in Algorithm 1 of Figure 14a, does not wait until a certain time t to send the codewords; rather, whenever a new codeword is appended to the set at time t, the new codeword is transmitted as soon as it becomes available.
[00193] Turning now to the receiver process 1600 shown in Figure 16, and as set out more definitely in Algorithm 2 of Figure 14b, in step 1601 the receiver receives from the communications channel 120 and demodulates a set of noisy codewords Ĉt and the key frame allocation map mt.
[00194] Then, in steps 1602-1606, the decoder 130 follows a similar process to the encoder 110 for decoding. In this respect, if the key frame allocation map mt indicates that a noisy detected codeword ĉt represents a key frame, in step 1602 the decoder control module 135c passes the codeword to the key item decoder neural network 135k, where in step 1603 it decodes and recovers a reconstruction x̂t of the encoder input vector xt to generate a representation of the input data item.
[00195] On the other hand, if the key frame allocation map mt indicates that the noisy detected codeword ĉt does not represent a key frame, in step 1602 the decoder control module 135c passes the codeword ĉt to the interpolation item decoder neural network 135i, defined as a function that takes as its input the tuple (ĉt, ẑt-1), and in step 1605 the interpolation item decoder neural network 135i decodes the noisy detected codeword by mapping it to an estimate ẑt of the frame latent vector. As can be seen from the input tuple, the mapping in step 1605 is based on the received signal values in the codeword ĉt and a latent representation ẑt-1 of the previous data item, which is in this case generated by operating a version of the key item encoder neural network 115k function, denoted fk below, stored locally at the decoder 130 (not shown in Figure 1, but this can be seen in the receiver 130 in Figure 12), on the reconstruction x̂t-1 of the encoder input vector xt-1 at the previous time step (i.e. by computing ẑt-1 = fk(x̂t-1) at the decoder 135).
[00196] That is, writing gi(·) for the function of the interpolation item decoder neural network 135i, the noisy codeword is processed by the decoder 135 in steps 1602 and 1605 for a time step t as follows:

    ẑt = ĉt, if the key frame allocation map mt indicates that frame t is a key frame;
    ẑt = gi(ĉt, fk(x̂t-1)), otherwise.
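For orientation only, the receiver-side dispatch of steps 1602-1606 can be sketched as follows. This is a hypothetical sketch in which key_encoder, key_decoder and interp_decoder stand in for the locally stored copy of the key item encoder neural network 115k and for the decoder neural networks 135k and 135i respectively; their interfaces are assumptions made here for illustration.

    def decode_step(c_hat, is_key, x_prev_hat, key_encoder, key_decoder, interp_decoder):
        """One receiver time step, per paragraphs [00194] to [00197]."""
        if is_key:
            z_hat = c_hat                          # a key codeword is the noisy latent itself
        else:
            z_prev_hat = key_encoder(x_prev_hat)   # re-derive the previous latent locally
            z_hat = interp_decoder(c_hat, z_prev_hat)
        x_hat = key_decoder(z_hat)                 # latent vector -> frame reconstruction
        return x_hat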
[00197] In step 1606, the estimate ẑt of the latent vector for time step t decoded by the interpolation item decoder neural network 135i in step 1605 is passed to the key item decoder neural network 135k (i.e. to step 1604), which decodes the latent vector ẑt to recover a reconstruction x̂t of the encoder input vector xt, interpolated from the reconstruction x̂t-1 of the previous encoder input vector xt-1, to generate a representation of the input data item for provision to the information sink 131.
[00198] The process performed by the decoder 135 using the key item decoder neural network 135k, the interpolation item decoder neural network 135i, and the version of the key item encoder neural network 115k function fk stored locally at the decoder 130, is set out in Algorithm 2 shown in Figure 14b. As can be seen from Algorithm 2, the interpolation item decoder neural network 135i in essence provides a decoder process that recursively unpacks the codeword ĉt by conditioning on the latent vector estimate from the previous time step. In practice, the unpacking function is also performed by an LSTM module: the internal state ht represents the current state of the unpacked codeword yt, and the input of the LSTM represents the current latent vector estimate. This recursive encoding and decoding process for successive interpolation items is illustrated in Figure 12 for increasing time steps.
[00199] The training of the key item encoder and decoder neural network pair, 115k and 135k, and the interpolation item encoder and decoder neural network pair, 115i and 135i, is such that the neural networks are trained together using the method generally described in relation to Figure 5, with the mean-squared error as the loss function, to optimise the weights of their hidden layers so as to minimise a reconstruction error. Again, an AWGN noise model for the channel may be used.
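By way of a hedged illustration of the joint end-to-end training described in paragraph [00199], a single training step might be sketched as follows, assuming differentiable encoder and decoder modules and a unit-power AWGN channel at a chosen signal-to-noise ratio. None of the names or hyperparameters below are taken from the original; they are assumptions made for illustration.

    import torch
    import torch.nn.functional as F

    def awgn(z, snr_db):
        # Add white Gaussian noise at the given SNR, assuming unit signal power.
        noise_power = 10 ** (-snr_db / 10)
        return z + noise_power ** 0.5 * torch.randn_like(z)

    def train_step(frames, encoder, decoder, optimiser, snr_db=10.0):
        """One end-to-end step: encode, pass through AWGN, decode, MSE loss."""
        z = encoder(frames)                 # key + interpolation encoding
        z = z / z.pow(2).mean().sqrt()      # power normalisation before transmission
        x_hat = decoder(awgn(z, snr_db))    # reconstruct from the noisy codewords
        loss = F.mse_loss(x_hat, frames)    # reconstruction error to be minimised
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()
        return loss.item()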
[00200] While the examples above indicate a software-driven implementation of components of the invention by a more general-purpose processor, such as a CPU core, based on program logic or instructions stored in a memory, in alternative embodiments certain components of the invention may be partly embodied as pre-configured electronic systems or embedded controllers and circuits implemented as programmable logic devices, using, for example, application-specific integrated circuits (ASICs) or field-programmable gate arrays (FPGAs), which may be partly configured by embedded software or firmware.
[00201] It should be noted that, in accordance with the present disclosure, the communication channel 120 should be understood as any transformation from the channel input space to the channel output space that includes a random transformation due to the channel. This may include additive noise, interference, or other stochastic properties of the channel that will randomly transform the transmitted signal, e.g. fading and multi-path effects in wireless channels. Thus the reference to the noise-affected version ẑ of the vector of signal values z received at the decoder should be understood to indicate that the input ẑ to the decoder is a vector of values correlated with the transmitted vector z of signal values (which is itself correlated with the input data x from the information source), transformed by the communication channel 120, whether that transformation is 'noise' or another channel transformation.
[00202] In this respect, in accordance with the present disclosure, the communication channel 120 should be understood as encompassing any channel that applies a random transformation to its input to produce the channel output.
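As a purely illustrative sketch of the view of the channel as a random transformation set out in paragraphs [00201] and [00202], a fading channel model might be written as follows. The flat Rayleigh fading model and its parameters are assumptions chosen here for illustration, and are not channel models prescribed by the disclosure.

    import torch

    def rayleigh_fading_channel(z, snr_db):
        """Randomly transform z: complex flat fading plus additive white noise."""
        # z is assumed to be a complex-valued tensor of IQ samples with unit average power.
        h = (torch.randn_like(z.real) + 1j * torch.randn_like(z.real)) / 2 ** 0.5
        noise_power = 10 ** (-snr_db / 10)
        n = (noise_power / 2) ** 0.5 * (
            torch.randn_like(z.real) + 1j * torch.randn_like(z.real)
        )
        return h * z + n  # the decoder sees values correlated with, not equal to, z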
[00203] Thus, although the examples described above disclose the transmitted vector z of signal values being raw signal values in IQ space, taking any value (although this may be constrained to correspond to a fixed, pre-defined constellation) transmittable by the transmitter over the communication channel 120, in other embodiments the communication channel 120 may be one that also includes an existing channel encoder and decoder scheme, in which the signal space of the channel input may be the predetermined finite set of symbols of the channel code (which could be bit values) for modulating a signal carrier to provide input signals in the alphabet of the input signal space for the communication channel. Thus, besides random noise applied by the communication channel, the transformation applied by the communication channel 120 may, in embodiments, also include an existing channel code. In these embodiments, the encoder and decoder pairs may be configured to learn a mapping to a predefined alphabet of symbols corresponding to a message alphabet for an existing channel code by which the input signals to the given communications channel are modulated. In this case, the encoder output vectors z output from the encoder 115 will be mapped into the message alphabet of the corresponding channel code (rather than, for example, the raw IQ values transmittable by the transmitter over the communication channel 120, as in the embodiments described above). The noise-affected channel output ẑ input to the decoder neural network 135 may correspond to the decoded message of the existing channel decoder. In this respect, in these embodiments, the encoder neural network 115 and decoder neural network 135 may learn an optimum mapping of the input information source 111 to inputs of an existing channel code of the communications channel 120 that reduces reconstruction errors at the output 131 of the decoder neural network 135. Although acting as an outer code in these embodiments, this learned coding of the encoder neural network 115 and decoder neural network 135 is still optimised end-to-end based on the characteristics of the communication channel 120 to reduce reconstruction errors, even though in these alternative embodiments the communication channel 120 includes an existing channel code. This is unlike existing modular source codes, which are defined independently of the random transformation applied by any channel.
[00204] Throughout the description and claims of this specification, the words "comprise" and "contain", and variations of them, mean "including but not limited to", and they are not intended to (and do not) exclude other components, integers or steps. Throughout the description and claims of this specification, the singular encompasses the plural unless the context otherwise requires. In particular, where the indefinite article is used, the specification is to be understood as contemplating plurality as well as singularity, unless the context requires otherwise.
[00205] Features, integers, characteristics or groups described in conjunction with a particular aspect, embodiment or example of the invention are to be understood to be applicable to any other aspect, embodiment or example described herein unless incompatible therewith. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and/or all of the steps of any method or process so disclosed, may be combined in any combination, except combinations where at least some of such features and/or steps are mutually exclusive. The invention is not restricted to the details of any foregoing embodiments. The invention extends to any novel one, or any novel combination, of the features disclosed in this specification (including any accompanying claims, abstract and drawings), or to any novel one, or any novel combination, of the steps of any method or process so disclosed. In particular, any dependent claims may be combined with any of the independent claims and any of the other dependent claims.

Claims

1. An encoder for use in a transmitter of a communications system for conveying sequences of correlated data items from an information source across a communication channel using joint source and channel coding, the encoder comprising: a key item encoder neural network for encoding data items selected from the sequence to serve as a key item that can be directly reconstructed by the decoder, the key item encoding being based on the input data item and being independent of any other data item in the sequence; an interpolation item encoder neural network for encoding data items selected from the sequence to serve as an interpolation item that can be reconstructed by the decoder using interpolation, the interpolation item encoding using data representing the input data item and at least one previous data item in the sequence; the key item encoder neural network and interpolation item encoder neural network having hidden layers of connecting nodes with weights that, in use, map vectors of values received at nodes in an encoder input layer thereof to encoder output vectors provided at nodes of an encoder output layer thereof, the encoder output vectors being used for providing values in a signal space for modulating a carrier signal for transmission of a transformed version of the data items across a communication channel; the key item encoder neural network and interpolation item encoder neural network having in the communications system a respective complementary key item decoder neural network and interpolation item decoder neural network for receiving a noise-affected version of the encoder output vector from a receiver receiving and demodulating the signal transmitted across the communication channel and reconstructing the input vector to generate a representation of the input data item; wherein the connecting node weights of the key item encoder neural network and interpolation item encoder neural network have been trained together with the respective complementary key item decoder neural network and interpolation item decoder neural network, to minimise an objective function characterising a reconstruction error between input-output pairs of training data items.
2. A decoder for use in a receiver of a communications system for conveying sequences of correlated data items from an information source across a communications channel using joint source and channel coding, the decoder comprising: a key item decoder neural network for decoding data items from the sequence indicated as key items, the key item decoding being based on a noise-affected version of an encoder output vector generated at a transmitter by a complementary key item encoder neural network to encode the data item based on the input data item independent of any other data item in the sequence, the key item decoder neural network being configured for reconstructing the input vector to generate a representation of the input data item directly from the noise-affected version of the encoder output vector and independently of any other data item in the sequence; an interpolation item decoder neural network for decoding data items indicated as interpolation items, the interpolation item decoding being based on data representing at least one previous data item in the sequence and a noise-affected version of an encoder output vector generated at a transmitter by a complementary interpolation item encoder neural network to encode the data item based on data representing the input data item and at least one previous data item in the sequence, the noise-affected version of the encoder output vector having been received and demodulated at the receiver based on the signal transmitted across the communication channel, the interpolation item decoder neural network being configured for reconstructing the input vector by interpolation from reconstructions of representations of other data items in the sequence to generate a representation of the input data item; the key item decoder neural network and interpolation item decoder neural network having hidden layers of connecting nodes with weights that, in use, map vectors of values received at nodes in a decoder input layer thereof to decoder output vectors provided at nodes of a decoder output layer thereof, the decoder output vectors providing a reconstruction of the encoder input vector to generate a representation of the input data item, wherein the connecting node weights of the key item decoder neural network and interpolation item decoder neural network have been trained together with the respective complementary key item encoder neural network and interpolation item encoder neural network, to minimise an objective function characterising a reconstruction error between input-output pairs of training data.
3. A communication system for conveying sequences of correlated data items from an information source across a communication channel using joint source and channel coding, comprising: a transmitter including an encoder as claimed in claim 1, the transmitter being configured for transmitting signals over a communication channel based on signal values of encoder output vectors of the key item encoder neural network and interpolation item encoder neural network provided by the encoder; a receiver including a decoder as claimed in claim 2, the receiver being configured for receiving and demodulating a noise-affected version of the encoder output vectors based on the signal transmitted across the communication channel and passing them to the decoder for reconstructing the sequence of correlated data items.
4. An encoder, decoder or communication system, as claimed in any preceding claim, wherein the sequences of correlated data items are a series of image frames providing a video.
5. An encoder, decoder or communication system, as claimed in any preceding claim, wherein the encoder is configured to create a coded and compressed representation of the input data as an encoder output vector for transmission across the communications channel; and wherein the decoder is configured to decode the noise-affected version of the encoder output vector back into an uncompressed reconstruction of the input data.
6. An encoder, decoder or communication system, as claimed in any preceding claim, wherein the interpolation encoder input layer is configured such that the interpolation item encoding uses data received at the input layer thereof representing the input data item in pixel space and at least one previous data item in the sequence in pixel space.
7. An encoder, decoder or communication system, as claimed in claim 6, wherein the data representing at least one previous data item in the sequence used by the interpolation item encoder to encode interpolation items comprises: the motion representation information representing the relative motion between the data item and at least one other data item in the sequence; and the residual information between the data item and a motion compensated version of at least one other data item in the sequence using the motion representation information in respect of that data item.
8. An encoder, decoder or communication system, as claimed in any of claims 1 to 5, wherein the interpolation encoder input layer is configured such that the interpolation item encoding uses data received at the input layer thereof representing: the input data item in the latent space defined by the output of the key item encoder neural network; and at least one previous data item in the sequence in the latent space defined by the output of the key item encoder neural network.
9. An encoder, decoder or communication system, as claimed in claim 8, wherein the data representing the input data item used by the interpolation item encoder to encode interpolation items comprises: the key item encoder output vector encoded for the data item by the key item encoder neural network; and wherein the data representing at least one previous data item in the sequence used by the interpolation item encoder to encode interpolation items comprises an encoder output vector transmitted by the encoder for at least one previous data item in the sequence.
10. An encoder, decoder or communication system, as claimed in claim 9, wherein the data representing at least one previous data item in the sequence used by the interpolation item decoder to decode interpolation items comprises a noise-affected version of an encoder output vector or a reconstruction of the encoder input vector providing a representation of the input data item for at least one previous data item in the sequence.
11. An encoder, decoder or communication system, as claimed in any preceding claim, wherein the data representing at least one previous data item in the sequence used by the interpolation item encoder to encode interpolation items further comprises data representing at least one subsequent data item in the sequence.
12. An encoder, decoder or communication system, as claimed in any preceding claim, the encoder and decoder further comprising a static control module configured to select data items from the sequence of data items as key items and interpolation items according to a fixed order specified by a predetermined group of items, the static control module being further configured to use the key item encoder and decoder to encode and decode data items selected as key items, and to use the interpolation item encoder and decoder to encode and decode data items selected as interpolation items.
13. An encoder, decoder or communication system, as claimed in any preceding claim, the encoder further comprising a dynamic control module having a dynamic decision agent configured to dynamically choose whether the data item is to serve as a key item or an interpolation item.
14. An encoder, decoder or communication system, as claimed in claim 13, wherein the dynamic decision agent is configured to dynamically choose whether the data item is to serve as a key item or an interpolation item based at least on one or more of: the current data item; the number of data items transmitted since last key item; a current average channel utilisation; and a channel utilisation constraint.
15. An encoder, decoder or communication system, as claimed in claim 13 or 14, wherein the dynamic decision agent is configured to dynamically choose whether the data item is to serve as a key item or an interpolation item so that the average channel utilisation is below the channel utilisation constraint.
16. An encoder, decoder or communication system, as claimed in claim 13, 14 or 15, wherein the dynamic control module is configured to: select, based on a decision output by the decision agent for the data item, whether the data item is to serve as a key item or an interpolation item; and, if the data item is selected to serve as a key item, use the key item encoder to encode the data item in the sequence to provide a key item encoder output vector for the item, the encoder being configured for transmitting the key item encoder output vector on the communications channel.
17. An encoder, decoder or communication system, as claimed in claim 16, the dynamic control module being further configured to: if the data item is selected to serve as an interpolation item, use the interpolation encoder to encode the data item to provide an interpolation item encoder output vector for the item, the encoder being configured for transmitting the interpolation item encoder output vector on the communications channel.
18. An encoder, decoder or communication system, as claimed in any of claims 13 to 17, the dynamic decision agent being configured to generate data mapping, for the sequence of data items, which data items are key data items and which data items are interpolation data items, for transmission across the communications channel and for use by the decoder to determine whether the received noise-affected version of an encoder output vector should be decoded by the key item decoder neural network or the interpolation item decoder neural network.
19. An encoder, decoder or communication system, as claimed in any preceding claim, wherein the encoder output layers of the interpolation item encoder neural network and the decoder input layers of the interpolation item decoder neural network are divided into ordered blocks, and wherein the neural networks are trained such that the interpolation item encoder neural network encodes descending ordering of information in increasing blocks of nodes, and such that, for interpolation items, the decoder reconstructs an increasingly refined representation of the input data with increasing blocks received in the noise-affected version of an encoder output vector.
20. An encoder, decoder or communication system, as claimed in claim 19, wherein the communication system further comprises a bandwidth allocation module configured to determine, for each data item in the sequence selected to serve as an interpolation item, a number of blocks of the interpolation encoder output layer to be transmitted over the communication channel, so as to allocate the available bandwidth in the communications channel to the transmission of interpolation items.
21. An encoder, decoder or communication system, as claimed in claim 20, wherein the bandwidth allocation module is further configured to determine the number of blocks of the interpolation encoder output layer to be transmitted over the communication channel to seek to minimise the reconstruction error between the representation of the input data reconstructed at the decoder and the input data encoded at the encoder.
22. An encoder, decoder or communication system, as claimed in claim 20 or 21, wherein the bandwidth allocation module is configured to determine the number of blocks of the interpolation encoder output layer to be transmitted over the communication channel based on at least motion representation information determined to represent the relative motion between the data item and at least one other data item in the sequence.
23. An encoder, decoder or communication system, as claimed in claim 20, 21 or 22, wherein the bandwidth allocation module is configured to determine a number of blocks of the interpolation encoder output layer to be transmitted over the communication channel, so as to seek to optimally allocate the available bandwidth in the communications channel between a group, a set or the whole sequence of data items to be transmitted.
24. An encoder, decoder or communication system, as claimed in any of claims 1 to 19, wherein the interpolation encoder neural network is configured to: maintain and update an internal state as successive interpolation items of a group of consecutive interpolation items are encoded by the interpolation encoder neural network; and after successive interpolation items have been encoded into the internal state, to provide the internal state as the interpolation encoder output vector for providing values in a signal space for modulating a carrier signal for transmission of a transformed version of the group of consecutive interpolation data items across the communication channel.
25. An encoder, decoder or communication system, as claimed in claim 24, wherein the encoder neural network is configured to output an encoder output vector for transmission for each key item and each group of consecutive interpolation items between key items.
26. An encoder, decoder or communication system, as claimed in claim 24 or 25, wherein the interpolation decoder neural network is configured to: for a group of consecutive interpolation items, recursively decode the noise-affected version of the encoder output vector received from a receiver to thereby reconstruct the encoder input vectors of successive interpolation items to generate a representation of the input data items of the group of consecutive interpolation items.
27. An encoder, decoder or communication system, as claimed in claim 24, 25 or 26, wherein the interpolation encoder neural network and the interpolation decoder neural network are both provided by a recurrent neural network, optionally a Long Short-Term Memory.
28. An encoder, decoder or communication system, as claimed in any preceding claim, wherein the encoder output vectors provide values in a signal space that represent in-phase and quadrature components for modulation of the carrier signal for transmission over the communication channel.
29. An encoder, decoder or communication system, as claimed in any preceding claim, wherein the encoder output vectors provide values defining a probability distribution sampleable to provide values in a signal space that represent in-phase and quadrature components for modulation of the carrier signal for transmission over the communication channel.
30. An encoder, decoder or communication system, as claimed in any preceding claim, wherein the encoder output vectors provide values corresponding to a predetermined finite set of symbols of an existing channel encoder and decoder scheme for transmission of data over the communication channel.
31. An encoder, decoder or communication system, as claimed in any preceding claim, wherein the encoder is configured to encode the sequences of correlated data items from an information source for transmission across the communications channel as a streaming media.
32. An encoder, decoder or communication system, as claimed in any of claims 1 to 30, wherein the encoder is configured to encode the sequences of correlated data items from an information source into a static media file.
33. An encoder, decoder or communication system, as claimed in any preceding claim, wherein the correlated data is a video and wherein the items are video frames.
34. An encoder, decoder or communication system, as claimed in claim 33, wherein the correlated data items are each represented by a 3D matrix with a depth based on the colour channels, a height based on the height of the frame and a width based on the width of the frame.
35. An encoder, decoder or communication system, as claimed in claim 33 or 34, wherein the encoder input layers of the key item encoder neural network and/or the interpolation item encoder neural network are configured to receive video frames as input vectors.
36. An encoder, decoder or communication system, as claimed in claim 35, wherein the quantity of information included in the output vector of the key item encoder neural network and/or the interpolation item encoder neural network is smaller than that included in the input vector.
37. A method for conveying sequences of correlated data items from an information source across a communications channel using joint source and channel coding, the method comprising, at a transmitter, for each data item in the sequence: selecting data items from the sequence of data items to serve as key items and interpolation items; encoding data items to serve as key items using a key item encoder neural network, the key item encoding being based on the input data item and being independent of any other data item in the sequence; encoding data items to serve as interpolation items using an interpolation item encoder neural network, the interpolation item encoding using data representing the input data item and at least one previous data item in the sequence; the key item encoder neural network and interpolation item encoder neural network having hidden layers of connecting nodes with weights that, in use, map vectors of values received at nodes in an encoder input layer thereof to encoder output vectors provided at nodes of an encoder output layer thereof, the encoder output vectors being used for providing values in a signal space for modulating a carrier signal for transmission of a transformed version of the data items across a communication channel; and transmitting signals over a communication channel based on signal values of encoder output vectors of the key item encoder neural network and interpolation item encoder neural network; the method further comprising, at a receiver: receiving and demodulating a noise-affected version of the encoder output vectors based on the signal transmitted across the communication channel; decoding data items from the sequence indicated as key items using a key item decoder neural network based on a noise-affected version of the encoder output vector for the data item, the key item decoder neural network being configured for reconstructing the input vector to generate a representation of the input data item directly from the noise-affected version of the encoder output vector and independently of any other data item in the sequence; and decoding data items from the sequence indicated as interpolation items using an interpolation item decoder neural network based on data representing at least one previous data item in the sequence and the noise-affected version of an encoder output vector for the data item, the interpolation item decoder neural network being configured for reconstructing the input vector by interpolation from reconstructions of representations of other data items in the sequence to generate a representation of the input data item; the key item decoder neural network and interpolation item decoder neural network having hidden layers of connecting nodes with weights that, in use, map vectors of values received at nodes in a decoder input layer thereof to decoder output vectors provided at nodes of a decoder output layer thereof, the decoder output vectors providing a reconstruction of the encoder input vector to generate a representation of the input data item, wherein the connecting node weights of the key item encoder neural network and interpolation item encoder neural network have been trained together with the respective complementary key item decoder neural network and interpolation item decoder neural network, to minimise an objective function characterising a reconstruction error between input-output pairs of training data items.
38. A method of training an encoder and a decoder for use in a communication system as claimed in any of claims 1 to 36 for conveying sequences of correlated data items from an information source across a communication channel using joint source and channel coding, comprising: for input-output pairs of a set of training data items from the information source passed to the encoder, determining an objective function characterising a reconstruction error between input-output pairs of training data from the information source passed to the encoder and the representation of the input data reconstructed at the decoder; and using an appropriate optimisation algorithm operating on the objective function, updating the connecting node weights of the hidden layers of the key item encoder neural network, interpolation item encoder neural network, key item decoder neural network and interpolation item decoder neural network to seek to minimise the objective function.
39. A method as claimed in claim 38, wherein the encoder neural networks and decoder neural networks have been trained together using training data in which a model of the communication channel is used to estimate channel noise and add it to the transmitted signal values to generate a noise-affected version of the vector of signal values in the input-output pairs of training data.
40. A method as claimed in claim 38 or 39, wherein the encoder output layers of the interpolation item encoder neural network and the decoder input layers of the interpolation item decoder neural network are divided into ordered blocks, wherein the encoder output vector passed to the decoder input layer for each input-output pair during training is selected to have a random number of the ordered blocks, such that, following training, the interpolation item encoder neural network encodes descending ordering of information in increasing blocks of nodes, and such that, for interpolation items, the decoder reconstructs an increasingly refined representation of the input data with increasing blocks received in the noise-affected version of an encoder output vector.
41. Computer readable medium comprising one or more instructions which, when executed, cause at least one of a transmitter and a receiver to operate in accordance with the method of claim 37.
42. Computer readable medium comprising one or more instructions which, when executed, cause a computing device to operate a method as claimed in claim 38, 39 or 40 of training an encoder and a decoder for use in a communication system as claimed in any of claims 1 to 36 for conveying sequences of correlated data items from an information source across a communication channel using joint source and channel coding.
PCT/GB2022/052266 2021-09-06 2022-09-06 Encoder, decoder and communication system and method for conveying sequences of correlated data items from an information source across a communication channel using joint source and channel coding, and method of training an encoder neural network and decoder neural network for use in a communication system Ceased WO2023031632A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GBGB2112665.1A GB202112665D0 (en) 2021-09-06 2021-09-06 Title too long see minutes
GB2112665.1 2021-09-06

Publications (1)

Publication Number Publication Date
WO2023031632A1 true WO2023031632A1 (en) 2023-03-09

Family

ID=78076807

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2022/052266 Ceased WO2023031632A1 (en) 2021-09-06 2022-09-06 Encoder, decoder and communication system and method for conveying sequences of correlated data items from an information source across a communication channel using joint source and channel coding, and method of training an encoder neural network and decoder neural network for use in a communication system

Country Status (2)

Country Link
GB (1) GB202112665D0 (en)
WO (1) WO2023031632A1 (en)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114494048B (en) * 2022-01-11 2024-05-31 辽宁师范大学 Multi-stage progressive mixed distortion image restoration method based on supervised contrast learning
CN118509109A (en) * 2023-02-14 2024-08-16 华为技术有限公司 A source channel joint coding method and system


Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020035683A1 (en) * 2018-08-15 2020-02-20 Imperial College Of Science, Technology And Medicine Joint source channel coding for noisy channels using neural networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MIKOLAJ JANKOWSKI ET AL: "AirNet: Neural Network Transmission over the Air", ARXIV.ORG, 26 May 2021 (2021-05-26), XP081968110 *
SKATCHKOVSKY NICOLAS ET AL: "End-to-End Learning of Neuromorphic Wireless Systems for Low-Power Edge Artificial Intelligence", 2020 54TH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS, AND COMPUTERS, IEEE, 1 November 2020 (2020-11-01), pages 166 - 173, XP033921597, DOI: 10.1109/IEEECONF51394.2020.9443351 *

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230169694A1 (en) * 2021-11-30 2023-06-01 Qualcomm Incorporated Flow-agnostic neural video compression
US12470726B2 (en) * 2021-12-14 2025-11-11 Intel Corporation Validation framework for media encode systems
US20220109825A1 (en) * 2021-12-14 2022-04-07 Karteek Renangi Validation framework for media encode systems
CN116155453A (en) * 2023-04-23 2023-05-23 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) A decoding method and related equipment for dynamic signal-to-noise ratio
CN116155453B (en) * 2023-04-23 2023-07-07 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Decoding method and related equipment for dynamic signal-to-noise ratio
CN116192340A (en) * 2023-04-27 2023-05-30 济南安迅科技有限公司 Error control method and device in optical communication network
CN116192340B (en) * 2023-04-27 2023-06-30 济南安迅科技有限公司 Error control method and device in optical communication network
CN116456094A (en) * 2023-06-15 2023-07-18 中南大学 A distributed video hybrid digital-analog transmission method and related equipment
CN116456094B (en) * 2023-06-15 2023-09-05 中南大学 Distributed video hybrid digital-analog transmission method and related equipment
CN116614637A (en) * 2023-07-19 2023-08-18 腾讯科技(深圳)有限公司 Data processing method, device, equipment and readable storage medium
CN116614637B (en) * 2023-07-19 2023-09-12 腾讯科技(深圳)有限公司 Data processing method, device, equipment and readable storage medium
CN116781156A (en) * 2023-08-24 2023-09-19 济南安迅科技有限公司 Channel coding method for optical communication network
CN116781156B (en) * 2023-08-24 2023-11-10 济南安迅科技有限公司 Channel coding method for optical communication network
WO2025077623A1 (en) * 2023-10-09 2025-04-17 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Systems and methods for obtaining neural networks for data compression
CN117578741A (en) * 2024-01-15 2024-02-20 湖南智焜能源科技有限公司 Topology identification method and system for measuring switch
CN117578741B (en) * 2024-01-15 2024-03-29 湖南智焜能源科技有限公司 Topology identification method and system for measuring switch
CN119484857A (en) * 2024-11-18 2025-02-18 重庆邮电大学 A screen content image transmission method based on semantic communication
CN119847207A (en) * 2025-01-03 2025-04-18 暨南大学 Multi-unmanned aerial vehicle path planning method and system
CN120050002A (en) * 2025-02-24 2025-05-27 澳门大学 Variable length channel coding method with feedback, electronic device and storage medium
CN120321401A (en) * 2025-06-13 2025-07-15 深圳大学 Video codec transmission system construction method and system, equipment and medium
CN120321401B (en) * 2025-06-13 2025-10-03 深圳大学 Video coding and decoding transmission system construction method and system, equipment and medium
CN120744324A (en) * 2025-08-26 2025-10-03 四川大学华西医院 Deep learning-based signal bad track restoration method and device for cortex electroencephalogram

Also Published As

Publication number Publication date
GB202112665D0 (en) 2021-10-20

Similar Documents

Publication Publication Date Title
WO2023031632A1 (en) Encoder, decoder and communication system and method for conveying sequences of correlated data items from an information source across a communication channel using joint source and channel coding, and method of training an encoder neural network and decoder neural network for use in a communication system
WO2022248891A1 (en) Communication system, transmitter, and receiver for conveying data from an information source across a communication channel using joint source and channel coding, and method of training an encoder neural network and decoder neural network for use in a communication system
US20210319286A1 (en) Joint source channel coding for noisy channels using neural networks
US20210351863A1 (en) Joint source channel coding based on channel capacity using neural networks
US10797728B1 (en) Systems and methods for diversity bit-flipping decoding of low-density parity-check codes
WO2020035684A1 (en) Joint source channel coding of information sources using neural networks
US9504042B2 (en) System and method for encoding and decoding of data with channel polarization mechanism
US20230079744A1 (en) Methods and systems for source coding using a neural network
Jankowski et al. AirNet: Neural network transmission over the air
US20210250049A1 (en) Adaptive Cross-Layer Error Control Coding for Heterogeneous Application Environments
US20220182111A1 (en) Mimo detector selection
Zhang et al. Semantic edge computing and semantic communications in 6g networks: A unifying survey and research challenges
EP3981094B1 (en) Peak to average power ratio reduction of optical systems utilizing error correction
EP3665879A1 (en) Apparatus and method for detecting mutually interfering information streams
CN116346281A (en) Communication system and method for transmitting and processing data
CN121040066A (en) Distributed video coding using reliability data
US20220294557A1 (en) Error correction in network packets
WO2017194012A1 (en) Polar code processing method and apparatus, and node
CN119853860B (en) Adaptive cross-domain gateway underwater acoustic coding method, device, electronic device and medium
US8289999B1 (en) Permutation mapping for ARQ-processed transmissions
CN114026827A (en) Transmitter algorithm
Mortaheb Deep Learning-Enabled Intelligent Goal-Oriented and Semantic Communication for 6G Networks
Khani Continuous Learning for Lightweight Machine Learning Inference at the Edge
WO2024226282A1 (en) Receiver side prediction of encoding selection data for video encoding
WO2025172978A1 (en) Obtaining learning model parameters at a device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22769351

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22769351

Country of ref document: EP

Kind code of ref document: A1