PT116910B

PT116910B - TWO-DIMENSIONAL MULTI-CONVOLUTIONAL ATTENTION UNIT FOR MULTIVARIABLE TIME SERIES ANALYSIS

Info

Publication number: PT116910B
Application number: PT116910A
Authority: PT
Inventors: Jorge Pereira Gonçalves Rui; Manuel Ferreira Lobo Pereira Fernando; Miguel De Sousa Ribeiro Vítor
Original assignee: Univ Do Porto
Priority date: 2020-11-30
Filing date: 2020-11-30
Publication date: 2022-09-21
Also published as: PT116910A

Abstract

It is an object of the present invention to provide a two-dimensional (2D) multi-convolutional attention unit for analysing three-dimensional (3D) Multivariable Time Series (MTS) input data (1) with cyclic properties, using a Recurrent Neural Network (RNN) architecture. The unit builds an independent attention vector α for each MTS variable, using 2D convolutional operations to capture the importance of a time step within its surrounding area of segments and time steps. For this purpose, the two-dimensional attention unit is composed of a splitting block (2), an attention block (3), a concatenation block (4) and a scaling block (5).

Description

DESCRIPTION

MULTI-CONVOLUTIONAL TWO-DIMENSIONAL ATTENTION UNIT FOR MULTIVARIABLE TIME SERIES ANALYSIS

FIELD OF THE INVENTION

The present invention falls within the field of Recurrent Neural Networks. In particular, it relates to attention mechanisms for analysing Multivariable Time Series with cyclic properties using Recurrent Neural Networks.

PRIOR ART

Attention is a mechanism that is combined with Recurrent Neural Networks (RNNs), allowing the network to focus on certain parts of the input sequence when predicting a given output, or when forecasting or classifying the sequence, which makes learning easier and of higher quality. Combining RNNs with attention mechanisms has improved performance in many tasks, making attention an integral part of modern RNNs.

Attention was originally introduced for machine translation tasks, but it has since spread to many other application areas. Essentially, attention can be seen as a residual block that multiplies its result with its own input $h_i$ and then reconnects to the main Neural Network (NN) pipeline with a weighted, scaled sequence. These scaling parameters are called the attention weights $\alpha_i$, and the result is called the context weighting $c_i$ for each value $i$ of the sequence; taken together, they form the context vector $c$, whose size is that of the sequence, $n$. This operation is given by:

$$c = \sum_{i=0}^{n} \alpha_i h_i$$

The attention weights $\alpha_i$ are computed by applying a softmax activation function to the input sequence $x^l$ at layer $l$:

$$\alpha_i = \frac{\exp(x_i^l)}{\sum_{k} \exp(x_k^l)}$$

This means that the input values of the sequence compete with one another for attention: since the values produced by the softmax activation sum to 1, each scaling value in the attention vector α lies in the range [0,1].
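The competition between sequence values can be illustrated with a minimal NumPy sketch. The values of `x` and `h` are hypothetical and chosen only for illustration; the sketch shows that the softmax weights lie in [0,1], sum to 1, and scale each element of the sequence.

```python
import numpy as np

def softmax(x):
    # Subtracting the maximum improves numerical stability without changing the result.
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Hypothetical layer input x and sequence h, both of length n = 4.
x = np.array([0.5, 2.0, -1.0, 0.0])
h = np.array([1.0, 3.0, 2.0, 4.0])

alpha = softmax(x)   # attention weights: each in [0, 1], summing to 1
c = alpha * h        # context weighting c_i = alpha_i * h_i for each value i
```

The largest entry of `x` receives the largest weight, so the corresponding element of `h` dominates the scaled sequence.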

The attention mechanism can be applied before or after the recurrent layers. If attention is applied directly to the input, before it enters an RNN, it is called pre-attention; if it is applied to the output sequence of an RNN, it is called post-attention.

In the case of Multivariable Time Series (MTS) input data, a two-dimensional dense layer is used to perform attention. Permutation operations are applied before and after this layer so that the attention mechanism operates between the values within each sequence, and not between the time steps of all the sequences.
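The permute, dense, permute pattern for MTS input can be sketched as follows. This is an illustrative NumPy version, not the implementation referred to in the text: the sizes `T` and `V`, the weights `W` and the bias `b` are hypothetical, and the dense layer is assumed to have one unit per time step.

```python
import numpy as np

rng = np.random.default_rng(0)
T, V = 6, 3                          # time steps, variables (illustrative sizes)
x = rng.normal(size=(T, V))          # one MTS sample: time steps x variables

W = rng.normal(size=(T, T)) * 0.1    # hypothetical dense-layer weights
b = np.zeros(T)                      # hypothetical dense-layer bias

xp = x.T                             # permute: variables x time steps
scores = xp @ W + b                  # the dense layer acts along the time axis
scores -= scores.max(axis=1, keepdims=True)
alpha = np.exp(scores)
alpha /= alpha.sum(axis=1, keepdims=True)   # softmax within each sequence
alpha = alpha.T                      # permute back: time steps x variables

context = alpha * x                  # scale the input by its attention weights
```

Because the softmax runs along the time axis of each permuted row, the weights of every variable sum to 1 independently of the other variables.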

A two-dimensional convolutional recurrent layer was proposed by Chen et al. [1]. The motivation for that work was to forecast future rainfall intensity from sequences of meteorological images. By applying these layers in an NN architecture, it was possible to outperform the state-of-the-art algorithms for that task. Two-dimensional convolutional layers of this kind are recurrent layers like any other, such as Long Short-Term Memory (LSTM), except that the internal matrix multiplications are replaced by convolution operations. As a result, the data flowing through the cells of these two-dimensional convolutional layers retain the three-dimensional structure of the input MTS data (segments × time steps × variables) instead of being just a two-dimensional map (time steps × variables).

Solutions exist in the art, such as patent document US9830709B2, which discloses a method for video analysis with a convolutional attention recurrent neural network. This method includes generating a current multi-dimensional attention map, which indicates areas of interest in a first frame from a sequence of spatio-temporal data. The method further includes receiving a multi-dimensional feature map and convolving the current multi-dimensional attention map with the multi-dimensional feature map to obtain a multi-dimensional hidden state and a next multi-dimensional attention map. The method identifies a class of interest in the first frame based on the multi-dimensional hidden state and on training data.

Patent document US2018144208A1 discloses a spatial attention model that uses the current hidden state information of an LSTM decoder to guide attention and to extract spatial image features for use in image captioning.

Patent document CN109919188A discloses a time series classification method based on a spaced local attention mechanism and a convolutional echo state network.

In conclusion, none of the existing solutions appears to disclose the adaptations needed for the attention mechanism of an RNN architecture when it is applied to the specific case of analysing MTS data with cyclic properties, in order to achieve a more accurate analysis.

The present solution aims to overcome these problems in an innovative way.

SUMMARY OF THE INVENTION

It is therefore an object of the present invention to provide a multi-convolutional two-dimensional (2D) attention unit for analysing three-dimensional (3D) MTS data with cyclic properties, using an RNN architecture. A further object of the present invention is a method of operating the multi-convolutional 2D attention unit. The unit builds an independent attention vector α for each MTS variable, using 2D convolutional operations to capture the importance of a time step within its surrounding area of segments and time steps. Many sub-patterns can be analysed using 2D convolutional layers stacked inside the attention block.

A further object of the present invention is a processing system adapted to perform the analysis of 3D MTS data with cyclic properties, comprising the 2D attention unit now developed.

DESCRIPTION OF THE FIGURES

Figure 1 - block diagram of an embodiment of the multi-convolutional 2D attention unit developed, in which the reference signs represent:

1 - 3D MTS input data;

2 - Splitting block;

3 - 2D attention block;

4 - Concatenation block;

5 - Scaling block.

Figures 2 and 3 - block diagrams of two embodiments of a processing system configured to perform analysis on MTS data with cyclic properties, in which the reference signs represent:

1 - 3D MTS input data;

2 - Splitting block;

3 - 2D attention block;

4 - Concatenation block;

5 - Scaling block;

6 - RNN with 2D convolutional layers;

7 - Dense layer.

Figure 2 shows the embodiment of the processing system in which the 2D attention unit is applied before the RNN with 2D convolutional layers, and Figure 3 shows the embodiment in which the 2D attention unit is applied after the RNN with 2D convolutional layers.

Figure 4 - representation of a padding mechanism in the segment dimension inside the 2D attention unit.

DETAILED DESCRIPTION

The most general and advantageous configurations of the present invention are described in the Summary of the Invention. These configurations are detailed below in accordance with other advantageous and/or preferred embodiments of the present invention.

A multi-convolutional 2D attention unit is described, specially developed to perform the analysis of 3D MTS data (1) using RNN architectures (6). The 3D MTS input data (1) are split into individual time series; for each sequence, a path with 2D convolutional layers is created, and the results are concatenated again. Figure 1 illustrates only one filter convolution per sequence, that is, per variable of the MTS input data (1) if attention is applied before the RNN (6), as illustrated in Figure 2, or per filter of the Number of Filters generated by the RNN if the attention block is applied afterwards, as illustrated in Figure 3.

Inside the 2D attention block, each path contains 3D feature map information for each variable, with shape segments × filter number × time steps. The first step is to permute the filter number dimension with the segment dimension, so that it is possible to feed the RNN (6), which will train 2D kernels that correlate segments and variables.
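The permutation step can be sketched with NumPy as below. The sizes `S`, `F` and `T` are hypothetical; the sketch only shows how swapping the segment and filter-number axes exposes one segments × time steps map per filter, and that the same swap restores the original ordering.

```python
import numpy as np

S, F, T = 7, 2, 24                     # segments, filters, time steps (illustrative)
path = np.arange(S * F * T, dtype=float).reshape(S, F, T)

# Swap the segment axis (0) with the filter-number axis (1).
permuted = np.transpose(path, (1, 0, 2))     # filters x segments x time steps

# Each permuted[f] is now a 2D map of segments x time steps.
# Applying the same swap again restores the original ordering.
restored = np.transpose(permuted, (1, 0, 2))
```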

For these 2D maps, a padding mechanism can be applied along the segment dimension. This is useful for time series that exhibit cyclic properties. For example, if the segments represent days and the time steps divide each day into 24 hours, a 2D kernel will capture attention patterns related to certain hours of the day and also to the same period on the preceding and following days. Furthermore, if there are 7-day segments, a padding mechanism can be used along the segment dimension so that, when the kernel processes the border, it can correlate the first day of the week with the last day of the week, provided the data tend to have a strong weekly cycle. The last convolution layer must use the softmax activation function so that the information within each resulting map competes for attention. This keeps $\sum_{i}\sum_{j} \alpha_{ij} = 1$, which is important for the competing weight values of each 2D map in each channel (segment $i$ × time step $j$). In short, the last output must use softmax activation so that each value has a scaling factor in the range [0,1] and all of them sum to 1.
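A minimal NumPy sketch of this step follows, under stated assumptions: the padding along the segment axis is taken to be a wrap-around (cyclic) padding, the time-step axis is zero-padded, a single odd-sized kernel with hypothetical random weights is used, and the softmax is taken over the whole 2D map. The function names are illustrative and not taken from the source.

```python
import numpy as np

def conv2d_cyclic_segments(x, kernel):
    """Cross-correlate a (segments x time steps) map with an odd-sized kernel,
    wrapping around on the segment axis and zero-padding on the time axis."""
    ks, kt = kernel.shape
    ps, pt = ks // 2, kt // 2
    # Segment axis: wrap, so the first segment sees the last one as a neighbour.
    xp = np.pad(x, ((ps, ps), (0, 0)), mode="wrap")
    # Time-step axis: ordinary zero padding (an assumption of this sketch).
    xp = np.pad(xp, ((0, 0), (pt, pt)), mode="constant")
    out = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + ks, j:j + kt] * kernel)
    return out

def softmax2d(z):
    # Softmax over the whole 2D map, so every cell competes with all the others.
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
x = rng.normal(size=(7, 24))          # 7 daily segments x 24 hourly time steps
kernel = rng.normal(size=(3, 3))      # hypothetical 3 x 3 kernel weights
alpha = softmax2d(conv2d_cyclic_segments(x, kernel))
```

With the wrap-around padding, the kernel row that looks at the previous segment reads the last day of the week when it is centred on the first day, which is the border behaviour described for data with a weekly cycle.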

Before the concatenation operation, the dimensions are permuted back to the original ordering, and each path returns a 3D map with the same shape (segments × filter number × time steps) as received at the input of the attention block. These maps are concatenated with one another, resulting in a 4D feature map of attention weights, α, with shape segments × filter number × time steps × variables. This map is compatible for multiplication with h to obtain the 4D context map c, as in classical attention. This 4D context map holds scaling values along the segment and time-step dimensions for each filter number and variable.
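The concatenation and scaling steps can be sketched as follows. The sizes and the random contents of `maps` and `h` are hypothetical, and `h` is assumed to have the same 4D shape as the attention map so that the multiplication is element-wise.

```python
import numpy as np

rng = np.random.default_rng(0)
S, F, T, V = 7, 2, 24, 3              # segments, filters, time steps, variables

# One hypothetical 3D attention map per variable, as returned by each path.
maps = [rng.random(size=(S, F, T)) for _ in range(V)]

# Concatenate along a new trailing axis to form the 4D map of attention weights.
alpha = np.stack(maps, axis=-1)       # segments x filters x time steps x variables

# h stands for the data being scaled; here it is assumed to share alpha's shape.
h = rng.normal(size=(S, F, T, V))
c = alpha * h                         # element-wise scaling gives the context map
```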

The main advantage of the 2D attention block now developed is that, instead of processing individual steps, it processes attention areas along the segment and time-step dimensions according to their neighbouring values, that is, sub-patterns in the time series. The importance of each attention area competes with all the others in the same traditional way, using softmax activation. Since each original sequence (time series variable) of the MTS input is scaled individually, each time series variable is processed individually. A split operation is therefore applied to create a 2D attention block for each individual MTS variable. Before the inputs are scaled by matrix multiplication, all the 3D attention maps obtained are concatenated, resulting in a compatible 4D matrix. In this way, an independent attention vector α is built for each MTS variable, using 2D convolutional operations to capture the importance of a time step within its surrounding area of segments and time steps. Many sub-patterns can be analysed using 2D convolutional layers stacked inside the attention block.

EMBODIMENTS

The object of the present invention is a multi-convolutional 2D attention unit for performing the analysis of 3D MTS input data (1). For the purposes of the present invention, the 3D MTS input data (1) are defined in terms of segments × filter number × time steps × variables and, having cyclic properties, are suitable for being partitioned into segments.

The multi-convolutional 2D attention unit comprises the following blocks: a splitting block (2), an attention block (3), a concatenation block (4) and a scaling block (5).

The splitting block (2) comprises processing means adapted to convert the 3D input data (1) into a 2D feature map of segments × time steps for each metric. The metric is either the variables of the 3D input data (1) or the number of recursive cells generated by the RNN (6), depending on whether the unit is applied before or after an RNN (6), respectively. The purpose of the split operation is to create an attention block for each individual variable in the 3D MTS input data (1). Since each variable of the original sequence of the 3D MTS input data (1) is scaled individually, each variable of the input data (1) is processed individually.
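The split operation can be sketched with NumPy as below, assuming the pre-attention case in which the metric is the variables and the input is laid out as segments × time steps × variables. The sizes are hypothetical.

```python
import numpy as np

S, T, V = 7, 24, 3                     # segments, time steps, variables (illustrative)
x = np.arange(S * T * V, dtype=float).reshape(S, T, V)

# Split the 3D input along the variable axis: one 2D map per variable.
maps = [x[:, :, v] for v in range(V)]  # each map is segments x time steps
```

Each element of `maps` can then be handed to its own attention path, and stacking the maps along the last axis recovers the original input.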

The attention block (3) comprises processing means adapted to implement a 2D convolutional layer. Said 2D convolutional layer comprises at least one filter and a softmax activation function. The attention block is configured to apply the 2D convolutional layer to the 2D feature map extracted from the splitting block (2), in order to generate a path containing three-dimensional feature map information for each metric (variables or number of recursive cells), with shape segments × filter number × time steps. By using a 2D convolutional layer inside the attention block (3), attention can be given to a time step according to its neighbouring values and neighbouring segments (time step × segment), which makes it possible to extract the importance of each time step taking into account the context of the contiguous time steps and of the time steps in the same temporal area of contiguous segments. The importance of each variable, taken within a sub-pattern, therefore competes with all the others in the same traditional way, using softmax activation. The attention block (3) further comprises processing means adapted to implement a permutation operation configured to swap two dimensions in a three-dimensional feature map. More particularly, this permutation operation is used to bring the segments back to the first dimension, as in the original input data (1).

The concatenation block (4) is configured to concatenate the 3D feature maps output by the attention block (3) to generate a 4D feature map of attention weights, α, with shape segments × filter number × time steps × variables. The scaling block (5) is configured to multiply the three-dimensional input data (1) by the four-dimensional feature map of attention weights, α, to generate a context map, c.

In one embodiment of the multi-convolutional 2D attention unit developed, the unit is applied before an RNN (6), wherein:

- the metric is the variables of the input data (1);

- the input data (1) are applied directly to the splitting block (2); and

- the number of filters of the 2D convolutional layer of the attention block (3) is equal to the number of variables of the input (1).

In another embodiment of the multi-convolutional 2D attention unit developed, the unit is applied after an RNN (6), wherein:

- the metric is the number of recursive cells generated in the RNN (6);

- the input (1) feeds the RNN (6);

- the splitting block (2) is adapted to split the output of the RNN (6) into a series of sequences generated by recursive cells; and

- the number of filters of the two-dimensional convolutional layer of the attention block (3) is equal to the number of recursive cells generated by the RNN (6).

In another embodiment of the developed multi-convolutional 2D attention unit, the 2D convolution layer of the attention block (3) is programmed to operate according to a one-dimensional kernel parameter. Alternatively, the 2D convolution layer of the attention block (3) is programmed to operate according to a two-dimensional kernel parameter.
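The practical difference between the two kernel options can be seen on a toy feature map. The sketch below assumes that a "one-dimensional kernel" spans only the time-step axis (shape 1 X k) while a "two-dimensional kernel" also spans neighbouring segments (shape k X k); this reading, the helper `conv2d_same`, and the all-ones kernels are illustrative assumptions rather than details given in the text.

```python
import numpy as np

def conv2d_same(feature_map, kernel):
    """Zero-padded 2D cross-correlation that keeps the input shape
    (kernel sizes are assumed to be odd)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(feature_map, ((ph, ph), (pw, pw)))
    out = np.empty(feature_map.shape, dtype=float)
    for i in range(feature_map.shape[0]):
        for j in range(feature_map.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

# A 7-segment X 24-time-step map with a single active cell.
feature_map = np.zeros((7, 24))
feature_map[3, 10] = 1.0

out_1d = conv2d_same(feature_map, np.ones((1, 3)))   # time steps only
out_2d = conv2d_same(feature_map, np.ones((3, 3)))   # time steps and segments

rows_1d = np.nonzero(out_1d.any(axis=1))[0].tolist()
rows_2d = np.nonzero(out_2d.any(axis=1))[0].tolist()
print(rows_1d, rows_2d)
```

With the 1 X 3 kernel the response stays inside segment 3, whereas the 3 X 3 kernel spreads it to segments 2 to 4, which is how a two-dimensional kernel can relate a time step to the same time step in adjacent segments.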

In another embodiment of the developed multi-convolutional 2D attention unit, the permutation operation performed in the attention block (3) is configured to swap the filter number dimension with the segment dimension and/or the segment dimension with the filter number dimension.

In another embodiment of the developed multi-convolutional 2D attention unit, the attention block (3) is further configured to implement a padding mechanism for the path containing the 3D feature map information generated by the 2D convolutional layer.

A further object of the present invention is a processing system for performing analysis of 3D MTS input data (1), defined in terms of segments X time steps X variables, comprising:

- processing means adapted to implement an RNN (6);

- the developed multi-convolutional two-dimensional attention unit.

In one embodiment of the processing system, the multi-convolutional 2D attention unit is applied before the RNN (6). Alternatively, the multi-convolutional 2D attention unit is applied after the RNN (6).

In one embodiment of the processing system, the RNN (6) is of the Long Short-Term Memory (LSTM) type.

Finally, an object of the present invention is a method of operating the developed multi-convolutional 2D attention unit, comprising the following steps:

i. Convert 3D MTS input data (1), defined in terms of segments X time steps X variables, into a two-dimensional feature map of segments X time steps;

ii. Apply a 2D convolutional layer to the 2D feature map in order to generate a path containing the 3D feature map information for each metric with: segments X filter number X time steps;

iii. Apply a permutation function to the 3D feature map information in order to swap the filter number dimension with the segment dimension, resulting in a 3D feature map of filter number X segments X time steps;

iv. Repeat steps ii. and iii. for all filters of the 2D convolutional layer and apply a softmax activation function to the last convolutional layer in order to keep $\sum_{i}\sum_{j}\alpha_{i,j} = 1$ for the competing weight values of each 2D feature map for each filter number: segment i X time step j;

v. Apply a permutation function to swap back to the original ordering of the 3D feature map information of the path for each metric: segments X filter number X time steps;

vi. Concatenate the 3D feature map information of each path, resulting in a 4D feature map of attention weights α, with the format: segments X filter number X time steps X variables;

Wherein the metric corresponds to:

- a number of variables of the input (1), if the 2D attention block is applied before an RNN (6); or

- a number of recursive cells generated by an RNN (6), if the 2D attention block is applied after said RNN (6).
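Steps i. to vi. can be traced end to end with a small NumPy sketch for the embodiment applied before the RNN. Several points are assumptions made only for illustration: a single convolutional layer with untrained random 3 X 3 kernels stands in for the trained layer(s), zero padding keeps the map size, and the helper names (`conv2d_same`, `softmax_2d`, `attention_path`, `attention_weights`) are hypothetical.

```python
import numpy as np

def conv2d_same(feature_map, kernel):
    """Zero-padded 2D cross-correlation that keeps the input shape."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(feature_map, ((ph, ph), (pw, pw)))
    out = np.empty(feature_map.shape, dtype=float)
    for i in range(feature_map.shape[0]):
        for j in range(feature_map.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def softmax_2d(scores):
    """Softmax over a whole segments X time steps map, so its entries sum to 1."""
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

def attention_path(feature_map, kernels):
    # Step ii: one channel per filter -> segments X filter number X time steps.
    conv = np.stack([conv2d_same(feature_map, k) for k in kernels], axis=1)
    # Step iii: swap to filter number X segments X time steps.
    permuted = np.transpose(conv, (1, 0, 2))
    # Step iv: softmax over each filter's 2D map.
    weights = np.stack([softmax_2d(m) for m in permuted])
    # Step v: swap back to segments X filter number X time steps.
    return np.transpose(weights, (1, 0, 2))

def attention_weights(x, kernels_per_metric):
    # Step i: one 2D map (segments X time steps) per metric.
    maps = [x[:, :, m] for m in range(x.shape[-1])]
    paths = [attention_path(fm, ks) for fm, ks in zip(maps, kernels_per_metric)]
    # Step vi: segments X filter number X time steps X variables.
    return np.stack(paths, axis=-1)

rng = np.random.default_rng(0)
S, T, V = 7, 24, 4
F = V  # before the RNN, the number of filters equals the number of variables
x = rng.normal(size=(S, T, V))
kernels = [[rng.normal(size=(3, 3)) for _ in range(F)] for _ in range(V)]
alpha = attention_weights(x, kernels)
print(alpha.shape, alpha.sum(axis=(0, 2)).round(6))
```

Summing α over the segment and time-step axes gives 1 for every filter and variable, which is the constraint stated in step iv.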

In one embodiment of the method, the correlation between the segments is performed by configuring the 2D convolutional layer of the attention block (3) to have a 2D kernel.

In another embodiment of the method, a padding mechanism is applied to the segments dimension of the 3D feature map information of the path prepared by the 2D convolutional layer of the attention block (3).

As will be clear to a person skilled in the art, the present invention is not limited to the embodiments described herein, and numerous modifications are possible that remain within the scope of the present invention.

Naturally, the preferred embodiments shown above can be combined in the different possible ways; the repetition of all such combinations is avoided here.

EXPERIMENTAL RESULTS

As an example, we present the results of a case study on individual household electric power consumption. This dataset is provided by the UCI Machine Learning Repository [2]. The focus is on MTS classification analysis, and therefore results are compared between deep learning methodologies using accuracy and categorical cross-entropy metrics. The target value is the household's average level of global active energy consumption over the next 24 hours, in five classes, predicted from the last 168 hours, i.e. 7 days. A 24-hour sliding window is used. Each time step is one hour of data. The five classes to predict are levels ranging from very low (level 0) to very high (level 4). The time series have representative patterns for every day of the week, which can be grouped and contained in a 2D map.
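The windowing described above can be sketched as follows. The reshaping of the 168-hour history into 7 segments of 24 time steps follows the days-of-the-week grouping mentioned in the text, but the function names, the synthetic data, and in particular the quantile-based rule for the five levels are assumptions; the text does not state how the class boundaries were chosen.

```python
import numpy as np

def make_windows(series, target_column=0, history=168, horizon=24,
                 stride=24, steps_per_segment=24):
    """series: (hours, variables) hourly data.
    Returns inputs of shape (samples, segments, time_steps, variables)
    and the mean of the target column over the following `horizon` hours."""
    inputs, targets = [], []
    last_start = series.shape[0] - history - horizon
    for start in range(0, last_start + 1, stride):
        past = series[start:start + history]
        future = series[start + history:start + history + horizon, target_column]
        inputs.append(past.reshape(history // steps_per_segment,
                                   steps_per_segment, -1))
        targets.append(future.mean())
    return np.stack(inputs), np.array(targets)

def to_levels(targets, n_levels=5):
    """Assumed binning: equal-frequency (quantile) levels 0 .. n_levels - 1."""
    edges = np.quantile(targets, np.linspace(0, 1, n_levels + 1)[1:-1])
    return np.digitize(targets, edges)

rng = np.random.default_rng(0)
series = rng.random((24 * 30, 4))     # 30 days of synthetic hourly data
X, y = make_windows(series)
levels = to_levels(y)
print(X.shape, y.shape, sorted(set(levels.tolist())))
```

Each sample is then a 7 X 24 X variables array, matching the segments X time steps X variables input format of the attention unit.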

Single LSTM:

Accuracy: 37.70%

Class            Precision   Recall   F1 score   Support
0                0.5000      0.6957   0.5818     115
1                0.3333      0.4286   0.3750     140
2                0.4815      0.0922   0.1548     141
3                0.3488      0.2778   0.3093     108
4                0.2750      0.4783   0.3492     69
Average/total    0.3991      0.3770   0.3468     573

Table 1
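The "Average/total" row of Table 1 is numerically consistent with a support-weighted mean of the per-class values. This is an observation from the reported numbers rather than something the text states, and the helper name below is hypothetical. A short check:

```python
# Per-class rows of Table 1: (precision, recall, f1 score, support).
table_1 = {
    0: (0.5000, 0.6957, 0.5818, 115),
    1: (0.3333, 0.4286, 0.3750, 140),
    2: (0.4815, 0.0922, 0.1548, 141),
    3: (0.3488, 0.2778, 0.3093, 108),
    4: (0.2750, 0.4783, 0.3492, 69),
}

def support_weighted_average(rows, column):
    """Mean of one metric column, weighted by each class's support."""
    total_support = sum(row[3] for row in rows.values())
    weighted_sum = sum(row[column] * row[3] for row in rows.values())
    return weighted_sum / total_support

avg_precision = support_weighted_average(table_1, 0)
avg_recall = support_weighted_average(table_1, 1)
avg_f1 = support_weighted_average(table_1, 2)
total_support = sum(row[3] for row in table_1.values())

print(f"{avg_precision:.4f} {avg_recall:.4f} {avg_f1:.4f} {total_support}")
# prints: 0.3991 0.3770 0.3468 573
```

The support-weighted recall also equals the overall accuracy (37.70%), as expected for single-label classification.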

LSTM with standard attention:

Accuracy: 40.70%

Class            Precision   Recall   F1 score   Support
0                0.6442      0.5826   0.6119     115
1                0.3799      0.4789   0.4237     140
2                0.4110      0.2143   0.2817     141
3                0.3185      0.4630   0.3774     108
4                0.3065      0.2714   0.2879     69
Average/total    0.4198      0.4070   0.4015     573

Table 2

LSTM with the multi-convolutional attention of the invention:

Accuracy: 42.06%

Class            Precision   Recall   F1 score   Support
0                0.6481      0.6087   0.6278     115
1                0.3486      0.5429   0.4246     140
2                0.4222      0.2695   0.3290     141
3                0.3750      0.3333   0.3529     108
4                0.3443      0.3043   0.3231     69
Average/total    0.4313      0.4206   0.4161     573

Table 3

Simple LSTM with 2D convolutional layers:

Accuracy: 42.41%

Class            Precision   Recall   F1 score   Support
0                0.5966      0.6174   0.6068     115
1                0.3644      0.5857   0.4493     140
2                0.5610      0.1631   0.2527     141
3                0.3542      0.4722   0.3529     108
4                0.3636      0.2319   0.2832     69
Average/total    0.4574      0.4241   0.4042     573

Table 4

LSTM with 2D convolutional layers and the multi-convolutional 2D attention block, with the padding mechanism in the segments dimension:

Accuracy: 43.11%

Class            Precision   Recall   F1 score   Support
0                0.5940      0.6870   0.6371     115
1                0.3653      0.4357   0.3974     140
2                0.4148      0.3972   0.4058     141
3                0.4253      0.3426   0.3795     108
4                0.2745      0.2029   0.2333     69
Average/total    0.4237      0.4311   0.4244     573

Table 5

REFERENCES

[1] - Xingjian Shi, Zhourong Chen, Hao Wang, Dit-Yan Yeung, Wai-kin Wong, and Wang-chun Woo. Convolutional LSTM network: A machine learning approach for precipitation nowcasting, 2015.

[2] - Alice Berard, Georges Hebrail. Individual household electric power consumption Data Set, November 2010. http://archive.ics.uci.edu/ml/datasets/Individual+household+electric+power+consumption.

Lisbon, March 11, 2021.

Claims

1. Multiconvolutional two-dimensional attention unit to perform the analysis of three-dimensional input data of multivariable time series (1), defined in terms of segments X time steps X variables; the unit characterized by comprising:

- A division block (2) comprising processing means adapted to convert the three-dimensional input data (1) into a two-dimensional feature map of segments X time steps for each metric, the metric being the variables of the input data (1) or the number of recursive cells generated by the recursive neural network (6);

- An attention block (3) comprising processing means adapted to implement a two-dimensional convolutional layer comprising at least one filter and a softmax activation function; the attention block (3) being configured to apply the two-dimensional convolutional layer to the two-dimensional feature map in order to generate a path containing three-dimensional feature map information for a metric with: segments X filter number X time steps;

- the attention block (3) further comprises processing means adapted to implement a swapping operation configured to swap two dimensions in a three-dimensional feature map;

- A concatenation block (4) configured to concatenate the three-dimensional feature map emitted by the attention block (3) to generate a four-dimensional feature map of attention weights, α;

- A scaling block (5) configured to multiply the three-dimensional input data (1) with the four-dimensional feature map of attention weights, α, to generate a context map, c.

2. Multiconvolutional two-dimensional attention unit, according to claim 1, in which the multiconvolutional two-dimensional attention unit is applied before a recursive neural network (6), and in which:

- The metric is the variables of the input data (1);

- The input data (1) is applied directly to the division block (2); and

- the number of filters in the two-dimensional convolutional layer of the attention block (3) is equal to the number of input variables (1).

3. Multiconvolutional two-dimensional attention unit, according to claim 1, in which the multiconvolutional two-dimensional attention unit is applied after a recursive neural network (6), and in which:

- The metric is the number of recursive cells generated by the recursive neural network (6);

- The input data (1) feed the recursive neural network (6);

- the division block (2) is adapted to divide the output of the recursive neural network (6) into a series of sequences generated by recursive cells;

- the number of filters of the two-dimensional convolutional layer of the attention block (3) is equal to the number of recursive cells generated by the recursive neural network (6).

4. Multiconvolutional two-dimensional attention unit according to any one of the preceding claims, wherein the two-dimensional convolution layer of the attention block (3) is programmed to operate according to a one-dimensional kernel parameter.

5. Multiconvolutional two-dimensional attention unit according to any one of the preceding claims 1 to 3, in which the two-dimensional convolution layer of the attention block (3) is programmed to operate according to a two-dimensional kernel parameter.

6. Multiconvolutional two-dimensional attention unit, according to any one of the preceding claims, wherein the permutation operation performed in the attention block (3) is configured to swap the filter number dimension with the segment dimension and/or the segment dimension with the filter number dimension.

7. Multiconvolutional two-dimensional attention unit, according to any one of the preceding claims, in which the attention block (3) is further configured to implement a padding mechanism for the path containing the three-dimensional feature map information generated by the two-dimensional convolutional layer.

8. Processing system to perform the analysis of three-dimensional multivariable time series input data (1), defined in terms of segments X time steps X variables, comprising:

- processing means adapted to implement a recursive neural network (6);

- the multiconvolutional two-dimensional attention unit according to claims 1 to 7.

9. Processing system according to claim 8, in which the multiconvolutional two-dimensional attention unit is applied before the recursive neural network (6).

10. Processing system according to claim 8, in which the multiconvolutional two-dimensional attention unit is applied after the recursive neural network (6).

11. Processing system, according to any one of the preceding claims 8 to 10, in which the recursive neural network (6) is of the Long Short-Term Memory type.

12. Method of operation of the multi-convolutional two-dimensional attention unit, according to claims 1 to 7, comprising the following steps:

i. Convert three-dimensional multivariable time series input data (1), defined in terms of segments X time steps X variables, into a two-dimensional feature map of segments X time steps;

ii. Apply a two-dimensional convolutional layer to the two-dimensional feature map in order to generate a path containing three-dimensional feature map information for each metric with: segments X filter number X time steps;

iii. Apply a permutation function to the three-dimensional feature map information in order to swap the filter number dimension with the segment dimension, resulting in a three-dimensional feature map of filter number X segments X time steps;

iv. Repeat steps ii. and iii. for all filters of the two-dimensional convolutional layer and apply a softmax activation function to the last convolutional layer in order to maintain $\sum_{i}\sum_{j}\alpha_{i,j} = 1$ for the competing weight values of each two-dimensional feature map for each filter number: segment i X time step j;

v. Apply a permutation function to swap back to the original ordering of the three-dimensional feature map information of the path for each metric: segments X filter number X time steps;

vi. Concatenate the three-dimensional feature map information of each path, resulting in a four-dimensional feature map of attention weights α, with the format: segments X filter number X time steps X variables;

Where the metric corresponds to:

- a number of input variables (1) in case the two-dimensional attention block is applied before a recursive neural network (6); or

- a number of recursive cells generated by a recursive neural network (6) if the two-dimensional attention block is applied after said recursive neural network (6).

13. Method according to the previous claim 12, characterized in that the correlation between the segments is performed by configuring the two-dimensional convolutional layer of the attention block (3) to have a two-dimensional kernel.

14. Method, according to the previous claims 12 or 13, in which a padding mechanism is applied to the segments dimension of the three-dimensional feature map information of the path prepared by the two-dimensional convolutional layer of the attention block (3).