WO1999001825A1

WO1999001825A1 - Method for constructing a neural network for modelling a phenomenon

Info

Publication number: WO1999001825A1
Application number: PCT/FR1998/001381
Authority: WO
Inventors: Hervé Stoppiglia; Gérard Dreyfus
Original assignee: INFORMATIQUE CDC
Current assignee: INFORMATIQUE CDC
Priority date: 1997-07-02
Filing date: 1998-06-29
Publication date: 1999-01-14
Anticipated expiration: 2000-01-02
Also published as: FR2765705B1; FR2765705A1

Abstract

The invention concerns the construction of a neural network. More particularly, it concerns a method that determines the variables to be incorporated in an optimal model by evaluating results, and then constructs a neural network by determining the neural links on the basis of the resulting model. The method introduces an additional variable having random values, determines descriptors of the variables and ranks them, by applying a criterion that compares results, in order of decreasing significance, and then eliminates a variable whose descriptor is ranked after that of the additional variable. The invention is useful for modelling phenomena.

Description

Method for constructing a neural network for modelling a phenomenon

The present invention relates to a method of constructing a neural network intended for modelling a phenomenon, and to a neural network produced by implementing the method according to the invention.

Although the invention relates to an improvement in the construction of neural networks, the full set of principles and methods conventionally used to construct neural networks is not described here, since they have long been known and there is a considerable literature on them. The prior art is therefore described only to the extent that the invention relates to it, and the invention is described with reference to those aspects of the prior art.

A number of definitions used in the present specification are considered first; they are generally those of the prior art.

Neural networks are hardware circuits, produced for example as integrated circuits, but they can also be produced purely in software. A neuron is an element that has inputs intended to receive signals representative of variables and one or more outputs, and that delivers output or result data by applying an activation function.

In a neural network there are, in addition to the neurons, inputs, at least one output, and links formed between the inputs and the neurons, between the neurons and the outputs, and possibly between the neurons. It can be shown that a neural network of this type comprising several layers, that is, having cascaded links between the neuron outputs of one layer and the neuron inputs of another layer, is equivalent to a neural network with a single, so-called "hidden" layer, that is, one in which all the neurons have links only with the inputs and the outputs.

Variables are quantities that can take several values and that take part in the phenomenon to be modelled. That phenomenon may be of any kind, but neural networks are of course applied to phenomena for which the function linking the variables to the result is unknown. If that function is known, it is simpler and more accurate to build a circuit implementing it directly.

A model of a phenomenon is represented both by the set of variables and by the processing they undergo to give the result, in particular by the activation functions of the neurons. A sub-model is a model from which at least one variable has been eliminated.

The validity of a model is determined by training, that is, by using as input signals values of variables that have been determined and for which the result is known. Training comprises applying several examples, that is, several groups of variable values, and obtaining results that can be compared with the results of the examples.

The validity of a model is assessed by comparing the result obtained during training with the result of the example considered.

Use is also made of "descriptors", which are sets of the values taken by one and the same variable over a set of examples used for training. Such a set of values can take various forms. In a particularly advantageous example, which constitutes a preferred embodiment of the invention, the descriptors are N-dimensional vectors, N being the number of examples used for training. These vectors therefore lie in an N-dimensional space. Each of these vectors is orthogonal to an (N-1)-dimensional space, defined as the (N-1)-dimensional space in which the projection of the descriptor vector, assumed to be non-zero, is a point.
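As a minimal sketch of this data layout, the snippet below turns a list of training examples into one N-dimensional descriptor vector per variable. The function name `build_descriptors`, the variable names `x1`, `x2`, `y` and the sample values are illustrative assumptions, not taken from the specification.

```python
def build_descriptors(examples, variable_names):
    """Return one vector per variable, holding its value in each example.

    `examples` is a list of N dicts; each resulting vector has N components.
    """
    return {name: [ex[name] for ex in examples] for name in variable_names}


# Three hypothetical training examples (N = 3), each with two variables
# and one known result.
examples = [
    {"x1": 1.0, "x2": 4.0, "y": 0.5},
    {"x1": 2.0, "x2": 5.0, "y": 1.5},
    {"x1": 3.0, "x2": 6.0, "y": 2.5},
]

descriptors = build_descriptors(examples, ["x1", "x2"])
result_vector = [ex["y"] for ex in examples]
print(descriptors)
print(result_vector)
```

With this layout, each descriptor and the result are vectors in the same N-dimensional space, so they can be compared geometrically.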

The foregoing definitions of the terms used in this specification already suggest the problem addressed by the invention, namely the modelling of a phenomenon in a way that allows the optimal construction of a neural network whose inputs receive the values of the variables and whose output or outputs represent result data. The method of constructing such a neural network generally comprises, in a known manner, a first phase which, starting from an excessively large group of variables, determines the only variables that should be used because they are significant for the phenomenon, and a second phase in which an optimal neural network is constructed which, from signals representative of the values of the variables, delivers result data representing the phenomenon.

In a known manner, the first phase comprises determining descriptors, in excessive number, and selecting, from among all the possible models, the one that best explains the observed phenomenon. Note that this explanation must take into account the performance of the model (a small difference between the result given by the model and the observations), but also its complexity (in particular because the processing must be as fast as possible).

All possible models could be evaluated. Note that a model has a type (for example linear or non-linear, static or dynamic, and so on), a structure (defined by the family of functions considered and the set of descriptive variables required) and parameters (which define the function chosen from the family F of functions). A first way of selecting a model is to consider a complete model using all the descriptors, then to build all the possible sub-models and select the best of them. An extremely large number of models must then be estimated: when the number of variables, and therefore of descriptors, is equal to P, 2^P models must be estimated separately. For example, when the set comprises fifteen variables, the number of possible models to compare is 32,768. This number quickly becomes so large that the method soon becomes unusable.
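The growth of the exhaustive search can be checked with a short sketch. The helper names `count_submodels` and `enumerate_submodels` are illustrative assumptions; the count includes the empty model and the complete model.

```python
from itertools import combinations


def count_submodels(p: int) -> int:
    """Number of distinct subsets of p descriptors (empty and full included)."""
    return 2 ** p


def enumerate_submodels(descriptors):
    """Yield every subset of the descriptor names, smallest subsets first."""
    for k in range(len(descriptors) + 1):
        yield from combinations(descriptors, k)


fifteen = count_submodels(15)
small_case = list(enumerate_submodels(["x1", "x2", "x3"]))
print(fifteen)          # 32768, the figure quoted in the text
print(len(small_case))  # 8 subsets for three descriptors
```

Each additional descriptor doubles the number of candidate models, which is why the exhaustive approach stops being practical for even moderately large P.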

Other methods have therefore been developed to reduce the number of models to be evaluated. Destructive and constructive methods are thus known. In the destructive method, starting from the complete model with P descriptors, all the possible sub-models with P-1 descriptors are built and the one giving the best performance is selected; if this sub-model is better than the complete model, the procedure is repeated starting from the sub-model, whereas if it is not better, one falls back on the complete model. In the "constructive" method, one starts from a model with 0 descriptors and builds the P models with 1 descriptor; the best of these models is chosen and the procedure continues by adding one descriptor at a time, until the model obtained is better than all the models obtained by increasing the number of descriptors by one. Both methods greatly reduce the number of models to be evaluated.
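The two known procedures can be sketched as greedy loops over a caller-supplied quality measure. This is an illustration under assumptions: the names `forward_selection`, `backward_elimination` and `score`, and the toy scoring function (one point per relevant descriptor, minus a small complexity penalty), are hypothetical and do not come from the specification.

```python
def forward_selection(descriptors, score):
    """Constructive method: add one descriptor at a time while the score improves."""
    selected, remaining = [], list(descriptors)
    best = score(tuple(selected))
    while remaining:
        cand_score, cand = max(
            ((score(tuple(selected + [d])), d) for d in remaining),
            key=lambda t: t[0],
        )
        if cand_score <= best:  # no larger model is better: stop
            break
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
    return selected


def backward_elimination(descriptors, score):
    """Destructive method: drop one descriptor at a time while the score improves."""
    selected = list(descriptors)
    best = score(tuple(selected))
    while selected:
        cand_score, cand = max(
            ((score(tuple(x for x in selected if x != d)), d) for d in selected),
            key=lambda t: t[0],
        )
        if cand_score <= best:  # no smaller model is better: keep current one
            break
        selected.remove(cand)
        best = cand_score
    return selected


# Toy score (assumption): descriptors "a" and "c" are relevant, each
# descriptor costs 0.1 in complexity.
RELEVANT = {"a", "c"}


def toy_score(subset):
    return sum(1 for d in subset if d in RELEVANT) - 0.1 * len(subset)


forward_result = forward_selection(["a", "b", "c", "d"], toy_score)
backward_result = backward_elimination(["a", "b", "c", "d"], toy_score)
print(forward_result, backward_result)
```

On this toy score both procedures end on the same pair of descriptors. Neither loop is guaranteed to find the best subset in general, since each step only looks one descriptor ahead.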

Compared with model selection by evaluation of all the models, the two methods mentioned above may not yield an optimal model. However, they often have to be used, because evaluating all the possible models is beyond the capacity of the available computing machines. When the two methods (destructive and constructive) lead to the same model, the probability that it is the best model is increased. Running the constructive and destructive methods in succession requires the evaluation of P² models, a number far smaller than the 2^P models needed to evaluate all the models.

Model selection methods therefore require numerous parameter estimations and the use of statistical hypothesis tests or information criteria that are not always easy for uninitiated users to understand.

The invention implements a new method for constructing the neural network, in which a new model evaluation method is used. More precisely, according to the invention, the descriptors are ordered by decreasing significance. Initially, the set of P descriptors is large enough to describe the data. Among these P descriptors, the one that best describes the desired output is determined, then the second best, and so on. A ranking of the descriptors is thus obtained. The sub-models consisting of a single descriptor, two descriptors, three descriptors, and so on are then considered, starting each time with the most significant descriptor. It is therefore possible to consider a very small number of models. Furthermore, according to the invention, at least one additional variable is used whose additional descriptor is random, that is, the values of the additional variable are purely random. Once the descriptors are ordered, all those ranked after the random descriptor are considered to have a significance no greater than that of the random descriptor and can therefore be eliminated.
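The random-descriptor cut-off can be illustrated with a minimal sketch. It assumes, for illustration only, that significance is measured by the squared cosine of the angle between a descriptor vector and the result vector, and it ranks all descriptors in a single pass. The names `select_with_probe`, `cos2` and `"probe"`, and the synthetic data, are hypothetical.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cos2(u, v):
    """Squared cosine of the angle between u and v (0.0 if either is null)."""
    nu, nv = dot(u, u), dot(v, v)
    return 0.0 if nu == 0 or nv == 0 else dot(u, v) ** 2 / (nu * nv)


def select_with_probe(descriptors, y, rng):
    """Rank descriptors plus one random probe; keep those ranked before it."""
    candidates = dict(descriptors)
    # Additional variable with purely random values ("probe" is a made-up name).
    candidates["probe"] = [rng.gauss(0.0, 1.0) for _ in range(len(y))]
    ranked = sorted(candidates, key=lambda k: cos2(candidates[k], y), reverse=True)
    return ranked[: ranked.index("probe")]


# Synthetic data (assumption): the result depends on x1 and x2 only.
rng = random.Random(0)
n = 500
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
x3 = [rng.gauss(0, 1) for _ in range(n)]  # irrelevant variable
y = [2 * a - b + 0.1 * rng.gauss(0, 1) for a, b in zip(x1, x2)]

kept = select_with_probe({"x1": x1, "x2": x2, "x3": x3}, y, rng)
print(kept)
```

Whether the irrelevant descriptor `x3` lands just before or just after the probe depends on the random draw, which is consistent with the idea that such descriptors are no more significant than a random one.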

More precisely, in a first aspect, the invention relates to a method of constructing a neural network intended for modelling a phenomenon, the network comprising inputs intended to receive signals representative of values of variables, neurons intended to apply an activation function to the signals they receive, at least one output intended to deliver result data of the model of the phenomenon, and links formed between the inputs and the neurons and between the neurons and the output, the method being of the type that comprises, in a first step, determining the variables to be used in models of the phenomenon by determining descriptors each representative of the values of one variable, in a second step, selecting the variables to be incorporated in at least one optimal model of the phenomenon by evaluating the results of several models, and, in a third step, constructing a neural network by determining the links of the neurons as a function of an optimal model obtained; according to the invention, the method also comprises, during or before the first step of determining the descriptors, introducing at least one additional variable that has random values and determining a descriptor representative of the values of this additional variable, ranking the descriptors, including that of the additional variable, by applying a criterion that compares the results given by the models with the data representative of the result of the phenomenon, so as to determine an order of decreasing significance of the descriptors, and then eliminating at least one descriptor which, in the order of decreasing significance of the descriptors, is ranked after the descriptor representative of the values of the additional variable.

In an advantageous embodiment, the method further comprises representing the descriptors and the result of the phenomenon by vectors of an N-dimensional space, N being the number of examples in a set of training examples of the phenomenon, each example comprising at least one value of each of the variables and at least one datum representative of the result of the phenomenon for the corresponding values of the variables. In this embodiment, the comparison criterion used to rank the descriptors is advantageously a comparison, in the N-dimensional space, of the angles formed by a vector representative of a descriptor with the vector representative of the result of the phenomenon.
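One common way to make this angle comparison concrete, assumed here rather than stated in the passage above, is to compare squared cosines. The symbols are introduced for illustration: x_k is the vector of descriptor k, y is the result vector, and θ_k is the angle between them.

```latex
\cos^2\theta_k
  = \frac{(\mathbf{x}_k \cdot \mathbf{y})^2}
         {\lVert \mathbf{x}_k \rVert^2 \, \lVert \mathbf{y} \rVert^2},
\qquad k = 1, \dots, P
```

Under this assumption, the descriptor with the largest value of cos²θ_k forms the smallest angle with the result vector (up to sign) and would be ranked first.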

In this embodiment, the ranking step preferably comprises determining the first descriptor in the order of decreasing significance, and projecting the remaining descriptor vectors and the result vector onto the space of one dimension less that is orthogonal to this first descriptor; the step then comprises ranking the descriptors in this space of one dimension less in order to determine the first, in order of decreasing significance, of the remaining descriptors, and projecting the remaining descriptor vectors and the result vector onto a space of one dimension less that is orthogonal to that first remaining descriptor; these steps are then repeated until all the descriptors have been ranked or until the descriptor representative of the values of the additional variable has been ranked.
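The iterative ranking described above can be sketched as follows. This is a minimal illustration under assumptions: the squared cosine is used as the angle criterion, and the names `gram_schmidt_rank`, `project_out`, `cos2` and `stop_at` are hypothetical. It is not presented as the exact procedure of the specification.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cos2(u, v):
    """Squared cosine of the angle between u and v (0.0 if either is ~null)."""
    nu, nv = dot(u, u), dot(v, v)
    if nu < 1e-12 or nv < 1e-12:
        return 0.0
    return dot(u, v) ** 2 / (nu * nv)


def project_out(v, q):
    """Project v onto the subspace orthogonal to the non-null vector q."""
    coef = dot(v, q) / dot(q, q)
    return [a - coef * b for a, b in zip(v, q)]


def gram_schmidt_rank(descriptors, y, stop_at=None):
    """Return descriptor names in order of decreasing significance.

    At each round the descriptor forming the smallest angle with the
    (projected) result vector is ranked, then all remaining vectors and the
    result vector are projected orthogonally to it. Ranking stops early once
    the descriptor named `stop_at` (e.g. the random one) has been ranked.
    """
    remaining = {k: list(v) for k, v in descriptors.items()}
    y = list(y)
    order = []
    while remaining:
        best = max(remaining, key=lambda k: cos2(remaining[k], y))
        q = remaining.pop(best)
        order.append(best)
        if best == stop_at:
            break
        if dot(q, q) < 1e-12:
            continue  # degenerate vector: nothing to project out
        y = project_out(y, q)
        remaining = {k: project_out(v, q) for k, v in remaining.items()}
    return order


# Tiny hand-checkable example with N = 3 (values are illustrative).
toy = {"a": [1.0, 0.0, 0.0], "b": [1.0, 1.0, 0.0], "c": [0.0, 0.0, 1.0]}
order = gram_schmidt_rank(toy, [2.0, 1.0, 0.1])
print(order)
```

In the toy example, `b` is ranked first because it is closest in angle to the result vector; after projection orthogonal to `b`, `a` explains most of what is left, and `c` comes last. Passing `stop_at` lets the loop end as soon as the random descriptor is reached, as the text allows.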

Preferably, the construction of at least one optimal model of the phenomenon by evaluating the results of several models comprises constructing several successive sub-models of the phenomenon, each sub-model containing one more variable than the preceding sub-model, the added variable being chosen in the order of decreasing significance of the descriptors and the variable of the first sub-model being either a constant or the most significant variable, and selecting one sub-model as the optimal model by using a selection criterion.

In this exemplary embodiment, the criterion for selecting a sub-model preferably comprises selecting the sub-model having the largest number of descriptors for which the risk of selecting the additional variable is below a chosen threshold level.

In a second aspect, the invention relates to a method of constructing a neural network intended for modelling a phenomenon, the network comprising inputs intended to receive signals representative of values of variables that are represented by descriptors, neurons intended to apply an activation function to the signals they receive, at least one output intended to deliver result data of the model of the phenomenon, and links formed between the inputs and the neurons and between the neurons and the output, by determining the links of the neurons as a function of the model; the method comprises:

- constructing a single-layer neural network in which the number of neurons is certainly too high, the inputs of the neurons corresponding to the descriptors of the model, the neural network further containing, in its single layer, at least one additional neuron having an activation function whose parameters have random values, and

- executing a process comprising, with the excessive number of neurons, training the neurons using the descriptors, and eliminating at least the neuron that makes the least significant contribution to the result, so that the network has a smaller number of neurons, and then

- repeating this process with the smaller number of neurons, at least until the neuron to be eliminated is an additional neuron.
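The control flow of these three steps can be sketched as a loop. The passage does not specify how the network is trained or how a neuron's contribution is measured, so both are left as caller-supplied functions. Every name below (`prune_until_probe`, `train`, `contribution`, the neuron labels and the fixed toy contributions) is a hypothetical placeholder.

```python
def prune_until_probe(neurons, probes, train, contribution):
    """Repeatedly train, then drop the least significant neuron.

    `neurons` are the ordinary hidden neurons, `probes` the additional
    neurons with random parameters. The loop stops as soon as the neuron
    that would be eliminated is one of the probes.
    """
    active = list(neurons) + list(probes)
    while len(active) > 1:
        model = train(active)  # training with the current set of neurons
        weakest = min(active, key=lambda n: contribution(model, n))
        if weakest in probes:
            break  # remaining neurons all contribute more than a random one
        active.remove(weakest)
    return [n for n in active if n not in probes]


# Toy stand-ins (assumption): "training" returns fixed contribution values.
toy_contributions = {"h1": 0.9, "h2": 0.05, "h3": 0.6, "h4": 0.01, "probe": 0.1}

kept_neurons = prune_until_probe(
    ["h1", "h2", "h3", "h4"],
    ["probe"],
    train=lambda active: toy_contributions,
    contribution=lambda model, n: model[n],
)
print(kept_neurons)
```

In a real setting, `train` would refit the network on the training examples at every pass, so the contributions would change as neurons are removed. The fixed values here only illustrate the stopping rule.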

In this embodiment, the training of the neurons using the descriptors is preferably carried out with only some of the examples. It is advantageous for the execution of the process to comprise, before a neuron is eliminated, at least one repetition of the training in order to confirm which neuron makes the least significant contribution.

It is advantageous for the model of the phenomenon used to be an optimal model obtained by implementing the method according to the first aspect of the invention.

Other features and advantages of the invention will emerge more clearly from the following description of an exemplary embodiment, given with reference to the appended drawing, in which:

- Figure 1 is a vector diagram giving a geometric representation of an algorithm for comparing descriptors;

- Figure 2 is a graph showing the results obtained, on the one hand, with a so-called "Gram-Schmidt" procedure and, on the other hand, with evaluation of the performance of all the sub-models of a complete set;

- Figure 3 is a graph showing the result given by the Gram-Schmidt evaluation algorithm and the distribution of the rank of the random variable; and

- la figure 4 est un graphique illustrant un processus modêlisé dans un exemple de mise en oeuvre du procédé de l'invention.- Figure 4 is a graph illustrating a process modeled in an example of implementation of the method of the invention.

A more detailed example of implementation of the invention is now considered, in which the aim is to model a process.

P descriptors are available (that is, it is initially assumed that P variables may contribute to the result). P descriptors are therefore constructed as vectors in an N-dimensional space, N being the number of examples. Each example comprises a value of each of the P variables and at least the value of one result. The descriptors are advantageously ranked with the modified Gram-Schmidt orthogonalization algorithm, which is now briefly described. For more details, reference may be made to the article by S. Chen, S. A. Billings and W. Luo, "Orthogonal least squares methods and their application to non-linear system identification", International Journal of Control, Vol. 50, No. 5, pp. 1873-1896, 1989.

The Gram-Schmidt orthogonalization algorithm treats the descriptors and the desired output as vectors. The notation is as follows:

$X = [x_1, x_2, \ldots, x_P]$, with $x_i$ the vector of the $i$-th input ($i = 1$ to $P$), and $Y$ the output vector.

The matrix X is the input matrix (its P columns correspond to the P descriptors of the model and its N rows to the N examples of the training set). The matrix X is regarded as composed of P vectors, each representing one input. The vector Y is the output vector (its N rows correspond to the observed outputs of the N examples).

At the first iteration, the input vector that best "explains" the output is determined. To do so, the angle between the output vector and each input vector is considered, by evaluating the squared cosine of each angle. The selected vector is the one for which the squared cosine is maximal.

Once this most significant vector has been determined, its contribution is eliminated by projecting the output vector and all the remaining input vectors onto the (N-1)-dimensional subspace that is orthogonal to the selected vector.

The algorithm continues until all the input vectors have been ranked. According to the invention, the evaluation can be stopped as soon as the random vector is the one to be selected.
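The ranking procedure described above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the implementation of the invention: the function name `rank_descriptors`, the `probe_index` parameter and the tolerance `eps` are hypothetical names introduced for illustration, and the synthetic data at the end are invented. The columns of `X` are the descriptor vectors; the column designated by `probe_index` plays the role of the random descriptor, and ranking stops as soon as that column is selected.

```python
import numpy as np


def rank_descriptors(X, y, probe_index=None, eps=1e-12):
    """Rank the columns of X by decreasing significance with respect to y.

    At each step the column with the largest squared cosine to the output is
    selected, then the output and the remaining columns are projected onto
    the subspace orthogonal to it. If probe_index is given (hypothetical
    parameter), ranking stops once that column is selected.
    """
    X = np.array(X, dtype=float)  # working copies, modified in place
    y = np.array(y, dtype=float)
    remaining = list(range(X.shape[1]))
    order = []
    while remaining:
        cols = X[:, remaining]
        col_norm2 = np.einsum("ij,ij->j", cols, cols)
        y_norm2 = float(y @ y)
        cos2 = np.zeros(len(remaining))
        valid = col_norm2 > eps
        if y_norm2 > eps:
            cos2[valid] = (cols[:, valid].T @ y) ** 2 / (col_norm2[valid] * y_norm2)
        pos = int(np.argmax(cos2))
        if col_norm2[pos] <= eps:
            # Degenerate case: no usable direction left, keep the current order.
            order.extend(remaining)
            break
        best = remaining.pop(pos)
        order.append(best)
        if best == probe_index:
            break  # the random descriptor has been reached
        q = X[:, best] / np.sqrt(col_norm2[pos])
        y -= (q @ y) * q                      # remove the explained part of the output
        for j in remaining:
            X[:, j] -= (q @ X[:, j]) * q      # project the remaining descriptors


    return order


# Illustrative data (assumed): two relevant descriptors out of five, plus a probe.
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((200, 5))
y_demo = 3.0 * X_demo[:, 0] + X_demo[:, 1] + 0.1 * rng.standard_normal(200)
probe = rng.standard_normal((200, 1))
order = rank_descriptors(np.hstack([X_demo, probe]), y_demo, probe_index=5)
kept = order[:-1] if order[-1] == 5 else order
print("descriptors ranked before the random one:", kept)
```

Stopping at the probe avoids ranking descriptors that would be eliminated anyway, which is where the saving in computation comes from.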

At each iteration, the ordinary least-squares solution and the corresponding mean squared error are calculated. The parameters of the least-squares regression are estimated by solving a linear system with an upper triangular matrix, and the norm of the projected output vector determines the value of the mean squared error.

Figure 1 shows the geometric interpretation of the algorithm just described, in a two-dimensional space. The output vector Y is better "explained" by the vector X₂ than by the vector X₁ (the angle θ₂ is smaller than the angle θ₁). X₂ is therefore selected as the first descriptor. To eliminate the part explained by this descriptor, the vectors Y and X₁ (and, in general, all the remaining vectors) are projected onto the subspace orthogonal to the vector X₂. The projections are used to select the next descriptor but, in the two-dimensional case, there is no further selection to make since only one input vector, X₁, remains.

The Gram-Schmidt algorithm just described does not always give the optimal result. Figure 2 shows the results obtained, on the one hand, with the Gram-Schmidt algorithm and, on the other hand, by evaluating the performance of the 1,024 (2^10) sub-models of a complete set comprising fifteen training points and ten descriptors, of which only five are relevant. The crosses represent the results of the 1,024 possible sub-models, and the curve represents the sub-models selected by the Gram-Schmidt algorithm. With the exception of the sub-model with three descriptors, the sub-models obtained are always the best.
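The least-squares step mentioned above (a triangular system for the parameters, and the norm of the projected output for the error) can be illustrated as follows. This sketch substitutes NumPy's Householder-based QR factorization for the modified Gram-Schmidt orthogonalization of the description; both yield an upper triangular factor. The function name `least_squares_via_qr` is hypothetical, and the columns of `X_sel` are assumed to be linearly independent.

```python
import numpy as np


def least_squares_via_qr(X_sel, y):
    """Return (theta, mse) for the linear model y ~ X_sel @ theta."""
    X_sel = np.asarray(X_sel, dtype=float)
    y = np.asarray(y, dtype=float)
    Q, R = np.linalg.qr(X_sel)            # reduced QR: R is upper triangular
    theta = np.linalg.solve(R, Q.T @ y)   # solves the triangular system R theta = Q^T y
    residual = y - Q @ (Q.T @ y)          # output projected orthogonally to the selected columns
    mse = float(residual @ residual) / len(y)
    return theta, mse
```

`np.linalg.solve` is a general solver used here for brevity; a dedicated back-substitution routine would exploit the triangular structure.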

Figure 3 superimposes, as a function of the number of descriptors, the result given by the Gram-Schmidt evaluation algorithm and the distribution of the rank of the random variable, the probability in percent being indicated on the right-hand ordinate scale. It can be seen that the probability that the random variable is among the first five descriptors is less than 10%. It follows that, if a sub-model with five descriptors is selected, the probability that a random variable explains the problem better than one of the five selected descriptors is less than 10%. The level of risk determines the number of descriptors retained. This level of risk must not be too high, because non-significant variables may then be incorporated. It must not be too low either, because significant variables may then be left out. In the case shown, the only possible selections are five or six descriptors, that is, the actual number of significant descriptors or that number plus one non-significant descriptor.
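The rank distribution of the random variable and the choice of a number of descriptors for a given risk level can be approximated by simulation. The sketch below is an assumption-laden illustration, not the procedure used to produce Figure 3: it redraws a Gaussian random descriptor many times, records its rank, and returns the estimated probability that it falls among the first k descriptors. All function names, the number of trials and the Gaussian distribution of the random descriptor are hypothetical choices.

```python
import numpy as np


def rank_columns(X, y):
    """Order column indices by decreasing significance (Gram-Schmidt ranking)."""
    X = np.array(X, dtype=float)
    y = np.array(y, dtype=float)
    remaining = list(range(X.shape[1]))
    order = []
    while remaining:
        cols = X[:, remaining]
        norm2 = np.maximum(np.einsum("ij,ij->j", cols, cols), 1e-12)
        # Proportional to the squared cosine (the common factor |y|^2 is dropped).
        pos = int(np.argmax((cols.T @ y) ** 2 / norm2))
        best = remaining.pop(pos)
        order.append(best)
        q = X[:, best] / np.sqrt(norm2[pos])
        y -= (q @ y) * q
        for j in remaining:
            X[:, j] -= (q @ X[:, j]) * q
    return order


def probe_rank_distribution(X, y, n_trials=200, seed=0):
    """Estimate P(random descriptor is among the first k), for k = 0..P+1."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p + 2)
    for _ in range(n_trials):
        probe = rng.standard_normal((n, 1))
        order = rank_columns(np.hstack([X, probe]), y)
        counts[order.index(p) + 1] += 1    # 1-based rank of the random descriptor
    return np.cumsum(counts) / n_trials


def select_descriptor_count(cumulative, risk=0.10):
    """Largest k for which the estimated risk stays below the chosen level."""
    p = len(cumulative) - 2
    return max(k for k in range(p + 1) if cumulative[k] < risk)
```

With `cumulative = probe_rank_distribution(X, y)`, the value `cumulative[5]` estimates the quantity read from Figure 3 for five descriptors, and `select_descriptor_count(cumulative, 0.10)` applies a 10% risk level.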

This distribution of the rank of the random variable can also be obtained purely by calculation, but that calculation is not described here.

The processing just described thus makes it possible to determine the descriptors to be kept and the optimal model. A neural network can then be constructed.

It has already been shown that any multilayer feedforward (non-looped) neural network can be represented by a neural network with a single hidden layer. A neural network with one hidden layer is therefore used initially, whose number of descriptors (input layer) has been determined and whose number of neurons is too high; the neurons that do not make a significant contribution are then eliminated. Training continues with the remaining neurons, and unnecessary neurons are eliminated again. The procedure stops when no further neuron is eliminated.

In a particularly advantageous embodiment of the invention, a process analogous to that used for selecting the descriptors is used for selecting the neurons. More precisely, an additional neuron is introduced, having a non-linear activation function whose parameters are random. In this embodiment, the procedure is executed until this additional neuron ranks after the other neurons. (As is known, activation functions are continuous, differentiable and bounded; examples are hyperbolic trigonometric functions, such as the hyperbolic tangent, and Gaussian functions.)

If a very large number of examples is available for training, the additional neuron may immediately be ranked last. In that case, using such an additional neuron is of no interest. It is then preferable to use a reduced subset for training, so that the additional neuron is not immediately ranked last. Training is carried out on the examples of this subset, the mean squared error over the rest of the set is recorded, and the selection procedure is applied to the examples of the subset; the coefficients of the neural network are those corresponding to the minimum value of the mean squared error thus calculated. In this way, the neurons ranked after the additional neuron are eliminated.
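One possible reading of the neuron selection described above is sketched below. The description only states that the process is analogous to descriptor selection, so this sketch assumes that each hidden neuron is ranked through the vector of its outputs over the examples, using the same squared-cosine criterion and orthogonal projections. The training step is omitted: the weights `W` and biases `b` are assumed to have been trained already, and all names are hypothetical.

```python
import numpy as np


def hidden_outputs(X, W, b):
    """Outputs of tanh hidden neurons over all examples; one column per neuron."""
    return np.tanh(X @ W + b)


def neurons_ranked_before_random(X, y, W, b, seed=0):
    """Indices of hidden neurons ranked before a neuron with random parameters."""
    rng = np.random.default_rng(seed)
    n_inputs, n_hidden = W.shape
    w_rand = rng.standard_normal((n_inputs, 1))  # random parameters of the extra neuron
    b_rand = rng.standard_normal(1)
    H = np.hstack([hidden_outputs(X, W, b), hidden_outputs(X, w_rand, b_rand)])
    r = np.array(y, dtype=float)                 # part of the output not yet explained
    remaining = list(range(n_hidden + 1))
    kept = []
    while remaining:
        cols = H[:, remaining]
        norm2 = np.maximum(np.einsum("ij,ij->j", cols, cols), 1e-12)
        pos = int(np.argmax((cols.T @ r) ** 2 / norm2))
        best = remaining.pop(pos)
        if best == n_hidden:                     # the random neuron has been reached
            break
        kept.append(best)
        q = H[:, best] / np.sqrt(norm2[pos])
        r -= (q @ r) * q
        for j in remaining:
            H[:, j] -= (q @ H[:, j]) * q
    return kept
```

In the procedure of the description, the network would be retrained with the neurons returned by such a function and the ranking repeated until no further neuron is eliminated.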

Through the ranking of the descriptors, the method according to the invention has the advantage of indicating which variables are the most significant. It considerably reduces the computation time needed to evaluate the significant descriptors and then to construct the neural network.

The invention also relates to neural networks produced by implementing the above method. Once their optimal structure has been evaluated by implementing the method of the invention, these neural networks can be produced, for example, in the form of integrated circuits, with determination of the connections between the inputs, the neurons and the output or outputs, and with determination of the activation functions of the neurons.

Example

By way of illustration, an example is now considered in which the invention is applied to a modeling problem intended for the simulation of a process.

Figure 4 is a graph showing, on the ordinate, the value given by a process (on a scale from -15 to +15) as a function of time t, plotted on the abscissa. The bold curve represents the value y_p(t) given by the process in response to a command u(t), represented by the thin curve. In the first phase of the method, 20 candidate variables are chosen: y_p(t-1) to y_p(t-10) and u(t-1) to u(t-10). The graph of Figure 4 makes it possible to establish 20 descriptors corresponding to the 20 variables for 1,000 examples. A random descriptor is added, the first phase of the method is executed, and the 3 variables y_p(t-1), y_p(t-2) and u(t-1) are finally obtained.
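The construction of the 20 candidate descriptors from the two time series can be sketched as follows. This is an illustrative sketch: the function name `build_lagged_descriptors` and the synthetic series are assumptions, and a series of 1,010 samples is assumed so that 10 lags leave 1,000 examples.

```python
import numpy as np


def build_lagged_descriptors(y_p, u, max_lag=10):
    """Build the descriptor matrix [y_p(t-1..max_lag), u(t-1..max_lag)] and target y_p(t)."""
    y_p = np.asarray(y_p, dtype=float)
    u = np.asarray(u, dtype=float)
    n = len(y_p)
    # Row i corresponds to time t = max_lag + i; the column for lag k holds the value at t - k.
    cols = [y_p[max_lag - k:n - k] for k in range(1, max_lag + 1)]
    cols += [u[max_lag - k:n - k] for k in range(1, max_lag + 1)]
    return np.column_stack(cols), y_p[max_lag:]


# Synthetic series (assumed) standing in for the curves of Figure 4.
rng = np.random.default_rng(0)
u_demo = rng.uniform(-5.0, 5.0, size=1010)
y_demo = np.zeros(1010)
for t in range(2, 1010):
    y_demo[t] = 0.5 * y_demo[t - 1] - 0.2 * y_demo[t - 2] + u_demo[t - 1]
X_demo, target_demo = build_lagged_descriptors(y_demo, u_demo)
print(X_demo.shape, target_demo.shape)
```

Each column of the resulting matrix is one descriptor vector in the N-dimensional space of examples; a random column can then be appended before ranking.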

A neural network representative of this process is then constructed. A network with 20 neurons, plus one random neuron, is used initially, each neuron having a sigmoid activation function. After a first pass, 17 neurons remain. After a second pass, 14 neurons remain. The processing stops at 11 or 12 neurons. To evaluate the benefit of the method of the invention, 21 different neural networks are constructed (from 0 to 20 neurons) and compared to determine the best one, by determining the mean squared error as a function of the number of neurons. This evaluation is very long and requires considerable computing resources. The result indicates that the best network has 11 neurons. This confirms the accuracy of the result obtained much more quickly by the method of the invention.

It is understood that the invention has been described and shown only by way of preferred example and that any technical equivalent may be substituted for its constituent elements without departing from its scope.

Claims

1. Method for constructing a neural network intended for modeling a phenomenon, the network comprising inputs intended to receive signals representative of values of variables, neurons intended to apply an activation function to the signals that they receive, at least one output intended to transmit result data of the model of the phenomenon, and connections formed between the inputs and the neurons and between the neurons and the output, of the type which comprises: in a first step, determining the variables to be used in models of the phenomenon by determining descriptors each representative of the values of a variable; in a second step, selecting the variables to be incorporated into at least one optimal model of the phenomenon by evaluating the results of several models; and, in a third step, constructing a neural network by determining the connections of the neurons according to an optimal model obtained, characterized in that the method comprises

- during or before the first step of determining the descriptors, the introduction of at least one additional variable which has random values, and the determination of a descriptor representative of the values of this additional variable,

- the classification of the descriptors, including that of the additional variable, by applying a criterion of comparison of the results given by the models with the data representative of the result of the phenomenon, with determination of an order of decreasing significance of the descriptors, then

- the elimination of at least one descriptor which, in the order of decreasing significance of the descriptors, is classified after the descriptor representative of the values of the additional variable.

2. Method according to claim 1, characterized in that it further comprises the representation of the descriptors and of the result of the phenomenon by vectors of an N-dimensional space, N being the number of examples in a set of training examples of the phenomenon, each example comprising at least one value of each of the variables and at least one datum representative of the result of the phenomenon for the corresponding values of the variables.

3. Method according to claim 2, characterized in that the comparison criterion used for the classification of the descriptors is a comparison, in the N-dimensional space, of the angles formed by a vector representative of a descriptor with the vector representative of the result of the phenomenon.

4. Method according to claim 3, characterized in that the classification step comprises determining the first descriptor in the order of decreasing significance of the descriptors, and projecting the remaining descriptor vectors and the result vector onto the space with one dimension less which is orthogonal to this first descriptor, then classifying the descriptors in this space with one dimension less in order to determine the first, in decreasing order of significance, of the remaining descriptors, and projecting the remaining descriptor vectors and the result vector onto a space with one dimension less which is orthogonal to the first descriptor in the order of decreasing significance of the remaining descriptors, and repeating these steps until all the descriptors have been classified or until the descriptor representative of the values of the additional variable has been classified.

5. Method according to any one of the preceding claims, characterized in that the construction of at least one optimal model of the phenomenon by evaluation of the results of several models, comprises

- the construction of several successive sub-models of the phenomenon, each sub-model containing one more variable than the previous sub-model, the added variable being chosen in the order of decreasing significance of the descriptors, the variable of the first sub-model being either a constant or the most significant variable, and

- the selection of a sub-model as an optimal model by using a selection criterion.

6. Method according to claim 5, characterized in that the criterion for selecting a sub-model comprises the selection of the sub-model having the largest number of descriptors that gives a risk level of selecting the additional variable lower than a selected threshold level.

7. Method for constructing a neural network intended for modeling a phenomenon, the network comprising inputs intended to receive signals representative of values of variables which are represented by descriptors, neurons intended to apply an activation function to the signals which they receive, at least one output intended to transmit result data of the model of the phenomenon, and connections formed between the inputs and the neurons and between the neurons and the output, by determining the connections of the neurons according to the model, characterized in that it comprises:

- the construction of a neural network with a single layer whose number of neurons is certainly too high, the inputs of the neurons corresponding to the descriptors of the model, the neural network additionally containing, in its single layer, at least one additional neuron with an activation function whose parameters have random values, and

- the execution of a process comprising, with the too-high number of neurons, training the neurons using the descriptors, and eliminating at least the neuron having the least significant contribution to the result, so that the network has a smaller number of neurons, and then

- the repetition of this process with the smaller number of neurons, at least until the neuron to be eliminated is an additional neuron.

8. Method according to claim 7, characterized in that the training of the neurons using the descriptors is carried out with only part of the examples.

9. Method according to one of claims 7 and 8, characterized in that the execution of a process comprises, before the elimination of a neuron, at least one repetition of a training for the confirmation of the neuron having the least significant contribution.

10. Method according to any one of Claims 7 to 9, characterized in that the model of the phenomenon used is an optimal model obtained by implementing a method according to any one of Claims 1 to 6.