WO2024231616A1

WO2024231616A1 - Method and device for determining a visibility mask for a vision system on board a vehicle

Info

Publication number: WO2024231616A1
Application number: PCT/FR2024/050462
Authority: WO
Inventors: Hai Li
Original assignee: Stellantis Auto SAS
Current assignee: Stellantis Auto SAS
Priority date: 2023-05-05
Filing date: 2024-04-09
Publication date: 2024-11-14
Anticipated expiration: 2025-11-05
Also published as: FR3148485A1

Abstract

The invention relates to a method and a device for determining a visibility mask for a stereoscopic vision system on board a vehicle (10). The vision system comprises at least two cameras (11, 12) for acquiring images of the same three-dimensional scene from determined viewpoints. For this purpose, first and second images are received, depths associated with a set of pixels in the first image are predicted, this set of pixels is reprojected into the three-dimensional scene as a set of points according to the predicted depths and a third image is generated by projecting the set of points. A visibility mask is then determined from the spatial co-ordinates of the points of the set of points and from the co-ordinates of the pixels projected into the third image.

Description

DESCRIPTION
Title: Method and device for determining a visibility mask for a vision system on board a vehicle.
Technical field
[0001] The present invention claims the priority of French application 2304517 filed on 05.05.2023, the content of which (text, drawings and claims) is incorporated herein by reference. The present invention relates to methods and devices for determining a visibility mask for a vision system on board a vehicle, for example a motor vehicle. The present invention also relates to a method and a device for controlling one or more ADAS on board a vehicle on the basis of a determined visibility mask.
Technological background
[0002] Many modern vehicles are equipped with driver-assistance systems known as ADAS (Advanced Driver-Assistance Systems). Such ADAS are passive and active safety systems designed to eliminate the human-error component in driving vehicles of all types. ADAS use advanced technologies to assist the driver while driving and thereby improve driving performance. They use a combination of sensor technologies to perceive the environment around a vehicle, then provide information to the driver or act on certain vehicle systems.
[0003] There are several levels of ADAS, such as reversing cameras and blind-spot sensors, lane-departure warning systems, adaptive cruise control and automatic parking systems.
[0004] The ADAS on board a vehicle are fed with data obtained from one or more on-board sensors such as, for example, cameras.
These cameras make it possible, in particular, to detect and locate other road users or any obstacles present around a vehicle in order, for example:
- to adapt the vehicle's lighting according to the presence of other road users;
- to regulate the vehicle's speed automatically;
- to act on the braking system when there is a risk of collision with an object.
[0005] The correct operation of the driver-assistance devices that use the data output by a vision system therefore depends on the quality of that data.
[0006] Many vision systems perceive the environment around a vehicle from several images acquired by one or more cameras. When the images are processed, occluded areas are defined in the images. These correspond to areas of the environment that are not present in all of the acquired images. A visibility mask associated with an image then defines, for example, a filter used to identify the pixels associated with areas that do not appear in the other images.
[0007] Solutions exist for detecting occlusion, i.e. for determining a visibility mask.
[0008] A first solution, presented in "Occlusion Aware Unsupervised Learning of Optical Flow" by Yang Wang, Yi Yang, Zhenheng Yang, Liang Zhao, Peng Wang and Wei Xu, published on 4 April 2018, is based on backward optical flow. For each pixel of a first image, represented by its coordinates, the algorithm scans all the pixels of a second image and checks whether any of them reaches this pixel of the first image through the backward optical flow. This method can be applied in both directions of the optical flow to identify the occluded areas of both images.
[0009] A second solution is described in the document "Geometry-based Occlusion-Aware Unsupervised Stereo Matching for Autonomous Driving algorithm".
The detection of occluded areas is based on a geometric constraint: the occluded pixel and another pixel that hides it are projected onto the same pixel of a reconstructed image.
Summary of the present invention
[0010] An object of the present invention is to solve at least one of the problems of the technological background described above.
[0011] Another object of the present invention is to propose an alternative solution for determining a visibility mask for any vision system, in order to improve the quality of the data output by the camera or cameras of the vision system.
[0012] Another object of the present invention is to reduce the resources required to determine a visibility mask.
[0013] According to a first aspect, the present invention relates to a method for determining visibility masks by a stereoscopic vision system on board a vehicle. The stereoscopic vision system comprises a set of at least two cameras arranged so that each acquires an image of a three-dimensional scene from a different viewpoint, the second camera being located to the right of the first camera from the viewpoint of the first camera. The method is characterized in that it comprises the following steps:
- receiving first and second data representative, respectively, of a first image and a second image acquired, respectively, by a first camera and a second camera of the set of cameras at the same acquisition time;
- predicting, by the stereoscopic vision system, depths associated with a set of pixels of the first image using a learned prediction model, each pixel of the first image having primary coordinates in the first image;
- reprojecting the set of pixels into the three-dimensional scene as a set of points, as a function of the depths, of an intrinsic matrix of the first camera and of extrinsic parameters of the stereoscopic vision system;
- determining a first visibility mask associated with the pixels of the set of pixels as a function of the coordinates of the points of the set of points, a pixel of the set of pixels being not visible in the second image if the coordinates of the point associated with that pixel place it outside a field of view of the second camera, this field of view being determined as a function of a width of the second image and of a focal length of the second camera;
- generating a third image by projecting the set of points as a function of an intrinsic matrix of the second camera, secondary coordinates in the third image being associated with each pixel of the set of pixels;
- rectifying the first image using the intrinsic and extrinsic parameters of the two cameras; for each row of the rectified first image, scanning the arrival column index values from left to right, from the viewpoint of the first camera, and detecting a set of irregular pixels whose arrival column index does not follow a monotonic function representing the evolution of the arrival column indices as a function of the column index in the rectified first image; and, for each irregular pixel of that set, identifying a set of occluded pixels in the row, to the left of the irregular pixel, whose arrival column index is greater than or equal to the arrival column index of the irregular pixel, a second visibility mask being determined as the union of the sets of occluded pixels; and
- determining a third visibility mask by associating the first visibility mask and the second visibility mask.
[0014] According to a variant of the method, a pixel is reprojected into the three-dimensional scene using the following formula:
$$X_{3D}(p_i) = T \cdot \phi\left(p_i \mid K^{-1}, D(p_i)\right)$$
with:
- $X_{3D}(p_i)$ the coordinates, in the three-dimensional scene, of the point obtained by reprojecting pixel $p_i$ of the first image,
- $T$ a displacement matrix between a position of the first camera and a position of the second camera,
- $K$ the intrinsic matrix of the first camera, associated with the projection of a point of the three-dimensional scene into an image acquired by the first camera,
- $\phi$ a function that reprojects a pixel into the three-dimensional scene as a function of its depth,
- $D(p_i)$ the depth of pixel $p_i$ predicted by the stereoscopic vision system.
[0015] According to another variant of the method, a point of the three-dimensional scene is projected using the following formula:
$$p'_i = \pi\left(K' \, X_{3D}(p_i)\right)$$
with:
- $\pi$ a function that converts homogeneous coordinates in three-dimensional space into two-dimensional pixel coordinates by removing one dimension of a vector,
- $K'$ the intrinsic matrix of the second camera, associated with the projection of a point of the three-dimensional scene into an image acquired by the second camera,
- $X_{3D}(p_i)$ the coordinates, in the three-dimensional scene, of the point obtained by reprojecting pixel $p_i$ of the first image.
[0016] According to yet another variant of the method, the first visibility mask is obtained using the following formula:
$$M_1 = \bigcup_{\,p_i \;:\; \frac{\lvert x_{3D}(p_i) \rvert}{D(p_i)} > \frac{W/2}{f}} \{ p_i \}$$
with:
- $x_{3D}(p_i)$ the abscissa of the point of the three-dimensional scene obtained by reprojecting pixel $p_i$ of the first image, the abscissa axis being defined parallel to the axis along which the first and second cameras lie,
- $D(p_i)$ the depth of pixel $p_i$ of the first image predicted by the stereoscopic vision system,
- $W$ the width of the second image, and
- $f$ the focal length of the second camera.
[0017] According to yet another variant of the method, the depths are predicted by a convolutional neural network.
[0018] According to an additional variant of the method, the convolutional neural network is trained to minimize a photometric error defined by the following loss function:
$$\mathcal{L} = \sum_{p} \left[ (1-\alpha) \cdot \lvert I(p) - \tilde{I}(p) \rvert + \alpha \cdot \frac{1 - \mathrm{SSIM}\left(I(p), \tilde{I}(p)\right)}{2} \right]$$
with:
- $\tilde{I}(p)$ a value of pixel $p$ in the third image;
- SSIM a function that takes a local structure into account; and
- $\alpha$ a weighting factor that depends on a type of road environment.
[0019] According to a second aspect, the present invention relates to a device for determining a visibility mask for a vision system on board a vehicle, the device comprising a memory associated with at least one processor configured to implement the steps of the method according to the first aspect of the present invention.
[0020] According to a third aspect, the present invention relates to a vehicle, for example a motor vehicle, comprising a device as described above according to the second aspect of the present invention.
[0021] According to a fourth aspect, the present invention relates to a computer program comprising instructions adapted to execute the steps of the method according to the first aspect of the present invention, in particular when the computer program is executed by at least one processor.
[0022] Such a computer program may use any programming language and may be in the form of source code, object code, or code intermediate between source code and object code, such as a partially compiled form, or in any other desirable form.
[0023] According to a fifth aspect, the present invention relates to a computer-readable recording medium on which is recorded a computer program comprising instructions for executing the steps of the method according to the first aspect of the present invention.
[0024] On the one hand, the recording medium may be any entity or device capable of storing the program.
For example, the medium may comprise a storage means, such as a ROM memory, a CD-ROM or a microelectronic-circuit ROM, or a magnetic recording means or a hard disk.
[0025] On the other hand, this recording medium may also be a transmissible medium such as an electrical or optical signal, which may be conveyed via an electrical or optical cable, by conventional radio or Hertzian waves, by a self-directed laser beam or by other means. The computer program according to the present invention may in particular be downloaded over an Internet-type network.
[0026] Alternatively, the recording medium may be an integrated circuit in which the computer program is incorporated, the integrated circuit being adapted to execute, or to be used in the execution of, the method in question.
[0027] Brief description of the figures
[0028] Other features and advantages of the present invention will emerge from the following description of particular, non-limiting exemplary embodiments of the present invention, with reference to the appended figures 1 to 5, in which:
[0029] [Fig.1] schematically illustrates a stereoscopic vision system fitted to a vehicle, according to a particular, non-limiting exemplary embodiment of the present invention;
[0030] [Fig.2] schematically illustrates a stereoscopic vision system fitted to a vehicle, according to a particular, non-limiting exemplary embodiment of the present invention;
[0031] [Fig.3] schematically illustrates a device configured to determine a visibility mask for a vision system on board the vehicle of figure 1, according to a particular, non-limiting exemplary embodiment of the present invention;
[0032] [Fig.4] illustrates a flowchart of the different steps of a method for determining a visibility mask for a vision system on board the vehicle of figure 1, according to a particular, non-limiting exemplary embodiment of the present invention.
[0033] [Fig.5] illustrates a matrix showing different column indices for the pixels of a row and a visibility criterion for a vision system on board the vehicle of figure 1, according to a particular, non-limiting exemplary embodiment of the present invention.
[0034] Description of the exemplary embodiments
[0035] A method and a device for determining a visibility mask for a vision system on board a vehicle will now be described with joint reference to figures 1 to 5. Identical elements are identified by the same reference signs throughout the following description.
[0036] According to a particular, non-limiting exemplary embodiment of the present invention, a method for determining a visibility mask for a stereoscopic vision system on board a vehicle is, for example, implemented by a computer of the vehicle's on-board system that controls this vision system.
[0037] The stereoscopic vision system comprises a set of at least two cameras arranged so that each acquires an image of a three-dimensional scene from a different viewpoint, the second camera being located to the right of the first camera, from the viewpoint of the first camera.
[0038] To this end, the method for determining a visibility mask by a stereoscopic vision system on board a vehicle comprises receiving first and second data representative, respectively, of a first image and a second image acquired from different viewpoints by the first and second cameras at the same acquisition time.
[0039] The method also comprises predicting, by the stereoscopic vision system, depths associated with a set of pixels of the first image using a learned prediction model, each pixel of the first image having primary coordinates in the first image, and reprojecting the pixels of the first image into the three-dimensional scene as a set of points, as a function of the depths, of an intrinsic matrix of the first camera and of extrinsic parameters of the stereoscopic vision system.
[0040] The method then determines a first visibility mask associated with the pixels of the set of pixels as a function of the coordinates of the points of the set of points. A pixel of the set of pixels is not visible in the second image if the coordinates of the point associated with that pixel place it outside a field of view of the second camera, this field of view being determined as a function of a width of the second image and of a focal length of the second camera.
[0041] The method then comprises generating a third image by projecting the set of points as a function of an intrinsic matrix of the second camera, secondary coordinates in the third image being associated with each pixel of the set of pixels. Generating this third image then makes it possible to determine a second visibility mask associated with the pixels of the set of pixels. For each row of the rectified first image, the arrival column index values are scanned and a set of irregular pixels is detected. For each irregular pixel of the set, a set of occluded pixels is identified in the row to the left of that irregular pixel. A second visibility mask is then determined as the union of the sets of occluded pixels.
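The reprojection and projection steps of [0039] to [0041], together with the field-of-view test behind the first mask, can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes a distortion-free pinhole model (as in [0048]), square pixels, an explicit principal point, and a displacement T written as a rotation R plus a translation t. Following the first-mask formula, the x axis is taken along the baseline. The function names (`intrinsic`, `reproject`, `displace`, `project`, `outside_fov`) and all numeric values are hypothetical.

```python
from typing import List, Sequence, Tuple

Mat3 = List[List[float]]
Vec3 = Tuple[float, float, float]


def intrinsic(f: float, cx: float, cy: float) -> Mat3:
    """Pinhole intrinsic matrix with square pixels and no skew (assumed)."""
    return [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]


def reproject(u: float, v: float, depth: float, K: Mat3) -> Vec3:
    """phi(p | K^-1, D(p)): back-project pixel (u, v) at the given depth into
    camera-1 coordinates, using the closed-form inverse of the K above."""
    fx, cx, fy, cy = K[0][0], K[0][2], K[1][1], K[1][2]
    return ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)


def displace(X: Sequence[float], R: Mat3, t: Sequence[float]) -> Vec3:
    """Apply the camera-1 -> camera-2 displacement T = [R | t]."""
    x, y, z = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    return (x, y, z)


def project(X: Sequence[float], K2: Mat3) -> Tuple[float, float]:
    """pi(K' X): apply K', then divide by the last homogeneous coordinate."""
    h = [sum(K2[i][j] * X[j] for j in range(3)) for i in range(3)]
    return (h[0] / h[2], h[1] / h[2])


def outside_fov(X2: Sequence[float], width: float, f: float) -> bool:
    """First-mask test: |x| / z > (W / 2) / f, or the point lies behind camera 2."""
    x, _, z = X2
    return z <= 0.0 or abs(x) / z > (width / 2.0) / f


# Hypothetical rectified rig: identical intrinsics, camera 2 is 0.6 m to the right.
K = intrinsic(1000.0, 640.0, 360.0)
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = (-0.6, 0.0, 0.0)

X2 = displace(reproject(640.0, 360.0, 20.0, K), I3, t)
u2, v2 = project(X2, K)          # about (610.0, 360.0): a 30-pixel disparity

# A pixel on the left edge of image 1, 2 m away, falls outside camera 2's view.
X_edge = displace(reproject(0.0, 360.0, 2.0, K), I3, t)
edge_hidden = outside_fov(X_edge, width=1280.0, f=1000.0)   # True
```

The projected coordinates (u2, v2) play the role of the secondary coordinates in the third image, and their column component is the arrival column index used by the second mask.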
[0042] A third visibility mask is then determined by associating the first visibility mask and the second visibility mask.
[0043] Figure 1 schematically illustrates a stereoscopic vision system fitted to a vehicle, according to a particular, non-limiting exemplary embodiment of the present invention.
[0044] In this example, the vehicle 10 is a vehicle with an internal combustion engine, with one or more electric motors, or a hybrid vehicle with an internal combustion engine and one or more electric motors. The vehicle 10 is thus, for example, a land vehicle such as a car, a truck, a coach or a motorcycle. Finally, the vehicle 10 may or may not be autonomous, i.e. it may travel at a given level of autonomy or under the full supervision of the driver.
[0045] The vehicle 10 advantageously comprises several on-board cameras 11, 12, each configured to acquire images of a three-dimensional scene in the environment of the vehicle 10. This set of cameras 11, 12 forms the stereoscopic vision system. Two cameras 11, 12 are illustrated in figure 1. The present invention is not, however, limited to a stereoscopic vision system comprising two cameras but extends to any vision system comprising two or more cameras, for example 3, 4 or 5 cameras.
[0046] The cameras 11, 12 have known intrinsic parameters. These parameters include in particular:
- the focal length f1 of the first camera 11;
- the focal length f2 of the second camera 12;
- distortions due to the imperfections of the optical system of each camera;
- the direction C1 of the optical axis of the first camera 11;
- the direction C2 of the optical axis of the second camera 12;
- the respective resolutions of the cameras 11, 12.
[0047] The intrinsic parameters characterize the transformation that maps, for an image point, camera coordinates to pixel coordinates in each camera. These parameters do not change if the camera is moved.
[0048] Distortions, which are due to imperfections of the optical system such as shape and positioning defects of the camera lenses, deflect the light beams and therefore introduce a positioning offset of the projected point relative to an ideal model. The camera model can then be completed by introducing the three distortions with the greatest effect, namely radial, decentering and prismatic distortions, which are induced by defects in lens curvature, in lens parallelism and in the coaxiality of the optical axes. In this example, the cameras are assumed to be perfect, i.e. distortions are either not taken into account or corrected when an image is acquired.
[0049] These cameras 11, 12 are arranged so that each acquires an image of a three-dimensional scene from a different viewpoint, the second camera 12 being located to the right of the first camera 11 from the viewpoint of the first camera 11. The first viewpoint is, for example, located on or in the left-hand rear-view mirror of the vehicle 10 or at the top of the windscreen of the vehicle 10. The second viewpoint is, for example, located on or in the right-hand rear-view mirror of the vehicle 10 or at the top of the windscreen of the vehicle 10. When both cameras are located at the top of the windscreen, they are placed a certain distance apart.
[0050] A first reference frame is associated with the first camera 11:
- the direction of the y axis is defined by the position of the second camera 12, so as to place the second camera 12 on the y axis of the first camera 11.
The distance B separating the two cameras 11, 12 is called the baseline, and the direction separating the two cameras 11, 12 is that of the y axis;
- the direction of the x axis is defined as orthogonal to that of the y axis and orthogonal to that of the optical axis C1 of the first camera 11;
- the direction of the z axis is defined as orthogonal to the directions of the x and y axes. The three axes x, y and z thus form an orthonormal frame.
[0051] The extrinsic parameters related to the position of the cameras 11, 12 are as follows:
- 3 translations along the x, y and z directions: Tx, Ty and Tz, forming the translation vector T; and
- 3 rotations about the x, y and z axes: Rx, Ry and Rz, forming the rotation matrix R.
[0052] A major constraint of the stereoscopic vision systems used in the automotive field is, for example, the large distance between the two cameras. Indeed, to cover a measurement range of 200 metres, the baseline must reach 60 cm for the cameras commonly used in this field.
[0053] The two cameras 11, 12 acquire images of a three-dimensional scene located in front of the vehicle 10. The first camera 11 alone covers a first acquisition field 13, the second camera 12 alone covers a second acquisition field 14, and both cameras 11, 12 cover a third acquisition field 15. The first and third acquisition fields 13, 15 thus provide monoscopic vision of the three-dimensional scene by the first camera 11, the second and third acquisition fields 14, 15 provide monoscopic vision of the three-dimensional scene by the second camera 12, and the third acquisition field 15 provides stereoscopic vision of the three-dimensional scene by the stereoscopic vision system formed by the two cameras 11, 12.
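The baseline requirement of [0052] follows from the rectified-stereo relation between depth and disparity, D = f × B / d, which appears later as [Math 2]. The sketch below assumes a hypothetical focal length of 1200 pixels and a hypothetical disparity-matching error of 0.25 pixel; neither value comes from the source. It shows how the disparity at 200 m, and the first-order depth uncertainty dZ ≈ Z² · δd / (f · B), change with B. The function names are hypothetical.

```python
def disparity_px(f_px: float, baseline_m: float, depth_m: float) -> float:
    """Disparity d = f * B / Z of a rectified pair (inverse of D = f * B / d)."""
    return f_px * baseline_m / depth_m


def depth_error_m(f_px: float, baseline_m: float, depth_m: float,
                  disp_err_px: float) -> float:
    """First-order depth uncertainty dZ ~= Z**2 * dd / (f * B), from dZ/dd = -Z**2 / (f * B)."""
    return depth_m ** 2 * disp_err_px / (f_px * baseline_m)


F_PX = 1200.0      # hypothetical focal length, in pixels
DISP_ERR = 0.25    # hypothetical disparity-matching error, in pixels
RANGE_M = 200.0    # measurement range quoted in [0052]

for baseline in (0.3, 0.6):
    d = disparity_px(F_PX, baseline, RANGE_M)
    dz = depth_error_m(F_PX, baseline, RANGE_M, DISP_ERR)
    print(f"B = {baseline:.1f} m: d = {d:.2f} px, depth error ~ {dz:.1f} m at {RANGE_M:.0f} m")
```

With these assumed values, doubling B from 0.3 m to 0.6 m doubles the disparity at 200 m (1.8 px to 3.6 px) and halves the depth uncertainty (roughly 28 m to 14 m). This is the trade-off behind the long baselines used in vehicles.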
[0054] An obstacle 18 is located in the acquisition field of the cameras, for example in the third acquisition field 15. The presence of the obstacle 18 defines an occlusion field for the stereoscopic vision system, composed here of the three fields 16, 17 and 19.
[0055] Of these three fields, field 16 is visible from the second camera 12. The part of the three-dimensional scene lying in this field 16 can therefore be observed with the monoscopic vision system formed by the second camera 12.
[0056] Field 17, for its part, is visible from the first camera 11. The part of the three-dimensional scene lying in this field 17 can therefore be observed with the monoscopic vision system formed by the first camera 11.
[0057] Finally, field 19 is visible from neither camera. The part of the three-dimensional scene lying in this field 19 therefore cannot be observed.
[0058] The directions C1, C2 of the optical axes represent an orientation of the field of view of each camera 11, 12.
[0059] Such a stereoscopic vision system can obviously be used to capture images of three-dimensional scenes located at the sides of or behind the vehicle 10, by fitting it with cameras placed and oriented differently.
[0060] The images acquired by the cameras 11, 12 at a given acquisition time take the form of data representing pixels characterized by:
- coordinates in each image; and
- data relating to the colours and brightness of the objects of the observed three-dimensional scene, for example in the form of RGB (Red Green Blue) or HSL (Hue, Saturation, Lightness) colour coordinates.
[0061] The images acquired by the cameras 11, 12 represent views of the same three-dimensional scene taken from different viewpoints, since the positions of the cameras are distinct.
This three-dimensional scene contains, for example:
- buildings;
- road infrastructure;
- other stationary road users, for example a parked vehicle; and/or
- other moving road users, for example another vehicle, a cyclist or a pedestrian in motion.
[0062] These images are sent to a computer of a device fitted to the vehicle 10, or stored in a memory of a device that is accessible to a computer of a device fitted to the vehicle 10.
[0063] Figure 2 schematically illustrates a stereoscopic vision system fitted to a vehicle, according to a particular, non-limiting exemplary embodiment of the present invention.
[0064] Points 20, 21, 22 of the three-dimensional scene are visible from the viewpoint of the first camera 11.
[0065] Points 21 are also visible from the viewpoint of the second camera 12.
[0066] Points 20, on the other hand, are occluded from the viewpoint of the second camera 12 because they are hidden by points 21 lying on the same lines of sight in the field of view of the second camera 12.
[0067] Points 22 lie outside the field of view of the second camera 12.
[0068] Thus, when the first and second images are acquired by the first camera 11 and the second camera 12 respectively, pixels associated with the points 20, 21, 22 visible from the viewpoint of the first camera 11 will be present in the first image. In contrast, only the pixels associated with the points 21 visible from the viewpoint of the second camera will be present in the second image.
[0069] The pixels associated with the points 20, which are visible from the viewpoint of the first camera 11 but not from the viewpoint of the second camera 12, are hereinafter called occluded pixels.
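The second mask is meant to flag occluded pixels such as those of points 20, and the first mask is meant to flag out-of-field pixels such as those of points 22. The row scan described in [0013] and [0041] can be sketched as follows. The sketch makes three assumptions. First, it assumes the (possibly sub-pixel) arrival columns of one rectified row are already known. Second, it treats a pixel as irregular when its arrival column does not exceed that of its left neighbour; the source only says the values stop following a monotonic function. Third, it combines the masks with a logical OR, which is one reading of "association". The function names and the sample row are hypothetical, and True means "not visible in the second image".

```python
from typing import List, Sequence


def second_mask_row(arrival: Sequence[float]) -> List[bool]:
    """Occlusion flags for one row of the rectified first image.

    arrival[j] is the column at which pixel j of the row lands in the third
    image. Scanning left to right, arrival columns should increase. Pixel j is
    taken as irregular when arrival[j] <= arrival[j - 1] (assumed test); every
    pixel k < j with arrival[k] >= arrival[j] is then marked as occluded.
    The result is the union of those occluded sets (True = occluded).
    """
    occluded = [False] * len(arrival)
    for j in range(1, len(arrival)):
        if arrival[j] <= arrival[j - 1]:      # irregular pixel detected
            for k in range(j):                # pixels to its left in the row
                if arrival[k] >= arrival[j]:
                    occluded[k] = True
    return occluded


def third_mask_row(first: Sequence[bool], second: Sequence[bool]) -> List[bool]:
    """Not visible if outside camera 2's field of view OR occluded (assumed OR)."""
    return [a or b for a, b in zip(first, second)]


# Background pixels 2 and 3 are overtaken by a nearer object starting at pixel 4,
# whose larger disparity shifts it further left in the third image.
arrival = [10.0, 11.0, 12.0, 13.0, 11.5, 12.5, 14.0]
outside = [True, False, False, False, False, False, False]   # e.g. from the first mask
occ = second_mask_row(arrival)
hidden = third_mask_row(outside, occ)
```

For the sample row, `occ` flags pixels 2 and 3, and `hidden` additionally keeps pixel 0 from the first mask. The double loop is quadratic in the row length, which keeps the sketch close to the wording of the claim. A production version would more likely track a running maximum.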
[0070] Figure 3 schematically illustrates a device 4 configured to determine a visibility mask for a vision system on board a vehicle 10, according to a particular, non-limiting exemplary embodiment of the present invention. The device 4 is, for example, a device on board the vehicle 10, such as a computer.
[0071] The device 4 is, for example, configured to implement the operations and/or steps described with reference to figures 1, 2 and 4. Examples of such a device 4 include, without being limited to, on-board electronic equipment such as a vehicle's on-board computer, an electronic computer such as an ECU (Electronic Control Unit), a smartphone, a tablet or a laptop computer. The elements of the device 4, individually or in combination, may be integrated in a single integrated circuit, in several integrated circuits and/or in discrete components. The device 4 may be implemented in the form of electronic circuits, of software (or computer) modules, or of a combination of electronic circuits and software modules.
[0072] The device 4 comprises one (or more) processor(s) 40 configured to execute instructions for carrying out the steps of the method and/or for executing the instructions of the software embedded in the device 4. The processor 40 may include integrated memory, an input/output interface and various circuits known to those skilled in the art. The device 4 further comprises at least one memory 41, for example a volatile and/or non-volatile memory, and/or comprises a memory storage device that may include volatile and/or non-volatile memory, such as EEPROM, ROM, PROM, RAM, DRAM, SRAM, flash, or a magnetic or optical disk.
[0073] The computer code of the embedded software, comprising the instructions to be loaded and executed by the processor, is, for example, stored in the memory 41.
[0074] According to various particular, non-limiting exemplary embodiments, the device 4 is communicatively coupled with other similar devices or systems (for example other computers) and/or with communication devices, for example a TCU (Telematic Control Unit). The coupling is made, for example, via a communication bus or through dedicated input/output ports.
[0075] According to a particular, non-limiting exemplary embodiment, the device 4 comprises a block 42 of interface elements for communicating with external devices. The interface elements of block 42 include one or more of the following interfaces:
- an RF (radio-frequency) interface, for example of the Wi-Fi® type (according to IEEE 802.11), for example in the 2.4 or 5 GHz frequency bands, or of the Bluetooth® type (according to IEEE 802.15.1) in the 2.4 GHz frequency band, or of the Sigfox type using UNB (Ultra Narrow Band) radio technology, or LoRa in the 868 MHz frequency band, LTE (Long-Term Evolution) or LTE-Advanced;
- a USB (Universal Serial Bus) interface;
- an HDMI (High Definition Multimedia Interface) interface;
- a LIN (Local Interconnect Network) interface.
[0076] According to another particular, non-limiting exemplary embodiment, the device 4 comprises a communication interface 43 that makes it possible to establish communication with other devices (such as other computers of the on-board system) via a communication channel 430. The communication interface 43 corresponds, for example, to a transmitter configured to transmit and receive information and/or data via the communication channel 430. The communication interface 43 corresponds, for example, to a wired network of the CAN (Controller Area Network), CAN FD (Controller Area Network Flexible Data-Rate), FlexRay (standardized by ISO 17458) or Ethernet (standardized by IEEE 802.3) type.
[0077] According to a particular, non-limiting exemplary embodiment, the device 4 can provide output signals to one or more external devices via output interfaces 44, 45 and 46 respectively. These devices are a display screen 440, touch-sensitive or not, one or more loudspeakers 450 and/or other peripherals 460 (such as a projection system). According to a variant, one or other of the external devices is integrated into the device 4.
[0078] Figure 4 illustrates a flowchart of the different steps of a method 2 for determining a visibility mask for a vision system on board the vehicle of figure 1, according to a particular, non-limiting exemplary embodiment of the present invention. In this embodiment, the vision system comprises at least one camera 11 arranged to acquire an image of a three-dimensional scene from a determined viewpoint.
[0079] The method is, for example, implemented by one or more processors of one or more computers on board the vehicle 10, for example by a computer controlling the vision system.
[0080] In a first step 31, the computer receives first data representative of a first image acquired by a first camera 11 at a given time instant. [0081] In a second step 32, the computer receives second data representative of a second image acquired by a second camera 12 at the same given time instant. [0082] The two received images correspond to two views of the same three-dimensional scene around the vehicle 10, taken from two different points of view at the same given time instant. [0083] To facilitate the analysis of the two received images, the first and second images are rectified using a method known to those skilled in the art. Rectification uses the intrinsic and extrinsic parameters of the cameras. Such a method is described, for example, in "Rectification Projective d’Images Stéréo non Calibrées Infrarouges avec prise en compte globale de la minimisation des distorsions" by Benoit Ducarouge, Thierry Sentenac, Florian Bugarin and Michel Devy, July 16, 2009. [0084] Rectification reorients the epipolar lines so that they are parallel to the horizontal axis of the image. It is described by a transformation that projects the epipoles to infinity, so that corresponding points necessarily lie on the same row (same ordinate). [0085] A rectification algorithm consists, for example, of 4 steps: - (virtually) rotate the first camera 11 so that the epipole goes to infinity along the horizontal axis of its associated reference frame; - apply the same rotation to the second camera 12 to recover the initial geometric configuration; - rotate the second camera by the rotation associated with the rotation matrix R, corresponding to the extrinsic parameter of the initial stereoscopic vision system; - adjust the scale in the two camera reference frames.
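The first rectification step above (rotating the first camera so that the epipole goes to infinity on the horizontal axis) can be sketched as follows. This is a minimal illustration, not the method of the cited Ducarouge et al. paper: it swaps in the well-known baseline-aligned rotation of Fusiello et al., assuming the translation `t` (position of the second camera expressed in the first camera's frame) is known from calibration. The function names `rectifying_rotation` and `apply` are hypothetical.

```python
import math
from typing import List, Sequence

Matrix3 = List[List[float]]


def _normalize(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def _cross(a: Sequence[float], b: Sequence[float]) -> List[float]:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def rectifying_rotation(t: Sequence[float]) -> Matrix3:
    """Rotation whose new x axis points along the baseline t (assumed convention).

    Rotating the first camera by this matrix sends the epipole to infinity
    along the horizontal image axis.
    """
    tx, ty, _ = t
    if tx == 0 and ty == 0:
        raise ValueError("baseline parallel to the optical axis: no horizontal rectification")
    e1 = _normalize(t)               # new x axis: along the baseline
    e2 = _normalize([-ty, tx, 0.0])  # new y axis: old z axis crossed with e1
    e3 = _cross(e1, e2)              # new z axis: completes a right-handed frame
    return [e1, e2, e3]


def apply(m: Matrix3, v: Sequence[float]) -> List[float]:
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]
```

Under this sketch's assumptions, `apply(rectifying_rotation(t), t)` returns `(|t|, 0, 0)`: the baseline becomes horizontal, which is what places corresponding points on the same row. The second camera would then receive the same rotation composed with the extrinsic rotation R, followed by the scale adjustment, as in the remaining steps listed above.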
[0086] It should be noted that rectification simplifies the matching of pixels between stereo images, i.e. images obtained by a stereoscopic vision system: the pixel of the second image corresponding to a pixel of the first image (and vice versa) lies on the same row. Given the epipolar geometry, and therefore a fundamental matrix of the stereo system, the objective is then to determine a pair of projective transformations, called homographies, that reorient the epipolar lines parallel to the image rows, and hence to the horizontal axis of the rectified cameras. [0087] In a step 33, depths associated with a set of pixels of the first image are predicted by the stereoscopic vision system from a learned prediction model. [0088] This model is learned in a self-supervised way, i.e. without external intervention or annotated data, for example by minimizing the photometric error computed during image reconstructions. [0089] Disparities are determined from the first and second images obtained from the first camera 11 and the second camera 12 at the same time instant, the disparities being defined by the following function: [0090] [Math 1] [0091] $x_p^{(2)} = x_p^{(1)} - d(p)$ [0092] with: - $x_p^{(2)}$ the abscissa of a pixel in the second image, - $x_p^{(1)}$ the abscissa of a pixel in the first image, and - $d(p)$ a disparity determined for a pixel $p$ of the first image. [0093] From the previously determined disparities, depths are calculated for the pixels of the first image: [0094] [Math 2] [0095] $D(p) = B \times \frac{f_1}{d(p)}$ [0096] with: - $D(p)$ the depth of the pixel $p$ of the first image predicted by the stereoscopic vision system, - $B$ the reference base (baseline) separating the two cameras 11, 12, - $d(p)$ a disparity determined for a pixel $p$ of the first image, and - $f_1$ the focal length of the first camera 11. [0097] A third image is reconstructed from the first image and the previously calculated depths using the following formula: [0098] [Math 3] [0099] $p' = \pi\left(K'\left[T \cdot \phi\left(p \mid K^{-1}, D(p)\right)\right]\right)$ with:
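Math 1 and Math 2 amount to two one-line computations. The sketch below assumes the baseline B is in metres and the focal length f1 and the disparity are in pixels, so that the depth comes out in metres; the function names are hypothetical.

```python
def second_image_abscissa(x1: float, disparity: float) -> float:
    """Math 1: abscissa of the matching pixel in the rectified second image."""
    return x1 - disparity


def depth_from_disparity(disparity: float, baseline: float, focal_1: float) -> float:
    """Math 2: D(p) = B * f1 / d(p), B in metres, f1 and d(p) in pixels."""
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    return baseline * focal_1 / disparity
```

For example, with an assumed B = 0.6 m and f1 = 1000 px, a pixel at abscissa 700 with a 30 px disparity matches abscissa 670 in the second image and has a depth of 20 m.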

coordonnées homogènes dans l’espace à trois dimensions à des coordonnées pixels en deux dimensions en supprimant une dimension d’un vecteur, - ^^ la matrice intrinsèque de la première caméra 11 associée à la projection d’un point de la scène tridimensionnelle dans une image acquise par la première caméra 11, - ^^′ la matrice intrinsèque de la deuxième caméra 12 associée à la projection d’un point de la scène tridimensionnelle dans une image obtenue par la deuxième caméra 12, - ^^ une matrice de déplacement entre une position de la première caméra et une position de la deuxième caméra, - ∅ une fonction de reprojection dans la scène tridimensionnelle d’un pixel en fonction de sa profondeur, et - ^^( ^^ _^^) est une profondeur du pixel ^^ _^^ de la première image prédite par le système de vision stéréoscopique . [0100] L’image reconstruite est ensuite comparée à la deuxième image afin de déterminer une erreur photométrique : [0101] [Math 4] [0102] ^^⁽ ^^⁾ = ^∑ _^^ [⁽1 − ^^⁾ ⋅ | ^^⁽ ^^⁾ − ^^ ⁽ ^^⁾| + ^^ ⋅ ⁽1 − ¹ 2 _{^^ ^^ ^^ ^^ ( ^^} ⁽ _^^ ⁾ _{, ^^} ⁽ _^^ ⁾ ₎ ⁾ _]

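The photometric error of Math 4 can be sketched on small grayscale images stored as nested lists. This is a minimal illustration under stated assumptions: SSIM is computed on 3x3 windows with edge clamping and the usual constants C1 = 0.01² and C2 = 0.03² for intensities in [0, 1]; the default alpha = 0.85 is a value often used in self-supervised depth estimation, whereas the document only says that alpha depends on the road environment. The optional `mask` argument is a hypothetical hook for restricting the sum to visible pixels. All function names are hypothetical.

```python
from typing import List, Optional, Sequence

Image = Sequence[Sequence[float]]

C1 = 0.01 ** 2  # SSIM stabilising constants (assumed, intensities in [0, 1])
C2 = 0.03 ** 2


def _window(img: Image, y: int, x: int, r: int = 1) -> List[float]:
    """Values of the (2r+1)x(2r+1) window centred on (y, x), edges clamped."""
    h, w = len(img), len(img[0])
    return [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
            for dy in range(-r, r + 1) for dx in range(-r, r + 1)]


def local_ssim(a: Image, b: Image, y: int, x: int) -> float:
    wa, wb = _window(a, y, x), _window(b, y, x)
    n = len(wa)
    mu_a, mu_b = sum(wa) / n, sum(wb) / n
    var_a = sum((v - mu_a) * (v - mu_a) for v in wa) / n
    var_b = sum((v - mu_b) * (v - mu_b) for v in wb) / n
    cov = sum((va - mu_a) * (vb - mu_b) for va, vb in zip(wa, wb)) / n
    return ((2 * mu_a * mu_b + C1) * (2 * cov + C2)) / (
        (mu_a * mu_a + mu_b * mu_b + C1) * (var_a + var_b + C2))


def photometric_error(second: Image, third: Image, alpha: float = 0.85,
                      mask: Optional[Sequence[Sequence[int]]] = None) -> float:
    """Math 4: sum over pixels of (1 - alpha) * L1 + alpha * (1 - SSIM) / 2."""
    total = 0.0
    for y in range(len(second)):
        for x in range(len(second[0])):
            if mask is not None and not mask[y][x]:
                continue  # hypothetical: skip pixels flagged as not visible
            l1 = abs(second[y][x] - third[y][x])
            dssim = (1.0 - local_ssim(second, third, y, x)) / 2.0
            total += (1.0 - alpha) * l1 + alpha * dssim
    return total
```

Identical images give an error of zero, since the L1 term vanishes and SSIM equals 1 in every window.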
[0103] with: - $I(p)$ a value of the pixel $p$ in the second image, - $\tilde{I}(p)$ a value of the pixel $p$ in the third image, - SSIM (structural similarity index measure) a function that takes local structure into account, and - $\alpha$ a weighting factor depending on a type of road environment. [0104] The convolutional neural network is then trained to minimize the photometric error defined above. [0105] Thus, at the output of step 33, each pixel is defined by principal coordinates (x, y) in the first image and a depth predicted for this pixel. [0106] In a step 34, the pixels of the set of pixels are reprojected into the three-dimensional scene as a set of points as a function of the depths predicted in step 33, of the intrinsic matrix K of the first camera 11 and of extrinsic parameters T of the stereoscopic vision system. [0107] Such a reprojection is carried out, for example, using the following formula: [0108] [Math 5] [0109] $P_{3D}(p) = T \cdot \phi\left(p \mid K^{-1}, D(p)\right)$ [0110] with: - $P_{3D}(p)$ the coordinates in the three-dimensional scene of the point resulting from the reprojection of the pixel $p$ of the first image, - $T$ a displacement matrix between a position of the first camera 11 and a position of the second camera 12, - $K$ the intrinsic matrix of the first camera 11 associated with the projection of a point of the three-dimensional scene into an image acquired by the first camera 11, - $\phi$ a function reprojecting a pixel into the three-dimensional scene as a function of its depth, - $D(p)$ the depth of the pixel $p$ of the first image predicted by the stereoscopic vision system. [0111] The set of points is thus located in the three-dimensional scene at the positions at which the second camera 12 would see them.
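The reprojection of Math 5 and the projection through the second camera's intrinsic matrix (the chain used in Math 3) can be sketched with plain lists. This is a minimal illustration assuming pinhole intrinsics without skew, so that applying K^-1 reduces to a closed form, and a 4x4 homogeneous matrix T mapping first-camera coordinates to second-camera coordinates. The names `reproject`, `transform` and `project` are hypothetical stand-ins for phi, T and pi.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def reproject(u: float, v: float, depth: float, K: Matrix) -> List[float]:
    """phi(p | K^-1, D(p)): back-project pixel (u, v) to a 3D point in camera-1 coordinates.

    Assumes K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]] (no skew).
    """
    fx, cx = K[0][0], K[0][2]
    fy, cy = K[1][1], K[1][2]
    return [(u - cx) / fx * depth, (v - cy) / fy * depth, depth]


def transform(T: Matrix, point: Sequence[float]) -> List[float]:
    """Apply a 4x4 rigid displacement T (camera 1 -> camera 2) to a 3D point."""
    h = [point[0], point[1], point[2], 1.0]
    return [sum(T[r][c] * h[c] for c in range(4)) for r in range(3)]


def project(point: Sequence[float], K2: Matrix) -> Tuple[float, float]:
    """pi(K' P): multiply by K' and drop the homogeneous coordinate."""
    hu, hv, hw = (sum(K2[r][c] * point[c] for c in range(3)) for r in range(3))
    return hu / hw, hv / hw
```

With assumed values f = 1000 px, principal point (640, 360) and T a pure translation of -0.6 m along x, pixel (700, 400) at a depth of 20 m is reprojected to (1.2, 0.8, 20), moved to (0.6, 0.8, 20) and projected to column 670: that is x - B f / D, consistent with Math 1 and Math 2.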
[0112] However, some of the points 22 projected into the three-dimensional scene may lie outside the field of view of the second camera 12, because some parts of the three-dimensional scene are not visible to both cameras 11, 12 at once. [0113] In a step 35, a first visibility mask associated with the pixels of the set of pixels is determined as a function of the coordinates of points of the set of points. [0114] A pixel of the set of pixels is defined as not visible in the second image if the coordinates of the point 22 of the set of points associated with the pixel place it outside a field of view of the second camera 12, determined as a function of a width of the second image and a focal length of the second camera 12. [0115] For example, a pixel may be the projection of a point 22 of the three-dimensional scene located in the first acquisition field 13, which only the first camera 11 perceives. In this case, the point 22 is outside the field of view of the second camera 12. This non-visible point 22 must then be detected, because it cannot have an associated pixel in the second image. [0116] The coordinate frame of the points in the three-dimensional scene is defined according to the orientation of the cameras 11, 12: the x axis of the frame associated with the three-dimensional scene is parallel to an axis defined by the positions of the cameras 11, 12, the cameras being placed on this axis, and the z axis of the frame is the focal axis of the second camera 12. [0117] The principle is to compare the ratio between the abscissa $x_{3D}$ of a point of the three-dimensional scene and the depth of that point with the ratio between the half-width of the second image and the focal length of the second camera.
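The field-of-view test described above compares two ratios per point. The sketch below assumes points expressed in second-camera coordinates (x along the baseline, z along the focal axis of the second camera 12) and a principal point at the image centre, so that the half-field ratio is (W/2)/f2 on both sides; the rejection of points with z <= 0 is a hypothetical safeguard not stated in the document. The function name is hypothetical.

```python
from typing import List, Sequence


def first_visibility_mask(points: Sequence[Sequence[float]], width: int,
                          focal_2: float) -> List[int]:
    """V1 for each reprojected point (x, y, z): 1 inside camera 2's horizontal FOV, 0 outside."""
    half_fov_ratio = (width / 2.0) / focal_2
    mask = []
    for x, _y, z in points:
        if z <= 0:
            mask.append(0)  # hypothetical safeguard: point behind camera 2
        elif abs(x) / z > half_fov_ratio:
            mask.append(0)  # |x| / D exceeds (W/2) / f2: outside the field of view
        else:
            mask.append(1)
    return mask
```

With an assumed 1280 px wide image and f2 = 1000 px, the half-field ratio is 0.64, so a point 10 m away is kept only if its abscissa lies within 6.4 m of the focal axis.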
[0118] The first visibility mask is obtained, for example, using the following formula: [0119] [Math 6] [0120] $M_1 = \bigcup_{p} \left\{ p \;:\; \frac{\left|x_{3D}(p)\right|}{D(p)} > \frac{W/2}{f_2} \right\}$ [0121] with: - $x_{3D}(p)$ the abscissa of the point in the three-dimensional scene resulting from the reprojection of the pixel $p$ of the first image, an abscissa axis being defined parallel to an axis along which the first and second cameras 11, 12 are located, - $D(p)$ said depth of the pixel $p$ of the first image predicted by the stereoscopic vision system, - $W$ the width of the second image, and - $f_2$ the focal length of the second camera 12. [0122] The first visibility mask thus identifies the pixels of the first image whose points 22 reprojected into the three-dimensional scene lie outside the field of view of the second camera 12. [0123] In a step 36, a third image is generated by projecting the set of points as a function of an intrinsic matrix K' of the second camera 12. [0124] Secondary coordinates (i, j) in the third image are associated with each pixel of the set of pixels. [0125] A point of the three-dimensional scene is projected, for example, using the following formula: $p' = \pi\left(K'\left[P_{3D}(p)\right]\right)$ with: - $\pi$ a function for going from homogeneous coordinates in three-dimensional space to two-dimensional pixel coordinates by removing one dimension of a vector, - $K'$ the intrinsic matrix of the second camera 12 associated with the projection of a point of the three-dimensional scene into an image acquired by the second camera 12, - $P_{3D}(p)$ the coordinates in the three-dimensional scene of the point resulting from the reprojection of the pixel $p$ of the first image. [0126] Once this operation has been performed, each pixel of the first image is defined by its coordinates (x, y) in the first image and by an arrival column index i and an arrival row index j in said third image. [0127] In a step 37, a second visibility mask associated with the pixels of the set of pixels is determined.
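For rectified images, the projection of step 36 takes a particularly simple form, which shows where the arrival column indices scanned in step 37 come from. The sketch below assumes both rectified cameras share the same intrinsics and are separated by a pure horizontal baseline, in which case i = x - B f / D(p) and the row index is unchanged (j = y); the function name and the `first_column` parameter are hypothetical.

```python
from typing import List, Sequence


def arrival_columns(depths_row: Sequence[float], baseline: float, focal: float,
                    first_column: int = 0) -> List[float]:
    """Arrival column index i in the third image for each pixel of one rectified row.

    Rectified special case (assumed) of p' = pi(K'[P_3D(p)]): i = x - B * f / D(p).
    """
    return [first_column + k - baseline * focal / d
            for k, d in enumerate(depths_row)]
```

With an assumed baseline of 0.5 m, f = 100 px and depths 10, 10, 5, 10 m starting at column 10, the arrival columns are 5, 6, 2, 8: the closer third pixel jumps left of its neighbours and breaks the monotonic order, which is exactly the irregularity that step 37 looks for.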
[0128] The principle is to determine the points of the three-dimensional scene that are hidden by other points closer to the second camera 12, i.e. the occluded pixels associated with these points. [0129] The abscissa axis of the first image is oriented positively from left to right. [0130] For each row of the rectified first image, the pixels are scanned from left to right from the point of view of the first camera 11. A matrix, as shown in figure 5, is built, giving the arrival column index i of each pixel of the row whose starting column is denoted x. [0131] In the absence of occluded pixels, i.e. if every pixel of a row y of the rectified first image has a distinct arrival pixel on row j of the rectified third image, the arrival column indices i in the matrix follow a monotonically increasing function. [0132] An irregular pixel is a pixel of the first image whose arrival column index i' does not follow the monotonic function described above. Such an irregular pixel, at rank n of the matrix, is detected when $i'_n < i_{n-1}$. [0133] Following the detection of an irregular pixel at position n, all the pixels of the scanned row that are hidden by this irregular pixel are identified, a hidden or occluded pixel being a pixel whose arrival column index $i_k$ is greater than or equal to the arrival column index $i'_n$ of the irregular pixel. Thus, a pixel to the left of the irregular pixel is occluded by it if $i_k \geq i'_n$. [0134] The second visibility mask is thus defined as the union of the occluded pixels identified in this way. [0135] For example, in figure 5, the pixel in sixth position (x(p) = 6) is an irregular pixel.
Indeed, its arrival column index is equal to 3, whereas the arrival column index of the preceding pixel (x(p) = 5) is equal to 5. [0136] The hidden or occluded pixels to the left of the irregular pixel are then identified: these are the pixels in third, fourth and fifth position, since their arrival column indices in the third image are all greater than or equal to the arrival column index of the pixel in sixth position in the matrix. Thus, if V(p) denotes the visibility of a pixel p in the second image, then V(p) = 0 for the hidden pixels identified above. [0137] Conversely, V(p) = 1 for the pixels p of the row that are visible in the second image. [0138] In a step 38, a third visibility mask is determined as the combination of the first visibility mask and the second visibility mask. [0139] This third visibility mask thus takes into account all the pixels of the first image associated with points of the three-dimensional scene located outside the field of view of the second camera 12, as well as all the occluded pixels of the first image. [0140] The advantage of defining a visibility mask in this way is that it requires no additional computation, since the reprojections and projections are already used for training and the depths are already predicted by the stereoscopic vision system. This solution also avoids computing optical flow, which is often used for this type of application and is computationally expensive. [0141] This definition of a visibility mask makes it possible to identify the pixels that are visible in the first image but not visible in the second image.
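The row scan of step 37 and the combination of step 38 can be sketched as follows, following the detection rule literally (an irregular pixel is one whose arrival index is smaller than that of its immediate predecessor). V = 1 means visible and V = 0 means hidden, and the third mask keeps a pixel only if both masks mark it visible. The function names and the sample row are hypothetical: the row is chosen to reproduce the situation described for figure 5 (sixth pixel arriving at column 3 after a pixel arriving at column 5), not copied from the figure.

```python
from typing import List, Sequence


def second_visibility_mask_row(arrivals: Sequence[float]) -> List[int]:
    """Occlusion mask V(p) for one rectified row scanned left to right.

    arrivals[k] is the arrival column index i_k of the k-th pixel of the row.
    """
    visible = [1] * len(arrivals)
    for n in range(1, len(arrivals)):
        if arrivals[n] < arrivals[n - 1]:      # irregular pixel: i'_n < i_{n-1}
            for k in range(n):                  # pixels to its left
                if arrivals[k] >= arrivals[n]:  # i_k >= i'_n: hidden by pixel n
                    visible[k] = 0
    return visible


def third_visibility_mask(first: Sequence[int], second: Sequence[int]) -> List[int]:
    """Step 38: a pixel stays visible only if both masks mark it visible."""
    return [a & b for a, b in zip(first, second)]


row = [1, 2, 3, 4, 5, 3, 6, 7]          # hypothetical arrival indices, positions 1..8
v2 = second_visibility_mask_row(row)    # positions 3, 4 and 5 are hidden
v3 = third_visibility_mask([0, 1, 1, 1, 1, 1, 1, 1], v2)
```

The scan is quadratic in the row length in the worst case; a production version would more likely track the running minimum of irregular arrival indices, but the literal form above mirrors the description.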
[0142] Using this third visibility mask when training the convolutional neural network improves the relevance of the input parameters of this network, so that training is more efficient. [0143] If the ADAS uses input data such as the depths determined by the stereoscopic vision system to determine the distance between a part of the vehicle 10, for example the front bumper, and another road user, the ADAS is then able to determine whether the predicted depth is reliable, namely when the pixel is indeed visible in both the first and second images. [0144] Of course, the present invention is not limited to the exemplary embodiments described above but extends to a method for determining a visibility mask for an on-board vision system in a vehicle that would include secondary steps, without departing from the scope of the present invention. The same applies to a device configured to implement such a method. [0145] The present invention also relates to a vehicle, for example a motor vehicle or more generally an autonomous land motor vehicle, comprising the device 4 of figure 3. DESCRIPTION Title: Method and device for determining a visibility mask for an on-board vision system in a vehicle. Technical field [0001] The present invention claims priority from French application 2304517 filed on 05.05.2023, the content of which (text, drawings and claims) is incorporated herein by reference. The present invention relates to methods and devices for determining a visibility mask for an on-board vision system in a vehicle, for example in a motor vehicle. The present invention also relates to a method and device for controlling one or more ADAS on board a vehicle on the basis of a determined visibility mask.
Technological background [0002] Many modern vehicles are equipped with so-called ADAS (Advanced Driver-Assistance Systems). Such ADAS are passive and active safety systems designed to eliminate the element of human error in driving vehicles of all types. ADAS use advanced technologies to assist drivers while driving and thereby improve their performance. ADAS use a combination of sensor technologies to perceive the environment around a vehicle and then provide information to the driver or act on certain vehicle systems. [0003] There are several types of ADAS, such as rearview cameras and blind spot sensors, lane departure warning systems, adaptive cruise control, and automatic parking systems. [0004] ADAS embedded in a vehicle are supplied with data obtained from one or more embedded sensors such as, for example, cameras. These cameras make it possible in particular to detect and locate other road users or possible obstacles around a vehicle in order, for example: - to adapt the lighting of the vehicle according to the presence of other users; - to automatically regulate the speed of the vehicle; - to act on the braking system in the event of a risk of impact with an object. [0005] The quality of the data output by a vision system therefore determines whether the driving assistance peripherals using this data function properly. [0006] Many vision systems perceive the environment around a vehicle from several images acquired by one or more cameras. When these images are processed, occluded areas are defined: areas of the images that correspond to areas of the environment not present in all of the acquired images. A visibility mask associated with an image then defines, for example, a filter for determining the pixels associated with areas not found in the other images. [0007] Solutions for detecting an occlusion, i.e. for determining a visibility mask, already exist.
[0008] A first solution, presented in “Occlusion Aware Unsupervised Learning of Optical Flow” by Yang Wang, Yi Yang, Zhenheng Yang, Liang Zhao, Peng Wang and Wei Xu, published on April 4, 2018, is based on reverse optical flow. For each pixel of a first image, represented by its coordinates, the algorithm checks, by scanning all the pixels of a second image, whether a pixel of the second image arrives at this pixel of the first image under the reverse optical flow. This method can be applied in both directions of optical flow to identify the occluded areas of the two images. [0009] A second solution is described in the document “Geometry-based Occlusion-Aware Unsupervised Stereo Matching for Autonomous Driving algorithm”. Its detection of occluded areas is based on a geometric constraint: the occluded pixel and another pixel that hides it are projected onto the same pixel of a reconstructed image. Summary of the present invention [0010] An object of the present invention is to solve at least one of the problems of the technological background described above. [0011] Another object of the present invention is to propose an alternative solution for determining a visibility mask for any vision system, in order to improve the quality of the data from the camera(s) of the vision system. [0012] Another object of the present invention is to reduce the resources required to determine a visibility mask.
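The reverse-flow check of the first prior-art solution can be sketched as a hit count: every pixel of the second image is pushed along the backward flow, and a pixel of the first image that receives no hit is treated as occluded. This is a simplified illustration, assuming the flow is given per pixel as `(dx, dy)` in nested lists and using nearest-pixel rounding (Python's `round`, ties to even); the cited paper's exact weighting may differ. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple

Flow = Sequence[Sequence[Tuple[float, float]]]


def occlusion_from_backward_flow(flow_b: Flow, height: int, width: int) -> List[List[int]]:
    """Visibility of first-image pixels: 1 if some second-image pixel lands on it, else 0.

    flow_b[y][x] is the backward flow (dx, dy) of second-image pixel (x, y).
    """
    hits = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = flow_b[y][x]
            tx, ty = round(x + dx), round(y + dy)  # nearest-pixel landing position
            if 0 <= tx < width and 0 <= ty < height:
                hits[ty][tx] += 1
    return [[1 if hits[y][x] > 0 else 0 for x in range(width)] for y in range(height)]
```

On a single four-pixel row shifted by one pixel to the right, no second-image pixel lands on the first column, which is therefore reported as occluded.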
[0013] According to a first aspect, the present invention relates to a method for determining visibility masks by a stereoscopic vision system on board a vehicle, the stereoscopic vision system comprising a set of cameras of at least two cameras arranged so as to each acquire an image of a three-dimensional scene from a different point of view, said second camera being located to the right of the first camera from a point of view of the first camera, the method being characterized in that it comprises the following steps: - receiving first and second data respectively representative of a first and second image acquired by respectively a first and second camera of the set of cameras at the same acquisition time instant; - predicting depths associated with a set of pixels of the first image by the stereoscopic vision system from a learned prediction model, each pixel of the first image having principal coordinates in the first image; - reprojection into the three-dimensional scene of the set of pixels in the form of a set of points as a function of the depths, of an intrinsic matrix of the first camera and of extrinsic parameters of the stereoscopic vision system; - determination of a first visibility mask associated with the pixels of the set of pixels as a function of the coordinates of points of the set of points, a pixel of the set of pixels being not visible in the second image if the coordinates of a point of said set of points associated with said pixel locate it outside a field of vision of the second camera determined as a function of a width of the second image and a focal length of the second camera; - generation of a third image by projection of the set of points as a function of an intrinsic matrix of the second camera, secondary coordinates in the third image being associated with each pixel of the set of pixels; - rectification of the first image from the intrinsic and extrinsic parameters of the two cameras, for each line of the first rectified image, 
scanning from left to right, according to a point of view of the first camera, arrival column index values and detection of a set of irregular pixels whose arrival column index does not follow a monotonic function representative of an evolution of arrival column indices as a function of a column index in the first rectified image, and for each irregular pixel of the set, identification of a set of occluded pixels in the line to the left of each irregular pixel whose arrival column index is greater than or equal to an arrival column index of each irregular pixel, a second visibility mask being determined as a union of the sets of occluded pixels; and - determination of a third visibility mask by association of the first visibility mask and the second visibility mask. [0014] According to a method variant, the reprojection of a pixel in the three-dimensional scene is carried out using the following formula: $P_{3D}(p) = T \cdot \phi\left(p \mid K^{-1}, D(p)\right)$ with: - $P_{3D}(p)$ the coordinates in the three-dimensional scene of the point resulting from the reprojection of the pixel $p$ of the first image, - $T$ a displacement matrix between a position of the first camera and a position of the second camera, - $K$ the intrinsic matrix of the first camera associated with the projection of a point of the three-dimensional scene in an image acquired by the first camera, - $\phi$ a reprojection function in the three-dimensional scene of a pixel according to its depth, - $D(p)$ a depth of the pixel $p$ predicted by the stereoscopic vision system. [0015] According to another method variant, the projection of a point of the three-dimensional scene is carried out using the following formula: $p' = \pi\left(K'\left[P_{3D}(p)\right]\right)$ with: - $\pi$ a function for going from homogeneous coordinates in three-dimensional space to pixel coordinates in two dimensions by removing one dimension of a vector, - $K'$ the intrinsic matrix of the second camera associated with the projection of a point of the three-dimensional scene in an image acquired by the second camera, - $P_{3D}(p)$ the coordinates in the three-dimensional scene of the point resulting from the reprojection of the pixel $p$ of the first image. [0016] According to yet another method variant, the first visibility mask is obtained using the following formula: $M_1 = \bigcup_{p} \left\{ p \;:\; \frac{\left|x_{3D}(p)\right|}{D(p)} > \frac{W/2}{f_2} \right\}$
with: - $x_{3D}(p)$ the abscissa of the point in the three-dimensional scene resulting from the reprojection of the pixel $p$ of the first image, an abscissa axis being defined parallel to an axis along which the first and second cameras are located, - $D(p)$ the depth of the pixel $p$ of the first image predicted by the stereoscopic vision system, - $W$ the width of the second image, and - $f_2$ the focal length of the second camera. [0017] According to yet another method variant, the depths are predicted by a convolutional neural network. [0018] According to an additional method variant, the convolutional neural network is trained to minimize a photometric error defined by the following loss function: $\mathcal{L} = \sum_{p} \left[(1 - \alpha) \cdot \left|I(p) - \tilde{I}(p)\right| + \alpha \cdot \frac{1 - \mathrm{SSIM}\left(I(p), \tilde{I}(p)\right)}{2}\right]$
with: - $I(p)$ a value of the pixel $p$ in the second image; - $\tilde{I}(p)$ a value of the pixel $p$ in the third image; - SSIM (structural similarity index measure) a function that takes into account a local structure; and - $\alpha$ a weighting factor depending on a type of road environment. [0019] According to a second aspect, the present invention relates to a device for determining a visibility mask for a vision system embedded in a vehicle, the device comprising a memory associated with at least one processor configured to implement the steps of the method according to the first aspect of the present invention. [0020] According to a third aspect, the present invention relates to a vehicle, for example of the automobile type, comprising a device as described above according to the second aspect of the present invention. [0021] According to a fourth aspect, the present invention relates to a computer program comprising instructions adapted to execute the steps of the method according to the first aspect of the present invention, in particular when the computer program is executed by at least one processor. [0022] Such a computer program may use any programming language and be in the form of source code, object code, or intermediate code between source code and object code, such as in a partially compiled form, or in any other desirable form. [0023] According to a fifth aspect, the present invention relates to a computer-readable recording medium on which is recorded a computer program comprising instructions for carrying out the steps of the method according to the first aspect of the present invention. [0024] On the one hand, the recording medium may be any entity or device capable of storing the program. For example, the medium may comprise a storage means, such as a ROM, for example a CD-ROM or a microelectronic circuit ROM, or a magnetic recording means, for example a hard disk.
[0025] Furthermore, this recording medium may also be a transmissible medium such as an electrical or optical signal, which can be conveyed via an electrical or optical cable, by conventional or Hertzian radio, by self-directed laser beam or by other means. The computer program according to the present invention may in particular be downloaded from a network such as the Internet. [0026] Alternatively, the recording medium may be an integrated circuit in which the computer program is incorporated, the integrated circuit being adapted to execute, or to be used in the execution of, the method in question. [0027] Brief description of the figures [0028] Other features and advantages of the present invention will emerge from the description of the particular and non-limiting exemplary embodiments of the present invention below, with reference to the appended figures 1 to 5, in which: [0029] [Fig. 1] schematically illustrates a stereoscopic vision system equipping a vehicle, according to a particular and non-limiting exemplary embodiment of the present invention; [0030] [Fig. 2] schematically illustrates a stereoscopic vision system equipping a vehicle, according to a particular and non-limiting exemplary embodiment of the present invention; [0031] [Fig. 3] schematically illustrates a device configured for determining a visibility mask by a vision system embedded in the vehicle of figure 1, according to a particular and non-limiting exemplary embodiment of the present invention; [0032] [Fig. 4] illustrates a flowchart of the different steps of a method for determining a visibility mask by a vision system on board the vehicle of FIG. 1, according to a particular and non-limiting exemplary embodiment of the present invention. [0033] [Fig. 5] illustrates a matrix having different column indices for pixels of a row and a visibility criterion for an on-board vision system in the vehicle of FIG.
1, according to a particular and non-limiting exemplary embodiment of the present invention. [0034] Description of the exemplary embodiments [0035] A method and a device for determining a visibility mask for an on-board vision system in a vehicle will now be described with joint reference to FIGS. 1 to 5. The same elements are identified with the same reference signs throughout the description which follows. [0036] According to a particular and non-limiting exemplary embodiment of the present invention, a method for determining a visibility mask for an on-board stereoscopic vision system in a vehicle is for example implemented by a computer of the on-board system of the vehicle controlling this vision system. [0037] The stereoscopic vision system comprises a set of cameras of at least two cameras arranged so as to each acquire an image of a three-dimensional scene from a different point of view, the second camera being located to the right of the point of view of the first camera. [0038] For this purpose, the method for determining a visibility mask by a stereoscopic vision system on board a vehicle comprises receiving first and second data respectively representative of a first and second image acquired from different points of view by the first and second cameras at the same acquisition time instant. [0039] The method also comprises predicting depths associated with a set of pixels of the first image by the stereoscopic vision system from a learned prediction model, each pixel of the first image having principal coordinates in the first image, and reprojecting the pixels of the first image into the three-dimensional scene in the form of a set of points as a function of the depths, of an intrinsic matrix of the first camera and of extrinsic parameters of the stereoscopic vision system.
[0040] The method then determines a first visibility mask associated with the pixels of the set of pixels as a function of the coordinates of points of the set of points, a pixel of the set of pixels being not visible in the second image if the coordinates of a point of the set of points associated with the pixel locate it outside a field of vision of the second camera determined as a function of a width of the second image and a focal length of the second camera. [0041] The method then comprises generating a third image by projection of the set of points as a function of an intrinsic matrix of the second camera, secondary coordinates in the third image being associated with each pixel of the set of pixels. The generation of this third image then allows the determination of a second visibility mask associated with the pixels of the set of pixels. For each line of the first rectified image, arrival column index values are scanned and a set of irregular pixels is detected. For each irregular pixel in the set, a set of occluded pixels is identified in the line to the left of that irregular pixel. A second visibility mask is then determined as a union of the sets of occluded pixels. [0042] A third visibility mask is then determined by associating the first visibility mask and the second visibility mask. [0043] FIG. 1 schematically illustrates a stereoscopic vision system equipping a vehicle, according to a particular and non-limiting exemplary embodiment of the present invention. [0044] In this example, the vehicle 10 corresponds to a vehicle with an internal combustion engine, a vehicle with one or more electric motors, or a hybrid vehicle with an internal combustion engine and one or more electric motors. The vehicle 10 thus corresponds, for example, to a land vehicle such as a car, a truck, a bus or a motorcycle. Finally, the vehicle 10 corresponds to an autonomous or non-autonomous vehicle, i.e.
a vehicle traveling according to a determined level of autonomy or under the total supervision of the driver. [0045] The vehicle 10 advantageously comprises several on-board cameras 11, 12, each configured to acquire images of a three-dimensional scene in the environment of the vehicle 10. This set of cameras 11, 12 forms the stereoscopic vision system. Two cameras 11, 12 are illustrated in FIG. 1. The present invention is however not limited to a stereoscopic vision system comprising two cameras but extends to any vision system comprising two or more cameras, for example 3, 4 or 5 cameras. [0046] The cameras 11, 12 have known intrinsic parameters. These parameters consist in particular of: - the focal length f1 of the first camera 11; - the focal length f2 of the second camera 12; - the distortions due to imperfections in the optical system of each camera; - the direction C1 of the optical axis of the first camera 11; - the direction C2 of the optical axis of the second camera 12; - the respective resolutions of the cameras 11, 12. [0047] The intrinsic parameters characterize, in each camera, the transformation which maps the camera coordinates of an image point to its pixel coordinates. These parameters do not change if the camera is moved. [0048] Distortions, which are due to imperfections in the optical system such as defects in the shape and positioning of the camera lenses, deflect the light beams and therefore shift the projected point with respect to an ideal model. The camera model can then be completed by introducing the three distortions with the greatest effect, namely the radial, decentering and prismatic distortions, induced by defects in lens curvature, in lens parallelism and in the coaxiality of the optical axes. In this example, the cameras are assumed to be perfect, meaning that distortions are either not taken into account or corrected at the time of image acquisition.
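The camera-to-pixel mapping described in [0047] can be illustrated with the standard pinhole model. The sketch below is a minimal example assuming an ideal camera (no distortion, as in [0048]), square pixels, no skew and a hypothetical principal point (cx, cy). It uses the usual computer-vision axis convention (x to the right, y down, z along the optical axis), which is an assumption and differs from the axis naming used in [0050]. The helper names and numeric values are illustrative only.

```python
# Hypothetical pinhole-camera helpers: ideal lens, square pixels, no skew.

def intrinsic_matrix(f, cx, cy):
    """3x3 intrinsic matrix K built from a focal length (pixels) and a principal point."""
    return [[f, 0.0, cx],
            [0.0, f, cy],
            [0.0, 0.0, 1.0]]


def camera_to_pixel(K, point):
    """Map camera coordinates (X, Y, Z), with Z > 0, to pixel coordinates (u, v)."""
    X, Y, Z = point
    u = K[0][0] * X / Z + K[0][2]
    v = K[1][1] * Y / Z + K[1][2]
    return u, v


def pixel_to_camera(K, pixel, depth):
    """Inverse mapping: a pixel (u, v) and a depth Z give back (X, Y, Z)."""
    u, v = pixel
    X = (u - K[0][2]) * depth / K[0][0]
    Y = (v - K[1][2]) * depth / K[1][1]
    return X, Y, depth


K1 = intrinsic_matrix(f=1000.0, cx=640.0, cy=360.0)   # hypothetical values
uv = camera_to_pixel(K1, (2.0, -0.5, 20.0))
print(uv)                               # (740.0, 335.0)
print(pixel_to_camera(K1, uv, 20.0))    # (2.0, -0.5, 20.0)
```

Since K only relates camera coordinates to pixel coordinates, moving the camera leaves it unchanged; the camera pose is carried by the extrinsic parameters instead.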
[0049] These cameras 11, 12 are arranged so as to each acquire an image of a three-dimensional scene from a different point of view, the second camera 12 being located to the right of the first camera 11 as seen from the point of view of the first camera 11. The first point of view is for example located on or in the left rearview mirror of the vehicle 10 or at the top of the windshield of the vehicle 10, and the second point of view is for example located on or in the right rearview mirror of the vehicle 10 or at the top of the windshield of the vehicle 10. When both cameras are located at the top of the windshield of the vehicle, they are spaced a certain distance apart. [0050] A first reference frame is associated with the first camera 11: - the direction of the y axis is defined by the position of the second camera 12, so as to place the second camera 12 on the y axis of the first camera 11. The distance B separating the two cameras 11, 12 is called the baseline, and the direction separating the two cameras 11, 12 is that of the y axis; - the direction of the x axis is defined orthogonal to that of the y axis and to that of the optical axis C1 of the first camera 11; - the direction of the z axis is defined orthogonal to the directions of the x and y axes. The three axes x, y and z thus form an orthonormal reference frame. [0051] The extrinsic parameters relating to the position of the cameras 11, 12 are the following: - 3 translations along the x, y and z directions, Tx, Ty and Tz, constituting the translation vector T; and - 3 rotations about the x, y and z axes, Rx, Ry and Rz, constituting the rotation matrix R. [0052] One main constraint of the stereoscopic vision systems used in automobiles is, for example, the large distance between the two cameras. Indeed, to cover a measurement range of 200 meters, the baseline must reach 60 cm for the cameras commonly used in this field.
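The link between baseline and measurement range in [0052] follows from the rectified-stereo relation d = f·B/Z, the same relation that later gives depth from disparity in step 33. The sketch below compares the disparity of a point at 200 m and a first-order depth uncertainty for a few baselines. The focal length of 1000 pixels and the quarter-pixel disparity error are hypothetical values, not figures from the source.

```python
def disparity_px(focal_px, baseline_m, depth_m):
    """Disparity (pixels) of a point at depth Z for a rectified pair: d = f * B / Z."""
    return focal_px * baseline_m / depth_m


def depth_uncertainty_m(focal_px, baseline_m, depth_m, disparity_error_px):
    """First-order depth error caused by a disparity error: dZ ~= Z**2 * dd / (f * B)."""
    return depth_m ** 2 * disparity_error_px / (focal_px * baseline_m)


FOCAL_PX = 1000.0        # hypothetical focal length, in pixels
RANGE_M = 200.0          # measurement range quoted in [0052]
DISPARITY_ERR_PX = 0.25  # hypothetical matching accuracy, in pixels

for baseline in (0.12, 0.30, 0.60):
    d = disparity_px(FOCAL_PX, baseline, RANGE_M)
    dz = depth_uncertainty_m(FOCAL_PX, baseline, RANGE_M, DISPARITY_ERR_PX)
    print(f"B = {baseline:.2f} m -> disparity {d:.2f} px, depth error ~{dz:.1f} m")
```

Under these assumed values, a 60 cm baseline gives a disparity of 3 pixels at 200 m, whereas a 12 cm baseline gives less than one pixel, which makes distant depths much harder to resolve.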
[0053] The two cameras 11, 12 acquire images of a three-dimensional scene located in front of the vehicle 10: the first camera 11 alone covers a first acquisition field 13, the second camera 12 alone covers a second acquisition field 14, and the two cameras 11, 12 both cover a third acquisition field 15. The first and third acquisition fields 13, 15 thus allow monoscopic vision of the three-dimensional scene by the first camera 11, the second and third acquisition fields 14, 15 allow monoscopic vision of the three-dimensional scene by the second camera 12, and the third acquisition field 15 allows stereoscopic vision of the three-dimensional scene by the stereoscopic vision system composed of the two cameras 11, 12. [0054] An obstacle 18 is placed in the acquisition field of the cameras, for example in the third acquisition field 15. The presence of the obstacle 18 defines, for the stereoscopic vision system, an occlusion field composed here of the three fields 16, 17 and 19. [0055] Among these three fields, the field 16 is visible from the second camera 12. The part of the three-dimensional scene present in this field 16 is therefore observable using the monoscopic vision system composed of the second camera 12. [0056] The field 17 is visible from the first camera 11. The part of the three-dimensional scene present in this field 17 is therefore observable using the monoscopic vision system composed of the first camera 11. [0057] Finally, the field 19 is not visible from either camera. The part of the three-dimensional scene present in this field 19 is therefore not observable. [0058] The directions C1, C2 of the optical axes represent the orientation of the field of vision of each camera 11, 12.
[0059] Such a stereoscopic vision system can of course also be used to take images of three-dimensional scenes located to the sides of or behind the vehicle 10, by equipping it with cameras placed and oriented differently. [0060] The images acquired by the cameras 11, 12 at a given acquisition time instant take the form of data representing pixels characterized by: - coordinates in each image; and - data relating to the colors and brightness of the objects of the observed three-dimensional scene, in the form, for example, of RGB (Red Green Blue) or HSL (Hue, Saturation, Lightness) colorimetric coordinates. [0061] The images acquired by the cameras 11, 12 represent views of the same three-dimensional scene taken from different viewpoints, the positions of the cameras being distinct. This three-dimensional scene contains, for example: - buildings; - road infrastructure; - other stationary users, for example a parked vehicle; and/or - other moving users, for example another vehicle, a cyclist or a pedestrian in motion. [0062] These images are sent to a computer of a device equipping the vehicle 10, or stored in a memory of a device that is accessible to such a computer. [0063] FIG. 2 schematically illustrates a stereoscopic vision system equipping a vehicle, according to a particular and non-limiting exemplary embodiment of the present invention. [0064] Points 20, 21, 22 of the three-dimensional scene are visible from the point of view of the first camera 11. [0065] Points 21 are also visible from the point of view of the second camera 12. [0066] Points 20 are occluded from the point of view of the second camera 12 because they are masked by points 21 located on the same lines of sight in the field of vision of the second camera 12. [0067] Points 22 are located outside the field of vision of the second camera 12.
[0068] Thus, when the first and second images are acquired by the first camera 11 and the second camera 12 respectively, pixels associated with the points 20, 21, 22 visible from the point of view of the first camera 11 will be present in the first image, whereas only pixels associated with the points 21 visible from the point of view of the second camera 12 will be present in the second image. [0069] The pixels associated with the points 20, which are visible from the point of view of the first camera 11 but not from the point of view of the second camera 12, are hereinafter called occluded pixels. [0070] FIG. 3 schematically illustrates a device 4 configured for determining a visibility mask for a vision system embedded in a vehicle 10, according to a particular and non-limiting exemplary embodiment of the present invention. The device 4 is for example a device embedded in the vehicle 10, for example a computer. [0071] The device 4 is for example configured for implementing the operations and/or steps described with respect to FIGS. 1, 2 and 4. Examples of such a device 4 include, but are not limited to, embedded electronic equipment such as an on-board computer of a vehicle, an electronic computer such as an ECU (Electronic Control Unit), a smartphone, a tablet or a laptop. The elements of the device 4, individually or in combination, can be integrated into a single integrated circuit, into several integrated circuits, and/or into discrete components. The device 4 can be produced in the form of electronic circuits or software (or computer) modules, or of a combination of electronic circuits and software modules. [0072] The device 4 comprises one (or more) processor(s) 40 configured to execute instructions for carrying out the steps of the method and/or for executing the instructions of the software(s) embedded in the device 4.
The processor 40 can include integrated memory, an input/output interface, and various circuits known to those skilled in the art. The device 4 further comprises at least one memory 41, for example a volatile and/or non-volatile memory, and/or a storage device that may comprise volatile and/or non-volatile memory such as EEPROM, ROM, PROM, RAM, DRAM, SRAM, flash, or a magnetic or optical disk. [0073] The computer code of the embedded software(s), comprising the instructions to be loaded and executed by the processor, is for example stored in the memory 41. [0074] According to various particular and non-limiting exemplary embodiments, the device 4 is communicatively coupled with other similar devices or systems (for example other computers) and/or with communication devices, for example a TCU (Telematic Control Unit), for example via a communication bus or through dedicated input/output ports. [0075] According to a particular and non-limiting exemplary embodiment, the device 4 comprises a block 42 of interface elements for communicating with external devices. The interface elements of the block 42 comprise one or more of the following interfaces: - an RF (radio frequency) interface, for example of the Wi-Fi® type (according to IEEE 802.11) in the 2.4 or 5 GHz frequency bands, of the Bluetooth® type (according to IEEE 802.15.1) in the 2.4 GHz frequency band, of the Sigfox type using UNB (Ultra Narrow Band) radio technology, of the LoRa type in the 868 MHz frequency band, or of the LTE (Long-Term Evolution) or LTE-Advanced type; - a USB (Universal Serial Bus) interface; - an HDMI (High Definition Multimedia Interface) interface; - a LIN (Local Interconnect Network) interface.
[0076] According to another particular and non-limiting exemplary embodiment, the device 4 comprises a communication interface 43 which makes it possible to establish communication with other devices (such as other computers of the embedded system) via a communication channel 430. The communication interface 43 is for example a transceiver configured to transmit and receive information and/or data via the communication channel 430. The communication interface 43 corresponds for example to a wired network of the CAN (Controller Area Network), CAN FD (... Flexible Data-Rate), FlexRay (standardized by ISO 17458) or Ethernet (standardized by IEEE 802.3) type. [0077] According to a particular and non-limiting exemplary embodiment, the device 4 can provide output signals to one or more external devices, such as a display screen 440, touch-sensitive or not, one or more speakers 450 and/or other peripherals 460 (for example a projection system) via the output interfaces 44, 45, 46 respectively. According to a variant, one or more of the external devices are integrated into the device 4. [0078] FIG. 4 illustrates a flowchart of the different steps of a method 2 for determining a visibility mask for a vision system embedded in the vehicle of FIG. 1, the vision system comprising at least one camera 11 arranged so as to acquire an image of a three-dimensional scene from a determined point of view, according to a particular and non-limiting exemplary embodiment of the present invention. [0079] The method is for example implemented by one or more processors of one or more computers embedded in the vehicle 10, for example by a computer controlling the vision system. [0080] In a first step 31, the computer receives first data representative of a first image acquired by a first camera 11 at a given time instant.
[0081] In a second step 32, the computer receives second data representative of a second image acquired by a second camera 12 at the same given time instant. [0082] The two images received correspond to two views of the same three-dimensional scene around the vehicle 10, taken from two different viewpoints at the same given time instant. [0083] To facilitate the analysis of the two images received, the first and second images are rectified according to a method known to those skilled in the art, using the intrinsic and extrinsic parameters of the cameras. Such a method is described, for example, in "Projective Rectification of Non-Calibrated Infrared Stereo Images with Global Accounting for Distortion Minimization" by Benoit Ducarouge, Thierry Sentenac, Florian Bugarin and Michel Devy, July 16, 2009. [0084] Rectification consists of reorienting the epipolar lines so that they are parallel to the horizontal axis of the image. It is described by a transformation that projects the epipoles to infinity, after which corresponding points necessarily lie on the same ordinate. [0085] A rectification algorithm consists, for example, of 4 steps: - rotate (virtually) the first camera 11 so that the epipole goes to infinity along the horizontal axis of its associated reference frame; - apply the same rotation to the second camera 12 to return to the initial geometric configuration; - rotate the second camera 12 by the rotation matrix R, corresponding to the extrinsic parameter of the initial stereoscopic vision system; - adjust the scale in the two camera frames. [0086] It should be noted that rectification simplifies the matching of pixels of stereo images, i.e. images obtained by a stereoscopic vision system: the pixel of the second image corresponding to a pixel of the first image (and vice versa) lies on the same line.
From the knowledge of the epipolar geometry, and therefore of a fundamental matrix of the stereo system, the objective is then to determine a pair of projective transformations, called homographies, which make the epipolar lines parallel to the lines of the images, and therefore to the horizontal axis of the rectified cameras. [0087] In a step 33, depths associated with a set of pixels of the first image are predicted by the stereoscopic vision system from a learned prediction model. [0088] This model is learned in a self-supervised manner, i.e. without external intervention or annotated data, for example by minimizing the photometric error calculated during image reconstructions. [0089] Disparities are determined from the first and second images obtained by the first camera 11 and the second camera 12 at the same time instant, the disparities being defined by the following function: [0090] [Math 1] [0091] $x_p^{(2)} = x_p^{(1)} - d(p)$ [0092] with: - $x_p^{(2)}$ the abscissa of a pixel in the second image, - $x_p^{(1)}$ the abscissa of a pixel in the first image, and - $d(p)$ a disparity determined for a pixel $p$ of the first image. [0093] From the previously determined disparities, depths are calculated for the pixels of the first image: [0094] [Math 2] [0095] $D(p) = \dfrac{B \times f_1}{d(p)}$ [0096] with: - $D(p)$ the depth of the pixel $p$ of the first image predicted by the stereoscopic vision system, - $B$ the baseline separating the two cameras 11, 12, - $d(p)$ the disparity determined for the pixel $p$ of the first image, and - $f_1$ the focal length of the first camera 11. [0097] A third image is reconstructed from the first image and the previously calculated depths using the following formula: [0098] [Math 3] [0099] $\hat{p} = \pi\left(K'\left[T \cdot \phi\left(p \mid K^{-1}, D(p)\right)\right]\right)$ with: - $\hat{p}$ the position in the third image of the pixel $p$ of the first image, - $\pi$ a function for going from homogeneous coordinates in three-dimensional space to two-dimensional pixel coordinates by removing one dimension of a vector, - $K$ the intrinsic matrix of the first camera 11, associated with the projection of a point of the three-dimensional scene into an image acquired by the first camera 11, - $K'$ the intrinsic matrix of the second camera 12, associated with the projection of a point of the three-dimensional scene into an image acquired by the second camera 12, - $T$ a displacement matrix between the position of the first camera and the position of the second camera, - $\phi$ a function reprojecting a pixel into the three-dimensional scene as a function of its depth, and - $D(p)$ the depth of the pixel $p$ of the first image predicted by the stereoscopic vision system. [0100] The reconstructed image is then compared with the second image to determine a photometric error: [0101] [Math 4] [0102] $L = \sum_{p} \left[ (1 - \alpha) \cdot \left| I(p) - \hat{I}(p) \right| + \alpha \cdot \dfrac{1 - \mathrm{SSIM}\left(I(p), \hat{I}(p)\right)}{2} \right]$ [0103] with: - $L$ the photometric error, - $I(p)$ the value of the pixel $p$ in the second image, - $\hat{I}(p)$ the value of the pixel $p$ in the third image, - SSIM (structural similarity index measure) a function that takes into account a local structure, and - $\alpha$ a weighting factor depending on a type of road environment. [0104] The convolutional neural network implementing the prediction model is then trained to minimize the photometric error defined above. [0105] Thus, at the output of step 33, each pixel is defined by principal coordinates (x, y) in the first image and a depth predicted for this pixel. [0106] In a step 34, the pixels of the set of pixels are reprojected into the three-dimensional scene in the form of a set of points according to the depths predicted during step 33, the intrinsic matrix K of the first camera 11 and the extrinsic parameters T of the stereoscopic vision system. [0107] Such a reprojection is performed for example using the following formula: [0108] [Math 5] [0109] $P_{3D}(p) = T \cdot \phi\left(p \mid K^{-1}, D(p)\right)$ [0110] with: - $P_{3D}(p)$ the coordinates in the three-dimensional scene of the point resulting from the reprojection of the pixel $p$ of the first image, - $T$ a displacement matrix between the position of the first camera 11 and the position of the second camera 12, - $K$ the intrinsic matrix of the first camera 11, associated with the projection of a point of the three-dimensional scene into an image acquired by the first camera 11, - $\phi$ a function reprojecting a pixel into the three-dimensional scene as a function of its depth, and - $D(p)$ the depth of the pixel $p$ of the first image predicted by the stereoscopic vision system. [0111] The set of points is thus located in the three-dimensional scene at positions as they would be seen from the second camera 12. [0112] It is however possible that some of the points 22 projected into the three-dimensional scene are located outside the field of vision of the second camera 12.
Indeed, some parts of the three-dimensional scene are not visible to both cameras 11, 12 at the same time. [0113] In a step 35, a first visibility mask associated with the pixels of the set of pixels is determined according to the coordinates of the points of the set of points. [0114] A pixel of the set of pixels is defined as not visible in the second image if the coordinates of the point 22 of the set of points associated with that pixel place it outside the field of vision of the second camera 12, this field of vision being determined as a function of the width of the second image and the focal length of the second camera 12. [0115] For example, a pixel may be the projection of a point 22 of the three-dimensional scene located in the first acquisition field 13, which only the first camera 11 perceives. In this case, the point 22 of the three-dimensional scene is outside the field of vision of the second camera 12. This point 22 must then be detected as not visible to the second camera 12, because it cannot have an associated pixel in the second image. [0116] The coordinate frame of the points in the three-dimensional scene is defined as a function of the orientation of the cameras 11, 12: the x axis of this frame is parallel to the axis defined by the positions of the cameras 11, 12, the cameras being placed on this axis, and the z axis of the frame is the optical axis of the second camera 12. [0117] The principle is to compare the ratio between the abscissa $x_{P_{3D}}$ of a point of the three-dimensional scene and the depth of that point with the ratio between the half-width of the second image and the focal length of the second camera 12.
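Steps 33 to 35 can be sketched for individual pixels as follows: depth from disparity, reprojection into the second camera's frame, then the field-of-view test behind the first visibility mask. This is a minimal illustration assuming a rectified pair, a principal point at the centre of the image, the usual x-right / z-forward camera convention, and a displacement T written as a rotation R plus a translation t that expresses points in the second camera's frame (for a rectified pair, R is the identity and t = (-B, 0, 0)). The test compares |x|/z with (W/2)/f, i.e. it checks both image borders; the formula of [0120] as printed compares the signed abscissa only, so the absolute value is an assumption of this sketch. All names and numbers are hypothetical.

```python
def depth_from_disparity(disparity, f1, baseline):
    """D(p) = B * f1 / d(p); no finite depth for a non-positive disparity."""
    return baseline * f1 / disparity if disparity > 0 else float("inf")


def reproject(pixel, depth, K, R, t):
    """P_3D(p) = T . phi(p | K^-1, D(p)), with the displacement T given as (R, t)."""
    u, v = pixel
    # phi: apply K^-1 to the pixel and scale the viewing ray by the depth
    x = (u - K[0][2]) / K[0][0] * depth
    y = (v - K[1][2]) / K[1][1] * depth
    z = depth
    # T: rigid displacement into the second camera's frame
    return tuple(R[r][0] * x + R[r][1] * y + R[r][2] * z + t[r] for r in range(3))


def outside_second_fov(point, width, f2):
    """True if the point cannot appear in the second image (first-mask condition)."""
    x, _, z = point
    if z <= 0:
        return True  # behind the second camera
    return abs(x) / z > (width / 2) / f2


# Hypothetical rectified rig: same K for both cameras, 60 cm baseline.
W, F, B = 1280, 1000.0, 0.6
K = [[F, 0.0, W / 2], [0.0, F, 360.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = (-B, 0.0, 0.0)

for u, d in ((5, 4.0), (5, 20.0)):   # two pixels near the left border of image 1
    point = reproject((u, 360), depth_from_disparity(d, F, B), K, R, t)
    print(u, d, outside_second_fov(point, W, F))
```

With these values the pixel with disparity 4 lands at column 1 of the second image and stays visible, whereas the pixel with disparity 20 would land at column -15 and is flagged; the first visibility mask is the set of pixels for which the test returns True.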
[0118] The first visibility mask is obtained for example using the following formula: [0119] [Math 6] [0120] $M_1 = \bigcup_{p} \left\{ p \;\middle|\; \dfrac{x_{P_{3D}(p)}}{D(p)} > \dfrac{W/2}{f} \right\}$ [0121] with: - $x_{P_{3D}(p)}$ the abscissa of the point in the three-dimensional scene resulting from the reprojection of the pixel $p$ of the first image, an abscissa axis being defined parallel to an axis along which the first and second cameras 11, 12 are located, - $D(p)$ said depth of the pixel $p$ of the first image predicted by the stereoscopic vision system, - $W$ the width of the second image, and - $f$ the focal length of the second camera 12. [0122] The first visibility mask thus makes it possible to identify the pixels of the first image whose points 22 reprojected into the three-dimensional scene are located outside the field of vision of the second camera 12. [0123] In a step 36, a third image is generated by projecting the set of points according to an intrinsic matrix K' of the second camera 12. [0124] Secondary coordinates (i, j) in the third image are associated with each pixel of the set of pixels. [0125] The projection of a point of the three-dimensional scene to its position $\hat{p}$ in the third image is carried out for example using the following formula: $\hat{p} = \pi\left(K'\left[P_{3D}(p)\right]\right)$ with: - $\pi$ a function for going from homogeneous coordinates in three-dimensional space to pixel coordinates in two dimensions by removing one dimension of a vector, - $K'$ the intrinsic matrix of the second camera 12, associated with the projection of a point of the three-dimensional scene into an image acquired by the second camera 12, - $P_{3D}(p)$ the coordinates in the three-dimensional scene of the point resulting from the reprojection of the pixel $p$ of the first image. [0126] Once this operation has been carried out, each pixel of the first image is thus defined by its coordinates (x, y) in the first image, by an arrival column index i and by an arrival line index j in said third image. [0127] In a step 37, a second visibility mask associated with the pixels of the set of pixels is determined. [0128] The principle is to determine the points of the three-dimensional scene which are masked by other points closer to the second camera 12, i.e. the occluded pixels associated with these points. [0129] The abscissa axis of the first image is oriented positively from left to right.
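The projection of step 36 can be sketched as below: the reprojected point is multiplied by K', π then removes the homogeneous dimension, and the result is rounded to an (arrival column i, arrival line j) pair. Rounding to the nearest pixel is an assumption, since the text does not say how sub-pixel positions are handled, and the rig values are hypothetical. For a rectified pair with identical intrinsics, the arrival column equals the starting column minus the disparity, consistent with [Math 1], and the arrival line equals the starting line.

```python
def pi(h):
    """Drop the homogeneous dimension: (hx, hy, hz) -> (hx / hz, hy / hz)."""
    return h[0] / h[2], h[1] / h[2]


def matvec(M, v):
    """3x3 matrix times 3-vector."""
    return tuple(sum(M[r][c] * v[c] for c in range(3)) for r in range(3))


def arrival_indices(point_3d, K2):
    """(i, j) = rounded pi(K' [P_3D(p)]), the pixel's position in the third image."""
    u, v = pi(matvec(K2, point_3d))
    return round(u), round(v)


# Hypothetical rectified rig: f = 1000 px, principal point (640, 360), B = 0.6 m.
K2 = [[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]]
x, y, disparity = 700, 400, 8.0
depth = 0.6 * 1000.0 / disparity                  # D(p) = B * f1 / d(p)
point = ((x - 640) / 1000.0 * depth - 0.6,        # X in the second camera frame
         (y - 360) / 1000.0 * depth,
         depth)
print(arrival_indices(point, K2))                 # (692, 400)
```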
[0130] For each line of the rectified first image, the pixels are scanned from left to right from the point of view of the first camera 11. A matrix, such as the one presented in FIG. 5, is formed, containing the arrival column index value i of each pixel of the line, the starting column of each pixel being denoted 'x'. [0131] In the absence of occluded pixels, that is to say if every pixel of a line 'y' of the rectified first image has a distinct arrival pixel on the line 'j' of the rectified third image, the arrival column indices 'i' are distributed in the matrix according to a monotonically increasing function. [0132] An irregular pixel is a pixel of the first image whose arrival column index i' does not follow the monotonic function described above. Such an irregular pixel, placed at rank n of the matrix, is detected when $i'_n < i_{n-1}$. [0133] Following the detection of an irregular pixel at position n, all the pixels of the scanned line masked by this irregular pixel are identified, a masked or occluded pixel being a pixel whose arrival column index $i_k$ is greater than or equal to the arrival column index $i'_n$ of the irregular pixel. Thus, a pixel to the left of the irregular pixel is occluded by the irregular pixel if $i_k \geq i'_n$. [0134] The second visibility mask is thus defined as the union of the occluded pixels identified in this way. [0135] For example, in FIG. 5, the pixel in sixth position (x(p) = 6) is an irregular pixel. Indeed, its arrival column index is equal to 3, whereas the arrival column index of the pixel preceding it (x(p) = 5) is equal to 5. [0136] The masked or occluded pixels located to the left of the irregular pixel are then identified: these are the pixels in third, fourth and fifth position. Indeed, their arrival column indices in the third image are each greater than or equal to the arrival column index of the pixel in sixth position in the matrix.
Thus, if V(p) denotes the visibility of a pixel p in the second image, V(p) = 0 for the masked pixels identified above. [0137] Conversely, V(p) = 1 for the pixels p of the line that are visible in the second image. [0138] In a step 38, a third visibility mask is determined by combining the first visibility mask and the second visibility mask. [0139] This third visibility mask thus takes into account all the pixels of the first image associated with points of the three-dimensional scene located outside the field of vision of the second camera 12, as well as all the occluded pixels of the first image. [0140] The advantage of such a definition of a visibility mask is that it requires no additional computation, since the reprojections and projections are already used for learning and the depths are already predicted by the stereoscopic vision system. This solution also makes it possible to dispense with optical flow computation, which is often used for this type of application and is computationally expensive. [0141] This definition of a visibility mask makes it possible to identify the pixels that are visible in the first image but not visible in the second image. [0142] Using this third visibility mask during training of the convolutional neural network improves the relevance of the input parameters of this network, so that training is more efficient. [0143] If an ADAS uses input data such as the depths determined by the stereoscopic vision system to determine the distance between a part of the vehicle 10, for example the front bumper, and another road user, the ADAS is then able to assess whether the predicted depth is reliable, that is, whether the pixel is clearly visible in both the first and the second images.
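The line scan of [0130] to [0134] and the combination of step 38 can be sketched as below. The sketch assumes that the arrival column indices of one line are stored in a list indexed by starting column, that the first mask is available as a set of hidden (x, y) pixels, and that the "combination" of step 38 marks a pixel invisible (V(p) = 0) when either mask flags it. Only the indices of the fifth and sixth pixels of the FIG. 5 example are given in the text (5 and 3); the other values in the example line, and the out-of-field pixel, are hypothetical, chosen to be consistent with [0135] and [0136].

```python
def occluded_in_line(arrival_cols):
    """Second-mask pixels of one rectified line.

    arrival_cols[x] is the arrival column index i of the pixel starting at column x,
    in left-to-right scan order from the first camera's point of view.
    """
    occluded = set()
    for n in range(1, len(arrival_cols)):
        i_irr = arrival_cols[n]
        if i_irr < arrival_cols[n - 1]:        # monotonic order broken: irregular pixel
            for k in range(n):                 # pixels to its left
                if arrival_cols[k] >= i_irr:   # land on or beyond the irregular pixel
                    occluded.add(k)
    return occluded


def second_visibility_mask(arrival_rows):
    """Union over all lines y of the occluded pixels, as (x, y) pairs."""
    return {(x, y) for y, row in enumerate(arrival_rows) for x in occluded_in_line(row)}


def third_visibility_mask(height, width, first_mask, second_mask):
    """V[y][x] = 0 if the pixel is out of camera 2's field of view or occluded, else 1."""
    hidden = set(first_mask) | set(second_mask)
    return [[0 if (x, y) in hidden else 1 for x in range(width)] for y in range(height)]


# FIG. 5-like line, 0-based (x(p) = 6 is index 5); the first four values are hypothetical.
line = [1, 2, 3, 4, 5, 3]
print(sorted(occluded_in_line(line)))          # [2, 3, 4], i.e. x(p) = 3, 4 and 5
mask2 = second_visibility_mask([line])
# Hypothetical first mask: the leftmost pixel falls outside camera 2's field of view.
print(third_visibility_mask(1, 6, {(0, 0)}, mask2))   # [[0, 1, 0, 0, 0, 1]]
```

The scan compares each pixel only with its left neighbour to detect irregular pixels, then looks back along the line, which follows the description literally; its worst-case cost is quadratic in the line width.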
[0144] Of course, the present invention is not limited to the exemplary embodiments described above but extends to a method for determining a visibility mask for a vision system embedded in a vehicle that includes secondary steps, without thereby departing from the scope of the present invention. The same applies to a device configured for implementing such a method. [0145] The present invention also relates to a vehicle, for example an automobile or more generally an autonomous land motor vehicle, comprising the device 4 of FIG. 3.

Claims

CLAIMS 1. Method for determining visibility masks by a stereoscopic vision system embedded in a vehicle (10), the stereoscopic vision system comprising a set of at least two cameras (11, 12) arranged so as to each acquire an image of a three-dimensional scene from a different point of view, said second camera (12) being located to the right of said first camera (11) as seen from the point of view of said first camera (11), said method being characterized in that it comprises the following steps: - receiving (31, 32) first and second data respectively representative of a first and a second image acquired respectively by a first and a second camera (11, 12) of said set of cameras at the same acquisition time instant; - predicting (33) depths associated with a set of pixels of the first image by said stereoscopic vision system from a learned prediction model, each pixel of the first image having principal coordinates (x, y) in the first image; - reprojecting (34) said set of pixels into the three-dimensional scene in the form of a set of points as a function of said depths, of an intrinsic matrix of the first camera (11) and of extrinsic parameters of said stereoscopic vision system; - determining (35) a first visibility mask associated with the pixels of said set of pixels as a function of the coordinates of points of said set of points, a pixel of said set of pixels being not visible in said second image if the coordinates of a point of said set of points associated with said pixel locate it outside a field of vision of said second camera (12) determined as a function of a width of the second image and a focal length of the second camera (12); - generating (36) a third image by projecting said set of points as a function of an intrinsic matrix of said second camera (12), an arrival column index (i) and an arrival line index (j) in said third image being associated with each pixel of said set of pixels; - rectifying the first image from the intrinsic and extrinsic parameters of the two cameras; for each line of said rectified first image, scanning the arrival column index values (i) from left to right, according to a point of view of the first camera (11); forming a matrix of the arrival column index values (i) as a function of a column index (x) in the rectified first image; and, from said matrix, detecting a set of irregular pixels together with the arrival column index (i') of each irregular pixel and, for each irregular pixel of said set, identifying a set of occluded pixels in said line, to the left of said irregular pixel, whose arrival column index (i) is greater than or equal to the arrival column index (i') of said irregular pixel, a second visibility mask being determined (37) as the union of said sets of occluded pixels; and - determining (38) a third visibility mask by combining said first visibility mask and said second visibility mask.

2. Method according to claim 1, wherein the reprojection of a pixel into the three-dimensional scene is carried out using the following formula: $P_{3D}(p) = T \cdot \phi\left(p \mid K^{-1}, D(p)\right)$ with: - $P_{3D}(p)$ the coordinates in the three-dimensional scene of the point resulting from the reprojection of the pixel $p$ of the first image, - $T$ a displacement matrix between a position of the first camera (11) and a position of the second camera (12), - $K$ the intrinsic matrix of the first camera (11) associated with a projection of a point of the three-dimensional scene into an image acquired by the first camera (11), - $\phi$ a function reprojecting a pixel into the three-dimensional scene as a function of its depth, - $D(p)$ the depth of the pixel $p$ of the first image predicted by the stereoscopic vision system.

3. Method according to claim 1 or 2, for which the projection of a point of the three-dimensional scene is carried out using the following formula:
$$p'_t = \pi\left(K' \, P_{3D}(p_t)\right)$$
with:
- $p'_t$ the pixel of the third image, of arrival column index $i$ and arrival line index $j$, onto which the point associated with the pixel $p_t$ is projected,
- $\pi$ a function converting homogeneous coordinates in three-dimensional space into two-dimensional pixel coordinates by removing one dimension of a vector,
- $K'$ the intrinsic matrix of the second camera (12), associated with the projection of a point of the three-dimensional scene into an image acquired by the second camera (12),
- $P_{3D}(p_t)$ the coordinates, in the three-dimensional scene, of the point resulting from the reprojection of the pixel $p_t$ of the first image.
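A minimal sketch of the projection of claim 3, assuming the second camera also follows a skew-free pinhole model (hypothetical parameters `fx2`, `fy2`, `cx2`, `cy2`). Multiplying by the intrinsic matrix gives homogeneous coordinates (u·Z, v·Z, Z); dividing by the last component and dropping it yields the pixel position. Rounding to the nearest pixel to obtain the arrival indices (i, j) is an assumption, since the claim does not say how the third image is resampled.

```python
from typing import Tuple


def project(X: float, Y: float, Z: float,
            fx2: float, fy2: float, cx2: float, cy2: float) -> Tuple[int, int]:
    """Project a 3-D point, expressed in the second camera's frame, to (i, j)."""
    if Z <= 0:
        raise ValueError("point lies behind the second camera")
    # K' [X, Y, Z]^T = (fx2*X + cx2*Z, fy2*Y + cy2*Z, Z); divide by Z, drop it
    u = fx2 * X / Z + cx2
    v = fy2 * Y / Z + cy2
    # Assumed: arrival column index i and line index j are the nearest pixels
    return round(u), round(v)
```

Continuing the reprojection example, the point (0.5, 0.0, 10.0) projects to column 55 and line 40 of the third image when the second camera has a focal length of 100 px and principal point (50, 40).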

4. Method according to one of claims 1 to 3, for which said first visibility mask is obtained using the following formula:
$$M_1 = \bigcup_{p_t} \left\{ p_t \;\middle|\; \frac{\left|X_{3D}(p_t)\right|}{D(p_t)} > \frac{W}{2f} \right\}$$
with:
- $X_{3D}(p_t)$ the abscissa, in the three-dimensional scene, of the point resulting from the reprojection of the pixel $p_t$ of the first image, an abscissa axis being defined parallel to the axis along which the first and second cameras (11, 12) are located,
- $D(p_t)$ said depth of the pixel $p_t$ of the first image predicted by the stereoscopic vision system,
- $W$ the width of the second image, and
- $f$ the focal length of the second camera (12).
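The field-of-view test of claim 4 reduces to one comparison per pixel. The sketch below assumes that the principal point of the second camera lies at the image centre, so that a point projects outside the image width exactly when its abscissa-to-depth ratio exceeds W/(2f); the input layout, a list of (abscissa, depth) pairs, is a hypothetical choice.

```python
from typing import List, Sequence, Tuple


def out_of_view_mask(points: Sequence[Tuple[float, float]],
                     width: float, focal: float) -> List[bool]:
    """First visibility mask: True where a reprojected point falls outside the
    horizontal field of view of the second camera.

    `points` holds (X, D) pairs: abscissa along the camera baseline and the
    predicted depth (assumed strictly positive).
    """
    half_fov_tangent = width / (2.0 * focal)  # tan of the half field of view
    return [abs(X) / D > half_fov_tangent for X, D in points]
```

For a 100 px wide image and a 100 px focal length, the threshold on |X|/D is 0.5, so points at abscissa 0.6 m and depth 1 m, on either side, are flagged as not visible, while a point at 0.4 m is kept.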

5. Method according to one of claims 1 to 4, for which said depths are predicted by a convolutional neural network.

6. Method according to claim 5, for which the convolutional neural network is trained to minimize a photometric error defined by the following loss function:
$$\mathcal{L}_p = \sum_{p} \left[ (1 - \alpha) \cdot \left| I(p) - \hat{I}(p) \right| + \alpha \cdot \frac{1 - \mathrm{SSIM}\left(I(p), \hat{I}(p)\right)}{2} \right]$$
with:
- $I(p)$ a value of the pixel $p$ in the second image;
- $\hat{I}(p)$ a value of the pixel $p$ in the third image;
- SSIM the structural similarity index function, which takes a local structure into account; and
- $\alpha$ a weighting factor depending on a type of road environment.
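The photometric loss of claim 6 mixes an absolute intensity difference with a structural-similarity term. The sketch below is a plain-Python illustration under several assumptions: grey-level images stored as nested lists with values in [0, 1], SSIM computed on a 3x3 window clipped at the borders with the customary constants C1 = 0.01² and C2 = 0.03², and an optional `valid` mask (hypothetical, not stated in claim 6) that skips pixels flagged as not visible. The weighting factor `alpha` is simply passed in; how it is chosen per road environment is not modelled.

```python
from typing import List, Optional, Sequence

Image = Sequence[Sequence[float]]


def _window(img: Image, r: int, c: int) -> List[float]:
    """3x3 neighbourhood around (r, c), clipped at the image border."""
    rows, cols = len(img), len(img[0])
    return [img[i][j]
            for i in range(max(0, r - 1), min(rows, r + 2))
            for j in range(max(0, c - 1), min(cols, c + 2))]


def ssim(a: Sequence[float], b: Sequence[float],
         c1: float = 1e-4, c2: float = 9e-4) -> float:
    """Structural similarity of two equally sized windows."""
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mu_a) * (x - mu_a) for x in a) / n
    var_b = sum((y - mu_b) * (y - mu_b) for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a * mu_a + mu_b * mu_b + c1) * (var_a + var_b + c2))


def photometric_loss(target: Image, synth: Image, alpha: float,
                     valid: Optional[Sequence[Sequence[bool]]] = None) -> float:
    """Sum over pixels of (1 - alpha) * L1 + alpha * (1 - SSIM) / 2."""
    total = 0.0
    for r, row in enumerate(target):
        for c, value in enumerate(row):
            if valid is not None and not valid[r][c]:
                continue  # assumption: pixels masked as not visible are skipped
            l1 = abs(value - synth[r][c])
            s = ssim(_window(target, r, c), _window(synth, r, c))
            total += (1 - alpha) * l1 + alpha * (1 - s) / 2
    return total
```

Identical images give a loss of zero whatever `alpha` is, while with `alpha = 0` the loss falls back to the plain sum of absolute differences.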

7. Computer program comprising instructions for implementing the method according to any one of the preceding claims, when these instructions are executed by a processor.

8. Device (4) for determining a visibility mask for a vision system on board a vehicle (10), said device (4) comprising a memory (41) associated with at least one processor (40) configured for implementing the steps of the method according to any one of claims 1 to 6.

9. System for determining a visibility mask for a vision system on board a vehicle (10) comprising at least two cameras (11, 12) and a device according to claim 8.

10. Vehicle (10) comprising the device (4) according to claim 8 or the system according to claim 9.