
NL2021481B1 - A method for automatically annotating and identifying a living being or an object with an identifier, such as RFID, and computer vision. - Google Patents


Info

Publication number
NL2021481B1
Authority
NL
Netherlands
Prior art keywords
image
subject
annotated
learning model
reader
Prior art date
Application number
NL2021481A
Other languages
Dutch (nl)
Inventor
Marc Jean Baptist Van Oldenborgh
Original Assignee
Kepler Vision Tech Bv
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kepler Vision Tech Bv filed Critical Kepler Vision Tech Bv
Priority to NL2021481A priority Critical patent/NL2021481B1/en
Priority to US17/042,063 priority patent/US11308358B2/en
Priority to EP19756032.9A priority patent/EP3740894A1/en
Priority to PCT/NL2019/050533 priority patent/WO2020036490A1/en
Application granted granted Critical
Publication of NL2021481B1 publication Critical patent/NL2021481B1/en
Priority to US17/659,574 priority patent/US11961320B2/en
Priority to US18/601,933 priority patent/US20240212385A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/30 Scenes; Scene-specific elements in albums, collections or shared content, e.g. social network photos or video
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/70 Labelling scene content, e.g. deriving syntactic or semantic representations

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a method for training a machine learning model to identify a subject having at least one machine readable identifier providing a subject ID, said method comprising: - providing a computer vision system with an image capturing system comprising at least one image capturing device, and a reader system comprising at least one reader for reading said at least one machine readable identifier; - defining said machine learning model in said computer vision system; - capturing an image using said image capturing system, said image showing said subject, - reading said subject ID using said reader system when capturing said image, and linking said subject ID with said image, said linking providing said image with a linked subject ID, resulting in at least one annotated image, - capturing at least one further image showing said subject, linking said linked subject ID to said at least one further image providing at least one further annotated image, and - subjecting said annotated image and said at least one further annotated image to said machine learning model for training said machine learning model.

Description

A method for automatically annotating and identifying a living being or an object with an identifier, such as RFID, and computer vision.
Field of the invention
The invention relates to a method, device, and computer program product for training a machine learning model to identify a subject having at least one machine readable identifier providing a subject ID.
Background of the invention
Artificial intelligence (AI) is developing rapidly and AI applications are supporting or will support all industries, including the aerospace industry, agriculture, chemical industry, computer industry, construction industry, defense industry, education industry, energy industry, entertainment industry, financial services industry, food industry, health care industry, hospitality industry, information industry, manufacturing, mass media, mining, telecommunication industry, transport industry, water industry and direct selling industry.
Human-machine communication is becoming more and more important, as machines (such as computers, smartphones, tablets and robots) are penetrating society rapidly.
Computer vision is an area of AI wherein machine learning is used to classify living beings and objects in images. Training a machine learning model for computer vision involves providing a training set with annotated images. Often a large number of images need to be annotated manually to establish a computer vision system with sufficient accuracy. Automatic annotation, instead of manual annotation, of living beings and objects in images can reduce the time and costs of annotation dramatically.
“Automatic Image Annotation via Label Transfer in the Semantic Space”, May 2016, by Tiberio Uricchio et al. (https://arxiv.org/abs/1605.04770), according to its abstract describes: “Automatic image annotation is among the fundamental problems in computer vision and pattern recognition, and it is becoming increasingly important in order to develop algorithms that are able to search and browse large-scale image collections. In this paper, we propose a label propagation framework based on Kernel Canonical Correlation Analysis (KCCA), which builds a latent semantic space where correlation of visual and textual features are well preserved into a semantic embedding.
The proposed approach is robust and can work either when the training set is well annotated by experts, as well as when it is noisy such as in the case of user-generated tags in social media. We report extensive results on four popular datasets. Our results show that our KCCA-based framework can be applied to several state-of-the-art label transfer methods to obtain significant improvements. Our approach works even with the noisy tags of social users, provided that appropriate denoising is performed. Experiments on a large scale setting show that our method can provide some benefits even when the semantic space is estimated on a subset of training images.”
US20070086626, with title “Individual identity authentication systems”, according to its abstract describes “A single image from a camera (14) is captured of an individual (40) seeking entry through a door held by a door latch (24). An image processor (16) looks for and locates a tag (42) worn by the individual (40) in the image and reads an identification (ID) code from the tag (42). A comparator (20) compares this ID code with ID codes in an identification database (22) to find a match. Once a match of ID codes is found, the image processor (16) looks for and locates a face (44) of the individual (40) in the image and extracts facial features from the face (44). The comparator (20) compares the extracted facial features with facial features associated with the matched ID code, from the identification database (22), to find a match. Once there is a match of facial features, the door latch (24) is released.”
“Automatic image annotation and retrieval using cross-media relevance model”, July 2003, by J. Jeon et al. (http://hpds.ee.kuas.edu.tw/download/parallel_processing/97/97present/20081226/Automatic%20Image%20Annotation%20and%20Retrieval%20using.pdf), according to its abstract describes: “Libraries have traditionally used manual image annotation for indexing and then later retrieving their image collections. However, manual image annotation is an expensive and labor intensive procedure and hence there has been great interest in coming up with automatic ways to retrieve images based on content. Here, we propose an automatic approach to annotating and retrieving images based on a training set of images. We assume that regions in an image can be described using a small vocabulary of blobs. Blobs are generated from image features using clustering. Given a training set of images with annotations, we show that probabilistic models allow us to predict the probability of generating a word given the blobs in an image. This may be used to automatically annotate and retrieve images given a word as a query. We show that relevance models allow us to derive these probabilities in a natural way.
Experiments show that the annotation performance of this cross-media relevance model is almost six times as good (in terms of mean precision) than a model based on word-blob co-occurrence model and twice as good as a state of the art model derived from machine translation. Our approach shows the usefulness of using formal information retrieval models for the task of image annotation and retrieval.”
US 8,380,558, with title “Method and system for analyzing shopping behavior in a store by associating RFID data with video-based behavior and segmentation data”, according to its abstract describes “The present invention is a method and system for analyzing shopping behavior by associating RFID data, such as tracking data by the RFID tag identifications, with video-based behavior and segmentation data, such as behavior analysis and demographic composition analysis of the customers, utilizing a plurality of means for sensing and using RFID tags, a plurality of means for capturing images, and a plurality of computer vision technologies. In the present invention, the association can further comprise the association of the RFID with the transaction data or any time-based measurement in the retail space. The analyzed shopping behavior in the present invention helps people to better understand business elements in a retail space. It is one of the objectives of the present invention to provide an automatic video-based segmentation of customers in the association with the RFID based tracking of the customers, based on a novel usage of a plurality of means for capturing images and a plurality of computer vision technologies on the captured visual information of the people in the retail space. The plurality of computer vision technologies can comprise face detection, person tracking, body parts detection, and demographic classification of the people, on the captured visual information of the people in the retail space.”
CN107066605, with title “Image identification-based device information automatic retrieval and display method”, according to its abstract describes “The invention relates to an image identification-based device information automatic retrieval and display method. The method is mainly and technically characterized by comprising the following steps of establishing a real scene map of a substation; obtaining a view angle picture of the position of a browser, and identifying a device type of a device contained in the picture in real time; obtaining a monitoring information account corresponding to the device type; and dynamically displaying the monitoring information account on the real scene map. By adopting the method, a user does not need to perform manual annotation; the information retrieval is performed according to the device type automatically identified in the picture and a device ID; and the information display is more intelligent and quicker.”
“Attention-based Deep Multiple Instance Learning”, Feb 2018, by Maximilian Ilse et al. (https://arxiv.org/abs/1802.04712), according to its abstract describes: “Multiple instance learning (MIL) is a variation of supervised learning where a single class label is assigned to a bag of instances. In this paper, we state the MIL problem as learning the Bernoulli distribution of the bag label where the bag label probability is fully parameterized by neural networks. Furthermore, we propose a neural network-based permutation-invariant aggregation operator that corresponds to the attention mechanism. Notably, an application of the proposed attention-based operator provides insight into the contribution of each instance to the bag label. We show empirically that our approach achieves comparable performance to the best MIL methods on benchmark MIL datasets and it outperforms other methods on a MNIST-based MIL dataset and two real-life histopathology datasets without sacrificing interpretability.”
Summary of the invention
In order to train a machine learning (ML) model for computer vision, often a training set with a large number of annotated images should be provided. Annotating images manually is a tedious job. Annotating images automatically saves resources and is therefore efficient, but often lacks the accuracy needed for training a ML model when a high reliability of the model is required.
Hence, it is an aspect of the invention to provide an improved and/or alternative method for annotating images which automates the annotating process and preferably further, at least partly, obviates one or more of above-described drawbacks, in particular by increasing the accuracy of the labeled data by automatic annotation.
The method according to the invention allows AI systems to improve over time due to the increasing availability of labelled or annotated data. In many cases it would no longer be necessary to pre-train a ML model for a specific application.
There is provided a method for training a machine learning model to identify a subject having at least one machine readable identifier providing a subject ID, said method comprising:
- providing a computer vision system with an image capturing system comprising at least one image capturing device, and a reader system comprising at least one reader for reading said at least one machine readable identifier;
- defining said machine learning model in said computer vision system;
- capturing an image using said image capturing system, said image showing said subject;
- reading said subject ID using said reader system when capturing said image, and linking said subject ID with said image, said linking providing said image with a linked subject ID, resulting in at least one annotated image;
- capturing at least one further image showing said subject, linking said linked subject ID to said at least one further image providing at least one further annotated image, and
- subjecting said annotated image and said at least one further annotated image to said machine learning model for training said machine learning model.
There is further provided a system for identifying a subject having at least one machine readable identifier providing a subject ID, said system comprising:
- a computer vision system comprising an image capturing system comprising at least one image capturing device, and a reader system comprising at least one reader for reading said at least one machine readable identifier;
- a machine learning model defined in said computer vision system;
said computer vision system in operation:
- capturing an image using said image capturing system, said image showing said subject;
- reading said subject ID using said reader system when capturing said image, and linking said subject ID with said image, said linking providing said image with a linked subject ID, resulting in at least one annotated image;
- capturing at least one further image showing said subject, linking said linked subject ID to said at least one further image providing at least one further annotated image, and
- subjecting said annotated image and said at least one further annotated image to said machine learning model for training said machine learning model.
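In code, the recited steps amount to a capture-read-link loop. The following minimal Python sketch is illustrative only; capture_image and read_subject_ids are hypothetical callables standing in for the image capturing system and the reader system, not part of any existing library.

```python
from dataclasses import dataclass

@dataclass
class AnnotatedImage:
    pixels: bytes            # raw image data from the image capturing device
    subject_ids: list[str]   # subject IDs read when the image was captured

def annotate_once(capture_image, read_subject_ids) -> AnnotatedImage:
    """Capture an image and link it with the subject IDs read by the
    reader system in the same timeframe, yielding one annotated image."""
    image = capture_image()        # image capturing system
    ids = read_subject_ids()       # reader system (RFID, barcode, chip card, ...)
    return AnnotatedImage(pixels=image, subject_ids=ids)

def build_training_set(capture_image, read_subject_ids, n: int) -> list[AnnotatedImage]:
    """Repeat the capture-and-link step so that the image and the at least
    one further image each carry their linked subject IDs."""
    return [annotate_once(capture_image, read_subject_ids) for _ in range(n)]
```

The annotated images collected this way form the training set that is subjected to the machine learning model.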
A subject can be an animal, a person or an object. A product is an example of an object.
A reader is a device for reading machine readable identifiers. A reader can consist of an antenna to receive a signal. Examples of readers are an RFID reader, a barcode scanner/camera, a QR scanner/camera, a chip-and-PIN card reader, a biometric reader (such as for fingerprint and iris recognition) and an audio analyser (for voice and sound recognition).
An image capturing device is a device that can provide an image, in particular a digital image or digital picture. Such a device can comprise a camera or a filming (motion picture) device. Examples are devices comprising a CCD or similar imaging elements. As such, these devices are known to a skilled person.
In order to detect and localize a subject in a scene from a captured image, an embodiment uses a method to detect subjects. Such a method uses machine learning techniques (mainly deep learning) to design and train a model which detects subjects given an input of a visual representation, e.g. an RGB image, as the system perceives it. The model is trained on a large amount of annotated data; this data comprises images with and without subjects, in which the locations of the subjects are annotated.
In the case of deep learning, a detection framework such as Faster-RCNN, SSD, R-FCN, Mask-RCNN, or one of their derivatives can be used. A base model structure can be VGG, AlexNet, ResNet, GoogLeNet, adapted from the previous, or a new one. A model can be initialized with weights trained on similar tasks to improve and speed up the training. Optimizing the weights of a model, in the case of deep learning, can be done with the help of deep learning frameworks such as TensorFlow, Caffe, or MXNet. To train a model, optimization methods such as Adam or RMSprop can be used. Classification loss functions such as Hinge Loss or Softmax Loss can be used. Other approaches which utilize handcrafted features (such as LBP, SIFT, or HOG) and conventional classification methods (such as SVM or Random Forest) can also be used.
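As one concrete instantiation of such a detection setup, a COCO-pretrained Faster-RCNN can be fine-tuned for a two-class subject/background problem. This is a sketch, not the patented method itself; the paragraph above names TensorFlow, Caffe and MXNet, and PyTorch/torchvision is used here merely as an equivalent, widely available framework, with the learning rate chosen arbitrarily.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Faster-RCNN with a ResNet-50 FPN backbone; pretrained weights initialize the model.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
# Replace the box predictor for two classes: background and subject.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # Adam, as named above

def train_step(images, targets):
    """images: list of CHW float tensors; targets: list of dicts with
    'boxes' (N x 4 tensor) and 'labels' (N tensor) per image."""
    model.train()
    loss_dict = model(images, targets)   # classification + box-regression losses
    loss = sum(loss_dict.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```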
In an embodiment, after localizing subjects in a scene from captured images, trained multiple instance neural networks (MINN) are used to match the correct subject IDs with subjects.
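A minimal sketch of such an attention-based MIL head, in the spirit of Ilse et al. cited above, could look as follows; the layer sizes are illustrative assumptions, and the module shows only the attention pooling step, not a full ID-matching pipeline.

```python
import torch
import torch.nn as nn

class AttentionMILPooling(nn.Module):
    """Pools a bag of instance features (e.g. detected subject crops in one
    image) into a single bag embedding; the learned attention weights hint
    at which instance carries the bag-level subject ID."""
    def __init__(self, dim: int = 512, hidden: int = 128):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, instances: torch.Tensor):
        # instances: (n_instances, dim) feature vectors of one bag (image)
        weights = torch.softmax(self.attention(instances), dim=0)  # (n, 1)
        bag_embedding = (weights * instances).sum(dim=0)           # (dim,)
        return bag_embedding, weights
```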
In an embodiment, after localizing subjects in a scene from retrieved images, a deep neural network (DNN) is trained to compare subjects from different captured images with each other in order to detect similar subjects.
In order to detect similar subjects in different captured images, an embodiment uses machine learning techniques (mainly deep learning) to design and train a model which detects the similarity of subjects, given an input of a visual representation, e.g. RGB images, as the system perceives them. The model is trained on a large amount of annotated data; it comprises images of subjects wherein similar subjects are annotated.
For example, a DNN pretrained on ImageNet, e.g. VGGNet, AlexNet, ResNet, Inception or Xception, can be adapted by taking the convolution layers from these pretrained DNN networks, adding on top of them new layers specially designed for detecting similar subjects, and training the network as described in the previous paragraph.
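A sketch of such an adaptation, assuming a torchvision ResNet-50 as the pretrained backbone (any of the networks named above would do) and a new, hypothetical projection head trained on top of its convolution layers:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class SubjectSimilarityNet(nn.Module):
    """Keeps the pretrained convolutional layers and adds a new embedding
    head; crops of the same subject should map to nearby embeddings."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        backbone = torchvision.models.resnet50(weights="DEFAULT")
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # drop final fc
        self.head = nn.Linear(2048, embed_dim)  # newly added, trained from scratch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.features(x).flatten(1)          # (batch, 2048)
        return F.normalize(self.head(f), dim=1)  # unit-length embeddings

def similarity(model: SubjectSimilarityNet, crop_a, crop_b) -> torch.Tensor:
    """Cosine similarity in [-1, 1]; near 1 means 'likely the same subject'."""
    return (model(crop_a) * model(crop_b)).sum(dim=1)
```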
In case similar subjects are detected with sufficient reliability, the subject in the different captured images is automatically annotated with one or more subject IDs which are consistent with the subject IDs retrieved by a reader system for the captured images. For example, if a similar subject is detected in both captured image A and captured image B while for these images multiple subject IDs have been retrieved, then the similar subject in both image A and image B will automatically be annotated with the intersection of the subject IDs of image A and the subject IDs of image B.
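This ID-narrowing step is plain set intersection; the toy numbers below mirror the cow example of FIGs 3A-3C and are illustrative only.

```python
def ids_for_matched_subject(ids_image_a: set, ids_image_b: set) -> set:
    """A subject detected as similar in both images can only carry a
    subject ID that was read for both images: the intersection."""
    return ids_image_a & ids_image_b

# Image A was captured while IDs {11, 12, 13} were read, image B while
# {13, 14, 15} were read; a subject appearing in both must be ID 13.
assert ids_for_matched_subject({11, 12, 13}, {13, 14, 15}) == {13}
```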
In an embodiment, the method further comprises providing said subject with said machine readable identifier providing a subject ID.
In an embodiment, when capturing said at least one further image, a further subject ID is read using said reader system and said further subject ID is linked to said at least one further image.
In an embodiment, said annotated image and said at least one further annotated image are included in a training dataset that is built during performing said method, and said training dataset is used for at least one of training and additionally training said machine learning model.
In an embodiment, the machine learning model comprises a machine learning model part for localizing subjects in at least one of said captured image and said captured at least one further image.
In an embodiment, the reader system comprises at least a first reader and a second reader, wherein said first reader reads said subject ID when said image is captured, and said second reader reads said subject ID when said at least one further image is captured.
In this respect, “when” can be in a timeframe around said capturing such that it is ensured that the image shows the subject of the subject ID.
In an embodiment, the subject comprises at least a first and a second machine readable identifier, said first reader reads said first machine readable identifier for providing said subject ID, and said second reader reads said second machine readable identifier for providing said subject ID.
In an embodiment, the first and second reader and said first and a second machine readable identifier are of a different type, wherein said first and second reader provide a first and second identifier, and in particular said vision system provides said subject ID from said first and second identifier. For instance, the first reader is an RFID reader and the second reader is a chip card reader.
In an embodiment, reading of at least one selected from said linked subject ID and a further subject ID is repeated.
In an embodiment, the capturing of said at least one further image and said linking of said linked subject ID to said at least one further image are continuously repeated, providing a series of said at least one further annotated image; in particular, said capturing is repeated when there are one or more subjects in a field of view of said image capturing system.
In an embodiment, the capturing of said at least one further image is continuously repeated, and said reader system repeats reading said subject ID when said at least one further image is captured, each time providing a renewed subject ID, linking said renewed subject ID with said at least one further image, said linking providing said at least one further image with a linked subject ID, resulting in at least one further annotated image, for providing a series of annotated images.
In an embodiment, the annotating images is continued until a predetermined reliability level for identifying said subject in an image is reached.
In an embodiment, the method further is for training a machine learning model to identify a plurality of subjects each having at least one machine readable identifier providing a subject ID for each subject, wherein said reader system reads said machine readable identifiers of at least part of said plurality of subjects, providing a series of subject IDs, said image capturing system captures said image with said at least part of said plurality of subjects, and links said image with said at least part of said plurality of subjects with said series of subject IDs, providing said annotated image.
In an embodiment, the image capturing system captures said at least one further image with said at least part of said plurality of subjects, and links said image with said at least part of said plurality of subjects with said series of subject IDs, providing said annotated image.
The method is in an embodiment further provided for training a machine-learning model to identify an animal among a group of animals, in particular a livestock animal amidst a group of livestock animals, using the method described above.
There is further provided a computer program product for running on a data processor on a computer vision system, wherein said computer program product, when running on said data processor, enables said computer vision system to perform the method described above.
The term “statistically” when used herein, relates to dealing with the collection, analysis, interpretation, presentation, and organization of data. In particular, it comprises modelling behavior of a population. Using probability distributions, a probability of optimizing transmission reliability is calculated and predicted.
The term “substantially” herein, such as in “substantially all emission” or in “substantially consists”, will be understood by the person skilled in the art. The term “substantially” may also include embodiments with “entirely”, “completely”, “all”, etc. Hence, in embodiments the adjective substantially may also be removed. Where applicable, the term “substantially” may also relate to 90% or higher, such as 95% or higher, especially 99% or higher, even more especially 99.5% or higher, including 100%. The term “comprise” includes also embodiments wherein the term “comprises” means “consists of”.
The term functionally will be understood by, and be clear to, a person skilled in the art. The term “substantially” as well as “functionally” may also include embodiments with “entirely”, “completely”, “all”, etc. Hence, in embodiments the adjective functionally may also be removed. When used, for instance in “functionally parallel”, a skilled person will understand that the adjective “functionally” includes the term substantially as explained above. Functionally in particular is to be understood to include a configuration of features that allows these features to function as if the adjective “functionally” was not present. The term “functionally” is intended to cover variations in the feature to which it refers, and which variations are such that in the functional use of the feature, possibly in combination with other features it relates to in the invention, that combination of features is able to operate or function. For instance, if an antenna is functionally coupled or functionally connected to a communication device, electromagnetic signals that are received by the antenna can be used by the communication device. The word “functionally” as for instance used in “functionally parallel” is used to cover exactly parallel, but also the embodiments that are covered by the word “substantially” explained above. For instance, “functionally parallel” relates to embodiments that in operation function as if the parts are for instance parallel. This covers embodiments for which it is clear to a skilled person that it operates within its intended field of use as if it were parallel.
Furthermore, the terms first, second, third and the like in the description and in the claims, are used for distinguishing between similar elements and not necessarily for describing a sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and that the embodiments of the invention described herein are capable of operation in other sequences than described or illustrated herein.
The devices or apparatus herein are amongst others described during operation. As will be clear to the person skilled in the art, the invention is not limited to methods of operation or devices in operation.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. Use of the verb to comprise and its conjugations does not exclude the presence of elements or steps other than those stated in a claim. The article a or an preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the device or apparatus claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
The invention further applies to an apparatus or device comprising one or more of the characterizing features described in the description and/or shown in the attached drawings. The invention further pertains to a method or process comprising one or more of the characterizing features described in the description and/or shown in the attached drawings.
The various aspects discussed in this patent can be combined in order to provide additional advantages. Furthermore, some of the features can form the basis for one or more divisional applications.
Brief description of the drawings
Embodiments of the invention will now be described, by way of example only, with reference to the accompanying schematic drawings in which corresponding reference symbols indicate corresponding parts, and in which:
FIG 1 schematically depicts an embodiment for training a machine learning model to identify products labeled with a barcode;
FIGs 2A-2C schematically depict an embodiment for training a machine learning model to identify cows earmarked with a RFID chip;
FIGs 3A-3C schematically depict another embodiment for training a machine learning model to identify cows earmarked with a RFID chip;
FIGs 4A-4C schematically depict an embodiment for training a machine learning model to identify travelers using a boarding pass, and
FIGs 5A-5C schematically depict an embodiment for training a machine learning model to identify a woman identifying herself at different locations.
The drawings are not necessarily to scale.
Description of preferred embodiments
FIG 1 schematically depicts an embodiment in a warehouse 106 for training a machine learning model 9''', defined in a computer vision system 99, to identify products 10 labeled with a barcode as subject ID. The computer vision system 99 is operationally coupled with scanner 5 and cameras 1 and 1'. The barcodes of the products 10 are scanned by scanner 5 and the cameras 1 and 1' capture images of the products 10. An annotated image of product 10' captured by camera 1, comprising a scanned barcode of product 10', is subjected to machine learning model 9'''. A further annotated image of product 10' captured by camera 1', comprising a scanned barcode of product 10', is also subjected to machine learning model 9'''. Product 10' in the captured images is automatically labeled or annotated with a unique subject ID belonging to its barcode. Machine learning model 9''', trained in this way, can thus be applied to identify product 10'.
FIGs 2A-2C schematically depict an embodiment, at farmyards 101 and 103, for training a machine learning model 9', defined in a computer vision system, to identify cow 13 among cows 14 and 15. Cow 13 is earmarked with an RFID chip 23, cow 14 with an RFID chip 24 and cow 15 with an RFID chip 25. The signals 33, 34 and 35, belonging respectively to the RFID chips 23, 24 and 25, comprise unique subject IDs for cows 13, 14 and 15 respectively. Antennas 3 and 3' are operationally coupled to an RFID reader. The RFID reader and cameras 1 and 1' are operationally coupled to the computer vision system.
In FIG 2A, the three cows 13, 14 and 15 are grouped at farmyard 101. The signals 33, 34 and 35 are received by antenna 3. Camera 1 captured an image of the cows 13, 14 and 15. An annotated image 201'' (FIG 2C) captured by camera 1, comprising the subject IDs of cows 13, 14 and 15, is subjected to the machine learning model 9'.
In FIG 2B, cow 13 is eating at a cratch 8 in a designated area at farmyard 103. The signal 33 is received by antenna 3. Camera 1' captured a further image of cow 13. A further annotated image 203 (FIG 2C) captured by camera 1', comprising the unique subject ID of cow 13, is subjected to the machine learning model 9'.
In FIG 2C, cow 13 in the annotated image 201'' and cow 13 in the annotated image 203 are thus automatically labeled or annotated with the unique subject ID belonging to RFID chip 23 (marked with an arrow), which is in the intersection of the subject IDs of annotated images 201'' and 203. Machine learning model 9', trained in this way, can thus be applied to identify cow 13 in an image.
In practice, the computer vision system will continuously capture images of one or more cows and read subject IDs. These will be automatically linked to provide annotated images and applied to the machine learning model 9’. In this way, the machine learning model 9’ can be (additionally) trained and improved. If the machine learning model 9’ qualifies the annotated image as being below a predefined threshold, the annotated image may be disregarded in the training process, and/or the annotated image may even be removed from the system.
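In pseudo-Python, that continuous regime could look as follows; capture, read_ids, annotation_quality and train_step are hypothetical callables, and the threshold value is an arbitrary illustration of the predefined threshold mentioned above.

```python
def continuous_annotation(capture, read_ids, annotation_quality, train_step,
                          threshold: float = 0.8):
    """Continuously capture images, link them with the subject IDs read at
    capture time, and feed them to the model; annotated images scored
    below the predefined threshold are disregarded."""
    while True:
        image = capture()
        ids = read_ids()
        if not ids:              # no subject in the field of view
            continue
        if annotation_quality(image, ids) < threshold:
            continue             # disregard (or remove) the annotated image
        train_step(image, ids)   # (additionally) train the machine learning model
```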
FIGs 3A-3C schematically depict an embodiment, at a farmyard 101, for training a machine learning model 9, defined in a computer vision system, to identify cow 13 among cows 11, 12, 13, 14 and 15. Cow 11 is earmarked with an RFID chip 21, cow 12 with an RFID chip 22, cow 13 with an RFID chip 23, cow 14 with an RFID chip 24 and cow 15 with an RFID chip 25. The signals 31, 32, 33, 34 and 35, belonging respectively to the RFID chips 21, 22, 23, 24 and 25, comprise unique subject IDs for cows 11, 12, 13, 14 and 15 respectively. Antenna 3 is operationally coupled to an RFID reader. The RFID reader and camera 1 are operationally coupled to the computer vision system.
In FIG 3A, the three cows 11, 12 and 13 are grouped at a farmyard 101. The signals 31, 32 and 33 are received by antenna 3. Camera 1 captured an image of the cows 11, 12 and 13. An annotated image 201 (FIG 3C) captured by camera 1, comprising the subject IDs of cows 11, 12 and 13, is subjected to the machine learning model 9.
In FIG 3B, the three cows 13, 14 and 15 are grouped at a farmyard 101. The signals 33, 34 and 35 are received by antenna 3. Camera 1 captured a further image of the cows 13, 14 and 15. A further annotated image 201' (FIG 3C) captured by camera 1, comprising the subject IDs of cows 13, 14 and 15, is subjected to the machine learning model 9.
In FIG 3C, cow 13 in the annotated image 201 and cow 13 in the annotated image 201' are thus automatically labeled or annotated with the unique subject ID belonging to RFID chip 23 (marked with an arrow), which is in the intersection of the subject IDs of annotated images 201 and 201'. Machine learning model 9, trained in this way, can thus be applied to identify cow 13 in an image.
The RFID chip can either be active or passive.
FIGs 4A-4C schematically depict an embodiment, at airport halls 104 and 105, for training a machine learning model 9'', defined in a computer vision system, to identify a person 16 among a crowd. Person 16 is carrying a chip card 26. The chip card 26 comprises a unique subject ID for person 16. Chip card reader 4 and camera 1 are operationally coupled to the computer vision system.
In FIG 4A, person 16 is in the process of entering the airport in airport hall 104 by unlocking turnstile 7, putting his chip card 26 in card reader 4. Camera 1 captured an image of person 16. An annotated image 204 (FIG 4C) captured by camera 1, comprising the subject ID of person 16, is subjected to the machine learning model 9''.
In FIG 4B, person 16 is walking in an airport hall 105. Camera 1 captured an image of person 16. An image 205 (FIG 4C) captured by camera 1 is subjected to the machine learning model 9''.
In FIG 4C, person 16 in the annotated image 204 and person 16 in the image 205 are automatically labeled or annotated with the unique subject ID belonging to chip card 26, since person 16 in annotated image 204 and image 205 are detected as likely similar. Machine learning model 9'', trained in this way, can thus be applied to identify person 16 in an image.
FIGs 5A-5C schematically depict an embodiment for training a machine learning model, defined in a computer vision system, to identify a woman 17 identifying herself at different locations 107, 108 and 109, in various situations. Turnstile 7' with fingerprint reader 4', ATM cash machine 6 with a bank card reader, ID card reader 4'' and image capturing device 1 are operationally coupled to the computer vision system.
In FIG 5A, woman 17 in an office entrance 107 identifies herself at turnstile 7' by putting her finger 27 on a fingerprint reader 4' while image capturing device 1 captures at least one image of her.
In FIG 5B, woman 17, in a designated area 108, withdraws cash from an ATM cash machine 6 with a bank card reader, and identifies herself by a bank card 27' and by typing her PIN code on the ATM cash machine while image capturing device 1 captures at least one image of her.
In FIG 5C, woman 17 in a town hall 109 identifies herself at a counter by showing her ID card 27'' to an ID card reader 4'' while image capturing device 1 captures at least one image of her.
It will also be clear that the above description and drawings are included to illustrate some embodiments of the invention, and not to limit the scope of protection.
Starting from this disclosure, many more embodiments will be evident to a skilled person. These embodiments are within the scope of protection and the essence of this invention and are obvious combinations of prior art techniques and the disclosure of this patent.

Claims (17)

1. A method for training a machine learning model to identify a subject having at least one machine-readable identifier providing a subject ID, the method comprising:
- providing a computer vision system with an image capturing system comprising at least one image capturing device, and a reader system comprising at least one reader for reading the at least one machine-readable identifier;
- defining the machine learning model in the computer vision system;
- capturing an image using the image capturing system, the image showing the subject;
- reading the subject ID using the reader system when capturing the image, and linking the subject ID with the image, the linking providing the image with a linked subject ID, resulting in at least one annotated image;
- capturing at least one further image showing the subject, linking the linked subject ID to the at least one further image, providing at least one further annotated image, and
- subjecting the annotated image and the at least one further annotated image to the machine learning model for training the machine learning model.
2. The method according to claim 1, further comprising providing the subject with the machine-readable identifier providing the subject ID.
3. The method according to claim 1 or 2, wherein, when capturing the at least one further image, a further subject ID is read using the reader system and the further subject ID is linked to the at least one further image.
4. The method according to any of the preceding claims, wherein the annotated image and the at least one further annotated image are included in a training dataset that is built up while performing the method, and the training dataset is used for at least one selected from training and additionally training the machine learning model.
5. The method according to any of the preceding claims, wherein the machine learning model comprises a machine learning model part for localizing subjects in at least one selected from the captured image and the at least one further captured image.
6. The method according to any of the preceding claims, wherein the reader system comprises at least a first reader and a second reader, wherein the first reader reads the subject ID when the image is captured, and the second reader reads the subject ID when the at least one further image is captured.
7. The method according to any of the preceding claims when dependent on claim 6, wherein the subject comprises at least a first and a second machine-readable identifier, the first reader reads the first machine-readable identifier for providing the subject ID, and the second reader reads the second machine-readable identifier for providing the subject ID.
8. The method according to any of the preceding claims, wherein the first and second reader and the first and second machine-readable identifier are of a different type, wherein the first and second reader provide a first and a second identifier, and in particular the vision system provides the subject ID from the first and second identifier.
9. The method according to any of the preceding claims, wherein reading of at least one selected from the linked subject ID and a further subject ID is repeated.
10. The method according to any of the preceding claims, wherein the capturing of the at least one further image and the linking of the linked subject ID to the at least one further image are continuously repeated, providing a series of the at least one further annotated image; in particular, the capturing is repeated when there are one or more subjects in a field of view of the image capturing system.
11. The method according to any of the preceding claims, wherein the capturing of the at least one further image is continuously repeated and the reader system repeats reading the subject ID when one of the at least one further image is captured, each time providing a renewed subject ID, linking the renewed subject ID with the at least one further image, the linking providing the at least one further image with a linked subject ID, resulting in at least one further annotated image, for providing a series of annotated images.
12. The method according to any of the preceding claims, wherein annotating images is continued until a predetermined reliability level for identifying the subject in an image is reached.
13. The method according to any of the preceding claims, for training a machine learning model to identify a plurality of subjects each having at least one machine-readable identifier providing a subject ID for each subject, wherein the reader system reads the machine-readable identifiers of at least part of the plurality of subjects, providing a series of subject IDs, wherein the image capturing system captures the image with the at least part of the plurality of subjects, and links the image with the at least part of the plurality of subjects with the series of subject IDs, providing the annotated image.
14. The method according to claim 13, wherein the image capturing system captures the at least one further image with the at least part of the plurality of subjects, and links the image with the at least part of the plurality of subjects with the series of subject IDs, for providing the annotated images.
15. A method for training a machine learning model to identify an animal among a group of animals, in particular a livestock animal amidst a group of livestock animals, using the method according to any of the preceding claims.
16. A system for identifying a subject having at least one machine-readable identifier providing a subject ID, the system comprising:
- a computer vision system comprising an image capturing system comprising at least one image capturing device, and a reader system comprising at least one reader for reading the at least one machine-readable identifier;
- a machine learning model defined in the computer vision system;
the computer vision system in operation:
- capturing an image using the image capturing system, the image showing the subject;
- reading the subject ID using the reader system when capturing the image, and linking the subject ID with the image, the linking providing the image with a linked subject ID, resulting in at least one annotated image;
- capturing at least one further image showing the subject, wherein linking the linked subject ID with the at least one further image provides the at least one further annotated image, and
- subjecting the annotated image and the at least one further annotated image to the machine learning model for training the machine learning model.
17. A computer program product for running on a data processor on a computer vision system, wherein the computer program product, when running on the data processor, enables the computer vision system to perform the method according to any of the preceding claims 1-14.
NL2021481A 2018-08-17 2018-08-17 A method for automatically annotating and identifying a living being or an object with an identifier, such as RFID, and computer vision. NL2021481B1 (en)

Priority Applications (6)

Application Number Priority Date Filing Date Title
NL2021481A NL2021481B1 (en) 2018-08-17 2018-08-17 A method for automatically annotating and identifying a living being or an object with an identifier, such as RFID, and computer vision.
US17/042,063 US11308358B2 (en) 2018-08-17 2019-08-15 Method and system for automatically annotating and identifying a living being or an object with an identifier providing a subject identification
EP19756032.9A EP3740894A1 (en) 2018-08-17 2019-08-15 A method and system for automatically annotating and identifying a living being or an object with an identifier providing a subject identification
PCT/NL2019/050533 WO2020036490A1 (en) 2018-08-17 2019-08-15 A method and system for automatically annotating and identifying a living being or an object with an identifier providing a subject identification
US17/659,574 US11961320B2 (en) 2018-08-17 2022-04-18 Method and system for automatically annotating and identifying a living being or an object with an identifier providing a subject identification
US18/601,933 US20240212385A1 (en) 2018-08-17 2024-03-11 Method and system for automatically annotating and identifying a living being or an object with an identifier providing a subject identification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
NL2021481A NL2021481B1 (en) 2018-08-17 2018-08-17 A method for automatically annotating and identifying a living being or an object with an identifier, such as RFID, and computer vision.

Publications (1)

Publication Number Publication Date
NL2021481B1 true NL2021481B1 (en) 2020-02-24

Family

ID=64427159

Family Applications (1)

Application Number Title Priority Date Filing Date
NL2021481A NL2021481B1 (en) 2018-08-17 2018-08-17 A method for automatically annotating and identifying a living being or an object with an identifier, such as RFID, and computer vision.

Country Status (1)

Country Link
NL (1) NL2021481B1 (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030164878A1 (en) * 1998-10-27 2003-09-04 Hitoshi Iizaka Method of and device for aquiring information on a traffic line of persons
EP1212939A1 (en) * 2000-12-08 2002-06-12 N.V. Nederlandsche Apparatenfabriek NEDAP Farm management system provided with cameras for monitoring animals on the farm
US20070086626A1 (en) 2003-10-08 2007-04-19 Xid Technologies Pte Ltd Individual identity authentication systems
US8380558B1 (en) 2006-12-21 2013-02-19 Videomining Corporation Method and system for analyzing shopping behavior in a store by associating RFID data with video-based behavior and segmentation data
WO2013085985A1 (en) * 2011-12-06 2013-06-13 Google Inc. System and method of identifying visual objects
WO2015149610A1 (en) * 2014-04-03 2015-10-08 Beijing Zhigu Rui Tuo Tech Co., Ltd Association methods and association devices
CN107066605A (en) 2017-04-26 2017-08-18 国家电网公司 Facility information based on image recognition has access to methods of exhibiting automatically
US10025950B1 (en) * 2017-09-17 2018-07-17 Everalbum, Inc Systems and methods for image recognition

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
J. JEON ET AL., AUTOMATIC IMAGE ANNOTATION AND RETRIEVAL USING CROSS-MEDIA RELEVANCE MODEL, July 2003 (2003-07-01), Retrieved from the Internet <URL:http://hpds.ee.kuas.edu.tw/download/parallel_processing/97/97present/20081226/Automatic%20Image%20Annotation%20and%20Retrieval%20using.pdf>
MAXIMILIAN ILSE ET AL., ATTENTION-BASED DEEP MULTIPLE INSTANCE LEARNING, February 2018 (2018-02-01), Retrieved from the Internet <URL:https://arxiv.org/abs/1802.04712>
TIBERIO URICCHIO ET AL., AUTOMATIC IMAGE ANNOTATION VIA LABEL TRANSFER IN THE SEMANTIC SPACE, May 2016 (2016-05-01), Retrieved from the Internet <URL:https://arxiv.org/abs/1605.04770>

Similar Documents

Publication Publication Date Title
US11961320B2 (en) Method and system for automatically annotating and identifying a living being or an object with an identifier providing a subject identification
US8478048B2 (en) Optimization of human activity determination from video
Yang et al. Multi-object tracking with discriminant correlation filter based deep learning tracker
Li et al. Clothing attributes assisted person reidentification
US20160350336A1 (en) Automated image searching, exploration and discovery
Hoai et al. Improving human action recognition using score distribution and ranking
Patil et al. A perspective view of cotton leaf image classification using machine learning algorithms using WEKA
Nguyen et al. Inductive and transductive few-shot video classification via appearance and temporal alignments
Mar et al. Cow detection and tracking system utilizing multi-feature tracking algorithm
EP3002710A1 (en) System and method for object re-identification
Layne et al. Re-id: Hunting Attributes in the Wild.
Yaghoubi et al. SSS-PR: A short survey of surveys in person re-identification
CN114419363B (en) Target classification model training method and device based on unlabeled sample data
Kaur et al. Cattle identification system: a comparative analysis of SIFT, SURF and ORB feature descriptors
Singh Machine learning in pattern recognition
Matzen et al. Bubblenet: Foveated imaging for visual discovery
Roth et al. On the exploration of joint attribute learning for person re-identification
NL2021481B1 (en) A method for automatically annotating and identifying a living being or an object with an identifier, such as RFID, and computer vision.
Bebawy et al. Active shape model vs. deep learning for facial emotion recognition in security
Gao et al. UD-YOLOv5s: Recognition of cattle regurgitation behavior based on upper and lower jaw skeleton feature extraction
Bhoir et al. FIODC Architecture: the architecture for fashion image annotation
Mazo et al. Evaluation of two computer vision approaches for grazing dairy cow identification
Dai et al. Object detection based on visual memory: a feature learning and feature imagination process
Andrade et al. Dog Face Recognition Using Deep Features Embeddings
Verma et al. Age and gender prediction using deep learning framework