
AU2017279658A1 - Pose-aligned descriptor for person re-id with geometric and orientation information - Google Patents


Info

Publication number
AU2017279658A1
Authority
AU
Australia
Prior art keywords
interest
determining
appearance
orientation
pair
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
AU2017279658A
Inventor
Rajanish Calisa
Fei MAI
Geoffrey Richard Taylor
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Priority to AU2017279658A priority Critical patent/AU2017279658A1/en
Publication of AU2017279658A1 publication Critical patent/AU2017279658A1/en
Abandoned legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

An apparatus (150) comprising: a computer readable storage device (209) storing a software program (233); a processor (205) for executing the program (233) to perform a method (1100) for matching objects of interest in video frames obtained from at least two video cameras, the method comprising the steps of: receiving (1110) a pair of video frames (1111); for the object of interest in each of the pair of received frames: determining (1220) an orientation appearance signature (1221); a colour appearance signature (1241); and a geometric appearance signature (813); and determining (1270) an orientation-aware appearance signature (370) dependent upon the orientation, colour, and geometric appearance signatures; determining (940) a distance metric (941) depending upon the orientation-aware appearance signatures (370) for the objects of interest; determining (1130) depending upon the distance metric (941) a distance (1131) between the objects of interest; and matching (1140) the objects of interest dependent upon the distance (1131).

Description

POSE-ALIGNED DESCRIPTOR FOR PERSON RE-ID WITH GEOMETRIC AND ORIENTATION INFORMATION
TECHNICAL FIELD
The present description relates generally to image processing and, in particular, to the problem of matching objects in video frames between two camera views to determine whether a “candidate object” is an “object of interest”. In one example, the terms “candidate object” and “object of interest” respectively refer to (i) a person in a crowded airport, the person being merely one of the crowd, and (ii) a person in that crowd that has been identified as being of particular interest. The present description also relates to a computer program product including a computer readable medium having recorded thereon a computer program for matching objects between two camera views to determine whether a candidate object is an object of interest.
BACKGROUND
Public venues such as shopping centres, parking lots and train stations are increasingly subject to surveillance using large-scale networks of video cameras. Application domains of large-scale video surveillance include security, safety, traffic management and business analytics. In one example application from the security domain, a security officer may want to view any video feed containing a particular suspicious person in order to identify undesirable activities. In another example from the business analytics domain, a shopping centre may wish to track customers across multiple cameras in order to build a profile of shopping habits.
A key task in many of these applications is rapid and robust object matching across multiple camera views. In one example, also called “hand-off”, object matching is applied to persistently track multiple objects across a first and second camera with overlapping fields of view. In another example, also called “re-identification”, object matching is applied to locate a specific object of interest across multiple cameras in the network with non-overlapping fields of view.
In the following discussion, the term “object matching” is alternately referred to as “hand-off object matching”, “re-identification object matching”, “object identification” and “object recognition”.
A common approach for object matching includes the steps of extracting an “appearance signature” for each object, and using this signature to compute a similarity between different objects. Throughout this description, the term “appearance signature” refers to a set of
parameter values characterising the appearance of an object or region of an image, and is alternately referred to as “appearance model”, “feature descriptor”, “descriptor” and “feature vector”.
Robust object matching is a challenging problem for several reasons. Firstly, many objects may have similar appearance, such as commuters on public transport wearing similar business attire. Furthermore, the viewpoint (i.e. the orientation and distance of an object in the camera’s field of view) can vary significantly between cameras in a network. In one example, a single network may simultaneously include cameras which capture a frontal view of a person in one camera view and a profile view of the same person in another camera view. This poses a challenge to object matching since the appearance model of a frontal view of a person can be quite different from an appearance model determined from a profile view. Although the person is the same, the appearance model can be quite different depending on the relative orientation of the camera and the object (hereinafter generally referred to merely as “orientation”). A related problem is that the appearance model is determined based on the colour and texture of the object (i.e. the person) in the image. Hence, a person wearing a white shirt will have the same or a similar appearance as a person wearing a white coat.
One prior-art method for appearance-based object re-identification teaches a method of training different support vector machines (SVM) classifiers based on the orientation of persons in matched and unmatched pairs of images. Positive and negative pairs during training are separated into two categories based on orientation, i.e. similar and dissimilar orientation. Pairwise feature dissimilarity is determined and two classifiers are trained to distinguish between positive and negative pairs depending on whether the orientation is similar or different. While this method tries to overcome the change in appearance due to orientation by training separate classifiers based on same or different orientation, it has a serious drawback. A hard decision has to be made based on orientation of the images to be matched in order to select the classifier to use. Since determination of orientation itself can be error-prone, the method is heavily reliant on correct determination of orientation. Further, splitting the dataset at training time into separate groups based on similarity (or otherwise) of orientation of positive and negative pairs, reduces the number of training samples used to train each classifier.
Yet another prior-art method for appearance-based object re-identification teaches a method of determining appearance signatures for different orientations. In this method, from multiple shots of a person a database of features representing the person is constructed, in which the
features are grouped by orientation. For a set of eight orientations, the person is segmented into rigid parts and for each part a set of sparse codes is learnt. Sparse codes are determined using a dictionary of visual elements, also known as a code book. Such a dictionary is used to represent the extracted database of features from the person’s image using sparse codes. The features are represented as a linear combination of a small number of these visual elements (the smaller the number of visual elements, the sparser the representation becomes). The coefficient values used in the linear representation form the sparse code representation of the extracted feature descriptor. In this way a sparse and compact representation of the image features is obtained. Matching is done by combining codes for corresponding body parts at different orientations into a single feature vector such that the resulting feature vector for the body part is somewhat invariant to orientation. A disadvantage of this method is that it requires the capture of a person in different orientations by a camera. When multiple shots at different orientations are missing, the method performs poorly.
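By way of illustration only, the general idea of representing a feature as a sparse combination of dictionary atoms can be sketched as follows. This is a minimal numpy sketch using greedy matching pursuit, not the particular dictionary-learning method of the prior art described above; the function and parameter names are assumptions of the sketch.

```python
import numpy as np

def sparse_code(feature, dictionary, n_nonzero=5):
    """Represent `feature` as a sparse linear combination of dictionary atoms.

    dictionary: (d, k) array whose columns are visual elements (atoms).
    Returns a length-k coefficient vector with at most n_nonzero non-zero entries.
    """
    residual = feature.astype(float).copy()
    code = np.zeros(dictionary.shape[1])
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(dictionary.T @ residual)))
        atom = dictionary[:, idx]
        coef = float(atom @ residual) / float(atom @ atom)
        code[idx] += coef
        residual -= coef * atom
    return code  # the sparse code representing the extracted feature descriptor
```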
SUMMARY
It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements, or to provide a useful alternative.
Disclosed are arrangements, referred to as Orientation-aware Descriptor (OAD) arrangements, which seek to address the above problems by determining a feature descriptor incorporating the appearance of the person and the orientation of the person with respect to a camera, and geometric features of the body parts of the person.
According to a first aspect of the present disclosure, there is provided a method for matching objects of interest in video frames obtained from at least two video cameras, the method comprising the steps of:
receiving a pair of video frames;
for the object of interest in each of the pair of received frames:
determining an orientation appearance signature;
determining a colour appearance signature;
determining a geometric appearance signature; and
determining an orientation-aware appearance signature dependent upon the orientation appearance signature, the colour appearance signature, and the geometric appearance signature;
determining a distance metric depending upon the orientation-aware appearance signatures for the objects of interest in the pair of frames;
determining depending upon the distance metric a distance between the objects of interest in the pair of frames; and
matching the objects of interest in the pair of frames dependent upon the distance.
According to another aspect of the present disclosure, there is provided an apparatus for implementing any one of the aforementioned methods.
According to another aspect of the present disclosure there is provided a computer program product including a computer readable medium having recorded thereon a computer program for implementing any one of the methods described above.
Other aspects are also disclosed.
BRIEF DESCRIPTION OF THE DRAWINGS
One or more embodiments of the invention will now be described with reference to the following drawings, in which:
Fig. 1 is a simplified diagram illustrating an image of an object of interest captured by a first digital camera and an image of candidate objects captured by a second digital camera, to which OAD arrangements may be applied;
Figs. 2A and 2B form a schematic block diagram of a general purpose computer system upon which the OAD arrangements described can be practiced;
Fig. 3 is a diagram illustrating a simplified process of determining an orientation-aware descriptor according to one OAD arrangement;
Fig. 4 is an illustration of construction of an orientation-aware descriptor according to one OAD arrangement;
Fig. 5A and Fig. 5B collectively depict an example of a method of determining the orientation of the object of interest according to one OAD arrangement;
Figs. 6A and 6B collectively depict an example of a method for determining the orientation of the object of interest according to one OAD arrangement;
Fig. 7 is an illustration of body part segmentation and creation of body part descriptors according to one OAD arrangement;
Figs. 8A and 8B collectively depict body part segmentation and geometric feature extraction of body parts according to one OAD arrangement;
Fig. 9 is a schematic flow diagram showing an example of a method of determining a distance metric at training time according to one OAD arrangement;
Fig. 10 is a schematic flow diagram showing an example of a method of determining orientation-aware descriptors for a pair of images;
Fig. 11 is a schematic flow diagram showing an example of a method of determining orientation-aware descriptors for two images and matching the two images using their orientation-aware descriptors and a learned metric according to one OAD arrangement;
Fig. 12 is a schematic flow diagram showing an example of a method of determining an orientation-aware descriptor for a single image using body-part segmentation according to one OAD arrangement;
Fig. 13 is a schematic flow diagram showing an example of a method of determining an orientation-aware descriptor for a single image without using body-part segmentation according to one OAD arrangement;
Fig. 14 is a schematic flow diagram showing an example of a method of training a machine learning algorithm to detect a person’s orientation according to one OAD arrangement;
Fig. 15 is an example of a positive training pair and a negative training pair for training a machine learning algorithm according to one OAD arrangement; and
Fig. 16 is a schematic flow diagram showing an example of a method of matching a target of interest with one or more images of other people or objects according to one OAD arrangement.
DETAILED DESCRIPTION INCLUDING BEST MODE
Context
Where reference is made in any one or more of the accompanying drawings to steps and/or features, which have the same reference numerals, those steps and/or features have for the purposes of this description the same function(s) or operation(s), unless the contrary intention appears.
It is to be noted that the discussions contained in the Background section and the section above relating to prior art arrangements relate to discussions of documents or devices which may form public knowledge through their respective publication and/or use. Such discussions should not be interpreted as a representation by the present inventors or the patent applicant that such documents or devices in any way form part of the common general knowledge in the art.
The inventors have realized that the disclosed OAD arrangements enable an object of interest to be re-identified in video frames across camera views notwithstanding variations in a person’s appearance caused by changes in pose, unlike existing methods that are invariant to only some of these variations or are sensitive to noise.
The present description provides a method and system for matching objects captured on two camera views, each of which captures images of objects (such as persons) with different orientation characteristics, using an appearance signature, referred to as an “orientation aware appearance signature”.
The orientation aware appearance signature comprises a combination (typically a concatenation) of three distinct appearance signatures, namely (i) an “orientation appearance signature” which is based upon features defining the object’s orientation with respect to the camera, (ii) a “geometric appearance signature” which is based upon features defining geometric properties of object parts into which the object can be decomposed, such as a ratio between the length of the object and the length of a part of that object, and (iii) a “colour appearance signature” based upon features defining one or more of colour, texture and shape features of the object.
Fig. 1 illustrates an exemplary use case to which OAD arrangements may be applied. In this example, the goal is to determine whether a person of interest 100 observed in an image 110 of a first scene captured by a first digital camera 115, is present in an image 120 of a second scene captured by a second digital camera 125. This is a common use case in surveillance and video monitoring systems in which an OAD arrangement can be practiced. The cameras 115 and 125 are connected to a computer system 150 to which OAD arrangements may be applied. In this example, the second image 120 contains three people 130, 131 and 132, any one of which may be the person of interest 100. OAD arrangements may be applied to determine which of the three objects 130, 131 and 132 is a best match for the object of interest
100. OAD arrangements may equally be applied when images of the object of interest and candidate objects are captured by different cameras simultaneously or at different times, or captured by the same camera at different times, including images that represent the same scene or different scenes, or multiple scenes with different candidate objects.
An image, such as the image 110, is made up of “visual elements”. The terms “pixel”, “pixel location” and “image location” are used interchangeably throughout this specification to refer to one or more of the visual elements in a captured image. Each pixel of an image is described by one or more values characterising a property of the scene captured in the image. More particularly, a value such as the intensity of the pixel is taken to characterise a property of the scene. In one example, a single intensity value characterises the brightness of the scene at the pixel location. In another example, a triplet of values characterises the colour of the scene at the pixel location. Furthermore, a “region”, “image region” or “cell” in an image refers to a collection of one or more spatially adjacent visual elements.
A “feature” represents a derived parameter value or set of derived parameter values determined from the pixel values in an image region. One example of a colour appearance signature is a histogram of pixel colours and image intensity gradients within predefined spatial cells of a rectified image. Image rectification is a transformation process used to project one or more images onto a common image plane, for example by removing perspective distortion from the image. In one example, a feature is a histogram of colour values in the image region. In another example, a feature is an “edge” response value determined by estimating an intensity gradient in the region. In yet another example, a feature is a filter response, such as a Gabor filter response, determined by the convolution of pixel values in the region with a filter kernel.
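By way of illustration only, a colour-histogram feature of the kind described above might be computed along the following lines. This is a minimal numpy sketch; the bin count and the function name are assumptions of the sketch and do not form part of the described arrangement.

```python
import numpy as np

def colour_histogram(region, bins=8):
    """Concatenated per-channel colour histogram for an RGB image region.

    region: (H, W, 3) uint8 array of pixel values.
    Returns a normalised feature vector of length 3 * bins.
    """
    hists = []
    for channel in range(3):
        h, _ = np.histogram(region[:, :, channel], bins=bins, range=(0, 256))
        hists.append(h)
    feature = np.concatenate(hists).astype(float)
    return feature / max(feature.sum(), 1.0)  # normalise to a distribution
```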
Furthermore, a “feature map” assigns a feature value to each pixel in an image region. In one example, a feature map assigns a feature value which is an intensity value to each pixel in an image region. In another example, a feature map assigns a feature value which is a hue value to each pixel in an image region. In yet another example, a feature map assigns a feature value which is a Gabor filter response to each pixel in an image region.
Finally, a “feature distribution” refers to the relative frequency of feature values in a feature map, normalized by the total number of feature values. In one OAD arrangement, a feature distribution is a set of colour histograms (RGB, HSV etc.) as well as histogram of gradients features. Another example of an appearance signature is a “bag-of-words” model of quantized keypoint descriptors. Considering the example of Scale Invariant Feature Transform (SIFT) descriptors, a visual bag-of-words model can be learned by performing k-means clustering of the SIFT descriptors. The cluster centres determined by k-means can be used to produce a codebook, or dictionary, of visual words. Then image regions can be represented using the visual words rather than the original feature descriptors.
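A bag-of-words representation of the kind described above might be sketched as follows, assuming keypoint descriptors (e.g. SIFT) have already been extracted as rows of a numpy array. The use of scikit-learn’s KMeans and the chosen codebook size are assumptions of the sketch.

```python
import numpy as np
from sklearn.cluster import KMeans

def learn_codebook(descriptors, n_words=64):
    """Cluster keypoint descriptors; the cluster centres form the visual-word codebook."""
    return KMeans(n_clusters=n_words, n_init=10, random_state=0).fit(descriptors)

def bag_of_words(descriptors, kmeans):
    """Histogram of visual-word assignments for one image region."""
    words = kmeans.predict(descriptors)
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)
```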
The terms “pose”, “orientation”, “object pose”, and “object orientation” may refer to the direction of movement of an object 100 with respect to the camera. It may also refer to the direction the object is facing when an image of the object is captured from the camera. The “pose” may be a continuous value which indicates degrees of angle with respect to the camera or it could be a discrete label from a quantization of a continuous value. Examples of a discrete pose value would be “front” and “left” indicating the object is moving towards the camera or to the left of the camera respectively.
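The quantisation of a continuous pose angle into discrete orientation labels mentioned above could be sketched as follows. The eight-way labelling (a set of eight orientations is used later in the description) and the angle convention are assumptions of the sketch.

```python
ORIENTATION_LABELS = ["front", "front-right", "right", "rear-right",
                      "rear", "rear-left", "left", "front-left"]

def quantise_orientation(angle_degrees):
    """Map a continuous orientation angle (0 = facing the camera, measured
    clockwise) to one of eight discrete pose labels."""
    bin_width = 360.0 / len(ORIENTATION_LABELS)
    index = int(((angle_degrees % 360.0) + bin_width / 2) // bin_width) % len(ORIENTATION_LABELS)
    return ORIENTATION_LABELS[index]
```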
Referring to Fig. 1, in the image 120 of the scene captured from the camera 125, the object of interest 130 has an orientation of rear and left, and the object of interest 132 has an orientation of front and right.
A “bounding box” refers to a rectilinear image region enclosing an object in an image.
Referring to Fig. 1, 105 is a bounding box encompassing the object of interest 100. In one OAD arrangement, an operator of a surveillance system may define a bounding box 105 to express his intention of re-identifying that object of interest in another camera’s image.
The term “foreground mask” refers to a binary image with non-zero values at pixel locations corresponding to an object of interest. A non-zero pixel location in a foreground mask is known as a “foreground pixel”. The term “background pixel” refers to those pixels in an image (or
within the corresponding bounding box) that are not foreground pixels. The set of “background pixels” in a “foreground mask” is the “scene”. Referring to Fig. 1, the foreground mask corresponds to the image pixels of the person 100 within the bounding box 105.
As illustrated in Fig. 1, the digital cameras 115 and 125 communicate with a computer system 150. This exemplary OAD arrangement can be applied to a range of applications. In one example, the computer system 150 allows a security guard to select an object of interest, made visible on one or more video displays such as 214, using an interactive user interface (such as a keyboard 202 or a pointing device 203), and the computer system 150 returns images of one or more candidate objects determined to be the object of interest. In another example, the computer system 150 automatically selects an object of interest and matches the object across multiple distributed cameras in order to analyse the long-term behaviour of the object.
Overview of the Invention
As described above, the present description relates to methods that enable an object of interest to be matched across camera views despite variations in orientation of the person in the camera views. The OAD method performs matching by determining an orientation aware appearance signature for the object of interest by fusing (i) a colour appearance signature of the object of interest (determined using colour, texture, shape etc.), (ii) an orientation appearance signature of the object, and (iii) a geometric appearance signature of the object of interest based upon properties extracted from parts of the object of interest. In one example, the object is a person and the parts correspond to arms, legs, etc. In another example, the object is a car and the parts correspond to wheels, doors, bumpers, etc.
The OAD arrangement is described in the present specification in relation to people; however, this does not prevent the application of the OAD method to other object types. The disclosed OAD arrangements quantify the difference in appearance between pairs of images by encoding an orientation descriptor and a geometric descriptor as part of the orientation aware appearance signature.
In the example depicted in Fig. 1 the shirt of the person of interest 130 may be longer than the shirt of the person 132 (this relates to the geometric appearance signature) while the colour appearance signature (described using colour and texture) of their shirts may be identical.
Encoding the geometric properties of the shirts of 132 and 130 (this relates to the geometric
appearance signature) would enable the two objects to be distinguished even though their colour appearance signature might be similar.
In another example, the orientation of the person 100 in one camera view 115 may be different from the orientation of the same person 131 appearing in another camera view 125. In this example, the orientation appearance signatures are defined as the pose of the person as seen from the respective points of view of the cameras 115 and 125. For example, the person 100 has a frontal orientation in the view of the camera 115, and has a rear and left orientation as shown at 131 in the camera view of 125.
By encoding the orientation appearance signature along with the colour appearance signature, the OAD system is able to distinguish between persons who might otherwise appear similar in appearance due to differences in orientation, thus creating false positives.
The determination of an orientation-aware descriptor is akin to encoding camera installation geometry into the person descriptor. In other words, since the camera geometry at a particular installation is fixed, i.e. cameras 115 and 125 are installed to capture a particular view of the scene, people captured in one camera image 110 in one orientation will appear in a possibly different but consistent orientation in another camera image 120. For example, in Fig. 1, an object of interest 100 who appears in a frontal pose in the camera image 110, will appear in a rear and left pose in the camera image 120.
Fig. 3 depicts a process of determining the orientation-aware descriptor for an object of interest
310 in a scene in one OAD arrangement. In this example, the orientation-aware descriptor is computed as follows. The object of interest 310 is first detected in the scene and a bounding box 311 is determined. Then for the object of interest 310 a foreground mask 330 is determined. Within this foreground mask 330, a set 340 of different body parts of the person, such as 341 and 342, is determined. Body parts include hair, face, upper body, lower body, arms, shoes etc. In some arrangements, accessories such as bags or back packs may also be detected.
In one arrangement a set 360 of colour appearance signatures is determined for each of the detected body parts 340. One example of an appearance signature is a histogram of pixel colours and image intensity gradients within predefined spatial cells of a rectified image. Another example of a colour appearance signature is a “bag-of-words” model of quantized keypoint descriptors. Considering the example of Scale Invariant Feature Transform (SIFT) descriptors, a visual bag-of-words model can be learned by performing k-means clustering of
the SIFT descriptors. The cluster centres determined by k-means can be used to produce a codebook, or dictionary, of visual words. Then image regions can be represented using the visual words rather than the original feature descriptors.
Furthermore, for each body part in the set 340 of the body parts, a set 350 of geometric appearance signatures, such as the length of a part, colour moments, moments of area etc., is determined. Colour moments are geometric features, and they are determined using individual colour channels such as RGB.
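A minimal sketch of a geometric appearance signature for one body part is given below, under the assumption that each body part and the whole person are available as binary masks. The specific features chosen (height ratio, area ratio, and per-channel means and standard deviations as simple colour moments) and the function name are assumptions of the sketch, not the exact features of the described arrangement.

```python
import numpy as np

def geometric_signature(part_mask, body_mask, image):
    """Simple geometric and colour-moment features for one body part.

    part_mask, body_mask: (H, W) boolean arrays; image: (H, W, 3) uint8.
    """
    ys, _ = np.nonzero(part_mask)
    bys, _ = np.nonzero(body_mask)
    part_height = ys.max() - ys.min() + 1
    body_height = bys.max() - bys.min() + 1
    height_ratio = part_height / float(body_height)        # e.g. shirt length vs person height
    area_ratio = part_mask.sum() / float(body_mask.sum())  # normalised moment of area
    pixels = image[part_mask].astype(float)                # (N, 3) colour samples of the part
    colour_means = pixels.mean(axis=0) / 255.0             # first colour moment per channel
    colour_stds = pixels.std(axis=0) / 255.0               # second colour moment per channel
    return np.concatenate([[height_ratio, area_ratio], colour_means, colour_stds])
```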
Further an orientation appearance signature 320 of the object of interest 310 is also estimated. The orientation appearance signature may be encoded as a probability distribution of a set of orientations.
Finally, using the set 360 of colour appearance signatures of the body parts (e.g. 341, 342, ...), the orientation appearance signature 320 of the object of interest 310, and the set 350 of geometric appearance signatures of the body parts, an orientation-aware appearance descriptor (also referred to as an orientation aware appearance signature) 370 is determined.
In one OAD arrangement, the appearance signatures 320, 350 and 360 are concatenated together to form the orientation aware appearance signature 370 for the whole image 310. In one OAD arrangement, the set 350 of geometric appearance signatures of the set 340 of the body parts are concatenated with the set 360 of colour appearance signatures of the set 340 of body parts. In another OAD arrangement, instead of determining the set 340 of body parts, a colour appearance signature 360 may be determined based on all the pixels (for example, dividing up the image into horizontal overlapping stripes) within the foreground mask 330 and the orientation-aware appearance signature 370 may be determined based on the colour appearance signature of the entire image and the orientation appearance signature.
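A minimal sketch of assembling the orientation-aware appearance signature 370 by concatenation, as described above, is given below. The per-part signature functions are the illustrative ones sketched earlier, and the orientation probability vector is assumed to come from an orientation classifier; the function name is an assumption of the sketch.

```python
import numpy as np

def orientation_aware_signature(part_colour_sigs, part_geometric_sigs, orientation_probs):
    """Concatenate colour, geometric and orientation appearance signatures
    into a single orientation-aware descriptor (cf. item 370).

    part_colour_sigs / part_geometric_sigs: lists of 1-D arrays, one per body part.
    orientation_probs: 1-D array giving the probability of each discrete orientation.
    """
    parts = part_colour_sigs + part_geometric_sigs + [np.asarray(orientation_probs, dtype=float)]
    return np.concatenate(parts)
```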
The determination of orientation-aware appearance signature is further described with reference to an example depicted in Fig. 4.
Fig. 4 depicts construction of an orientation-aware descriptor according to one OAD arrangement. For an object of interest 420, an orientation 425 is determined. In one OAD arrangement, this orientation may be one of a set 425 of eight different orientations. The orientation of the object of interest is determined with respect to the camera view capturing an image of the object of interest 420. Based on the determined orientation 425, an orientation
appearance signature (alternately referred to as an orientation appearance descriptor) 450 is determined.
Subsequently, for the detected body parts 430 and 440, colour appearance signatures 460 and 470, and geometric appearance descriptors 465 and 475 respectively are determined.
Finally, an orientation-aware appearance descriptor 480 is determined by concatenating the individual descriptors 460, 465, 470, 475 and 450.
Embodiments (with examples and alternatives)
Figs. 2A and 2B depict a general-purpose computer system 150, upon which the various OAD arrangements described can be practiced.
As seen in Fig. 2A, the computer system 150 includes: a computer module 201; input devices such as a keyboard 202, a mouse pointer device 203, a scanner 226, one or more cameras such as the cameras 115 and 125, and a microphone 280; and output devices including a printer 215, a display device 214 and loudspeakers 217. An external Modulator-Demodulator (Modem) transceiver device 216 may be used by the computer module 201 for communicating to and from remote cameras such as 116 over a communications network 220 via a connection 221. The communications network 220 may be a wide-area network (WAN), such as the Internet, a cellular telecommunications network, or a private WAN. Where the connection 221 is a telephone line, the modem 216 may be a traditional “dial-up” modem. Alternatively, where the connection 221 is a high capacity (e.g., cable) connection, the modem 216 may be a broadband modem. A wireless modem may also be used for wireless connection to the communications network 220.
The computer module 201 typically includes at least one processor unit 205, and a memory unit 206. For example, the memory unit 206 may have semiconductor random access memory (RAM) and semiconductor read only memory (ROM). The computer module 201 also includes a number of input/output (I/O) interfaces including: an audio-video interface 207 that couples to the video display 214, loudspeakers 217 and microphone 280; an I/O interface 213 that couples to the keyboard 202, mouse 203, scanner 226, cameras 115 and 125 and optionally a joystick or other human interface device (not illustrated); and an interface 208 for the external modem 216 and the printer 215. In some implementations, the modem 216 may be incorporated within the computer module 201, for example within the interface 208. The
computer module 201 also has a local network interface 211, which permits coupling of the computer system 150 via a connection 223 to a local-area communications network 222, known as a Local Area Network (LAN). As illustrated in Fig. 2A, the local communications network 222 may also couple to the wide network 220 via a connection 224, which would typically include a so-called “firewall” device or device of similar functionality. The local network interface 211 may comprise an Ethernet circuit card, a Bluetooth® wireless arrangement or an IEEE 802.11 wireless arrangement; however, numerous other types of interfaces may be practiced for the interface 211.
The I/O interfaces 208 and 213 may afford either or both of serial and parallel connectivity, the former typically being implemented according to the Universal Serial Bus (USB) standards and having corresponding USB connectors (not illustrated). Storage devices 209 are provided and typically include a hard disk drive (HDD) 210. Other storage devices such as a floppy disk drive and a magnetic tape drive (not illustrated) may also be used. An optical disk drive 212 is typically provided to act as a non-volatile source of data. Portable memory devices, such as optical disks (e.g., CD-ROM, DVD, Blu-ray Disc™), USB-RAM, portable external hard drives, and floppy disks, for example, may be used as appropriate sources of data (depicted as 225) to the system 150.
The components 205 to 213 of the computer module 201 typically communicate via an interconnected bus 204 and in a manner that results in a conventional mode of operation of the computer system 150 known to those in the relevant art. For example, the processor 205 is coupled to the system bus 204 using a connection 218. Likewise, the memory 206 and optical disk drive 212 are coupled to the system bus 204 by connections 219. Examples of computers on which the described arrangements can be practised include IBM-PCs and compatibles, Sun Sparcstations, Apple Mac™ or similar computer systems.
The OAD method may be implemented using the computer system 150 wherein the processes of Figs. 9-14, to be described, may be implemented as one or more OAD software application programs 233 executable within the computer system 150. In particular, the steps of the OAD method are effected by instructions 231 (see Fig. 2B) in the software 233 that are carried out within the computer system 150. The software instructions 231 may be formed as one or more code modules, each for performing one or more particular tasks. The software may also be divided into two separate parts, in which a first part and the corresponding code modules
performs the OAD methods and a second part and the corresponding code modules manage a user interface between the first part and the user.
The OAD software may be stored in a computer readable medium, including the storage devices described below, for example. The software is loaded into the computer system 150 from the computer readable medium, and then executed by the computer system 150. A computer readable medium having such software or computer program recorded on the computer readable medium is a computer program product. The use of the computer program product in the computer system 150 preferably effects an advantageous apparatus for implementing the OAD method.
The software 233 is typically stored in the HDD 210 or the memory 206. The software is loaded into the computer system 150 from a computer readable medium (e.g. 225), and executed by the computer system 150. Thus, for example, the software 233 may be stored on the optically readable disk storage medium (e.g., CD-ROM) 225 that is read by the optical disk drive 212. A computer readable medium having such software or computer program recorded on it is a computer program product. The use of the computer program product in the computer system 150 preferably effects an apparatus for practicing the OAD arrangements.
In some instances, the OAD application programs 233 may be supplied to the user encoded on one or more CD-ROMs 225 and read via the corresponding drive 212, or alternatively may be read by the user from the networks 220 or 222. Still further, the software can also be loaded into the computer system 150 from other computer readable media. Computer readable storage media refers to any non-transitory tangible storage medium that provides recorded instructions and/or data to the computer system 150 for execution and/or processing. Examples of such storage media include floppy disks, magnetic tape, CD-ROM, DVD, Blu-ray™ Disc, a hard disk drive, a ROM or integrated circuit, USB memory, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external of the computer module 201. Examples of transitory or non-tangible computer readable transmission media that may also participate in the provision of software, application programs, instructions and/or data to the computer module 201 include radio or infra-red transmission channels as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like.
The second part of the application programs 233 and the corresponding code modules mentioned above may be executed to implement one or more graphical user interfaces (GUIs) to be rendered or otherwise represented upon the display 214. Through manipulation of typically the keyboard 202 and the mouse 203, a user of the computer system 150 and the application may manipulate the interface in a functionally adaptable manner to provide controlling commands and/or input to the applications associated with the GUI(s). Other forms of functionally adaptable user interfaces may also be implemented, such as an audio interface utilizing speech prompts output via the loudspeakers 217 and user voice commands input via the microphone 280.
Fig. 2B is a detailed schematic block diagram of the processor 205 and a “memory” 234. The memory 234 represents a logical aggregation of all the memory modules (including the HDD 209 and semiconductor memory 206) that can be accessed by the computer module 201 in Fig. 2A.
When the computer module 201 is initially powered up, a power-on self-test (POST) program 250 executes. The POST program 250 is typically stored in a ROM 249 of the semiconductor memory 206 of Fig. 2A. A hardware device such as the ROM 249 storing software is sometimes referred to as firmware. The POST program 250 examines hardware within the computer module 201 to ensure proper functioning and typically checks the processor 205, the memory 234 (209, 206), and a basic input-output systems software (BIOS) module 251, also typically stored in the ROM 249, for correct operation. Once the POST program 250 has run successfully, the BIOS 251 activates the hard disk drive 210 of Fig. 2A. Activation of the hard disk drive 210 causes a bootstrap loader program 252 that is resident on the hard disk drive 210 to execute via the processor 205. This loads an operating system 253 into the RAM memory 206, upon which the operating system 253 commences operation. The operating system 253 is a system level application, executable by the processor 205, to fulfil various high level functions, including processor management, memory management, device management, storage management, software application interface, and generic user interface.
The operating system 253 manages the memory 234 (209, 206) to ensure that each process or application running on the computer module 201 has sufficient memory in which to execute without colliding with memory allocated to another process. Furthermore, the different types of memory available in the system 150 of Fig. 2A must be used properly so that each process can run effectively. Accordingly, the aggregated memory 234 is not intended to illustrate how
particular segments of memory are allocated (unless otherwise stated), but rather to provide a general view of the memory accessible by the computer system 150 and how such is used.
As shown in Fig. 2B, the processor 205 includes a number of functional modules including a control unit 239, an arithmetic logic unit (ALU) 240, and a local or internal memory 248, sometimes called a cache memory. The cache memory 248 typically includes a number of storage registers 244 - 246 in a register section. One or more internal busses 241 functionally interconnect these functional modules. The processor 205 typically also has one or more interfaces 242 for communicating with external devices via the system bus 204, using a connection 218. The memory 234 is coupled to the bus 204 using a connection 219.
The OAD application program 233 includes a sequence of instructions 231 that may include conditional branch and loop instructions. The program 233 may also include data 232 which is used in execution of the program 233. The instructions 231 and the data 232 are stored in memory locations 228, 229, 230 and 235, 236, 237, respectively. Depending upon the relative size of the instructions 231 and the memory locations 228-230, a particular instruction may be stored in a single memory location as depicted by the instruction shown in the memory location 230. Alternately, an instruction may be segmented into a number of parts each of which is stored in a separate memory location, as depicted by the instruction segments shown in the memory locations 228 and 229.
In general, the processor 205 is given a set of instructions which are executed therein. The processor 205 waits for a subsequent input, to which the processor 205 reacts by executing another set of instructions. Each input may be provided from one or more of a number of sources, including data generated by one or more of the input devices 202, 203, data received from an external source across one of the networks 220, 222, data retrieved from one of the storage devices 206, 209 or data retrieved from a storage medium 225 inserted into the corresponding reader 212, all depicted in Fig. 2A. The execution of a set of the instructions may in some cases result in output of data. Execution may also involve storing data or variables to the memory 234.
The disclosed OAD arrangements use input variables 254, which are stored in the memory 234 in corresponding memory locations 255, 256, 257. The OAD arrangements produce output variables 261, which are stored in the memory 234 in corresponding memory locations 262, 263, 264. Intermediate variables 258 may be stored in memory locations 259, 260, 266 and 267.
Referring to the processor 205 of Fig. 2B, the registers 244, 245, 246, the arithmetic logic unit (ALU) 240, and the control unit 239 work together to perform sequences of micro-operations needed to perform “fetch, decode, and execute” cycles for every instruction in the instruction set making up the program 233. Each fetch, decode, and execute cycle comprises:

• a fetch operation, which fetches or reads an instruction 231 from a memory location 228, 229, 230;

• a decode operation in which the control unit 239 determines which instruction has been fetched; and

• an execute operation in which the control unit 239 and/or the ALU 240 execute the instruction.
Thereafter, a further fetch, decode, and execute cycle for the next instruction may be executed. Similarly, a store cycle may be performed by which the control unit 239 stores or writes a value to a memory location 232.
Each step or sub-process in the processes of Figs. 9-14 is associated with one or more segments of the program 233 and is performed by the register section 244, 245, 246, the ALU 240, and the control unit 239 in the processor 205 working together to perform the fetch, decode, and execute cycles for every instruction in the instruction set for the noted segments of the program 233.
The OAD method may alternatively be implemented in dedicated hardware such as one or more integrated circuits performing the OAD functions or sub functions. Such dedicated hardware may include graphic processors, digital signal processors, or one or more microprocessors and associated memories, and may reside on platforms such as video cameras.
Fig. 9 shows an example of a method 900 of learning a distance metric for comparing objects in an image using an orientation-aware descriptor, according to one OAD arrangement. In one example, the matching method 900 is used for learning a distance metric for comparing objects in an image with the intention of identifying an object of interest. The method 900 may be implemented as one or more software code modules of the software application program 233 resident in the hard disk drive 210 and being controlled in its execution by the processor 205.
Fig. 10 shows an example of a method 1000 of determining orientation-aware signatures for a pair of images, according to one OAD arrangement. The method 1000 may be implemented as one or more software code modules of the software application program 233 resident in the hard disk drive 210 and being controlled in its execution by the processor 205.
Fig. 11 shows an example of a method 1100 of matching objects between images using an orientation-aware descriptor, according to one OAD arrangement. In one example, the matching method 1100 is used to compute the distance between images of two objects of interest using the distance metric learned by the process 900. The method 1100 may be implemented as one or more software code modules of the software application program 233 resident in the hard disk drive 210 and being controlled in its execution by the processor 205.
Fig. 12 shows an example of a method 1200 of computing an orientation-aware descriptor given an object of interest, according to one OAD arrangement. The method 1200 may be implemented as one or more software code modules of the software application program 233 resident in the hard disk drive 210 and being controlled in its execution by the processor 205.
Fig. 13 shows an example of a method 1300 of computing an orientation-aware descriptor given an object of interest, according to one OAD arrangement. The method 1300 may be implemented as one or more software code modules of the software application program 233 resident in the hard disk drive 210 and being controlled in its execution by the processor 205. The method 1300 is an alternative embodiment of method 1200.
Returning to Fig. 9, the process 900 starts at step 910, performed by the processor 205 executing the software program 233, which inputs labelled images 911 of “target training pairs” (also referred to as training pairs) to the system 150. Typically, the labelled target training pairs are prepared manually by a human operator of a surveillance system. The target training pairs 911 contain positive training pairs, which are pairs of images of the same persons (or identities), as well as negative training pairs, which are pairs of images of different persons. Referring to Fig. 15, 942 is a pair of positive training images containing images 1511 and 1512, and 943 is a pair of negative training images containing images 1511 and 1522. In one OAD arrangement, these labelled image pairs 911 represent training data for learning a distance metric (such as a Mahalanobis distance) for determining the distance between target image pairs. A well-known method for learning a distance metric from labelled positive and negative samples of images is the KISSME algorithm.
KISSME metric learning learns a distance metric of the form given by Equation 1 below, where $x_i$ and $x_j$ are orientation aware descriptors calculated according to an OAD arrangement, $(x_i - x_j)$ represents the difference between the two orientation aware descriptors, and $\rho(x_i, x_j)$ is the measured distance between the two orientation aware descriptors:

$$\rho(x_i, x_j) = (x_i - x_j)^T A (x_i - x_j) \qquad \text{(Equation 1)}$$

where $A$ is chosen to produce a distance proportional to the log-likelihood ratio defined by Equation 2 as follows:

$$\rho(x_i, x_j) = \log\!\left(\frac{p(x_i, x_j \mid y_i \neq y_j)}{p(x_i, x_j \mid y_i = y_j)}\right) \qquad \text{(Equation 2)}$$

where $y_i$ and $y_j$ label the identity of the probe and model respectively. Assuming the difference between pairs of samples is Gaussian distributed with zero mean, the above likelihood ratio can be expressed as shown in Equation 3 as follows:

$$\rho(x_i, x_j) = (x_i - x_j)^T \left(\Sigma_{y_i = y_j}^{-1} - \Sigma_{y_i \neq y_j}^{-1}\right)(x_i - x_j) + c \qquad \text{(Equation 3)}$$

where $\Sigma_{y_i = y_j}$ and $\Sigma_{y_i \neq y_j}$ are covariance matrices for matching pairs (positive pairs) and non-matching pairs (negative pairs), and $c$ is a constant that can be discarded for the purpose of computing the distance $\rho(x_i, x_j)$. Comparing Equation 1 and Equation 3, the desired distance metric is obtained by choosing $A$ as the difference between the inverses of the covariance matrices of positive pairs and negative pairs, as shown in Equation 4:

$$A = \Sigma_{y_i = y_j}^{-1} - \Sigma_{y_i \neq y_j}^{-1} \qquad \text{(Equation 4)}$$
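A compact numerical sketch of the KISSME metric learning described by Equations 1 to 4 is given below, using numpy. The small regularisation added before inversion and the function names are implementation assumptions of the sketch, not part of the described arrangement.

```python
import numpy as np

def learn_kissme_metric(positive_pairs, negative_pairs, eps=1e-6):
    """Learn A = inv(Sigma_pos) - inv(Sigma_neg) from descriptor pairs (Equation 4).

    positive_pairs / negative_pairs: lists of (x_i, x_j) tuples of 1-D descriptors.
    """
    def covariance_of_differences(pairs):
        diffs = np.stack([xi - xj for xi, xj in pairs])  # (N, d) descriptor differences
        return diffs.T @ diffs / len(pairs)              # zero-mean covariance of differences
    d = len(positive_pairs[0][0])
    sigma_pos = covariance_of_differences(positive_pairs) + eps * np.eye(d)
    sigma_neg = covariance_of_differences(negative_pairs) + eps * np.eye(d)
    return np.linalg.inv(sigma_pos) - np.linalg.inv(sigma_neg)  # the distance metric A

def kissme_distance(x_i, x_j, A):
    """Distance of Equation 1: (x_i - x_j)^T A (x_i - x_j)."""
    diff = np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)
    return float(diff @ A @ diff)
```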
Then the process 900 proceeds to a following step 920, performed by the processor 205 executing the software program 233 and described hereinafter in more detail with reference to Fig. 10. The step 920 creates orientation-aware appearance signatures 921 for each image in a single pair of the images 911. After the step 920, the process 900 proceeds to a step 930, performed by the processor 205 executing the software program 233, which checks if all training pairs in the data 911 have been processed. If “yes”, then the process 900 follows a YES arrow and proceeds to a step 940. If “no”, then the process 900 follows a NO arrow and proceeds to the step 920 which processes the next pair of images from the data 911 to create orientation-aware appearance signatures for the pair.
The step 940, performed by the processor 205 executing the software program 233, learns (i.e. determines) a distance metric 941 based on all pairs of images in the data 911. In one OAD arrangement, the distance metric 941 is learned using the KISSME algorithm. In the KISSME algorithm, the training image pairs 911 are separated into positive examples such as 942 and negative examples such as 943. Positive examples are training pairs of the same object of interest. Negative examples are training pairs of different objects of interest. Referring to Fig. 15, 942 is a pair of positive training images containing images 1511 and 1512, and 943 is a pair of negative training images containing images 1511 and 1522. Typically, a surveillance operator may collect images of persons using cameras and organise the collected images into positive and negative training pairs. This can be done either manually through visual inspection of images, or a surveillance system may be used to assist the human operator by suggesting a set of images as candidate positive and negative matches.
For the positive training pairs such as 942, a difference of the corresponding orientation-aware appearance signatures 921 of the images in the pair is determined for each pair 942 and a covariance matrix (referred to as a positive pair covariance matrix such as 944) of the differences is determined. Similarly, for the negative training pairs, a difference of the corresponding orientation-aware appearance signatures 921 is determined for each image in the pair, and a covariance matrix (referred to as a negative pair covariance matrix 945) of the differences is determined. If $x_i$ is an orientation aware feature descriptor representing one image of a training pair (either a positive or a negative training pair) and $x_j$ is the orientation aware feature descriptor of the second image of the training pair, then $(x_i - x_j)$ represents the difference between the two orientation aware feature descriptors. Further, at the step 940, a covariance matrix 944 based on all the positive training pairs and a covariance matrix 945 based on all the negative training pairs are determined.
The distance metric 941 is then determined as the difference between the inverse of the positive pairs covariance matrix 944 and the inverse of the negative pairs covariance matrix 945. After the step 940, the process 900 proceeds to a step 999 where the process ends. The step 940 is performed once, using the orientation-aware descriptors of all training pairs.
Fig. 10 is a schematic flow diagram showing an example of a process 1000 of determining orientation-aware descriptors for a pair of images. The process 1000 of Fig. 10 starts at a step 1010, performed by the processor 205 executing the software program 233, which fetches a first image 1011 of an image pair from the data 911. Then the process 1000 proceeds to step 1020,
described hereinafter in more detail with reference to Fig. 12. The step 1020, performed by the processor 205 executing the software program 233, determines an orientation-aware appearance signature 1021 for the first image 1011. Then the process 1000 proceeds to step 1030, performed by the processor 205 executing the software program 233, where a second image
1031 of the pair of images is fetched or accessed. Then the process 1000 proceeds to a following step 1040. The step 1040, performed by the processor 205 executing the software program 233, determines an orientation-aware descriptor 1041 for the second image 1031 of the pair as described in detail by the process 1200 of Fig. 12. After step 1040, the process 1000 proceeds to step 1099 where the process ends.
Fig. 11 is a schematic flow diagram showing an example of a method 1100 of determining orientation-aware descriptors for two images and matching the two images using their orientation-aware descriptors and a learned metric according to one OAD arrangement. The method 1100 is described with respect to Fig. 1.
The process 1100 of Fig. 11 starts at a step 1110, performed by the processor 205 executing the software program 233. The step 1110 inputs a pair of images 1111 to be matched using an orientation-aware appearance signature. One of the pair of images to match may be provided by an operator of a surveillance system. In one OAD arrangement, an operator may tag a person 100 to match. The surveillance system may determine several candidate persons (e.g. 130, 131 and 132) to determine a match for the person 100. Thus the pair of images 1111 at step 1110 may be made up of the person 100 and one of the persons 130, 131 or 132. In one OAD arrangement, the pair of images 1111 may be extracted from the two non-overlapping camera images 110 and 120 captured from cameras 115 and 125 respectively. In this OAD arrangement, the object of interest 100 from the captured camera image 110 is one of the images of the image pair 1111 at the step 1110. In a surveillance system where an OAD arrangement is implemented, an operator may select or tag the object of interest 100 to find a matching target (i.e. a matching object of interest) in a different camera view 120. In another OAD arrangement, the target of interest 100 may be automatically selected by the system because of suspicious behaviour of 100. In either case, it is understood that selecting a target of interest involves determining at least a bounding box 105 encompassing the target. Such a bounding box may be determined using a pedestrian detection algorithm. In other OAD arrangements, a head detection algorithm may be used and, by applying a suitable aspect ratio, the bounding box for the person may be determined based on a bounding box of the head region. Similarly, the second image of the image pair 1111 at the step 1110 may be extracted from the image 120 from the second camera 125. In one OAD
arrangement, it may be one of the following objects of interest 130, 131 or 132. In one example, the object of interest 100 is compared with another object of interest 131.
Once the two images to make the pair 1111 are selected, the process 1100 proceeds to step 1120, performed by the processor 205 executing the software program 233, which determines orientation-aware appearance signatures 1121 for the image pair 1111 as described in detail by the process 1000 of Fig. 10.
Then the process proceeds to a step 1130, performed by the processor 205 executing the software program 233. The step 1130 determines a distance 1131 between the two targets 100 and 131 (i.e. the distance between the two images making up the pair 1111 of images) based on their orientation-aware appearance signatures 1121 from step 1120 and the learned distance metric 941 (e.g. learned using KISSME) from the step 940 of the process 900. The process 1100 is repeated for each pair of images; the step 1130 applies to a single pair. At the step 1110 the input is a single pair of images, in contrast with the process 900, where the input is several pairs of images, some of which are known to be of the same person and some of which are known to be of different persons. Then the process 1100 proceeds to step 1140, performed by the processor 205 executing the software program 233, which matches the input pair of images 1111 based on the distance 1131 computed using the learned distance metric 941 and the orientation-aware appearance signatures 1121. The distance between the two images is calculated using Equation 1. Referring to Equation 1, the matrix A is replaced with the distance metric 941 which was learned at step 940 of the process 900. According to the distance measure, the smaller the value of the distance measure, the more similar the images of a given pair are.
In one OAD arrangement, the distance 1131 is determined between the object of interest 100 and each of the targets 130, 131 and 132, and the target located at the shortest distance to the object of interest is chosen as the matching object.
In another OAD arrangement, a threshold may be used to select one or more matching objects. In this arrangement, any target at a distance below a certain threshold may be considered a potential match. In the example of Fig. 1, the object of interest 100 may have the least distance to the object of interest 131 and thus 131 may be considered as a match for 100. This process is described in detail as part of the process 1600 of Fig. 16. In one OAD arrangement, the result of the step 1140, which is the Match Result 1141, is a Boolean (true or false) value which indicates whether the input pair of images 1111 match or not, based on comparing the distance 1131 against a threshold.
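The two selection strategies described above may be sketched as follows (the candidate identifiers and the threshold value are illustrative assumptions only):

```python
def best_match(distances, threshold=None):
    """Select the closest candidate and, optionally, a Boolean match result 1141.

    distances : dict mapping candidate id -> distance 1131 to the object of interest.
    threshold : optional maximum distance for declaring a match.
    """
    candidate = min(distances, key=distances.get)        # shortest distance wins
    is_match = True if threshold is None else distances[candidate] <= threshold
    return candidate, is_match

# Example: candidates 130, 131 and 132 compared against the object of interest 100.
print(best_match({130: 4.2, 131: 1.3, 132: 3.8}, threshold=2.0))   # -> (131, True)
```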
Fig. 16 is a schematic flow diagram showing an example of a method 1600 of matching a target of interest with one or more images of other people or objects according to one OAD arrangement. The process 1600 describes matching a target of interest 1611 captured from one camera (e.g. a target of interest may be the person 100 of the scene 110) with one or more images of people from a different camera (e.g. 130, 131, 132 of the scene 120). The process 1600 starts at a step 1610, performed by the processor 205 executing the software program 233, where an image of the target of interest is provided to the process 1600. This is usually done by a human surveillance operator who tags a person as a target of interest to find that person in images captured from other cameras attached to the surveillance system. Then the process 1600 proceeds to a step 1620, performed by the processor 205 executing the software program 233, where the process 1600 may detect one or more images of other objects (people) in a different camera image (e.g. 120 captured from the camera 125). In one OAD arrangement, a pedestrian detection algorithm may be used for this purpose. In other OAD arrangements, a head detection algorithm may be used and, by applying a suitable aspect ratio, the bounding box for the person may be determined based on a bounding box of the head region. Using such an algorithm, persons 130, 131 and 132 may be detected in the image 120.
Then the process 1600 proceeds to a step 1640, performed by the processor 205 executing the software program 233. At the step 1640, the process creates a list of pairs of images 1641 for matching. For example, the pairs (100, 130), (100, 131) and (100, 132) may be created. Then the process 1600 proceeds to a step 1650, performed by the processor 205 executing the software program 233. The step 1650 is implemented by the process 1100 of Fig. 11. After executing the step 1650, the process 1600 proceeds to a step 1660, performed by the processor 205 executing the software program 233, where it checks if a match has been found based on the step 1650. If “yes”, the process 1600 proceeds to a step 1699, performed by the processor 205 executing the software program 233, where the process 1600 ends. If, at the step 1660, the process 1600 determines that a match has not been found, then the process 1600 proceeds to a step 1670, performed by the processor 205 executing the software program 233, where it checks if all image pairs in the list of image pairs 1641 have been processed. If “yes”, then the process 1600 proceeds to the step 1699 where it ends. If “no”, then the process 1600 proceeds to the step 1650 where the next image pair from the list of image pairs 1641 is processed.
Returning to Fig. 11, it is noted that the method 1100 is executed with a single pair of images as input. However, the process 1100 may be called by another process in a loop for several pairs of images, as shown in Fig. 16.
Fig. 12 is a schematic flow diagram showing an example of a method 1200 of determining an orientation-aware appearance signature for a single image using body-part segmentation according to one OAD arrangement. The process 1200 of Fig. 12 will now be described with reference to examples from Figs. 5A, 5B, 6A, 6B, 7 and 8.
The process 1200 describes the method 1200 of creating an orientation-aware descriptor according to one OAD arrangement. The process 1200 starts at a step 1205, performed by the processor 205 executing the software program 233, which receives an image 1206 of a person within a bounding box. Referring back to Fig. 1, the process 1200 receives the cut out comprising the contents of the bounding box 105 from the image 110 of the person 100. Then the process 1200 proceeds to a step 1210, performed by the processor 205 executing the software program 233, which determines a foreground mask 1211 (ie pixels corresponding to the person’s body) for the target of interest 100.
In one OAD arrangement, the foreground mask 1211 may be determined through a process of background subtraction. In this method, a Gaussian mixture model (GMM) of the colour distribution of a scene is learned using one or more images from a video of the scene with no foreground objects in the scene. The GMM model is used to segment images of the scene from the same camera. In another OAD arrangement, a scene model for a static scene and static camera is learned based on the colour distribution of the scene using a collection of images devoid of foreground objects. The scene model is used to determine a segmentation of an image that is subsequently used to initialize an energy-based segmentation algorithm. The output of the step 1210 ie the foreground mask 1211 is a binary mask for the foreground pixels of the target of interest 100. After determining the person foreground mask 1211, the process 1200 proceeds to a step 1220, performed by the processor 205 executing the software program 233.
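Returning to the step 1210, the background-subtraction variant just described may be sketched as follows using OpenCV’s Gaussian-mixture background subtractor (illustrative only; the file names, parameter values and threshold below are assumptions and do not form part of the described OAD arrangements):

```python
import cv2

# Learn the scene model from frames known to contain no foreground objects.
subtractor = cv2.createBackgroundSubtractorMOG2(history=200, detectShadows=False)
for path in ["empty_scene_000.png", "empty_scene_001.png"]:      # hypothetical files
    subtractor.apply(cv2.imread(path), learningRate=0.05)

# Segment a frame containing the target; foreground pixels inside the person's
# bounding box form the binary foreground mask 1211.
frame = cv2.imread("frame_with_person.png")                      # hypothetical file
fg_mask = subtractor.apply(frame, learningRate=0.0)              # 0 = keep model fixed
fg_mask = cv2.threshold(fg_mask, 127, 255, cv2.THRESH_BINARY)[1]
```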
The step 1220 determines an orientation appearance signature 1221 of the person with respect to a camera. There are several methods available to determine the orientation of a person. In one OAD arrangement, a feature vector is computed from the pixels corresponding to the foreground mask 1211 where the feature vector is a histogram of orientation gradients. In another arrangement, the feature vector is the scale invariant feature vector. This feature vector can be used as input to a machine learning classification algorithm that returns an orientation of
the object with respect to the camera. An implementation of a method for training a suitable machine learning classification algorithm is described with reference to Fig. 14. In one OAD arrangement, the orientation 1221 of the object of interest is determined as a set of pose probabilities.
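Assuming a classifier trained as described with reference to Fig. 14 and exposed here as pose_clf, one possible sketch of this step is given below (the helper names and the 64 × 128 crop size are assumptions introduced for illustration):

```python
import cv2
import numpy as np
from skimage.feature import hog

def pose_probabilities(person_crop_bgr, fg_mask, pose_clf):
    """Return a vector of pose probabilities 1221 for one detected person.

    person_crop_bgr : bounding-box crop of the person (H x W x 3, BGR).
    fg_mask         : 8-bit binary foreground mask 1211 for the same crop.
    pose_clf        : trained classifier exposing predict_proba (e.g. an SVM).
    """
    # Keep only foreground pixels, then compute a HOG feature vector.
    masked = cv2.bitwise_and(person_crop_bgr, person_crop_bgr, mask=fg_mask)
    gray = cv2.cvtColor(cv2.resize(masked, (64, 128)), cv2.COLOR_BGR2GRAY)
    features = hog(gray, orientations=8, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    return pose_clf.predict_proba(features.reshape(1, -1))[0]   # e.g. 8 quantized poses
```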
Referring to Fig. 5A as an example, an object of interest 520 is detected in a scene 500. The result of orientation estimation is illustrated in Fig. 5B. The object of interest 520 is determined to be at an orientation 570 of around 225° or, in other words, has a pose of rear-left with respect to the camera. Since there is always some uncertainty associated with the orientation determination, the returned orientation 590 is a set of pose probabilities which typically add up to 1. In this OAD arrangement, the orientation wheel 560 shows 8 quantized orientations and the orientation vector 590 correspondingly shows 8 entries corresponding to the 8 quantized orientations. The first entry in the vector corresponds to the orientation 225° and shows a probability value of 0.5, indicating that there is a 50% likelihood that the orientation of the object of interest is 225°. In other arrangements, a different quantization of orientation (such as 4, 6, 16 etc.) may be used. When the number of quantized orientations 560 is 8, the pose probabilities vector 590 has 8 entries, where each entry represents a particular quantized direction and the probability that the object of interest 520 is in that direction.
Referring to Fig. 6A as another example, the object of interest 605 is detected in the scene 600. The result of orientation estimation is illustrated in Fig. 6B. The object of interest 605 is determined to be at an orientation 660 of around 0° or, in other words, has a frontal pose with respect to the camera. In this OAD arrangement, the returned orientation 690 is still represented as a set of pose probabilities which typically add up to 1. The pose probability 695, which corresponds to the orientation 660 and has the highest value, is set to the value 1 (ie 696) in a modified pose probabilities vector 691. This is tantamount to selecting the most dominant pose from the set of pose probabilities determined at this step. In other OAD arrangements, instead of selecting the pose probability corresponding to the highest value, the top N (where N is 1, 2, 3, etc.) pose probabilities may be considered, their probabilities suitably adjusted so that they still add up to 1, and the rest discarded. It is noted that the first entry 697 in the pose probabilities vector 690 corresponds to the orientation of around 225° (ie 698).
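A minimal sketch of the dominant-pose and top-N variants described above, assuming an 8-entry probability vector, is:

```python
import numpy as np

def keep_top_n_poses(pose_probs, n=1):
    """Keep the n most likely quantized orientations and renormalise to sum to 1."""
    probs = np.asarray(pose_probs, dtype=float)
    top = np.argsort(probs)[-n:]            # indices of the n largest probabilities
    kept = np.zeros_like(probs)
    kept[top] = probs[top]
    return kept / kept.sum()

print(keep_top_n_poses([0.5, 0.2, 0.1, 0.05, 0.05, 0.04, 0.03, 0.03], n=1))
# -> [1. 0. 0. 0. 0. 0. 0. 0.], ie the dominant pose set to 1 as in the vector 691
```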
Returning to Fig. 12, alternatively the step 1220 may also be implemented using deep convolutional neural networks (CNN). In one OAD arrangement, a prior-art method can be used, where a CNN can be jointly trained to recognize several attributes of a person in an image.
These attributes include gender, orientation and accessories (e.g. bag in hand, bag on shoulder, etc.). Such a CNN can be trained using several thousand images of pedestrians in various orientations. In the step 1220, such a trained CNN may be used to determine the pose probabilities 590.
After the step 1220, the process 1200 proceeds to a step 1230, performed by the processor 205 executing the software program 233, which performs body part segmentation to produce segments 1231. For body part segmentation, a method of semantic segmentation may be used to determine the parts of the body of a person of interest. Well known CNNs which perform classification tasks, such as AlexNet, GoogLeNet etc., can be adapted to perform a segmentation task. A skip architecture is used that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations.
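One possible sketch of this step, assuming a skip-architecture segmentation network (here torchvision’s FCN backbone) that has been fine-tuned to predict the body-part labels of Fig. 7 (an off-the-shelf pretrained model would normally segment whole persons rather than body parts, so the weights and class count are assumptions):

```python
import torch
from torchvision.models.segmentation import fcn_resnet50

# Assumed: 7 part classes (background, hair, face, upper body, lower body, two shoes),
# with weights obtained by fine-tuning on part-labelled pedestrian images.
NUM_PART_CLASSES = 7
model = fcn_resnet50(num_classes=NUM_PART_CLASSES)
model.eval()

def segment_parts(person_crop_rgb):
    """person_crop_rgb: float tensor of shape (3, H, W), values scaled to [0, 1]."""
    with torch.no_grad():
        logits = model(person_crop_rgb.unsqueeze(0))["out"]   # (1, C, H, W)
    return logits.argmax(dim=1)[0]                            # per-pixel part labels 1231
```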
The body part segmentation performed by the step 1230 is explained with reference to Fig. 7.
Fig. 7 is an illustration of body part segmentation and creation of body part descriptors according to one OAD arrangement. The result of semantic segmentation on an object of interest 710 is a set of constituent body parts 720, 730, 740, 750, 760, and 770. For instance, 720 is the person’s hair, 730 is the face, 740 is the upper body or shirt, 750 is the lower body or trousers, 760, 770 are the two shoes.
After the step 1230, the process 1200 proceeds to step 1240, performed by the processor 205 executing the software program 233, which determines colour appearance signatures for individual parts. In one OAD arrangement, a colour appearance signature is a histogram of pixel colours and image intensity gradients within predefined spatial cells of a rectified image. Another example of a colour appearance signature is a “bag-of-words” model of quantized keypoint descriptors. A prior-art method such as the Weighted Histograms of Overlapping Stripes (WHOS) descriptor may be used to determine the colour appearance signatures for individual parts. The Weighted Histograms of Overlapping Stripes (WHOS) descriptor is a concatenation of colour, texture and shape features. WHOS encodes a spatial distribution of colour and texture by dividing an image into regions. WHOS encodes the global shape of an object using histograms of oriented gradients (HOG); WHOS uses local binary patterns (LBP) instead of Gabor histograms to encode texture; and WHOS does not encode relative appearance between different regions.
Again referring back to Fig. 7, a colour appearance signature for the hair 720 may be a colour histogram or image intensity gradient (or a combination thereof) 725. Similarly, for the face 730, the colour appearance signature is 735, for the upper body 740, the colour appearance signature is 745, for the lower body 750, the colour appearance signature is 755, for the first shoe 760, the colour appearance signature is 765, for the second shoe 770, the colour appearance signature is 775.
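A simplified sketch of a per-part colour appearance signature is given below (a plain colour histogram rather than the full WHOS descriptor, which would additionally include texture and shape terms; the bin count is an assumption):

```python
import cv2

def part_colour_signature(person_crop_bgr, part_mask, bins=8):
    """Colour histogram over the pixels of one body part (e.g. 725 for the hair 720).

    person_crop_bgr : bounding-box crop of the person.
    part_mask       : 8-bit binary mask of one segment from the part label map.
    """
    hist = cv2.calcHist([person_crop_bgr], [0, 1, 2], part_mask,
                        [bins, bins, bins], [0, 256, 0, 256, 0, 256]).flatten()
    return hist / (hist.sum() + 1e-9)    # normalise so parts of different sizes compare
```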
After the step 1240, the process 1200 proceeds to a step 1250, performed by the processor 205 executing the software program 233, which determines geometric appearance signatures of the body parts. This step is described with reference to Fig. 8A and Fig. 8B.
Referring to Fig. 8A, after a target of interest 810 is segmented into respective body parts (also referred to as body segments, or more generally object segments) at the step 1230, for a specific body part 811 which corresponds to the upper body, a colour appearance signature 812 is determined at the step 1240, and at the present step 1250, a geometric appearance signature 813 is determined. In an OAD arrangement, this geometric appearance signature may be represented by constructing a function “F” (also referred to as a functional relationship) which takes two parameters “X” and “Y1”, where “X” is a length of the object of interest 810 (ie “X” characterises the object of interest 810 to form one characterisation of the object 810) and “Y1” is a length of the body part 811 (ie “Y1” characterises the object part 811 to form one characterisation of the object part 811). In one OAD arrangement, the function “F” computes the ratio of “Y1” and “X”. This computed value 813 is appended to the colour appearance signature 812 to form a descriptor for the body part 811. In another OAD arrangement, the computed value 813, instead of being a single value, can be a vector of values. For example, the upper body may be classified with labels “long”, “short”, “medium”, “normal” etc. Each label may be assigned a multi-valued descriptor which is appended to the colour appearance signature 812. For example, for the target of interest 810, the upper body part 811 is rather long and may be assigned a label value “long”. In another OAD arrangement, if 813 was a single value representing the ratio, then it may have a value greater than 0.5. In a similar manner, geometric information 816 of one of the shoes 814 is also determined and appended to the appearance signature 815.
Considering another example from Fig. 8B, for a target 820, an upper body part is 821, a colour appearance signature is 822 and a geometric appearance signature is 823. Comparing with the target 810, the colour appearance signatures 812 and 822 are similar since the respective body
parts 811 and 821 have similar colour and texture. However, their respective geometric appearance signatures 813 and 823 are quite different. In one OAD arrangement, 813 is a ratio greater than 0.5 (or “long” if labels are used) and 823 is a ratio less than 0.5 (or “short” if labels are used). Again referring to Fig. 8B, the target 820 is wearing boots 824 which are longer than the shoes 814 of the target 810. Hence even though their respective colour appearance signatures 825 (for part 824) and 815 (for part 814) may be similar, their respective geometric features 826 and 816 are different. In another OAD arrangement, the geometric appearance signature determination performed at this step may use Hu Moments, or ratio of area of individual parts to the whole body and such.
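A hedged sketch combining the variants mentioned above (the length ratio F(X, Y1) = Y1 / X, a part-to-body area ratio and Hu moments; the exact combination is an assumption introduced for illustration):

```python
import cv2
import numpy as np

def geometric_signature(part_mask, body_mask):
    """Geometric appearance signature for one body part (e.g. 813 or 823).

    part_mask, body_mask : binary masks of the part and of the whole foreground.
    """
    part_rows = np.where(part_mask.any(axis=1))[0]
    body_rows = np.where(body_mask.any(axis=1))[0]
    height_ratio = (part_rows.max() - part_rows.min() + 1) / \
                   (body_rows.max() - body_rows.min() + 1)        # Y1 / X
    area_ratio = part_mask.astype(bool).sum() / max(body_mask.astype(bool).sum(), 1)
    hu = cv2.HuMoments(cv2.moments(part_mask.astype(np.uint8))).flatten()
    return np.concatenate([[height_ratio, area_ratio], hu])
```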
Returning to Fig. 12, after the step 1250, the process 1200 proceeds to a step 1260, performed by the processor 205 executing the software program 233, which checks if all parts have been processed. If “yes”, then the process 1200 follows a YES arrow and proceeds to a step 1270, otherwise the process 1200 follows a NO arrow and proceeds to the step 1240 where the colour appearance signature for the next part is determined. The step 1270, performed by the processor 205 executing the software program 233, determines the orientation-aware appearance signature 370 based on the colour appearance signatures such as 1241 computed at the step 1240, the geometric appearance signatures such as 813 computed at the step 1250 and the estimated orientation 1221 determined by the step 1220.
In one OAD arrangement, the orientation-aware descriptor 370 is computed by concatenating the respective body part colour appearance signatures 360, geometric appearance signatures 350 and the person’s orientation 320. An example of this is shown in 480 of Fig. 4.
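As a minimal sketch of this concatenation (the argument ordering is an assumption; the per-part signatures are those produced by the steps 1240 and 1250):

```python
import numpy as np

def orientation_aware_signature(part_colour_sigs, part_geom_sigs, pose_probs):
    """Concatenate per-part colour signatures, geometric signatures and the
    orientation (pose probabilities) into the orientation-aware descriptor 370."""
    pieces = []
    for colour_sig, geom_sig in zip(part_colour_sigs, part_geom_sigs):
        pieces.append(np.asarray(colour_sig, dtype=float))
        pieces.append(np.asarray(geom_sig, dtype=float))
    pieces.append(np.asarray(pose_probs, dtype=float))
    return np.concatenate(pieces)
```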
Then the process 1200 proceeds to a step 1299 where it ends.
When constructing the orientation-aware appearance signature 370, some body parts may be unavailable, either because they are missing or not detected. For example, referring back to Fig.
1, for the target 100, both hands are visible in the camera image 110 but for the corresponding target in 120, only one hand is visible for the target 131. In one OAD arrangement, the colour appearance signature and geometric appearance signature for a missing body part are specified as a colour appearance signature and a geometric appearance signature which have all “0” (ie zero) values, these “0” value colour appearance signatures and geometric appearance signatures being referred to as substitute colour appearance signatures and substitute geometric appearance signatures. In another OAD arrangement, the substitute colour appearance signatures and substitute geometric appearance signatures for missing body parts are specified as a colour
appearance signatures and geometric appearance signatures which describe average body parts. In other words, if the feature distribution of the body part that is missing is examined from the training data, the mean or mode of the distribution may be used as a substitute for the missing body part.
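Both substitution strategies may be sketched as follows (a zero-valued substitute when no training data is available, otherwise the mean of the corresponding training signatures; the function name is an assumption):

```python
import numpy as np

def substitute_signature(length, training_sigs=None):
    """Substitute signature for a missing or undetected body part.

    length        : dimensionality of the missing part's signature.
    training_sigs : optional array-like of training signatures for that part.
    """
    if training_sigs is None:
        return np.zeros(length)                                       # all-zero substitute
    return np.mean(np.asarray(training_sigs, dtype=float), axis=0)    # "average" part
```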
Fig. 13 is a schematic flow diagram showing an example of a method 1300 of determining an orientation-aware appearance signature for a single image without using body-part segmentation according to one OAD arrangement. Fig. 13 depicts an alternative OAD arrangement to the process 1200 of Fig. 12. The process 1300 starts at step 1310, performed by the processor 205 executing the software program 233, which determines a person’s foreground mask 1211. This step is the same as the step 1210 of the process 1200 in Fig. 12. Then the process 1300 proceeds to a step 1320, performed by the processor 205 executing the software program 233. The step 1320 determines the orientation 1221 of the person. This step is the same as the step 1220 of the process 1200. Then the process 1300 proceeds to a step 1340, performed by the processor 205 executing the software program 233. The step
1340 determines a colour appearance signature 1341 based on the whole body of the person.
This is done by considering the pixels that correspond to the foreground mask 1211 of the person determined at the step 1310. In one OAD arrangement, the colour appearance signature 1341 for the whole person is a histogram of pixel colours and image intensity gradients within predefined spatial cells of a rectified image. Another example of a colour appearance signature is a “bag-of-words” model of quantized keypoint descriptors. A prior-art method such as the WHOS descriptor may be used to determine the colour appearance signature 1341 for the whole person. Then the process proceeds to step 1370, performed by the processor 205 executing the software program 233, which determines an orientation-aware appearance signature 1371 based on the colour appearance signature 1341 of the step 1340 and the estimated orientation 1221 of the step 1320. Then the process 1300 proceeds to step 1399 where it ends.
Fig. 14 is a schematic flow diagram showing an example of a method 1400 of training a machine learning algorithm to detect a person’s orientation according to one OAD arrangement. Fig. 14 describes the process 1400 to train a classifier for estimating the pose of an object in an image of the object. Process 1400 starts at a receiving step 1410, performed by the processor
205 executing the software program 233, which receives as input one or more images 1411 of an object for two or more object poses. In one OAD arrangement, the images 1411 comprise up to 8 sets of one or more labelled images, one set for each of 8 poses of an object, namely “front”, “back”, “right”, “left”, “right front”, “right back”, “left front” and “left back”. Control then passes to a step
1420, performed by the processor 205 executing the software program 233, which determines, for each image received from the step 1410, image features 1421, and for each image feature, a feature vector 1422 is determined. In one OAD arrangement, the feature vector 1422 is a histogram of orientation gradients. In another OAD arrangement, the feature vector 1422 is a scale invariant feature vector. Control then passes to step 1430, performed by the processor 205 executing the software program 233, which uses the image features (i.e. the corresponding feature vectors 1422) for each image and corresponding pose labels received at step 1410 to train a classification algorithm. In one OAD arrangement the classification algorithm is a support vector machine (SVM). In another OAD arrangement, the classification algorithm is a random forest. The method 1400 concludes at a step 1499 after training the classification algorithm and returns the trained classification model 1431 for use by another process such as in step 1220 of process 1200.
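As an illustrative sketch of the training described above, using HOG features and an SVM with probability outputs so that the trained model can later return pose probabilities (the HOG parameters are assumptions):

```python
from skimage.feature import hog
from sklearn.svm import SVC

def train_pose_classifier(images_gray, pose_labels):
    """Train a classification model (cf. 1431) from labelled pose images 1411.

    images_gray : list of equally sized grayscale person crops.
    pose_labels : one of the 8 pose labels ("front", "back", ...) per image.
    """
    features = [hog(img, orientations=8, pixels_per_cell=(8, 8),
                    cells_per_block=(2, 2)) for img in images_gray]
    clf = SVC(kernel="rbf", probability=True)   # probability=True -> pose probabilities
    clf.fit(features, pose_labels)
    return clf
```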
INDUSTRIAL APPLICABILITY
The arrangements described are applicable to the computer and data processing industries and particularly for the video surveillance and monitoring industries.
The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.
In the context of this specification, the word “comprising” means “including principally but not necessarily solely” or “having” or “including”, and not “consisting only of”. Variations of the word comprising, such as “comprise” and “comprises” have correspondingly varied meanings.

Claims (12)

The claim(s) defining the invention are as follows:
    1. A method for matching objects of interest in video frames obtained from at least two video cameras, the method comprising the steps of:
receiving a pair of video frames;
    for the object of interest in each of the pair of received frames:
    determining an orientation appearance signature; determining a colour appearance signature; determining a geometric appearance signature; and
determining an orientation-aware appearance signature dependent upon the orientation appearance signature, the colour appearance signature, and the geometric appearance signature;
    determining a distance metric depending upon the orientation-aware appearance signatures for the objects of interest in the pair of frames;
determining depending upon the distance metric a distance between the objects of interest in the pair of frames; and matching the objects of interest in the pair of frames dependent upon the distance.
2. The method according to claim 1, wherein prior to the determining of the orientation, colour and geometric appearance signatures, the method comprises, for each frame in the pair of frames, a step of determining a foreground mask for the object of interest in the frame, and wherein the step of determining the orientation appearance signatures for the objects of interest in each frame comprises determining a feature vector from pixels of the foreground mask.
3. The method according to claim 2, wherein the step of determining the colour appearance signatures for the objects of interest in each frame comprises the steps of:
    segmenting the object of interest into segments; and determining colour appearance signatures for each of the segments.
4. The method according to claim 3, wherein the step of determining the geometric appearance signatures for the objects of interest in each frame comprises constructing, for each segment of the object of interest, a functional relationship between a parameter characterising the object of interest and a parameter characterising the segment of the object of interest.
5. The method according to claim 4, wherein the step of determining the orientation-aware appearance signature comprises concatenating the orientation appearance signature, the colour appearance signature, and the geometric appearance signature.
6. The method according to claim 3, wherein if segments of the object of interest are unavailable, substitute colour appearance signatures and substitute geometric appearance signatures are used to characterise the unavailable segments.
7. The method according to claim 6, wherein the substitute colour appearance signatures and substitute geometric appearance signatures have either zero values, or values describing average segments.
8. The method according to claim 1, wherein the step of determining a distance metric comprises the steps of:
receiving training image pairs of objects of interest; and for each pair of images determining orientation-aware appearance signatures dependent upon corresponding orientation, colour and geometric appearance signatures;
separating the training image pairs into positive examples being pairs of images of the same object of interest, and negative examples being pairs of images of different objects of interest;
    for each pair in the positive examples;
    determining a difference of the corresponding orientation-aware appearance signatures;
    determining a positive pairs covariance matrix of the difference; and
for each pair in the negative examples;
    determining a difference of the corresponding orientation-aware appearance signatures;
    determining a negative pairs covariance matrix of the difference; and determining a difference between the inverse of the positive pairs covariance matrix and
the negative pairs covariance matrix to form the distance metric.
9. The method according to claim 2, wherein the step of determining the colour appearance signatures for the objects of interest in each frame comprises determining colour appearance signatures for the whole object of interest.
10. A method for matching objects of interest in video frames obtained from at least two video cameras, the method comprising the steps of:
    receiving a pair of video frames;
    for the object of interest in each of the pair of received frames:
determining an orientation appearance signature;
    determining a colour appearance signature;
    determining an orientation-aware appearance signature dependent upon the orientation appearance signature, the colour appearance signature, and the geometric appearance signature;
determining a distance metric depending upon the orientation-aware appearance signatures for the objects of interest in the pair of frames;
    determining depending upon the distance metric a distance between the objects of interest in the pair of frames; and matching the objects of interest in the pair of frames dependent upon the distance.
11. An apparatus for matching objects of interest in video frames obtained from at least two video cameras, the apparatus comprising:
    at least one non-transitory computer readable storage device storing a processor executable software program; and
at least one processor for executing the program to perform a method for matching objects of interest in video frames obtained from at least two video cameras, the method comprising the steps of:
    receiving a pair of video frames;
    for the object of interest in each of the pair of received frames:
determining an orientation appearance signature;
    determining a colour appearance signature; determining a geometric appearance signature; and determining an orientation-aware appearance signature dependent upon the orientation appearance signature, the colour appearance signature, and the geometric
appearance signature;
    determining a distance metric depending upon the orientation-aware appearance signatures for the objects of interest in the pair of frames;
    determining depending upon the distance metric a distance between the objects of interest in the pair of frames; and
matching the objects of interest in the pair of frames dependent upon the distance.
12. A non-transitory computer readable storage device storing a processor executable software program for execution by at least one processor to perform a method for matching objects of interest in video frames obtained from at least two video cameras, the method
comprising the steps of:
    receiving a pair of video frames;
    for the object of interest in each of the pair of received frames: determining an orientation appearance signature; determining a colour appearance signature;
determining a geometric appearance signature; and determining an orientation-aware appearance signature dependent upon the orientation appearance signature, the colour appearance signature, and the geometric appearance signature;
determining a distance metric depending upon the orientation-aware appearance signatures for the objects of interest in the pair of frames;
    determining depending upon the distance metric a distance between the objects of interest in the pair of frames; and matching the objects of interest in the pair of frames dependent upon the distance.
AU2017279658A 2017-12-20 2017-12-20 Pose-aligned descriptor for person re-id with geometric and orientation information Abandoned AU2017279658A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2017279658A AU2017279658A1 (en) 2017-12-20 2017-12-20 Pose-aligned descriptor for person re-id with geometric and orientation information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2017279658A AU2017279658A1 (en) 2017-12-20 2017-12-20 Pose-aligned descriptor for person re-id with geometric and orientation information

Publications (1)

Publication Number Publication Date
AU2017279658A1 true AU2017279658A1 (en) 2019-07-04

Family

ID=67060164

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2017279658A Abandoned AU2017279658A1 (en) 2017-12-20 2017-12-20 Pose-aligned descriptor for person re-id with geometric and orientation information

Country Status (1)

Country Link
AU (1) AU2017279658A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110135295A (en) * 2019-04-29 2019-08-16 华南理工大学 A kind of unsupervised pedestrian recognition methods again based on transfer learning
CN112132873A (en) * 2020-09-24 2020-12-25 天津锋物科技有限公司 Multi-lens pedestrian recognition and tracking based on computer vision
CN114495016A (en) * 2022-03-31 2022-05-13 北京文安智能技术股份有限公司 Pedestrian re-identification method based on visual semantic information and spatiotemporal information
CN115393902A (en) * 2022-09-26 2022-11-25 华东师范大学 Pedestrian re-identification method based on comparison language image pre-training model CLIP

Similar Documents

Publication Publication Date Title
US10248860B2 (en) System and method for object re-identification
Charles et al. Automatic and efficient human pose estimation for sign language videos
Han et al. Density-based multifeature background subtraction with support vector machine
US10503981B2 (en) Method and apparatus for determining similarity of objects in images
US9898686B2 (en) Object re-identification using self-dissimilarity
US10192314B2 (en) Method, system and apparatus for determining a lowest point of a target object in an image
Ma et al. Counting people crossing a line using integer programming and local features
García‐Martín et al. People detection in surveillance: classification and evaluation
Yang et al. Binary descriptor based nonparametric background modeling for foreground extraction by using detection theory
AU2017279658A1 (en) Pose-aligned descriptor for person re-id with geometric and orientation information
Ghadiri et al. From superpixel to human shape modelling for carried object detection
Khedher et al. Probabilistic matching pair selection for surf-based person re-identification
Fritz et al. Object recognition using local information content
Devyatkov et al. Multicamera human re-identification based on covariance descriptor
Zhou et al. Crowd modeling framework using fast head detection and shape-aware matching
Khanam et al. Baggage recognition in occluded environment using boosting technique
Ó Conaire et al. Detection thresholding using mutual information
Mohaghegh et al. A four-component people identification and counting system using deep neural network
Deotale et al. Analysis of human activity recognition algorithms using trimmed video datasets
Kong et al. Classifying and tracking multiple persons for proactive surveillance of mass transport systems
Malireddi et al. A novel background updation algorithm using fuzzy c-means clustering for pedestrian detection
Wang et al. Gradient-layer feature transform for action detection and recognition
Ray et al. Detection, recognition and tracking of moving objects from real-time video via visual vocabulary model and species inspired PSO
Alvarez et al. Spatial Hand Segmentation Using Skin Colour and Background Subtraction
Gavaraskar et al. Licence Plate Detection and Recognition with OCR Using Machine Learning Techniques

Legal Events

Date Code Title Description
MK4 Application lapsed section 142(2)(d) - no continuation fee paid for the application