
WO2014190494A1 - Method and device for facial recognition - Google Patents


Info

Publication number
WO2014190494A1
WO2014190494A1 · PCT/CN2013/076343 · CN2013076343W
Authority
WO
WIPO (PCT)
Prior art keywords
video
names
reference image
person
obtaining
Prior art date
Legal status (the status listed is an assumption, not a legal conclusion)
Ceased
Application number
PCT/CN2013/076343
Other languages
French (fr)
Inventor
Yanfeng Zhang
Xiaojun Ma
Yuntao Shi
Jun Xu
Current Assignee (the listed assignees may be inaccurate)
Thomson Licensing SAS
Original Assignee
Thomson Licensing SAS
Priority date (the date listed is an assumption, not a legal conclusion)
Filing date
Publication date
Application filed by Thomson Licensing SAS
Priority to PCT/CN2013/076343
Publication of WO2014190494A1
Anticipated expiration
Current legal status: Ceased

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70: Information retrieval of video data
    • G06F16/78: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783: Retrieval using metadata automatically derived from the content
    • G06F16/7837: Retrieval using objects detected or recognised in the video content
    • G06F16/784: Retrieval where the detected or recognised objects are people

Definitions

  • The actor information searching and presenting module 105 is configured to use the identified actor name to search the movie/TV database websites on the Internet again to retrieve relevant information about the identified actor. It shall be noted that an Internet search engine, such as Google or Bing, can also be used to retrieve the relevant information about the identified actor.
  • The relevant information includes a biography, photos of the actor, video clips featuring the actor, awards and nominations, etc.
  • The name and the relevant information are organized and presented to the user.
  • Fig. 5 is a diagram showing a web page for presenting information to the user.
  • The functions realized by the cast list fetching module and the face database building module, as well as the feature extraction for reference images in the face database, can be executed before the actor image is provided by the actor image obtaining module 101. That is to say, when the user starts to watch a video, the system automatically executes the above three functions.
  • The method can also be used for sports events, e.g. golf, boxing, etc.
  • The method can further be used for any video as long as there is a list of names associated with the video, i.e. a list of names of persons who appear in the video.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Library & Information Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Processing Or Creating Images (AREA)

Abstract

A method is provided for recognizing a person in a video, wherein the method comprises steps of obtaining an image frame from the video containing the person; and recognizing the person by matching the image frame against a reference image database, wherein the reference image database consists of reference images associated with persons appearing in the video.

Description

METHOD AND DEVICE FOR FACIAL RECOGNITION
TECHNICAL FIELD
The present invention relates to data processing, and more particularly relates to a method and device for facial recognition.
BACKGROUND
A facial recognition system is a computer application for automatically identifying or verifying a person from a digital image or a video frame from a video source. One of the ways to do this is by comparing selected facial features from the image with a facial database. A core technology for facial recognition is the Local Feature Analysis (LFA) algorithm. Facial recognition systems use various LFA techniques to encode, match and recognize faces. To recognize a face in a video, a traditional method first locates human faces in a video frame, and then recognizes those faces by matching them against a database of known reference faces. To use the traditional method to recognize actors in movies or TV series, a database with a huge number of reference faces must be built in advance.
SUMMARY
According to an aspect of the present invention, a method is provided for recognizing a person in a video, wherein the method comprises steps of obtaining an image frame from the video containing the person; and recognizing the person by matching the image frame against a reference image database, wherein the reference image database consists of reference images associated with persons appearing in the video.
Further, the reference image database stores the relationship between names and reference images, and the method further comprises presenting the name corresponding to the matched reference image in the reference image database. Further, the reference image database stores the relationship between names and reference images, and the method further comprises searching for relevant information about the person by using the name corresponding to the matched reference image in the reference image database, wherein the relevant information comprises at least one of the following: a picture of the person, a video clip featuring the person, and a biography; and presenting the relevant information.
Further, the method comprises obtaining a list of names of the persons appearing in the video; and obtaining images corresponding to the names in the list as the reference images from at least one information source providing a correspondence relationship between persons' names and persons' images.
Further, the step of obtaining the list of names comprises obtaining the name of the video; and obtaining the list by using the name to search in at least one information source providing a correspondence relationship between video names and lists of names of persons appearing in the videos.
According to another aspect of the present invention, a device is provided for recognizing a person in a video, wherein the device comprises a reference image database module for obtaining a reference image database consisting of reference images associated with persons appearing in the video; and a facial recognition module for recognizing the person contained in an image frame from the video by matching the image frame with the reference images in the reference image database.
Further, the reference image database stores the relationship between names and reference images, and the device further comprises a presenting module for presenting the name corresponding to the matched reference image in the reference image database.
Further, the reference image database stores the relationship between names and reference images, and the device further comprises a searching module for searching for relevant information about the person by using the name corresponding to the matched reference image in the reference image database, wherein the relevant information comprises at least one of the following: a picture of the person, a video clip featuring the person, and a biography; and a presenting module for presenting the relevant information.
Further, the device comprises a list fetching module for obtaining a list of names of the persons appearing in the video, wherein the reference image database module is further used for obtaining images corresponding to the names in the list as the reference images from at least one information source providing a correspondence relationship between persons' names and persons' images.
Further, the list fetching module is further used for obtaining the name of the video, and obtaining the list by using the name to search in at least one information source providing a correspondence relationship between video names and lists of names of persons appearing in the videos.
It is to be understood that more aspects and advantages of the invention will be found in the following detailed description of the present invention.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, will be used to illustrate an embodiment of the invention, as explained by the description. The invention is not limited to the embodiment.
In the drawings:
Fig. 1 is a diagram showing a system architecture for facial recognition according to an embodiment of present invention;
Fig. 2 is a diagram showing the key frame extraction according to the embodiment of present invention;
Fig. 3 is a flow chart showing a method for obtaining a cast list for a video according to the embodiment of present invention;
Fig. 4 is a flow chart showing steps for face recognition according to the embodiment of the present invention;
Fig. 5 is a diagram showing a web page for presenting information to the user according to the embodiment of the present invention.
DETAILED DESCRIPTION
The embodiment of the present invention will now be described in detail in conjunction with the drawings. In the following description, some detailed descriptions of known functions and configurations may be omitted for clarity and conciseness.
The present invention provides a method for effective facial recognition. The method can be used, for example, for effectively recognizing actors in a movie or a TV episode and providing their relevant information to users. With the movie name or TV series name and other related information, such as the release year of the movie or the season number and episode number of a TV episode, the method searches and retrieves a cast list or starring list from movie/TV database websites on the Internet, such as imdb.com, allrovi.com, findanyfilm.com, tv.com, etc. Then the actor names in the cast list are used to build a temporary CLFI (cast list face image) database. Each temporary CLFI database corresponds to a specific movie or TV episode, and is built temporarily just for the currently watched movie or TV episode. A movie or TV episode's CLFI database contains only images relating to actors in the cast list for facial recognition and matching. Such images are obtained by searching image search websites (such as Google image search or Yahoo image search) or picture sharing websites (for example, Flickr) with the actor names in the cast list. With the CLFI database of a movie or a TV episode, a facial recognition algorithm is applied to find who the actor is. Besides, after getting the actor's name, it is optional to retrieve and provide relevant information about the actor to users. Such information may include, but is not limited to, biography, awards or nominations, filmography, etc. The relevant information is retrieved by searching on the Internet. With the above method, the calculation requirement for facial recognition is minimized by using a temporary database, which is automatically built and contains only reference images relevant to the actors in the cast list.
Fig. 1 shows a system architecture for facial recognition in terms of function according to an embodiment of the present invention. It shall be noted that a person skilled in the art may contemplate a different system architecture for implementing the principle of the present invention by combining functions among the modules shown in Fig. 1.
The system is conceptually divided into three blocks, i.e. the user side, the server side and the Internet. It shall be noted that a person skilled in the art may divide the system in a different way while implementing the principle of the present invention. For example, the server side may be incorporated into the Internet block.
Actor image obtaining module 101 is located on the user's device, such as a computer, a television, a PC tablet or a smartphone, and is used to obtain an image containing the actor that the user wants to know about. The module 101 concerns how to capture the actor image from the video being watched, e.g. a movie or a TV episode. The following three methods can be used to capture the actor image in different application situations.
1 . Key frame extraction:
Fig. 2 is a diagram showing the key frame extraction according to the embodiment of the present invention. As illustrated in Fig. 2, the first device is a TV set connected via air broadcast, satellite, fiber optic, cable or an IP connection from the broadcaster. The second device is equipped with an input-friendly component, such as a touch screen. This type of device includes laptops, tablets, PDAs, smart phones, etc. There is a communication channel between the first device and the second device to provide data exchange. When a user finds an actor he wants to know about in the video, he can send a command message (e.g. a capture command) via the second device to the first device, and the first device will send back a short clip of the video. The second device extracts image frames from the video clip and presents them on its screen. The user then selects an appropriate image frame containing the actor to show on the screen of the second device.
2. Picture taking
While the user is watching a movie or a TV episode, he uses a device with a camera, such as a smart phone or a PC tablet, to take a picture that contains the actor he wants to know about. Alternatively, he can record a video clip with the device and choose an image containing the actor from key frames extracted by the device.
3. Pausing a video and copying the screen.
While the user is watching a movie or a TV episode on a smart device, such as a computer, a PC tablet or a smart phone, he can pause the video and instruct the device to copy the current screen as the actor image.
If the captured image contains more than one person, the device can divide the image into sub-images, each of which contains one person, and present them for the user to select from. There are other ways of selecting the actor. For example, the user can designate the target actor by drawing a circle or a rectangle around the actor's face, or by simply clicking on the face. The designated actor face image is then sent to the server.
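The cropping of a user-designated region can be sketched as follows. This is a minimal illustration only: the frame is modelled as a 2D list of pixels, whereas a real implementation would operate on an image object and derive the rectangle from the coordinates of the user's touch or click gesture.

```python
# Crop the rectangle the user drew around the target actor's face.
# The frame is a toy 2D list here; a real device would crop an image buffer.
def crop(frame, top, left, height, width):
    return [row[left:left + width] for row in frame[top:top + height]]

frame = [[(r, c) for c in range(6)] for r in range(4)]   # toy 4x6 "image"
face = crop(frame, top=1, left=2, height=2, width=3)     # user-drawn rectangle
print(face)
```

The same helper can serve the multi-person case: running it once per detected bounding box yields the per-person sub-images presented for selection.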
The modules of cast list fetching 102, face database building 103, facial recognition 104 and actor information searching and presenting 105 are located on the server side, for example in a server computer of the service provider. It shall be noted that the server can reside in the user's home or remotely, e.g. at the service provider's premises.
Cast list fetching module 102 is configured to obtain a cast list of a movie or a TV episode from relevant movie/TV database websites on the Internet. Such movie/TV database websites on the Internet, for example, imdb.com, allrovi.com, findanyfilm.com and tv.com etc., usually provide an online database of information related to movies, television shows, actors, production crew personnel, video games and fictional characters featured in visual entertainment media.
Fig. 3 is a flow chart showing a method for obtaining a cast list for a video, e.g. a movie or a TV episode. In step 301, the cast list fetching module 102 in the server gets information about the video being watched. The information includes, but is not limited to, the name and release year of a movie, or the name, season number and episode number of a TV episode. Normally, the information can be obtained from the EPG (Electronic Program Guide). In an example, only the name of the video is used. In step 302, the information is used to search the first site in the movie/TV website list. The website list includes websites capable of providing online information related to movies, television shows, actors, production crew personnel, video games and fictional characters featured in the video. In the embodiment, the website list includes imdb.com, allrovi.com, findanyfilm.com, tv.com, etc. In step 303, it checks whether there is an entry for the video in the search result. If not, in step 304 it continues to search the next website in the list until it reaches the end of the list. If yes, in step 307 it uses one of the keywords "cast", "starring", "actor", etc. to search for the cast list. In step 308, it checks whether it has succeeded in obtaining the cast list. If not, it determines that no result is found and prompts this information to the user in steps 305 and 306. If yes, in step 309 it sends the cast list to the face database building module.
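The control flow of Fig. 3 can be sketched as below. This is a hedged illustration, not the patented implementation: `search_site` and `demo_search` are hypothetical stand-ins for querying one movie/TV database website (real code would issue HTTP requests and parse the site's pages).

```python
# Sketch of the cast-list fetching flow of Fig. 3 (steps 301-309).
CAST_KEYWORDS = ("cast", "starring", "actor")

def fetch_cast_list(video_name, sites, search_site):
    """Try each site in order; return the first cast list found, else None."""
    for site in sites:                        # steps 302/304: walk the site list
        entry = search_site(site, video_name) # step 303: any entry for the video?
        if entry is None:
            continue                          # no entry: try the next site
        for keyword in CAST_KEYWORDS:         # step 307: look for a cast section
            cast = entry.get(keyword)
            if cast:                          # step 308: cast list obtained
                return cast                   # step 309: hand to DB building
    return None                               # steps 305/306: nothing found

# Toy stand-in data source, for illustration only.
def demo_search(site, name):
    db = {"imdb.com": {"Movie X": {"cast": ["Alice A", "Bob B"]}}}
    return db.get(site, {}).get(name)

print(fetch_cast_list("Movie X", ["tv.com", "imdb.com"], demo_search))
```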
Face database building module 103 is configured to use the actor names in the cast list to search for the actors' images in image search engines, such as Google or Yahoo image search, or picture sharing websites, for example Flickr, and then build a temporary reference image database using the found images as reference images. The database has at least two data fields: one data field contains the actor names and the other data field contains the reference images for each actor. It shall be noted that one or more reference images may correspond to one actor name. Besides, the temporary reference image database is stored in the server. When a new cast list becomes available (e.g. when proceeding to the next episode), the server needs to update the temporary reference image database. The server first searches the temporary reference image database with the actor names to find those names in the new cast list that do not have an entry in the temporary reference image database. The server then obtains reference images for those names and updates the temporary reference image database.
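The build-and-update behaviour of module 103 can be sketched as a name-to-images mapping. `image_search` here is a hypothetical stand-in for an image search engine or picture sharing site query; the placeholder image identifiers it returns are illustrative only.

```python
# Sketch of the temporary CLFI (cast list face image) database of module 103.
def build_clfi(cast_list, image_search):
    """Build the temporary database: one entry per actor name."""
    return {name: image_search(name) for name in cast_list}

def update_clfi(clfi, new_cast_list, image_search):
    """On a new cast list (e.g. next episode), fetch images only for names
    that do not yet have an entry; existing entries are kept as-is."""
    for name in new_cast_list:
        if name not in clfi:
            clfi[name] = image_search(name)
    return clfi

demo_search = lambda name: [f"{name}-img-{i}" for i in range(2)]
db = build_clfi(["Alice A", "Bob B"], demo_search)
db = update_clfi(db, ["Bob B", "Carol C"], demo_search)
print(sorted(db))   # Alice kept, Carol added, Bob not re-fetched
```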
Facial recognizing module 104 is configured to recognize the actor in an image provided by the actor image obtaining module 101 by matching the actor image against the reference images in the temporary reference image database.
With the actor image retrieved or taken from the currently watched video and the temporary reference image database for that video, the facial recognizing module 104 is executed to find the identity of the actor. Facial recognition is a recognition process that analyzes facial characteristics and then uses those characteristics to find similar or identical faces in the database. Multiple facial recognition algorithms already exist, such as Eigenface, Fisherface, Elastic Graph Matching, Support Vector Machines, Neural Networks, etc. Fig. 4 is a flow chart showing steps for face recognition according to the embodiment of the present invention.
In Fig. 4, face detection and face alignment are the basis for feature extraction. The actor image captured or taken from the currently watched video is passed to face detection, which goes through color modeling and skin segmentation to extract any possible face region from the background scenes. Then face alignment crops and resizes the face areas to make sure they have the same sizes and scales. Face alignment also removes noise and illumination interference, remedies affine transformation, etc. After face alignment, feature extraction is performed on the aligned face to provide effective information, i.e. face features that are useful for distinguishing between faces of different persons. The lower part in the rectangle in Fig. 4 is the feature extraction preprocess for reference images in the temporary reference image database. The preprocess procedure deals with all reference images in the temporary reference image database in order to extract and store their features. The preprocess uses the same face detection, face alignment and feature extraction as the above operation on the actor image. The extracted actor image features are then compared with the features of the reference images to recognize the actor. To measure the degree of similarity between the features of the actor image and the features of one reference image, the Euclidean distance between the two groups of features is calculated. If the Euclidean distance is within a similarity threshold, it can be determined that the reference image matches the actor image. The value of the similarity threshold depends on the actual feature extraction algorithm and the system requirements. The output of the facial recognition procedure is the actor name.
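The Euclidean-distance matching step can be sketched as follows. The feature vectors and the threshold value here are illustrative only; as stated above, the actual threshold depends on the feature extraction algorithm and the system requirements.

```python
# Sketch of the matching step: compare the feature vector extracted from the
# actor image with the precomputed feature vectors of the reference images,
# and accept the closest reference within the similarity threshold.
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize(actor_features, reference_db, threshold):
    """reference_db maps actor name -> list of reference feature vectors.
    Returns the best-matching name, or None if no vector is close enough."""
    best_name, best_dist = None, threshold
    for name, vectors in reference_db.items():
        for vec in vectors:
            d = euclidean(actor_features, vec)
            if d <= best_dist:              # within threshold and closest so far
                best_name, best_dist = name, d
    return best_name                        # the output is the actor name
```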
The actor information searching and presenting module 105 is configured to use the identified actor name to search the movie/TV database websites on the Internet again in order to retrieve relevant information about the identified actor. It shall be noted that an Internet search engine, such as Google, Bing, etc., can also be used to retrieve the relevant information about the identified actor. The relevant information includes a biography, photos including the actor, video clips including the actor, awards and nominations, etc. The name and the relevant information are organized and presented to the user. Fig. 5 is a diagram showing a web page for presenting information to the user.
In order to decrease the response time for facial recognition and reduce the requirements on the calculation power of the server, the functions realized by the cast list fetching module and the face database building module, as well as the feature extraction for reference images in the face database, can be executed before the actor image is provided by the actor image obtaining module 101. That is to say, when the user starts to watch a video, the system automatically executes the above three functions.
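This precomputation can be sketched as a single start-of-playback hook. All three callables are hypothetical stand-ins for the modules named in the text, shown only to illustrate the ordering of the three functions.

```python
# Sketch of the precomputation: when playback starts, the cast list fetch,
# the reference database build and the reference feature extraction all run
# ahead of any recognition request, so only matching remains at query time.

def on_video_start(video_name, fetch_cast, build_db, extract_features):
    """Return precomputed reference features, keyed by actor name."""
    cast = fetch_cast(video_name)                # cast list fetching module
    db = build_db(cast)                          # face database building module
    return {name: [extract_features(img) for img in imgs]
            for name, imgs in db.items()}        # feature extraction preprocess
```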
According to a variant, the method can be used for sports events, e.g. golf, boxing, etc. Furthermore, the method can be used for any video as long as there is a list of names associated with the video, i.e. a list of names of persons who appear in the video.
According to a variant, if more than one person is included in an image frame, all persons are identified, and relevant information for all persons is provided and presented to the user for selection of one target person.
A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations. Additionally, one of ordinary skill will understand that other structures and processes may be substituted for those disclosed and the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. Accordingly, these and other implementations are contemplated by this application and are within the scope of the invention as defined by the appended claims.

Claims

1. A method for recognizing a person in a video, the method comprising the steps of
obtaining, from the video, an image frame containing the person; and
recognizing the person by matching the image frame against a reference image database, wherein the reference image database consists of reference images associated with persons appearing in the video.
2. The method of claim 1, wherein the reference image database stores the relationship between names and reference images, the method further comprising presenting a name corresponding to the matched reference image in the reference image database.
3. The method of claim 1, wherein the reference image database stores the relationship between names and reference images, the method further comprising searching for relevant information of the person by using the name corresponding to the matched reference image in the reference image database, wherein the relevant information comprises at least one of the following: a picture containing the person, a video clip containing the person, a biography; and
presenting the relevant information.
4. The method of claim 1, further comprising
obtaining a list of names of the persons appearing in the video; and
obtaining images corresponding to names in the list as the reference images from at least one information source providing a correspondence relationship between persons' names and persons' images.
5. The method of claim 4, wherein the step of obtaining the list of names further comprises obtaining the name of the video; and
obtaining the list by using the name to search in at least one information source providing a correspondence relationship between video names and lists of names of persons appearing in videos.
6. A device for recognizing a person in a video, the device comprising a reference image database module for obtaining a reference image database consisting of reference images associated with persons appearing in the video; and a facial recognition module for recognizing the person contained in an image frame from the video by matching the image frame with the reference images in the reference image database.
7. The device of claim 6, wherein the reference image database stores the relationship between names and reference images, the device further comprising a presenting module for presenting a name corresponding to the matched reference image in the reference image database.
8. The device of claim 6, wherein the reference image database stores the relationship between names and reference images, the device further comprising a searching module for searching for relevant information of the person by using the name corresponding to the matched reference image in the reference image database, wherein the relevant information comprises at least one of the following: a picture containing the person, a video clip containing the person, a biography; and
a presenting module for presenting the relevant information.
9. The device of claim 6, further comprising
a list fetching module for obtaining a list of names of the persons appearing in the video;
wherein the reference image database module is further used for obtaining images corresponding to names in the list as the reference images from at least one information source providing a correspondence relationship between persons' names and persons' images.
10. The device of claim 9, wherein the list fetching module is further used for obtaining the name of the video, and obtaining the list by using the name to search in at least one information source providing a correspondence relationship between video names and lists of names of persons appearing in videos.
PCT/CN2013/076343 2013-05-28 2013-05-28 Method and device for facial recognition Ceased WO2014190494A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2013/076343 WO2014190494A1 (en) 2013-05-28 2013-05-28 Method and device for facial recognition

Publications (1)

Publication Number Publication Date
WO2014190494A1 true WO2014190494A1 (en) 2014-12-04

Family

ID=51987855

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/076343 Ceased WO2014190494A1 (en) 2013-05-28 2013-05-28 Method and device for facial recognition

Country Status (1)

Country Link
WO (1) WO2014190494A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180225377A1 (en) * 2015-06-17 2018-08-09 Baidu Online Network Technology (Beijing) Co., Ltd. Method, server and terminal for acquiring information and method and apparatus for constructing database
CN109377519A (en) * 2018-09-29 2019-02-22 佳都新太科技股份有限公司 Target tracking method, device, target tracking equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080175509A1 (en) * 2007-01-24 2008-07-24 General Electric Company System and method for reconstructing restored facial images from video
US20100149305A1 (en) * 2008-12-15 2010-06-17 Tandberg Telecom As Device and method for automatic participant identification in a recorded multimedia stream
WO2012122069A2 (en) * 2011-03-04 2012-09-13 Microsoft Corporation Aggregated facial tracking in video

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 13885720
Country of ref document: EP
Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: pct application non-entry in european phase
Ref document number: 13885720
Country of ref document: EP
Kind code of ref document: A1