WO2016075293A1 - Accelerated support vector machine (svm) learning using clustering - Google Patents
- Publication number
- WO2016075293A1 (PCT/EP2015/076567)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- features
- block
- svm
- feature
- negative
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
Definitions
- the present disclosure relates to image recognition or searching. More precisely, the present disclosure relates to image searching based on Exemplar Support Vector Machines (E- SVM).
- An image search is a search for an image or images.
- the input to the image search may be an image or images.
- Image searching is relevant in many fields, including computer vision, object recognition, video tracking, and video-based location determination/mapping.
- Fig. 1 illustrates a method 100 for performing image searching.
- the method 100 includes receiving query image(s) 110 that will be the basis of the image search; extracting image features 120 from the query image(s); and performing image searching 130 by using the image features from block 120.
- Block 130 may perform image searching based on any technique, including image retrieval and semantic searching techniques.
- Fig. 1A illustrates an example of an image retrieval search technique.
- An image retrieval search generally locates images that have the same scene as a query image. The located same-scene images may even have scene alterations resulting from task-related transformation(s) (e.g., changes in scene illumination, image cropping, scaling, wide changes in the perspective of the camera, high compression ratios, or picture-of-video-screen artifacts).
- Query image features may be received at block 141.
- Block 142 may perform image retrieval based on image features received from block 141 and image features received from a large database of features 145.
- Block 142 may compare the image features received from block 141 with the image features received from the large database of features 145.
- the large database of features 145 may consist of image features extracted from images stored in a database of search images 143.
- Block 144 may extract features of each image in the database of search images 143.
- the large database of features 145 may store the results determined by block 144.
- Blocks 143-145 may be computed offline or prior to the execution of blocks 141, 142, and 146.
- block 142 performs image retrieval based on a distance between the features from block 141 and the features of the large database from block 145. For example, block 142 may calculate the Euclidean distance between a query image feature from block 141 and each of the features from block 145. Block 142 may choose the images from database 143 that correspond to the smallest distances. Block 142 may provide the search results to block 146. Block 146 may output the search result(s). If multiple results are returned, the results may be ranked and the ranking may be provided along with the results.
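The distance-based ranking performed by block 142 can be sketched as follows; the function name and toy data are illustrative assumptions, not part of the disclosure:

```python
import numpy as np

def retrieve(query_feat, db_feats, top_k=3):
    """Rank database features by Euclidean distance to the query feature.

    Returns the indices of the top_k closest database features, nearest first.
    """
    dists = np.linalg.norm(db_feats - query_feat, axis=1)
    return np.argsort(dists)[:top_k]

# Toy example: 5 database features in R^4; the query is a slightly
# perturbed copy of row 2, so row 2 should rank first.
rng = np.random.default_rng(0)
db = rng.normal(size=(5, 4))
query = db[2] + 0.01 * rng.normal(size=4)
ranked = retrieve(query, db)
```

The returned index order doubles as the ranking that block 146 may output alongside the results.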
- Fig. IB illustrates an example of a semantic search.
- the semantic search searches images for visual concepts representing a search query (e.g., cats).
- Block 151 receives features of positive images (i.e., features of images that contain the queried visual concept (e.g., cats)).
- Block 152 receives features of negative images (i.e., features of images that do not contain the queried visual concept (e.g., no cats, instead contain dogs, other animals or no animals)).
- Block 153 may perform classifier learning to determine a classifier.
- Block 153 performs classifier learning based on the positive features from block 151 and the negative features from block 152.
- Block 153 may choose a classifier that best differentiates the positive features from the negative features.
- the classifier learning may be an SVM algorithm. Alternatively, the classifier learning may be other learning techniques.
- Block 155 may perform image searching based on the classifier from block 153 and the features from the large database 154.
- the large database of features 154 may consist of feature vectors of images that will be searched. For example, the large database of features 154 may be determined in accordance with the principles described in connection with block 145 of Fig. 1A.
- Block 155 may apply the classifier from block 153 to each of the features from block 154.
- Block 156 may output the result(s) from block 155. If multiple results are returned, the results may be ranked and the ranking may be provided along with the results.
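The semantic-search pipeline of blocks 151-156 (learn a linear classifier from positive and negative features, then score database features) can be sketched as below; the plain hinge-loss gradient-descent trainer is a simplified stand-in for the SVM learning of block 153:

```python
import numpy as np

def train_linear_svm(pos, neg, lr=0.1, lam=0.01, epochs=200):
    """Tiny hinge-loss linear classifier trained by gradient descent."""
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(len(pos)), -np.ones(len(neg))])
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        mask = y * (X @ w + b) < 1              # samples violating the margin
        grad_w = lam * w
        grad_b = 0.0
        if mask.any():
            grad_w = grad_w - (y[mask][:, None] * X[mask]).mean(axis=0)
            grad_b = -y[mask].mean()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Positive features cluster around +1, negatives around -1 (first coordinate).
rng = np.random.default_rng(1)
pos = rng.normal(loc=[1.0, 0.0], scale=0.1, size=(20, 2))
neg = rng.normal(loc=[-1.0, 0.0], scale=0.1, size=(20, 2))
w, b = train_linear_svm(pos, neg)
scores = np.vstack([pos, neg]) @ w + b          # scoring "database" features
```

Applying `w, b` to each database feature (as block 155 does) yields a score per image that can then be ranked for output.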
- An E-SVM is a classifier that is a linear Support Vector Machine learned from a single positive example, referred to as the exemplar.
- the E-SVM can be determined based on the positive exemplar and a set of generic, negative examples 𝒩.
- Malisiewicz et al, Exemplar-SVMs for Visual Object Detection, Label Transfer and Image Retrieval, International Conference of Machine Learning, 2012 discuss E-SVMs.
- Present E-SVM search techniques require an inefficiently large set of generic, negative examples 𝒩 (e.g., a set of one million feature vectors). The large set is inefficient during data computations. Also, present E-SVM search techniques are limited to using an E-SVM representation as a classifier.
- Hard-negative mining is a process of (i) selecting a subset of the generic negative features 𝒩 (hereinafter referred to as the "hard-negative cache"); (ii) training a classifier based on the hard-negative cache; and (iii) increasing the hard-negative cache based on items from 𝒩 that have a classification score inside the margin (i.e., greater than −1).
- E-SVM searching based on hard-negative mining is still a highly complex and inefficient process because it still relies on a large set of generic negatives.
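The hard-negative mining loop described above can be sketched as follows; the trainer and the cache-growth policy are simplified assumptions, while the margin test (score > −1) follows the description above:

```python
import numpy as np

def hinge_svm(X, y, lam=0.01, lr=0.1, epochs=300):
    """Plain hinge-loss linear classifier trained by gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        bad = y * (X @ w) < 1                   # margin violators
        g = lam * w
        if bad.any():
            g = g - (y[bad][:, None] * X[bad]).mean(axis=0)
        w -= lr * g
    return w

def hard_negative_mining(x_pos, negatives, cache_size=10, rounds=3):
    """(i) train on a small hard-negative cache, (ii) grow the cache with
    negatives scoring inside the margin (score > -1), (iii) repeat."""
    cache = negatives[:cache_size]
    w = None
    for _ in range(rounds):
        X = np.vstack([x_pos[None, :], cache])
        y = np.concatenate([[1.0], -np.ones(len(cache))])
        w = hinge_svm(X, y)
        hard = negatives[negatives @ w > -1]    # negatives inside the margin
        cache = np.unique(np.vstack([cache, hard]), axis=0)
    return w, cache

rng = np.random.default_rng(2)
x_pos = np.array([2.0, 0.0])
negatives = rng.normal(loc=[-1.0, 0.0], scale=0.5, size=(100, 2))
w, cache = hard_negative_mining(x_pos, negatives)
```

The sketch illustrates the complexity concern raised above: the full negative pool is still rescored at every round, even though only the cache is used for training.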
- E-SVM techniques also have limited generalization because, despite the size of the negative set, they utilize a single positive exemplar.
- Present E-SVM determinations rely on asymmetric determinations.
- Image retrieval using the asymmetric approach has many shortcomings, including sub-par performance relative to features tailored for the image retrieval task.
- the asymmetric approach has not been considered for semantic search because E-SVMs are tailored for data-constrained scenarios, and many positives are generally available in semantic search.
- the asymmetric approach relies on a very large generic negatives pool, and suffers from limits in generalization that must be overcome by deriving extra positives from the original exemplar.
- An aspect of present principles relate to methods and apparatus for image searching.
- the method may include determining a classifier based on at least a centroid representing a plurality of negative features and performing image searching based on the classifier.
- the apparatus may include an apparatus configured to determine a classifier based on centroids representing a plurality of negative features and to perform image searching based on the classifier. The centroids are determined based on a clustering of the plurality of negative features.
- the centroid may be determined based on a corresponding cluster.
- the corresponding cluster may include a subset of the plurality of negative features.
- the subset of negative features includes negative features that are closer to the corresponding cluster than any other cluster.
- the centroid may be determined based on a left-singular vector of a matrix composed of the subset features.
- a score is determined for a search image by applying the classifier to the search image.
- the centroid further may have associated auxiliary data.
- the auxiliary data may be used to modify the at least a centroid.
- the auxiliary data may correspond to a weight value.
- the weight may be determined based on the amount of negative features that are associated with the at least a centroid.
- the auxiliary data may be statistical information of the subset of negative features associated with the centroid.
- the classifier may be determined based on a positive E-SVM feature and the centroids as negative features.
- the classifier may be determined based on a positive RE-SVM feature and RE-SVM features associated with the centroids.
- a feature may be assigned to a centroid based on an inner product between the feature and the centroid.
- the total number of negative features in the plurality of negative features may be greater than ten times the dimensionality of each negative feature.
- FIG. 1 illustrates a flow diagram of a method for image searching.
- FIG. 1 A illustrates a flow diagram of a method for an image retrieval search.
- FIG. IB illustrates a flow diagram of a method for a semantic search.
- FIG. 1C illustrates an example of a graphical representation of an E-SVM.
- FIG. 2 illustrates a flow chart of an exemplary method for determining E-SVM feature(s).
- FIG. 3 illustrates a flow chart of an exemplary method for determining an RE-SVM feature(s).
- FIG. 4 illustrates a flow chart of an exemplary method for semantic search using RE- SVM features.
- FIG. 5 illustrates a flow chart of an exemplary method for extracting RE-SVM local descriptors.
- FIG. 6 illustrates a flow chart of an exemplary method for accelerated E-SVM determinations.
- FIG. 6A illustrates a flow chart of an exemplary method for centroid determination.
- FIG. 6B illustrates an exemplary diagram of a plot of similarity clustering.
- FIG. 7 illustrates a block diagram of an exemplary image processing device.
- FIG. 8 illustrates a block diagram of an exemplary distributed image processing system.
- FIG. 9 illustrates an exemplary apparatus.
DETAILED DESCRIPTION
- an "E-SVM function" refers to a determination of linear classifiers based on one or more positive and negative examples.
- the E-SVM function may perform a learning determination based on the positive examples of a class (i.e., examples that represent the class; for example, for the class "cats", the set of examples may consist of images of cats) and negative examples (i.e., examples that do not represent a class).
- a learning algorithm may choose a function that best differentiates positive examples from negative examples.
- E-SVM feature refers to the output of the E-SVM function.
- an "RE-SVM function" refers to a recursive framework for determining RE-SVM features.
- the recursive framework may determine an initial set of E-SVM features based on image feature(s) of a query image and image features of generic negative images.
- the recursive framework may determine an initial RE-SVM feature by applying an E-SVM function to E-SVM features of query image(s) and E- SVM features of generic negative images.
- the recursive framework may then recursively determine RE-SVM features by repetitively applying the E-SVM function based on a prior repetition's determination of RE-SVM query features and features of generic negative images.
- an "RE-SVM feature" refers to the output of the RE-SVM function.
- the scalars, vectors and matrices may be denoted using, respectively, standard, bold, and uppercase-bold typeface (e.g., scalar a, vector a and matrix A).
- the notation v_k may denote a vector from a sequence v_1, v_2, . . ., and v^k may denote the k-th coefficient of vector v.
- the notation [a_k]_k (respectively, [α_k]_k) may denote concatenation of the vectors a_k (scalars α_k) to form a single column vector.
- 3 ⁇ 4 may denote the Jacobian matrix with (i, j th entry
- An aspect of present principles is directed to mapping E-SVM features to RE-SVM feature(s) based on an E-SVM function.
- An aspect of present principles is directed to determining RE-SVM feature(s) during a post-processing stage.
- enhancement of image searching and image classification may occur based on E-SVM features obtained by concatenating the activations of Deep Convolutional Neural Networks (DCNNs).
- An aspect of present principles is directed to automatically determining an RE-SVM feature that contains unique information of a query image relative to information of generic, negative images. The automatic extraction of the unique information is more efficient than feature extractors that rely on manual encoding of uniqueness into a feature extraction process.
- An aspect of present principles is directed to image searching based on symmetrically determined E-SVM features.
- symmetric determinations or "symmetrically" determined refers to image searching based on E-SVM features of query images (i.e., positive images), E-SVM features of generic negative images, and E-SVM features of images that will be searched. These symmetric determinations improve robustness and efficiency because the E-SVM function has been applied to all of the images; this function determines, from the feature representing each image, a representation that separates the image from a large generic negative pool of features.
- the image feature encoder may be a global encoder, may be based on aggregated local features, or may be derived from CNN-based features.
- the features may be denoted by vectors in a space ℝ^D.
- the notation ℝ^D may represent the space of all possible vectors having D real-valued entries.
- An aspect of present principles provides a novel solution to the problem of complexity of generic negative image sets with a large amount of images.
- An aspect of present principles reduces complexity based on clustering of the generic negative features.
- the clustering may consist of deriving centroids.
- Each centroid may represent a subset of the large set of generic negative images.
- Each centroid may be associated to a subset of generic negative features that are closest to that centroid. That is, the generic negative features of a subset are the generic features that are closer to that centroid than to any other centroid.
- the distance may be determined by maximizing the inner-product similarities between the generic negative features of the subset and the centroid.
- An E-SVM feature in accordance with present principles may be determined based on an optimization determined as follows:
- Equation No. (1): w(x, 𝒩) = argmin_w (θ/2)‖w‖₂² + α₁ h(wᵀx) + (α₂/N) Σ_{z_i ∈ 𝒩} h(−wᵀz_i), where h(s) = max(0, 1 − s) denotes the hinge loss.
- the parameters θ, α₁, and α₂ may have positive, scalar values.
- the parameters θ, α₁, and α₂ may control regularization and the relative weight of positive and negative features.
- Each of the parameters θ, α₁, and α₂ can be manually chosen or can be automatically determined.
- the parameter x may be a feature vector that relates to a query image, a generic negative image, or a search image.
- feature vector x may be determined using an encoder such as a VLAD, Fisher Vector, CNN activation coefficients, or another E-SVM feature.
- the parameter 𝒩 may represent a set of generic negative feature vectors (e.g., of size N).
- Parameter w is the parameter that is being optimized under Equation No. (1).
- the parameter z_i is a negative feature that is part of the set 𝒩 of negative features.
- the set of generic feature vectors 𝒩 may be a large set of vectors.
- the optimization of Equation No. (1) may determine the best (i.e., the optimal) vector w.
- the optimization may be based on stochastic gradient descent.
- the stochastic gradient descent may be determined using a Pegasos algorithm.
- the stochastic gradient descent may process one feature vector at a time, either the positive x or a negative z_i.
- the feature vectors x and z_i may be processed in a random order.
- the parameters α₁ and α₂ may be determined during the stochastic gradient descent algorithm.
- the optimization of Equation No. (1) may be an SVM problem relying on hinge loss, where the number of features in the positive set is unbalanced relative to the number of features in the negative set. For example, there may be one positive feature and a million negative features.
- an E-SVM feature may be determined based on an ℓ₂ normalization of the result of Equation No. (1).
- the normalized E-SVM feature may be determined as follows:
- Equation No. (2): w(x, 𝒩) ← w(x, 𝒩) / ‖w(x, 𝒩)‖₂
- the parameters x and 𝒩 may have the same values as parameters x and 𝒩 of Equation No. (1).
- In addition to ℓ₂ normalization, other types of normalization may be used, such as ℓ₁ normalization, or other types of post-processing may be used, such as zeroing of negative coefficients or signed square roots of coefficients.
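A minimal sketch of Equation Nos. (1)-(2): a single positive exemplar is separated from a generic negative pool by gradient descent on a hinge loss, and the result is ℓ₂-normalized to yield the E-SVM feature. The re-weighting of the lone positive is an assumption standing in for the α₁/α₂ balance:

```python
import numpy as np

def esvm_feature(x, negatives, lam=0.01, lr=0.05, epochs=400):
    """Learn a hyperplane separating one positive exemplar x from a pool of
    generic negatives (hinge loss, gradient descent), then l2-normalize."""
    X = np.vstack([x[None, :], negatives])
    y = np.concatenate([[1.0], -np.ones(len(negatives))])
    # Re-weight the lone positive so it is not drowned out by the negatives.
    sw = np.concatenate([[float(len(negatives))], np.ones(len(negatives))])
    sw = sw / sw.sum()
    u = np.zeros(x.shape[0])
    for _ in range(epochs):
        bad = y * (X @ u) < 1                      # margin violators
        g = lam * u - ((sw[bad] * y[bad])[:, None] * X[bad]).sum(axis=0)
        u -= lr * g
    return u / np.linalg.norm(u)

rng = np.random.default_rng(3)
x = np.array([1.5, 1.5])                           # positive exemplar feature
negs = rng.normal(size=(200, 2))                   # generic negative pool
w = esvm_feature(x, negs)                          # unit-norm E-SVM feature
```

A full batch pass is used here for brevity; the disclosure contemplates one-sample-at-a-time (Pegasos-style) stochastic updates instead.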
- An E-SVM feature in accordance with present principles may alternatively be determined based on an optimization determined as follows:
- Equation No. (3): (w′, b) = argmin_{w′, b} (1/2)‖w′‖₂² + C₁ h(w′ᵀx + b) + C₂ Σ_{z ∈ 𝒩} h(−w′ᵀz − b)
- the parameters x and 𝒩 may have the same values as parameters x and 𝒩 of Equation No. (1).
- the parameter z may denote a negative feature in the set of negative features 𝒩.
- the parameters C₁ and C₂ are regularization parameters. In one example, the parameters C₁ and C₂ may be chosen empirically.
- the parameters w′ and b denote the parameters over which there is optimization or minimization.
- an E-SVM feature may be determined based on an ℓ₂ normalization of the result of Equation No. (3) as follows:
- Equation No. (4): w = w′ / ‖w′‖₂
- In addition to ℓ₂ normalization, other types of normalization may be used, such as ℓ₁ normalization, or other types of post-processing may be used, such as zeroing of negative coefficients or signed square roots of coefficients.
- An aspect of present principles is directed to determining RE-SVM feature(s) based on E-SVM features, including E-SVM features determined in accordance with the principles described in connection with Equation Nos. (l)-(4).
- the RE-SVM feature(s) may be determined based on an E-SVM function that utilizes previously determined E-SVM features of query, training and/or search images.
- a feature encoder may encode E-SVM features instead of the image feature representations. For example, if x is a feature vector encoded by a feature encoder, the feature encoder may instead encode an E-SVM feature (e.g., w(x, 𝒩)).
- An aspect of present principles is directed to E-SVM functions based on symmetrically determined E-SVM features.
- the symmetric approach includes determining E-SVM features for a query image x°.
- the symmetric approach further includes determining corresponding E- SVM features for each image x in a database V.
- the database V may consist of images that will be searched.
- the E-SVM feature w° for a query image x° may be represented by:
- Equation No. (5): w° = w(x°, 𝒩)
- the set of E-SVM features w_i, corresponding to images x_i in the database 𝒱 of search images, may be represented by:
- Equation No. (6): {w_i = w(x_i, 𝒩)}_i
- the search images x_i, or the corresponding E-SVM features w_i, may be sorted based on their similarity to the query image.
- similarity may be measured between an E-SVM feature w° derived from a query image feature x° and an E-SVM feature w_j derived from a database image feature x_j as follows:
- Equation No. (7): s_j = w_jᵀ w°
- the parameter w_j represents the E-SVM feature of a database image x_j.
- the parameter w° represents the E-SVM feature of the query image.
- the parameter s_j may represent a similarity score of the image feature of the query image and the j-th search image. Because each feature w is uniquely associated with a single image, the parameter s_j may also represent a similarity score between the query image and the database image x_j. A high similarity score may indicate a high similarity between the images, whereas a low similarity score may indicate a low similarity between the images.
- the score may be determined based on the inner product between w_j and w°. In another example, the score may be determined based on a negated Euclidean distance (e.g., −‖w_j − w°‖₂).
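The scoring just described reduces to an inner product between ℓ₂-normalized E-SVM features; the toy features below are illustrative:

```python
import numpy as np

def similarity_scores(w_query, w_db):
    """Inner-product similarity between l2-normalized E-SVM features;
    a higher score indicates a more similar image."""
    return w_db @ w_query

w_q = np.array([1.0, 0.0])                         # query E-SVM feature
w_db = np.array([[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
w_db = w_db / np.linalg.norm(w_db, axis=1, keepdims=True)
s = similarity_scores(w_q, w_db)
ranked = np.argsort(-s)                            # best match first
```

With unit-norm features the inner-product ranking and the negated-Euclidean-distance ranking coincide, since ‖w_j − w°‖² = 2 − 2 w_jᵀw°.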
- Image searching based on E-SVM or RE-SVM features includes learning a classifier.
- a classifier based on E-SVM or RE-SVM features may be learned based on an optimization as follows:
- Equation No. (8): (v, b) = argmin_{v, b} (1/2)‖v‖₂² + (C/N) Σ_{i=1}^{N} h(y_i (vᵀw_i + b))
- the annotated training set {y_i, w_i}_i may consist of labels y_i ∈ {−1, 1}.
- Vector v represents a classifier v.
- Parameter C represents a scalar chosen empirically and controlling the regularization.
- Parameter N represents the size of the training set.
- Parameters w_i represent the E-SVM features of the training set images, and b represents the bias or offset between the origin and the hyperplane orthogonal to v.
- E-SVM or RE-SVM features will be determined for the query image, the negative images and the search images.
- the image classifier v is determined based on the training (positive and negative) images.
- the image classifier v is applied to a database of searched images that are also represented by E-SVM or RE-SVM features.
- a classifier may be learned from the RE-SVM representations of training images.
- the training image(s) may be utilized to determine a K-nearest- neighbor classifier, or a nearest-class-mean classifier.
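A nearest-class-mean classifier over (R)E-SVM features, as mentioned above, can be sketched as follows; the tiny training set is illustrative:

```python
import numpy as np

def nearest_class_mean(train_feats, train_labels, query):
    """Represent each class by the mean of its training features and assign
    the query to the class with the closest mean."""
    labels = np.unique(train_labels)
    means = np.stack([train_feats[train_labels == c].mean(axis=0)
                      for c in labels])
    return labels[np.argmin(np.linalg.norm(means - query, axis=1))]

# Two classes of 2-D features; the query sits near the +1 class mean.
feats = np.array([[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [-0.9, -0.1]])
labels = np.array([1, 1, -1, -1])
pred = nearest_class_mean(feats, labels, np.array([0.8, 0.0]))
```

A K-nearest-neighbor classifier would replace the class means with a vote among the K closest training features.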
- the RE-SVM features may be determined by recursively applying an E-SVM function to E-SVM or RE-SVM features (e.g., w(x, 𝒩)).
- initial E-SVM features may be represented by:
- Equation No. (9): w° = w(x, 𝒩°)
- the parameter w° may represent the initial feature of an image.
- the image can be any image, including a query, database, or training image.
- the parameter 𝒩° may represent the initial feature representations of generic, negative images.
- a first RE-SVM (RE-SVM_1) feature is determined based on an E-SVM function applied to initial features.
- a j-th recursion RE-SVM (RE-SVM_j) feature may be determined as follows:
- Equation No. (10): w^j = w(w^{j−1}, 𝒩^{j−1})
- Equation No. (10a): 𝒩^j = {w(z, 𝒩^{j−1}) : z ∈ 𝒩^{j−1}}
- the value of j may be equal to or greater than 1.
- the j-th RE-SVM (or the j-th E-SVM) may be determined in accordance with the principles described with Equations Nos. (l)-(4).
- the E-SVM function in Equation Nos. (10) and (10a) may be derived according to Equation Nos. (l)-(4).
- the RE-SVM feature construction may be similar to deep architectures that use an output of a given layer as the input of a subsequent layer.
- an aspect of present principles, unlike deep architectures, determines E-SVM or RE-SVM features using an unsupervised learning routine on a per-image basis.
- unsupervised may indicate automatic, machine learning without user annotations to a training set.
- each j-th RE-SVM feature (RE-SVMj) may be determined by a single, non-linear, and/or convex problem, as opposed to a standard tandem linear/non-linear arrangement.
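The recursive construction described above can be sketched as below: at each level the E-SVM function is re-applied, and the negative pool is itself re-encoded. The exact ordering of the re-encoding is an assumption, and the trainer is the same minimal hinge-loss sketch used earlier:

```python
import numpy as np

def esvm(x, negs, lam=0.01, lr=0.05, epochs=200):
    """Minimal single-positive hinge-loss trainer; returns a unit vector."""
    X = np.vstack([x[None, :], negs])
    y = np.concatenate([[1.0], -np.ones(len(negs))])
    sw = np.concatenate([[float(len(negs))], np.ones(len(negs))])
    sw = sw / sw.sum()
    u = np.zeros(x.shape[0])
    for _ in range(epochs):
        bad = y * (X @ u) < 1
        u -= lr * (lam * u - ((sw[bad] * y[bad])[:, None] * X[bad]).sum(axis=0))
    return u / np.linalg.norm(u)

def resvm(x, negs, depth=2):
    """Recursively re-apply the E-SVM function; the negative pool is itself
    re-encoded at every level before the query feature is re-encoded."""
    w, pool = x, negs
    for _ in range(depth):
        new_pool = np.stack([esvm(z, pool) for z in pool])
        w = esvm(w, new_pool)
        pool = new_pool
    return w

rng = np.random.default_rng(4)
x = np.array([2.0, 0.5])
negs = rng.normal(size=(8, 2))
w2 = resvm(x, negs, depth=2)
```

The tiny pool keeps the sketch fast; in practice the cost of re-encoding a large negative pool at every level is exactly what the clustering acceleration below addresses.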
- local descriptors {s_k}_k may be extracted from images.
- the local descriptors may be extracted for the query image, the negative images, the database images that will be searched, or any other images that are used to learn classifiers.
- the descriptors may be aggregated based on a descriptor aggregating method, such as VLAD, the Fisher Vector, or Bag-of-Words to produce a single feature vector.
- Because E-SVMs are agnostic to the underlying feature representation, it is possible to compute an E-SVM from each local descriptor s_k by using an adequate set M of generic local descriptors, and further to do so recursively as in Equation Nos. (10) and (10a).
- the resulting RE- SVM local descriptors can subsequently be used to build aggregated features using the standard methods previously mentioned such as VLAD, the Fisher Vector or Bag-of-Words.
- the subsequent features can be further processed using RE-SVM learning functions as discussed in connection with Equation Nos. (10) and (10a).
Accelerating E-SVM learning using similarity clustering
- An aspect of present principles is directed to clustering generic, negative image features to reduce complexity.
- computational complexity of E-SVM feature determinations is decreased despite the existence of a large amount of generic, negative images.
- the amount of generic negative image features is reduced based on corresponding centroid features. This reduction results in reduced storage and complexity.
- this decrease is achieved by deriving a smaller amount of representative centroids from a large amount of generic, negative features.
- a centroid may refer to a vector that represents a subset of features.
- the centroid may represent the mean of the subset, the medoid of the subset (i.e., the subset feature closest to the mean) or a singular vector or eigenvector derived from the subset.
- the centroids may replace the large amount of generic features, thereby resulting in a smaller training set for an E-SVM function, and hence a faster learning time.
- the amount of centroids needs to be chosen empirically, as using too few centroids may result in a non-representative generic negative pool.
- the centroids may further have associated auxiliary data.
- the auxiliary data may be a weight that is associated with each centroid.
- the weight may represent the number of generic negative image features that are part of a corresponding centroid.
- the auxiliary data may consist of statistics derived from the generic negative image features of the corresponding centroid. The statistics may include the mean and correlation matrix of the generic negative image features of the corresponding centroid.
- the centroids may be determined based on a clustering approach that is applied to features of generic, negative images.
- the clustering approach may be applied to multi-modal distribution of generic images' features.
- a clustering approach may locate the centroids of the multi-modal distribution. The large amount of negative image features may then be replaced by a smaller set of centroids, weighted by their importance.
- the centroids may be determined based on:
- the parameters c_k and c_l are centroids.
- the parameter K is the number of learned centroids.
- the parameter 𝒩 is the generic negative pool.
- the parameter 𝒩_k is the subset of features from the generic negative pool that corresponds to the centroid c_k. The subset of features 𝒩_k may be assigned to centroid c_k because they are closer to centroid c_k than to any other centroid.
- the parameters z and z′ are feature vectors from the generic negative pool. Each centroid c_k may be determined based on the first left-singular vector of a matrix derived by stacking the vectors z ∈ 𝒩_k as columns.
- the process for clustering features to determine centroids may include: (i) updating each c_k in accordance with Equation No. (11), (ii) updating the cluster assignments by determining whether each generic negative feature should be assigned to that cluster, and (iii) repeating (i) and (ii) recursively.
- the present clustering approach includes clustering that is sensitive to the sign of the inner product z′ᵀc_k (compare with dictionary learning, where the cluster assignment is done using the absolute value, e.g., |z′ᵀc_l|).
- the process for clustering or centroid determination may be performed by block 603 of Fig. 6 or blocks 658 and 661 of Fig. 6A.
- each feature z may be assigned a centroid.
- the feature z may be assigned to a centroid based on the distance between the feature z and the centroid.
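The sign-sensitive clustering with singular-vector centroid updates described above can be sketched as follows; the initialization from random data points and the fixed iteration count are assumptions:

```python
import numpy as np

def cluster_negatives(Z, K=3, iters=10, seed=0):
    """Assign each negative feature to the centroid maximizing the signed
    inner product z^T c_k, then update each centroid as the first
    left-singular vector of the matrix of its assigned features."""
    rng = np.random.default_rng(seed)
    C = Z[rng.choice(len(Z), size=K, replace=False)]
    C = C / np.linalg.norm(C, axis=1, keepdims=True)
    for _ in range(iters):
        assign = np.argmax(Z @ C.T, axis=1)        # sign-sensitive assignment
        for k in range(K):
            members = Z[assign == k]
            if len(members) == 0:
                continue
            U, _, _ = np.linalg.svd(members.T, full_matrices=False)
            c = U[:, 0]                            # first left-singular vector
            if c @ members.sum(axis=0) < 0:        # fix the SVD sign ambiguity
                c = -c
            C[k] = c
    assign = np.argmax(Z @ C.T, axis=1)            # final assignment
    return C, assign

# Three directional clusters of unit vectors.
rng = np.random.default_rng(5)
angles = np.concatenate([rng.normal(a, 0.1, 30) for a in (0.0, 2.1, 4.2)])
Z = np.stack([np.cos(angles), np.sin(angles)], axis=1)
C, assign = cluster_negatives(Z, K=3)
```

The sign flip after the SVD reflects the sign-sensitivity noted above: unlike dictionary learning, each centroid is oriented toward its cluster rather than treated up to sign.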
- an E-SVM function may be performed based on the centroids of the generic negative images instead of the features of the generic negative images.
- E-SVM features may be determined by applying an E-SVM function to the centroids as follows:
- Equation No. (12): w(x, {c_k}) = argmin_w (θ/2)‖w‖₂² + α₁ h(wᵀx) + (α₂/N) Σ_k N_k h(−wᵀc_k)
- Equation No. (12) corresponds to a modification of Equation No. (1), with each centroid c_k standing in for the negatives assigned to it.
- the parameter c_k may correspond to the k-th centroid.
- the parameter N_k denotes the number of vectors in 𝒩 assigned to c_k.
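The centroid-based learning just described can be sketched as training against the K centroids, each weighted by its count N_k; the specific step sizes are illustrative:

```python
import numpy as np

def esvm_from_centroids(x, centroids, counts, lam=0.01, lr=0.05, epochs=400):
    """Train the exemplar classifier against K centroids instead of the full
    negative pool, weighting each centroid by the count N_k of negatives
    assigned to it."""
    N = counts.sum()
    u = np.zeros(x.shape[0])
    for _ in range(epochs):
        g = lam * u
        if x @ u < 1:                              # positive inside the margin
            g = g - x
        viol = -(centroids @ u) < 1                # centroids inside the margin
        g = g + (counts[viol][:, None] * centroids[viol]).sum(axis=0) / N
        u -= lr * g
    return u / np.linalg.norm(u)

x = np.array([2.0, 0.0])                           # positive exemplar
cents = np.array([[-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
counts = np.array([50, 25, 25])                    # N_k per centroid
w = esvm_from_centroids(x, cents, counts)
```

Each iteration now touches K vectors instead of the full pool of N negatives, which is the source of the reduced storage and learning time claimed above.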
- Fig. 2 illustrates a flow chart of a method 200 for extracting E-SVM features.
- Block 201 may receive at least an image.
- block 201 may modify the image to include at least a mirrored, flipped, cropped, shifted, or scaled version of the original image.
- block 201 may receive a plurality of images of a queried scene (e.g., a plurality of images of the same scene of the statue of liberty).
- Block 202 may determine an initial feature vector for the image received at block 201.
- the initial, base feature vector may be any type of feature vector, including VLAD, BoW, Fisher, and deep CNN activations.
- the initial feature vector of the image may also be known as the positive feature.
- Block 203 may receive a set of generic images.
- the generic images may comprise the generic negative set 𝒩 as discussed in connection with Equation Nos. (1) and (3).
- the amount of generic negative set images may be large enough to allow for a reasonable computational time on a target CPU (e.g., a per-image E-SVM feature construction time of about 1 second, corresponding to about 100,000 negative images).
- the content of the generic negative images varies between each negative image.
- Block 204 may determine initial feature vectors for the generic, negative images received from block 203. For example, block 204 may determine an initial feature vector for each image in the set of generic images. The feature vectors determined by block 204 may be the same type of feature vectors as the feature vector determined by block 202. Block 204 may provide the determined feature vectors, i.e., the generic negative features, to block 205.
- Block 205 may determine an E-SVM feature 210.
- Block 205 may determine the E- SVM feature 210 based on the initial feature vector from block 202 and the initial generic negative feature vectors received from block 204.
- Block 205 may include one or more of blocks 206, 207 and 208.
- block 205 may combine or omit the functionality described in connection with one or more of blocks 206, 207 and 208.
- the functionality of blocks 206-208 may be combined and performed by block 205.
- block 205 may perform a classifier determination other than the SVM determination.
- block 205 may perform additional post-processing scheme to the determined exemplar feature.
- Block 206 may perform an SVM determination based on the initial feature from block 202 and the initial generic negative features from block 204.
- Block 206 may perform an SVM or E-SVM function.
- Block 206 may determine a vector u and a bias parameter b.
- the vector u may be orthogonal to a hyper-plane that separates generic negatives from an exemplar feature(s).
- the bias parameter b may specify a displacement of the hyper-plane along the orthogonal.
- block 206 may determine the orthogonal vector u and the bias parameter b in accordance with the principles described in connection with Equation Nos. (1) and (3).
- Block 207 may process a vector u from block 206.
- Block 207 may normalize the vector u to determine parameter w.
- Block 207 may optionally post-process the vector u or parameter w.
- block 207 may perform post-processing such as an ℓ₂ normalization (e.g., in accordance with the principles described in connection with Equation Nos. (2) and (4)).
- the processing may be a square root, related to the Hellinger kernel.
- the processing can be a sequential application of a plurality of post-processing solutions.
- Block 205 may output an E-SVM feature 210.
- the E-SVM feature 210 may correspond to the E-SVM feature illustrated in Fig. 1C.
- the E-SVM feature 210 may be a post-processed version of the E-SVM feature illustrated in Fig. 1C.
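The E-SVM determination and normalization of blocks 206-207 can be sketched as follows. This is a minimal illustration under stated assumptions: the document's Equation Nos. (1)-(4) are not reproduced in this chunk, so the asymmetric cost weights `c_pos`/`c_neg` and the use of scikit-learn's `LinearSVC` are choices of this sketch, not the exact formulation of the present principles.

```python
import numpy as np
from sklearn.svm import LinearSVC

def esvm_feature(x, negatives, c_pos=10.0, c_neg=0.01):
    """Learn an exemplar SVM: one positive feature x vs. a generic negative pool.

    Returns the l2-normalized E-SVM feature w and the bias b.
    """
    # Stack the lone positive exemplar on top of the generic negatives.
    X = np.vstack([x[None, :], negatives])
    y = np.array([1] + [-1] * len(negatives))
    # Asymmetric misclassification costs: the single positive is weighted
    # far more heavily than each of the many negatives.
    clf = LinearSVC(class_weight={1: c_pos, -1: c_neg})
    clf.fit(X, y)
    u = clf.coef_.ravel()            # vector orthogonal to the separating hyper-plane
    b = float(clf.intercept_[0])     # displacement of the hyper-plane
    w = u / np.linalg.norm(u)        # block 207: l2 normalization of u
    return w, b
```

The normalized w can then be used directly as an image representation, as the E-SVM feature 210 is used in the figures.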
- Fig. 1C illustrates an orthogonal vector u 174 and an E-SVM feature w 170.
- An SVM algorithm may determine an orthogonal vector u 174 and a bias b parameter 173.
- the orthogonal vector u 174 may be determined as discussed in connection with blocks 205 and 206 of Fig. 2.
- the orthogonal vector u 174 may be orthogonal to a separating hyper-plane 172.
- the orthogonal vector u 174 may be determined based on an exemplar feature x that may represent a queried image or a positive image concept.
- the orthogonal vector u 174 may be normalized to produce the E-SVM feature w 170.
- the orthogonal vector u 174 may be normalized as described in connection with block 207 of Fig. 2.
- the parameter u and/or the E-SVM feature w 170 may be determined based on negative vectors zi 175.
- negative vectors zi may be determined in accordance with the principles described in connection with block 204 of Fig. 2.
- the negative vectors zi 175 may be selected from a pool of generic features 176.
- Fig. 3 illustrates a flow chart of a method 300 for determining RE-SVM feature(s) in accordance with present principles.
- Block 301 may receive one or more image(s).
- Block 302 may determine an initial feature for the image from block 301.
- the initial feature vector of the image may be known as the initial positive feature(s).
- block 302 may determine the initial feature vector in accordance with the principles described in connection with block 202 of Fig. 2.
- Block 303 may receive a set of generic images.
- the generic images may be negative images determined in accordance with the images described in connection with block 203 of Fig. 2.
- Block 304 may determine initial negative features for the generic, negative images from block 303.
- Block 304 may determine initial negative features in accordance with the principles described in connection with block 204 of Fig. 2.
- block 304 may be performed off-line or prior to the performance of method 300.
- Block 305 may determine a first, positive RE-SVM feature (RE-SVMi) based on the positive feature(s) from block 302 and the negative features from block 304.
- Block 305 may determine the RE-SVMi feature based on an SVM determination, e.g., by applying an E-SVM function to the positive and negative features.
- Block 305 may perform an SVM determination as discussed in connection with Fig. 2 (e.g., blocks 205-207) and Fig. 1C.
- Block 305 may determine the RE-SVMi feature as described in connection with Equation Nos. (1) to (4).
- Block 306 may determine first negative RE-SVMi features 𝒩¹ corresponding to the generic image features from block 304.
- block 306 may determine the first generic negative RE-SVMi features in accordance with the principles described in connection with Equation Nos. (1)-(4), (10) and (10a).
- block 306 may use the output of block 304 to replace the set of generic features 𝒩⁰ discussed in connection with Equation Nos. (1)-(4), (10) and (10a).
- block 306 may use the output of block 304 as a generic negative pool 𝒩⁰ and then may use each feature in the pool 𝒩⁰ to determine a corresponding RE-SVM feature while using all or some of the other features of the pool 𝒩⁰ as a generic negative pool.
- the RE-SVM features may be determined as described in connection with Equation Nos. (1)-(4), (10) and (10a).
- block 306 may be optional and the image features from block 304 may be provided directly to block 307. In one example, block 306 may be performed off-line or prior to the performance of method 300.
- Block 307 may determine a second, positive RE-SVM feature (RE-SVM2).
- Block 307 may determine the RE-SVM2 feature based on the first, positive RE-SVMi feature and the first negative RE-SVMi features 𝒩¹.
- Block 307 may determine the RE-SVM2 feature based on an SVM determination. The SVM determination of block 307 may consider the first, positive RE-SVMi feature as the positive exemplar and the first negative RE-SVMi features 𝒩¹ as the negative exemplars.
- Block 307 may perform an SVM determination as discussed in connection with Fig. 2 (e.g., blocks 205-208) and Fig. 1C.
- Block 307 may determine the RE-SVM2 feature as described in connection with Equation Nos. (1)-(4), (10) and (10a).
- Block 309 may determine an Nth positive RE-SVMN feature.
- Block 309 may recursively determine the RE-SVMN feature based on an RE-SVMN-1 positive feature and RE-SVMN-1 negative features.
- the N-1 RE-SVMN-1 negative features may be determined by block 308.
- the N-1 RE-SVMN-1 negative features may be determined based on the N-2 RE-SVMN-2 negative features.
- Each N-1 RE-SVMN-1 negative feature may be determined in accordance with the principles described in connection with Equation Nos. (1)-(4), (10) and (10a).
- Each N-1 RE-SVMN-1 negative feature is computed by using the corresponding N-2 RE-SVMN-2 negative feature as a positive feature and using all other N-2 RE-SVMN-2 features as negative features.
- block 308 may be optional and the image features from blocks 304 or 306 may be provided directly to block 307. In one example, block 308 may be performed off-line or prior to the performance of method 300.
- Block 309 may determine the final, positive RE-SVMN feature 310 based on N iterations of the recursive framework.
- Block 309 may determine the RE-SVMN feature 310 based on an SVM determination.
- Block 309 may perform an SVM determination as discussed in connection with Fig. 2 (e.g., blocks 205-208) and Fig. 1C.
- Block 309 may determine the RE-SVMN feature as described in connection with Equation Nos. (1)-(4), (10) and (10a). The RE-SVM determination may be repeated N times to determine the RE-SVMN feature.
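The recursion of blocks 305-309 can be outlined in code. This is a hedged sketch: `esvm` stands for any exemplar encoder (for instance, an E-SVM function returning a normalized feature vector), the leave-one-out re-encoding of the negative pool follows the description of blocks 306 and 308, and the function and argument names are illustrative rather than taken from the document.

```python
import numpy as np

def resvm_feature(x, negatives, n_levels, esvm):
    """Recursive E-SVM (RE-SVM) sketch.

    esvm(pos, neg_pool) -> encoded feature vector.
    At each level, the positive is re-encoded against the current negative
    pool, and each negative is re-encoded against all *other* negatives.
    """
    pos, negs = x, negatives
    for _ in range(n_levels):
        # Blocks 305/307/309: positive feature vs. current negative pool.
        new_pos = esvm(pos, negs)
        # Blocks 306/308: each negative vs. the remaining negatives.
        negs = np.stack([
            esvm(negs[j], np.delete(negs, j, axis=0))
            for j in range(len(negs))
        ])
        pos = new_pos
    return pos
```

As the document notes, the negative-pool re-encoding does not depend on the query and so can be performed off-line.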
- Fig. 4 illustrates an exemplary image search based on RE-SVM features.
- Block 401 may receive positive images (e.g., one or more query images). The positive images may represent a visual concept (e.g., cat).
- Block 402 may determine positive RE-SVM features for the positive images from block 401.
- Block 402 may determine positive RE-SVM features in accordance with the principles described in connection with Equation Nos. (1)-(4) and Fig. 3.
- Block 403 may receive negative images.
- the negative images may be counter examples of the visual concept (e.g., images not containing cats).
- the negative images may be images from a set of generic, negative images.
- Block 404 may determine negative RE-SVM features for the negative images from block 403.
- block 404 may determine negative RE-SVM features in accordance with the principles described in connection with Equation Nos. (1)-(4), (10) and (10a) and Fig. 3.
- Block 405 may learn a classifier based on the negative and positive RE-SVM features received from blocks 402 and 404, respectively.
- the learned classifier can be an SVM classifier, a k-nearest-neighbor (k-NN) classifier, a nearest class mean classifier, or any other type of classifier.
- Block 406 may determine image search results based on the classifier from block 405 and features from a large database of image features 407.
- the large database of image features 407 may contain the features of the images that will be searched.
- the large database of image features 407 may be determined in accordance with the principles described in connection with blocks 143-145 of Fig. 1A and block 154 of Fig. 1B.
- Block 406 may determine a score for each image by applying the classifier to each of the search image features.
- the search features of block 407 may be RE-SVM features determined in accordance with the principles described in connection with Fig. 3 and Equation Nos. (10) and (10a).
- the search results may be ranked based on the score and the ranking may be provided along with the results.
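The classifier learning, scoring and ranking of blocks 405-407 can be sketched as follows. The use of a linear SVM (one of the classifier types named above), the function name, and the synthetic shapes are assumptions of this sketch.

```python
import numpy as np
from sklearn.svm import LinearSVC

def rank_database(pos_feats, neg_feats, db_feats, top_k=3):
    """Blocks 405-406 sketch: learn a classifier on positive and negative
    RE-SVM features, score every database feature, and rank by score."""
    X = np.vstack([pos_feats, neg_feats])
    y = np.concatenate([np.ones(len(pos_feats)), -np.ones(len(neg_feats))])
    clf = LinearSVC().fit(X, y)               # block 405: learn the classifier
    scores = clf.decision_function(db_feats)  # block 406: one score per image
    order = np.argsort(-scores)               # rank: highest score first
    return order[:top_k], scores
```

The returned ranking can be provided along with the results, as described above.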
- Fig. 5 illustrates a flow chart of a method 500 for extracting RE-SVM local descriptors.
- Block 501 may extract N local descriptors from an input image.
- Block 502 may extract many local descriptors from a generic pool of images.
- Block 503 may compute RE-SVM local descriptors based on the local descriptors extracted at blocks 501 and 502.
- Block 504 may output the results of block 503.
- the output of block 504 may be provided to a local descriptor encoder, such as a VLAD, Fisher or Bag of Words local descriptor encoder.
- the N local descriptors provided by block 504 may be aggregated into a single global image feature.
- the global feature may represent features of a query, positive, negative, training, search or any other image and may be used during image searching.
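As one concrete aggregation of the kind named in connection with block 504, a minimal VLAD-style encoder is sketched below. The codebook of visual words is assumed to be learned beforehand, and the implementation details (nearest-word assignment, residual accumulation, l2 normalization) follow the standard VLAD recipe rather than any formulation specific to this document.

```python
import numpy as np

def vlad_encode(local_descs, codebook):
    """Aggregate N local descriptors into one global feature (VLAD sketch).

    local_descs: (N, d) array; codebook: (K, d) array of visual words.
    Returns an l2-normalized vector of length K*d.
    """
    K, d = codebook.shape
    # Assign each local descriptor to its nearest visual word.
    dists = ((local_descs[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    assign = dists.argmin(axis=1)
    v = np.zeros((K, d))
    for k in range(K):
        members = local_descs[assign == k]
        if len(members):
            v[k] = (members - codebook[k]).sum(axis=0)  # residual accumulation
    v = v.ravel()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```

A Fisher-vector or Bag-of-Words encoder could be substituted at the same point in the pipeline.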
- Fig. 6 illustrates a flow chart of a method 600 for accelerated E-SVM or RE-SVM feature determination.
- Method 600 may include a block 601 which may receive features of an image.
- block 601 may receive the image and may determine the image features.
- the features may be E-SVM or RE-SVM features as discussed in connection with Figs. 2 and 3.
- the features may be initial image features or any other type of features.
- Block 601 may determine the E-SVM or RE-SVM image features as discussed in connection with methods 200 and 300 of Figs. 2 and 3, respectively.
- the features may be E-SVM or RE-SVM features derived from local descriptors.
- Block 601 may provide the image features to block 604.
- Block 602 may receive a large set of features of negative, generic images.
- the large set of features may have a size that is a multiple of the size of the feature vector (e.g., 10x, 100x, 1000x).
- Each feature vector may have any size, including in the range of 100 to tens of thousands.
- the total size of the large set may be very high, including over 100,000 or over a million features.
- block 602 may receive features of generic images as described in connection with blocks 303 and 304. In another example, block 602 may determine the features of generic images.
- the features of the generic images may be initial, base features, E-SVM features or RE-SVM features. In another example, the features may be E-SVM or RE-SVM features derived from local descriptors. Block 602 may provide the features of the generic images to block 603.
- Block 603 may perform similarity clustering for the generic features from block 602. In one example, block 603 may perform cosine-similarity clustering. In one example, block 603 may perform similarity clustering in accordance with the principles described by Equation
- block 603 may perform similarity clustering in accordance with the principles described in connection with Fig. 6A.
- block 603 may determine each centroid by: (i) initializing the clusters, (ii) updating each centroid ck and each corresponding cluster Ck, and (iii) repeating (i) and (ii) recursively.
- the present clustering approach includes clustering that is sensitive to the sign of the inner product zᵀci (compare with dictionary learning, where the cluster assignment is done using the absolute value, e.g., |zᵀci|).
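The effect of sign-sensitive assignment can be seen on a toy example; the feature and centroid values below are illustrative only. The two rules can pick different centroids for the same feature:

```python
import numpy as np

z = np.array([1.0, 0.0])            # a generic negative feature
c = np.array([[-1.0, 0.0],          # centroid 0: anti-aligned with z
              [0.6, 0.0]])          # centroid 1: aligned with z

signed = int((c @ z).argmax())          # sign-sensitive: raw inner product
absolute = int(np.abs(c @ z).argmax())  # dictionary-learning style: |inner product|
```

The sign-sensitive rule assigns z to the aligned centroid, while the absolute-value rule assigns it to the anti-aligned one.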
- Block 603 may optionally perform the clustering offline before any image searching is carried out.
- the processing associated with block 603 may be performed prior to image searching and the results may be stored.
- Block 604 may perform the E-SVM function using as generic negatives the centroids Ck from block 603.
- the centroids ck may be utilized in the same way that generic negative features are typically utilized by an E-SVM function.
- block 604 may perform an E-SVM function based on the centroids from block 603 as the generic negative features.
- block 604 may perform an E-SVM function as described in accordance with the principles of block 205 of Fig. 2.
- block 604 may perform the RE-SVM determination according to blocks 305, 307, or 309 of Fig. 3.
- the auxiliary data may represent the number of generic negative feature vectors Nk in each cluster k.
- Block 604 may determine scaled centroids by multiplying each centroid ck by the corresponding Nk.
- Block 604 may perform an E-SVM function based on the scaled centroids.
- Block 604 may perform E-SVM determinations in accordance with Equation Nos. (1)-(4) and (12).
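The scaling step of block 604 can be sketched as follows; the function name is illustrative, and the resulting scaled centroids would then be passed to an E-SVM function in place of the full generic negative pool:

```python
import numpy as np

def accelerated_negatives(centroids, counts):
    """Block 604 sketch: scale each centroid ck by its cluster population Nk
    so that K weighted centroids stand in for the full negative pool."""
    return centroids * counts[:, None]
```

Because K is much smaller than the number of generic negatives, the subsequent E-SVM learning operates on far fewer vectors.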
- Fig. 6A illustrates an exemplary method for performing similarity clustering in accordance with present principles.
- Fig. 6A includes a block 650 that may receive generic negative features.
- the generic features may be E-SVM, RE-SVM or base features such as VLAD, Bag-of-words or Fisher vector.
- the generic negative features may be the features received from block 602 in Figure 6.
- the set of generic negative features may have a size N (i.e., there are N generic negative features received by block 650).
- block 650 may receive the actual values of each generic negative feature.
- block 650 may receive one or more pointers to the values of the generic negative features.
- Each negative generic feature may be represented by a vector.
- Block 651 may initialize centroids in accordance with present principles.
- block 651 may initialize a set of centroids c1, ..., cK, where there are a total number of K centroids.
- block 651 may initialize centroids by randomly choosing K generic negative features from the features received in block 650. The value of K is determined empirically and should be smaller than the total number of generic negative features.
- block 651 may initialize the centroids based on the K generic negative features. For example, block 651 may loop through the K generic negative features and assign each feature to a corresponding centroid.
- block 651 may assign a first negative feature to a first centroid c1, a second negative feature to a second centroid c2, a third negative feature to a third centroid c3, and so on until the Kth negative feature, which is assigned to centroid cK.
- the centroids may be initialized to fixed values that do not depend on the generic negatives.
- each centroid may be assigned a value that is determined by a random number generator.
- Each centroid may be a vector of the same size as the features stored in the set N.
- Block 652 may begin a loop (hereinafter "loop(l)") that will loop and perform the processing of blocks 653-662 using a variable i having a range from one through a number Z.
- the number Z may signify the number of loop iterations that will be performed to optimize the values of the centroids.
- the number Z may be received from an outside source (e.g., may be set by a user) or may be empirically determined.
- Block 652 may pass control to block 653.
- Block 653 may initialize clusters in accordance with present principles.
- block 653 may initialize a set of clusters C1, C2, ... CK.
- the set of clusters may have the same number of elements (K) as the set of centroids initialized by block 651.
- Each cluster C may have a one to one association with a corresponding centroid.
- cluster C1 corresponds to centroid c1
- cluster C2 corresponds to centroid c2
- cluster C3 corresponds to centroid c3
- cluster CK corresponds to centroid cK.
- Each cluster C is a list designed to point to the features associated with the corresponding centroid of that cluster.
- Block 653 may initialize all the clusters (e.g., clusters C1, C2, ... CK) to empty lists.
- Block 653 may then pass control to block 654.
- Block 654 may begin a loop (hereinafter "loop(2)") that will loop through all the generic negative features from block 650 using a variable j having a range from one to the total number of generic negative features N.
- Block 655 may begin a loop (hereinafter "loop(3)") that will loop through all the centroids c1, ... cK and their corresponding clusters C1, ... CK using a variable l having a range from one to the total number of centroids K. Block 655 may pass control to block 656.
- Block 656 may determine a distance between the feature zj and a centroid cl. In one example, block 656 may determine the distance based on the signed inner product or the absolute value of an inner product between the feature zj and the centroid cl. Block 656 may pass control to block 657 when the value of l is greater than K. Block 657 may end loop(3). Block 657 may then pass control to block 658.
- Block 658 may assign feature zj to a closest centroid.
- block 658 may determine the closest centroid to the feature zj based on the distance between the feature zj and all the centroids determined by loop 3.
- block 658 may determine the centroid with the highest absolute value of an inner product between feature zj and the centroid.
- block 658 may determine the closest centroid based on any other method that compares the distances determined by loop 3.
- Block 658 may assign the feature zj to the closest centroid by updating the cluster corresponding to that closest centroid.
- Block 658 may update the corresponding cluster to include an identifier to feature zj.
- Block 658 may pass control to block 659 when the value of j is greater than N.
- Block 659 may end loop 2. Block 659 may pass control to block 660.
- Block 660 may begin a loop (hereinafter "loop(4)") that will loop through all the centroids c1, ... cK and their corresponding clusters C1, ... CK using a variable m having a range from one to the total number of centroids K. Block 660 may pass control to block 661.
- Block 661 may update the value for the centroid cm.
- block 661 may update the value of the centroid cm based on the first left-singular vector of the features identified by the corresponding cluster Cm.
- Block 661 may review the values of the cluster Cm that corresponds to the centroid cm.
- the cluster Cm may identify the features clustered with centroid cm (e.g., the features closest to centroid cm).
- Block 661 may create a matrix Mm based on the feature vectors identified by cluster Cm. This matrix may be the correlation matrix for the features assigned to cluster Cm.
- Block 661 may then determine an updated value for the centroid cm based on the left-singular vector of the matrix Mm with the strongest singular value.
- Block 661 may pass control to block 662 when the value of m is greater than K.
- Block 662 may end loop 4.
- Block 662 may pass control to block 663 when the value of i is greater than Z.
- Block 663 may further optionally output the updated values for all the centroids c1, ... cK.
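The loops of Fig. 6A (blocks 650-663) can be sketched as below. Two details are assumptions of this sketch rather than the document's prescription: the initial centroids are l2-normalized so that every centroid is unit-length like the SVD-updated ones, and loops (2)-(3) are replaced by a vectorized signed-inner-product assignment.

```python
import numpy as np

def similarity_cluster(features, K, n_iters=10, seed=0):
    """Sign-sensitive similarity clustering (sketch of Fig. 6A).

    features: (N, d) generic negative features.
    Returns (K, d) centroids and the per-cluster counts Nk.
    """
    rng = np.random.default_rng(seed)
    # Block 651: initialize centroids from K randomly chosen features
    # (normalized here, an assumption of this sketch).
    centroids = features[rng.choice(len(features), size=K, replace=False)].astype(float)
    centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)
    for _ in range(n_iters):                       # loop(1): Z iterations
        # Blocks 654-658: signed inner products, assign each feature to its
        # closest centroid (sign-sensitive, not absolute value).
        assign = (features @ centroids.T).argmax(axis=1)
        for m in range(K):                         # loop(4)
            members = features[assign == m]
            if len(members) == 0:
                continue                           # keep an empty cluster's centroid
            # Block 661: left-singular vector with the strongest singular value.
            U, _, _ = np.linalg.svd(members.T, full_matrices=False)
            c = U[:, 0]
            if (members @ c).sum() < 0:            # fix the SVD sign ambiguity
                c = -c
            centroids[m] = c
    counts = np.bincount(assign, minlength=K)      # weights Nk for block 604
    return centroids, counts
```

The returned counts correspond to the weights Nk illustrated in Fig. 6B.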
- Fig. 6B illustrates a plot of similarity clustering in accordance with present principles.
- Fig. 6B illustrates an example of the similarity clustering performed by the method of Fig. 6A or by block 603 of Fig. 6.
- Fig. 6B illustrates a two-dimensional feature space containing the generic negative features of size two, with each of the two axes corresponding to one of the entries in each feature vector zi.
- Fig. 6B illustrates a set of 10 generic negative features z1 671, z2 672, z3 673, z4 674, z5 675, z6 676, z7 677, z8 678, z9 679, z10 680.
- Fig. 6B further illustrates three centroids c1 681, c2 682, c3 683 and their corresponding weights N1 691, N2 692, N3 693.
- each of the weights N1, N2, N3 may represent the number of nearest features of the corresponding centroid c1, c2, c3.
- Each centroid c1, c2, c3 may be learned in accordance with the principles described in connection with Fig. 6A.
- each centroid c1, c2, c3 may be associated with a respective cluster of features.
- centroid c1 681 may have a cluster of features z6 676, z7 677, z8 678. Furthermore, centroid c1 681 may have an associated weight N1 691. The value of weight N1 may be 3, indicating that there are three features assigned to centroid c1 681. Centroid c2 682 may have a cluster of features z1 671, z3 673, z4 674, z9 679. Furthermore, centroid c2 682 may have an associated weight N2 692. The value of weight N2 may be 4, indicating that there are four features assigned to centroid c2 682.
- Centroid c3 683 may have a cluster of features z2 672, z5 675, z10 680. Furthermore, centroid c3 683 may have an associated weight N3 693. The value of weight N3 may be 3, indicating that there are three features assigned to centroid c3 683.
- Fig. 7 is a block diagram illustrating an exemplary image processing system 700.
- the image processing system includes an image processing device 710 and a display 720.
- the device 710 and the display 720 may be connected by a physical link.
- the device 710 and the display 720 may communicate via a communication link, such as, for example a wireless network, a wired network, a short range communication network or a combination of different communication networks.
- the display 720 may allow a user to interact with image processing device 710, including, for example, inputting criteria for performing an image search.
- the display 720 may also display the output of an image search.
- the image processing device 710 includes a memory and processor 730 and any other software or hardware that allows the performance of the image searcher 740, RE-SVM image feature determinator 750 and accelerated feature determinator 760.
- the device 710 may be configured to perform any of the features of the blocks depicted in Figs. 2-6A.
- Each of the RE-SVM image searcher 740, RE-SVM image feature determinator 750 and accelerated feature determinator 760 may be executed in whole or in part on image processing device 710. Alternatively, each of the image searcher 740, RE-SVM image feature determinator 750 and accelerated feature determinator 760 may be performed remotely and the results may be communicated to device 710 via a communication link.
- the dashed boxes may indicate that each of the box processes may be executed on image processing device 710 or may be executed remotely.
- Fig. 8 illustrates an example of various image processing devices 801 -804 and a server 805.
- the image processing devices may be smartphones (e.g., device 801), tablets (e.g., device 802), laptops (e.g., device 803), or any other image processing device 804 that includes software and hardware to execute the features described in the Figures.
- the image processing devices 801 -804 may be similar to the image processing device 710 and the image processing system 700 described in connection with Fig. 7.
- Image searcher 740, RE-SVM image feature determinator 750 and accelerated feature determinator 760 may be executed on any of the devices 801-804, on server 805, or in a combination of any of the devices 801-804 and server 805.
- Fig. 9 represents an exemplary architecture of an apparatus 900 which may be configured to implement methods described in relation with Figs. 1-6A and Equation Nos. (1)-(10a).
- apparatus 900 represents an apparatus that may be configured to implement the methods according to present principles, including principles described in relation to Figs. 1-6A.
- Apparatus 900 comprises the following elements that are linked together by a data and address bus 901:
- a microprocessor 902 (or CPU), which is, for example, a DSP (or Digital Signal Processor);
- a ROM (or Read Only Memory) 903;
- a RAM (or Random Access Memory) 904;
- an I/O interface 905 for reception of data to transmit, from an application; and a battery 906 (or other suitable power source).
- the battery 906 is external to the apparatus.
- the word "register" used in the specification can correspond to an area of small capacity (some bits) or to a very large area (e.g. a whole program or a large amount of received or decoded data).
- ROM 903 comprises at least a program and parameters. The algorithm of the methods according to the invention is stored in the ROM 903. When switched on, the CPU 902 loads the program into the RAM and executes the corresponding instructions.
- RAM 904 comprises, in a register, the program executed by the CPU 902 and uploaded after switch on of the apparatus 900, input data in a register, intermediate data in different states of the method in a register, and other variables used for the execution of the method in a register.
- the implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method or an apparatus), the implementation of features discussed may also be implemented in other forms (for example a program).
- An apparatus may be implemented in, for example, appropriate hardware, software, and firmware.
- the methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs”), and other devices that facilitate communication of information between end-users.
- the image or picture I is obtained from a source.
- the source belongs to a set comprising:
- a local memory e.g. a video memory or a RAM (or Random Access Memory), a flash memory, a ROM (or Read Only Memory), a hard disk ;
- a storage interface (905), e.g. an interface with a mass storage, a RAM, a flash memory, a ROM, an optical disc or a magnetic support;
- a communication interface (905) e.g. a wireline interface (for example a bus interface, a wide area network interface, a local area network interface) or a wireless interface (such as an IEEE 802.11 interface or a Bluetooth® interface); and
- an image capturing circuit, e.g. a sensor such as, for example, a CCD (or Charge-Coupled Device) or CMOS (or Complementary Metal-Oxide-Semiconductor) sensor.
- the decoded image I is sent to a destination; specifically, the destination belongs to a set comprising:
- a local memory (903 or 904), e.g. a video memory or a RAM, a flash memory, a hard disk ;
- a storage interface (905), e.g. an interface with a mass storage, a RAM, a flash memory, a ROM, an optical disc or a magnetic support;
- a communication interface (905) e.g. a wireline interface (for example a bus interface (e.g. USB (or Universal Serial Bus)), a wide area network interface, a local area network interface, a HDMI (High Definition Multimedia Interface) interface) or a wireless interface (such as a IEEE 802.11 interface, Wi-Fi ® or a Bluetooth ® interface); and
- bitstream BF and/or F are sent to a destination.
- bitstream F, bitstream BF, or both bitstreams F and BF are stored in a local or remote memory, e.g. a video memory (904) or a RAM (904), or a hard disk (903).
- one or both bitstreams are sent to a storage interface (905), e.g. an interface with a mass storage, a flash memory, ROM, an optical disc or a magnetic support and/or transmitted over a communication interface (905), e.g. an interface to a point to point link, a communication bus, a point to multipoint link or a broadcast network.
- the bitstream BF and/or F is obtained from a source.
- the bitstream is read from a local memory, e.g. a video memory (904), a RAM (904), a ROM (903), a flash memory (903) or a hard disk (903).
- the bitstream is received from a storage interface (905), e.g. an interface with a mass storage, a RAM, a ROM, a flash memory, an optical disc or a magnetic support and/or received from a communication interface (905), e.g. an interface to a point to point link, a bus, a point to multipoint link or a broadcast network.
- apparatus 900 being configured to implement methods in accordance with present principles, belongs to a set comprising:
- a tablet or tablet computer
- a video server e.g. a broadcast server, a video-on-demand server or a web server.
- apparatus 900 being configured to implement an image processing process in accordance with present principles, belongs to a set comprising:
- a tablet or tablet computer
- Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications.
- Examples of such equipment include an encoder, a decoder, a post-processor processing output from a decoder, a preprocessor providing input to an encoder, a video coder, a video decoder, a video codec, a web server, a set-top box, a laptop, a personal computer, a cell phone, a PDA, and other communication devices.
- the equipment may be mobile and even installed in a mobile vehicle.
- An aspect of present principles may be implemented on any electronic device or combination of electronic devices.
- the present principles may be implemented on any of variety of devices including a computer, a laptop, a smartphone, a handheld computing system, a remote server, or on any other type of dedicated hardware.
- Various examples of the present invention are described below with reference to the figures.
- the methods may be implemented by instructions being performed by a processor, and such instructions (and/or data values produced by an implementation) may be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier or other storage device such as, for example, a hard disk, a compact diskette ("CD"), an optical disc (such as, for example, a DVD, often referred to as a digital versatile disc or a digital video disc), a random access memory ("RAM"), or a read-only memory ("ROM").
- the instructions may form an application program tangibly embodied on a processor-readable medium. Instructions may be, for example, in hardware, firmware, software, or a combination.
- a processor may be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a processor-readable medium (such as a storage device) having instructions for carrying out a process. Further, a processor- readable medium may store, in addition to or in lieu of instructions, data values produced by an implementation.
- Implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted.
- The information may include, for example, instructions for performing a method, or data produced by one of the described implementations.
- A signal may be formatted to carry as data the rules for writing or reading the syntax of a described example, or to carry as data the actual syntax-values written by a described example.
- Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal.
- The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream.
- The information that the signal carries may be, for example, analog or digital information.
- The signal may be transmitted over a variety of different wired or wireless links, as is known.
- The signal may be stored on a processor-readable medium.
- Various examples of the present invention may be implemented using hardware elements, software elements, or a combination of both. Some examples may be implemented, for example, using a computer-readable medium or article which may store an instruction or a set of instructions that, if executed by a machine, may cause the machine to perform a method and/or operations in accordance with the examples.
- A machine may include, for example, any suitable processing platform, computing platform, computing device, processing device, computing system, processing system, computer, processor, or the like, and may be implemented using any suitable combination of hardware and/or software.
- The computer-readable medium or article may include, for example, any suitable type of memory unit, memory device, memory article, memory medium, storage device, storage article, storage medium and/or storage unit.
- The instructions may include any suitable type of code, such as source code, compiled code, interpreted code, executable code, static code, dynamic code, encrypted code, and the like, implemented using any suitable high-level, low-level, object-oriented, visual, compiled and/or interpreted programming language.
- The implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed may also be implemented in other forms (for example, an apparatus or program).
- An apparatus and constituents included therein, for example, a processor, an encoder and a decoder may be implemented in, for example, appropriate hardware, software, and firmware.
- The methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs"), and other devices that facilitate communication of information between end-users.
- Determining the information may include one or more of, for example, estimating the information, calculating the information, predicting the information, or retrieving the information from memory.
- Accessing the information may include one or more of, for example, receiving the information, retrieving the information (for example, from memory), storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
- Receiving is, as with “accessing”, intended to be a broad term.
- Receiving the information may include one or more of, for example, accessing the information, or retrieving the information (for example, from memory).
- "Receiving" is typically involved, in one way or another, during operations such as, for example, storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Library & Information Science (AREA)
- Databases & Information Systems (AREA)
- Image Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
An aspect of present principles relate to methods and apparatus for image searching. The method may include determining a classifier based on at least a centroid representing a plurality of negative features and performing image searching based on the classifier. The apparatus may include an apparatus configured to determine a classifier based on centroids representing a plurality of negative features and to perform image searching based on the classifier. The centroids are determined based on a clustering of the plurality of negative features.
Description
ACCELERATED SUPPORT VECTOR MACHINE (SVM) LEARNING USING
CLUSTERING
TECHNICAL FIELD
The present disclosure relates to image recognition or searching. More precisely, the present disclosure relates to image searching based on Exemplar Support Vector Machines (E-SVM).
BACKGROUND
Various computer or machine based applications offer image searching abilities. An image search is a search for an image or images. The input to the image search may be an image or images. Image searching is relevant in many fields, including computer vision, object recognition, video tracking, and video-based location determination/mapping.
Fig. 1 illustrates a method 100 for performing image searching. The method 100 includes receiving query image(s) 110 that will be the basis of the image search; extracting image features 120 from the query image(s); and performing image searching 130 by using the image features from block 120. Block 130 may perform image searching based on any technique, including image retrieval and semantic searching techniques.
Fig. 1A illustrates an example of an image retrieval search technique. An image retrieval search generally locates images that have the same scene as a query image. The located same-scene images may even have scene alterations resulting from task-related transformation(s) (e.g., changes in scene illumination, image cropping, scaling, wide changes in the perspective of the camera, high compression ratios, or picture-of-video-screen artifacts).
Query image features may be received at block 141. Block 142 may perform image retrieval based on image features received from block 141 and image features received from a large database of features 145. Block 142 may compare the image features received from block 141 with the image features received from the large database of features 145. The large database of features 145 may consist of image features extracted from images stored in a database of search images 143. Block 144 may extract features of each image in the database of search images 143. The large database of features 145 may store the results determined by block 144. Blocks 143-145 may be computed offline or prior to the execution of blocks 141, 142, and 146.
In one example, block 142 performs image retrieval based on a distance between the features from block 141 and the features of the large database from block 145. For example, block 142 may calculate the Euclidean distance between a query image feature from block 141
and each of the features from block 145. Block 142 may choose the images from database 143 that correspond to the smallest distances. Block 142 may provide the search results to block 146. Block 146 may output the search result(s). If multiple results are returned, the results may be ranked and the ranking may be provided along with the results.
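The distance-based ranking performed by block 142 can be sketched as follows. This is an illustrative sketch in NumPy; the function name `retrieve` and the choice of plain Euclidean distance over raw feature vectors are assumptions for illustration, not part of the disclosure.

```python
import numpy as np

def retrieve(query_feature, database_features, top_k=3):
    """Rank database images by Euclidean distance to the query feature
    (as in block 142) and return the indices of the top_k closest images."""
    dists = np.linalg.norm(database_features - query_feature, axis=1)
    return np.argsort(dists)[:top_k]
```

The returned index order doubles as the ranking that block 146 may output alongside the results.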
Fig. 1B illustrates an example of a semantic search. The semantic search searches images for visual concepts representing a search query (e.g., cats). Block 151 receives features of positive images (i.e., features of images that contain the queried visual concept (e.g., cats)). Block 152 receives features of negative images (i.e., features of images that do not contain the queried visual concept (e.g., no cats, instead contain dogs, other animals or no animals)). Block 153 may perform classifier learning to determine a classifier. Block 153 performs classifier learning based on the positive features from block 151 and the negative features from block 152. Block 153 may choose a classifier that best differentiates the positive features from the negative features. In one example, the classifier learning may be an SVM algorithm. Alternatively, the classifier learning may be other learning techniques.
Block 155 may perform image searching based on the classifier from block 153 and the features from the large database 154. The large database of features 154 may consist of feature vectors of images that will be searched. For example, the large database of features 154 may be determined in accordance with the principles described in connection with block 145 of Fig. 1A. Block 155 may apply the classifier from block 153 to each of the features from block 154. Block 156 may output the result(s) from block 155. If multiple results are returned, the results may be ranked and the ranking may be provided along with the results.
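The scoring and ranking performed by blocks 155 and 156 can be sketched as follows (NumPy assumed; the function name and signature are hypothetical and a learned linear classifier is taken as given):

```python
import numpy as np

def semantic_search(classifier_w, bias, database_features, top_k=2):
    """Apply a learned linear classifier to every database feature (block 155)
    and return the indices of the top_k highest-scoring images (block 156)."""
    scores = database_features @ classifier_w + bias
    return np.argsort(-scores)[:top_k]
```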
An E-SVM is a classifier that is a linear Support Vector Machine learned from a single positive example, referred to as the exemplar. The classifier can be determined based on the positive example (the exemplar) and a set of generic, negative examples 𝒩. For example, Malisiewicz et al., Exemplar-SVMs for Visual Object Detection, Label Transfer and Image Retrieval, International Conference on Machine Learning, 2012, discuss E-SVMs.
Present E-SVM search techniques require an inefficiently large set of generic, negative examples 𝒩 (e.g., a set of one million feature vectors). The large set is inefficient during data computations. Also, present E-SVM search techniques are limited to using an E-SVM representation as a classifier.
Hard-negative mining is a process of (i) selecting a subset of the generic negative features 𝒩 (hereinafter referred to as the "hard-negative cache"); (ii) training a classifier based on the hard-negative cache; and (iii) growing the hard-negative cache with items from 𝒩 that have a classification score inside the margin (i.e., greater than −1). However, E-SVM searching based on hard-negative mining is still a highly complex and inefficient process because it still relies on a large set of generic negatives.
Present E-SVM techniques also have limited generalization because, despite the size of the negative set, they utilize a single positive exemplar. Present E-SVM determinations rely on asymmetric determinations. Image retrieval using the asymmetric approach has many shortcomings, including sub-par performance relative to features tailored for the image retrieval task. The asymmetric approach has not been considered for semantic search because E-SVMs are tailored for data-constrained scenarios, and many positives are generally available in semantic search. The asymmetric approach relies on a very large generic negatives pool, and suffers from limits in generalization that must be overcome by deriving extra positives from the original exemplar.
SUMMARY OF INVENTION
An aspect of present principles relate to methods and apparatus for image searching.
The method may include determining a classifier based on at least a centroid representing a plurality of negative features and performing image searching based on the classifier. The apparatus may include an apparatus configured to determine a classifier based on centroids representing a plurality of negative features and to perform image searching based on the classifier. The centroids are determined based on a clustering of the plurality of negative features.
The centroid may be determined based on a corresponding cluster. The corresponding cluster may include a subset of the plurality of negative features. The subset of negative features includes negative features that are closer to the corresponding cluster than any other cluster. The centroid may be determined based on a left-singular vector of a matrix composed of the subset features.
A score is determined for a search image by applying the classifier to the search image.
The centroid further may have associated auxiliary data. The auxiliary data may be used to modify the at least a centroid. The auxiliary data may correspond to a weight value. The weight may be determined based on the amount of negative features that are associated with the at least a centroid. The auxiliary data may be statistical information of the subset of negative features associated with the centroid.
The classifier may be determined based on a positive E-SVM feature and the centroids as negative features. The classifier may be determined based on a positive RE-SVM feature and RE-SVM features associated with the centroids.
A feature may be assigned to a centroid based on an inner product between the feature and the centroid. The total number of features in the plurality of negative features may be greater than ten times the size (dimension) of each negative feature.
SUMMARY OF THE DRAWINGS
The features and advantages of the present invention may be apparent from the detailed description below when taken in conjunction with the Figures described below:
FIG. 1 illustrates a flow diagram of a method for image searching.
FIG. 1A illustrates a flow diagram of a method for an image retrieval search.
FIG. 1B illustrates a flow diagram of a method for a semantic search.
FIG. 1C illustrates a graphical representation of an E-SVM.
FIG. 2 illustrates a flow chart of an exemplary method for determining E-SVM feature(s).
FIG. 3 illustrates a flow chart of an exemplary method for determining RE-SVM feature(s).
FIG. 4 illustrates a flow chart of an exemplary method for semantic search using RE- SVM features.
FIG. 5 illustrates a flow chart of an exemplary method for extracting RE-SVM local descriptors.
FIG. 6 illustrates a flow chart of an exemplary method for accelerated E-SVM determinations.
FIG. 6A illustrates a flow chart of an exemplary method for centroid determination.
FIG. 6B illustrates an exemplary diagram of a plot of similarity clustering.
FIG. 7 illustrates a block diagram of an exemplary image processing device.
FIG. 8 illustrates a block diagram of an exemplary distributed image processing system.
FIG. 9 illustrates an exemplary apparatus.
DETAILED DESCRIPTION
As used herein, an "E-SVM function" refers to a determination of linear classifiers based on one or more positive and negative examples. The E-SVM function may perform a learning determination based on the positive examples of a class (i.e., examples that represent the class; for example, for the class "cats", the set of examples may consist of images of cats) and negative examples (i.e., examples that do not represent a class). As described herein, a learning algorithm may choose a function that best differentiates positive examples from negative examples.
As used herein, an "E-SVM feature" refers to the output of the E-SVM function.
As used herein, an "RE-SVM function" refers to a recursive framework for determining
RE-SVM features in accordance with the principles herein. The recursive framework may determine an initial set of E-SVM features based on image feature(s) of a query image and image features of generic negative images. The recursive framework may determine an initial RE-SVM feature by applying an E-SVM function to E-SVM features of query image(s) and E- SVM features of generic negative images. The recursive framework may then recursively determine RE-SVM features by repetitively applying the E-SVM function based on a prior repetition's determination of RE-SVM query features and features of generic negative images. As used herein, an "RE-SVM feature" refers to the output of the RE-SVM function.
Scalars, vectors, and matrices may be denoted using, respectively, standard, bold, and uppercase-bold typeface (e.g., scalar a, vector a, and matrix A). The notation v_k may denote a vector from a sequence v_1, v_2, ..., v_N, and v^k may denote the k-th coefficient of vector v. The notation [a_k]_k (respectively, [α_k]_k) may denote the concatenation of the vectors a_k (scalars α_k) to form a single column vector. Finally, a Jacobian matrix may be denoted by J, with (i, j)-th entry ∂f_i/∂x_j.
An aspect of present principles is directed to mapping E-SVM features to RE-SVM feature(s) based on an E-SVM function. An aspect of present principles is directed to determining RE-SVM feature(s) during a post-processing stage. In one example, enhancement of image searching and image classification may occur based on E-SVM features obtained by concatenating the activations of Deep Convolutional Neural Networks (DCNNs). An aspect of present principles is directed to automatically determining an RE-SVM feature that contains unique information of a query image relative to information of generic, negative images. The automatic extraction of the unique information is more efficient than feature extractors that rely on manual encoding of uniqueness into a feature extraction process.
An aspect of present principles is directed to image searching based on symmetrically determined E-SVM features. As used herein, "symmetric" determinations or "symmetrically" determined refers to image searching based on E-SVM features of query images (i.e., positive images), E-SVM features of generic negative images, and E-SVM features of images that will be searched. These symmetric determinations improve robustness and efficiency because the E-SVM function has been applied to all of the images, and this function determines, from the feature representing each image, a representation that separates the image from a large generic negative pool of features.
An aspect of present principles is directed to a generic, D-dimensional image feature encoder. The image feature encoder may be a global encoder, may be based on aggregated local features, or may be derived from CNN-based features. The features may be denoted by vectors in a space ℝ^D. The parameter ℝ^D may represent the space of all possible vectors having D real-valued entries.
An aspect of present principles provides a novel solution to the problem of complexity of generic negative image sets with a large amount of images. An aspect of present principles reduces complexity based on clustering of the generic negative features. The clustering may consist of deriving centroids. Each centroid may represent a subset of the large set of generic negative images. Each centroid may be associated to a subset of generic negative features that are closest to that centroid. That is, the generic negative features of a subset are the generic features that are closer to that centroid than to any other centroid. The distance may be determined by maximizing the inner-product similarities between the generic negative features of the subset and the centroid.
An E-SVM feature in accordance with present principles may be determined based on an optimization determined as follows:
Equation No. (1):

$$u(\mathbf{x}, \mathcal{N}) = \operatorname*{argmin}_{\mathbf{w} \in \mathbb{R}^D} \left[ \frac{\lambda}{2}\|\mathbf{w}\|^2 + \alpha_1 \max(0,\, 1 - \mathbf{x}^T\mathbf{w}) + \alpha_{-1} \sum_{i=1}^{N} \max(0,\, 1 + \mathbf{z}_i^T\mathbf{w}) \right]$$
The parameters λ, α₁, and α₋₁ may have positive, scalar values. The parameters λ, α₁, and α₋₁ may control regularization and the relative weight of positive and negative features. Each of the parameters λ, α₁, and α₋₁ can be manually chosen or can be automatically determined.
The parameter x may be a feature vector that relates to a query image, a generic negative image, or a search image. In one example, the feature vector x may be determined using an encoder such as VLAD, the Fisher Vector, CNN activation coefficients, or another E-SVM feature. The parameter 𝒩 may represent a set of generic negative feature vectors (e.g., of size N). The parameter w is the parameter that is being optimized under Equation No. (1). The parameter z_i is a negative feature that is part of the set 𝒩 of negative features. The set of generic feature vectors 𝒩 may be a large set of vectors.
The optimization of Equation No. (1) may determine the best (i.e., the optimal) vector w. The optimization may be based on stochastic gradient descent. The stochastic gradient descent may be determined using a Pegasos algorithm. The stochastic gradient descent may process a feature vector of parameter x or parameter z_i. The feature vectors x and z_i may be processed in a random order. The parameters α₁ and α₋₁ may be determined during the stochastic gradient descent algorithm. The optimization of Equation No. (1) may be an SVM problem relying on hinge loss, where the amount of features in the positive set is unbalanced relative to the amount of features in the negative set. For example, there may be one positive feature and a million negative features.
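A minimal subgradient-descent sketch of the optimization in Equation No. (1) is shown below. The fixed learning rate and the parameter values are illustrative assumptions, not the Pegasos step-size schedule; the final ℓ₂ normalization corresponds to the normalization discussed next.

```python
import numpy as np

def esvm_feature(x, negatives, lam=0.01, a_pos=1.0, a_neg=0.01, epochs=50, lr=0.1):
    """Stochastic subgradient descent on the objective of Equation No. (1):
    (lam/2)||w||^2 + a_pos*max(0, 1 - x.w) + a_neg*sum_i max(0, 1 + z_i.w),
    visiting the positive exemplar and the negatives in random order."""
    rng = np.random.default_rng(0)
    w = np.zeros_like(x, dtype=float)
    for _ in range(epochs):
        for idx in rng.permutation(len(negatives) + 1):
            grad = lam * w
            if idx == 0 and 1.0 - x @ w > 0:      # hinge on the single positive
                grad = grad - a_pos * x
            elif idx > 0:
                z = negatives[idx - 1]
                if 1.0 + z @ w > 0:               # hinge on a negative feature
                    grad = grad + a_neg * z
            w = w - lr * grad
    return w / (np.linalg.norm(w) + 1e-12)        # l2 normalization
```

The unbalanced weighting (one positive against many negatives) is reflected in the small default value of `a_neg` relative to `a_pos`.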
In one example, an E-SVM feature may be determined based on an ℓ₂ normalization of the result of Equation No. (1). In one example, the normalized E-SVM feature may be determined as follows:

Equation No. (2): $w(\mathbf{x}, \mathcal{N}) = \dfrac{u(\mathbf{x}, \mathcal{N})}{\|u(\mathbf{x}, \mathcal{N})\|_2}$

where u(x, 𝒩) denotes the result of Equation No. (1). The parameters x and 𝒩 may have the same values as the parameters x and 𝒩 of Equation No. (1).
In other examples, other types of normalization may be used, such as ℓ₁ normalization, or other types of post-processing may be used, such as zeroing of negative coefficients or taking signed square roots of coefficients.
An E-SVM feature in accordance with present principles may be determined based on an optimization determined as follows:

Equation No. (3): $u(\mathbf{x}, \mathcal{N}), b = \operatorname*{argmin}_{\mathbf{w}', b} \|\mathbf{w}'\|^2 + C_1 \max(0,\, 1 - \mathbf{x}^T\mathbf{w}' - b) + \frac{C_2}{|\mathcal{N}|} \sum_{\mathbf{z} \in \mathcal{N}} \max(0,\, 1 + \mathbf{z}^T\mathbf{w}' + b)$

The parameters x and 𝒩 may have the same values as the parameters x and 𝒩 of Equation No. (1). The parameter z may denote a negative feature in the set of negative features 𝒩. The parameters C₁, C₂ are regularization parameters. In one example, the parameters C₁, C₂ may be chosen empirically. The parameters w' and b denote the parameters over which there is optimization or minimization.
In one example, an E-SVM feature may be determined based on an ℓ₂ normalization of the result of Equation No. (3) as follows:

Equation No. (4): $w(\mathbf{x}, \mathcal{N}) = \dfrac{u(\mathbf{x}, \mathcal{N})}{\|u(\mathbf{x}, \mathcal{N})\|_2}$
In other examples, other types of normalization may be used, such as ℓ₁ normalization, or other types of post-processing may be used, such as zeroing of negative coefficients or taking signed square roots of coefficients.
An aspect of present principles is directed to determining RE-SVM feature(s) based on E-SVM features, including E-SVM features determined in accordance with the principles described in connection with Equation Nos. (1)-(4). In one example, the RE-SVM feature(s) may be determined based on an E-SVM function that utilizes previously determined E-SVM features of query, training and/or search images. In one example, a feature encoder may encode E-SVM features instead of the image feature representations. For example, if x is a feature vector encoded by a feature encoder, the feature encoder may instead encode an E-SVM feature (e.g., w(x, 𝒩)).
An aspect of present principles is directed to E-SVM functions based on symmetrically determined E-SVM features. The symmetric approach includes determining E-SVM features for a query image x°. The symmetric approach further includes determining corresponding E- SVM features for each image x in a database V. The database V may consist of images that will be searched.
The E-SVM feature w° for a query image x° may be represented by:
Equation No. (5): $w^0 = w(\mathbf{x}^0, \mathcal{N})$
The set of E-SVM features $w_i$ corresponding to the images $x_i$ in the database $\mathcal{V}$ of search images may be represented by:

Equation No. (6): $\{w_i = w(\mathbf{x}_i, \mathcal{N})\}_i$
During image searching, the search images $x_j$ or their corresponding E-SVM features $w_j$ may be sorted based on their similarity to the query image. In one example, similarity may be measured between an E-SVM feature $w^0$ derived from a query image feature $x^0$ and an E-SVM feature $w_j$ derived from a database image feature $x_j$ as follows:

Equation No. (7): $s_j = w_j^T w^0$

The parameter $w_j$ represents the E-SVM feature of a database image $x_j$. The parameter $w^0$ represents the E-SVM feature of the query image. The parameter $s_j$ may represent a similarity score between the image feature of the query image and the j-th search image. Because each feature $w_j$ is uniquely associated with a single image, $s_j$ may also represent a similarity score between the query image and the database image $x_j$. A high similarity score may indicate a high similarity between the images, whereas a low similarity score may indicate a low similarity between the images. In one example, the score may be determined based on the inner product between $w_j$ and $w^0$. In another example, the score may be determined based on a negated Euclidean distance (e.g., $-\|w_j - w^0\|_2$).
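For ℓ₂-normalized features the two scores induce the same ranking, since ‖w_j − w⁰‖² = 2 − 2·w_jᵀw⁰ when both vectors have unit norm. A small sketch (NumPy assumed; the function names are hypothetical):

```python
import numpy as np

def rank_by_inner_product(w0, W):
    """Rank database E-SVM features by s_j = w_j^T w0 (Equation No. (7))."""
    return np.argsort(-(W @ w0))

def rank_by_negated_distance(w0, W):
    """Rank by the negated Euclidean distance -||w_j - w0||_2."""
    return np.argsort(np.linalg.norm(W - w0, axis=1))
```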
Image searching based on E-SVM or RE-SVM features includes learning a classifier. A classifier based on E-SVM or RE-SVM features may be learned based on an optimization as follows:
Equation No. (8): $(\mathbf{v}, b) = \operatorname*{argmin}_{\mathbf{v}, b} \|\mathbf{v}\|^2 + \frac{C}{N} \sum_{i=1}^{N} \max(0,\, 1 - y_i(\mathbf{v}^T\mathbf{w}_i + b))$

The annotated training set $\{(y_i, w_i)\}_{i=1}^{N}$ consists of labels $y_i \in \{-1, 1\}$ and features $w_i$. Vector v represents a classifier. Parameter C represents a scalar chosen empirically that controls the regularization. Parameter N represents the size of the training set. The parameters $w_i$ represent the E-SVM features of the training set images, and b represents the bias or offset between the origin and the hyperplane orthogonal to v.
In an image search based on symmetric determinations, E-SVM or RE-SVM features are determined for the query image, the negative images, and the search images. The image classifier v is determined based on the training (positive and negative) images. The image classifier v is applied to a database of searched images that are also represented by E-SVM or RE-SVM features.
In one example, a classifier may be learned from the RE-SVM representations of training images. For example, the training image(s) may be utilized to determine a K-nearest-neighbor classifier, or a nearest-class-mean classifier. The RE-SVM features may be determined by recursively applying an E-SVM function to E-SVM or RE-SVM features (e.g., w(x, 𝒩)).
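A nearest-class-mean classifier over such features can be sketched as follows; the function name and the plain Euclidean distance are illustrative assumptions:

```python
import numpy as np

def nearest_class_mean(train_features, train_labels, query):
    """Nearest-class-mean classifier over (RE-)SVM features: assign the query
    to the class whose mean training feature is closest in Euclidean distance."""
    classes = sorted(set(train_labels))
    means = {c: np.mean([f for f, y in zip(train_features, train_labels) if y == c], axis=0)
             for c in classes}
    return min(classes, key=lambda c: np.linalg.norm(query - means[c]))
```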
In one example, initial E-SVM features may be represented by:
Equation No. (9a): $w^0 = \mathbf{x}$
Equation No. (9b): $\mathcal{N}^0 = \mathcal{N}$
The parameter w⁰ may represent the initial feature of an image. The image can be any image, including a query, database, or training image. The parameter 𝒩⁰ may represent the initial feature representations of the generic, negative images. In one example, a first RE-SVM (RE-SVM₁) feature is determined based on an E-SVM function applied to the initial features. In one example, a j-th recursion RE-SVM (RE-SVMⱼ) feature may be determined as follows:
Equation No. (10): $w^j = w(w^{j-1}, \mathcal{N}^{j-1})$
Equation No. (10a): where $\mathcal{N}^j = \{w(\mathbf{z}, \mathcal{N}^{j-1}) : \mathbf{z} \in \mathcal{N}^{j-1}\}$
The value of j may be equal to or greater than 1. The j-th RE-SVM (or the j-th E-SVM) may be determined in accordance with the principles described with Equation Nos. (1)-(4). The E-SVM function in Equation Nos. (10) and (10a) may be derived according to Equation Nos. (1)-(4).
In one example, the RE-SVM feature construction may be similar to deep architectures that use an output of a given layer as the input of a subsequent layer. However, an aspect of present principles, unlike deep architecture, determines E-SVM or RE-SVM features using an unsupervised learning routine on a per-image basis. As used herein, unsupervised may indicate automatic, machine learning without user annotations to a training set. Additionally, each j-th RE-SVM feature (RE-SVMj) may be determined by a single, non-linear, and/or convex problem, as opposed to a standard tandem linear/non-linear arrangement.
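The recursion of Equation Nos. (10) and (10a) can be sketched as follows. The `toy_esvm` stand-in below is a hypothetical placeholder for the true E-SVM function of Equation Nos. (1)-(4), used only to keep the sketch self-contained:

```python
import numpy as np

def toy_esvm(x, N):
    """Hypothetical stand-in for the E-SVM function of Equations (1)-(4):
    here simply the l2-normalized difference from the negative-pool mean."""
    d = x - N.mean(axis=0)
    return d / (np.linalg.norm(d) + 1e-12)

def re_svm(x0, negatives, depth, esvm_fn=toy_esvm):
    """Equations (10)/(10a): at each recursion, re-encode both the feature and
    every element of the generic negative pool with the E-SVM function."""
    w, N = np.asarray(x0, dtype=float), np.asarray(negatives, dtype=float)
    for _ in range(depth):
        new_N = np.array([esvm_fn(z, N) for z in N])  # Equation (10a)
        w = esvm_fn(w, N)                              # Equation (10)
        N = new_N
    return w
```

Note that, per Equation No. (10), the feature at recursion j is computed against the negative pool of recursion j−1 before that pool is itself re-encoded.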
In one example, local descriptors {sk}k may be extracted from images. The local descriptors may be extracted for the query image, the negative images, the database images that will be searched, or any other images that are used to learn classifiers. The descriptors may be aggregated based on a descriptor aggregating method, such as VLAD, the Fisher Vector, or Bag-of-Words to produce a single feature vector.
Since E-SVMs are agnostic to the underlying feature representation, it is possible to compute an E-SVM from each local descriptor sk by using an adequate set M of generic local descriptors, and further to do so recursively as in Equations (10) and (10a). The resulting RE- SVM local descriptors can subsequently be used to build aggregated features using the standard methods previously mentioned such as VLAD, the Fisher Vector or Bag-of-Words. In turn, the subsequent features can be further processed using RE-SVM learning functions as discussed in connection with Equation Nos. (10) and (10a).
Accelerating E-SVM learning using similarity clustering
An aspect of present principles is directed to clustering generic, negative image features to reduce complexity. In one example, computational complexity of E-SVM feature determinations is decreased despite the existence of a large amount of generic, negative images. In one example, the amount of generic negative image features is reduced based on corresponding centroid features. This reduction results in reduced storage and complexity. In one example, this decrease is achieved by deriving a smaller amount of representative centroids from a large amount of generic, negative features. As used herein, a "centroid" may refer to a vector that represents a subset of features. For example, the centroid may represent the mean of the subset, the medoid of the subset (i.e., the subset feature closest to the mean) or a singular vector or eigenvector derived from the subset. The centroids may replace the large amount of generic features, thereby resulting in a smaller training set for an E-SVM function, and hence a faster learning time. The amount of centroids needs to be chosen empirically, as using too few centroids may result in a non-representative generic negative pool.
In one example, the centroids may further have associated auxiliary data. In one example, the auxiliary data may be a weight that is associated with each centroid. In one example, the weight may represent the number of generic negative image features that are part of a corresponding centroid. In another example, the auxiliary data may consist of statistics derived from the generic negative image features of the corresponding centroid. The statistics may include the mean and correlation matrix of the generic negative image features of the corresponding centroid.
In one example, the centroids may be determined based on a clustering approach that is applied to features of generic, negative images. In one example, the clustering approach may be applied to multi-modal distribution of generic images' features. In one example, a clustering approach may locate the centroids of the multi-modal distribution. The large amount of negative image features may then be replaced by a smaller set of centroids, weighted by their importance. In one example, the centroids may be determined based on:
Equation No. (11): $\operatorname*{argmax}_{\mathbf{c}_1, \ldots, \mathbf{c}_K} \sum_{k=1}^{K} \sum_{\mathbf{z} \in \Lambda_k} \mathbf{z}^T\mathbf{c}_k$

Equation No. (11a): where $\Lambda_k = \{\mathbf{z}' \in \mathcal{N} : \operatorname*{argmax}_{l}\, \mathbf{z}'^T\mathbf{c}_l = k\}$
The parameters c_k and c_l are centroids. The parameter K is the number of learned centroids. The parameter 𝒩 is the generic negative pool. The parameter Λ_k is the subset of features from the generic negative pool that corresponds to the centroid c_k. The features of Λ_k are assigned to centroid c_k because they are closer to c_k than to any other centroid. The parameters z and z' are feature vectors from the generic negative pool. Each centroid c_k may be determined as the first left-singular vector of a matrix formed by stacking the vectors z ∈ Λ_k as columns.
In one example, the process for clustering features to determine centroids may include: i) updating each c_k in accordance with Equation No. (11), ii) updating cluster information with a determination of whether a generic negative feature should be assigned to that cluster, and iii) repeating i) and ii) recursively. Unlike dictionary learning for sparse coding, the present clustering approach is sensitive to the sign of the inner product z'^T c_k (compare with dictionary learning, where the cluster assignment is done using the absolute value, e.g., |z'^T c_i|). In one example, the process for clustering or centroid determination may be performed by block 603 of Fig. 6 or blocks 658 and 661 of Fig. 6A.
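The difference between the sign-sensitive assignment used here and the absolute-value assignment used in dictionary learning may be illustrated with a small sketch (the function names and toy vectors are illustrative, not from the source):

```python
import numpy as np

def assign_sign_sensitive(z, C):
    """Cluster index maximizing the signed inner product z^T c_i (rows of C)."""
    return int(np.argmax(C @ z))

def assign_absolute(z, C):
    """Dictionary-learning style assignment maximizing |z^T c_i|."""
    return int(np.argmax(np.abs(C @ z)))

C = np.array([[1.0, 0.0],    # centroid c_1
              [0.0, 1.0]])   # centroid c_2
z = np.array([-2.0, 1.0])    # anti-correlated with c_1, correlated with c_2
```

With these toy vectors the signed rule assigns z to c_2 (inner products -2 and 1), while the absolute-value rule would assign it to c_1 (magnitudes 2 and 1), treating z and -z alike.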
After all the centroids c_k are determined, each feature z may be assigned a centroid. In one example, the feature z may be assigned to a centroid based on the distance between the feature z and the centroid.
After the features have been assigned to centroids, an E-SVM function may be performed based on the centroids of the generic negative images instead of the features of the generic negative images. For example, E-SVM features may be determined by applying an E- SVM function to centroids as follows:
Equation (12):

$$u(x,\mathcal{N}),\, b \;=\; \operatorname*{argmin}_{w',b} \;\|w'\|^{2} \;+\; C_1 \max\!\left(0,\, 1 - x^{T} w' - b\right) \;+\; \frac{C_2}{|\mathcal{N}|} \sum_{k=1}^{K} N_k \max\!\left(0,\, 1 + c_k^{T} w' + b\right).$$
As shown above, Equation No. (12) corresponds to a modification of Equation No. (1).
The parameter c_k may correspond to the kth centroid. The parameter N_k denotes the number of vectors in 𝒩 assigned to c_k. Letting |𝒩| denote the number of feature vectors inside the generic negative pool, the computation time required to solve the problem in Equation No. (12) is the same as that required when using only K < |𝒩| generic negative features.
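A minimal sketch of solving Equation No. (12) by subgradient descent is given below. The hyperparameters C1 and C2, the learning rate, and the iteration count are illustrative assumptions; the source does not prescribe a particular solver.

```python
import numpy as np

def esvm_with_centroids(x, C, Nk, C1=10.0, C2=10.0, lr=0.01, iters=500):
    """Subgradient descent on Equation (12): one positive exemplar x against
    K centroids (rows of C), each weighted by its cluster size N_k."""
    w = np.zeros_like(x)
    b = 0.0
    Ntot = Nk.sum()  # |N|: size of the original generic negative pool
    for _ in range(iters):
        gw, gb = 2.0 * w, 0.0                 # gradient of ||w'||^2
        if 1.0 - x @ w - b > 0:               # positive hinge term active
            gw -= C1 * x
            gb -= C1
        active = (1.0 + C @ w + b) > 0        # active centroid hinge terms
        if active.any():
            gw += (C2 / Ntot) * (Nk[active] @ C[active])
            gb += (C2 / Ntot) * Nk[active].sum()
        w -= lr * gw
        b -= lr * gb
    return w, b
```

Note that the cost per iteration depends only on K, not on |𝒩|, which is the acceleration the clustering provides.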
Fig. 2 illustrates a flow chart of a method 200 for extracting E-SVM features. Block 201 may receive at least an image. In one example, block 201 may modify the image to include at least a mirrored, flipped, cropped, shifted and scaled version of the original image. Alternatively, block 201 may receive a plurality of images of a queried scene (e.g., a plurality of images of the same scene of the statue of liberty).
Block 202 may determine an initial feature vector for the image received at block 201. The initial, base feature vector may be any type of feature vector, including VLAD, BoW, Fisher, and deep CNN activations. The initial feature vector of the image may also be known as the positive feature.
Block 203 may receive a set of generic images. In one example, the generic images may comprise the generic negative set M as discussed in connection with Equation Nos. (1) and (3). The amount of generic negative set images may be large enough to allow for a reasonable computational time on a target CPU (e.g., a per-image E-SVM feature construction time of about 1 second, corresponding to about 100,000 negative images). In one example, the content of the generic negative images varies between each negative image.
Block 204 may determine initial feature vectors for the generic, negative images received from block 203. For example, block 204 may determine an initial feature vector for each image in the set of generic images. The feature vectors determined by block 204 may be the same type of feature vectors as the feature vector determined by block 202. Block 204 may provide the determined feature vectors, i.e., the generic negative features, to block 205.
Block 205 may determine an E-SVM feature 210. Block 205 may determine the E-SVM feature 210 based on the initial feature vector from block 202 and the initial generic negative feature vectors received from block 204. Block 205 may include one or more of blocks 206, 207 and 208. In one example, block 205 may combine or omit the functionality described in connection with one or more of blocks 206, 207 and 208. Alternatively, the functionality of blocks 206-208 may be combined and performed by block 205. In one example, block 205 may perform a classifier determination other than the SVM determination. In one example, block 205 may perform an additional post-processing scheme on the determined exemplar feature.
Block 206 may perform an SVM determination based on the initial feature from block 202 and the initial generic negative features from block 204. Block 206 may perform an SVM or E-SVM function. Block 206 may determine a vector u and a bias parameter b. The vector u may be orthogonal to a hyper-plane that separates generic negatives from an exemplar feature(s). The bias parameter b may specify a displacement of the hyper-plane along the
orthogonal. In one example, block 206 may determine the orthogonal vector u and the bias parameter b in accordance with the principles described in connection with Equation Nos. (1) and (3).
Block 207 may process the vector u from block 206. Block 207 may normalize the vector u to determine the parameter w. Block 207 may optionally post-process the vector u or the parameter w. For example, block 207 may perform post-processing such as an l2 normalization (e.g., in accordance with the principles described in connection with Equation Nos. (2) and (4)). In another example, the processing may be a square root related to the Hellinger kernel. In another example, the processing can be a sequential application of a plurality of post-processing solutions.
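The post-processing of block 207 may be sketched as follows (the function name and option flag are illustrative; the source only specifies l2 normalization and a square-root transform related to the Hellinger kernel as possible steps):

```python
import numpy as np

def postprocess(u, hellinger=False):
    """Optional signed square root (Hellinger-kernel style), followed by
    l2 normalization of the SVM direction u to obtain the E-SVM feature w."""
    w = np.sign(u) * np.sqrt(np.abs(u)) if hellinger else u.astype(float)
    return w / np.linalg.norm(w)
```

As the text notes, several such post-processing solutions may also be applied sequentially.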
Block 205 may output an E-SVM feature 210. The E-SVM feature 210 may correspond to the E-SVM feature illustrated in Fig. 1C. Alternatively, the E-SVM feature 210 may be a post-processed version of the E-SVM feature illustrated in Fig. 1C.
Fig. 1C illustrates an orthogonal vector u 174 and an E-SVM feature w 170. An SVM algorithm may determine an orthogonal vector u 174 and a bias parameter b 173. The orthogonal vector u 174 may be determined as discussed in connection with blocks 205 and 206 of Fig. 2. The orthogonal vector u 174 may be orthogonal to a separating hyper-plane 172. The orthogonal vector u 174 may be determined based on an exemplar feature x that may represent a queried image or a positive image concept. The orthogonal vector u 174 may be normalized to produce the E-SVM feature w 170. For example, the orthogonal vector u 174 may be normalized as described in connection with block 207 of Fig. 2. The parameter u and/or the E-SVM feature w 170 may be determined based on negative vectors z_i 175. In one example, negative vectors z_i may be determined in accordance with the principles described in connection with block 204 of Fig. 2. The negative vectors z_i 175 may be selected from a pool of generic features 176.
Fig. 3 illustrates a flow chart of a method 300 for determining RE-SVM feature(s) in accordance with present principles. Block 301 may receive one or more image(s).
Block 302 may determine an initial feature for the image from block 301. The initial feature vector of the image may be known as the initial positive feature(s). In one example, block 302 may determine the initial feature vector in accordance with the principles described in connection with block 202 of Fig. 2.
Block 303 may receive a set of generic images. The generic images may be negative images determined in accordance with the principles described in connection with block 203 of Fig. 2. Block 304 may determine initial negative features for the generic, negative images from block 303. Block 304 may determine initial negative features in accordance with the principles described in connection with block 204 of Fig. 2. In one example, block 304 may be performed off-line or prior to the performance of method 300.
Block 305 may determine a first, positive RE-SVM feature (RE-SVM1) based on the positive feature(s) from block 302 and the negative features from block 304. Block 305 may determine the RE-SVM1 feature based on an SVM determination, e.g., by applying an E-SVM function to the positive and negative features. Block 305 may perform an SVM determination as discussed in connection with Fig. 2 (e.g., blocks 205-207) and Fig. 1C. Block 305 may determine the RE-SVM1 feature as described in connection with Equation Nos. (1) to (4).
Block 306 may determine first negative RE-SVM1 features 𝒩¹ corresponding to the generic image features from block 304. In one example, block 306 may determine the first generic negative RE-SVM1 features in accordance with the principles described in connection with Equation Nos. (1)-(4), (10), (10a). For example, block 306 may use the output of block 304 to replace the set of generic features 𝒩⁰ discussed in connection with Equation Nos. (1)-(4), (10), (10a). In one example, block 306 may use the output of block 304 as a generic negative pool 𝒩⁰ and then may use each feature in the pool 𝒩⁰ to determine a corresponding RE-SVM feature while using all or some of the features of the pool 𝒩⁰ as a generic negative pool. The RE-SVM features may be determined as described in connection with Equation Nos. (1)-(4), (10), (10a). In one example, block 306 may be optional and the image features from block 304 may be provided directly to block 307. In one example, block 306 may be performed off-line or prior to the performance of method 300.
Block 307 may determine a second, positive RE-SVM feature (RE-SVM2). Block 307 may determine the RE-SVM2 feature based on the first, positive RE-SVM1 feature and the first negative RE-SVM1 features 𝒩¹. Block 307 may determine the RE-SVM2 feature based on an SVM determination. The SVM determination of block 307 would consider the first, positive RE-SVM1 feature as the positive exemplar and the first negative RE-SVM1 features 𝒩¹ as the negative exemplars. Block 307 may perform an SVM determination as discussed in connection with Fig. 2 (e.g., blocks 205-208) and Fig. 1C. Block 307 may determine the RE-SVM2 feature as described in connection with Equation Nos. (1)-(4), (10), (10a).
Block 309 may determine an Nth positive RE-SVMN feature. Block 309 may recursively determine the RE-SVMN feature based on an RE-SVMN-1 positive feature and RE-SVMN-1 negative features. The RE-SVMN-1 negative features may be determined by block 308. The RE-SVMN-1 negative features may be determined based on the RE-SVMN-2 negative features. Each RE-SVMN-1 negative feature may be determined in accordance with the principles described in connection with Equation Nos. (1)-(4), (10), (10a). Each RE-SVMN-1 negative feature is computed by using the corresponding RE-SVMN-2 negative feature as a positive feature and using all other RE-SVMN-2 features as negative features. In one example, block 308 may be optional and the image features from blocks 304 or 306 may be provided directly to block 309. In one example, block 308 may be performed off-line or prior to the performance of method 300.
Block 309 may determine the final, positive RE-SVMN feature 310 based on N iterations of the recursive framework. Block 309 may determine the RE-SVMN feature 310 based on a positive RE-SVMN-1 feature and negative RE-SVMN-1 features. For example, when N=2, the output RE-SVMN feature 310 is RE-SVM2 from block 307. In another example, when N=5, RE-SVM feature 310 may be RE-SVM5, which may be determined based on a previously determined RE-SVM4 feature and previously determined fourth negative RE-SVM features for generic, negative images.
Block 309 may determine the RE-SVMN feature 310 based on an SVM determination. Block 309 may perform an SVM determination as discussed in connection with Fig. 2 (e.g., blocks 205-208) and Fig. 1C. Block 309 may determine the RE-SVMN feature as described in connection with Equation Nos. (1)-(4), (10), and (10a). The RE-SVM determination may be repeated N times to determine the RE-SVMN feature.
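The recursion of method 300 may be sketched compactly as follows, reusing a minimal subgradient E-SVM solver. All hyperparameters and function names are illustrative assumptions; in practice, the negative-pool re-encoding (blocks 306 and 308) would typically be precomputed off-line.

```python
import numpy as np

def esvm(x, negatives, C1=10.0, C2=10.0, lr=0.01, iters=300):
    """Minimal E-SVM solver; returns the l2-normalized direction w."""
    w, b = np.zeros_like(x), 0.0
    for _ in range(iters):
        gw, gb = 2.0 * w, 0.0
        if 1.0 - x @ w - b > 0:
            gw -= C1 * x
            gb -= C1
        act = (1.0 + negatives @ w + b) > 0
        if act.any():
            gw += (C2 / len(negatives)) * negatives[act].sum(axis=0)
            gb += (C2 / len(negatives)) * act.sum()
        w -= lr * gw
        b -= lr * gb
    return w / (np.linalg.norm(w) + 1e-12)

def resvm(x, pool, N=2):
    """RE-SVM_N sketch: re-encode the positive and every pool feature
    against the previous level's pool, N times (blocks 305-309)."""
    for _ in range(N):
        x_next = esvm(x, pool)
        pool = np.stack([esvm(pool[j], np.delete(pool, j, axis=0))
                         for j in range(len(pool))])
        x = x_next
    return x
```

Each pool feature is re-encoded using itself as the positive and all other pool features as negatives, exactly mirroring the recursion described for block 308.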
Fig. 4 illustrates an exemplary image search based on RE-SVM features. Block 401 may receive positive images (e.g., one or more query images). The positive images may represent a visual concept (e.g., cat). Block 402 may determine positive RE-SVM features for the positive images from block 401. Block 402 may determine positive RE-SVM features in accordance with the principles described in connection with Equation Nos. (1)-(4) and Fig. 3.
Block 403 may receive negative images. The negative images may be counter-examples of the visual concept (e.g., images not containing cats). In one example, the negative images may be images from a set of generic, negative images. Block 404 may determine negative RE-SVM features for the negative images from block 403. In one example, block 404 may determine negative RE-SVM features in accordance with the principles described in connection with Equation Nos. (1)-(4), (10) and (10a) and Fig. 3.
Block 405 may learn a classifier based on the positive and negative RE-SVM features received from blocks 402 and 404, respectively. The learned classifier can be an SVM classifier, a K-nearest-neighbor classifier, a nearest class mean classifier, or any other type of classifier.
Block 406 may determine image search results based on the classifier from block 405 and features from a large database of image features 407. The large database of image features 407 may contain the features of the images that will be searched. The large database of image features 407 may be determined in accordance with the principles described in connection with blocks 143-145 of Fig. 1A and block 154 of Fig. 1B. Block 406 may determine a score for each image by applying the classifier to each of the search image features. In a symmetric determination, the search features of block 407 may be RE-SVM features determined in accordance with the principles described in connection with Fig. 3 and Equation Nos. (10)-(10a). The search results may be ranked based on the score and the ranking may be provided along with the results.
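The scoring and ranking of block 406 may be sketched as follows for a linear classifier (the toy classifier and database features are hypothetical):

```python
import numpy as np

def rank_images(w, b, database):
    """Apply a linear classifier (w, b) to every search image feature and
    return indices ranked by decreasing score, plus the scores."""
    scores = database @ w + b
    order = np.argsort(-scores)
    return order, scores

# hypothetical classifier for a "cat" concept and a 3-image feature database
w = np.array([1.0, 0.0])
b = 0.0
db = np.array([[0.9, 0.1],
               [-0.5, 0.8],
               [0.2, 0.2]])
```

With these toy values, the first database image scores highest and would be returned first in the ranked results.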
Fig. 5 illustrates a flow chart of a method 500 for extracting RE-SVM local descriptors. Block 501 may extract N local descriptors from an input image. Block 502 may extract many local descriptors from a generic pool of images. Block 503 may compute RE-SVM local descriptors based on the local descriptors extracted at blocks 501 and 502. Block 504 may output the results of block 503. The output of block 504 may be provided to a local descriptor encoder, such as a VLAD, Fisher or Bag of Words local descriptor encoder. The N local descriptors provided by block 504 may be aggregated into a single global image feature. The global feature may represent features of a query, positive, negative, training, search or any other image and may be used during image searching.
Fig. 6 illustrates a flow chart of a method 600 for accelerated E-SVM or RE-SVM feature determination. Method 600 may include a block 601 which may receive features of an image. In one example, block 601 may receive the image and may determine the image features. In one example, the features may be E-SVM or RE-SVM features as discussed in connection with Figs. 2 and 3. Alternatively, the features may be initial image features or any other type of features. Block 601 may determine the E-SVM or RE-SVM image features as discussed in connection with methods 200 and 300 of Figs. 2 and 3, respectively. In another example, the
features may be E-SVM or RE-SVM features derived from local descriptors. Block 601 may provide the image features to block 604.
Block 602 may receive a large set of features of negative, generic images. The large set of features may have a size that is a multiple of the size of the feature vector (e.g., 10x, 100x, 1000x). Each feature vector may have any size, including in the range of 100 to tens of thousands. The total size of the large set may be very high, including over 100,000 or over a million features.
In one example, block 602 may receive features of generic images as described in connection with blocks 303 and 304. In another example, block 602 may determine the features of generic images. The features of the generic images may be initial, base features, E-SVM features or RE-SVM features. In another example, the features may be E-SVM or RE-SVM features derived from local descriptors. Block 602 may provide the features of the generic images to block 603.
Block 603 may perform similarity clustering for the generic features from block 602. In one example, block 603 may perform cosine-similarity clustering. In one example, block 603 may perform similarity clustering in accordance with the principles described by Equation Nos. (11) and (11a) and output the resulting centroids c_k.
In one example, block 603 may perform similarity clustering in accordance with the principles described in connection with Fig. 6A. In one example, block 603 may determine each centroid by: i) initializing the clusters, ii) updating each centroid c_k and each corresponding cluster, and iii) repeating i) and ii) recursively. Unlike dictionary learning for sparse coding, the present clustering approach is sensitive to the sign of the inner product z'^T c_i (compare with dictionary learning, where the cluster assignment is done using the absolute value, e.g., |z'^T c_i|).
Block 603 may optionally perform the clustering offline before any image searching is carried out. For example, the processing associated with block 603 may be performed prior to image searching and the results may be stored.
Block 604 may perform the E-SVM function using as generic negatives the centroids c_k from block 603. For example, the centroids c_k may be utilized as the generic negative features are typically utilized by an E-SVM function. In one example, block 604 may perform the E-SVM function by further considering auxiliary data for each centroid.
In one example, block 604 may perform an E-SVM function based on the centroids from block 603 as the generic negative features. For example, block 604 may perform an E-SVM
function as described in accordance with the principles of block 205 of Fig. 2. In another example, block 604 may perform the RE-SVM determination according to blocks 305, 307, or 309 of Fig. 3.
In one example, the auxiliary data may represent the number of generic negative feature vectors N_k in each cluster k. The variable k, with k = 1, ..., K, identifies a cluster number, and the variable K denotes the total number of clusters. Block 604 may determine scaled centroids by multiplying the centroid c_k by the corresponding N_k. Block 604 may perform an E-SVM function based on the scaled centroids. Block 604 may perform E-SVM determinations in accordance with Equation Nos. (1)-(4) and (12).
Fig. 6A illustrates an exemplary method for performing similarity clustering in accordance with present principles. Fig. 6A includes a block 650 that may receive generic negative features. The generic features may be E-SVM, RE-SVM or base features such as VLAD, Bag-of-Words or Fisher vectors. In one example, the generic negative features may be the features received from block 602 in Fig. 6. In one example, the set of generic negative features may have a size N (i.e., there are N generic negative features received by block 650). In one example, block 650 may receive the actual values of each generic negative feature. In another example, block 650 may receive one or more pointers to the values of the generic negative features. Each generic negative feature may be represented by a vector.
Block 651 may initialize centroids in accordance with present principles. In one example, block 651 may initialize a set of centroids c_1, ..., c_K, where there are a total number of K centroids. In one example, block 651 may initialize centroids by randomly choosing K generic negative features from the features received in block 650. The value of K is determined empirically and should be smaller than the total number of generic negative features. In one example, block 651 may initialize the centroids based on the K generic negative features. For example, block 651 may loop through the K generic negative features and assign each feature to a corresponding centroid. For example, block 651 may assign a first negative feature to a first centroid c_1, a second negative feature to a second centroid c_2, a third negative feature to a third centroid c_3, and so on until the Kth negative feature, which is assigned to centroid c_K. In another example, the centroids may be initialized to fixed values that do not depend on the generic negatives. For example, each centroid may be assigned a value that is determined by a random number generator. Each centroid may be a vector of the same size as the features stored in the set 𝒩.
Block 652 may begin a loop (hereinafter "loop(l)") that will loop and perform the processing of blocks 653-662 using a variable i having a range from one through a number Z. The number Z may signify the number of loop iterations that will be performed to optimize the values of the centroids. In one example, the number Z may be received from an outside source (e.g., may be set by a user) or may be empirically determined. Block 652 may pass control to block 653.
Block 653 may initialize clusters in accordance with present principles. In one example, block 653 may initialize a set of clusters C_1, C_2, ..., C_K. The set of clusters may have the same number of elements (K) as the set of centroids initialized by block 651. Each cluster C may have a one-to-one association with a corresponding centroid. For example, cluster C_1 corresponds to centroid c_1, cluster C_2 corresponds to centroid c_2, cluster C_3 corresponds to centroid c_3, ..., and cluster C_K corresponds to centroid c_K. Each cluster C is a list designed to point to the features associated with the corresponding centroid of that cluster. Block 653 may initialize all the clusters (e.g., clusters C_1, C_2, ..., C_K) to zero. Block 653 may then pass control to block 654.

Block 654 may begin a loop (hereinafter "loop(2)") that will loop through all the generic negative features from block 650 using a variable j having a range from one to the total number of generic negative features N. Block 654 may pass control to block 655. For example, when j=1, loop(2) may process generic negative feature z_1. Likewise, when j=4, loop(2) may process generic negative feature z_4.
Block 655 may begin a loop (hereinafter "loop(3)") that will loop through all the centroids c_1, ..., c_K and their corresponding clusters C_1, ..., C_K using a variable l having a range from one to the total number of centroids K. Block 655 may pass control to block 656.
Block 656 may determine a distance between the feature z_j and a centroid c_l. In one example, block 656 may determine the distance based on the signed inner product or the absolute value of an inner product between the feature z_j and the centroid c_l. Block 656 may pass control to block 657 when the value of l is greater than K. Block 657 may end loop(3). Block 657 may then pass control to block 658.
Block 658 may assign feature z_j to the closest centroid. In one example, block 658 may determine the closest centroid to the feature z_j based on the distances between the feature z_j and all the centroids determined by loop(3). In one example, block 658 may determine the centroid with the highest absolute value of an inner product between feature z_j and the centroid. In another example, block 658 may determine the closest centroid based on any other method that compares the distances determined by loop(3). Block 658 may assign the feature z_j to the closest centroid by updating the cluster corresponding to that closest centroid. In one example, block 658 may update the corresponding cluster to include an identifier of feature z_j.
Block 658 may pass control to block 659 when the value of j is greater than N. Block 659 may end loop(2). Block 659 may pass control to block 660.
Block 660 may begin a loop (hereinafter "loop(4)") that will loop through all the centroids c_1, ..., c_K and their corresponding clusters C_1, ..., C_K using a variable m having a range from one to the total number of centroids K. Block 660 may pass control to block 661.
Block 661 may update the value of the centroid c_m. In one example, block 661 may update the value of the centroid c_m based on the first left-singular vector of the features identified by the corresponding cluster C_m. Block 661 may review the values of the cluster C_m that corresponds to the centroid c_m. The cluster C_m may identify the features clustered with centroid c_m (e.g., the features closest to centroid c_m). Block 661 may create a matrix M_m based on the feature vectors identified by cluster C_m. This matrix may be the correlation matrix for the features assigned to cluster C_m. Block 661 may then determine an updated value for the centroid c_m based on the left-singular vector of the matrix M_m with the strongest singular value. Block 661 may pass control to block 662 when the value of m is greater than K. Block 662 may end loop(4). Block 662 may pass control to block 663 when the value of i is greater than Z. Block 663 may further optionally output updated values for all the centroids c_1, ..., c_K.
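The steps of Fig. 6A may be sketched in vectorized form as follows. The signed inner product is used for assignment, and each centroid is refit to the first left-singular vector of its cluster; the sign-correction step and all parameter defaults are illustrative assumptions not spelled out in the source.

```python
import numpy as np

def similarity_clustering(Z, K, Z_iters=10, seed=0):
    """Sign-sensitive similarity clustering (sketch of Fig. 6A).
    Z: (N, d) array of generic negative features.
    Returns K centroids and the cluster sizes N_k (auxiliary weights)."""
    rng = np.random.default_rng(seed)
    # block 651: initialize centroids from K randomly chosen features
    centroids = Z[rng.choice(len(Z), size=K, replace=False)].astype(float)
    assign = np.zeros(len(Z), dtype=int)
    for _ in range(Z_iters):                       # loop(1)
        # blocks 653-659: assign each feature to the centroid maximizing
        # the signed inner product z_j^T c_l (not its absolute value)
        assign = np.argmax(Z @ centroids.T, axis=1)
        # blocks 660-662: refit each centroid to the first left-singular
        # vector of the matrix stacking its cluster's features as columns
        for m in range(K):
            members = Z[assign == m]
            if len(members) == 0:
                continue                           # keep an empty cluster's centroid
            U, _, _ = np.linalg.svd(members.T, full_matrices=False)
            c = U[:, 0]
            if (members @ c).sum() < 0:            # resolve SVD sign ambiguity
                c = -c
            centroids[m] = c
    Nk = np.bincount(assign, minlength=K)          # weights N_k per centroid
    return centroids, Nk
```

The returned centroids and weights N_k are exactly the inputs needed by the accelerated E-SVM of Equation No. (12).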
Fig. 6B illustrates a plot of similarity clustering in accordance with present principles. In one example, Fig. 6B illustrates an example of the similarity clustering performed by the method of Fig. 6A or by block 603 of Fig. 6. Fig. 6B illustrates a two-dimensional feature space in which generic negative features of size two exist, with each of the two axes corresponding to one of the entries in each feature vector z_i.
Fig. 6B illustrates a set of 10 generic negative features z_1 671, z_2 672, z_3 673, z_4 674, z_5 675, z_6 676, z_7 677, z_8 678, z_9 679, z_10 680. Fig. 6B further illustrates three centroids c_1 681, c_2 682, c_3 683 and their corresponding weights N_1 691, N_2 692, N_3 693. In an example, each of the weights N_1, N_2, N_3 may represent the number of nearest features of the corresponding centroid c_1, c_2, c_3. Each centroid c_1, c_2, c_3 may be learned in accordance with the principles described in connection with Fig. 6A.
As illustrated in Fig. 6B, each centroid c_1, c_2, c_3 may be associated with a respective Cluster 1, Cluster 2 and Cluster 3. Each feature z_i is part of the cluster of the closest or nearest centroid. For example, centroid c_1 681 may have a cluster of features z_6 676, z_7 677, z_8 678. Furthermore, centroid c_1 681 may have an associated weight N_1 691. The value of weight N_1 may be 3, indicating that there are three features assigned to centroid c_1 681. Centroid c_2 682 may have a cluster of features z_1 671, z_3 673, z_4 674, z_9 679. Furthermore, centroid c_2 682 may have an associated weight N_2 692. The value of weight N_2 may be 4, indicating that there are four features assigned to centroid c_2 682. Centroid c_3 683 may have a cluster of features z_2 672, z_5 675, z_10 680. Furthermore, centroid c_3 683 may have an associated weight N_3 693. The value of weight N_3 may be 3, indicating that there are three features assigned to centroid c_3 683.
Fig. 7 is a block diagram illustrating an exemplary image processing system 700. The image processing system includes an image processing device 710 and a display 720. In one example, the device 710 and the display 720 may be connected by a physical link. In another example, the device 710 and the display 720 may communicate via a communication link, such as, for example a wireless network, a wired network, a short range communication network or a combination of different communication networks.
The display 720 may allow a user to interact with image processing device 710, including, for example, inputting criteria for performing an image search. The display 720 may also display the output of an image search.
The image processing device 710 includes a memory and processor 730 and any other software or hardware that allow the performance of the image searcher 740, the RE-SVM image feature determinator 750 and the accelerated feature determinator 760. The device 710 may be configured to perform any of the features of the blocks depicted in Figs. 2-6A.
Each of the RE-SVM image searcher 740, RE-SVM image feature determinator 750 and accelerated feature determinator 760 may be executed in whole or in part on image processing device 710. Alternatively, each of the image searcher 740, RE-SVM image feature determinator 750 and accelerated feature determinator 760 may be performed remotely and the results may be communicated to device 710 via a communication link. The dashed boxes may indicate that each of the box processes may be executed on image processing device 710 or may be executed remotely.
Fig. 8 illustrates an example of various image processing devices 801-804 and a server 805. The image processing devices may be smartphones (e.g., device 801), tablets (e.g., device 802), laptops (e.g., device 803), or any other image processing device 804 that includes software and hardware to execute the features described in the Figures. The image processing devices 801-804 may be similar to the image processing device 710 and the image processing system 700 described in connection with Fig. 7. Image searcher 740, RE-SVM image feature
determinator 750 and accelerated feature determinator 760 may be executed on any of the devices 801-804, on server 805, or in a combination of any of the devices 801-804 and server 805.
Fig. 9 represents an exemplary architecture of an apparatus 900 which may be configured to implement methods described in relation with Figs. 1-6A and Equation Nos. (1)-(10a). In one example, apparatus 900 represents an apparatus that may be configured to implement the methods according to present principles, including principles described in relation to Figs. 1-6A.
Apparatus 900 comprises the following elements that are linked together by a data and address bus 901:
- a microprocessor 902 (or CPU), which is, for example, a DSP (or Digital Signal Processor);
- a ROM (or Read Only Memory) 903;
- a RAM (or Random Access Memory) 904;
- an I/O interface 905 for reception of data to transmit, from an application; and
- a battery 906 (or other suitable power source).
According to a variant, the battery 906 is external to the apparatus. In each of the mentioned memories, the word "register" used in the specification can correspond to an area of small capacity (some bits) or to a very large area (e.g., a whole program or a large amount of received or decoded data). The ROM 903 comprises at least a program and parameters. The algorithms of the methods according to the invention are stored in the ROM 903. When switched on, the CPU 902 uploads the program into the RAM 904 and executes the corresponding instructions.
RAM 904 comprises, in a register, the program executed by the CPU 902 and uploaded after switch-on of the apparatus 900; input data in a register; intermediate data in different states of the method in a register; and other variables used for the execution of the method in a register.
The implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method or an apparatus), the implementation of features discussed may also be implemented in other forms (for example a program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable
logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs"), and other devices that facilitate communication of information between end-users.
According to a specific example of image processing, the image or picture I is obtained from a source. For example, the source belongs to a set comprising:
- a local memory (903 or 904), e.g. a video memory or a RAM (or Random Access Memory), a flash memory, a ROM (or Read Only Memory), a hard disk;
- a storage interface (905), e.g. an interface with a mass storage, a RAM, a flash memory, a ROM, an optical disc or a magnetic support;
- a communication interface (905), e.g. a wireline interface (for example a bus interface, a wide area network interface, a local area network interface) or a wireless interface (such as an IEEE 802.11 interface or a Bluetooth® interface); and
- an image capturing circuit (e.g. a sensor such as, for example, a CCD (or Charge-Coupled Device) or CMOS (or Complementary Metal-Oxide-Semiconductor) sensor).
According to different embodiments, the decoded image I is sent to a destination; specifically, the destination belongs to a set comprising:
- a local memory (903 or 904), e.g. a video memory or a RAM, a flash memory, a hard disk;
- a storage interface (905), e.g. an interface with a mass storage, a RAM, a flash memory, a ROM, an optical disc or a magnetic support;
- a communication interface (905), e.g. a wireline interface (for example a bus interface (e.g. USB (or Universal Serial Bus)), a wide area network interface, a local area network interface, an HDMI (High Definition Multimedia Interface) interface) or a wireless interface (such as an IEEE 802.11 interface, a Wi-Fi® or a Bluetooth® interface); and
- a display.
According to different examples, the bitstreams BF and/or F are sent to a destination. As an example, one of the bitstreams F and BF, or both, are stored in a local or remote memory, e.g. a video memory (904), a RAM (904) or a hard disk (903). In a variant, one or both bitstreams are sent to a storage interface (905), e.g. an interface with a mass storage, a flash memory, a ROM, an optical disc or a magnetic support, and/or transmitted over a communication interface (905), e.g. an interface to a point-to-point link, a communication bus, a point-to-multipoint link or a broadcast network.
According to different examples, the bitstream BF and/or F is obtained from a source. Exemplarily, the bitstream is read from a local memory, e.g. a video memory (904), a RAM (904), a ROM (903), a flash memory (903) or a hard disk (903). In a variant, the bitstream is received from a storage interface (905), e.g. an interface with a mass storage, a RAM, a ROM, a flash memory, an optical disc or a magnetic support, and/or received from a communication interface (905), e.g. an interface to a point-to-point link, a bus, a point-to-multipoint link or a broadcast network.
According to different examples, the apparatus 900, being configured to implement methods in accordance with the present principles, belongs to a set comprising:
- a mobile device;
- a communication device;
- a game device;
- a tablet (or tablet computer);
- a laptop;
- a still image camera;
- a video camera;
- an encoding chip;
- a still image server; and
- a video server (e.g. a broadcast server, a video-on-demand server or a web server).
According to different examples, the apparatus 900, being configured to implement an image processing process in accordance with the present principles, belongs to a set comprising:
- a mobile device;
- a communication device;
- a game device;
- a set top box;
- a TV set;
- a tablet (or tablet computer);
- a laptop;
- a display; and
- a decoding chip.
Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications. Examples of such equipment include an encoder, a decoder, a post-processor processing output from a decoder, a pre-processor providing input to an encoder, a video coder, a video decoder, a video codec, a web server, a set-top box, a laptop, a personal computer, a cell phone, a PDA, and other communication devices. As should be clear, the equipment may be mobile and even installed in a mobile vehicle.
An aspect of the present principles may be implemented on any electronic device or combination of electronic devices. For example, the present principles may be implemented on any of a variety of devices, including a computer, a laptop, a smartphone, a handheld computing system, a remote server, or any other type of dedicated hardware. Various examples of the present invention are described with reference to the figures.
Additionally, the methods may be implemented by instructions being performed by a processor, and such instructions (and/or data values produced by an implementation) may be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier or other storage device such as, for example, a hard disk, a compact diskette ("CD"), an optical disc (such as, for example, a DVD, often referred to as a digital versatile disc or a digital video disc), a random access memory ("RAM"), or a read-only memory ("ROM"). The instructions may form an application program tangibly embodied on a processor-readable medium. Instructions may be, for example, in hardware, firmware, software, or a combination. Instructions may be found in, for example, an operating system, a separate application, or a combination of the two. A processor may be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a processor-readable medium (such as a storage device) having instructions for carrying out a process. Further, a processor-readable medium may store, in addition to or in lieu of instructions, data values produced by an implementation.
As will be evident to one of skill in the art, implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry as data the rules for writing or reading the syntax of a described example, or to carry as data the actual syntax-values written by a described example. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different wired or wireless links, as is known. The signal may be stored on a processor-readable medium.
A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations. Additionally, one of ordinary skill will understand that other structures and processes may be substituted for those disclosed and the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. Accordingly, these and other implementations are contemplated by this application.
Numerous specific details have been set forth herein to provide a thorough understanding of the present invention. It will be understood by those skilled in the art, however, that the examples above may be practiced without these specific details. In other instances, well-known operations, components and circuits have not been described in detail so as not to obscure the present invention. It can be appreciated that the specific structural and functional details disclosed herein may be representative and do not necessarily limit the scope of the present invention.
Various examples of the present invention may be implemented using hardware elements, software elements, or a combination of both. Some examples may be implemented, for example, using a computer-readable medium or article which may store an instruction or a set of instructions that, if executed by a machine, may cause the machine to perform a method and/or operations in accordance with the examples. Such a machine may include, for example, any suitable processing platform, computing platform, computing device, processing device, computing system, processing system, computer, processor, or the like, and may be implemented using any suitable combination of hardware and/or software. The computer-readable medium or article may include, for example, any suitable type of memory unit, memory device, memory article, memory medium, storage device, storage article, storage medium and/or storage unit. The instructions may include any suitable type of code, such as source code, compiled code, interpreted code, executable code, static code, dynamic code, encrypted code, and the like, implemented using any suitable high-level, low-level, object-oriented, visual, compiled and/or interpreted programming language.
The implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed may also be implemented in other forms (for example, an apparatus or program). An apparatus and constituents included therein, for example, a processor, an encoder and a decoder, may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs"), and other devices that facilitate communication of information between end-users.
Additionally, this application or its claims may refer to "determining" various pieces of information. Determining the information may include one or more of, for example, estimating the information, calculating the information, predicting the information, or retrieving the information from memory.
Further, this application or its claims may refer to "accessing" various pieces of information. Accessing the information may include one or more of, for example, receiving the information, retrieving the information (for example, from memory), storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
Additionally, this application or its claims may refer to "receiving" various pieces of information. Receiving is, as with "accessing", intended to be a broad term. Receiving the information may include one or more of, for example, accessing the information, or retrieving the information (for example, from memory). Further, "receiving" is typically involved, in one way or another, during operations such as, for example, storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
Claims
1. A method for image searching, comprising:
determining a classifier (604) based on at least a centroid (603) representing a plurality of negative features; and
performing image searching based on the classifier;
wherein the at least a centroid is determined based on a clustering (603) of the plurality of negative features.
2. An apparatus for image searching, comprising:
a processor (730, 902) configured to determine a classifier based on centroids representing a plurality of negative features and to perform image searching based on the classifier;
wherein the centroids are determined based on a clustering of the plurality of negative features.
3. The method of claim 1 or the apparatus of claim 2, wherein the at least a centroid is determined based on a corresponding cluster and wherein the corresponding cluster includes a subset of the plurality of negative features.
4. The method or apparatus of claim 3, wherein the subset of negative features includes negative features that are closer to the corresponding cluster than any other cluster.
5. The method or apparatus of claim 3, wherein the at least a centroid is determined based on a left-singular vector of a matrix composed of the subset of features.
6. The method of claim 1 or the apparatus of claim 2, wherein a score is determined for a search image by applying the classifier to the search image.
7. The method of claim 1 or the apparatus of claim 2, wherein the at least a centroid further has associated auxiliary data.
8. The method or apparatus of claim 7, wherein the auxiliary data is utilized to modify the at least a centroid.
9. The method or apparatus of claim 7, wherein the auxiliary data corresponds to a weight value.
10. The method or apparatus of claim 9, wherein the weight is determined based on the number of negative features that are associated with the at least a centroid.
11. The method or apparatus of claim 7, wherein the auxiliary data is statistical information of the subset of negative features associated with the at least a centroid.
12. The method of claim 1 or the apparatus of claim 2, wherein the classifier is determined based on a positive E-SVM feature and the centroids as negative features.
13. The method of claim 1 or the apparatus of claim 2, wherein the classifier is determined based on a positive RE-SVM feature and RE-SVM features associated with the centroids.
14. The method of claim 1 or the apparatus of claim 2, wherein a feature is assigned to a centroid based on an inner product between the feature and the centroid.
15. The method of claim 1 or the apparatus of claim 2, wherein a total number of features corresponding to the plurality of negative features is greater than ten times the size of each negative feature.
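Claims 1, 3, 4, 9, 10 and 14 together outline the accelerated training scheme: cluster the many negative features, keep one centroid per cluster (optionally with a count-based weight as auxiliary data), and train the exemplar SVM against the centroids instead of the full negative set. The sketch below is only an illustration of that flow under stated assumptions: the k-means routine, the hinge-loss subgradient solver, the cluster count `k=10`, and every function name are choices made here for concreteness; the claims do not mandate a particular clustering algorithm or SVM solver.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Cluster the rows of X into k clusters; return the k centroids and
    the per-cluster feature counts (the weight data of claims 9 and 10)."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each feature to its nearest centroid (Euclidean distance
        # here; claim 14 instead contemplates an inner-product assignment).
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        a = d.argmin(1)
        for j in range(k):
            if (a == j).any():
                C[j] = X[a == j].mean(0)
    return C, np.bincount(a, minlength=k)

def train_esvm(x_pos, centroids, weights, lr=0.1, reg=1e-3, epochs=200):
    """Exemplar-SVM: one positive feature versus the cluster centroids as
    negatives, optimized by hinge-loss subgradient descent (an assumed
    solver; the claims leave the SVM training method open)."""
    w, b = np.zeros(len(x_pos)), 0.0
    data = [(x_pos, 1.0, 1.0)] + [(c, -1.0, wt)
                                  for c, wt in zip(centroids, weights)]
    for _ in range(epochs):
        for x, y, wt in data:
            if y * (w @ x + b) < 1.0:        # margin violated: hinge update
                w += lr * (wt * y * x - reg * w)
                b += lr * wt * y
            else:                            # margin satisfied: decay only
                w -= lr * reg * w
    return w, b

rng = np.random.default_rng(1)
negatives = rng.normal(0.0, 1.0, size=(500, 8))  # many negative features
x_pos = np.full(8, 3.0)                          # one positive exemplar feature
C, counts = kmeans(negatives, k=10)
weights = counts / counts.sum()                  # per-centroid weight (claim 10)
w, b = train_esvm(x_pos, C, weights)
score = w @ x_pos + b                            # scoring a feature (claim 6)
```

Since an E-SVM is trained once per positive exemplar, replacing the 500 raw negatives by 10 weighted centroids reduces the per-exemplar training cost roughly in proportion, which is the acceleration the title refers to.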
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP14306810.4 | 2014-11-14 | ||
| EP14306810 | 2014-11-14 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2016075293A1 true WO2016075293A1 (en) | 2016-05-19 |
Family
ID=52016008
Family Applications (2)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/EP2015/076522 Ceased WO2016075274A1 (en) | 2014-11-14 | 2015-11-13 | Methods, systems and apparatus for image recognition based on recursively determined exemplar-support vector machines (e-svm) features |
| PCT/EP2015/076567 Ceased WO2016075293A1 (en) | 2014-11-14 | 2015-11-13 | Accelerated support vector machine (svm) learning using clustering |
Family Applications Before (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/EP2015/076522 Ceased WO2016075274A1 (en) | 2014-11-14 | 2015-11-13 | Methods, systems and apparatus for image recognition based on recursively determined exemplar-support vector machines (e-svm) features |
Country Status (1)
| Country | Link |
|---|---|
| WO (2) | WO2016075274A1 (en) |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107844337A (en) * | 2017-10-31 | 2018-03-27 | 广东欧珀移动通信有限公司 | Background application cleaning method, device, storage medium and electronic equipment |
| KR20180038169A (en) * | 2016-10-06 | 2018-04-16 | 가톨릭대학교 산학협력단 | Safety classification method of the city image using deep learning-based data feature |
| US11010665B2 (en) | 2015-12-22 | 2021-05-18 | Applied Material Israel, Ltd. | Method of deep learning-based examination of a semiconductor specimen and system thereof |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109543571B (en) * | 2018-11-07 | 2022-03-08 | 西安交通大学 | Intelligent identification and retrieval method for special-shaped processing characteristics of complex products |
| CN110175626A (en) * | 2019-04-15 | 2019-08-27 | 哈尔滨理工大学 | One kind is based on SVM image identification system and method under cloud platform |
| WO2022098295A1 (en) | 2020-11-06 | 2022-05-12 | Visenze Pte Ltd | A system and a method for generating an image recognition model and classifying an input image |
2015
- 2015-11-13 WO PCT/EP2015/076522 patent/WO2016075274A1/en not_active Ceased
- 2015-11-13 WO PCT/EP2015/076567 patent/WO2016075293A1/en not_active Ceased
Non-Patent Citations (5)
| Title |
|---|
| ABHINAV SHRIVASTAVA ET AL: "Data-driven visual similarity for cross-domain image matching", ACM TRANSACTIONS ON GRAPHICS, 1 December 2011 (2011-12-01), pages 1, XP055249175, Retrieved from the Internet <URL:http://graphics.cs.cmu.edu/projects/crossDomainMatching/abhinav-sa11.pdf> [retrieved on 20160211], DOI: 10.1145/2070781.2024188 * |
| CERVANTES ET AL: "Support vector machine classification for large data sets via minimum enclosing ball clustering", NEUROCOMPUTING, ELSEVIER SCIENCE PUBLISHERS, AMSTERDAM, NL, vol. 71, no. 4-6, 1 January 2008 (2008-01-01), pages 611 - 619, XP022440133, ISSN: 0925-2312, DOI: 10.1016/J.NEUCOM.2007.07.028 * |
| GIANG HOANG NGUYEN ET AL: "Efficient SVM training with reduced weighted samples", NEURAL NETWORKS (IJCNN), THE 2010 INTERNATIONAL JOINT CONFERENCE ON, IEEE, PISCATAWAY, NJ, USA, 18 July 2010 (2010-07-18), pages 1 - 5, XP031771698, ISBN: 978-1-4244-6916-1 * |
| HWANJO YU ET AL: "Classifying large data sets using SVMs with hierarchical clusters", PROCEEDINGS OF THE 9TH. ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING. KDD-2003. WASHINGTON, DC, AUG. 24 - 27, 2003., 1 January 2003 (2003-01-01), US, pages 306, XP055248592, ISBN: 978-1-58113-737-8, DOI: 10.1145/956750.956786 * |
| KE GAO ET AL: "Clustering Guided SVM for Semantic Image Retrieval", PERVASIVE COMPUTING AND APPLICATIONS, 2007. ICPCA 2007. 2ND INTERNATIO NAL CONFERENCE ON, IEEE, PI, 1 July 2007 (2007-07-01), pages 199 - 203, XP031152509, ISBN: 978-1-4244-0970-9 * |
Cited By (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11010665B2 (en) | 2015-12-22 | 2021-05-18 | Applied Material Israel, Ltd. | Method of deep learning-based examination of a semiconductor specimen and system thereof |
| US11205119B2 (en) | 2015-12-22 | 2021-12-21 | Applied Materials Israel Ltd. | Method of deep learning-based examination of a semiconductor specimen and system thereof |
| US11348001B2 (en) | 2015-12-22 | 2022-05-31 | Applied Material Israel, Ltd. | Method of deep learning-based examination of a semiconductor specimen and system thereof |
| US12183066B2 (en) | 2015-12-22 | 2024-12-31 | Applied Materials Israel Ltd. | Method of deep learning-based examination of a semiconductor specimen and system thereof |
| KR20180038169A (en) * | 2016-10-06 | 2018-04-16 | 가톨릭대학교 산학협력단 | Safety classification method of the city image using deep learning-based data feature |
| KR102036957B1 (en) * | 2016-10-06 | 2019-10-25 | 가톨릭대학교 산학협력단 | Safety classification method of the city image using deep learning-based data feature |
| CN107844337A (en) * | 2017-10-31 | 2018-03-27 | 广东欧珀移动通信有限公司 | Background application cleaning method, device, storage medium and electronic equipment |
| CN107844337B (en) * | 2017-10-31 | 2019-08-06 | Oppo广东移动通信有限公司 | Background application cleaning method and device, storage medium and electronic equipment |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2016075274A1 (en) | 2016-05-19 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11823050B2 (en) | Semi-supervised person re-identification using multi-view clustering | |
| CN115937655B (en) | Multi-order feature interaction target detection model, construction method, device and application thereof | |
| WO2023004206A1 (en) | Unsupervised hashing method for cross-modal video-text retrieval with clip | |
| WO2016075293A1 (en) | Accelerated support vector machine (svm) learning using clustering | |
| Cao et al. | Landmark recognition with compact BoW histogram and ensemble ELM | |
| CN114780767B (en) | Large-scale image retrieval method and system based on deep convolutional neural network | |
| WO2018105194A1 (en) | Method and system for generating multi-relevant label | |
| Li et al. | When humans meet machines: Towards efficient segmentation networks | |
| Shi et al. | Training DCNN by combining max-margin, max-correlation objectives, and correntropy loss for multilabel image classification | |
| Yi et al. | Elanet: effective lightweight attention-guided network for real-time semantic segmentation | |
| CN117453949A (en) | A video positioning method and device | |
| Chen et al. | Discriminative BoW framework for mobile landmark recognition | |
| CN116091551B (en) | A method and system for target retrieval and tracking based on multimodal fusion | |
| CN114627282A (en) | Target detection model establishing method, target detection model application method, target detection model establishing device, target detection model application device and target detection model establishing medium | |
| CN104951791A (en) | Data classification method and apparatus | |
| CN114911958B (en) | Semantic preference-based rapid image retrieval method | |
| WO2024045929A9 (en) | Model training method and apparatus, and computer device and storage medium | |
| CN110659392B (en) | Retrieval method and device, and storage medium | |
| Tong et al. | A review of indoor-outdoor scene classification | |
| Hu et al. | Action recognition using multiple pooling strategies of CNN features | |
| Chengjian et al. | Image annotation via deep neural network | |
| Li et al. | Compact convolutional neural network transfer learning for small-scale image classification | |
| EP3096243A1 (en) | Methods, systems and apparatus for automatic video query expansion | |
| CN114329006B (en) | Image retrieval method, apparatus, device, and computer-readable storage medium | |
| US20170309004A1 (en) | Image recognition using descriptor pruning |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 15794573 Country of ref document: EP Kind code of ref document: A1 |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| 122 | Ep: pct application non-entry in european phase |
Ref document number: 15794573 Country of ref document: EP Kind code of ref document: A1 |