WO2010038187A1 - Procédé d'indexation, de reconnaissance et de récupération de groupes de données en présence de bruit - Google Patents
Procédé d'indexation, de reconnaissance et de récupération de groupes de données en présence de bruit Download PDFInfo
- Publication number
- WO2010038187A1 WO2010038187A1 PCT/IB2009/054243 IB2009054243W WO2010038187A1 WO 2010038187 A1 WO2010038187 A1 WO 2010038187A1 IB 2009054243 W IB2009054243 W IB 2009054243W WO 2010038187 A1 WO2010038187 A1 WO 2010038187A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- input data
- arbitrary
- sets
- data
- representations
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Links
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/61—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/68—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/683—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/19—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
- G11B27/28—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
Definitions
- This invention relates generally to data clusters retrieval. More particularly, it relates to data clusters indexing, recognition and retrieval and the application in the field of advertisements or media content recognition.
- U.S. Pat. No 6,990,453 discloses a system and methods for recognizing sound and music signals in high noise and distortion. The method is based on a set of landmark time points and associated fingerprints (i.e. a unique identifier that represents a number of features of the media sample). To perform recognition, landmarks and fingerprints are computed for the unknown sample and used to retrieve match fingerprints from the database.
- landmarks and fingerprints are computed for the unknown sample and used to retrieve match fingerprints from the database.
- U.S. Pat. No 7,549,052 ('Generating and match hashes of multimedia content') is disclosed a method for generating robust hashes (i.e. hashes are short summaries or signatures of data files which can be used to identify the file) for multimedia content, for example, audio clips.
- the present invention provides a method for indexing, recognition and retrieval arbitrary subsets of input data (e.g. an advert or a media sample) within arbitrary sets of input data (e.g. a TV recording).
- the method for indexing, recognition and retrieval herein disclosed presents several advantages that finally solve the most relevant issues of the known methods: reliability, robust computing and speed.
- the invention herein described discloses a method for fingerprints indexing, recognition and retrieval and not a method for generating fingerprints. Said invention can use any of the method for generating fingerprints known to the skilled in the art.
- the method comprises many steps that are implemented by specific scripts or components. First of all, it is necessary to extract from arbitrary sets of input data the fingerprints that characterize them. This step involves the transformation of data clusters and of data sets in strings of numbers in binary base, using methods that are well known and of public domain. Technical literature and articles covering this topic is available also on the web and very precise (for instance, 'Robust Audio Hashing for Content Identification' by J. Haitsma, T. Kalker and J. Oostveen). The data clusters and the input sets are transformed in sets of strings of elements in one o more numerical bases. Said strings are then divided in a plurality of parts and the elements that compose said strings are linked together.
- the next step involves the generation of an alternative representation of said parts in one or more numerical bases, not necessarily coincident with those of first transformation.
- Said alternative transformations of data is used to index the representations of data sets where clusters searching will be performed.
- a convenient metric e.g. Hamming distance
- the sets of representations of data to a certain maximum distance from a certain number of representations of data are computed.
- Said representations are used as samples, initially, executing a first retrieval to retrieve all the positions where such representations of data are present inside the sets of research transformed and then executing a test for a preliminary recognition (named 'first recognition') of the clusters representations inside the sets of research transformed as described.
- the method herein disclosed is intrinsically robust and allows a reliable clusters search and retrieval even in presence of noise (i.e. background noise, fade-in and fade- out, transmission errors, interference, et cetera) since it is not based on the hypothesis that for every recognition exist at least one audio fingerprint that is not corrupted.
- the method is also intrinsically more reliable because it generates all the fingerprints of maximum Hamming distance and not only a subset.
- the method is also faster because it needs only a few milliseconds of a sound track recording to recognize the entire sound track (rather than several seconds of recording) with a high probability of success.
- the method is suitable for many applications, such as TV measuring services, for example, for calculating how many times and when a certain advert is broadcasted during an entire daily TV programs schedule. In this way, a correct calculation of the royalty due to the copyrights owner can be performed. Music copyright infringements and plagiarism in musical tracks or advertisements can therefore be recognized.
- advertiser can also assess the effectiveness of an advertising campaign.
- the embodiment concerns the application of the method for indexing, recognizing and retrieval advertisement inside generic TV registration files (e.g. files containing an entire daily TV programs schedule codified in the known standard).
- generic TV registration files e.g. files containing an entire daily TV programs schedule codified in the known standard.
- the following description represents a typical example of such an optimal method for the recognition of adverts inside generic TV registration files but it is given without limiting the generality of the invention herein disclosed. Best Mode for Carrying out the Invention
- the scripts are listed in a order that allow their sequential recall for adverts recognition and retrieval.
- the order of the scripts described in the following preferred embodiment is not rigid and variations with regards to the method, the single scripts and the order of the scripts are obvious to the person skilled in the field.
- the term 'file' is used to describe one or more entities, the entities can be in any format for which the necessary values can be calculated.
- the method is typically implemented as software running on a computer system, with individual steps most efficiently implemented as independent software modules.
- the invention is not limited to any particular hardware system, or programming language. Accordingly, the following preferred embodiment of the invention is set forth without any loss of generality to, and without imposing limitations upon, the claimed invention.
- the recognition process consists of the following steps.
- the variables that characterize the elaboration are arranged.
- the variable bpf (bit for fingerprint) must be arranged so that it is equal to the number of bit that compose each fingerprint and it must be an exact power of 2.
- the variable ppf (parts for fingerprint) must be an exact power of 2, less or equal to the number of bit of fingerprint and will determine the numbers of parts that each fingerprint under evaluation will be subdivided in.
- the variable bpp (bit for part) is calculated instead automatically and represents the number of bit of each part of fingerprints under evaluation.
- the variable fpl (fingerprint for reading) indicates the number of fingerprints read during each reading of the generic file of the fingerprints where the adverts have to be searched.
- nfpc represents the number of fingerprints to be used for a first comparison, i.e. the number of fingerprints that is necessary to consider in order to calculate the sets of fingerprints that are at the maximum Hamming distance mdh from those used for the first comparison for each advert.
- variable smip and smap represent the minimum and maximum percentage threshold between which the indices of first comparison must always be extracted. For example, if an advert is composed of 5182 fingerprints, the variable smip is equal to 0.1 (10%) and the variable smap is equal to 0.9 (90%), then the nfpc fingerprints will be extracted among those characterized by indexes of the advert larger of 519 and lower of 4663.
- the thresholds have been introduced because often the adverts have a fade-in and a fade-out and consequently in these sectors the percentage of distortion, respect to the original fingerprints, is larger that in other cases. In this way we can avoid useless computing because fingerprints that can be retrieved with difficulty in generic files are not taken into account.
- said fingerprints present a very large Hamming distance, a situation that should be avoided in computing because the calculations involved increase exponentially.
- noise and other disturb sources e.g. due to cuts of the adverts and the fades in and fades out
- the variable mdh maximum Hamming distance
- mdh maximum Hamming distance
- variable sdv threshold of validity
- Ttf time between fingerprints
- the files containing fingerprints of the adverts are arranged in a suitable way and other structures are prepared. Since each advert is short, the basic idea is that it is possible to create for each of them a matrix that is stored and saved as a new files for further use. Such matrix contains all the fingerprints of the advert arranged in rows. If, for example, an advert is composed of 5182 fingerprints each of 32 bit, after the elaboration of said advert, a matrix of 5182 rows and 32 columns will be stored and saved (one row for each fingerprint). These matrices will be useful in the script ' Recognition Adverts ' for validating or not the advert recognition inside the generic files considering the several docking indexes extracted during the elaboration as explained below.
- This script generates several structures concerning the generic files, as well as the substructure of indexing of each generic file. For example, if a file is composed of 14878366 fingerprints and the variable fpl is equal to 50000, then 284 readings are required (each reading of 50000 fingerprints with the exception of the last one) to sub- index completely the generic file considered and the fingerprints, of 32 bit, are divided into 4 parts (8 bit for part), where the number of possible combinations of 8 bit is 256. During each reading 50000 fingerprints are taken from the generic file, broken into 4 parts and each part converted in decimal number. The 4 decimal numbers represent the indexes of indexing inside a structure composed of 4 parts, each part with 256 fields (the possible decimal values of the parts of 8 bit).
- the index of fingerprint considered inside the generic file is thus inserted in each of such addresses. For example, after the third reading, we have just read the fingerprints with indexes from 100001 to 150000 inside the generic file considered. Taking the fingerprint corresponding to the index 100001, breaking it into 4 parts and converting said parts in decimal numbers, we finally obtain the numbers 130 78 27 253. Therefore, we access to the indexing structure and insert the index 100001 in positions pic(l).num(130), pic(2).num(78), pic(l).num(27), pic(l).num(253), inserting said index as last index if there are already other numbers in these positions.
- This script creates and stores a matrix with all the possible combinations of bpp bit
- Said matrix is subsequently used to generate all the fingerprints of maximum Hamming distance mdh from the one given, in a more efficient way than known methods in the art, and avoiding useless repetitions and generating each fingerprint of maximum Hamming distance mdh in a reliable way, by means of the use of analogous substructures of indexing to those created after the readings of generic files.
- the fingerprint indexes of first comparison are extracted, such indexes will be then used to generate the fingerprints sets of maximum Hamming distance mdh. If, for example, nfdc is equal to 3 and an advert is composed of 5182 fingerprints and the variables of minimum and maximum thresholds smip and smap are set equal to 0.1 (10%) and 0.9 (90%), firstly we calculate the thresholds (about 519 and 4463) and then the interval between 519 and 4463 is divided in three parts almost equal one to each other, and from each of this parts a fingerprint index of first comparison is extracted.
- Each advert recorded in a file is considered and the fingerprint of first comparison of each advert is retrieved by means of the indexes previously extracted. For each fingerprint retrieved the set of fingerprints of maximum Hamming distance mdh is calculated and this is performed in two phases. In the first, the script ' Generation Fingerprint Parts Of Maximum Distance ' is recalled so that for every fingerprint considered, an index-linked structure is generated with all the parts of maximum Hamming distance mdh from the parts of fingerprint. Such a structure is composed of a number of fields equal to the numbers of parts of the fingerprint considered and each of this fields is further composed of others mdh.
- each set of fingerprints of maximum Hamming distance mdh of each fingerprint of first comparison is used to search for some perfect match with the fingerprints inside the generic files. Since the recordings are not heavily disturbed and having considered several fingerprints of first comparison (7 or 8), the basic idea is that it is enough generating sets of fingerprints of short maximum Hamming distance (1 or 2). In this way, it was possible to successfully retrieve the elements of such sets inside the generic files, store their positional indexes inside the file and use finally said positional indexes in the script ' Recognition Adverts '. To do so in an efficient way, all the indexes of the generic file considered are loaded, starting from its pieces previously indexed.
- Each set of fingerprints of maximum Hamming distance mdh of each advert from each fingerprint of first comparison is therefore considered.
- the several parts of each fingerprint of such sets are then considered, by extracting from the generic file the sets of fingerprint indexes with said parts in the positions indicated, and then performing the intersections among such sets to obtain the set with the fingerprint indexes characterized by a perfect match, in the generic file under evaluation.
- the procedure is repeated for each fingerprint of the set, then intersecting at last the sets obtained among them and storing the resulting set that represents the set of fingerprint of maximum Hamming distance mdh from the fingerprint of first comparison considered.
- the adverts in the several generic files are searched, starting from the docking fingerprint indexes extracted. If, for example, 3 fingerprint indexes of first comparison have been extracted for an advert, then subsequently, 3 sets of fingerprints of maximum Hamming distance mdh are created and for each of such sets the docking fingerprint indexes with perfect match inside the generic files are extracted. For example, lets suppose the fingerprint of first comparison of an advert with 5182 fingerprints has index 1602 and the generation of the fingerprints set of maximum Hamming distance mdh has leaded to extract from one of the generic files of docking fingerprint indexes 23451 and 55456.
- indexes knowing that they are associated to fingerprint of index 1602 of the advert, it is possible to superimpose the matrix of the advert of 5182 rows (one for fingerprint) and 32 columns (one for each bit of fingerprint) so that its row 1602 is centred and superimposed with the fingerprints of index 23451 and 55456 of the generic file considered while the remaining rows are naturally superimposed with the other fingerprints of interest in the generic file considered, from index 23451 -(1602-1) to index 23451+(5182-1602) in the first case and from index 55456-(1602-l) to index 55456+(5182- 1602) in the second case.
- the procedure of elimination takes into account the docking fingerprint index with percentage of match greater not still considered, that is 90%, for the whole advert; then, it calculates the interval defined by the starting point and the end point of the advert inside the generic file with docking fingerprint index 77000 and fingerprint index of first comparison given; further, it deletes all the results that correspond to docking fingerprint indexes that belong to said interval, that are 60% and 70%, storing the data of elaboration concern that of maximum percentage of match, that is 90%, continuing then with the successive docking fingerprint index with maximum percentage of match not still considered in the same way, that is 20000, until it has considered all the docking fingerprint indexes remained.
- the example herewith described refers to fingerprints with length of 32 bit (the binary base it is therefore used), fingerprint subdivided in 4 parts converted to decimal base.
- the fingerprints can be of any length, with elements in one or more numerical bases, any bases, and that also the numerical bases of conversion of values of parts, given from concatenation of elements that compose them, can be any and not coincided necessarily with those previous.
- the method described can be used in all the fields of data retrieval where it is necessary to recognize and to retrieve data clusters aggregated among them (the single advert in this example) inside set containing data of the generic file (entire daily TV programs scheduling).
- Data that have to be equal may instead differ due to the presence of noise (or other interferences) to a great or lesser extent (Hamming distance among 'semi-equal' fingerprints greater than 0 in this case) not allowing therefore the use of a perfect match to retrieval the adverts.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Library & Information Science (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
L’invention concerne un procédé d'indexation, de reconnaissance et de récupération de groupes de données. Un groupe de données est extrait par un ensemble plus grand de données d'entrée génériques, en particulier des fichiers audio ou multimédias, même en présence de bruit. Le procédé est approprié pour de nombreuses applications, telles que : des services de mesure de télévision, par exemple pour calculer la fréquence de diffusion et les temps de diffusion des publicités transmises pendant une planification de programmes de télévision journalière entière ; pour calculer des royalties dues à un propriétaire de droits d'auteur ; pour reconnaître des violations de droits d'auteur de musique et un plagiat dans des pistes musicales ou des publicités. Le procédé est intrinsèquement plus robuste et permet une reconnaissance de publicités plus fiable et plus rapide que n'importe quel autre procédé connu dans l'art. Un mode de réalisation de l'invention associé à la reconnaissance de publicités est également présenté. Plus particulièrement, le mode de réalisation décrit la reconnaissance de plus de 200 publicités, transmises pendant une planification de programmes de télévision journalière entière, en moins de 20 minutes avec une probabilité de réussite proche de 100 %.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| ITVE2008A000074 | 2008-09-30 | ||
| IT000074A ITVE20080074A1 (it) | 2008-09-30 | 2008-09-30 | Algoritmo di riconoscimento e eventuale reperimento di cluster di dati all'interno di insiemi di dati piu' grandi in presenza di eventuale rumore o altre forme di disturbo che li corrompano in modo piu' o meno marcato |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2010038187A1 true WO2010038187A1 (fr) | 2010-04-08 |
Family
ID=41402466
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/IB2009/054243 Ceased WO2010038187A1 (fr) | 2008-09-30 | 2009-09-28 | Procédé d'indexation, de reconnaissance et de récupération de groupes de données en présence de bruit |
Country Status (2)
| Country | Link |
|---|---|
| IT (1) | ITVE20080074A1 (fr) |
| WO (1) | WO2010038187A1 (fr) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN104537380A (zh) * | 2014-12-30 | 2015-04-22 | 小米科技有限责任公司 | 聚类方法和装置 |
| CN109101598A (zh) * | 2018-07-31 | 2018-12-28 | 成都华栖云科技有限公司 | 一种小图片页面渲染方法 |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2002065782A1 (fr) * | 2001-02-12 | 2002-08-22 | Koninklijke Philips Electronics N.V. | Contenu multi-media : creation et mise en correspondance de hachages |
| WO2005101243A1 (fr) * | 2004-04-13 | 2005-10-27 | Matsushita Electric Industrial Co. Ltd. | Procede et appareil d'identification de sons audio tels que la musique |
-
2008
- 2008-09-30 IT IT000074A patent/ITVE20080074A1/it unknown
-
2009
- 2009-09-28 WO PCT/IB2009/054243 patent/WO2010038187A1/fr not_active Ceased
Patent Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2002065782A1 (fr) * | 2001-02-12 | 2002-08-22 | Koninklijke Philips Electronics N.V. | Contenu multi-media : creation et mise en correspondance de hachages |
| WO2005101243A1 (fr) * | 2004-04-13 | 2005-10-27 | Matsushita Electric Industrial Co. Ltd. | Procede et appareil d'identification de sons audio tels que la musique |
Non-Patent Citations (2)
| Title |
|---|
| HAITSMA J ET AL: "An efficient database search strategy for audio fingerprinting", PROCEEDINGS OF THE 2003 IEEE RADAR CONFERENCE, HUNTSVILLE, AL, US, 5 May 2003 (2003-05-05) - 8 May 2003 (2003-05-08), IEEE, NEW YORK, NY, US, pages 178 - 181, XP010642541, ISBN: 978-0-7803-7920-6 * |
| HAITSMA J ET AL: "Robust Audio Hashing For Content Identification", PROCEEDINGS INTERNATIONAL WORKSHOP ON CONTENT-BASED MULTIMEDIA INDEXING, 1 September 2001 (2001-09-01), pages 1 - 8, XP002264398 * |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN104537380A (zh) * | 2014-12-30 | 2015-04-22 | 小米科技有限责任公司 | 聚类方法和装置 |
| CN109101598A (zh) * | 2018-07-31 | 2018-12-28 | 成都华栖云科技有限公司 | 一种小图片页面渲染方法 |
Also Published As
| Publication number | Publication date |
|---|---|
| ITVE20080074A1 (it) | 2008-12-30 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US10497378B2 (en) | Systems and methods for recognizing sound and music signals in high noise and distortion | |
| Miller et al. | Audio fingerprinting: nearest neighbor search in high dimensional binary spaces | |
| US8453170B2 (en) | System and method for monitoring and recognizing broadcast data | |
| Cano et al. | Audio fingerprinting: concepts and applications | |
| US8688248B2 (en) | Method and system for content sampling and identification | |
| Wei et al. | Frame fusion for video copy detection | |
| US20100161656A1 (en) | Multiple step identification of recordings | |
| CN105975568B (zh) | 一种音频处理方法及装置 | |
| WO2005101998A2 (fr) | Procede et systeme d'echantillonnage et d'identification de contenu | |
| CN1997989A (zh) | 用于自动检测和标识广播音频或视频节目信号的方法和装置 | |
| KR100916310B1 (ko) | 오디오 신호처리 기반의 음악 및 동영상간의 교차 추천 시스템 및 방법 | |
| George et al. | Scalable and robust audio fingerprinting method tolerable to time-stretching | |
| WO2010038187A1 (fr) | Procédé d'indexation, de reconnaissance et de récupération de groupes de données en présence de bruit | |
| CN109271501A (zh) | 一种音频数据库的管理方法及系统 | |
| CN113053393B (zh) | 音频标注处理装置 | |
| Maksimović et al. | Detection and localization of partial audio matches in various application scenarios | |
| Maksimović et al. | Detection and localization of partial audio matches | |
| Camarena-Ibarrola et al. | Robust radio broadcast monitoring using a multi-band spectral entropy signature | |
| Khemiri et al. | A generic audio identification system for radio broadcast monitoring based on data-driven segmentation | |
| Senevirathna et al. | A Highly Robust Audio Monitoring System for Radio Broadcasting. | |
| JP2013171139A (ja) | 音楽放送の楽曲音声データと曲名との関連付け方法 | |
| Hossain et al. | A Quick Video Searching Approach from a Continuous Stream using Audio Fingerprint | |
| Catalán | Quality assessment and enhancement of an industrial-strength audio fingerprinting system | |
| Raimond et al. | Using the past to explain the present | |
| CN104994400A (zh) | 一种获取主持人姓名用来索引视频的方法及装置 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 09744747 Country of ref document: EP Kind code of ref document: A1 |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| 122 | Ep: pct application non-entry in european phase |
Ref document number: 09744747 Country of ref document: EP Kind code of ref document: A1 |