
WO2015118431A1 - Method for capture and analysis of multimedia content - Google Patents

Method for capture and analysis of multimedia content

Info

Publication number
WO2015118431A1
WO2015118431A1 PCT/IB2015/050670 IB2015050670W WO2015118431A1 WO 2015118431 A1 WO2015118431 A1 WO 2015118431A1 IB 2015050670 W IB2015050670 W IB 2015050670W WO 2015118431 A1 WO2015118431 A1 WO 2015118431A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
content
band
hash
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/IB2015/050670
Other languages
French (fr)
Inventor
David ABREU FELINO CARVALHÃO
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
EDGE INNOVATION LDA
Original Assignee
EDGE INNOVATION LDA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by EDGE INNOVATION LDA

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04H BROADCAST COMMUNICATION
    • H04H60/00 Arrangements for broadcast applications with a direct linking to broadcast information or broadcast space-time; Broadcast-related systems
    • H04H60/56 Arrangements characterised by components specially adapted for monitoring, identification or recognition covered by groups H04H60/29-H04H60/54
    • H04H60/58 Arrangements characterised by components specially adapted for monitoring, identification or recognition covered by groups H04H60/29-H04H60/54 of audio
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04H BROADCAST COMMUNICATION
    • H04H2201/00 Aspects of broadcast communication
    • H04H2201/90 Aspects of broadcast communication characterised by the use of signatures
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04H BROADCAST COMMUNICATION
    • H04H60/00 Arrangements for broadcast applications with a direct linking to broadcast information or broadcast space-time; Broadcast-related systems
    • H04H60/35 Arrangements for identifying or recognising characteristics with a direct linkage to broadcast information or to broadcast space-time, e.g. for identifying broadcast stations or for identifying users
    • H04H60/38 Arrangements for identifying or recognising characteristics with a direct linkage to broadcast information or to broadcast space-time, e.g. for identifying broadcast stations or for identifying users for identifying broadcast time or space
    • H04H60/40 Arrangements for identifying or recognising characteristics with a direct linkage to broadcast information or to broadcast space-time, e.g. for identifying broadcast stations or for identifying users for identifying broadcast time or space for identifying broadcast time

Definitions

  • the present invention relates to a method of operating a computational device in order to capture and analyze multimedia content.
  • Document EP 1760693 describes the extraction and matching of fingerprints specific to audio signals.
  • a Fast Fourier Transform (FFT) is here applied over original audio signal and over again on samples of each considered frequency band by using a nonlinear timescale.
  • the aforementioned document also reveals that fingerprints are saved individually in a database, with a set of bits used as an index, and that frequency elimination is applied to improve reliability under noisy or distorted conditions. However, the document does not indicate how to provide a similar solution for audio stream analysis during a television broadcast.
  • Document US8396705 shows a method of extraction and correspondence, or matching, of characteristic fingerprints from audio signals.
  • FFT is applied in order to obtain an audio spectrogram and also on audio signal in each frequency range.
  • a non-linear timescale, logarithm or exponential, is used in the document.
  • the document shows that the analysis can have band frequency overlapping.
  • a set of bits of feature vectors identified in fingerprinting process are used in this document as index in database.
  • Document WO0227600 concerns audio acquisition from mobile devices and the analysis of a specific audio type, music, describing the main service of Shazam.
  • the possible use in infomercials is also mentioned, analyzing the audio in order to promote impulse purchases at any moment and not only when the event appears on the screen.
  • a number of fingerprinting and hashing techniques are described for building databases and comparing samples. In practice, problems are observed in this solution regarding technical performance and output.
  • Document WO02061652 refers to the audio acquisition of a broadcast signal, the identification of one or more entities in the sample and the establishment of a transaction with the user in relation to the identified entities. This transaction may include the sending of additional information or the transfer of value with respect to the identified companies. The marketing area is not addressed. Regarding the technical solution presented in this document, the comparison already made for document WO 0227600 applies.
  • the present application discloses a method of operating a computational device comprising the steps:
  • the method further comprises the steps:
  • the step of registering a positive identification and the content and temporal stamps in the database if there is a match further comprises the step of presenting an advertisement related to the acquired audio content, based on a relation stored in the database. In a further embodiment, this step comprises the step of presenting a replacement advertisement in real-time. In one embodiment, the step of registering a positive identification and the content and temporal stamps, further comprises the steps:
  • the step of comparing the at least one hash of the at least one Power Peak of the Band with at least one hash in the database comprises the steps:
  • the step of processing at least one hash of the at least one Power Peak of the Band comprises processing the hash with:
  • & is the bitwise AND operation
  • the PPBi is the Power Peak of the Band at index i
  • N is the number of frequency bands.
  • the step of processing at least one hash of the at least one Power Peak of the Band comprises the steps :
  • the step of dividing the Power Peak of the Bands into at least one group comprises two groups, where the first half of the frequency bands is the first group and second half is the second group.
  • the step of processing at least one Power Peak of the Band of a frequency band of the audio spectrum comprises the frequency range from 50 Hz to 2047 Hz with the bands 50-100Hz, 100-200Hz, 200-300Hz, 300-400Hz, 400-500Hz, 500-600Hz, 600-700Hz, 700-800Hz, 800-900Hz, 900-1000Hz, 1000-1100Hz, 1100-1200Hz, 1200-1300Hz, 1300-1400Hz, 1400-1500Hz, 1500-1600Hz, 1600-1700Hz, 1700-1800Hz, 1800-1900Hz, 1900-2047Hz.
  • the audio spectrum is processed using the Fast Fourier Transform.
  • the step of dividing the sample into at least one block comprises overlapping blocks with uniform size of 4096 bytes, separated by 512 bytes and containing 3584 bytes in common.
  • the step of capturing an audio stream and division into at least one sample comprises an overlapping sample with a sampling divergent configuration which contain three samples spanning 5 seconds, with an offset of 0.1 seconds.
  • the present application also discloses a computational device, comprising:
  • the device is configured to implement the method described .
  • the computational device comprises a connection to a multimedia content transmission, where the multimedia content transmission is any of:
  • This invention is hence related to a method implemented by a computational device for identification of content transmitted by a video stream and determination with position accuracy of this content via analysis of respective audio stream.
  • All attempts for audio identification are recorded, including the moment of attempt, its success and, in case of achievement, the name of the content and the instant when the match occurred.
  • b) Each attempt is then evaluated in relation to the previous ones, from which the system draws a conclusion about the content on display, according to the following conditions: i. If the content is recognized, the instant is reported; ii. If an identification attempt does not succeed, new evaluations are performed over the key points involved, with sample sizes of 3 s, 7 s and 10 s.
  • These assessments are executed in parallel and aim to determine whether the last successful identification is recognized in any segment of the most recent attempt.
  • a broadcasting system provides access to network streams from one or multiple channels. Initially, each network stream, which consists of multiple audio and video streams of content data that are transmitted to the end users, is captured.
  • Transcode the stream to normalized stream of audio
  • the network streams are retransmitted to one or multiple processes capable of interpreting the raw signal and transcoding it to different formats. These processes isolate the audio streams, normalizing them to a Pulse-Code Modulation (PCM), with a sample rate of 44.1 kHz, precision of 8 bits and 1 audio channel, known as mono. This process runs continuously and in real time.
  • PCM Pulse-Code Modulation
  • the normalized audio stream is addressed to the audio analysis component, which divides it into smaller blocks with an uniform size - of 4096 bytes.
  • the blocks are overlapped, instead of contiguous, being separated by 512 bytes between them and having 3584 bytes in common.
  • This process takes, approximately, 10.25 milliseconds to analyze one second of normalized audio.
  • the Fast Fourier Transform (FFT) is applied to each one of aforementioned blocks, obtaining the audio spectrum associated with the block. This operation takes, approximately, 1.19 milliseconds per block.
  • PPBs are collected from the following subset of the frequency range: 50-100Hz, 100-200Hz, 200-300Hz, 300-400Hz, 400-500Hz, 500-600Hz, 600-700Hz, 700-800Hz, 800-900Hz, 900-1000Hz, 1000-1100Hz, 1100-1200Hz, 1200-1300Hz, 1300-1400Hz, 1400-1500Hz, 1500-1600Hz, 1600-1700Hz, 1700-1800Hz, 1800-1900Hz, 1900-2047Hz.
  • Apply a hash function to PPBs
  • the obtained PPBs are divided into two groups - the first 10 bands and the last 10 bands - and the hash algorithm is applied. This process takes, approximately, 0.002 milliseconds .
  • the resultant hashes are recorded in database. A reference to the analyzed audio data and to the position of the corresponding block in determined time is maintained.
  • the present invention is useful for providing the user's multimedia devices, such as set top boxes, smartphones, tablets, televisions, laptops and among others, with an easy, accurate and fast access to content, products and services related to the visualized content via the aforementioned devices.
  • External metadata are included in the audio stream as an additional variable during the analysis and detection of audio matches, and the identification of content in reproduction.
  • This metadata is based on the EPG system, which is made publicly available by broadcasting companies and makes it possible to assess what has been viewed, with some degree of uncertainty.
  • the audio identification component uses the EPG in order to decrease the spectrum of key points that can correspond to the content in visualization.
  • This research scope is constituted of key points from EPG content transmitted immediately before-and-after of the content, which has been visualized, in case they are already available in database.
  • This solution includes both capture and analysis of audio stream, in a faster way, saving processing time and reducing significantly energy consumption when compared to remaining solutions disclosed and/or available in the market.
  • This invention will also serve to fix broadcast mismatch problems in relation to what was reported by EPG, analyzing continuously the corresponding emissions.
  • the present invention relates to a method implemented for a computational device in order to identify the content transmitted through a video stream-related data and to determine the transmission time position of that content from the capture and analysis of the corresponding audio stream.
  • this invention is helpful to provide the user's multimedia devices, such as set top boxes, smartphones, tablets, televisions, laptops and among others, with an easy, accurate and significant fast access to content, products and services related to the aforementioned devices and corrections in Electronic Program Guides (EPG) , in comparison with the available solutions until now. Nonetheless, the present invention is not only limited to the aforementioned functionalities.
  • EPG Electronic Program Guides
  • the current invention presents several applications, mainly : - Analysis of multimedia broadcasting, maintaining the synchronism in relation to what is reporting by EPG;
  • Contents can be information about the content that has been visualized, such as, the information about the main characters, places, facts about the television program, similar programs that may have interest for the user, or advertising, among others ;
  • the request for obtaining the content may be an added value for statistical analysis, for example, to conclude which ads are the most popular.
  • the attached figures show preferred embodiments; they are not, however, intended to limit the scope of this application.
  • Applications to the analysis of video broadcasts, but also of audio, identifying the TV advertisement and presenting a replacement advertisement in real-time.
  • Figure 1 illustrates an embodiment of the method of capture and analysis of multimedia content, where the reference numbers show:
  • Figure 2 illustrates an embodiment of the method of capture and analysis of multimedia content, where the reference numbers show:
  • FIG. 3 illustrates an embodiment of the database creation method, where the reference numbers show:
  • Figure 4 illustrates an embodiment of the method of presentation of advertising contextualized with acquired audio content, where the reference numbers show:
  • Figure 5 illustrates an embodiment of the method of detecting the mismatch of at least one transmission of a TV content broadcasting in relation to what is reported by EPG for said transmission, where the reference numbers show: 201 - Capture audio stream;
  • Figure 6 illustrates an embodiment of the method of correcting the mismatch of a transmission of a TV content broadcasting in relation to what is reported by the EPG said transmission:
  • Figure 7 illustrates an embodiment of the database access method, where the reference numbers show:
  • the current invention relates to a method implemented by a computational device to identify the content transmitted by a video or audio stream and determine the position accuracy of this content via analysis of the corresponding audio stream.
  • the audio track is acquired in digital format by using one of the following, input, sources: a network stream provided by a broadcasting system; a digital audio device; a microphone that captures the audio environment.
  • the audio track is coded or transcoded to a normalized audio format - Pulse-Code Modulation (PCM) , with a sample rate of 44.1 kHz, 8 bits of resolution and 1 audio channel (mono) .
  • the Fast Fourier Transform is applied to each one of aforementioned blocks, obtaining the audio spectrum associated with the block.
  • the spectrum of audio is normalized;
  • the analysis is limited to the relevant frequency range, from 50 Hz to 2047 Hz, which is partitioned with uniform distribution into a number of non-overlapping frequency bands proportional to the size of the range, the bands being 50-100Hz, 100-200Hz, 200-300Hz, 300-400Hz, 400-500Hz, 500-600Hz, 600-700Hz, 700-800Hz, 800-900Hz, 900-1000Hz, 1000-1100Hz, 1100-1200Hz, 1200-1300Hz, 1300-1400Hz, 1400-1500Hz, 1500-1600Hz, 1600-1700Hz, 1700-1800Hz, 1800-1900Hz, 1900-2047Hz; d) For each band identified above, the frequency at which the power peaks in that band, the Power Peak of the Band (PPB), is determined.
  • PPB Power Peak of the Band
  • the hash information is recorded in database, together with the position in the determined time of that segment.
  • the means to reduce the search space by contextualization are heuristically determined, i.e., through analyzing the programming of a given cable channel and determining the possible TV programs in a certain time and date;
  • a broadcast system provides access to network streams from one or multiples channels.
  • the network stream is acquired, consisting of multiple audio and video streams of content data, which are transmitted to the end users.
  • the audio track is acquired in digital format, by using one of the following inputs: a network stream provided by a broadcasting system; a digital audio device; a microphone that captures the audio environment.
  • the normalized audio stream is addressed to the audio analysis component, which:
  • the audio spectrum is normalized; c) The analysis is limited to the relevant frequency range, from 50 Hz to 2047 Hz, which is partitioned with uniform distribution into a number of non-overlapping frequency bands proportional to the size of the range, the bands being 50-100Hz, 100-200Hz, 200-300Hz, 300-400Hz, 400-500Hz, 500-600Hz, 600-700Hz, 700-800Hz, 800-900Hz, 900-1000Hz, 1000-1100Hz, 1100-1200Hz, 1200-1300Hz, 1300-1400Hz, 1400-1500Hz, 1500-1600Hz, 1600-1700Hz, 1700-1800Hz, 1800-1900Hz, 1900-2047Hz; d) For each band identified above, the frequency at which the power peaks in that band, the Power Peak of the Band (PPB), is determined.
  • the resulted hash is stored.
  • steps 204 to 213 are repeated, with a sampling divergent configuration, i.e., three samples spanning 5 seconds, with an offset of 0.1 seconds (209);
  • the matching results and the hashes from collected PPBs are stored for subsequent analysis (212, 213) .
  • - Audio acquisition and analysis from one or several transmissions of a TV content broadcasting company broadcasting each one in representation of a TV channel, detection of content in visualization and presentation of advertising contextualized with acquired audio content.
  • advertising-type which may be presented are the products shown on the visualized content, travels to the displayed places, clothes used by characters, among others;
  • - Audio acquisition and analysis from one or multiple transmissions of a TV content broadcasting company each one in representation of a TV channel, detection of the content in visualization and its temporal instant, notification of its mismatch in relation to what is reported by EPG in the channel;
  • the process runs according to the shown flowchart. From the moment the visualized content mismatch and the corresponding instant in relation to what is reported by EPG is detected, then the flowchart is added with a set of steps for that mismatch identification and notification.
  • An additional step for identification of mismatch in relation to EPG (501) is introduced after the step for indicating a positive identification and storage of contents and temporal stamp (211); In case of the mismatch in relation to EPG is not identified, the loss of synchronization is marked (502) .
  • the step which points to the loss of synchronization (502) is replaced, in the process of Example 2, by a step to fix the content and temporal stamp of EPG (601) .
  • the present embodiment is not restricted to the embodiments described in this document, and a person with average knowledge in this area may foresee several ways of changing it without departing from the general idea, as defined in the claims.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The present invention relates to a method implemented by a computational device in order to identify the content transmitted through video or audio stream data and to determine the transmission time position of that content from the capture and analysis of the corresponding audio stream. Disclosed is a solution comprising a stream server (101), a computational device (102), an update to the Electronic Program Guide and a Commercial ID and Timestamp (104). This invention is therefore useful for providing the user's multimedia devices, such as set top boxes, smartphones, tablets, televisions and laptops, among others, with easy, accurate and fast access to content, products and services related to the aforementioned devices and with corrections to Electronic Program Guides. It can be applied in industries such as information technology, telecommunications, communication, marketing, multimedia and entertainment.

Description

DESCRIPTION
"METHOD FOR CAPTURE AND ANALYSIS OF MULTIMEDIA CONTENT"
Technical field
The present invention relates to a method of operating a computational device in order to capture and analyze multimedia content.
Background art
The following documents describe some state-of-the-art solutions.
Document EP 1760693 describes the extraction and matching of fingerprints specific to audio signals. A Fast Fourier Transform (FFT) is applied to the original audio signal and then again to the samples of each considered frequency band, using a nonlinear timescale.
The aforementioned document also reveals that fingerprints are saved individually in a database, with a set of bits used as an index, and that frequency elimination is applied to improve reliability under noisy or distorted conditions. However, the document does not indicate how to provide a similar solution for audio stream analysis during a television broadcast.
Document US 8321394 presents a fingerprint matching method that uses a message digest algorithm. The fingerprinting algorithm is not described in detail: the document does not state how the frequencies and temporal instants of the samples are analyzed. It does not mention using this algorithm together with the EPG to identify a given moment in an audio stream, and that invention aims neither at fixing the EPG nor at advertising products in the context of the transmitted content.
Document US8396705 shows a method of extracting and matching characteristic fingerprints from audio signals. In the document, an FFT is applied to obtain an audio spectrogram and also to the audio signal in each frequency range. A non-linear timescale, logarithmic or exponential, is used. The document shows that the analysis can use overlapping frequency bands. A set of bits of the feature vectors identified in the fingerprinting process is used as an index in the database.
Document WO0227600 concerns audio acquisition from mobile devices and the analysis of a specific audio type, music, describing the main service of Shazam. The possible use of the invention in infomercials is also mentioned, analyzing the audio in order to promote impulse purchases at any moment and not only when the event appears on the screen. A number of fingerprinting and hashing techniques are described for building databases and comparing samples. In practice, problems are observed in this solution regarding technical performance and output.
Document WO02061652 refers to the audio acquisition of a broadcast signal, the identification of one or more entities in the sample and the establishment of a transaction with the user in relation to the identified entities. This transaction may include the sending of additional information or the transfer of value with respect to the identified companies. The marketing area is not addressed. Regarding the technical solution presented in this document, the comparison already made for document WO 0227600 applies.
Document WO 3091990 addresses a preliminary calculation of key points. The key points are then combined in pairs to obtain a fingerprint object. These objects are used as the basis for a histogram, which is evaluated for a peak indicating a high probability of a match. In practice, a low probability of match is observed.
Hence, there is currently a need for an easy, fast and efficient method for accessing and interacting with the content available in broadcasts through hardware devices.
Summary
The present application discloses a method of operating a computational device comprising the steps:
- dividing an audio stream into at least one sample;
- dividing the sample into at least one block;
- processing an audio spectrum of the block;
- processing at least one Power Peak of the Band of the at least one frequency band of the audio spectrum;
- processing at least one hash of the at least one Power Peak of the Band and storing it in the database.
In one embodiment, the method further comprises the steps:
- capturing an audio stream from the connection to a multimedia content transmission;
- comparing the at least one hash of the at least one Power Peak of the Band with at least one hash in the database;
- if there is a match, a positive identification and the content and temporal stamps are registered in the database;
- if no match is found, the loss of synchronization is registered in the database.
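The comparison and registration steps above can be pictured with a minimal sketch. It assumes the database is an in-memory dictionary mapping a hash to a (content, position) pair; the names `Attempt`, `identify`, `index` and `log` are illustrative and do not come from the application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attempt:
    """One identification attempt, successful or not."""
    captured_at: float                # moment of the attempt (seconds)
    matched: bool                     # True for a positive identification
    content: Optional[str] = None     # content stamp, set only on a match
    position: Optional[float] = None  # temporal stamp within the content


def identify(hashes, index, captured_at, log):
    """Look the captured hashes up in `index` and record the outcome in `log`."""
    for h in hashes:
        hit = index.get(h)
        if hit is not None:
            content, position = hit
            attempt = Attempt(captured_at, True, content, position)
            break
    else:
        # No hash matched: register the loss of synchronization.
        attempt = Attempt(captured_at, False)
    log.append(attempt)
    return attempt
```

A real deployment would replace the dictionary with the database of stored hashes; the control flow (match, register, or mark loss of synchronization) is the part the sketch is meant to show.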
In another embodiment, the step of registering a positive identification and the content and temporal stamps in the database if there is a match, further comprises the step of presenting an advertisement related to the acquired audio content, based on a relation stored in the database. In a further embodiment, this step comprises the step of presenting a replacement advertisement in real-time. In one embodiment, the step of registering a positive identification and the content and temporal stamps, further comprises the steps:
- comparing a temporal instant of the identified content to an electronic program guide and processing a mismatch; and
- correcting the electronic program guide based on the mismatch .
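One way to read the two steps above is as simple arithmetic on timestamps. The sketch below is an assumption about how the mismatch could be computed, not the procedure of the application: `EpgEntry`, `mismatch_seconds`, `corrected` and the one-second `tolerance` are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EpgEntry:
    title: str
    start: float  # scheduled start time, in seconds


def mismatch_seconds(entry, observed_at, position):
    """Positive when the broadcast runs late relative to the EPG.

    `observed_at` is when the match occurred and `position` is the matched
    offset inside the content, so the actual start is their difference.
    """
    actual_start = observed_at - position
    return actual_start - entry.start


def corrected(entry, observed_at, position, tolerance=1.0):
    """Return an EPG entry shifted by the mismatch, if it exceeds `tolerance`."""
    delta = mismatch_seconds(entry, observed_at, position)
    if abs(delta) <= tolerance:
        return entry
    return replace(entry, start=entry.start + delta)
```

For example, content scheduled at t = 1000 s and matched at position 100 s when the clock reads 1130 s actually started at 1030 s, a 30 s mismatch.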
In another embodiment, the step of comparing the at least one hash of the at least one Power Peak of the Band with at least one hash in the database, comprises the steps:
- determining a set of the possible matching hashes, stored in the database, in a certain time and date, based on an electronic program guide; and
- comparing the at least one hash of the at least one Power Peak of the Band with the set of the possible matching hashes.
In a further embodiment, the step of processing at least one hash of the at least one Power Peak of the Band, comprises processing the hash with:
(PPB0 & 16777214) + … (the remaining terms of the formula appear only as an image, imgf000006_0001, in the published application)
where & is the bitwise AND operation, the PPBi is the Power Peak of the Band at index i, and N is the number of frequency bands.
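The constant 16777214 equals 0xFFFFFE, so the term PPB0 & 16777214 keeps bits 1 to 23 of PPB0 and clears bit 0 and everything above bit 23. The sketch below illustrates only that first term; it does not attempt the remaining terms of the formula, and treating a PPB as a non-negative integer is an assumption.

```python
MASK = 16777214  # constant from the formula; equal to 0xFFFFFE, i.e. 2**24 - 2


def first_term(ppb0: int) -> int:
    """Evaluate only the first term of the hash formula: PPB0 & 16777214."""
    return ppb0 & MASK


print(first_term(431))            # 430: the lowest bit is cleared
print(first_term(430))            # 430: even values below 2**24 pass through
print(first_term((1 << 24) + 5))  # 4: bits above bit 23 are discarded too
```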
In one embodiment, the step of processing at least one hash of the at least one Power Peak of the Band, comprises the steps :
- dividing the Power Peak of the Bands into at least one group; and
- processing a hash for each group.
In another embodiment, the step of dividing the Power Peak of the Bands into at least one group, comprises two groups, where the first half of the frequency bands is the first group and second half is the second group.
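The grouping described above can be sketched as follows. The hash function itself is passed in as a parameter, because the formula of the application is not reproduced here; the `stand_in` function in the usage note is purely a placeholder, and the names `split_groups` and `hash_groups` are illustrative.

```python
def split_groups(ppbs):
    """Split the PPB list into a first-half group and a second-half group."""
    half = len(ppbs) // 2
    return ppbs[:half], ppbs[half:]


def hash_groups(ppbs, hash_fn):
    """Apply `hash_fn` to each group and return one hash per group."""
    return [hash_fn(group) for group in split_groups(ppbs)]
```

With 20 bands this yields two hashes, one for bands 0 to 9 and one for bands 10 to 19, for example `hash_groups(ppbs, stand_in)` with `stand_in = lambda g: hash(tuple(g))`.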
In a further embodiment, the step of processing at least one Power Peak of the Band of a frequency band of the audio spectrum, comprises the frequency range from 50 Hz to 2047 Hz with the bands 50-100Hz, 100-200Hz, 200-300Hz, 300-400Hz, 400-500Hz, 500-600Hz, 600-700Hz, 700-800Hz, 800-900Hz, 900-1000Hz, 1000-1100Hz, 1100-1200Hz, 1200-1300Hz, 1300-1400Hz, 1400-1500Hz, 1500-1600Hz, 1600-1700Hz, 1700-1800Hz, 1800-1900Hz, 1900-2047Hz.
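A minimal sketch of extracting one Power Peak of the Band per band follows. It assumes a 44.1 kHz sample rate, a block length that is a power of two (4096 samples gives bins about 10.8 Hz wide, so every band contains several bins), and that a PPB is reported as the frequency in Hz of the strongest bin in the band. The pure-Python radix-2 FFT is a stand-in for whatever FFT implementation is actually used; `fft`, `power_peaks` and `BANDS` are illustrative names.

```python
import cmath
import math

SAMPLE_RATE = 44100  # Hz, assumed
# The 20 bands listed above: 50-100, 100-200, ..., 1800-1900, 1900-2047 Hz.
BANDS = [(50, 100)] + [(lo, lo + 100) for lo in range(100, 1900, 100)] + [(1900, 2047)]


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def power_peaks(block):
    """Return, for each band, the frequency (Hz) of its strongest FFT bin."""
    n = len(block)
    spectrum = fft([float(s) for s in block])
    peaks = []
    for lo, hi in BANDS:
        # Bins whose centre frequency falls inside [lo, hi).
        bins = [k for k in range(1, n // 2) if lo <= k * SAMPLE_RATE / n < hi]
        best = max(bins, key=lambda k: abs(spectrum[k]))
        peaks.append(best * SAMPLE_RATE / n)
    return peaks
```

Blocks much shorter than 4096 samples can leave a band without any bin, in which case `max` raises `ValueError`; the sketch does not handle that case.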
In one embodiment, the audio spectrum is processed using the Fast Fourier Transform. In another embodiment, the step of dividing the sample into at least one block, comprises overlapping blocks with uniform size of 4096 bytes, separated by 512 bytes and containing 3584 bytes in common.
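The overlapping block layout can be sketched directly from the numbers above: a 4096-byte window advanced by 512 bytes shares 4096 - 512 = 3584 bytes with its predecessor. Assuming 8-bit mono PCM at 44.1 kHz (one byte per sample), this gives roughly 86 blocks per second of audio. The function name `split_blocks` is illustrative.

```python
BLOCK_SIZE = 4096  # bytes per block
HOP = 512          # bytes between the starts of consecutive blocks


def split_blocks(pcm: bytes, size: int = BLOCK_SIZE, hop: int = HOP):
    """Return all full-size blocks of `pcm`, each starting `hop` bytes after the last."""
    return [pcm[i:i + size] for i in range(0, len(pcm) - size + 1, hop)]
```

Input shorter than one block yields an empty list; trailing bytes that do not fill a whole block are dropped in this sketch.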
In a further embodiment, the step of capturing an audio stream and division into at least one sample, comprises an overlapping sample with a sampling divergent configuration which contain three samples spanning 5 seconds, with an offset of 0.1 seconds.
The present application also discloses a computational device, comprising:
- a data processing means;
- a memory; and
- a database,
where the device is configured to implement the method described .
In one embodiment, the computational device comprises a connection to a multimedia content transmission, where the multimedia content transmission is any of:
- a network stream provided by a broadcasting system;
- an audio digital device; or
- a microphone that captures the ambient audio.
Disclosure of invention
It was verified that the analysis of multimedia broadcasts, by using EPG, could result in creation of a solution for product placement and deployment, or Product Placement, in such way that the presented products are synchronized with the content that is transmitted on television or in other electronic device for content broadcast. Besides, it was also identified that audio analysis synchronized with EPG could identify the advertising content, being in this way an added value for advertising/marketing agencies. In addition to these needs, mismatches between information provided by EPG and the content transmitted on television were found during the development phase. Therefore, this solution could also be used as a mean to fix those mismatches, being extremely useful in area of telecommunications .
In summary, the market lacked in solutions that allow not only the identification of TV content but also the achievement of very accurate temporal positioning, enabling the synchronization with other contents or devices containing television content.
Considering the above, a technological solution was then developed in order to respond to the identified market needs, being sufficiently versatile and quite reliable as well .
This invention hence relates to a method, implemented by a computational device, for identifying the content transmitted by a video stream and accurately determining the position of this content via analysis of the respective audio stream.
a) All attempts at audio identification are recorded, including the moment of the attempt, its success and, if successful, the name of the content and the instant when the match occurred.
b) These attempts are then evaluated in relation to the previous ones, from which the system reaches a conclusion about the content on display, according to the following conditions:
i. If the content is recognized, the instant is reported;
ii. If an identification attempt does not succeed, new evaluations are performed over the involved key points, with sample sizes of 3 s, 7 s and 10 s. These assessments are executed in parallel and aim to determine whether the last successful identification is recognized in any segment of the most recent attempt.
If it is, the conclusion from condition i. is applied.
If not, the loss of synchronism, relative to the emission of an advertising block or another emission not recognized by the system, is reported.
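The evaluation rules a) and b) above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function `evaluate_attempt`, the `recheck` callable and the tuple layout of an attempt are hypothetical names introduced here, and the text does not specify how the re-evaluation over 3 s, 7 s and 10 s samples is carried out internally.

```python
from concurrent.futures import ThreadPoolExecutor

RETRY_SAMPLE_SIZES_S = (3, 7, 10)  # sample sizes named in the text


def evaluate_attempt(match, history, recheck):
    """Apply rules a) and b) to one identification attempt.

    match:   (content, instant) when the attempt succeeded, else None.
    history: list of earlier attempts; the new attempt is appended to it.
    recheck: hypothetical callable (content, size_s) -> instant or None that
             re-evaluates the latest audio against one known content.
    """
    last_success = next((a for a in reversed(history) if a is not None), None)
    history.append(match)  # a) every attempt is recorded, successful or not
    if match is not None:
        return ("identified",) + match  # condition i.
    if last_success is not None:
        content = last_success[0]
        # condition ii.: the three re-evaluations run in parallel
        with ThreadPoolExecutor() as pool:
            instants = list(pool.map(lambda s: recheck(content, s),
                                     RETRY_SAMPLE_SIZES_S))
        found = [t for t in instants if t is not None]
        if found:
            return ("identified", content, found[0])
    return ("lost_sync", None, None)
```

A caller would keep one `history` list per monitored channel and report a loss of synchronism whenever the first element of the result is `"lost_sync"`.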
The overall steps of the method for audio capture and analysis, key point extraction and determination of the respective temporal instant for use in future identifications are described below.
Capture of stream
A broadcasting system provides access to network streams from one or multiple channels. Initially, each network stream, which consists of multiple audio and video streams of content data that are transmitted to the end users, is captured.
Transcode the stream to normalized stream of audio
The network streams are retransmitted to one or multiple processes capable of interpreting the raw signal and transcoding it to different formats. These processes isolate the audio streams, normalizing them to a Pulse-Code Modulation (PCM), with a sample rate of 44.1 kHz, precision of 8 bits and 1 audio channel, known as mono. This process runs continuously and in real time.
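As an illustration of the normalization target (8-bit, mono PCM), the following Python sketch downmixes one possible input format. It assumes, purely for the example, that the decoded input is already at 44.1 kHz and is interleaved signed 16-bit little-endian stereo, and that the 8-bit output is unsigned; resampling and container decoding are left out, and `to_mono_8bit` is a hypothetical helper name.

```python
import struct


def to_mono_8bit(pcm16_stereo: bytes) -> bytes:
    """Downmix signed 16-bit LE stereo PCM to unsigned 8-bit mono PCM."""
    frames = len(pcm16_stereo) // 4  # 2 channels x 2 bytes per frame
    samples = struct.unpack("<%dh" % (2 * frames), pcm16_stereo[:4 * frames])
    out = bytearray()
    for left, right in zip(samples[0::2], samples[1::2]):
        mono = (left + right) // 2     # average both channels
        out.append((mono >> 8) + 128)  # keep the top 8 bits, shift to 0..255
    return bytes(out)
```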
Analyze the audio stream
The normalized audio stream is addressed to the audio analysis component, which divides it into smaller blocks with a uniform size of 4096 bytes. The blocks overlap rather than being contiguous: consecutive blocks start 512 bytes apart and have 3584 bytes in common.
This process takes, approximately, 10.25 milliseconds to analyze one second of normalized audio.
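The block division can be sketched in Python as follows. The block size and the distance between blocks come from the text; the function name `split_into_blocks` and the choice to drop a trailing partial block are assumptions made for this example.

```python
BLOCK_SIZE = 4096  # bytes per block (from the text)
HOP_SIZE = 512     # bytes between consecutive block starts (from the text)


def split_into_blocks(audio: bytes):
    """Yield (byte_offset, block) for every full overlapping block."""
    # A trailing partial block is dropped (an assumption of this sketch).
    for start in range(0, len(audio) - BLOCK_SIZE + 1, HOP_SIZE):
        yield start, audio[start:start + BLOCK_SIZE]
```

With 8-bit mono PCM each byte is one audio sample, so at 44.1 kHz a 4096-byte block covers about 92.9 ms of audio and consecutive blocks start about 11.6 ms apart.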
Collection of Power Peaks of the Bands - PPBs
The Fast Fourier Transform (FFT) is applied to each one of aforementioned blocks, obtaining the audio spectrum associated with the block. This operation takes, approximately, 1.19 milliseconds per block.
The audio spectrum is normalized and PPBs are collected from the following frequency bands: 50-100Hz, 100-200Hz, 200-300Hz, 300-400Hz, 400-500Hz, 500-600Hz, 600-700Hz, 700-800Hz, 800-900Hz, 900-1000Hz, 1000-1100Hz, 1100-1200Hz, 1200-1300Hz, 1300-1400Hz, 1400-1500Hz, 1500-1600Hz, 1600-1700Hz, 1700-1800Hz, 1800-1900Hz, 1900-2047Hz.
Apply a hash function to PPBs
The obtained PPBs are divided into two groups - the first 10 bands and the last 10 bands - and the hash algorithm is applied. This process takes, approximately, 0.002 milliseconds.
Store the hashes from collected PPBs for subsequent analysis
The resulting hashes are recorded in the database, together with a reference to the analyzed audio data and to the time position of the corresponding block.
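A minimal in-memory stand-in for this storage step is sketched below in Python. The class `HashIndex` and its methods are hypothetical; the text does not describe the database schema, so the sketch only shows the association between a hash, the analyzed content and the time position of the block.

```python
from collections import defaultdict


class HashIndex:
    """Hypothetical in-memory substitute for the hash database."""

    def __init__(self):
        self._entries = defaultdict(list)

    def add(self, block_hash, content_id, position_s):
        """Record where (content and time position) a hash was observed."""
        self._entries[block_hash].append((content_id, position_s))

    def lookup(self, block_hash):
        """Return every (content_id, position_s) stored for this hash."""
        return list(self._entries.get(block_hash, []))
```

Keeping a list per hash allows the same hash to occur in several contents or at several positions, which the later comparison step then has to disambiguate.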
The present invention is useful for providing the user's multimedia devices, such as set top boxes, smartphones, tablets, televisions and laptops, among others, with easy, accurate and fast access to content, products and services related to the content viewed via the aforementioned devices.
External metadata are included as an additional variable during the analysis and detection of audio matches and the identification of the content being played. This metadata is based on the EPG system, which is made publicly available by broadcasting companies and makes it possible to assess what is being viewed, with some degree of uncertainty.
The audio identification component uses the EPG to narrow the set of key points that can correspond to the content being viewed. This search scope consists of the key points of the EPG content transmitted immediately before and after the content being viewed, provided they are already available in the database. This solution covers both capture and analysis of the audio stream in a faster way, saving processing time and significantly reducing energy consumption when compared with other solutions disclosed or available on the market. This invention also serves to fix broadcast mismatches in relation to what is reported by the EPG, by continuously analyzing the corresponding emissions.
In practice, the method, as implemented by a computational device in the hardware devices referred to above, is faster and simpler than the previously available solutions.
The present invention relates to a method implemented by a computational device in order to identify the content transmitted through video stream-related data and to determine the transmission time position of that content from the capture and analysis of the corresponding audio stream.
Therefore, this invention helps to provide the user's multimedia devices, such as set top boxes, smartphones, tablets, televisions and laptops, among others, with easy, accurate and significantly faster access to content, products and services related to the aforementioned devices, and to corrections in Electronic Program Guides (EPG), in comparison with the solutions available until now. Nonetheless, the present invention is not limited to the aforementioned functionalities.
Possible applications of current invention
The current invention has several applications, mainly:
- Analysis of multimedia broadcasting, maintaining synchronism with what is reported by the EPG;
- Analysis of video broadcast, and also audio, with the purpose of presenting content related to that transmission. Such content can be information about what is being viewed, such as information about the main characters, places, facts about the television program, similar programs that may interest the user, or advertising, among others;
- Analysis of video broadcast, and also audio, in order to determine the user's interests. Requests for content, mainly advertising, may add value for statistical analysis, for example to conclude which ads are the most popular;
- Synchronization applications with video and other devices or content that need to be synchronized with certain multimedia content;
- Applications for identifying the sounds produced by machines or equipment, for example to identify cars crossing a road by analyzing their sound, or to identify machine failures by detecting the absence of a specific sound in an operating machine;
- Applications for the analysis of video broadcast, and also audio, identifying a TV advertisement and presenting a replacement advertisement in real time.
In order to provide an easy technical understanding, the attached figures show preferred embodiments, whose purpose, however, is not to limit the object of this application.
Brief description of drawings
Figure 1 illustrates an embodiment of the method of capture and analysis of multimedia content, where the reference numbers show:
101 - stream server;
102 - computational device;
103 - Update EPG; and
104 - Commercial ID & Timestamp.
Figure 2 illustrates an embodiment of the method of capture and analysis of multimedia content, where the reference numbers show:
201 - Capture audio stream;
202 - Transcode stream to normalized audio stream;
203 - Analyze audio stream;
204 - Collect PPBs;
205 - Hash PPBs;
206 - Compare samples of collected PPB hashes with existing library;
207 - Match found?;
208 - First analysis?;
209 - Update sampling configuration;
210 - Loss of synchronization, store unknown content and timestamp;
211 - Store identified content and timestamp;
212 - Store match results in matching history for subsequent analysis; and
213 - Store collected PPB Hashes for subsequent analysis.
Figure 3 illustrates an embodiment of the database creation method, where the reference numbers show:
201 - Capture audio stream;
202 - Transcode stream to normalized audio stream;
203 - Analyze audio stream;
204 - Collect PPBs;
205 - Hash PPBs; and
213 - Store collected PPB Hashes for subsequent analysis.
Figure 4 illustrates an embodiment of the method of presentation of advertising contextualized with acquired audio content, where the reference numbers show:
201 - Capture audio stream;
202 - Transcode stream to normalized audio stream;
203 - Analyse audio stream;
204 - Collect PPBs;
205 - Hash PPBs;
206 - Compare samples of collected PPB hashes with existing library;
207 - Match found?;
208 - First analysis?;
209 - Update sampling configuration;
210 - Loss of synchronization, store unknown content and timestamp;
211 - Store identified content and timestamp;
212 - Store match results in matching history for subsequent analysis; and
213 - Store collected PPB Hashes for subsequent analysis.
Figure 5 illustrates an embodiment of the method of detecting the mismatch of at least one transmission of a TV content broadcasting in relation to what is reported by EPG for said transmission, where the reference numbers show:
201 - Capture audio stream;
202 - Transcode stream to normalized audio stream;
203 - Analyze audio stream;
204 - Collect PPBs;
205 - Hash PPBs;
206 - Compare samples of collected PPB hashes with existing library;
207 - Match found?;
208 - First analysis?;
209 - Update sampling configuration;
210 - Loss of synchronization, store unknown content and timestamp;
211 - Store identified content and timestamp;
212 - Store match results in matching history for subsequent analysis;
213 - Store collected PPB Hashes for subsequent analysis;
501 - Identified content and timestamp in synchronization with EPG?; and
502 - Report loss of synchronization.
Figure 6 illustrates an embodiment of the method of correcting the mismatch of a transmission of a TV content broadcasting in relation to what is reported by the EPG for said transmission, where the reference numbers show:
201 - Capture audio stream;
202 - Transcode stream to normalized audio stream;
203 - Analyze audio stream;
204 - Collect PPBs;
205 - Hash PPBs;
206 - Compare samples of collected PPB hashes with existing library;
207 - Match found?;
208 - First analysis?;
209 - Update sampling configuration;
210 - Loss of synchronization, store unknown content and timestamp;
211 - Store identified content and timestamp;
212 - Store match results in matching history for subsequent analysis;
213 - Store collected PPB Hashes for subsequent analysis;
501 - Identified content and timestamp in synchronization with EPG?; and
601 - Fix EPG's reported content and timestamp.
Figure 7 illustrates an embodiment of the database access method, where the reference numbers show:
701 - User loads client application;
702 - Fetch identified content and timestamp from server;
703 - Fetch products related to identified content and current timestamp;
704 - Show fetched products; and
705 - User exited?.
Best mode for carrying out the invention
The current invention relates to a method implemented by a computational device to identify the content transmitted by a video or audio stream and to accurately determine the position of this content via analysis of the corresponding audio stream.
a) All attempts at audio identification are recorded, including the moment of the attempt, its success and, if successful, the name of the content and the instant when the match occurred.
b) These attempts are then evaluated in relation to the previous ones, from which the system reaches a conclusion about the content on display, according to the following conditions:
i. If the content is recognized, the instant is reported;
ii. If an identification attempt does not succeed, new evaluations are performed over the involved key points, with sample sizes of 3 s, 7 s and 10 s. These assessments are executed in parallel and aim at determining whether the last successful identification is recognized in any segment of the most recent attempt.
If it is, the conclusion from condition i. is applied.
If not, the loss of synchronism, relative to the emission of an advertising block or another emission not recognized by the system, is reported.
Applying the method of this invention requires a comparison database, which is created as follows.
1. Capture of the audio stream (201)
The audio track is acquired in digital format by using one of the following input sources: a network stream provided by a broadcasting system; a digital audio device; a microphone that captures the ambient audio. When the comparison database is created, it is assumed that the audio input provides a higher quality stream compared with the audio input used for content analysis.
2. Transcode the stream to normalized stream of audio (202)
The audio track is coded or transcoded to a normalized audio format - Pulse-Code Modulation (PCM), with a sample rate of 44.1 kHz, 8 bits of resolution and 1 audio channel (mono).
3. Collection of Power Peaks of the Bands - PPBs (203, 204)
a) The Fast Fourier Transform (FFT) is applied to each one of the aforementioned blocks, obtaining the audio spectrum associated with the block;
b) The audio spectrum is normalized;
c) The analysis is limited to the relevant frequency range, which is partitioned into a number of non-overlapping, uniformly distributed frequency bands proportional to the size of the frequency range, i.e., from 50 Hz to 2047 Hz, the bands being 50-100Hz, 100-200Hz, 200-300Hz, 300-400Hz, 400-500Hz, 500-600Hz, 600-700Hz, 700-800Hz, 800-900Hz, 900-1000Hz, 1000-1100Hz, 1100-1200Hz, 1200-1300Hz, 1300-1400Hz, 1400-1500Hz, 1500-1600Hz, 1600-1700Hz, 1700-1800Hz, 1800-1900Hz, 1900-2047Hz;
d) For each band identified above, the frequency at which the power peaks in that band, the Power Peak of the Band (PPB), is determined.
4. Application of hash function to PPBs (205)
a) The bands are partitioned in uniform intervals, i.e., the bands are divided into two groups - the first ten bands and the second ten bands;
b) The following hash algorithm is applied in combination with the PPBs from each band partition:
$$\mathrm{hash} = \sum_{i=1}^{N-1} \left( (PPB_i \,\&\, 16777214) \cdot (1000^{\,i-1} \cdot 100) \right) + (PPB_0 \,\&\, 16777214)$$
where & is the bitwise AND operation, $PPB_i$ is the PPB at index i, and N is the number of frequency bands.
5. Storage of obtained hashes (213)
The hash information is recorded in the database, together with the time position of that segment.
In order to determine the content of a video or audio data stream and its time position, the following operations are executed.
1. Reduction of search space
The means to reduce the search space by contextualization are determined heuristically, i.e., by analyzing the programming of a given cable channel and determining the possible TV programs at a certain time and date.
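One way to picture this contextualization is the Python sketch below, which selects the programs that the guide lists for a channel around a given moment. The tuple layout of the guide entries, the function `candidate_programs` and the `margin` parameter are assumptions for illustration; the text only states that the possible programs at a certain time and date are determined from the channel's programming.

```python
from datetime import datetime, timedelta


def candidate_programs(epg, channel, when, margin=timedelta(minutes=15)):
    """Return titles scheduled on `channel` around `when`.

    epg: list of (channel, title, start, end) tuples (hypothetical layout).
    margin: widens each slot so neighbouring programs are also considered.
    """
    return [title for ch, title, start, end in epg
            if ch == channel and start - margin <= when < end + margin]
```

Only the hashes stored for the returned titles would then need to be compared, instead of the whole library.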
2. Stream capture (201)
A broadcast system provides access to network streams from one or multiple channels. Initially, the network stream is acquired, consisting of multiple audio and video streams of content data, which are transmitted to the end users. The audio track is acquired in digital format, by using one of the following inputs: a network stream provided by a broadcasting system; a digital audio device; a microphone that captures the ambient audio.
3. Transcode the stream to normalized audio stream (202)
The network streams are retransmitted to one or multiple processes capable of interpreting the raw signal and transcoding it to different formats. These processes isolate the audio streams, normalizing them to Pulse-Code Modulation (PCM), with a sample rate of 44.1 kHz, 8 bits of resolution and 1 audio channel, known as mono. This process runs continuously and in real time.
4. Analysis of audio stream (203)
The normalized audio stream is addressed to the audio analysis component, which:
a) Divides the coded stream into a certain number of overlapping samples of uniform size, with a small offset between samples that is a fraction of the sample size, i.e., three samples taking one second, with an offset of 0.1 seconds;
b) Divides each sample into smaller blocks with a uniform size of 4096 bytes. The blocks overlap rather than being contiguous: consecutive blocks start 512 bytes apart and have 3584 bytes in common.
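The sample division of item a) can be sketched in Python as byte ranges over the normalized stream. The sketch reads "three samples taking one second" as three samples that each last one second, which is an interpretation rather than something the text settles; `sample_windows` and its parameters are hypothetical names.

```python
BYTES_PER_SECOND = 44100  # 8-bit mono PCM at 44.1 kHz: one byte per sample


def sample_windows(start_s, count=3, length_s=1.0, offset_s=0.1):
    """Return (first_byte, end_byte_exclusive) for each overlapping sample."""
    length = round(length_s * BYTES_PER_SECOND)
    windows = []
    for i in range(count):
        begin = round((start_s + i * offset_s) * BYTES_PER_SECOND)
        windows.append((begin, begin + length))
    return windows
```

Calling the same function with `length_s=5.0` gives the three 5-second samples of the alternative sampling configuration mentioned for the retry case.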
5. Collection of Power Peaks of the Bands - PPBs (204)
a) The Fast Fourier Transform (FFT) is applied to each one of the aforementioned blocks, obtaining the audio spectrum associated with the block;
b) The audio spectrum is normalized;
c) The analysis is limited to the relevant frequency range, which is partitioned into a number of non-overlapping, uniformly distributed frequency bands proportional to the size of the frequency range, i.e., from 50 Hz to 2047 Hz, the bands being 50-100Hz, 100-200Hz, 200-300Hz, 300-400Hz, 400-500Hz, 500-600Hz, 600-700Hz, 700-800Hz, 800-900Hz, 900-1000Hz, 1000-1100Hz, 1100-1200Hz, 1200-1300Hz, 1300-1400Hz, 1400-1500Hz, 1500-1600Hz, 1600-1700Hz, 1700-1800Hz, 1800-1900Hz, 1900-2047Hz;
d) For each band identified above, the frequency at which the power peaks in that band, the Power Peak of the Band (PPB), is determined.
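The collection of PPBs can be illustrated with the following standard-library Python sketch. It makes several assumptions that the text does not state: samples are unsigned 8-bit values centred on 128, a PPB is reported as the peak frequency rounded to whole hertz, and spectrum normalization is skipped because scaling by a constant does not change which bin holds the peak. The names `fft`, `power_peaks` and `BAND_EDGES` are introduced here for the example.

```python
import cmath
import math

SAMPLE_RATE = 44100
# Band edges in Hz: 50, 100, 200, ..., 1900, 2047 -> 20 bands
BAND_EDGES = [50] + list(range(100, 2000, 100)) + [2047]


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def power_peaks(block: bytes):
    """Return one peak frequency (rounded Hz) per band for a 4096-byte block."""
    spectrum = fft([b - 128 for b in block])  # centre unsigned samples on zero
    hz_per_bin = SAMPLE_RATE / len(block)
    peaks = []
    for lo, hi in zip(BAND_EDGES, BAND_EDGES[1:]):
        # FFT bins whose centre frequency lies in [lo, hi)
        bins = range(math.ceil(lo / hz_per_bin), math.ceil(hi / hz_per_bin))
        best = max(bins, key=lambda k: abs(spectrum[k]))
        peaks.append(round(best * hz_per_bin))
    return peaks
```

For a 4096-byte block the bin spacing is about 10.8 Hz, so even the narrowest band (50-100 Hz) contains several bins to choose the peak from.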
6. Application of hash function to PPBs (205)
a) The partition of the band is performed into two groups - the first ten bands and the last ten bands;
b) The following hash algorithm is applied in combination with the PPBs from each band partition:
$$\mathrm{hash} = \sum_{i=1}^{N-1} \left( (PPB_i \,\&\, 16777214) \cdot (1000^{\,i-1} \cdot 100) \right) + (PPB_0 \,\&\, 16777214)$$
where & is the bitwise AND operation, $PPB_i$ is the PPB at index i, and N is the number of frequency bands.
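The hash step can be sketched in Python under one reading of the formula, namely that the PPB at index 0 is masked and added as is, and each later PPB at index i is masked and multiplied by 1000^(i-1) * 100. That reading, the assumption that indices restart at 0 inside each group, and the names `hash_group` and `hash_block` are all part of this illustration rather than statements from the text.

```python
MASK = 16777214  # 0xFFFFFE: clears the lowest bit of each PPB


def hash_group(ppbs):
    """Combine the integer PPBs of one group into a single number."""
    total = ppbs[0] & MASK
    for i in range(1, len(ppbs)):
        total += (ppbs[i] & MASK) * (1000 ** (i - 1) * 100)
    return total


def hash_block(ppbs):
    """Hash the first and second half of the bands separately."""
    half = len(ppbs) // 2
    return hash_group(ppbs[:half]), hash_group(ppbs[half:])
```

Because the mask clears the lowest bit, peaks of 150 Hz and 151 Hz contribute the same value. Python integers have arbitrary precision, so the large multipliers for ten bands do not overflow in this sketch; a fixed-width implementation would have to handle that separately.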
7. Storage of resulting hashes
The resulting hash is stored.
8. Comparing the resulting hashes with the existing library
When all hashes of all samples have been created, they are compared with the reduced search space (207):
a) If there is a match, a positive identification is signalled and the content and temporal stamp are stored (211);
b) If no positive identification is obtained, the steps 204 to 213 are repeated with a divergent sampling configuration, i.e., three samples spanning 5 seconds, with an offset of 0.1 seconds (209);
c) If still no match is found, the loss of synchronism is signalled (210);
d) The temporal stamp associated with the last sample of a sequence of positive identifications is considered to be the correct time for the content under analysis.
9. Storage of matching results and collected PPBs for subsequent analysis
The matching results and the hashes from collected PPBs are stored for subsequent analysis (212, 213).
- Audio acquisition and analysis, extraction of key points and the corresponding temporal instants for use in future identifications;
- Audio acquisition and analysis from one or multiple transmissions of a TV content broadcasting company, each one representing a TV channel, detection of the content being viewed and presentation of advertising contextualized with the acquired audio content. Examples of the types of advertising that may be presented are the products shown in the viewed content, trips to the displayed places and clothes worn by the characters, among others;
- Audio acquisition and analysis from one or multiple transmissions of a TV content broadcasting company, each one representing a TV channel, detection of the content being viewed and its temporal instant, and notification of its mismatch in relation to what is reported by the EPG for the channel;
- Audio acquisition and analysis from one or multiple transmissions of a TV content broadcasting company, each one representing a TV channel, detection of the content being viewed and its temporal instant, and correction of its mismatch in relation to what is reported by the EPG for the channel;
- Audio acquisition and analysis from one or multiple transmissions of a TV content broadcasting company, each one representing a TV channel, and detection of advertising slots in order to allow interactions with the end user during these periods;
- Audio acquisition and analysis by using a mobile device microphone, detection of the content being viewed and presentation of advertising contextualized with the acquired audio content. Examples of the types of advertising are the products shown in the viewed content, trips to the displayed places and clothes worn by the characters, among others;
- Audio acquisition and analysis by using a mobile device microphone, detection of the content being viewed and presentation of advertising contextualized with the acquired audio content. Examples of the types of information are characters, places, facts about the program and similar programs, among others.
Example 1
Regarding the particular application described in the legend of figure 4, the process runs according to the flowchart shown. From the moment a mismatch of the viewed content and the corresponding instant in relation to what is reported by the EPG is detected, a set of steps for identifying and notifying that mismatch is added to the flowchart.
Example 2
Regarding the particular application described in the legend of figure 5, the process runs according to the flowchart described in the example above, to which a number of correction steps for the EPG are added after the step of applying the hash function to the PPBs (205), according to the mismatch found.
An additional step for identifying a mismatch in relation to the EPG (501) is introduced after the step of indicating a positive identification and storing the content and temporal stamp (211). If the identified content and timestamp are not in synchronization with the EPG, the loss of synchronization is reported (502).
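A possible shape for the check of step 501 is sketched below in Python. The text does not define how synchronization with the EPG is measured, so the sketch assumes that the actual start of the program is derived from the moment of observation minus the identified position inside the content, and that a hypothetical `tolerance` decides whether the guide is still considered in sync. `epg_mismatch` is an illustrative name.

```python
from datetime import datetime, timedelta


def epg_mismatch(identified_title, position_s, observed_at,
                 epg_title, epg_start, tolerance=timedelta(seconds=30)):
    """Return None when in sync with the guide, else a mismatch description."""
    # Actual start = observation time minus position inside the content
    actual_start = observed_at - timedelta(seconds=position_s)
    drift = actual_start - epg_start
    if identified_title == epg_title and abs(drift) <= tolerance:
        return None
    return {"title": identified_title, "start": actual_start, "drift": drift}
```

The returned description could be reported as a loss of synchronization (502) or, in the correction variant, used to fix the content and timestamp reported by the EPG (601).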
Example 3
Regarding the particular application described in the legend of figure 6, the step which reports the loss of synchronization (502) is replaced, in the process of Example 2, by a step to fix the content and temporal stamp of the EPG (601).
The present invention is not restricted to the embodiments described in this document, and a person with average knowledge in this area may foresee several ways of modifying it without departing from the general idea, as defined in the claims.
The preferred embodiments described above are obviously combinable with each other. The following claims additionally define preferred embodiments.

Claims

1. Method of operating a computational device, comprising the steps:
- dividing an audio stream into at least one sample;
- dividing the sample into at least one block;
- processing an audio spectrum of the block;
- processing at least one Power Peak of the Band of the at least one frequency band of the audio spectrum;
- processing at least one hash of the at least one Power Peak of the Band and storing it in the database.
2. Method according to the previous claim, further comprising the steps:
- capturing an audio stream from the connection to a multimedia content transmission;
- comparing the at least one hash of the at least one Power Peak of the Band with at least one hash in the database;
- if there is a match, a positive identification and the content and temporal stamps are registered in the database;
- if no match is found, the loss of synchronization is registered in the database.
3. Method according to the previous claims, wherein the step of if there is a match, a positive identification and the content and temporal stamps are registered in the database, further comprises the step:
- presenting an advertisement related to the acquired audio content, based on a relation stored in the database.
4. Method according to the previous claim, wherein the step of presenting an advertisement related to the acquired audio content, comprises the step:
- presenting a replacement advertisement in real-time.
5. Method according to any of the claims 2 to 4, wherein the step of if there is a match, a positive identification and the content and temporal stamps are registered in the database, further comprises the steps :
- comparing a temporal instant of the identified content to an electronic program guide and processing a mismatch; and
- correcting the electronic program guide based on the mismatch.
6. Method according to any of the claims 2 to 4, wherein the step of comparing the at least one hash of the at least one Power Peak of the Band with at least one hash in the database, comprises the steps:
- determining a set of the possible matching hashes, stored in the database, in a certain time and date, based on an electronic program guide; and
- comparing the at least one hash of the at least one Power Peak of the Band with the set of the possible matching hashes.
7. Method according to any of the previous claims, wherein the step of processing at least one hash of the at least one Power Peak of the Band, comprises processing the hash with:
$$\mathrm{hash} = (PPB_0 \,\&\, 16777214) + \sum_{i=1}^{N-1} \left( (PPB_i \,\&\, 16777214) \cdot (1000^{\,i-1} \cdot 100) \right)$$
where & is the bitwise AND operation, $PPB_i$ is the Power Peak of the Band at index i, and N is the number of frequency bands.
8. Method according to any of the previous claims, wherein the step of processing at least one hash of the at least one Power Peak of the Band, comprises the steps:
- dividing the Power Peak of the Bands into at least one group; and
- processing a hash for each group.
9. Method according to the previous claim, wherein the step of dividing the Power Peak of the Bands into at least one group, comprises two groups, where the first half of the frequency bands is the first group and second half is the second group.
10. Method according to any of the previous claims, wherein the step of processing at least one Power Peak of the Band of a frequency band of the audio spectrum, comprises the frequency range from 50 Hz to 2047 Hz with the bands 50-100Hz, 100-200Hz, 200-300Hz, 300-400Hz, 400-500Hz, 500-600Hz, 600-700Hz, 700-800Hz, 800-900Hz, 900-1000Hz, 1000-1100Hz, 1100-1200Hz, 1200-1300Hz, 1300-1400Hz, 1400-1500Hz, 1500-1600Hz, 1600-1700Hz, 1700-1800Hz, 1800-1900Hz, 1900-2047Hz.
11. Method according to any of the previous claims, wherein the audio spectrum is processed using the Fast Fourier Transform.
12. Method according to any of the previous claims, wherein the step of dividing the sample into at least one block, comprises overlapping blocks with uniform size of 4096 bytes, separated by 512 bytes and containing 3584 bytes in common.
13. Method according to any of the previous claims, wherein the step of capturing an audio stream and division into at least one sample, comprises an overlapping sample with a sampling divergent configuration which contain three samples spanning 5 seconds, with an offset of 0.1 seconds.
14. A computational device, comprising:
- a data processing means;
- a memory; and
- a database,
where the device is configured to implement the method described in any of the previous claims.
15. Computational device according to the previous claim, further comprising a connection to a multimedia content transmission, where the multimedia content transmission is any of:
- a network stream provided by a broadcasting system;
- an audio digital device; or
- a microphone that captures the ambient audio.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
PT107452 2014-02-05
PT10745214 2014-02-05

Publications (1)

Publication Number Publication Date
WO2015118431A1 true WO2015118431A1 (en) 2015-08-13

Family

ID=52727176

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2015/050670 Ceased WO2015118431A1 (en) 2014-02-05 2015-01-29 Method for capture and analysis of multimedia content

Country Status (1)

Country Link
WO (1) WO2015118431A1 (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002027600A2 (en) 2000-09-27 2002-04-04 Shazam Entertainment Ltd. Method and system for purchasing pre-recorded music
WO2002061652A2 (en) 2000-12-12 2002-08-08 Shazam Entertainment Ltd. Method and system for interacting with a user in an experiential environment
WO2003091990A1 (en) 2002-04-25 2003-11-06 Shazam Entertainment, Ltd. Robust and invariant audio pattern matching
EP1760693A1 (en) 2005-09-01 2007-03-07 Seet Internet Ventures Inc. Extraction and matching of characteristic fingerprints from audio signals
US8396705B2 (en) 2005-09-01 2013-03-12 Yahoo! Inc. Extraction and matching of characteristic fingerprints from audio signals
US8321394B2 (en) 2009-11-10 2012-11-27 Rovi Technologies Corporation Matching a fingerprint
US20110173208A1 (en) * 2010-01-13 2011-07-14 Rovi Technologies Corporation Rolling audio recognition
US20130047177A1 (en) * 2010-02-24 2013-02-21 Gérard Delegue Method and server for detecting a video program received by a user
US20120191231A1 (en) * 2010-05-04 2012-07-26 Shazam Entertainment Ltd. Methods and Systems for Identifying Content in Data Stream by a Client Device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15711822

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15711822

Country of ref document: EP

Kind code of ref document: A1