HK1132405B - Methods and apparatus to meter content exposure using closed caption information
Description
RELATED APPLICATIONS
This patent claims priority to U.S. provisional application serial No. 60/804,893, entitled "Methods and Apparatus for Metering Content Consumption Using Closed Caption and Program Guide Information," filed on June 15, 2006, the entire contents of which are hereby incorporated by reference.
Technical Field
The present invention relates generally to metering content exposure and, more particularly, to methods and apparatus for metering content exposure using closed caption information.
Background
Exposure of media content may be metered by collecting, identifying, and/or extracting audience measurement codes embedded in the content being presented. Content providers (e.g., television and/or radio broadcasters) typically insert, embed, and/or otherwise place such audience measurement codes into content to facilitate identification of the content. Alternatively or additionally, the exposure of the content may be measured by collecting signatures representative of the content. By comparing one or more audience measurement codes and/or signatures collected during the presentation of content to a database of known audience measurement codes and/or signatures, the exposure of a particular piece of content to one or more individuals, audiences, and/or households may be measured.
Drawings
Fig. 1 is a schematic illustration of an example system that utilizes closed caption information to meter content exposure.
FIG. 2 illustrates an example manner of implementing the example content exposure meter of FIG. 1.
Fig. 3 is an example histogram of television channels most likely to be presented and/or consumed during a given time period.
Fig. 4 is an example table of audience measurement codes and hint data.
FIG. 5 illustrates an example manner of implementing the example processing server of FIG. 1.
FIG. 6 is a flow diagram representing an example process that may be performed to implement the example content exposure meter of FIG. 1.
FIG. 7 is a flow diagram representing an example process that may be performed to implement the example processing server of FIG. 1.
FIG. 8 is a schematic illustration of an example processor platform that may be used and/or programmed to perform the example processes of FIG. 6 and/or FIG. 7 to implement the example content exposure meter and/or the example processing server of FIG. 1.
Detailed Description
FIG. 1 illustrates an example system for metering content exposure using closed caption information constructed in accordance with the teachings of the present invention. The example system of FIG. 1 meters: a) content being presented and/or consumed while the content is broadcast, and/or b) content not being presented and/or consumed while the content is broadcast (e.g., the system meters content being presented and/or consumed that was previously recorded while broadcast and presented later (i.e., time-shifted viewing)). To meter content exposure, the example system of FIG. 1 uses closed caption information and/or content identifiers. As used herein, a "content identifier" is any type of data and/or information associated with, embedded in, inferred from, and/or inserted into a piece of content that can be used to identify the piece of content. Audience measurement codes (e.g., audio codes, audio watermarks, video watermarks, Vertical Blanking Interval (VBI) codes, image watermarks, and/or any other watermarks embedded in content by a content provider (e.g., a television and/or radio broadcaster) to facilitate identification of the content), public or private identifiers in a bitstream, closed caption information, metadata, signatures, or any other type of data may be used as the content identifier. The content identifier is typically not noticed by the viewer during playback, although this need not always be the case. For content currently being broadcast, the example system of FIG. 1 utilizes audience measurement codes and/or signatures to identify the content (e.g., audio, video, images, etc.) being presented and/or consumed. In particular, the collected audience measurement codes and/or signatures may be compared to a database of audience measurement codes and/or signatures representing known content to facilitate identifying the content being presented. Similarly, for previously recorded content, the example system may also utilize audience measurement codes and/or signatures to identify the presented media content.
Since the audience measurement codes and/or signatures determined from previously recorded content may be substantially time-offset from a reference database of audience measurement codes and/or signatures, matching such codes and/or signatures against the database to determine what content is being presented and/or consumed may be difficult and/or time consuming. Thus, the example system of FIG. 1 utilizes closed caption information to identify the content most likely being presented when content is presented and/or consumed. This possible content information is then used during matching of the audience measurement codes and/or signatures determined from previously recorded content with the database of audience measurement codes and/or signatures, as described below. In particular, the possible content information may enable the extracted and/or determined audience measurement codes and/or signatures to be compared to a smaller subset of the audience measurement code database. Closed caption information and/or possible content information may likewise be utilized to meter content currently being broadcast.
The example system of FIG. 1 includes any type of media device 105, such as a set-top box (STB), a Digital Video Recorder (DVR), a Video Cassette Recorder (VCR), a Personal Computer (PC), a game console, a television, a media player, etc., for receiving, playing, viewing, recording, and/or decoding any type of content. Example content includes Television (TV) programs, movies, videos, commercials, advertisements, audio, video, games, and so forth. In the example system of FIG. 1, the example media device 105 receives content from any type of source, such as: a satellite receiver and/or antenna 110; a Radio Frequency (RF) input signal 115 received via any type of cable television signal and/or terrestrial broadcast; any type of data communication network, such as the Internet 120; and/or any type of data and/or media storage 125, such as a Hard Disk Drive (HDD), a VCR tape, a Digital Versatile Disk (DVD), a Compact Disk (CD), a flash memory device, etc. In the example system of FIG. 1, the content (independent of its source) may include closed caption information and/or data. Alternatively or additionally, closed caption information and/or data may be provided and/or received separately from the content itself. The media device 105 and/or the content exposure meter 150 may synchronize such separately received closed caption information and/or data with the content.
To provide and/or broadcast content, the example system of FIG. 1 includes any type and/or number of content providers 130, such as television stations, satellite broadcasters, movie studios, and the like. In the example shown in FIG. 1, the content provider 130 delivers and/or provides content to the example media device 105 via satellite broadcast (using a satellite transmitter 135 and a satellite and/or satellite relay 140), terrestrial broadcast, cable television broadcast, the Internet 120, and/or the media store 125.
To meter exposure and/or consumption of content, the example system of FIG. 1 includes a content exposure meter 150. The example content exposure meter 150 of FIG. 1 receives audio data 155 and/or video data 160 from the example media device 105. The example content exposure meter 150 also receives any type of content guide information and/or data 165. The content guide data 165 may be broadcast and/or communicated to the content exposure meter 150, or downloaded and/or otherwise received by the content exposure meter 150 via the Internet 120, the satellite input, the RF input 115, the media device 105, and/or the media storage 125. In some examples, the content guide data 165 is, for example, an eXtensible Markup Language (XML) file containing television program information (e.g., television guide listings) for any number of days and/or customized for the geographic location (e.g., postal or zip code) of the content exposure meter 150. The example content exposure meter 150 of FIG. 1 may be, for example: (a) a PC; (b) implemented by, implemented in, and/or otherwise associated with the example media device 105; and/or (c) an XML data collection server as described in PCT patent application PCT/US2004/000818, the entire contents of which are incorporated herein by reference. An example manner of implementing the example content exposure meter 150 is described below with respect to FIG. 2, and an example process performed to implement the example content exposure meter 150 is described with respect to FIG. 6.
As described below with respect to FIGS. 2, 3, and 6, the example content exposure meter 150 of FIG. 1 uses the content guide data 165 and/or data derived from the content guide data 165, together with closed caption information derived, for example, from the video data 160, to identify one or more television programs and/or movies that may be being presented (e.g., viewed) at the media device 105 and/or via the media device 105. To enable measurement of content exposure, as described below, the example content exposure meter 150 of FIG. 1 collects and/or generates audience measurement codes and/or signatures that may be used to identify the content being presented. In the event that content is presented and/or consumed asynchronously to the time that the content is broadcast (e.g., content that was previously recorded while broadcast and is currently being played back at the media device 105 and/or via the media device 105), the example content exposure meter 150 utilizes closed caption information and content guide information (e.g., Electronic Program Guide (EPG) information) to identify which of a set of potential candidate content represents the content most likely being presented to the panelist/user/family member. The example content exposure meter 150 may also utilize closed caption information to identify which currently broadcast content is being presented and/or consumed. When content is stored and/or recorded, for example, at the media device 105 and/or via the media device 105, any included and/or associated closed caption information and/or data is also stored. For example, if the received content contains embedded closed caption information, the closed caption information is saved along with the content being recorded.
When performing content metering, the example content exposure meter 150 of FIG. 1 divides the time during which content presentation occurs into a set of presentation time intervals (e.g., 30 seconds each) and determines the content most likely presented and/or consumed during each time interval. The time intervals may have any duration, depending on the desired granularity of the metering to be performed. In addition, the duration of the time intervals may be fixed or variable.
For each presentation time interval, the example content exposure meter 150 of FIG. 1 provides a processing server 175 with a ranked list of candidate content representing the content segments most likely presented now and/or in the past. The processing server 175 may be geographically separate from the content exposure meter 150 and/or may be co-located with the example content exposure meter 150. In the example of FIG. 1, the ranked list of candidate content is provided to the processing server 175 as a list of content exposure hints 170A. In the example of FIG. 1, the hints 170A are ordered according to the probability that the candidate content associated with each given hint is the content presented and/or consumed during the time interval of interest, and the hints 170A may include, for example, the three or four most likely items. The processing server 175 may receive and process content exposure hints 170A from any number of content exposure meters 150, which may be geographically distributed. As described below with respect to FIG. 2, the example content exposure meter 150 also collects any type of audience measurement codes and/or signatures (collectively, audience measurement data 170B) from the audio data 155. The audience measurement data 170B is provided to the processing server 175 along with the content exposure hints 170A. An example table used by the content exposure meter 150 to provide the hints 170A and audience measurement data 170B to the processing server 175 is described below with respect to FIG. 4. Additionally or alternatively, the hints 170A and audience measurement data 170B may be formatted as XML files. The audience measurement data 170B may also include and/or represent video codes, video signatures, image codes, image signatures, and the like. To simplify the discussion, the following disclosure refers to using any type of code and/or signature as the audience measurement data 170B.
To facilitate creating the hints 170A used to identify content (e.g., previously recorded content) that is presented and/or consumed asynchronously with the time of the content broadcast, the example content exposure meter 150 stores and/or retains content guide data 165 (e.g., EPG data) and/or data derived from the content guide data 165 collected during a previous period (e.g., over the past 14 days). As such, as described below, the content exposure meter 150 may use the currently collected and/or previously collected content guide data 165, and/or data derived therefrom, to identify content that is presented (e.g., displayed, viewed, and/or listened to) at the media device 105 and/or via the media device 105. In the illustrated example, the period of time for which the content guide data 165 and/or data derived from the content guide data 165 is retained by the example content exposure meter 150 matches the period of time over which the example processing server 175 is programmed to calculate and/or tabulate statistics regarding content exposure.
In the example shown in FIG. 1, the hints 170A and the audience measurement data (e.g., codes and/or signatures) 170B are provided from the content exposure meter 150 to the processing server 175 occasionally, periodically, or in real time. Any type of technique for downloading and/or transferring data from the example content exposure meter 150 to the example processing server 175 may be used. For example, the hints 170A and audience measurement data 170B may be communicated via the Internet 120, a Public Switched Telephone Network (PSTN) 180, and/or a private network. Additionally or alternatively, the example content exposure meter 150 may periodically or aperiodically store the hints 170A and audience measurement data 170B on any type of non-volatile storage medium (e.g., a recordable compact disc (CD-R)) that is transported (e.g., picked up, mailed, etc.) to a processing service and then loaded onto the example processing server 175.
The example processing server 175 of FIG. 1 utilizes the hints 170A and the audience measurement data 170B received from the example content exposure meter 150 to determine which content was presented and/or consumed at the example media device 105 and/or via the example media device 105, thereby forming content exposure data for the media device 105 and/or a set of one or more media devices 105. For example, the processing server 175 utilizes the hints 170A to more efficiently compare the audience measurement data (e.g., codes and/or signatures) 170B collected by the content exposure meter 150 to a database of audience measurement data (e.g., codes and/or signatures) stored and/or available at the processing server 175. As described above, the database of audience measurement data at the example processing server 175 desirably represents a large portion of the overall content, thereby increasing the likelihood of accurately identifying any content presented and/or consumed at the example media device 105. However, the larger the database, the greater the processing power required to search all of the audience measurement data stored in the database to identify a match. The example processing server 175 of FIG. 1 may, for example, receive audience measurement data from the content provider 130 and/or determine audience measurement data for content 185 received at and/or by the processing server 175. Additionally, the content represented by the audience measurement data stored in the database may include content that has been broadcast, content that is to be broadcast, and/or content that has not been broadcast but is available to users via a DVD, VCR tape, or other storage medium. The example processing server 175 may use the hints 170A to limit the amount of audience measurement data that must be compared, making it practical to process audience measurement data 170B from a substantial number of content exposure meters 150. An example manner of implementing the processing server 175 is described below with respect to FIG. 5, and an example process that may be performed to implement the example processing server 175 is described with respect to FIG. 7.
The example processing server 175 of FIG. 1 combines content exposure data determined for a plurality of metered media devices 105 associated with a plurality of audiences to form meaningful content exposure statistics. For example, the processing server 175 of the illustrated example uses the combined content exposure data to determine overall effectiveness, reach, and/or audience demographics of the viewed content by processing the collected data using any type of statistical method.
FIG. 2 illustrates an example manner of implementing the example content exposure meter 150 of FIG. 1. To process the content guide data 165, the example content exposure meter 150 of FIG. 2 includes any type of indexing engine 205. The example indexing engine 205 implements any method, algorithm, and/or technique to process an XML file containing multiple records. The XML file is processed such that an index is created that identifies keywords that distinguish between the plurality of records represented by the XML file. Consider an example XML file containing a television guide listing in which each record represents a separate television program. Each record in the XML file contains data about a television program, such as the channel number on which the television program is broadcast, a name associated with that channel, the program name of the television program, a description of the content of the television program, and the time at which the television program is to be broadcast. The example indexing engine 205 indexes the XML data to remove as much redundant information as possible while retaining keywords useful for distinguishing the listed television programs. For example, consider a 6:00-6:01 PM time slot with a plurality of television programs including "news" in their titles and/or descriptions. Because the term "news" is locally common (e.g., appears in more than one program during the relevant time period), the example indexing engine 205 of FIG. 2 does not include "news" in the indexed list of keywords. However, if one of those same television programs includes less locally common terms (e.g., the name of a particular guest and/or a description of a particular segment) in its program information, the example indexing engine 205 includes that less locally common data (e.g., the name of the particular guest and/or one or more words from the description) in the indexed list of keywords.
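To make the indexing idea concrete, the following Python sketch builds a keyword index from guide records by discarding terms that are locally common (i.e., terms shared by more than one program in the same time slot). The record layout, field names, and function names are illustrative assumptions, not structures taken from the patent.

```python
from collections import defaultdict

def build_keyword_index(records):
    """Index the keywords that distinguish guide records.

    Each record is a dict with hypothetical 'channel', 'title',
    'description', and 'start' fields standing in for the XML
    records described above.
    """
    # Count, per time slot, how many programs mention each term.
    term_counts = defaultdict(lambda: defaultdict(int))
    for rec in records:
        terms = set((rec["title"] + " " + rec["description"]).lower().split())
        for term in terms:
            term_counts[rec["start"]][term] += 1

    # Keep only terms unique to a single program within their slot.
    index = defaultdict(list)
    for rec in records:
        terms = set((rec["title"] + " " + rec["description"]).lower().split())
        for term in terms:
            if term_counts[rec["start"]][term] == 1:
                index[term].append((rec["channel"], rec["title"], rec["start"]))
    return index

guide = [
    {"channel": "FOX", "title": "Evening News", "description": "Guest: Jane Doe", "start": "18:00"},
    {"channel": "NBC", "title": "Nightly News", "description": "Storm coverage", "start": "18:00"},
]
index = build_keyword_index(guide)  # "news" appears in both programs, so it is excluded
```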
To store indexed keywords, which may be generated by the indexing engine 205 or any other keyword server, the example content exposure meter 150 of FIG. 2 includes a keyword database 210. Keywords stored in the keyword database 210 are indexed to associated channel numbers, channel names, program information (e.g., descriptions), and/or broadcast time information. The example keyword database 210 may use any type and/or number of data structures (e.g., matrices, arrays, variables, registers, data tables, etc.) to store the indexed keywords. In the illustrated example, the keyword database 210 is stored, for example, in any type of memory and/or machine accessible file 215. The example keyword database 210 of FIG. 2 includes indexed keywords for a current time period (e.g., the current week) and any number of previous time periods. The number and duration of the time periods included in the keyword database 210 depend on how far back in time the processing server 175 calculates and/or tabulates statistics regarding content exposure. For example, the processing server 175 may be configured to consider only content from the previous fourteen (14) days. The example indexing engine 205 of FIG. 2 therefore periodically or aperiodically deletes and/or otherwise removes old keywords.
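A retention policy like the one just described might be sketched as follows; the entry shape and the fourteen-day default are assumptions drawn from the example above.

```python
from datetime import datetime, timedelta

def prune_old_keywords(entries, retention_days=14):
    """Drop keyword entries collected before the statistics window.

    Each entry is assumed to carry a 'collected_at' datetime recording
    when its guide data was gathered.
    """
    cutoff = datetime.now() - timedelta(days=retention_days)
    return [entry for entry in entries if entry["collected_at"] >= cutoff]
```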
To extract and/or decode closed caption data and/or information from the video data 160, the example content exposure meter 150 of FIG. 2 includes any type of closed caption decoding engine 220. The example closed caption decoding engine 220 of FIG. 2 decodes, using any type of method, algorithm, circuit, device, and/or technique, line 21 of a National Television Systems Committee (NTSC) television signal or line 22 of a Phase Alternating Line (PAL) television signal to extract closed caption text 222. In the example systems of FIGS. 1 and 2, the example closed caption decoding engine 220 decodes the closed caption text 222 in real time as the content is received, displayed, viewed, and/or played back at the media device 105 and/or via the media device 105. Additionally or alternatively, the video data 160 may be stored in the content exposure meter 150 and processed by the closed caption decoding engine 220 in non-real time. The example closed caption decoding engine 220 of FIG. 2 also extracts and/or decodes temporal information (i.e., closed caption timestamps) associated with, and/or embedded together with, the closed caption data and/or information in the video data 160.
To determine the content most likely being presented and/or consumed at and/or via the media device 105, the example content exposure meter 150 of FIG. 2 includes a closed caption matcher 225. The example closed caption matcher 225 of FIG. 2 compares the stream of closed caption text 222 to the indexed keywords in the keyword database 210 using any type of method, algorithm, circuit, device, and/or technique. When a match is determined, the content corresponding to the match is recorded. Over a predetermined time interval (e.g., 5 minutes), the example closed caption matcher 225 counts the total number of matches identified and the number of matches for each particular piece of content (e.g., each television program). In the example of FIG. 2, at the end of each time interval, the probability that a given candidate content is actually being presented and/or consumed is the number of matches for that candidate content divided by the total number of matches. The candidate content (e.g., television program) with the highest probability is the content most likely currently being presented and/or consumed. In the examples of FIGS. 1 and 2, the four pieces of content with the highest probabilities (i.e., the content most likely being presented and/or consumed) are provided to the processing server 175 as the hints 170A for the current time interval. Of course, any number of hints 170A may be provided to the processing server 175.
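A minimal sketch of this interval-based matching, assuming the keyword index maps words to (channel, title, start) tuples as in the earlier sketch, might look as follows; the four-candidate cutoff mirrors the example above.

```python
from collections import Counter

def rank_candidates(caption_words, keyword_index, top_n=4):
    """Count keyword matches per candidate program during one time
    interval and return the top_n candidates with their probabilities."""
    hits = Counter()
    for word in caption_words:
        for candidate in keyword_index.get(word, []):  # (channel, title, start)
            hits[candidate] += 1
    total = sum(hits.values())
    if total == 0:
        return []  # no hints available for this interval
    # Probability = matches for this candidate / total matches.
    return [(candidate, count / total) for candidate, count in hits.most_common(top_n)]
```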
FIG. 3 illustrates an example histogram of the probabilities 305 (i.e., likelihoods) that each of a plurality of television programs 310 was presented and/or consumed (e.g., viewed) during a respective time interval 315 having a duration of T minutes. As shown, each television program 310 is represented by a bar whose height indicates the likelihood of that television program having been viewed during the interval 315. In the example of FIG. 3, as shown by bar 320, the content most likely viewed during the interval 315 is the evening news presented on the "FOX" television channel during the 6:00-6:01 PM period on March 3, 2006. In the examples shown in FIGS. 1 to 3, the time period is determined based on the closed caption timestamps and thus has a finer granularity than the program start time, end time, and/or program duration. The granularity depends on the granularity of the closed caption timestamps and the length of the interval 315. At the end of the interval 315, "FOX", "NBC", "ABC", and "CBS" are provided to the processing server 175 as hints. As the media device 105 continues to provide the video data 160, the closed caption matcher 225 of FIG. 2 continues to identify and count matches, then, at the end of each interval 325, determines the probabilities for that interval 325 and provides the four most likely candidate content items to the processing server 175 as the hints 170A associated with the time interval 325 currently being processed.
In some cases, the example closed caption matcher 225 of FIG. 2 may be unable to identify exactly the content being presented and/or consumed if a sufficient set of keywords is not available. For example, the closed caption matcher 225 may only be able to identify that the television station being watched is ABC, but may not be able to discern which television program is being presented and/or consumed. Similarly, the closed caption matcher 225 may be able to identify that evening news is being presented and/or consumed, but not on which television channel. In still other cases, no hints 170A may be available for a given time interval.
To collect audio codes from the audio data 155, the example content exposure meter 150 of FIG. 2 includes any type of audio code engine 230. The example audio code engine 230 utilizes any type of method, algorithm, circuit, device, and/or technique to search for, locate, extract, and/or decode audio codes inserted into the audio data 155 by a content provider (e.g., a television and/or radio broadcaster) to facilitate identification of the content. Such audio codes are commonly used in the industry for the purpose of detecting exposure to content. However, those skilled in the art will readily recognize that not all content has audio codes inserted.
To collect and/or generate audio signatures from the audio data 155, the example content exposure meter 150 of FIG. 2 includes any type of audio signature engine 235. The example audio signature engine 235 of FIG. 2 processes the audio data 155 using any type of method, algorithm, circuit, device, and/or technique to determine a binary fingerprint and/or signature that substantially and/or uniquely identifies the corresponding portion of the audio data 155. An example audio signature is computed by applying data compression to the audio data 155.
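The passage above says only that an example signature may be computed by applying data compression to the audio. One crude realization of that idea, offered purely as an assumption-laden sketch rather than the patent's method, compresses a block of samples and hashes the result into a compact fingerprint:

```python
import hashlib
import zlib

def audio_signature(samples: bytes) -> str:
    """Derive a compact fingerprint from a block of raw audio bytes.

    A production meter would likely use perceptual audio features;
    compressing and hashing here is only a stand-in that follows the
    data-compression idea mentioned in the text.
    """
    compressed = zlib.compress(samples, level=9)
    return hashlib.sha1(compressed).hexdigest()[:16]
```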
In the example shown in FIGS. 1 and 2, the example closed caption matcher 225 provides the audience measurement data (e.g., audio codes and/or audio signatures) 170B to the processing server 175 along with the hint information 170A.
Although an example content exposure meter 150 is illustrated in FIG. 2, the elements, modules, logic, memory, and/or devices illustrated in FIG. 2 may be combined, rearranged, eliminated, and/or implemented in any manner. For example, the example closed caption matcher 225, the example indexing engine 205, and/or the example keyword database 210 may be implemented separately from the example content exposure meter 150 (e.g., by and/or in the example processing server 175). In such an example, the content exposure meter 150 provides the closed caption information 222 and the audience measurement data 170B to the processing server 175, and the hint information 170A is generated at the processing server 175. As described more fully below with respect to FIG. 5, the processing server 175 uses the generated hint information 170A and the audience measurement data 170B to identify content metered by the content exposure meter 150 as presented and/or consumed at the media device 105 and/or via the media device 105. Further, the example indexing engine 205, the example keyword database 210, the example memory and/or file 215, the example closed caption matcher 225, the example closed caption decoding engine 220, the example audio code engine 230, the example audio signature engine 235, and/or, more generally, the example content exposure meter 150 may be implemented by hardware, software, firmware, and/or any combination of hardware, software, and/or firmware. For example, the example indexing engine 205, the example keyword database 210, the example memory and/or file 215, the example closed caption matcher 225, the example closed caption decoding engine 220, the example audio code engine 230, and/or the example audio signature engine 235 may be implemented by machine accessible instructions executed by any type of processor and/or microcontroller. Further, the content exposure meter 150 may include additional elements, modules, logic, memory, and/or devices and/or may include more than one of any of the illustrated elements, modules, and/or devices (e.g., a video code engine or a video signature engine).
FIG. 4 is an example hint and tuning information table having a plurality of entries 405, each corresponding to one of the hints 170A provided by the content exposure meter 150. In the example of FIG. 4, each of the plurality of entries 405 contains a time period interval identifier 410, a content timestamp 412 indicating when the content was presented and/or consumed, and hint information including one or more of: (a) a list 415 of the highest probability content sources (e.g., television channels); (b) a list 420 of the highest probability content segments (e.g., television programs); and (c) a list 425 of the highest probability broadcast times. In the example of FIG. 4, each of the plurality of entries 405 also includes any audience measurement data 430 (e.g., audio codes and/or audio signatures) located, extracted, decoded, identified, and/or calculated during the time period. The degree to which a particular timestamp entry 412 matches a particular broadcast time 425 indicates whether the corresponding content was presented and/or consumed in real time or was previously recorded and later retrieved. Although an example hint and tuning information table is illustrated in FIG. 4, one of ordinary skill in the art will readily recognize that the content exposure meter 150 may use any type of file, data structure, table, etc. to format the data prior to sending the data to the processing server 175. Further, more or fewer types of information may be included in the table.
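One hypothetical rendering of a single entry 405, with field names invented for illustration (the patent leaves the serialization open, noting only that XML is one option), is sketched below:

```python
# A hypothetical hint-and-tuning entry (cf. entry 405 of FIG. 4);
# the field names are assumptions, not the patent's schema.
entry_405 = {
    "interval_id": "2006-03-03/interval-042",        # period interval identifier 410
    "content_timestamp": "2006-03-03T18:00:12",      # content timestamp 412
    "likely_sources": ["FOX", "NBC", "ABC", "CBS"],  # list 415 (channels)
    "likely_programs": ["Evening News", "Nightly News"],  # list 420
    "likely_airtimes": ["2006-03-03T18:00"],         # list 425
    "audience_measurement": {                        # data 430
        "audio_codes": ["0xA1B2"],
        "audio_signatures": ["3f9c2d17a0b44e51"],
    },
}
```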
FIG. 5 illustrates an example manner of implementing at least a portion of the example processing server 175 of FIG. 1. To determine the audio codes of the audio data 185 provided by and/or obtained from the content provider 130, the example processing server 175 of FIG. 5 includes any type of audio code engine 505. The example audio code engine 505 utilizes any type of method, algorithm, circuit, device, and/or technique to search for, locate, extract, and/or decode audio codes inserted into the audio data 185 by a content provider (e.g., a television and/or radio broadcaster) to facilitate identification of the content. Such audio codes are commonly used in the industry for the purpose of detecting exposure to content. However, one of ordinary skill in the art will readily recognize that not all content contains audio codes. Additionally or alternatively, the content provider 130 may provide audio codes only for content for which exposure and/or consumption statistics are desired.
To determine the audio signature of the audio data 185, the example processing server 175 of FIG. 5 includes any type of audio signature engine 510. The example audio signature engine 510 of fig. 5 processes the audio data 185 using any type of method, algorithm, circuit, device, and/or technique to determine a binary fingerprint and/or signature that substantially and/or uniquely identifies a corresponding portion of the audio data 185. An example audio signature is computed by applying data compression to the audio data 185.
In the example of FIG. 5, audience measurement data 515 (e.g., audio codes and/or audio signatures) located, decoded, extracted, identified, and/or computed by the example audio code engine 505 and/or the example audio signature engine 510, and/or received from the content provider 130, is stored using any type and/or number of databases and/or data structures (e.g., matrices, arrays, variables, registers, data tables, etc.), for example, in any type of memory and/or machine accessible file 520. The example audience measurement database 515 of FIG. 5 is indexed by associated channel number, channel name, program information (e.g., description), and/or broadcast time information. The example audience measurement database 515 includes audio codes and/or signatures corresponding to content currently being broadcast, content broadcast in the past, and/or content to be broadcast in the future. The amount of data in the database 515 may be selected based on the desired time period over which the example processing server 175 is programmed to calculate and/or tabulate statistics regarding content exposure and/or consumption. For example, the example processing server 175 of FIG. 5 may be configured to consider only content that is being broadcast now and/or was broadcast and/or available in the previous fourteen (14) days. However, if content distributed via storage media (e.g., DVDs) is to be metered, the database 515 need not be limited to a particular time period.
To identify content presented and/or consumed at the media device 105 and/or via the media device 105, the example processing server 175 of FIG. 5 includes a content matcher 525. The example content matcher 525 of FIG. 5 utilizes the hints 170A and the audience measurement data 170B received from the content exposure meter 150 to determine which content was presented and/or consumed at the example media device 105 and/or via the example media device 105, thereby forming content exposure data 530 for the media device 105. In particular, the example content matcher 525 utilizes the provided hints 170A to identify a subset of the codes and/or signatures stored in the audience measurement database 515 of the processing server 175 for comparison to the audience measurement data 170B collected from the example media device 105. A match between the audience measurement data 170B and a particular audio code and/or signature 515 indicates that the content corresponding to that particular audio code and/or signature stored at the processing server 175 is the content presented and/or consumed at the media device 105 and/or via the media device 105.
The content matcher 525 can utilize the hints 170A to greatly reduce the amount of audience measurement data from the database 515 that must be compared to the audience measurement data 170B collected by the content exposure meter 150. As a result, audience measurement data 170B from a substantial number of content exposure meters 150 may be processed. An example process that may be performed to implement the example content matcher 525 of FIG. 5 is described below with respect to FIG. 7.
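The subset search described above might be sketched as follows, assuming the hints arrive ordered by likelihood and the reference database is keyed by (source, program, airtime); both shapes are assumptions for illustration.

```python
def match_with_hints(collected, hints, reference_db):
    """Compare collected audience measurement data (170B) against only
    the reference records selected by the hints (170A), most likely first.

    collected    -- set of codes/signatures gathered by the meter
    hints        -- candidate (source, program, airtime) keys, ordered
                    by likelihood
    reference_db -- maps candidate keys to sets of reference codes
                    and/or signatures (database 515)
    """
    for candidate in hints:                 # most likely candidate first
        reference = reference_db.get(candidate, set())
        if collected & reference:           # any code/signature in common
            return candidate                # credit this content
    return None  # caller may fall back to a wider database search
```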
In the example of fig. 5, the content exposure data 530 is stored using any type and/or number of data structures (e.g., matrices, arrays, variables, registers, data tables, etc.) and is stored, for example, in any type of memory and/or machine accessible file 535. The content exposure data 530 may include content exposure data for a plurality of other metered media devices 105 associated with a plurality of audiences to form meaningful content exposure statistics. The combined content exposure data 530 may be statistically processed to determine, for example, the overall effectiveness, reach, and/or audience demographics of the content presented and/or consumed.
Although an example processing server 175 has been illustrated in FIG. 5, the elements, modules, logic, memory, and/or devices illustrated in FIG. 5 may be combined, rearranged, eliminated, and/or implemented in any manner. For example, the example closed caption matcher 225, the example indexing engine 205, and/or the example keyword database 210 of FIG. 2 may be implemented by and/or within the processing server 175. In such an example, the content exposure meter 150 provides the closed caption information 222 and the audience measurement data 170B to the processing server 175. Based on the received closed caption information 222, the processing server 175 generates the hint information 170A. In some examples, the processing server 175 receives closed caption information 222 from some content exposure meters 150 and hint information 170A from other content exposure meters 150. Further, the example audio code engine 505, the example audio signature engine 510, the memory 520, the example content matcher 525, the example memory 535, and/or, more generally, the example processing server 175 may be implemented by hardware, software, firmware, and/or any combination of hardware, software, and/or firmware. For example, the example audio code engine 505, the example audio signature engine 510, the memory 520, the example content matcher 525, and/or the example memory 535 may be implemented by machine accessible instructions executed by any type of processor and/or microcontroller. Further, the processing server 175 may include additional elements, modules, logic, memory, and/or devices and/or may include more than one of any of the illustrated elements, modules, and/or devices (e.g., a video code engine or a video signature engine).
FIGS. 6 and 7 are flowcharts representative of example processes that may be performed to implement the example content exposure meter 150 and the example processing server 175 of FIG. 1, respectively, and/or, more generally, to meter content exposure using closed caption information. The example processes of FIGS. 6 and/or 7 may be performed by a processor, a controller, and/or any other suitable processing device. For example, all or part of the flowcharts of FIGS. 6 and/or 7 may be implemented in coded instructions stored on a tangible medium, such as a flash memory or a RAM associated with a processor (e.g., the example central processing unit 805 discussed below with respect to FIG. 8). Alternatively, some or all of the example processes of FIGS. 6 and/or 7 may be implemented using an Application Specific Integrated Circuit (ASIC), a Programmable Logic Device (PLD), a Field Programmable Logic Device (FPLD), discrete logic, hardware, firmware, or the like. Moreover, some or all of the example processes of FIGS. 6 and/or 7 may be implemented manually or as a combination of any of the foregoing techniques (e.g., a combination of firmware, software, and/or hardware). Moreover, although the example processes of FIGS. 6 and 7 are described with reference to the flowcharts of FIGS. 6 and 7, persons of ordinary skill in the art will readily appreciate that many other methods of implementing the example content exposure meter 150 and/or the example processing server 175 of FIG. 1, respectively, and/or, more generally, of metering content exposure using closed caption information and program guide data, may be employed. For example, the order of execution of the blocks may be changed, and/or some of the blocks described may be changed, eliminated, subdivided, or combined. Additionally, those of ordinary skill in the art will appreciate that the example processes of FIGS. 6 and/or 7 may be performed sequentially and/or in parallel, e.g., by separate processing threads, processors, devices, circuits, etc.
The example process of FIG. 6 begins with a closed caption matcher (e.g., the example closed caption matcher 225 of FIG. 2) obtaining and/or receiving a portion of the closed caption text (i.e., one or more words) collected during the next time interval from the closed caption decoding engine 220 (block 605). The closed caption matcher 225 then compares the closed caption text to the indexed keywords in a keyword database (e.g., the keyword database 210 of FIG. 2) (block 610). If a match of at least one closed caption word with at least one indexed keyword is identified (block 615), the content corresponding to the matched keyword is identified (e.g., the example content 320 of FIG. 3), and the histogram information for the identified content is updated (block 620). If no match is identified (block 615), the update of the histogram is skipped.
The closed caption matcher 225 then determines whether the end of the time interval currently being processed (e.g., the example interval 315 of FIG. 3) has been reached (i.e., whether an interval boundary has occurred) (block 630). If an interval boundary has not occurred (block 630), control returns to block 605 to obtain the next closed caption text (block 605). If an interval boundary has occurred (block 630), the closed caption matcher 225 obtains and/or receives any audio codes collected from content presented and/or consumed during the just-ended time interval (block 635) and obtains and/or receives the audio signatures computed for content presented and/or consumed during the just-ended time interval (block 640). The closed caption matcher 225 then creates and/or adds a hint and audience measurement data entry (e.g., the entry 405 of FIG. 4) to the table and/or sends the hint and audience measurement data to the processing server 175.
The example process of FIG. 7 begins with a content matcher (e.g., the example content matcher 525 of FIG. 5) reading the hints 170A and the audience measurement data 170B for a time interval 315 (block 705). The content matcher 525 identifies the most probable content, content source, and/or broadcast time (block 710) and determines whether audio codes for the most probable content, content source, and/or broadcast time are available (block 715). Additionally or alternatively, the content matcher 525 may utilize content timestamps (e.g., the example timestamp 412 of FIG. 4) when selecting the most likely content, content source, and/or broadcast time at block 710. For example, the content matcher 525 may first select candidate content associated with presentation of live content (e.g., content presented while being broadcast). If applicable audio codes are included in the audience measurement data 170B (block 715), the content matcher 525 compares the audio codes to the audio codes 515 corresponding to the candidate content (block 720). If there is a match (block 725), the content matcher 525 credits, tallies, and/or tabulates the presentation of the candidate content (i.e., identifies the candidate content as the content presented and/or consumed) in conjunction with a timestamp (e.g., the example timestamp 412 of FIG. 4) in the content exposure data 530 (block 730). The timestamp indicates the time of the content exposure.
If applicable audio codes are not available at block 715, or if the audio codes do not match at block 725, the content matcher 525 determines whether audio signatures for the most likely content candidate are available (block 735). If an audio signature is not available (block 735), the content matcher 525 assumes that the most likely candidate content, source, and/or broadcast time was presented and/or consumed and records the exposure of the candidate content along with a timestamp (e.g., the example timestamp 412 of FIG. 4) in the content exposure data 530 (block 730). The timestamp indicates the time of the content exposure.
If an audio signature is available (block 735), the content matcher 525 compares the audio signature to the audio signatures 515 corresponding to the candidate content (block 740). If the audio signatures match (block 745), the content matcher 525 records the match (i.e., identifies the candidate content as the content presented and/or consumed) along with a timestamp (e.g., the example timestamp 412 of FIG. 4) in the content exposure data 530 (block 730). The timestamp indicates the time of the content exposure.
If the audio signatures do not match (block 745), the content matcher 525 determines whether there are more hints (block 750). If there are no more hints (block 750), control passes to block 755 to determine whether there are hints for additional time intervals to be processed. Additionally or alternatively, the content matcher 525 may compare the audience measurement data collected from the media device 105 to all of the audience measurement data 515 stored in the database to determine whether a match can be identified.
If there are more hints (block 750), the content matcher 525 identifies the next most likely content candidate (block 760). Control then returns to block 715.
At block 755, if additional hints 170A and audience measurement data 170B for additional intervals are available (block 755), control returns to block 705 to process the next time interval. If no additional hints 170A and audience measurement data 170B are available (block 755), control exits the example process of FIG. 7.
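Pulling the blocks of FIG. 7 together, one plausible rendering of the per-interval decision flow (codes first, then signatures, then the fallback credit of block 730) is sketched below; the data shapes are assumptions layered onto the flowchart, not a verbatim implementation.

```python
def credit_interval(entry, reference_db):
    """Mirror the decision flow of FIG. 7 for one hint/measurement entry."""
    measured = entry["audience_measurement"]
    for candidate in entry["hints"]:                 # blocks 710/760
        ref = reference_db.get(candidate)
        if ref is None:
            continue
        codes = set(measured.get("audio_codes", []))
        if codes and codes & set(ref.get("codes", [])):       # blocks 715-725
            return candidate                                  # block 730: credit
        ref_sigs = set(ref.get("signatures", []))
        if not ref_sigs:
            return candidate    # block 735: no signature available -> assume candidate
        if set(measured.get("audio_signatures", [])) & ref_sigs:  # blocks 740-745
            return candidate                                  # block 730: credit
    return None                 # block 750: no more hints for this interval
```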
FIG. 8 is a schematic diagram of an example processor platform 800 that may be used and/or programmed to, for example, perform the example processes of FIGS. 6 and/or 7 to implement the example content exposure meter 150 and the example processing server 175 of FIG. 1, respectively, and/or, more generally, to meter content exposure using closed caption information and program guide data. For example, the processor platform 800 may be implemented by one or more general purpose microprocessors, microcontrollers, or the like.
The processor platform 800 of the example of FIG. 8 includes a general purpose programmable and/or special purpose processor 805. The processor 805 executes coded instructions 810 and/or 812 present in main memory of the processor 805 (e.g., within a Random Access Memory (RAM) 815 and/or a Read Only Memory (ROM) 820). The processor 805 may be any type of processing unit, such as a processor and/or microcontroller from any family of processors and/or microcontrollers. The processor 805 may perform, among other things, the example processes of FIGS. 6 and/or 7.
The processor 805 communicates with a main memory (including a RAM 815 and a ROM 820) via a bus 825. The RAM 815 may be implemented by DRAM, SDRAM, and/or any other type of RAM device. The ROM 820 may be implemented by flash memory and/or any other desired type of memory device. Access to the memory 815 and the memory 820 is typically controlled by a memory controller (not shown) in a conventional manner. The RAM 815 may be used, for example, to store the example keyword database 210 of fig. 2 and/or the example audience measurement database 515 and/or the example content exposure data 530 of fig. 5.
The processor platform 800 also includes a conventional interface circuit 830. The interface circuit 830 may be implemented using any type of well-known interface standard, such as an external memory interface, a serial port, general purpose input/output, etc.
One or more input devices 835 and one or more output devices 840 are connected to the interface circuit 830. The input devices 835 may be used, for example, to receive the audio data 155, the video data 160, the content guide data 165, the audio data 185, and so forth. The output devices 840 may be used, for example, to transmit the audience measurement data 170B and/or the hints 170A from the content exposure meter 150 to the processing server 175.
Although certain example methods, apparatus and articles of manufacture have been described herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all methods, apparatus and articles of manufacture fairly falling within the scope of the appended claims either literally or under the doctrine of equivalents.
Claims (16)
1. A method of metering content exposure, the method comprising the steps of:
forming a keyword database based on a program guide describing a plurality of programs for a given time period;
collecting audience measurement parameters for a presented program, the audience measurement parameters usable to identify the presented program;
generating a plurality of likelihood values for respective ones of the plurality of programs based on a comparison of the keyword database and closed caption text associated with the presented program, the likelihood values representing likelihoods that respective ones of the plurality of programs are the presented program, the likelihood values being generated without comparing the collected audience measurement parameters to any reference audience measurement parameters;
selecting a subset of the plurality of programs using the plurality of likelihood values to form a list of most likely presented programs, wherein the selected subset includes a number of programs greater than one and less than all of the plurality of programs; and
transmitting the list of most likely presented programs and the collected audience measurement parameters to a collection server for comparing the collected audience measurement parameters to reference audience measurement parameters for respective ones of the most likely presented programs, in an order selected based on the likelihood values for the respective ones of the most likely presented programs in the list.
2. The method of claim 1, wherein generating the likelihood value comprises counting matches of the closed caption text and the keyword database for a respective one of the plurality of programs.
3. The method of claim 2, further comprising:
calculating a sum of one or more matches for a respective one of the plurality of programs; and
dividing each of said matches by said sum.
4. The method of claim 1, wherein the program guide comprises an eXtensible Markup Language (XML) data structure.
5. The method of claim 1, wherein the collected audience measurement parameters comprise at least one of an audio code embedded in the presented program, a video code embedded in the presented program, an audio signature generated from the presented program, or a video signature generated from the presented program.
6. The method of claim 5, wherein the audio code is embedded in the presented program by a broadcaster to identify the presented program.
7. The method of claim 1, wherein the list further comprises at least one of a most likely channel or a most likely time.
8. An apparatus for metering content exposure, the apparatus comprising:
audience measurement means for collecting audience measurement parameters for a presented program;
an indexing means that creates a keyword database based on a program guide describing a plurality of programs; and
a closed caption matcher, the closed caption matcher comprising:
a generating unit that generates likelihood values for respective ones of the plurality of programs based on a comparison of the keyword database and closed caption text associated with the presented program, the likelihood values representing likelihoods that the respective ones of the plurality of programs are the presented program, the likelihood values being generated without comparing the collected audience measurement parameters to any reference audience measurement parameters;
a selection unit that selects a subset of the plurality of programs based on the likelihood values to form a list of most likely presented programs that includes more than one and less than all of the plurality of programs;
a ranking unit that ranks the list of most likely presented programs based on their respective likelihood values; and
a transmitting unit that transmits the sorted list of most likely presented programs and the collected audience measurement parameters to a collection server for comparing the collected audience measurement parameters with reference audience measurement parameters for respective ones of the most likely presented programs, based on the order of the most likely presented programs in the list, to determine audience presentation statistics.
9. The apparatus of claim 8, further comprising closed caption decoding means for extracting closed caption text.
10. The apparatus of claim 8, wherein the audience measurement parameters comprise at least one of an audio code embedded in the presented program, a video code embedded in the presented program, an audio signature generated from the presented program, or a video signature generated from the presented program.
11. The apparatus of claim 8, wherein the closed caption matcher generates the likelihood value by counting matches of the closed caption text and the keyword database for a respective one of the plurality of programs.
12. The apparatus of claim 8, wherein the indexing means generates the keyword database so as to remove redundant information.
13. The apparatus of claim 8, wherein the list further comprises at least one of a most likely channel or a most likely time.
14. A method of metering content exposure, the method comprising the steps of:
receiving audience measurement parameters for a presented program from a content exposure meter;
receiving a list of most likely presented programs from the content exposure meter, wherein a plurality of likelihood values are generated for respective ones of a plurality of programs based on a comparison of a keyword database and closed caption text related to the presented program, the list of most likely presented programs is selected and sorted based on the likelihood values, and the sorted list comprises a number of programs greater than one and less than all of the plurality of programs; and
comparing reference audience measurement parameters for respective ones of the most likely presented programs to the audience measurement parameters for the presented program until the presented program is identified, the reference audience measurement parameters being compared according to the order of the most likely presented programs in the list.
15. The method of claim 14, wherein the audience measurement parameters comprise at least one of an audio code embedded in the presented program, a video code embedded in the presented program, an audio signature generated from the presented program, or a video signature generated from the presented program.
16. The method of claim 14, wherein the sorted list further comprises at least one of a most likely channel or a most likely time.