HK1254819B - Detection of common media segments - Google Patents
- Publication number: HK1254819B (application HK18113909.3A)
- Authority: HK (Hong Kong)
- Prior art keywords
- media content
- displayed
- media
- segment
- unscheduled
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit under 35 U.S.C. §119(e) of U.S. Provisional Application No. 62/193,322, filed on July 16, 2015, the entire contents of which are incorporated herein by reference.
This application is related to U.S. Patent Application No. 14/089,003, filed November 25, 2013, now U.S. Patent No. 8,898,714, issued November 25, 2014; U.S. Patent Application No. 14/217,075, now U.S. Patent No. 9,055,309, issued June 9, 2015; U.S. Provisional Application No. 61/182,334, filed May 29, 2009; U.S. Provisional Application No. 61/290,714, filed December 29, 2009; U.S. Patent Application No. 12/788,748, now U.S. Patent No. 8,769,584, issued July 1, 2014; and U.S. Patent Application No. 12/788,721, now U.S. Patent No. 595,781, issued November 26, 2013, all of which are incorporated herein by reference in their entirety.
SUMMARY OF THE INVENTION
An automatic content recognition (ACR) system provides information about the content displayed by a particular media display device at a particular point in time. An ACR system can be implemented as a video matching system. Generally, a video matching system operates by obtaining data samples from known video data sources, generating identification information from those samples, and storing the identification information in a database together with known information about the video. A particular media device can obtain data samples from an unknown video being displayed, generate identification information from the samples, and attempt to match that identification information against the identification information stored in the database. When a match is found, the media device can receive the known information about the video from the database. Generally, the matching operations performed by the media device can take place in real time, that is, while the video is being displayed on the device.
Systems, methods, and computer program products are provided for identifying a media content stream while the media content stream is playing an unscheduled common media segment. In various embodiments, a computing device may be configured to identify the media content being played by a media display device at a particular time. The computing device may be configured to implement a video matching system. The computing device may receive multiple media content streams, where at least two of the media content streams simultaneously include the same unscheduled media segment. The computing device may be configured to determine that the media display device is playing the unscheduled media segment at the current time. To make this determination, the computing device may examine the media content available at the current time in each of the multiple media content streams. The computing device may further be configured to determine identification information from media content included in the media content stream that the media display device is playing at the current time. The identification information may identify the media content stream. The computing device may further determine context-related content. The context-related content may be disabled while the unscheduled media segment is being played by the media display device. The computing device may further be configured to display the media content stream and the context-related content after the unscheduled media segment has been played.
In various embodiments, the context-related content is selected using the identification information that identifies the media content stream. The context-related content may be provided to the media display device.
In various embodiments, identifying the media content stream may include detecting a graphic superimposed on the unscheduled media segment while the unscheduled media segment is being played by the media display device. The graphic may provide additional identification information for the media content stream.
In various embodiments, the media content used to determine the identification information may include media content included in the media content stream before or after the unscheduled media segment.
In various embodiments, the computing device may be configured to determine that the media display device has been playing the unscheduled media segment since the beginning of the unscheduled media segment. In these embodiments, the computing device may identify the media content stream using identification information determined for media content included in the media content stream before the unscheduled media segment.
In various embodiments, the computing device may be configured to determine that the media display device has been playing the unscheduled media segment since a point in time after the beginning of the unscheduled media segment. In these embodiments, the computing device may identify the media content stream using identification information for media content included in the media content stream after the unscheduled media segment.
BRIEF DESCRIPTION OF THE DRAWINGS
Illustrative embodiments are described in detail below with reference to the following drawings:
FIG. 1 shows a matching system that can identify unknown content;
FIG. 2 shows components of a matching system for identifying unknown data;
FIG. 3 shows an example of a video excerpt capture system including a memory buffer 302 of a decoder;
FIG. 4 shows a video excerpt capture system including a memory buffer of a decoder;
FIG. 5 shows an example of a video matching system for an interactive television system;
FIG. 6 shows an example in which multiple content streams simultaneously carry the same media segment;
FIG. 7 shows another example in which multiple media streams, shown here as television channels, simultaneously display the same media content;
FIG. 8 shows an example in which multiple media content streams carry the same common media segment at approximately the same time;
FIG. 9 shows an example of a graphic that has been superimposed onto a common media segment being played by a media display device;
FIG. 10 shows an example of a process that may be implemented when multiple media content streams include the same unscheduled media segment;
FIG. 11 is a chart illustrating a point location and the path points around the point location;
FIG. 12 is a chart illustrating a set of points that lie within a distance of radius "r" from a query point "x";
FIG. 13 is a chart illustrating possible point values, where angles in "n"-dimensional space are calculated as part of candidate point qualification;
FIG. 14 is a chart illustrating a combination of FIG. 12 and FIG. 13, where the distance and angle from the query point "x" are applied to determine candidate "suspects" to submit to the further matching system step of path pursuit;
FIG. 15 is a chart illustrating a self-intersecting path and a query point; and
FIG. 16 is a chart illustrating three consecutive point locations and the path points around them.
DETAILED DESCRIPTION
An automatic content recognition (ACR) system provides information about the content displayed by a particular media display device at a particular point in time. For example, an ACR system can provide information such as the channel being watched, the title of the video, some text identifying the video or its content, the portion of the video being watched, one or more categories of the video, the author and/or producer of the video, and so on. This information can then be used, for example, to provide viewing statistics for the video (e.g., how many people watched the video, at what times, how often, etc.) and/or to suggest targeted content to the viewer, such as advertisements or interactive content.
An ACR system can be implemented as a video matching system. Generally, a video matching system operates by obtaining data samples from known video data sources, generating identification information from those samples, and storing the identification information in a database together with known information about the video. A particular media device can use a similar process to identify unknown video content. Specifically, the media device can obtain data samples from an unknown video being displayed, generate identification information from the samples, and attempt to match that identification information against the identification information stored in the database. When a match is found, the media device can receive the known information about the video from the database. Generally, the matching operations performed by the media device can take place in real time, that is, while the video is being displayed on the device.
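As an illustrative, non-limiting sketch, the ingest-and-match flow described above might look like the following. The names `make_fingerprint` and `ReferenceDatabase`, the list-of-integers sample format, and the quantization step of 32 are assumptions made for this example only; the description does not specify how identification information is computed.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical identification information: a coarse tuple derived from one sample.
Fingerprint = Tuple[int, ...]


def make_fingerprint(sample: List[int], step: int = 32) -> Fingerprint:
    """Quantize sample values so that small differences map to the same key."""
    return tuple(value // step for value in sample)


class ReferenceDatabase:
    """Maps fingerprints of known video to the known information about it."""

    def __init__(self) -> None:
        self._entries: Dict[Fingerprint, dict] = {}

    def ingest(self, sample: List[int], info: dict) -> None:
        # Known source: store identification info together with known info.
        self._entries[make_fingerprint(sample)] = info

    def match(self, sample: List[int]) -> Optional[dict]:
        # Unknown source: return the known info on a match, otherwise None.
        return self._entries.get(make_fingerprint(sample))


db = ReferenceDatabase()
db.ingest([10, 200, 90, 45], {"channel": "Channel 7", "title": "Evening News"})

known = db.match([12, 205, 95, 40])     # slightly noisy copy of the known sample
unknown = db.match([250, 3, 17, 130])   # content that was never ingested
```

Exact-key lookup on quantized values is chosen here only for brevity; a practical system would need a tolerance-based search, since values near a quantization boundary can land in different buckets.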
However, when the same content is displayed on multiple channels at the same time, a video matching system such as the one described above may have difficulty identifying the media content being displayed by a media device. In addition, the system may be unable to determine, for example, which channel the viewer is watching or what context-related information to provide.
One example of multiple channels displaying the same content at the same time occurs when multiple local television stations provide syndicated content to cover a "breaking news" story. For example, a national broadcaster (e.g., the American Broadcasting Company (ABC), the Columbia Broadcasting System (CBS), the National Broadcasting Company (NBC), the Fox Network, the Cable News Network (CNN), etc.) may provide a video feed during a political speech, a natural disaster, or a man-made event. In this example, multiple national and/or local broadcast channels may pick up the video feed while the national broadcaster is broadcasting it, and may rebroadcast the feed to local viewers. As a result, multiple channels may display the same video content at the same time. The same situation can arise in other circumstances, such as when multiple channels simultaneously display the same commercial or sporting event.
As described above, a video matching system can rely on data samples collected from known video sources, such as local and national channels, where program information can provide the title or another identifying string for the video content as well as other information about the video content. However, when multiple channels display the same content, the video matching system may be unable to identify the video content uniquely. For example, the video matching system may associate one video with two or more channels. Subsequently, if a media device tunes to one of the channels carrying the same content, the video matching system may be unable to determine which channel the media device has tuned to.
In various embodiments, a video matching system can be configured to improve the accuracy of automatic content recognition in the presence of ambiguity caused by a common video segment appearing on multiple channels at the same time. Without this improvement, the presence of a common video segment displayed simultaneously on multiple monitored channels can make it ambiguous which content a media device is displaying. In various embodiments, the video matching system can use information from the media content displayed on the media display device before and/or after the media device begins displaying the common video segment. Using this information, the video matching system can attach identification information to samples taken from the common video segment.
On national and local channels that may display a common video segment at some point in time, a channel may provide some uniquely identifiable content before displaying the common video segment. In various embodiments, this unique content can be used to help identify the common video segment. For example, in some news broadcasts, certain portions of the news program are known to originate from the local channel. For example, the news program may introduce and/or comment on the common video segment before the common video segment is displayed. The introductory segment may be referred to as a "talking head" segment, that is, a segment in which two or more people are seated behind a desk and framed in the video from mid-chest up. When a "talking head" segment is detected, the video matching system can, for example, add a new timeline signal or event that can be used to help identify the common video segment.
In various embodiments, a media display device configured to use the video matching system can obtain and track timeline signals or events, such as those that may be generated for a "talking head" segment. In these embodiments, when the media device encounters an unknown and possibly common video segment, the media device can check for timeline events. In one case, the media device may find no timeline event; the absence of an event can indicate to the media device that the device has just been turned on or has just tuned to a channel that is displaying the common video content. In this case, the data collection process of the media device can be configured to avoid using the common video segment for identification. In other cases, the media device can use timeline events received before and/or after the common video segment to generate identification information for the common video segment.
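One possible way to apply timeline events to a common video segment is sketched below, by way of example only. The `TimelineEvent` record, the `resolve_channel` function, and the 60-second `max_gap` window are hypothetical names and values chosen for illustration; the description does not define a data format or a window length.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TimelineEvent:
    timestamp: float  # seconds on the device's clock
    channel: str      # channel identified from unique (non-common) content


def resolve_channel(
    events: List[TimelineEvent],
    segment_start: float,
    segment_end: float,
    max_gap: float = 60.0,
) -> Optional[str]:
    """Attribute a common segment to a channel using nearby timeline events.

    Prefers the latest event before the segment and falls back to the
    earliest event after it. Returns None when neither lies within max_gap
    seconds, in which case the segment should not be used for identification.
    """
    before = [
        e for e in events
        if segment_start - max_gap <= e.timestamp <= segment_start
    ]
    if before:
        return max(before, key=lambda e: e.timestamp).channel

    after = [
        e for e in events
        if segment_end <= e.timestamp <= segment_end + max_gap
    ]
    if after:
        return min(after, key=lambda e: e.timestamp).channel

    return None  # e.g. the device was just turned on or just changed channel


# A "talking head" introduction seen 20 seconds before the common segment.
channel = resolve_channel([TimelineEvent(100.0, "Local 5")], 120.0, 300.0)
```

Returning `None` rather than guessing mirrors the case above in which no timeline event is found and the common segment is left out of identification.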
In various embodiments, the video matching system can also use timeline events to identify commercials more quickly. Generally, the amount of time the video matching system takes to obtain enough samples to match a known commercial with high probability can be referred to as the "commercial confidence interval." Using the techniques described herein, the commercial confidence interval can be reduced when the media device uses timeline events that fall within a specified window of past time.
I. Audio-Video Content
In various embodiments, a video content matching system can be configured to identify media content being displayed by a media display device. In various embodiments, the video content system can be used to provide contextually targeted content to the media display device, where the targeted content is selected based on the identified media content.
Media content includes video, audio, text, graphics, tactile representations of visual and/or audible data, and various combinations of visual, audible, or tactile information. For example, media content can include synchronized audio and video, such as a movie or television program. As another example, media content can include text and graphics, such as a web page. As another example, media content can include photographs and music, such as a photo slideshow with a soundtrack.
A media display device can be a device capable of displaying a variety of media content. Media display devices can include, for example, television systems. A television (TV) system includes, for example, televisions such as web TVs and connected TVs (also known as "smart TVs"), and optionally equipment incorporated in or co-located with the television, such as a set-top box (STB), a digital video disc (DVD) player, and/or a digital video recorder (DVR).
A connected TV is a TV that is connected to a network, such as the Internet. In various embodiments, a connected TV can be connected to a local wired or wireless network, such as in a private home or business office. A connected TV can run an application platform such as Google's Android, or some other platform configured to provide interactive, smartphone-like or tablet-like software applications (which may also be referred to as "apps").
In various embodiments, a media display device can receive signals such as television signals. Television signals include, for example, signals representing video and audio data that are broadcast together and synchronized for simultaneous display. For example, television signals can include television programs and/or commercials. In some cases, a television signal can include additional information related to the audio-video content in the television signal. This additional data may be referred to as "metadata." The term "metadata" can also be used to describe information associated with video or audio-video content that is transmitted by means other than a television signal (e.g., transmitted over a network as digitized and/or packetized data). Metadata can include information about the content, such as information identifying the content, a description of the content, one or more categories of the content, the author and/or publisher of the content, and so on. Because metadata is transmitted together with the content it describes, the metadata can be used to provide information about the content while the content is being viewed or played.
Not all media display devices have access to metadata. Hence, not all media display devices can determine what they are displaying or playing at any given moment. Without this information, a media display device may be unable to provide customized or personalized content or advertisements for a particular viewer. Although some information about the content being delivered to the media display device may be available in the distribution pipeline, that information may be lost or removed before the content reaches the media display device.
In some embodiments, various methods can be used to provide metadata for audio-video content. For example, in some embodiments, identification information can be encoded into the content using watermarks. In these embodiments, the identification information can be encoded so that it is not lost when the content is compressed for transmission and decompressed for display. Such methods, however, may require that the receiving media display device be able to extract the identification information from the content. In addition, these methods may be unable to identify the specific video being played within a fraction of a second.
In various embodiments, advances in fiber optic and digital transmission technology have enabled the media industry to offer a large channel capacity, where "channels" include traditional broadcast channels, satellite signals, digital channels, streaming content, and/or user-generated content. In some cases, a media provider such as a satellite system may be referred to as a multichannel video programming distributor (MVPD). In some embodiments, media providers can also use the increased data capacity of modern transmission systems to offer some forms of interactive content, such as interactive television (ITV). The increased processing power of smart TVs, set-top boxes, and similar devices may further enable interactive content.
Interactive television can enable television systems to serve as a two-way information distribution mechanism, in a manner similar to the World Wide Web. Interactive television can offer a variety of marketing, entertainment, and educational capabilities, such as allowing a viewer to order an advertised product or service, compete against contestants in a game show, participate in a live classroom session, and so on. In some embodiments, the interactive functionality can be controlled by a set-top box. In these embodiments, the set-top box can execute an interactive program associated with video content, such as a television broadcast. The interactive functionality can be displayed on the television's screen, and can include icons or menus that allow the viewer to make selections using the television's remote control or a keyboard.
In various embodiments, interactive content can be incorporated into audio-video content. In some embodiments, the audio-video content can consist of a broadcast stream. A broadcast stream may also be referred to as a "channel" or a "network feed." The term "broadcast stream" can refer to a broadcast signal received by a television by way of, for example, an antenna, a satellite, a coaxial cable, a digital subscriber line (DSL) cable, a fiber optic cable, or some other transmission medium. In various embodiments, interactive content can be incorporated into audio-video content using "triggers." Triggers can be inserted into the content of a particular program. Content that includes triggers may be referred to as "enhanced program content," an "enhanced television program," or an "enhanced video signal." Triggers can be used to alert a media display device (e.g., at a processor in a set-top box or smart TV) that interactive content is available. A trigger can contain information about the available content and about where the interactive content can be found (e.g., a memory address, a network address, and/or a website address). A trigger can also contain information that can be displayed to the viewer on the media display device. For example, information provided by the trigger can be displayed at the bottom of the screen provided by the media display device. The displayed information can prompt the viewer to perform some action or to choose among multiple options.
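For illustration only, the information a trigger is described as carrying could be modeled as a small record. The `Trigger` class, its field names, and the `overlay_text` helper are assumptions introduced for this sketch and do not correspond to any particular trigger format or broadcast standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trigger:
    content_description: str      # what interactive content is available
    content_location: str         # e.g. a memory, network, or website address
    prompt: Optional[str] = None  # text to show the viewer, if any


def overlay_text(trigger: Trigger) -> Optional[str]:
    """Return the line to draw at the bottom of the screen, or None."""
    if trigger.prompt is None:
        return None
    return f"{trigger.prompt} ({trigger.content_description})"


trigger = Trigger(
    content_description="Recipe card",
    content_location="https://example.com/recipe",  # placeholder address
    prompt="Press OK for the recipe",
)
banner = overlay_text(trigger)
```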
II. Video Matching
In various embodiments, a video content system can be configured to identify media content being displayed or played by a media display device. In various embodiments, information identifying the content being viewed at a particular moment can be used to capture a specific reaction from the viewer and respond to it appropriately, such as a request to rewind the content or to replay the video from its beginning. Alternatively or additionally, the identification information can be used to trigger targeted content, such as advertisements, which may be provided by the content provider or an advertiser. Information identifying audio-video content can thus be used to provide viewer-customized video-on-demand (VoD) capabilities to devices that would not otherwise have smart TV capabilities.
In various embodiments, a video segment can be identified by sampling, at periodic intervals, a subset of the pixel data being displayed on the screen of a media display device and then looking for similar pixel data in a content database. In some embodiments, a video segment can be identified by extracting audio data associated with the video segment and looking for similar audio data in a content database. In some embodiments, a video segment can be identified by processing the audio data associated with the video segment using automatic speech recognition techniques and searching text transcripts of known video content to locate matching text phrases. In some embodiments, a video segment can be identified by processing metadata associated with the video segment.
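The pixel-sampling approach can be illustrated with the following minimal sketch. The grayscale list-of-rows frame format, the fixed sample positions, the Euclidean distance measure, and the names `sample_frame` and `closest_match` are all assumptions for this example; the description does not prescribe them.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Frame = List[List[int]]  # assumed: grayscale frame as rows of pixel values


def sample_frame(frame: Frame, points: Sequence[Tuple[int, int]]) -> List[int]:
    """Read only the pixels at the given (row, column) positions."""
    return [frame[row][col] for row, col in points]


def closest_match(
    cue: List[int],
    reference: Dict[str, List[int]],
    max_distance: float,
) -> Optional[str]:
    """Return the reference entry most similar to the cue, if close enough."""
    best_name, best_distance = None, max_distance
    for name, ref_cue in reference.items():
        distance = sum((a - b) ** 2 for a, b in zip(cue, ref_cue)) ** 0.5
        if distance <= best_distance:
            best_name, best_distance = name, distance
    return best_name


frame = [
    [0, 10, 20],
    [30, 40, 50],
    [60, 70, 80],
]
points = [(0, 0), (1, 1), (2, 2)]  # a small subset of the screen
cue = sample_frame(frame, points)

reference = {"news": [2, 41, 79], "sports": [200, 10, 5]}
segment = closest_match(cue, reference, max_distance=10.0)
```

A linear scan over the reference entries is used here only for clarity; it would not scale to a large content database.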
In various embodiments, a video matching system can be used to provide contextually targeted content to an interactive media display system. The contextually targeted content can be based on the identification of the video segment being displayed, as well as on the time at which the video segment is being played (day or night, 3 p.m., etc.) and/or the portion of the video segment currently being displayed (e.g., the current offset from the start of the video). Here, "playing time" and "offset time" are used interchangeably to describe the portion of the video currently being displayed.
In various embodiments, a media display device equipped with a content matching system, having identified the content currently being displayed or played by the media display device, may be able to infer the subject matter of the content and interact with the viewer accordingly. For example, the media display device may be able to provide instant access to a video-on-demand version of the content and/or to a higher-resolution or 3D version of the content. In addition, the media display device can provide the ability to restart, fast-forward, pause, and rewind the content. In various embodiments, advertisements can be included in the content. In these embodiments, some or all of the advertising messages can be customized, for example according to the viewer's geographic location, demographic group, or shopping history. Alternatively or additionally, the number or length of the advertisements can be reduced, or the advertisements can be removed entirely.
In various embodiments, once a video segment has been identified, the offset time can be determined by sampling a subset of the pixel data (or associated audio data) being displayed or played by the media display device and looking for similar pixel (or audio) data in a content database. In various embodiments, the offset time can be determined by extracting audio or image data associated with the video segment and looking for similar audio or image data in a content database. In various embodiments, the offset time can be determined by processing the audio data associated with the video segment using automatic speech recognition techniques. In various embodiments, the offset time can be determined by processing metadata associated with the video segment.
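As a hedged sketch of offset-time determination, assume the content database stores, for an already identified segment, one reference cue per offset. The offset can then be estimated as that of the nearest reference cue. The `estimate_offset` function, the `(offset, cue)` pair layout, and the squared-distance measure are hypothetical choices made for this example.

```python
from typing import List, Tuple


def estimate_offset(
    cue: List[int],
    segment_cues: List[Tuple[float, List[int]]],
) -> float:
    """Return the offset (seconds from the start) of the closest reference cue."""

    def squared_distance(ref_cue: List[int]) -> int:
        return sum((a - b) ** 2 for a, b in zip(cue, ref_cue))

    offset, _ = min(segment_cues, key=lambda item: squared_distance(item[1]))
    return offset


# Reference cues for one identified segment, keyed by offset in seconds.
segment_cues = [
    (0.0, [10, 10]),
    (1.0, [50, 60]),
    (2.0, [200, 220]),
]
offset = estimate_offset([48, 63], segment_cues)
```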
In various embodiments, a system for video matching may be included in a television system. In various embodiments, the television system includes a connected (Internet-connected) television. In various embodiments, the video matching system may be included partially in the connected television and partially on a server that communicates with the connected television over the Internet.
图1示出可以识别未知内容的匹配系统100。在一些示例中,未知内容可以包括一个或多个未知数据点。在这种示例中,匹配系统100可以将未知数据点与参考数据点匹配以识别与未知数据点相关联的未知视频段。参考数据点可以被包括在参考数据库116中。FIG1 illustrates a matching system 100 that can identify unknown content. In some examples, the unknown content may include one or more unknown data points. In such examples, the matching system 100 can match the unknown data points with reference data points to identify unknown video segments associated with the unknown data points. The reference data points may be included in a reference database 116.
Matching system 100 includes a client device 102 and a matching server 104. Client device 102 includes a media client 106, an input device 108, an output device 110, and one or more contextual applications 126. Media client 106 (which may include a television system, a computer system, or another electronic device capable of connecting to the Internet) can decode data associated with a video program 128 (e.g., a broadcast signal, data packets, or other frame data). Media client 106 can place the decoded content of each video frame into a video frame buffer in preparation for display or for further processing of the frame's pixel information. Client device 102 can be any electronic decoding system that can receive and decode a video signal. Client device 102 can receive video program 128 and store the video information in a video buffer (not shown). Client device 102 can process the video buffer information and generate unknown data points (which may be referred to as "cues"), as described in more detail below with reference to FIG. 3. Media client 106 can send the unknown data points to matching server 104 for comparison with reference data points in reference database 116.
输入设备108可以包括允许请求或其它信息被输入到媒体客户端106的任何合适的设备。例如,输入设备108可以包括键盘、鼠标、语音识别输入设备、用于从无线设备(例如,从遥控器、移动设备或其它合适的无线设备)接收无线输入的无线接口,或者任何其它合适的输入设备。输出设备110可以包括可呈现或以其它方式输出信息的任何合适的设备,诸如显示器、用于向无线设备(例如,向移动设备或其它合适的无线设备)发送无线输出的无线接口、打印机或其它合适的输出设备。The input device 108 may include any suitable device that allows requests or other information to be input into the media client 106. For example, the input device 108 may include a keyboard, a mouse, a voice recognition input device, a wireless interface for receiving wireless input from a wireless device (e.g., from a remote control, a mobile device, or other suitable wireless device), or any other suitable input device. The output device 110 may include any suitable device that can present or otherwise output information, such as a display, a wireless interface for sending wireless output to a wireless device (e.g., to a mobile device or other suitable wireless device), a printer, or other suitable output device.
The matching system 100 may begin the process of identifying video segments by first collecting data samples from known video data sources 118. For example, the matching server 104 collects data from various video data sources 118 to build and maintain the reference database 116. The video data sources 118 may include media providers of television programs, movies, or any other suitable video source. Video data from the video data sources 118 may be provided as over-the-air broadcasts, as cable TV channels, as streaming sources from the Internet, and from any other video data source. In some examples, as described below, the matching server 104 may process video received from the video data sources 118 to generate and collect reference video data points in the reference database 116. In some examples, video programs from the video data sources 118 may be processed by a reference video program excerpt system (not shown), which may generate reference video data points and send them to the reference database 116 for storage. The reference data points may be used as described above to determine information that is subsequently used to analyze unknown data points.
Matching server 104 may store, in reference database 116, reference video data points for each video program received within a time period (e.g., a number of days, weeks, or months, or any other suitable time period). Matching server 104 may build and continuously or periodically update reference database 116 of television program samples (e.g., including reference data points, which may also be referred to as cues or cue values). In some examples, the collected data is a compressed representation of video information sampled from periodic video frames (e.g., every fifth video frame, every tenth video frame, every fifteenth video frame, or another suitable number of frames). In some examples, a number of bytes of data per frame is collected for each program source (e.g., 25 bytes per frame, 50 bytes per frame, 75 bytes per frame, 100 bytes per frame, or any other number of bytes per frame). Any number of program sources may be used to obtain video, such as 25 channels, 50 channels, 75 channels, 100 channels, 200 channels, or any other number of program sources. With these example amounts of data, the total data collected 24 hours a day over three days becomes very large. Reducing the number of actual reference data point sets therefore helps reduce the storage load on matching server 104.
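The scale of the reference data can be illustrated with a rough estimate. The following Python sketch multiplies the example figures given above; the frame rate of 30 frames per second and the function name `reference_bytes` are assumptions made for illustration only and are not specified in this description.

```python
def reference_bytes(channels, bytes_per_sample, frame_rate,
                    sample_every_n_frames, days):
    """Estimate raw reference-data volume in bytes.

    frame_rate is an assumed value; the description does not state one.
    """
    seconds = days * 24 * 60 * 60
    samples_per_second = frame_rate / sample_every_n_frames
    return int(channels * bytes_per_sample * samples_per_second * seconds)


# 200 channels, 100 bytes per sampled frame, every fifth frame, three days,
# at an assumed 30 frames per second.
total = reference_bytes(200, 100, 30, 5, 3)
print(f"{total / 1e9:.1f} GB")
```

Under these assumed inputs the estimate is on the order of tens of gigabytes before any indexing overhead, which is consistent with the motivation to reduce the number of stored reference data point sets.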
媒体客户端106可以将通信122发送到匹配服务器104的匹配引擎112。通信122可以包括请求匹配引擎112识别未知内容。例如,未知内容可以包括一个或多个未知数据点,并且参考数据库116可以包括多个参考数据点。匹配引擎112可以通过将未知数据点与参考数据库116中的参考数据进行匹配来识别未知内容。在一些示例中,未知内容可以包括由显示器(对于基于视频的ACR)呈现的未知视频数据、搜索查询(对于Map Reduce系统,Bigtable系统或其它数据存储系统)、未知的面部图像(用于面部识别)、未知的图案图像(用于图案识别)或可以比对参考数据的数据库进行匹配的任何其它未知数据。参考数据点可以从从视频数据源118接收的数据中导出。例如,数据点可以从视频数据源118提供的信息中提取,并且可以被索引并存储在参考数据库116中。The media client 106 may send a communication 122 to the matching engine 112 of the matching server 104. The communication 122 may include a request for the matching engine 112 to identify unknown content. For example, the unknown content may include one or more unknown data points, and the reference database 116 may include multiple reference data points. The matching engine 112 may identify the unknown content by matching the unknown data points with reference data in the reference database 116. In some examples, the unknown content may include unknown video data presented by a display (for video-based ACR), a search query (for a Map Reduce system, a Bigtable system, or other data storage system), an unknown facial image (for facial recognition), an unknown pattern image (for pattern recognition), or any other unknown data that can be matched against a database of reference data. The reference data points may be derived from data received from the video data source 118. For example, the data points may be extracted from information provided by the video data source 118 and may be indexed and stored in the reference database 116.
Matching engine 112 can send a request to candidate determination engine 114 to determine candidate data points from reference database 116. A candidate data point can be a reference data point that lies within a determined distance of the unknown data point. In some examples, the distance between a reference data point and an unknown data point can be determined by comparing one or more pixels of the reference data point (e.g., a single pixel, a value representing a group of pixels (e.g., a mean, an average, a median, or another value), or another suitable number of pixels) with one or more pixels of the unknown data point. In some examples, a reference data point can be within the determined distance of an unknown data point when the pixels at each sample location are within a particular pixel value range.
In one illustrative example, the pixel value of a pixel may include a red value, a green value, and a blue value (in a red-green-blue (RGB) color space). In such an example, a first pixel (or a value representing a first group of pixels) may be compared with a second pixel (or a value representing a second group of pixels, where the second group is located at the same display buffer position as the first group) by comparing the corresponding red, green, and blue values separately and ensuring that each pair of values is within a certain range of one another (e.g., within 0-5 values). For example, the first pixel may match the second pixel when (1) the red value of the first pixel is within 5 values (plus or minus), on a 0-255 scale, of the red value of the second pixel, (2) the green value of the first pixel is within 5 values (plus or minus), on a 0-255 scale, of the green value of the second pixel, and (3) the blue value of the first pixel is within 5 values (plus or minus), on a 0-255 scale, of the blue value of the second pixel. In such an example, a candidate data point is a reference data point that approximately matches the unknown data point, which leads to multiple candidate data points (related to different media segments) being identified for the unknown data point. The candidate determination engine 114 can return the candidate data points to the matching engine 112.
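The per-channel comparison described above can be sketched in Python as follows. The function names `pixels_match` and `data_points_match` and the representation of a pixel as an `(R, G, B)` tuple are assumptions made for illustration; the tolerance of 5 is the example value given above.

```python
def pixels_match(first, second, tolerance=5):
    """Return True when each of R, G, and B differs by at most `tolerance`."""
    return all(abs(a - b) <= tolerance for a, b in zip(first, second))


def data_points_match(reference, unknown, tolerance=5):
    """Compare two data points location by location.

    Each data point is assumed to be a sequence of (R, G, B) tuples, one per
    sample location, in the same order for both points.
    """
    return len(reference) == len(unknown) and all(
        pixels_match(r, u, tolerance) for r, u in zip(reference, unknown)
    )


print(pixels_match((100, 150, 200), (104, 146, 205)))  # every channel within 5
print(pixels_match((100, 150, 200), (106, 150, 200)))  # red differs by 6
```

Because the comparison is approximate by design, several reference data points from different media segments may satisfy it for the same unknown data point.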
For a candidate data point, the matching engine 112 may add a token to a bin that is associated with the candidate data point and assigned to the identified video segment from which the candidate data point was derived. A corresponding token may be added to every bin that corresponds to an identified candidate data point. As the matching server 104 receives more unknown data points (corresponding to the unknown content being viewed) from the client device 102, a similar candidate data point determination process may be performed, and tokens may be added to the bins corresponding to the identified candidate data points. Only one of these bins corresponds to the unknown video content segment being viewed; the other bins correspond to candidate data points that matched because of similar data point values (e.g., similar pixel color values) but that do not correspond to the actual video content segment being viewed. The bin for the candidate video content segment that corresponds to the unknown video segment being viewed will have more tokens assigned to it than the bins for video content segments that do not correspond to the unknown video segment. For example, as more unknown data points are received, a greater number of reference data points corresponding to that bin are identified as candidate data points, resulting in more tokens being added to that bin. Once a bin includes a certain number of tokens, that is, once the bin reaches a predetermined threshold, the matching engine 112 can determine that the video segment associated with the bin is currently being displayed on the client device 102. A video segment can include an entire video program or a portion of a video program. For example, a video segment can be a video program, a scene of a video program, one or more frames of a video program, or any other portion of a video program.
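The token and bin mechanism described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each batch is the list of segment identifiers of the candidate data points found for one unknown data point, and the function name `identify_segment` and the threshold value are hypothetical.

```python
from collections import Counter


def identify_segment(candidate_batches, threshold):
    """Add one token per candidate data point to the bin of its segment.

    candidate_batches: iterable of lists of segment identifiers, one list per
    received unknown data point (assumed input format).
    Returns the first segment whose bin reaches `threshold`, or None.
    """
    bins = Counter()
    for candidates in candidate_batches:
        for segment_id in candidates:
            bins[segment_id] += 1
            if bins[segment_id] >= threshold:
                return segment_id
    return None


batches = [["A", "B"], ["A", "C"], ["A"]]
print(identify_segment(batches, threshold=3))
```

In this example the bins for segments "B" and "C" each receive a single token from coincidental matches, while the bin for segment "A" receives a token for every unknown data point and is the only one to reach the threshold.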
FIG. 2 illustrates components of a matching system 200 for identifying unknown data. For example, a matching engine 212 can perform a matching process for identifying unknown content (e.g., unknown media segments, search queries, images of faces or patterns, etc.) using a database of known content (e.g., known media segments, information stored in a database to be searched against, known faces or patterns, etc.). For example, the matching engine 212 receives unknown data content 202 (which can be referred to as a "cue") to be matched with one of the reference data points 204 in a reference database. The unknown data content 202 can also be received by the candidate determination engine 214, or sent to the candidate determination engine 214 from the matching engine 212. The candidate determination engine 214 can perform a search process to identify candidate data points 206 by searching the reference data points 204 in the reference database. In one example, the search process can include a nearest neighbor search process that produces a set of neighboring values (values within a certain distance of the unknown values of the unknown data content 202). The nearest neighbor search process is discussed in more detail below. The candidate data points 206 are input to the matching engine 212, which performs the matching process to generate a matching result 208. Depending on the application, the matching result 208 can include video data being presented by a display, a search result, a face determined using facial recognition, a pattern determined using pattern recognition, or any other result.
When determining candidate data points 206 for an unknown data point (e.g., unknown data content 202), the candidate determination engine 214 determines the distance between the unknown data point and the reference data points 204 in the reference database. Reference data points that are within a certain distance of the unknown data point are identified as candidate data points 206. In some examples, the distance between a reference data point and an unknown data point can be determined by comparing one or more pixels of the reference data point with one or more pixels of the unknown data point, as described above with respect to FIG. 1. In some examples, a reference data point can be within the certain distance of an unknown data point when the pixels at each sample location are within a particular value range. As described above, a candidate data point is a reference data point that approximately matches the unknown data point, and because the match is approximate, multiple candidate data points (related to different media segments) are identified for the unknown data point. The candidate determination engine 114 can return the candidate data points to the matching engine 112.
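Candidate determination can be illustrated with a simple linear scan over the reference data points. This sketch is not the nearest neighbor search process referred to above; it substitutes a brute-force comparison for clarity. The `ReferencePoint` type, the function name `find_candidates`, and the use of the largest per-value difference as the distance are assumptions made for illustration.

```python
from typing import List, NamedTuple, Sequence, Tuple


class ReferencePoint(NamedTuple):
    segment_id: str            # identifier of the known media segment
    time_offset: float         # position of the sample within the segment
    values: Tuple[int, ...]    # averaged pixel values for the sample


def find_candidates(unknown: Sequence[int],
                    references: Sequence[ReferencePoint],
                    max_distance: int = 5) -> List[ReferencePoint]:
    """Return every reference point whose values all lie within max_distance."""
    return [
        ref for ref in references
        if max(abs(a - b) for a, b in zip(unknown, ref.values)) <= max_distance
    ]


references = [
    ReferencePoint("news", 0.0, (10, 20, 30)),
    ReferencePoint("movie", 4.0, (12, 18, 33)),
    ReferencePoint("sports", 9.0, (90, 20, 30)),
]
candidates = find_candidates((11, 19, 31), references)
print([c.segment_id for c in candidates])
```

Here two reference data points from different segments qualify as candidates for the same unknown data point, which is why a subsequent matching step is needed to decide between them.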
FIG. 3 shows an example of a video excerpt capture system 400 including a memory buffer 302 of a decoder. The decoder can be part of the matching server 104 or the media client 106. The decoder need not operate with, or require, a physical television display panel or device. The decoder can decode and, when required, decrypt a digital video program into an uncompressed bitmap representation of a television program. To build a reference database of reference video data (e.g., reference database 316), the matching server 104 can acquire one or more arrays of video pixels read from the video frame buffer. An array of video pixels is referred to as a video block. A video block can be any shape or pattern but, for the purposes of this specific example, is described as a 10×10 pixel array, ten pixels horizontally by ten pixels vertically. Also for the purposes of this example, it is assumed that 25 pixel block positions are extracted from within the video frame buffer, evenly distributed within the boundaries of the buffer.
像素块(例如,像素块304)的示例分配在图3中示出。如上所述,像素块可以包括像素阵列,诸如10×10阵列。例如,像素块304包括10×10像素阵列。像素可以包括颜色值,诸如红色、绿色和蓝色值。例如,示出了具有红-绿-蓝(RGB)颜色值的像素306。像素的颜色值可以由每种颜色的八位二进制值表示。可以用于表示像素的颜色的其它合适的颜色值包括亮度和色度(Y,Cb,Cr,也称为YUV)值或任何其它合适的颜色值。An example allocation of pixel blocks (e.g., pixel block 304) is shown in FIG3. As described above, a pixel block can include a pixel array, such as a 10×10 array. For example, pixel block 304 includes a 10×10 pixel array. Pixels can include color values, such as red, green, and blue values. For example, pixel 306 having a red-green-blue (RGB) color value is shown. The color value of a pixel can be represented by an eight-bit binary value for each color. Other suitable color values that can be used to represent the color of a pixel include brightness and chrominance (Y, Cb, Cr, also known as YUV) values or any other suitable color values.
A mean value (or, in some cases, an average value) is taken for each pixel block, and a resulting data record is created and tagged with a time code (or timestamp). For example, a mean value is found for each 10×10 pixel block array, in which case each of the 25 display buffer locations produces 24 bits of data, for a total of 600 bits of pixel information per frame. In one example, the mean of pixel block 304 is calculated and is shown by pixel block average 308. In one illustrative example, the time code may include "epoch time," which represents the total elapsed time (in fractions of a second) since midnight on January 1, 1970. For example, the value of pixel block average 308 is combined with time code 412. Epoch time is an accepted convention in computing systems, including, for example, Unix-based systems. Information about the video program, referred to as metadata, is appended to the data record. The metadata may include any information about the program, such as a program identifier, a program time, a program length, or any other information. The data record including the mean value of the pixel block, the time code, and the metadata forms a "data point" (also referred to as a "cue"). Data point 310 is one example of a reference video data point.
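The construction of a data point from a decoded frame can be sketched in Python as follows. This is an illustration under stated assumptions: the frame is a list of rows of `(R, G, B)` tuples at least 60 pixels in each dimension, the 25 block positions are placed on a 5×5 grid (one possible way of distributing them evenly), and the names `block_mean` and `make_data_point` are hypothetical.

```python
import time

BLOCK = 10   # block is BLOCK x BLOCK pixels, as in the example above
GRID = 5     # 5 x 5 = 25 block positions (assumed layout)


def block_mean(frame, top, left, size=BLOCK):
    """Return the integer mean (R, G, B) of a size x size pixel block."""
    sums = [0, 0, 0]
    for row in frame[top:top + size]:
        for pixel in row[left:left + size]:
            for channel in range(3):
                sums[channel] += pixel[channel]
    count = size * size
    return tuple(total // count for total in sums)


def make_data_point(frame, metadata, timestamp=None):
    """Build a data record: 25 block means, an epoch time code, and metadata."""
    height, width = len(frame), len(frame[0])
    means = []
    for gy in range(1, GRID + 1):
        for gx in range(1, GRID + 1):
            top = gy * height // (GRID + 1) - BLOCK // 2
            left = gx * width // (GRID + 1) - BLOCK // 2
            means.append(block_mean(frame, top, left))
    return {
        "means": means,
        "time": time.time() if timestamp is None else timestamp,
        "metadata": metadata,
    }


frame = [[(10, 20, 30)] * 160 for _ in range(120)]   # uniform test frame
point = make_data_point(frame, {"program_id": "demo"}, timestamp=0.0)
bits_per_frame = len(point["means"]) * 3 * 8         # 25 locations x 24 bits
print(bits_per_frame)
```

With 25 locations and three 8-bit color values per location, the record carries 25 × 24 = 600 bits of pixel information per frame, matching the figure given above.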
识别未知视频段的过程以类似于创建参考数据库的步骤开始。例如,图4示出包括解码器的存储器缓冲器402的视频摘录捕捉系统400。视频摘录捕捉系统400可以是处理由显示器(例如,在互联网连接的电视监控器(诸如智能电视)、移动设备或其它电视观看设备上)呈现的数据的客户端设备102的一部分。视频摘录捕捉系统400可以利用类似的过程来生成未知视频数据点410,如创建参考视频数据点310的系统300所使用的未知视频数据点410。在一个示例中,媒体客户端106可以将未知视频数据点410发送到匹配引擎112以由匹配服务器104识别与未知视频数据点410相关联的视频段。The process of identifying unknown video segments begins with steps similar to those used to create a reference database. For example, FIG4 illustrates a video excerpt capture system 400 including a memory buffer 402 of a decoder. The video excerpt capture system 400 can be part of a client device 102 that processes data presented by a display (e.g., on an Internet-connected television monitor (such as a smart TV), a mobile device, or other television viewing device). The video excerpt capture system 400 can utilize a similar process to generate unknown video data points 410 as used by the system 300 for creating reference video data points 310. In one example, the media client 106 can send the unknown video data point 410 to the matching engine 112 for the matching server 104 to identify video segments associated with the unknown video data point 410.
如图4中所示,视频块404可以包括10×10像素阵列。视频块404可以从显示器呈现的视频帧中提取。可以从视频帧中提取多个这种像素块。在一个说明性示例中,如果从视频帧中提取25个这种像素块,则结果将是表示75维空间中的位置的点。可以为阵列的每个颜色值(例如,RGB颜色值、Y、Cr、Cb颜色值等)计算平均数(或平均值)。数据记录(例如,未知视频数据点410)由平均像素值形成,并且当前时间被附加到该数据。可以使用上述技术将一个或多个未知视频数据点发送到匹配服务器104以与来自参考数据库116的数据匹配。As shown in Figure 4, video block 404 may comprise a 10×10 pixel array. Video block 404 may be extracted from a video frame presented on a display. A plurality of such pixel blocks may be extracted from a video frame. In an illustrative example, if 25 such pixel blocks are extracted from a video frame, the result will be a point representing a position in a 75-dimensional space. An average (or mean) may be calculated for each color value of the array (e.g., RGB color values, Y, Cr, Cb color values, etc.). A data record (e.g., unknown video data point 410) is formed by the average pixel value, and the current time is appended to the data. The above-described techniques may be used to send one or more unknown video data points to a matching server 104 for matching with data from a reference database 116.
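The 75-dimensional point mentioned above follows from flattening 25 block averages of three color values each. A minimal Python sketch is given below; the function name `to_vector` and the input format (a list of 25 three-value tuples) are assumptions made for illustration.

```python
from itertools import chain


def to_vector(block_means):
    """Flatten per-block (R, G, B) averages into a single coordinate tuple."""
    return tuple(chain.from_iterable(block_means))


block_means = [(index, index + 1, index + 2) for index in range(25)]
vector = to_vector(block_means)
print(len(vector))  # 25 blocks x 3 color values
```

Treating each data point as a coordinate vector of this kind allows distances between unknown and reference data points to be computed directly.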
III. Common Video Segments
FIG. 5 shows an example of a video matching system 500 for an interactive television (TV) system 501. The interactive television system 501 is given as an example of a media display device. In this example, the interactive television system 501 includes a television client 503 and a contextual targeting client 502. The television client 503 can receive television programs 505. The television programs 505 can include audio-video content, which can include audio, video, and/or video data with synchronized audio data. In some cases, the television programs 505 can include metadata. The television client 503 can be configured to display the audio-video content, including displaying video content on a screen and/or playing audio content through speakers. The television programs 505 can be received from various sources, including broadcast television providers, satellite television providers, Internet media providers, audio-video playback devices (e.g., DVD players, DVR players, VCR players, etc.), and the like.
在各种实施方式中,视频匹配系统可以包括匹配服务器509。匹配服务器509可以识别由交互式电视系统501显示或播放的媒体。为了提供视频匹配服务,匹配服务器509可以接收来自摘录服务器520的已知媒体线索数据517。摘录服务器520可以从诸如视频点播(VoD)内容馈送515a、地方频道馈送515b和国家频道馈送515c的各种已知源获得数据。这些已知源中的每一个可以提供媒体数据(例如,视频和/或音频)以及识别媒体数据的信息,诸如节目指南或元数据。在各种实施方式中,摘录服务器520可以从从各种源接收的媒体数据中生成已知的媒体线索数据517。摘录服务器520可以向包括匹配服务器509的各种接收者提供已知媒体线索数据517。In various embodiments, the video matching system may include a matching server 509. Matching server 509 may identify media displayed or played by interactive television system 501. To provide video matching services, matching server 509 may receive known media cue data 517 from an excerpt server 520. Excerpt server 520 may obtain data from various known sources, such as video on demand (VoD) content feeds 515a, local channel feeds 515b, and national channel feeds 515c. Each of these known sources may provide media data (e.g., video and/or audio) and information identifying the media data, such as a program guide or metadata. In various embodiments, excerpt server 520 may generate known media cue data 517 from the media data received from the various sources. Excerpt server 520 may provide known media cue data 517 to various recipients, including matching server 509.
在各种实施方式中,摘录服务器520还可提供节目标识和时间数据514。在各种实施方式中,节目标识和时间数据514与已知媒体线索数据517同步,这意味着节目标识和时间数据514识别已知媒体线索517和/或提供与已知媒体提示相关联的媒体期望被显示的时间。节目标识和时间数据514也可以被称为元数据。In various embodiments, the excerpt server 520 may also provide program identification and time data 514. In various embodiments, the program identification and time data 514 is synchronized with the known media cue data 517, meaning that the program identification and time data 514 identifies the known media cue 517 and/or provides the time at which the media associated with the known media cue is expected to be displayed. The program identification and time data 514 may also be referred to as metadata.
在各种实施方式中,已知媒体线索数据517提供用于识别视频和/或音频数据的线索或密钥。已知媒体线索数据517可能已经从已知的音频-视频媒体获取,使得已知媒体线索517可以与已知的音频-视频媒体的名称和/或一些其它标识信息相关联。如下面进一步详细描述的,已知媒体线索517可以与从交互式电视系统501显示或播放的媒体中取得的类似线索对比以进行匹配。匹配服务器509可以将已知媒体线索数据517和节目标识数据514存储在数据库512中。In various embodiments, known media cue data 517 provides a clue or key for identifying video and/or audio data. Known media cue data 517 may have been obtained from known audio-visual media, such that known media cue 517 can be associated with the name and/or some other identifying information of the known audio-visual media. As described in further detail below, known media cue 517 can be compared to similar cues obtained from media displayed or played by interactive television system 501 for matching. Matching server 509 can store known media cue data 517 and program identification data 514 in database 512.
在各种实施方式中,匹配服务器509可以包括频道识别系统510。频道识别系统510可以从交互式电视系统501接收未知媒体线索507a。例如,电视客户端503可以从在任何给定的时间正显示或播放的音频-视频数据取得样本,并且可以从样本中生成线索。电视客户端503可以将这些线索作为未知媒体线索507a提供给匹配服务器509中的频道识别系统510。然后,频道识别系统510可以将未知媒体线索507a与已知媒体线索517进行匹配以识别由交互式电视系统501正显示或播放的媒体。In various embodiments, matching server 509 may include a channel identification system 510. Channel identification system 510 may receive unknown media cues 507a from interactive television system 501. For example, television client 503 may obtain samples of the audio-video data being displayed or played at any given time and may generate cues from the samples. Television client 503 may provide these cues as unknown media cues 507a to channel identification system 510 in matching server 509. Channel identification system 510 may then match unknown media cues 507a with known media cues 517 to identify the media being displayed or played by interactive television system 501.
在各种实施方式中,频道识别系统510确定未知媒体线索507a的节目标识。节目标识可以包括名称或描述,或者识别由交互式电视系统501显示的媒体内容的一些其它信息。频道识别系统510还可以提供时间,其中时间指示媒体由交互式电视系统501播放的时间。In various embodiments, channel identification system 510 determines a program identifier for unknown media cue 507a. The program identifier may include a name or description, or some other information that identifies the media content displayed by interactive television system 501. Channel identification system 510 may also provide a time, where the time indicates when the media was played by interactive television system 501.
In various embodiments, channel identification system 510 may provide program identification and time data 513 to a contextual targeting manager 511. Using the program identification and time data 513, the contextual targeting manager 511 may determine contextually relevant content 507b, including, for example, applications and advertisements. The contextual targeting manager 511 may provide the contextually relevant content 507b to interactive television system 501. For example, interactive television system 501 may include a contextual targeting engine 502 for managing the contextually relevant content 507b. In some embodiments, the contextual targeting manager 511 may also provide an event trigger 507c to the contextual targeting system 502. The event trigger 507c may instruct the contextual targeting engine 502 to play or display the contextually relevant content 507b. For example, the event trigger 507c may instruct the contextual targeting engine 502 to display a contextually relevant information overlay, where the information overlay is coordinated with the display of the video content to which the overlay relates. Alternatively or additionally, the event trigger 507c may cause substitute media, such as a targeted advertisement, to be displayed. In some embodiments, the contextual targeting engine 502 may provide an event confirmation 507d to the contextual targeting manager 511, the event confirmation 507d indicating that the instruction provided by the event trigger 507c has been executed.
In various embodiments, the contextual targeting client 502 may alternatively or additionally provide viewership information to the matching server 509. For example, the contextual targeting client 502 may provide viewership information in addition to or instead of the event confirmation 507d. In this embodiment, the viewership information may include, for example, information about how often a particular media segment is played, at what time of day or on what day of the week the media segment is played, what is played before and/or after the media segment, and/or on what channel the media segment is played. In some cases, the viewership information may also include information about the viewer, such as demographic information.
In various embodiments, a media display device may be configured with or connected to a video matching system. The video matching system may be able to identify the media being displayed or played by the media display device at any given moment. As discussed above, the video matching system may sample the video and/or audio of the media being played by the device, generate identifiers or "cues" from the samples, and then match the cues against a database. By identifying the media being displayed or played on the media display device, the video matching system may be able to provide contextually relevant content, including applications, advertisements, and/or substitute media content.
When multiple content streams or channels available to a media display device play the same content (such as "breaking news"), that content may not be uniquely identifiable. For example, without additional information, it may not be clear which channel the media display device is displaying.
FIG. 6 illustrates an example in which multiple content streams carry the same media segment at the same time. FIG. 6 further illustrates examples of methods by which a video matching system can accommodate multiple channels carrying the same media segment. In the illustrated example, three channels are given as examples of multiple content streams. The three channels may be, for example, three broadcast television channels.
In this example, channel 2 601 plays two segments 602, 604 of regularly scheduled programming. Assuming the media display device is playing channel 2 601, during two time intervals t1 603 and t2 605 the media display device sends samples from segment 1 602 and segment 2 604 to the video matching system. By the end of time interval t1 603, the video matching system is able to identify segment 1 602, and by the end of time interval t2 605, the video matching system is able to identify segment 2 604.
During the third segment 606, channel 2 601 is interrupted by a common media segment, here a live pool feed segment 608. The live pool feed segment 608 of this example is a common video segment provided by, for example, a national broadcaster, and is available to each of the broadcaster's affiliated stations. An example of a live pool feed is "breaking news," that is, a national news story. Other examples of live pool feed segments include sporting events, syndicated programs, and commercials. During time interval t3 607, the media display device may send samples from the live pool feed segment 608 to the video matching system.
The video matching system may determine that the samples provided during time interval t3 607 belong to the live pool feed segment 608. In various embodiments, the video matching system may make this determination based on finding matching cues for the live pool feed segment that are associated with multiple channels. Having determined that channel 2 601 is displaying the live pool feed segment 608, in some embodiments the video matching system may treat the live pool feed segment 608 as a continuation of the programming most recently detected on channel 2 601. In these embodiments, the video matching system may make this determination based on the low probability that the viewer changed channels at the exact moment the live pool feed segment 608 began. In some embodiments, the video matching system may further determine that the unexpected live pool feed segment 608 is unlikely to be related to any scheduled interactive or targeted content. In these embodiments, the scheduled interactive or targeted content may therefore be suppressed or disabled. The targeted content may be unrelated to the unscheduled live pool feed segment 608, so displaying, for example, an interactive overlay may not be useful to the viewer.
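The handling described for channel 2 can be sketched in a few lines. This is only an illustration under stated assumptions: the `ChannelTracker` class, its field names, and the idea of passing in the set of channels whose reference cues matched a sample are hypothetical and do not come from the description itself.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class ChannelTracker:
    """Hypothetical per-device state: last identified channel and overlay gating."""
    last_channel: Optional[str] = None
    overlays_enabled: bool = True

    def update(self, matched_channels: Set[str]) -> Optional[str]:
        """Update state from the channels whose reference cues matched a sample."""
        if len(matched_channels) == 1:
            # Unambiguous match: ordinary programming on a single channel.
            self.last_channel = next(iter(matched_channels))
            self.overlays_enabled = True
        elif len(matched_channels) > 1:
            # The same cues appear on several channels: treat the content as a
            # common segment and suppress scheduled interactive content.
            self.overlays_enabled = False
            # Assume the viewer stayed on the last known channel, provided that
            # channel is among those carrying the common segment.
            if self.last_channel not in matched_channels:
                self.last_channel = None
        # An empty set (no match at all) leaves the state unchanged.
        return self.last_channel
```

A device that was already identified on channel 2 keeps that attribution during the common segment, while a device that tunes in mid-segment stays unresolved until distinct content appears.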
When the live pool feed segment 608 ends, channel 2 601 may display segment 4 609. In some cases, segment 4 609 may be scheduled programming, meaning that the live pool feed segment 608 interrupted segment 3 606, or came in at the end of segment 3 606, and played in place of whatever had been scheduled to follow segment 3 606. For example, segment 4 609 may be a continuation of the program of segment 3 606, picking up at the point the program would have reached had the live pool feed segment 608 not played. As another example, segment 4 609 may be a new program that was scheduled to begin after segment 3 606. Segment 4 609 may start at the beginning of the new program, or some part of the beginning may have been preempted by the live pool feed.
In some cases, rather than preempting the programming that would have been displayed between segment 3 606 and segment 4 609, the program of segment 3 606 may instead be paused. Once the live pool feed segment 608 ends, the program of segment 3 606 may resume in segment 4 609, picking up where the program stopped in segment 3. Alternatively, the program of segment 3 606 may be restarted in segment 4 609. In some embodiments, when the live pool feed segment 608 ends, the viewer may be given the choice of resuming the program of segment 3 606 or playing the program again from the beginning.
Another example in FIG. 6 is provided by channel 7 610. On channel 7 610, segment 1 611 and segment 2 612 are regularly scheduled. Thereafter, channel 7 610 is interrupted by a live pool feed segment 613. In this example, the media display device may tune to channel 7 610 shortly after the live pool feed segment 613 begins, or while the live pool feed segment 613 is airing. During two time intervals t1 614 and t2 615, the media display device may send samples from the live pool feed segment 613 to the video matching system. The video matching system may then determine that channel 7 610 is playing the live pool feed segment 613. In various embodiments, the video matching system may make this determination based, for example, on finding matching cues for the live pool feed segment 613 that are associated with multiple channels. In this example, the video matching system may be unable to determine which channel the media display device is currently playing. In some embodiments, the video matching system may refrain from providing contextually relevant content while the live pool feed segment is airing.
When the live pool feed segment 613 ends, channel 7 610 returns to scheduled programming with segment 3 616. During time interval t3 617, the media display device may send samples from segment 3 616 to the video matching system, at which point the video matching system may be able to determine that the media display device is tuned to channel 7 610. In some embodiments, the video matching system may associate the samples taken during time intervals t1 614 and t2 615 with channel 7 610. The video matching system may further provide the viewer with contextually relevant content related to channel 7 610.
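The retroactive association described here (samples from t1 and t2 are attributed to channel 7 only once segment 3 identifies the channel) might look like the following sketch. The function name `backfill_channel` and the representation of samples as `(interval_label, channel_or_None)` pairs are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Sample = Tuple[str, Optional[str]]  # (interval label, channel if known)


def backfill_channel(samples: List[Sample]) -> List[Sample]:
    """Assign each unresolved sample the channel of the next resolved sample."""
    resolved: List[Sample] = []
    next_channel: Optional[str] = None
    # Walk backwards so that a later identification can flow to earlier samples.
    for label, channel in reversed(samples):
        if channel is not None:
            next_channel = channel
        resolved.append((label, channel if channel is not None else next_channel))
    resolved.reverse()
    return resolved
```

For example, `backfill_channel([("t1", None), ("t2", None), ("t3", "ch7")])` attributes all three intervals to `"ch7"`; samples that are never followed by an identification remain unresolved.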
A further example is shown in FIG. 6 by channel 9 618. In this example, channel 9 618 displays regularly scheduled segment 1 619, segment 2 620, segment 3 622, segment 4 624, and segment 5 626. Assuming the media display device is tuned to channel 9 618, during time intervals t1 621, t2 623, and t3 625 the media display device sends samples from segments 620, 622, and 624 to the video matching system. In some embodiments, the video matching system may determine that channel 9 618 does not include a live pool feed segment. The video matching system may further provide contextually relevant content, such as, for example, replacing an original commercial with a commercial of particular interest to the viewer, where the replacement commercial may be based on information previously collected from the media display device.
In some cases, a channel may begin displaying a common pool feed after the common pool feed segment has already started. FIG. 7 illustrates another example in which multiple media streams, shown here as television channels, display the same media content at the same time. In this example, channel 2 (ch 2) 701 displays regularly scheduled segment 1 702, segment 2 704, and segment 3 706 before switching to a live pool feed segment 708. If a media display device tunes to channel 2 701 at about the time the live pool feed segment 708 begins, the media display device may send samples to the video matching system during time intervals t1 703, t2 705, and t3 707, and during t4 717 in the subsequent segment 4 709. As discussed above, the video matching system may determine that the media display device is tuned to channel 2 701 upon receiving samples from segment 4 709.
Channel 7 710 of this example displays regularly scheduled segment 1 711 and segment 2 712. Although the live pool feed segment 713 begins at time 721, channel 7 710 is late in switching to the live pool feed segment 713. The delay may occur, for example, because segment 2 712 ran over time, because the programmers of channel 7 710 decided to allow segment 2 712 to finish, and/or because segment 2 712 included an introduction to the live pool feed segment 713. Assuming the media display device is tuned to channel 7 710 around time 721, the media display device may send samples to the video matching system during time intervals t1 714 and t2 715. The video matching system may determine that channel 7 710 is playing the live pool feed segment 713, but may be unable to determine which channel the media display device is playing.
In another example scenario, the media display device may initially be tuned to channel 7 710 and then switch to channel 2 701 at time 721. At time 721, channel 2 701 may already be playing the live pool feed segment 708. Because the live pool feed segment 708 is associated with multiple channels, during time intervals t2 705 and t3 707 the video matching system may be unable to determine which channel the media display device has changed to. Once segment 4 709 is displayed, the video matching system may determine that the media display device is tuned to channel 2 701. After making this determination, the video matching system may associate the samples from t2 705 and t3 707 with channel 2 701.
FIG. 8 illustrates an example in which multiple media content streams carry the same common media segment 810 at nearly the same time. In this example, two media content streams 804a, 804b are delivered to two different media display devices 806a, 806b. In other examples, many different media content streams, each carrying the same media content, may be delivered to many different media display devices at the same time. In this example, the two media display devices 806a, 806b may, for example, be in the same household and be used by two different people 808a, 808b in that household. For example, one household member 808a may be watching a television program on a television 806a in the living room while another household member 808b is watching a television program on a laptop computer in the study. Alternatively, the two media display devices 806a, 806b and the people 808a, 808b using them may be unrelated and located in different places.
At nearly the same time, the two people 808a, 808b may tune their display devices 806a, 806b to the same media segment 810. For example, the two people 808a, 808b may each decide to watch the same movie. As a result, the two media display devices 806a, 806b may be displaying exactly the same media segment 810 at a given moment. Alternatively, at a given moment there may be a time difference of several seconds or several minutes between the content displayed by each device 806a, 806b. For example, the television 806a may be a few seconds ahead of the laptop computer 806b in the movie.
In this example, the media content streams 804a, 804b are digital audio-video streams and/or audio streams delivered over the Internet 850. For example, the media content streams may include movies, television programs, music, text, and/or images provided by a website. The media content streams 804a, 804b may each be provided by a different content provider, provider A 802a and provider B 802b. Providers A 802a and B 802b may be, for example, Internet movie, music, and/or television providers. Alternatively, in some cases, provider A 802a and provider B 802b may be the same content provider.
In the example of FIG. 8, both media display devices 806a, 806b may be connected to a computing device (not shown) configured with a video matching system. While the media content is playing, the video matching system may attempt to identify the media content being played by each of the devices 806a, 806b using cues extracted from that media content. The video matching system may match these cues against a database and determine information from the database, such as the title of the media content being played, the identity of the creator, producer, and/or distributor of the media content, the identity of the person watching the media content, and/or the identity of the viewer's device. For example, in this example the video matching system may be able to obtain the identity of provider A 802a by examining the media content stream 804a. As another example, the video matching system may be able to distinguish the television 806a from a smartphone owned by the same person 808a. Additionally, the video matching system may be able to determine that the television 806a and the smartphone of person 808a are located in different places. The video matching system may further use this information to provide contextually relevant content, such as interactive information and/or advertisements, to the media display device, which may display the contextually relevant content to the viewer.
As discussed above, when the two example media content streams 804a, 804b display the same media segment 810 at nearly the same time, the video matching system may be unable to determine some information. For example, although the video matching system can recognize that the television 806a and the laptop computer 806b are playing the common media segment 810, that information alone may not be enough for the video matching system to determine contextually relevant content specific to each device. If, for example, the video matching system is provided with information such as the characteristics or identities of the people 808a, 808b watching the two devices, the video matching system may be able to tailor contextually relevant content for the first person 808a while providing different contextually relevant content for the second person 808b.
To determine contextually relevant content, the video matching system may use the methods discussed above with respect to FIG. 6 and FIG. 7. In various embodiments, the video matching system may determine identification information for other media content included in the media content streams 804a, 804b of FIG. 8. For example, in one case, the person 808a watching the television 806a may have been watching sports news before tuning to the common media segment 810. The video matching system may use the identification information for this earlier media content to identify the media content stream 804a. For example, the video matching system may be able to identify the media content stream 804a as being associated with the television 806a and/or with the person 808a watching the television 806a. The video matching system may further be able to provide contextually relevant content to the television 806a. For example, the video matching system may provide news about the favorite team of person 808a and/or advertisements for sporting events or sports equipment.
As another example, the person 808b watching the laptop computer 806b may not have been using the laptop computer 806b to view media content before tuning to the common media segment 810. Instead, this person 808b may view other media content in the middle of or after the common media segment 810. For example, during a commercial break in the common media segment 810, the person 808b may use the laptop computer 806b to shop for school supplies. The video matching system may use this other media content, displayed in the middle of or after the common media segment 810, to identify the media content stream 804b. For example, the video matching system may identify the media content stream as being associated with the laptop computer 806b and/or with the person 808b using the laptop computer 806b. The video matching system may further provide contextually relevant content to the laptop computer 806b. For example, the video matching system may provide advertisements related to school supplies, suggestions about where to shop, and/or suggestions about where to find sales.
In various embodiments, the video matching system may use other methods to identify a media content stream while the media content stream is playing a common media segment. In some cases, the provider of the media content stream may supply a graphic element that is superimposed on the common media segment. FIG. 9 illustrates one example of a graphic 902 that has been superimposed on a common media segment being played by a media display device 900. In various embodiments, a content provider may use the graphic 902 to identify itself, particularly when multiple content providers are playing the common media segment. For example, multiple local television channels may be playing the same national news segment at the same time. A local broadcaster may add the graphic 902 to the news segment to identify itself. The graphic 902 may be, for example, a logo, program information, or other information.
FIG. 9 illustrates one example of the size and position of a graphic that some content providers may use. In other examples, the graphic 902 may take the form of a banner or ticker across the bottom or top of the screen, may be a bar on the left or right side of the screen, or may appear superimposed over the center of the screen.
In various embodiments, a method for detecting graphic overlays may examine the video display and look for video image edges. Video image edges may be detected by looking for high-contrast differences between portions of the screen of the media display device. The method may further include monitoring whether the detected edges remain stationary. When a detected edge stays at a particular position for longer than a brief period, the video matching system may determine that it has found an on-screen graphic. For example, the video matching system may look for high-contrast differences in the bottom region of the screen, which may indicate the presence of an on-screen banner.
In various embodiments, the video matching system described above may include a method for detecting graphic overlays. For example, as discussed above, pixel blocks may be defined for the screen of a media display device. A "pixel block" may be defined as a block of pixels sampled from the screen of the media display device. A pixel block may contain some number of pixels, each of which may have, for example, an RGB color value (or a color value expressed in YUV or some other format). For purposes of graphic overlay detection, a pixel block may be, for example, 32 pixels wide by 32 pixels high, or a multiple of 32 pixels, such as 64 pixels wide by 64 pixels high. These example sizes are well suited to the discrete cosine transform (DCT). The video matching system may perform a discrete cosine transform function. The edges of a graphic overlay may be detected by examining the coefficients in the lower-right quadrant of the discrete cosine transform of each pixel block.
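As a rough illustration of examining the lower-right quadrant of a block's DCT, the sketch below computes an unnormalized two-dimensional DCT-II in plain Python and sums the squared coefficients in that quadrant. The function names, the use of squared magnitude as the measure, and the monochrome input are assumptions for the example. Note that a perfectly axis-aligned edge spanning the whole block contributes only to the first row or column of coefficients, so the quadrant responds most clearly to corners and fine detail.

```python
import math
from typing import List

Block = List[List[float]]  # square block of monochrome pixel values


def dct_2d(block: Block) -> Block:
    """Unnormalized 2-D DCT-II of a square block (rows first, then columns)."""
    n = len(block)
    # basis[k][i] = cos(pi * (2i + 1) * k / (2n))
    basis = [[math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
             for k in range(n)]
    # Transform each row: rows[r][v] holds horizontal frequency v for row r.
    rows = [[sum(row[i] * basis[v][i] for i in range(n)) for v in range(n)]
            for row in block]
    # Transform each column: out[u][v] holds vertical frequency u.
    return [[sum(rows[r][v] * basis[u][r] for r in range(n)) for v in range(n)]
            for u in range(n)]


def high_freq_energy(block: Block) -> float:
    """Sum of squared DCT coefficients in the lower-right (high, high) quadrant."""
    n = len(block)
    half = n // 2
    coeffs = dct_2d(block)
    return sum(coeffs[u][v] ** 2 for u in range(half, n) for v in range(half, n))
```

A flat block yields essentially zero energy in that quadrant, while a block containing the corner of a bright rectangle yields a large value; a threshold between the two would be a tuning parameter.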
In various embodiments, the detection process may also include checking whether the high-frequency information from the discrete cosine transform remains unchanged over a predetermined length of time. When the high-frequency information does not change, a graphic overlay is likely to be present. In these embodiments, some on-screen graphics, such as scrolling banners, may be identified.
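The persistence check can be sketched as a test over a sliding window of per-frame measurements. Here `values` is assumed to be a list of high-frequency measurements for one pixel block, one per sampled frame; `min_frames` and `tolerance` are hypothetical tuning parameters standing in for the "predetermined length of time" and for what counts as "unchanged".

```python
from typing import Sequence


def is_static(values: Sequence[float], min_frames: int, tolerance: float) -> bool:
    """True if the most recent `min_frames` values differ by at most `tolerance`."""
    if min_frames <= 0 or len(values) < min_frames:
        return False
    window = values[-min_frames:]
    return max(window) - min(window) <= tolerance
```

A block whose measurement stays within the tolerance for the whole window would be flagged as a likely overlay region; moving video content normally fails the test quickly.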
In various embodiments, other on-screen graphic detection methods may use algorithms such as Sobel and Scharr, or may use algorithms from the perceptual hashing family of image analysis. Like the discrete cosine transform, these algorithms can be used to detect the edges and corners of graphic elements within a video signal. In some cases, a pixel block with an odd number of pixels on each side (for example, 3 pixels by 3 pixels) may be used in a stepwise convolution sweep over the video area of interest to search for edges.
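A stepwise sweep with a 3 by 3 kernel can be illustrated with the standard Sobel operators. The sketch below is a plain-Python example on a monochrome image; the function names are illustrative, and border pixels are simply left at zero, which is a simplification.

```python
from typing import List

Image = List[List[float]]

# Standard Sobel kernels for horizontal (x) and vertical (y) gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def apply_kernel(img: Image, kernel: List[List[int]], r: int, c: int) -> float:
    """Weighted sum of the 3x3 neighbourhood centred on (r, c)."""
    return sum(kernel[dr + 1][dc + 1] * img[r + dr][c + dc]
               for dr in (-1, 0, 1) for dc in (-1, 0, 1))


def sobel_magnitude(img: Image) -> Image:
    """Gradient magnitude at each interior pixel; border pixels stay 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):          # step the kernel over the region
        for c in range(1, w - 1):
            gx = apply_kernel(img, SOBEL_X, r, c)
            gy = apply_kernel(img, SOBEL_Y, r, c)
            out[r][c] = (gx * gx + gy * gy) ** 0.5
    return out
```

Large magnitudes mark candidate edges; the Scharr operator follows the same pattern with different kernel weights.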
In various embodiments, detecting on-screen graphics may begin by reducing the pixel information in a pixel block from 8-bit red-green-blue (RGB) values to 8-bit monochrome values. Next, a Gaussian blur may be applied to reduce noise in the video information. Next, the pixel matrix (that is, the resulting pixel block) may be passed over the video area of interest. The matrix may then be used to compute the first-order derivative of the pixel values with respect to the vertical or horizontal axis of the video screen. The computed derivative is left in the corresponding pixel position. The derivative may then be examined for maxima, which indicate edges.
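The sequence of steps (monochrome reduction, blur, first derivative, search for maxima) can be sketched as follows. Several details are assumptions made for the example: the luma weights 0.299/0.587/0.114, a small [1, 2, 1]/4 binomial kernel standing in for the Gaussian blur and applied horizontally only, a central-difference derivative, and the helper names themselves.

```python
from typing import List, Tuple

RGBImage = List[List[Tuple[int, int, int]]]
Mono = List[List[float]]


def to_mono(rgb: RGBImage) -> Mono:
    """Reduce 8-bit RGB pixels to 8-bit monochrome using common luma weights."""
    return [[float((299 * r + 587 * g + 114 * b) // 1000) for (r, g, b) in row]
            for row in rgb]


def blur_rows(img: Mono) -> Mono:
    """Horizontal [1, 2, 1]/4 smoothing; edge pixels are replicated."""
    out = []
    for row in img:
        n = len(row)
        out.append([(row[max(i - 1, 0)] + 2 * row[i] + row[min(i + 1, n - 1)]) / 4
                    for i in range(n)])
    return out


def horizontal_derivative(img: Mono) -> Mono:
    """First-order difference along the horizontal axis, stored per pixel."""
    return [[row[min(i + 1, len(row) - 1)] - row[max(i - 1, 0)]
             for i in range(len(row))] for row in img]


def strongest_edge_column(rgb: RGBImage) -> int:
    """Column index where the summed absolute derivative is largest."""
    deriv = horizontal_derivative(blur_rows(to_mono(rgb)))
    width = len(deriv[0])
    strength = [sum(abs(row[c]) for row in deriv) for c in range(width)]
    return max(range(width), key=strength.__getitem__)
```

For a region whose left half is black and right half is white, the reported column falls at the boundary between the two halves.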
In various embodiments, another method for detecting graphic overlays is to train the video matching system with the various graphics that may be used for graphic overlays. An image matching algorithm may then be used to match the trained or learned graphics against the pixels on the screen. For example, the video matching system may use a perceptual hashing (pHash) method to perform the matching. Examples of other frame comparison methods include the scale-invariant feature transform (SIFT) and speeded-up robust features (SURF). In embodiments that use pHash, entire video frames can be processed quickly. The resulting hash values may be compared with reference video images. These reference video images may also be processed using pHash and may be supplied from a central server. One advantage of using pHash is that it can reliably match coarse features (for example, large rectangles or other shapes that may be used by graphic overlays) with relatively high insensitivity to changes in contrast, brightness, or color. Another advantage of pHash is that it is also able to match detailed individual video frames.
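To give a feel for hash-based frame comparison, the sketch below uses an average hash, which is a simpler relative of pHash and is swapped in here only to keep the example short; it is not the pHash algorithm itself. It assumes the frame has already been reduced to a small monochrome grid, and the function names are illustrative.

```python
from typing import List


def average_hash(gray: List[List[float]]) -> int:
    """One bit per pixel: 1 if the pixel is brighter than the grid's mean."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")
```

Because each bit depends only on whether a pixel is above the mean, uniformly brightening a frame leaves the hash unchanged, which mirrors the insensitivity to brightness changes noted in the text. A small Hamming distance between a frame's hash and a reference hash would be treated as a match, with the threshold left as a tuning choice.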
In various embodiments, the video matching system may further maintain a library of different possible graphic overlay comparison candidates. Moreover, the video matching system may use the library without increasing the total number of image searches performed per unit time. Specifically, in some embodiments, the video matching system may keep track of successful detections. Graphic overlay comparison candidates that match successfully and frequently are more likely to match in the future, while candidates that match infrequently or never are less likely to match in the future.
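One possible reading of this bookkeeping is a library that ranks candidates by past successes and tries only a fixed number per search pass, so the per-unit-time search count stays constant as the library grows. The `OverlayLibrary` class, the `budget` parameter, and the strict top-k policy are assumptions for illustration; a practical system would presumably also give low-ranked candidates an occasional turn.

```python
from typing import Dict, Iterable, List


class OverlayLibrary:
    """Hypothetical library of overlay candidates ranked by past match count."""

    def __init__(self, candidates: Iterable[str], budget: int) -> None:
        self.hits: Dict[str, int] = {name: 0 for name in candidates}
        self.budget = budget  # maximum comparisons allowed per search pass

    def candidates_to_try(self) -> List[str]:
        """Most frequently matched candidates first, capped at the budget."""
        ranked = sorted(self.hits, key=lambda name: -self.hits[name])
        return ranked[: self.budget]

    def record_match(self, name: str) -> None:
        """Note a successful detection so the candidate ranks higher next time."""
        self.hits[name] += 1
```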
In various embodiments, graphic overlay detection may be interleaved with the process for automatic content recognition.
FIG. 10 illustrates an example of a process 1000 that may be implemented when multiple media content streams include the same unscheduled media segment. The process 1000 may be implemented by a computing device, where the computing device has been configured with a video matching system such as is described above.
At step 1002, the computing device may receive multiple media content streams. The computing device may be configured to identify the media content being played by a particular media display device (for example, a television, a tablet computer, a laptop computer, and so on) at a particular time. At least two of the multiple media content streams may include the same unscheduled segment at the same time. For example, two media content streams may both include a "breaking news" segment, that is, a national broadcast of a major event. As another example, the media content streams may both include the same streamed movie, where the movie was requested by different people using different media display devices. In this example, the media segment is "unscheduled" because there may be no program schedule associated with the media display devices.
At step 1004, the computing device may determine that the particular media display device is playing the unscheduled media segment in a media content stream at the current time. The computing device may make this determination by examining the media content available at the current time in each of the multiple media content streams. For example, the multiple media content streams may include two or more local television channels, and two or more of these local television channels may each be receiving a breaking news feed.
At step 1006, the computing device may determine identification information from the media content included in the media content stream being played by the particular media display device at the current time. For example, the computing device may use identification information provided by media content that the particular media display device played before the unscheduled media segment. Alternatively or additionally, the computing device may use identification information provided by media content played after the unscheduled media segment. The identification information may identify the media content stream. For example, the identification information may identify a channel, a service provider, the particular media display device, and/or the person using the particular media display device.
At step 1008, the computing device may determine contextually relevant content. The contextually relevant content may include, for example, interactive information, advertisements, and/or suggestions for additional content. The contextually relevant content may be disabled while the particular media display device is playing the unscheduled media segment.
在步骤1010处,计算设备可以在未调度媒体段已被播放之后显示媒体内容流和背景相关的内容。例如,计算设备可以在紧随未调度媒体段的媒体内容上覆盖背景相关的信息。可替代地或另外地,计算设备可以在未调度媒体段之后并且在播放附加的媒体内容之前插入背景相关的信息。At step 1010, the computing device may display the media content stream and context-related content after the unscheduled media segment has been played. For example, the computing device may overlay the context-related information on the media content immediately following the unscheduled media segment. Alternatively or additionally, the computing device may insert the context-related information after the unscheduled media segment and before playing the additional media content.
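The decision flow of steps 1004 through 1010 can be sketched as follows. This is a minimal illustration only: it assumes every stream's current and previous segment is already labelled, treats a segment as unscheduled and common when two or more streams carry it at the same time, and uses hypothetical names (`StreamState`, `common_segments`, `plan_overlay`) that do not appear in the description.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StreamState:
    stream_id: str         # hypothetical channel identifier
    current_segment: str   # label of the content airing now
    previous_segment: str  # label of the content that aired just before


def common_segments(streams: List[StreamState]) -> set:
    """Labels of segments that two or more streams carry right now."""
    counts = Counter(s.current_segment for s in streams)
    return {segment for segment, n in counts.items() if n >= 2}


def plan_overlay(device_now: str, device_before: str,
                 streams: List[StreamState]) -> dict:
    """Pick the stream the device is on and decide whether overlays may run."""
    playing_here = [s for s in streams if s.current_segment == device_now]
    if device_now in common_segments(streams):
        # Common segment: fall back on what the device played earlier
        # (step 1006) and keep contextual content disabled (step 1008).
        narrowed = [s for s in playing_here
                    if s.previous_segment == device_before]
        stream: Optional[str] = (
            narrowed[0].stream_id if len(narrowed) == 1 else None
        )
        return {"stream": stream, "overlay_enabled": False}
    # Unique content: the segment itself identifies the stream (step 1010).
    stream = playing_here[0].stream_id if playing_here else None
    return {"stream": stream, "overlay_enabled": stream is not None}


streams = [
    StreamState("channel-4", "breaking-news", "sitcom"),
    StreamState("channel-7", "breaking-news", "game-show"),
    StreamState("channel-9", "movie", "movie"),
]
during_news = plan_overlay("breaking-news", "game-show", streams)
after_news = plan_overlay("movie", "movie", streams)
```

In this sketch the overlay stays disabled for as long as the common segment plays, and the stream is still resolved from the earlier content, so the contextually relevant content can resume as soon as unique content returns.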
Various methods for matching cues from unknown media content to candidates in a reference database will now be discussed in more detail. These methods include the nearest neighbor search process discussed above with respect to FIG. 2.
As discussed above, the video matching system can be configured to identify a media content stream when the media content stream includes an unscheduled media segment. As further discussed above, identifying the media content stream may include identifying media content played by a media display device before or after the unscheduled media segment. The process for identifying media content is discussed above with respect to FIG. 1. Specifically, the video content system may use samples (e.g., graphic and/or audio samples) taken from the display mechanism of the media content device and generate cues from these samples. The video matching system may then match the cues against a reference database, where the database contains cues for known media content.
The video matching system may further include various methods to improve the efficiency of finding matches in the database. The database may contain an enormous number of cues, so the video matching system may include algorithms for finding potential matches, or "candidates", to match against. The video matching system may further include algorithms for determining which candidate cues actually match the cues generated from the display mechanism of the media content device. Locating candidate cues can be more efficient than other methods of matching cue values against the values in the database, such as matching a cue against every entry in the database.
Nearest neighbor and path tracing are examples of techniques that can be used to match unknown cues to candidate cues in the reference database. Path tracing is a mathematical method for identifying a related sequence of points from among many possible points. Nearest neighbor is a method that can be used to identify candidate points for carrying out path tracing. An example of applying nearest neighbor and path tracing to tracking video transmission using ambiguous cues is given below, but the general concept can be applied to any field in which matching candidates are to be selected from a reference database.
A method for efficient video tracking is presented. Video tracking is the application of path tracing techniques to the problem of locating matching candidates in a video reference database for a given unknown video cue. Given a large number of video segments, the system must be able to identify in real time which segment a given query video input is taken from and at what time offset. The segment and the offset together are referred to as the location. The method is called video tracking because it must be able to efficiently detect and adapt to pausing, fast forwarding, rewinding, abrupt switching to other segments, and switching to unknown segments. Before live video can be tracked, the database is processed. Visual cues (a handful of pixel values) are taken from frames every fraction of a second and placed in a specialized data structure (note that this can also be done in real time). Video tracking is performed by continuously receiving cues from the input video and updating a set of beliefs or estimates about its current location. Each cue either agrees or disagrees with the estimates, and the estimates are adjusted to reflect the new evidence. A video location is assumed to be the correct one if the confidence in it is high enough. This can be done efficiently by tracking only a small set of possible "suspect" locations.
A method for video tracking is described, with mathematical constructs used to explain and investigate it. The purpose of introducing these constructs is to give the reader the tools needed to translate between the two domains. A video signal consists of sequential frames. Each frame can be thought of as a still image. Every frame is a raster of pixels. Each pixel is made up of three intensity values corresponding to the red, green, and blue (RGB) components of that pixel's color. As used herein, a cue is a list of the RGB values of a subset of the pixels in a frame, together with a corresponding timestamp. The number of pixels in a cue is significantly smaller than the number of pixels in a frame, typically between 5 and 15. Being an ordered list of scalar values, the cue values are in fact a vector. This vector is also referred to as a point.
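The definition of a cue can be made concrete with a short sketch. It assumes a frame is held as a nested list of `(R, G, B)` tuples and that the sampled pixel positions are fixed in advance; `make_cue` and its parameters are hypothetical names chosen for illustration.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B) intensities
Frame = List[List[Pixel]]      # raster indexed as frame[row][col]


def make_cue(frame: Frame,
             sample_points: Sequence[Tuple[int, int]],
             timestamp: float) -> Tuple[float, List[float]]:
    """Flatten the RGB values of a few sampled pixels into one vector."""
    values: List[float] = []
    for row, col in sample_points:
        r, g, b = frame[row][col]
        values.extend((float(r), float(g), float(b)))
    return timestamp, values


# A 2x2 frame sampled at two pixel positions gives a 6-dimensional point.
frame = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (10, 20, 30)]]
timestamp, point = make_cue(frame, [(0, 0), (1, 1)], timestamp=12.5)
```

Sampling k pixels yields a vector of 3k scalars, so 5 to 15 pixels give points with 15 to 45 dimensions under this layout.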
Although these points are in a high dimension, usually between 15 and 150, they can be imagined as points in two dimensions. In fact, the illustrations are given as two-dimensional plots. Now consider the progression of a video and its corresponding cue points. Usually a small change in time produces a small change in pixel values. The pixel point can be viewed as "moving" slightly between frames. Following these tiny movements from frame to frame, the cue follows a path in space, as a bead would on a bent wire.
In the language of this analogy, in video tracking the locations of the bead in space (the cue points) are received, and the part of the wire (the path) that the bead is following is sought. This is made significantly harder by two facts. First, the bead does not follow the wire exactly, but rather keeps some varying and unknown distance from it. Second, the wires are all tangled together. These statements are made precise in Section 2. The algorithm described below does this in two conceptual steps. When a cue is received, the algorithm looks for all points on all the known paths that are sufficiently close to the cue point; these are called suspects. This is done efficiently using the Probabilistic Point Location in Equal Balls (PPLEB) algorithm. The suspects are added to a history data structure, and the probability that each of them indicates the true location is calculated. This step also includes removing suspect locations that are sufficiently unlikely. This history update process ensures, on the one hand, that only a small history is kept and, on the other hand, that no probable location is ever deleted. The generic algorithm is given in Algorithm 1 and illustrated in FIG. 11.
The following sections begin by describing the Probabilistic Point Location in Equal Balls (PPLEB) algorithm in Section 1. The PPLEB algorithm is used to perform line 5 of Algorithm 1 efficiently. The ability to perform this search for suspects quickly is crucial to the applicability of the method. Section 2 describes one possible statistical model for performing lines 6 and 7. The described model is a natural choice for this setting. It is also shown how it can be used very efficiently.
Section 1 - Probabilistic Point Location in Equal Balls
The following describes a simple algorithm for performing Probabilistic Point Location in Equal Balls (PPLEB). In traditional PLEB (Point Location in Equal Balls), the algorithm starts with a set of n points $x_i$ in $\mathbb{R}^d$ and a specified ball of radius r. The algorithm is given $O(\mathrm{poly}(n))$ preprocessing time to produce an efficient data structure. Then, given a query point x, the algorithm is required to return all points $x_i$ such that $\|x - x_i\| \le r$. The set of points for which $\|x - x_i\| \le r$ lies geometrically within a ball of radius r surrounding the query x (see FIG. 12). This relation is referred to as $x_i$ being close to x, or as $x_i$ and x being neighbors.
The PPLEB problem and the nearest neighbor search problem are two similar problems that have received much attention in the academic community. In fact, these problems were among the first studied in the field of computational geometry. Many different methods cater to the case where the ambient dimension is small or constant. These partition the space in different ways and recursively search through the parts. Such methods include KD-trees, cover trees, and others. Although very efficient in low dimensions, they tend to perform poorly when the ambient dimension is high. This is known as the "curse of dimensionality". Various approaches attempt to solve this problem while overcoming the curse of dimensionality. The algorithm used herein is a simpler and faster variant, and can rely on locality sensitive hashing.
Section 1.1 Locality Sensitive Hashing
In the locality sensitive hashing scheme, one devises a family of hash functions H such that the probability of x and y being mapped to the same value by h is significantly higher if x and y are close to each other.
For the sake of clarity, first consider a simplified case in which all incoming vectors have the same length $r'$ and […]. The reason for the latter condition will become clear later. First, a random function $u \in U$ is defined, which separates x and y according to the angle between them. Let $\vec{u}$ be a random vector chosen uniformly from the unit sphere $S^{d-1}$ and let $u(x) = \operatorname{sign}(\vec{u} \cdot x)$ (see FIG. 13). It is easy to verify that $\Pr_{u \sim U}(u(x) \neq u(y)) = \theta_{x,y}/\pi$, where $\theta_{x,y}$ is the angle between x and y. Moreover, for any points x, y, x', y' on a circle such that […]
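The angle property of the random function u can be checked numerically. The sketch below assumes u is the sign of the inner product with a uniformly random direction, which is what the probability $\theta_{x,y}/\pi$ and FIG. 13 suggest; the helper names are hypothetical.

```python
import math
import random
from typing import Callable, List


def random_unit_vector(d: int, rng: random.Random) -> List[float]:
    """Uniform direction on the unit sphere via a normalized Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def make_u(d: int, rng: random.Random) -> Callable[[List[float]], int]:
    """Draw one random function u(x) = sign(<direction, x>)."""
    direction = random_unit_vector(d, rng)

    def u(x: List[float]) -> int:
        return 1 if sum(a * b for a, b in zip(direction, x)) >= 0 else -1

    return u


def angle(x: List[float], y: List[float]) -> float:
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return math.acos(max(-1.0, min(1.0, dot / (nx * ny))))


def estimate_split_probability(x: List[float], y: List[float],
                               trials: int = 20000, seed: int = 0) -> float:
    """Fraction of random functions u that give x and y different values."""
    rng = random.Random(seed)
    differ = 0
    for _ in range(trials):
        u = make_u(len(x), rng)
        if u(x) != u(y):
            differ += 1
    return differ / trials


x, y = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
estimated = estimate_split_probability(x, y)
expected = angle(x, y) / math.pi   # a right angle gives one half
```

The estimate approaches the expected value as the number of trials grows, since each trial is an independent draw of the separating direction.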
The family of functions H is set to be a cross product of t independent copies of u, i.e., $h(x) = [u_1(x), \ldots, u_t(x)]$. Intuitively, one would like to have that if $h(x) = h(y)$, then x and y are likely to be close to each other. Let us quantify that. First, compute the expected number of false positive mistakes $n_{fp}$. These are the cases for which $h(x) = h(y)$ but $\|x - y\| > 2r$. A value of t is found for which $n_{fp}$ is no more than 1, i.e., not even one false positive is expected.
$$E[n_{fp}] \le n(1-2p)^t \le 1 \;\Rightarrow\; t \ge \frac{\log(1/n)}{\log(1-2p)}$$
Next, consider the probability that $h(x) = h(y)$ given that x and y are neighbors. Note that $2p < 1$ must hold, which requires […]. This may not sound like a very high success probability; indeed, it is significantly less than 1/2. The next section describes how to boost this probability to 1/2.
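The size of this success probability can be worked out from the quantities already introduced. The derivation below is a sketch under two stated assumptions that are not spelled out in this text: each copy of u separates a pair of neighbors with probability at most p, and t is taken at the bound $t = \log(1/n)/\log(1-2p)$ (ignoring rounding to an integer).

```latex
\begin{align*}
\Pr\bigl(h(x) = h(y) \,\big|\, \|x - y\| \le r\bigr)
  &\ge (1 - p)^{t} \\
  &= (1 - p)^{\log(1/n)/\log(1 - 2p)} \\
  &= n^{-\log(1 - p)/\log(1 - 2p)} \\
  &\ge n^{-1/2}
\end{align*}
% Last step: (1 - p)^2 \ge 1 - 2p, hence \log(1 - p)/\log(1 - 2p) \le 1/2.
```

Under these assumptions a single hash function finds a given neighbor with probability of roughly $1/\sqrt{n}$ at worst, which is far below 1/2 for large n and motivates combining several independent hash functions.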
Section 1.2 The Point Search Algorithm
Each function h maps every point in space to a bucket. Define the bucket function $B_h$ of a point x with respect to hash function h as $B_h(x) \equiv \{x_i \mid h(x_i) = h(x)\}$. The data structure maintained consists of instances of bucket functions $[B_{h_1}, \ldots, B_{h_m}]$. When one searches for a point x, the function returns a set $B(x)$ of points drawn from these buckets. According to the previous section, there are two desired results:
$$\Pr\bigl(x_i \in B(x) \,\big|\, \|x_i - x\| \le r\bigr) \ge 1/2$$
In other words, while each neighbor of x is found with probability at least 1/2, one is not likely to find many non-neighbors.
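A bucket structure of this kind can be sketched as follows. The sketch assumes that a query returns the union of the matching buckets over the m hash functions and that each hash function concatenates t random sign bits; the class name `BucketIndex` and its parameters are hypothetical, and the values of t and m are left to the caller.

```python
import random
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class BucketIndex:
    """m hash tables, each keyed by t random sign bits of a point."""

    def __init__(self, points: List[List[float]], t: int, m: int,
                 seed: int = 0) -> None:
        rng = random.Random(seed)
        d = len(points[0])
        # directions[j][k] is the random direction of bit k in hash j.
        # Gaussian vectors are not normalized: only the sign matters.
        self.directions = [
            [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(t)]
            for _ in range(m)
        ]
        self.tables: List[Dict[Tuple[int, ...], List[int]]] = [
            defaultdict(list) for _ in range(m)
        ]
        for index, point in enumerate(points):
            for j in range(m):
                self.tables[j][self._hash(j, point)].append(index)

    def _hash(self, j: int, x: List[float]) -> Tuple[int, ...]:
        return tuple(
            1 if sum(a * b for a, b in zip(direction, x)) >= 0 else -1
            for direction in self.directions[j]
        )

    def query(self, x: List[float]) -> Set[int]:
        """Indices of stored points sharing a bucket with x in any table."""
        found: Set[int] = set()
        for j, table in enumerate(self.tables):
            found.update(table.get(self._hash(j, x), ()))
        return found


points = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]]
index = BucketIndex(points, t=4, m=8, seed=1)
candidates = index.query(points[0])
```

Using several tables raises the chance that a true neighbor shares at least one bucket with the query, at the cost of also returning a few more non-neighbors that then have to be filtered by an exact distance check.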
Section 1.3 Dealing with Input Vectors of Different Radii
The previous section dealt only with searching among vectors of the same length, namely $r'$. Described now is how that construction can be used as a building block to support a search over different radii. As seen in FIG. 14, the space is divided into rings of exponentially growing width. Ring i, denoted by $R_i$, includes all points $x_i$ such that $\|x_i\| \in [2r(1+\epsilon)^i, 2r(1+\epsilon)^{i+1}]$. Doing this achieves two ends. First, if $x_i$ and $x_j$ belong to the same ring, then $\|x_j\|/(1+\epsilon) \le \|x_i\| \le \|x_j\|(1+\epsilon)$. Second, any search can be performed in at most $1/\epsilon$ such rings. Moreover, if the maximal vector length in the data set is $r'$, then the total number of rings in the system is $O(\log(r'/r))$.
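The ring assignment follows directly from the interval that defines $R_i$. The sketch below assumes the ring index is obtained by solving $2r(1+\epsilon)^i \le \|x\|$ for the largest integer i; `ring_index` is a hypothetical helper name.

```python
import math
from typing import List


def ring_index(x: List[float], r: float, eps: float) -> int:
    """Index i of the ring with 2r(1+eps)^i <= ||x|| < 2r(1+eps)^(i+1)."""
    norm = math.sqrt(sum(c * c for c in x))
    if norm == 0.0:
        raise ValueError("the zero vector does not belong to any ring")
    return math.floor(math.log(norm / (2.0 * r)) / math.log(1.0 + eps))


r, eps = 1.0, 0.5
rings = [ring_index(v, r, eps) for v in ([2.5, 0.0], [4.0, 0.0], [0.0, 5.0])]
```

Vectors shorter than 2r receive negative indices under this formula, and two vectors with the same index differ in length by at most a factor of (1 + eps), which is the first property stated above.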
Section 2 - The Path Tracing Problem
In the path tracing problem, a fixed path in space is given, along with the positions of a particle at a sequence of time points. The terms "particle", "cue", and "point" are used interchangeably. The algorithm is required to output the position of the particle on the path. This is made harder by a few factors: the particle only follows the path approximately; the path can be discontinuous and can intersect itself many times; and both the particle positions and the path positions are given at sequences of time points that differ from one another.
It is important to note that this problem also covers tracking a particle on any number of paths. This is done simply by concatenating the paths into one long path and interpreting the resulting position as a position on the individual paths.
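The concatenation of several paths into one long path can be sketched with a table of start offsets. This is an illustration only: it assumes each path has a known duration, and `build_offsets` and `locate` are hypothetical helper names.

```python
from bisect import bisect_right
from typing import List, Tuple


def build_offsets(durations: List[float]) -> Tuple[List[float], float]:
    """Start time of every path inside the concatenated path, plus total."""
    starts: List[float] = []
    total = 0.0
    for duration in durations:
        starts.append(total)
        total += duration
    return starts, total


def locate(global_position: float, starts: List[float]) -> Tuple[int, float]:
    """Map a position on the long path back to (path index, local offset)."""
    index = bisect_right(starts, global_position) - 1
    return index, global_position - starts[index]


# Three segments lasting 30, 60 and 15 seconds become one 105-second path.
starts, total = build_offsets([30.0, 60.0, 15.0])
segment, offset = locate(45.0, starts)
```

A position found on the long path is thus reported as a segment together with a time offset within that segment, matching the notion of a location used earlier.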
More precisely, let the path P be a parametric curve $P: \mathbb{R} \to \mathbb{R}^d$. The curve parameter is referred to as time. The points on the path that are known are given at arbitrary time points $t_i$, i.e., n pairs $(t_i, P(t_i))$ are given. The particle follows the path, but its positions are given at different time points, as shown in FIG. 15. Further, m pairs $(t'_j, x(t'_j))$ are given, where $x(t'_j)$ is the position of the particle at time $t'_j$.
Section 2.1 Likelihood Estimation
Since the particle does not follow the path exactly, and since the path can intersect itself many times, it is usually impossible to identify unambiguously the position on the path where the particle actually is. Therefore, a probability distribution is computed over all possible path locations. If one location is significantly more probable than the others, the particle position is assumed to be known. The following describes how this can be done efficiently.
If the particle is following the path, then the time difference between the particle's timestamp and the offset of the corresponding point on P should be relatively fixed. In other words, if $x(t')$ is currently at offset t on the path, then it should be close to $P(t)$. Also, $\tau$ seconds ago it should have been at offset $t - \tau$. Thus $x(t' - \tau)$ should be close to $P(t - \tau)$ (note that if the particle is merely crossing the path, and $x(t')$ is close to $P(t)$ only temporarily, it is unlikely that $x(t' - \tau)$ and $P(t - \tau)$ will also be close). Define the relative offset as $\Delta = t - t'$. Note that as long as the particle is following the path, the relative offset $\Delta$ remains unchanged. Namely, $x(t')$ is close to $P(t' + \Delta)$.
The maximum likelihood relative offset is the relative offset for which the history of the particle is most likely. However, it cannot be computed without a statistical model. This model must quantify how tightly x follows the path, how likely it is that x jumps between locations, and how smooth the path and particle curves are between the measured points.
Section 2.2 Time Discounted Binning
A statistical model for estimating the likelihood function is now described. The model assumes that the particle's deviation away from the path is normally distributed with standard deviation $\alpha r$. It also assumes that at any given point in time there is some non-zero probability that the particle will abruptly switch to another path. This is manifested by an exponential discount with time for past points. Apart from being a reasonable choice from a modeling point of view, this model has the advantage of being efficiently updateable. For some constant time unit, the likelihood function is set to be proportional to a function f, in which $\alpha \ll 1$ is a scale coefficient and $\xi > 0$ is the probability that the particle will jump to a random location on the path in a given time unit.
The function f can be updated efficiently using a simple observation about how its value changes when a new particle position arrives. Furthermore, since $\alpha \ll 1$, any path point with $\|x(t'_m) - P(t_i)\| \ge r$ contributes a negligible amount to the sum. This is an important property of the likelihood function, because the sum update can now be performed over only the neighbors of $x(t'_m)$ rather than over the entire path. Denote by S the set of pairs $(t_i, P(t_i))$ such that $\|x(t'_m) - P(t_i)\| \le r$; the update then runs over S alone.
This is described in Algorithm 2.2. The term f is used as a sparse vector that also accepts negative integer indices. The set S is the set of all neighbors of $x(t_i)$ on the path and can be computed quickly using the PPLEB algorithm. It is easy to verify that if the number of neighbors of $x(t_i)$ is bounded by some constant $n_{near}$, then the number of non-zeros in the vector f is bounded by $n_{near}/\xi$, which is only a constant factor larger. The final stage of the algorithm is to output a specific value of $\delta$ if some threshold value is exceeded.
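The update described above can be sketched as a small class. This is not the referenced Algorithm 2.2, which does not appear in this text; it is an illustration that assumes a Gaussian weight with standard deviation αr for each neighbor, a discount of (1 − ξ) per time unit for older evidence, and relative offsets binned to a fixed width. All names and default values are hypothetical.

```python
import math
from typing import Dict, Iterable, Optional, Sequence, Tuple


class OffsetLikelihood:
    """Sparse, time-discounted score per binned relative offset."""

    def __init__(self, r: float, alpha: float = 0.1, xi: float = 0.05,
                 bin_width: float = 1.0, prune_below: float = 1e-6) -> None:
        self.r = r
        self.alpha = alpha
        self.xi = xi
        self.bin_width = bin_width
        self.prune_below = prune_below
        self.f: Dict[int, float] = {}   # bin index (may be negative) -> score
        self.last_time: Optional[float] = None

    def update(self, cue_time: float, cue_point: Sequence[float],
               neighbors: Iterable[Tuple[float, Sequence[float]]]) -> None:
        """neighbors holds the path samples (t_i, P(t_i)) close to the cue."""
        if self.last_time is not None:
            # Discount all earlier evidence and drop negligible entries.
            decay = (1.0 - self.xi) ** (cue_time - self.last_time)
            self.f = {k: v * decay for k, v in self.f.items()
                      if v * decay >= self.prune_below}
        self.last_time = cue_time
        for t_i, p_i in neighbors:
            distance = math.dist(cue_point, p_i)
            weight = math.exp(-(distance / (self.alpha * self.r)) ** 2)
            key = math.floor((t_i - cue_time) / self.bin_width)
            self.f[key] = self.f.get(key, 0.0) + weight

    def best_offset(self, threshold: float) -> Optional[float]:
        """Relative offset of the strongest bin, if it clears the threshold."""
        if not self.f:
            return None
        key, score = max(self.f.items(), key=lambda item: item[1])
        return key * self.bin_width if score >= threshold else None


# Path P(t) = (t, 0) sampled every second; the particle runs 100 s ahead.
path = [(float(t), (float(t), 0.0)) for t in range(200)]
tracker = OffsetLikelihood(r=0.5)
first_guess = None
for cue_time in (0.0, 1.0, 2.0):
    cue = (cue_time + 100.0, 0.0)
    near = [(t, p) for t, p in path if math.dist(cue, p) <= tracker.r]
    tracker.update(cue_time, cue, near)
    if cue_time == 0.0:
        first_guess = tracker.best_offset(threshold=2.0)
final_guess = tracker.best_offset(threshold=2.0)
```

The neighbor list is computed by brute force here for brevity; in the method described above it would come from the PPLEB search. A single cue is not enough to clear the threshold in this example, while three consistent cues are, which mirrors the role of the particle history.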
FIG. 11 gives three consecutive point locations and the path points around them. Note that neither the bottom point nor the middle point alone would have been sufficient to identify the correct part of the path. Together, however, they are. Adding the top point increases the certainty that the particle is indeed following the final (left) curve of the path.
In FIG. 12, given a set of n (grey) points, the algorithm is given a query point (black) and returns the set of points that lie within distance r of it (the points inside the circle). In the traditional setting, the algorithm must return all such points. In the probabilistic setting, each such point should be returned only with some constant probability.
FIG. 13 illustrates the values of $u(x_1)$, $u(x_2)$, and $u(x)$. Intuitively, the function u gives different values to $x_1$ and $x_2$ if the dashed line passes between them, and the same value otherwise. Passing the dashed line in a random direction ensures that the probability of this happening is directly proportional to the angle between $x_1$ and $x_2$.
FIG. 14 shows that by dividing the space into rings such that ring $R_i$ lies between radius $2r(1+\epsilon)^i$ and $2r(1+\epsilon)^{i+1}$, it can be ensured that any two vectors within a ring have the same length up to a factor of $(1+\epsilon)$, and that any search is performed in at most $1/\epsilon$ rings.
FIG. 15 shows a self-intersecting path and a query point (black). It illustrates that without the history of the particle positions it is impossible to know where the particle is on the path.
FIG. 16 gives three consecutive point locations and the path points around them. Note that neither $x(t_1)$ nor $x(t_2)$ alone is sufficient to identify the correct part of the path. Together, however, they are. Adding $x(t_3)$ increases the certainty that the particle is indeed following the final (left) curve of the path.
In the foregoing description, for purposes of explanation, specific details are set forth in order to provide a thorough understanding of the various examples. However, it will be apparent that the various examples may be practiced without these specific details. For example, circuits, systems, networks, processes, and other components may be shown as components in block diagram form in order not to obscure the examples in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the examples. The figures and description are not intended to be restrictive.
The foregoing description provides exemplary illustrations only and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the foregoing description of the examples will provide those skilled in the art with an enabling description for implementing the various examples. It should be understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the invention as set forth in the appended claims.
Also, it is noted that individual examples may be described as a process that is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be rearranged. A process is terminated when its operations are completed, but it could have additional steps not included in a figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination can correspond to a return of the function to the calling function or the main function.
The term "machine-readable storage medium" or "computer-readable storage medium" includes, but is not limited to, portable or non-portable storage devices, optical storage devices, and various other media capable of storing, containing, or carrying instructions and/or data. A machine-readable storage medium or computer-readable storage medium may include a non-transitory medium in which data can be stored and that does not include carrier waves and/or transitory electronic signals propagating wirelessly or over wired connections. Examples of a non-transitory medium may include, but are not limited to, a magnetic disk or tape, optical storage media such as a compact disc (CD) or digital versatile disc (DVD), flash memory, memory, or memory devices. A computer program product may include code and/or machine-executable instructions that may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, or other information may be passed, forwarded, or transmitted using any suitable means, including memory sharing, message passing, token passing, network transmission, or other transmission techniques.
Furthermore, examples may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware, or microcode, the program code or code segments to perform the necessary tasks (e.g., a computer program product) may be stored in a machine-readable medium. A processor may perform the necessary tasks.
Systems depicted in some of the figures may be provided in various configurations. In some examples, the systems may be configured as a distributed system in which one or more components of the system are distributed across one or more networks in a cloud computing system.
如上面进一步详细描述的,本公开的某些方面和特征涉及通过将未知数据点与一个或多个参考数据点进行比较来识别未知数据点。本文描述的系统和方法提高了存储和搜索用于识别未知数据点的大数据集的效率。例如,该系统和方法允许识别未知数据点,同时减少执行识别所需的大数据集的密度。该技术可以应用于收获和操纵大量数据的任何系统。这些系统的说明性示例包括基于内容的自动搜索系统(例如,用于视频相关应用程序或其它合适的应用程序的自动内容识别)、Map Reduce系统、Big table系统、图案识别系统、面部识别系统、分类系统、计算机视觉系统、数据压缩系统、聚类分析或任何其它合适的系统。本领域的普通技术人员将认识到,本文描述的技术可以应用于存储与未知数据进行比较的数据的任何其它系统。例如,在自动内容识别(ACR)的情况下,系统和方法减少了必须存储以便匹配系统搜索并找到未知数据组和已知数据组之间的关系的数据量。As described in further detail above, certain aspects and features of the present disclosure relate to identifying unknown data points by comparing the unknown data point with one or more reference data points. The systems and methods described herein improve the efficiency of storing and searching large data sets for identifying unknown data points. For example, the systems and methods allow identification of unknown data points while reducing the density of the large data sets required for performing the identification. The technology can be applied to any system that harvests and manipulates large amounts of data. Illustrative examples of these systems include content-based automatic search systems (e.g., automatic content recognition for video-related applications or other suitable applications), Map Reduce systems, Big table systems, pattern recognition systems, facial recognition systems, classification systems, computer vision systems, data compression systems, cluster analysis, or any other suitable systems. Those of ordinary skill in the art will recognize that the technology described herein can be applied to any other system that stores data for comparison with unknown data. For example, in the case of automatic content recognition (ACR), the system and method reduce the amount of data that must be stored in order to match the system search and find the relationship between the unknown data group and the known data group.
仅作为示例而非限制,为了说明的目的,本文描述的一些示例使用自动音频和/或视频内容识别系统。然而,本领域的普通技术人员将认识到,其它系统可以使用相同的技术。By way of example only and not limitation, for purposes of illustration, some of the examples described herein use automatic audio and/or video content recognition systems. However, one of ordinary skill in the art will recognize that other systems may use the same technology.
可以根据具体要求做出实质性的变化。例如,也可以使用定制的硬件,和/或可以用硬件、软件(包括便携式软件,诸如小应用程序等)或上述两者来实现特定的要素。此外,可以利用到诸如网络输入/输出设备的其它访问设备或计算设备的连接。Substantial variations can be made depending on specific requirements. For example, customized hardware can be used, and/or specific elements can be implemented using hardware, software (including portable software, such as applets, etc.), or both. In addition, connections to other access devices or computing devices, such as network input/output devices, can be utilized.
在前述说明书中,参考其具体示例描述了各种实施方式的各方面,但是本领域技术人员将认识到,本实施方式不限于此。上述实施方式的各种特征和方面可以单独或联合使用。此外,在不脱离本说明书的更宽泛的精神和范围的情况下,可以在除本文所描述的那些以外的任何数量的环境和应用中使用示例。因此,说明书和附图被认为是说明性的而不是限制性的。In the foregoing description, various aspects of various embodiments have been described with reference to their specific examples, but those skilled in the art will recognize that the present embodiment is not limited thereto. The various features and aspects of the above-described embodiments may be used individually or in combination. In addition, without departing from the broader spirit and scope of this specification, examples may be used in any number of environments and applications other than those described herein. Therefore, the description and drawings are to be considered illustrative rather than restrictive.
在前面的描述中,为了说明的目的,以特定的顺序描述了方法。应该理解的是,在替代示例中,可以以与所描述的顺序不同的顺序来执行这些方法。还应该理解的是,上述方法可以由硬件部件来执行,或者可以按照机器可执行指令的顺序来实施,该机器可执行指令可用于使机器(诸如通用或专用处理器或用这些指令编程的逻辑电路)来执行该方法。这些机器可执行指令可以存储在一个或多个机器可读介质上,诸如CD-ROM或其它类型的光盘、软盘、ROM、RAM、EPROM、EEPROM、磁卡或光卡、闪速存储器或适于存储电子指令的其它类型机器可读介质。可替代地,该方法可以通过硬件和软件的组合来执行。In the foregoing description, for the purpose of illustration, method is described in a specific order. It should be understood that, in an alternative example, these methods can be performed in an order different from the described order. It should also be understood that the above method can be performed by hardware components, or can be implemented in the order of machine executable instructions, which can be used to make a machine (such as a general or special processor or a logic circuit programmed with these instructions) perform the method. These machine executable instructions can be stored on one or more machine-readable media, such as CD-ROM or other types of optical disks, floppy disks, ROM, RAM, EPROM, EEPROM, magnetic cards or optical cards, flash memories or other types of machine-readable media suitable for storing electronic instructions. Alternatively, the method can be performed by a combination of hardware and software.
Where components are described as being configured to perform certain operations, such configuration can be accomplished, for example, by designing electronic circuits or other hardware to perform the operation, by programming programmable electronic circuits (e.g., microprocessors or other suitable electronic circuits) to perform the operation, or any combination thereof.
While illustrative examples of the application have been described in detail herein, it is to be understood that the inventive concepts may be otherwise variously embodied and employed, and that the appended claims are intended to be construed to include such variations, except as limited by the prior art.
Claims (20)
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201562193322P | 2015-07-16 | 2015-07-16 | |
| US62/193,322 | 2015-07-16 | | |
| PCT/US2016/042621 WO2017011798A1 (en) | 2015-07-16 | 2016-07-15 | Detection of common media segments |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1254819A1 (en) | 2019-07-26 |
| HK1254819B (en) | 2021-05-21 |