
WO2010076769A1 - Embedding of addressable and context based messages in encoded video streams - Google Patents


Info

Publication number
WO2010076769A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
encoded
blocks
video content
data stream
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/IB2009/056005
Other languages
French (fr)
Inventor
Ziv Isaiah
Sharon Eliasi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Application filed by Individual
Publication of WO2010076769A1


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/527 Global motion vector estimation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23424 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431 Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312 Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H04N21/4316 Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations for displaying supplemental content in a region of the screen, e.g. an advertisement in a separate window
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44012 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving rendering scenes according to scene graphs, e.g. MPEG-4 scene graphs
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/443 OS processes, e.g. booting an STB, implementing a Java virtual machine in an STB or power management in an STB
    • H04N21/4438 Window management, e.g. event handling following interaction with the user interface
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/84 Generation or processing of descriptive data, e.g. content descriptors
    • H04N21/8405 Generation or processing of descriptive data, e.g. content descriptors represented by keywords

Definitions

  • a "network” is defined as any architecture where two or more computer systems may exchange data. Exchanged data may be in the form of electrical signals that are meaningful to the two or more computer systems.
  • When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired and wireless) to a computer system, the connection is properly viewed as a computer-readable medium.
  • Computer-executable instructions comprise, for example, instructions and data which cause a general-purpose computer system or special-purpose computer system to perform a certain function or group of functions.
  • embodiments of the present invention are intended to allow an advertising agency to dynamically activate an advertisement campaign which may be targeted and addressed to individual viewers based on viewer profiles.
  • the targeted advertisements are embedded within the context of the encoded (or compressed) video stream, whether broadcast, multicast, e.g. video on-demand (VOD), or unicast.
  • the advertisement is embedded in the video stream by replacing original macroblocks within a group of encoded or compressed video frames with new macroblocks within the frames.
  • the term "advertisement" is used by way of example only for the embedded video content.
  • the embedded video content may be any type of content or message for any number of commercial or non-commercial purposes, not just advertisement.
  • Figure 1 illustrates a high level flow diagram 10 of a process for embedding video content, e.g. advertisement, into an encoded video stream 19, according to an aspect of the present invention.
  • Three stages or sub-processes 20, 30, 40 are shown.
  • In sub-process 20, placement opportunities within an encoded video stream are prepared.
  • An example of a placement opportunity in a video may be a restaurant table with an empty place for a bottle.
  • An output of sub-process 20 is placeholder metadata 23.
  • Placeholder metadata 23 includes, for instance, initial state characteristics, e.g. the initial position in image space, and time development characteristics, e.g. the motion in image space of the placeholder on the restaurant table.
  • the placeholder metadata is input into sub-process 30, in which any number of bottles, for instance, are assigned as advertisements to fill the placeholder (the place on the restaurant table).
  • the advertisements are each adapted in size, background, and color to the context, for example with respect to the restaurant table, in the video stream. Advertisements with different bottles, for example, may be stored as processed advertisements 32.
  • processed advertisements 32 may be embedded in real time into the encoded video stream during broadcast or multicast to produce an encoded video stream 46 with embedded advertisements.
  • sub-process 20 may also be performed in real-time or pseudo-real time given sufficient processing power.
  • An encoded (or compressed) video 19 is typically decoded and viewed by a user.
  • the user selects (step 21) a sequence of image frames and marks (step 22) an area or graphical object in image space in the selected group of pictures.
  • the marked area serves as a placeholder for later real-time embedding of video content, e.g. advertisement, shown in sub-process 40.
  • Changes of the placeholder within the group of pictures are tracked (step 24) and the placeholder is assigned (step 25) a unique identifier (ID).
  • Placeholder metadata are stored (step 26).
  • the placeholder metadata may include information as to the initial state and time development characteristics of the placeholder
  • the unique identifiers may be embedded (step 27) into the encoded video stream to produce an encoded video 29. A sketch of the resulting placeholder metadata follows.
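
As an illustration of step 26, the placeholder metadata can be pictured as one small record per placeholder. The Python sketch below is not taken from the patent; the field names (placeholder_id, initial_rect, motion_vectors) and the flat (dx, dy) representation are assumptions chosen for clarity.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class PlaceholderMetadata:
    """Illustrative record for one placeholder (steps 21-27)."""
    placeholder_id: str                      # unique ID assigned in step 25
    initial_rect: Tuple[int, int, int, int]  # initial state: (x, y, width, height) in image space
    motion_vectors: List[Tuple[int, int]] = field(default_factory=list)  # per-picture (dx, dy) from step 24

# Example: a placeholder on the restaurant table, tracked over three pictures.
table_spot = PlaceholderMetadata("ph-0001", initial_rect=(320, 400, 64, 96))
table_spot.motion_vectors += [(0, 0), (2, -1), (3, -1)]
print(table_spot)
```
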
  • Sub-process 24 preferably starts (step 201) with an intra-frame, typically the first intra-frame of the selected video sequence, e.g. MPEG group of pictures. Anchor points are located (step 205) in the selected area or placeholder.
  • an anchor point is a selected macroblock just outside the selected area, used to anchor the selected area to an element inside the original scene and to serve as a reference for positioning, sizing, etc.
  • one macroblock on the frame is typically sufficient as a reference for positioning every macroblock within the selected area.
  • more than one anchor point may be required for the selected area, such as at the corners of a rectangular area.
  • the mutual time behavior of the anchor points allows determination of the changes in size of the selected area.
  • Anchor points may also be used for other characteristics, including angle, rotation, and orientation.
  • anchor points may be classified (step 207) according to type.
  • Anchor point types may include position, size, luminance, orientation, and chrominance (color).
  • the initial state characteristics of the anchor points and the time development characteristics of the anchor points are determined and summarized in a motion vector for each of the anchor points.
  • the motion vectors for all the anchor points are used to calculate (step 209) a master motion vector.
  • the new anchor point properties (position, size, orientation, luminance, and/or chrominance) are then updated for the next picture.
  • Steps 209-213 repeat until the end (step 215) of the selected group of pictures (step 21) or until another intra-frame is reached.
  • tracking may be performed between groups of pictures, i.e. inter-GOP tracking, by, for example, decoding the last frame of the GOP and the first frame of the following GOP and applying existing methods for picture analysis that look up an object in two different pictures. The overall tracking loop is sketched below.
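
The loop of steps 209-215 can be sketched as follows. This is a minimal, self-contained illustration assuming the per-anchor motion vectors have already been obtained (e.g. by block matching); a plain average stands in for the type-dependent master-vector formula detailed next, and all names are hypothetical.

```python
from typing import List, Tuple

Vector = Tuple[float, float]

def average_vector(vectors: List[Vector]) -> Vector:
    # Stand-in for the master motion vector of step 209; the real
    # calculation is type-dependent (see the next sketch).
    n = len(vectors)
    return (sum(v[0] for v in vectors) / n, sum(v[1] for v in vectors) / n)

def track_placeholder(anchors: List[Vector],
                      observed: List[List[Vector]]) -> List[Vector]:
    """Illustrative loop over steps 209-215: for each picture in the GOP,
    combine the per-anchor motion vectors into one master vector and move
    the anchor points accordingly."""
    masters = []
    for vectors in observed:              # one entry per picture, until step 215
        master = average_vector(vectors)  # step 209
        # Update anchor properties for the next picture.
        anchors = [(x + master[0], y + master[1]) for (x, y) in anchors]
        masters.append(master)
    return masters

# Two anchors at the top corners of the selected area, tracked over three pictures.
print(track_placeholder([(100.0, 50.0), (164.0, 50.0)],
                        [[(2, 0), (2, 0)], [(1, -1), (3, -1)], [(0, 0), (0, 0)]]))
```
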
  • Figure 2B illustrates in further detail the calculation (step 209) of the master motion vector.
  • In step 210, motion vectors of the same type (position, size, orientation, luminance, and/or chrominance) are grouped.
  • a formula is selected (step 212) dependent on the type of motion vector.
  • the formula is applied (step 214) to all the selected motion vectors to calculate the master motion vector, as in the sketch below.
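
A sketch of steps 210-214 follows. The patent leaves the per-type formula open, so the choices below (median for position, mean for size, max for luminance) are purely illustrative assumptions, as are the names.

```python
from statistics import median
from typing import Dict, List, Tuple

Vector = Tuple[float, float]

def combine(vectors: List[Vector], reducer) -> Vector:
    # Apply a scalar reducer to the x and y components separately.
    return (reducer([v[0] for v in vectors]), reducer([v[1] for v in vectors]))

# Hypothetical per-type formulas (step 212); the actual formulas are a design choice.
FORMULAS = {
    "position": lambda vs: combine(vs, median),                    # robust to one bad anchor
    "size": lambda vs: combine(vs, lambda xs: sum(xs) / len(xs)),  # plain mean
    "luminance": lambda vs: combine(vs, max),                      # strongest change wins
}

def master_motion_vectors(by_type: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """Group vectors by anchor type (step 210), select a formula per type
    (step 212), and apply it to all vectors of that type (step 214)."""
    return {t: FORMULAS[t](vs) for t, vs in by_type.items() if t in FORMULAS}

print(master_motion_vectors({"position": [(2, 0), (2, 1), (8, 0)],
                             "size": [(0.5, 0.5), (1.5, 0.5)]}))
```
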
  • FIG. 3 illustrates sub-process 30 for associating advertisements with placeholders for later embedding video content, e.g. advertisement, into encoded video 29.
  • a user selects (step 32) from potential advertisements 31 an advertisement 33, e.g. a name-label wine bottle, to be adapted to a specific placeholder, e.g. for placement on a table top.
  • the use of a still image as an advertisement is by way of non-limiting example only.
  • the video content 31 for embedding into encoded video frames 29 may, in different embodiments of the present invention, be a still graphical image, video, animation in any available format and may be useful for advertisement or other purpose.
  • Selected advertisement 33 and placeholder metadata 23 are loaded (step 34) for video processing.
  • Selected advertisement 33 is modified (step 35) or adapted according to size, orientation, and color by applying initial state characteristics, and further modified (step 36) by applying time development characteristics, in order to produce a processed advertisement 37.
  • Processed ads 37 are stored for later use in ad storage 38.
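
The geometry of steps 35-36 can be reduced to a sketch. The rectangle representation and the uniform per-picture scale factor are illustrative assumptions, not the patent's model; the names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class AdPlacement:
    x: float   # rectangle in image space
    y: float
    w: float
    h: float

def adapt_advertisement(initial: AdPlacement,
                        motion: List[Tuple[float, float]],
                        scale_per_picture: float = 1.0) -> List[AdPlacement]:
    """Apply the placeholder's initial state (step 35), then its time
    development, here per-picture motion plus optional scaling (step 36),
    to produce one placement per picture."""
    placements, cur = [initial], initial
    for dx, dy in motion:   # time development characteristics
        cur = AdPlacement(cur.x + dx, cur.y + dy,
                          cur.w * scale_per_picture, cur.h * scale_per_picture)
        placements.append(cur)
    return placements

# Bottle label starting on the table top, drifting as the camera pans.
for p in adapt_advertisement(AdPlacement(320, 400, 64, 96), [(2, -1), (3, -1)]):
    print(p)
```
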
  • a data network 58 is shown.
  • a source 51 of streaming encoded video content 19, for instance a video on demand (VOD) server is connected over a high bandwidth connection 52 to a wide area data network 58, e.g. Internet.
  • the streaming encoded video 19 is conventionally routed to connection 55 and then to an integrated receiver/decoder (IRD) 57, which decodes streaming encoded video signal 19 for viewing.
  • streaming source 51 transmits encoded video stream 29 previously prepared by process 20 for embedding advertisement content according to placeholder identifiers previously embedded therein or referenced thereto.
  • Encoded video stream 29 is routed to an embedding engine 59 including a video processor 601 and storage 38 designed and built according to an embodiment of the present invention.
  • The embedding engine is configured to embed advertisements or other video content by replacing blocks, e.g. macroblocks, within frames of encoded video stream 29, not by splicing or replacing whole frames.
  • the processed streaming video signal with the added content, e.g. advertisements, is transmitted to IRD 57 over data network 58 through high bandwidth connections 53 and 54.
  • embedding engine 59 may be co-located or be connected directly with IRD 57 or embedding engine 59 may be co-located or connected directly with streaming source 51.
  • Encoded video stream 29 undergoes encoded video domain real time processing (step 45), which retrieves processed advertisements or other video content 37 stored in storage 38, and places advertisements 37 at placeholders previously embedded or referenced within encoded video stream 29.
  • the output of real time processing 45 is an encoded video stream 46 with embedded advertisements.
  • campaign data 41 is available, which includes which customers have active campaigns and whom each campaign is intended to reach.
  • An advertisement decision 49 is made to proceed with a specific advertisement campaign with a specific targeted audience.
  • a placement decision (step 42) is made which selects one or more specific images as part of the advertisement campaign. The specific images are input to real time processing (step 45).
  • session, viewer, or group-of-viewers properties/characteristics 43 may be available as an output of the real time processing (step 45) to facilitate advertisement decision 49 and/or placement decision 42: in which video 29 to place advertisements, at what time of day or day of week, and at what time/number density of advertisements in processed video stream 46.
  • FIG. 4A illustrates in more detail real time processing 45 typically performed by embedding engine 59.
  • Placement decision 42 results in a selection of specific macroblocks to replace within an image frame of encoded video 29.
  • Embedding Engine 59 captures encoded video stream 29 and analyzes it in order to find placeholder identifiers (IDs).
  • IDs may either be a mark inside encoded video stream 29 or, for example, a one-to-many relationship between Decoding Time Stamp (DTS) and placeholder IDs, as sketched below.
  • An ad 37 that matches the available placeholder is selected from ad storage 38.
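
For the second option, the one-to-many DTS-to-placeholder relationship can be held in a plain lookup table. The sketch below is illustrative only; the IDs and the 90 kHz DTS values are invented.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical one-to-many map from Decoding Time Stamp to placeholder IDs,
# for streams whose placeholder IDs are referenced externally rather than
# marked in-band.
dts_to_placeholders: Dict[int, List[str]] = defaultdict(list)
dts_to_placeholders[90_000] += ["ph-0001", "ph-0002"]   # two spots in one frame
dts_to_placeholders[183_600] += ["ph-0003"]

def placeholders_at(dts: int) -> List[str]:
    """Return the placeholder IDs active at a given DTS (empty if none)."""
    return dts_to_placeholders.get(dts, [])

print(placeholders_at(90_000))   # ['ph-0001', 'ph-0002']
print(placeholders_at(1))        # []
```
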
  • original macroblocks are selected (step 401) to be replaced within frames 29 and replacement blocks, e.g. macroblocks, are retrieved (step 405) from ad storage 38.
  • In step 403, original macroblocks are replaced with the replacement macroblocks within each frame of the group of pictures in which the advertisement content appears.
  • Processed frames, each containing replacement macroblocks, are output as processed encoded video stream 46.
  • embedding engine 59 maintains the integrity of the stream so that it remains standard, and IRDs 57 (set top boxes or any other decoders) require neither a configuration change, a software update, nor new hardware.
  • the video layer, the transport layer, the structure of the sequence of image frames, motion estimation vectors, the DTS/PTS sequence, PCR, the buffer model and all other required specifications are all maintained while replacing macroblocks (step 403) in the sequence of image frames, as in the toy sketch below.
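
A toy sketch of steps 401, 405 and 403: only the selected macroblocks are swapped, and nothing else about the frame sequence is touched, which is the bookkeeping behind keeping the stream standard-compliant. Real macroblocks are entropy-coded bit fields, not strings; all names here are assumptions.

```python
from typing import Dict, List, Tuple

Frame = List[str]                  # toy model: one string per macroblock slot
MacroblockAddr = Tuple[int, int]   # (frame index within the GOP, macroblock index)

def replace_macroblocks(gop: List[Frame],
                        selected: List[MacroblockAddr],          # step 401
                        replacements: Dict[MacroblockAddr, str]  # step 405
                        ) -> List[Frame]:
    """Step 403: swap only the selected macroblocks, leaving every other
    macroblock -- and hence frame count, ordering, and timing structure --
    untouched."""
    out = [frame[:] for frame in gop]   # copy; never reorder or drop frames
    for addr in selected:
        out[addr[0]][addr[1]] = replacements[addr]
    return out

gop = [["bg"] * 6 for _ in range(3)]    # 3 frames x 6 macroblocks
patched = replace_macroblocks(gop, [(0, 2), (1, 2)],
                              {(0, 2): "ad", (1, 2): "ad"})
print(patched[0], patched[2])           # frame 2 is untouched
```
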
  • other frames may reference the selected area and compensation is required when the replaced macroblock is referenced from outside the selected area.
  • replacement macroblock R_n has an initial coordinate defined relative to an anchor of position type.
  • the master motion vector expresses how the selected area behaves in time or the time dependent characteristics of the selected area from picture to picture.
  • the master motion vector is calculated as a function of all the motion vectors of the selected area.
  • the master motion vector is calculated as a weighted average over all the motion vectors in the selected area and all the motion vectors of the window frame: the weights of the motion vectors over the entire selected area of the window are set to zero, while the motion vectors outside the window, for instance on the window frame, are all given the same weight, e.g. 1.
  • the master motion vector allows the motion of the moving background, as seen through the transparent window, to be disregarded; instead, the selected area inside the train window in the video is moved according to the original motion of the window frame in the original video.
  • the message or advertisement may then be embedded within the window frame by replacing original blocks of the moving background with blocks of the advertisement.
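
For the train-window example, the weighted average might look like the sketch below, consistent with the weighting described above (weight 0 for vectors of the background seen through the window, equal weight 1 for vectors on the window frame); the names and values are illustrative.

```python
from typing import List, Tuple

Vector = Tuple[float, float]

def window_master_vector(inside_window: List[Vector],
                         window_frame: List[Vector]) -> Vector:
    """Weighted average in which the moving background seen through the
    window contributes nothing, so the master vector follows the frame."""
    weighted = [(v, 0.0) for v in inside_window] + [(v, 1.0) for v in window_frame]
    total = sum(w for _, w in weighted)
    return (sum(v[0] * w for v, w in weighted) / total,
            sum(v[1] * w for v, w in weighted) / total)

# Scenery rushes left at ~30 px/frame; the window frame itself sways slightly.
print(window_master_vector([(-30, 0), (-28, 1)], [(1, 0), (1, 1)]))   # (1.0, 0.5)
```
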
  • Computer system 59 includes a video processor 601 and a storage mechanism including a memory bus 607 to store information in memory 609 and in ad storage 38.
  • Network interfaces 53 and 54 are operatively connected to processor 601 with a peripheral bus 603.
  • Computer system 59 typically includes a data input mechanism 611, e.g. disk drive for a computer readable medium 613, e.g. optical disk.
  • Data input mechanism 611 is operatively connected to processor 601 with peripheral bus 603.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A computerized method for embedding encoded video content in real time into an encoded digital video data stream. The encoded video content includes multiple blocks, e.g. macroblocks, of encoded video. An area in image space is selected as a placeholder in a video sequence of the encoded digital video data stream. Changes in the selected area are tracked by calculating a motion vector for the selected area from picture to picture in the video sequence. An identifier is assigned to the placeholder. Metadata is stored related to the motion vector. The blocks of the encoded video content are subsequently embedded in the video sequence at the placeholder. The embedding may be performed while streaming in real time to a video receiver-decoder.

Description

EMBEDDING OF ADDRESSABLE AND CONTEXT BASED MESSAGES IN
ENCODED VIDEO STREAMS
BACKGROUND
1. Technical Field
The present invention relates to a method for embedding targeted and context based messages in encoded video streams.
2. Description of Related Art
Advancements in digital technology have produced a number of digital video applications. Digital video is currently used in digital and high definition television, videoconferencing, computer imaging, and over wide area networks. Video is composed of a series of still pictures or image frames taken at frequent time intervals. The image frames, when subsequently displayed sequentially, provide an illusion of continuous motion. Each image frame includes a two-dimensional array of picture elements or "pixels" in image space. Each pixel is positioned at a defined position in image space and has associated with it a luminous intensity or luminance and optional color information. Each horizontal line of pixels in the two-dimensional image frame is called a raster line.
Uncompressed digital video signals constitute a huge amount of data and would require large amounts of bandwidth to transmit and memory to store. In digital processing systems, a large amount of digital data is required to define each video frame since each line of an image frame includes a sequence of digital data. However, the available frequency bandwidth of a conventional transmission channel to transmit the data is limited. Therefore, it has become necessary to reduce the substantial amount of data by employing various data compression techniques that are optimized for particular applications. Digital compression devices are commonly referred to as "encoders"; devices that perform decompression are referred to as "decoders". Devices that perform both encoding and decoding in hardware and/or software are referred to as "codecs". In the interest of standardizing methods for motion picture video compression, the Motion Picture Experts Group (MPEG) issued a number of standards. MPEG-1 is a compression algorithm intended for video devices having intermediate data rates. MPEG-2 is a compression algorithm for devices using higher data rates, such as digital high-definition TV (HDTV), direct broadcast satellite systems (DBSS), cable TV (CATV), and serial storage media such as digital video tape recorders (VTR). MPEG-3 is the designation for a group of audio and video coding standards agreed upon by the MPEG, designed to handle HDTV signals at 1080p in the range of 20 to 40 megabits per second. MPEG-4 is a patented collection of methods defining compression of audio and visual digital data. Uses of MPEG-4 include compression of audio and visual data for web (streaming media) and CD distribution, voice (telephone, videophone) and broadcast television applications.
In the field of video compression, a video frame is compressed using different algorithms with different amounts of data compression. These different algorithms for video frames are categorized into picture types or frame types. A "group of pictures" (GOP) is a group of successive pictures within a coded video stream. Each coded video stream consists of successive GOPs, from whose pictures the visible frames are generated. A GOP can contain the following major picture types used in the different video algorithms: I, P and B. An I-frame, an 'Intra-coded picture', is in effect a fully-specified picture, like a conventional static image file. Each GOP starts with an I-frame. I-frames are the least compressible but do not require other video frames for decoding. P-frames and B-frames hold only part of the image information, so they need less space to store than an I-frame, and thus improve video compression rates. P-frames can use data from previous I-frames to decompress and are more compressible than I-frames. The P-frame ('Predicted picture') holds only the changes in the image from the previous frame. For example, in a scene where a car moves across a stationary background, only the car's movements need to be encoded. The encoder does not need to store the unchanging background pixels in the P-frame. A B-frame ('Bi-predictive picture') saves more space by using differences between the current frame and both the preceding and following frames to specify its content. B-frames can use both previous and forward frames for data reference to get the highest amount of data compression. Strictly, the term "picture" is more general than the term "frame". A picture can be either a frame or a field. A frame is a complete image captured during a known time interval, and a field is the set of odd-numbered or even-numbered scanning lines composing a partial image. When video is sent in interlaced-scan format, each frame is sent as the field of odd-numbered lines followed by the field of even-numbered lines. The above notwithstanding, hereinafter the terms "picture" and "frame" are used interchangeably.
Typically, pictures are segmented into macroblocks, and individual prediction types can be selected on a macroblock basis rather than being the same for the entire picture, as follows:
I-pictures can contain only intra macroblocks. P-pictures can contain either intra macroblocks or predicted macroblocks. B-pictures can contain intra, predicted, or bi-predicted macroblocks.
Inter-frame coding techniques are used to compress data in video sequences. Motion-compensated coding, especially, can further improve the efficiency of image coding for the transmission of compressed data; it is normally used to predict current frame data from previous frame data based on an estimation of the motion between the current and previous frames. Such estimated motion may be described in terms of two-dimensional "motion vectors". The term "motion vector" as used herein is a two-dimensional vector used for inter prediction that provides an offset from the coordinates in the decoded picture to the coordinates in a reference picture. Motion estimation includes the process of estimating the displacement of a portion of an image between neighboring pictures. For example, a moving soccer ball will appear in different locations in adjacent pictures. Displacement is described as the motion vectors that give the best match between a specified region, e.g. the ball, in the current picture and the corresponding displaced region in a preceding or upcoming reference picture. The difference between the specified region in the current picture and the corresponding displaced region in the reference picture is referred to as "residue".
Several methods for estimating the displacement of an object in a video sequence have been proposed. Generally, the methods can be classified into two types: pixel-recursive algorithms and block-matching algorithms. Pixel-recursive techniques predict the displacement of each pixel iteratively from corresponding pixels in neighboring frames. Block-matching algorithms, on the other hand, estimate the displacement between frames on a block-by-block basis and choose vectors that minimize the difference. In the block-matching algorithm, a current frame is divided into search blocks. To determine a motion vector for a search block in the current frame, a similarity calculation is performed between the search block of the current frame and each of the equal-sized candidate blocks included in a generally larger search region within a previous frame. An error function such as the mean absolute error or mean square error is used to carry out a similarity measurement between the search block of the current frame and one of the candidate blocks in the search region. A motion vector, by definition, represents the displacement between the search block and the candidate block which yields a minimum "error" or difference. Since the search block is compared with all possible candidate blocks within a search region corresponding to the search block (full-search block matching), the computational requirement is normally heavy.
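As a concrete illustration of full-search block matching, the following sketch (Python with NumPy) uses the mean absolute error as the similarity measure; the block size, search radius, and all names are illustrative choices, not taken from the patent.

```python
import numpy as np

def full_search(current: np.ndarray, previous: np.ndarray,
                top: int, left: int, block: int = 16, radius: int = 7):
    """For the search block at (top, left) in the current frame, scan every
    candidate block within +/- radius pixels in the previous frame and
    return the motion vector minimising the mean absolute error."""
    target = current[top:top + block, left:left + block].astype(np.int32)
    best, best_err = (0, 0), np.inf
    h, w = previous.shape
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > h or x + block > w:
                continue   # candidate would fall outside the frame
            candidate = previous[y:y + block, x:x + block].astype(np.int32)
            err = np.abs(target - candidate).mean()
            if err < best_err:
                best, best_err = (dx, dy), err
    return best, best_err

rng = np.random.default_rng(0)
prev = rng.integers(0, 256, (64, 64), dtype=np.uint8)
cur = np.roll(prev, shift=(2, 3), axis=(0, 1))      # whole scene moves down 2, right 3
print(full_search(cur, prev, top=24, left=24))      # ((-3, -2), 0.0)
```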
In conventional block-matching processes, the current image to be encoded is divided into equal-sized blocks of pixel information. In the MPEG-1 and MPEG-2 video compression standards, for example, the pixels are grouped into "macroblocks", each consisting of a 16x16 sample array of luminance samples together with one 8x8 block of samples for each of the two chrominance components. The 16x16 array of luminance samples further comprises four 8x8 blocks that are typically used as input blocks to the compression models.
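The macroblock layout just described is easy to see in a few lines of NumPy (an illustration, not the patent's code):

```python
import numpy as np

# A macroblock: a 16x16 array of luminance samples plus one 8x8 block
# per chrominance component.
luma = np.arange(256, dtype=np.uint8).reshape(16, 16)
cb = np.zeros((8, 8), dtype=np.uint8)
cr = np.zeros((8, 8), dtype=np.uint8)

# The 16x16 luminance array is handled as four 8x8 input blocks.
luma_blocks = [luma[y:y + 8, x:x + 8] for y in (0, 8) for x in (0, 8)]
print([b.shape for b in luma_blocks], cb.shape, cr.shape)
# [(8, 8), (8, 8), (8, 8), (8, 8)] (8, 8) (8, 8)
```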
During motion estimation, a video frame is compressed by comparing it to a previous video frame, typically the immediately preceding video frame in a motion video clip or sequence. As discussed above, where similar blocks are found in the previous video frame, a motion vector is transmitted instead of the pixels for that block, which allows the block to be reconstructed from the reference block in the previous video frame. The first video frame, i.e. an "intra-frame" or I-frame, of a sequence has no previous video frame and is sent without being encoded with motion estimation techniques. It will be understood that intra-frames, while not encoded with motion estimation techniques, may be encoded with other data compression techniques. Periodically, a new intra-frame is sent, since otherwise cumulative errors may build up in the successively compressed and reconstructed video frames. Typically, an intra-frame is transmitted every tenth video frame. Thus, video frame 0 may be encoded and transmitted as an intra-frame, while video frame 1 is encoded relative to video frame 0 with motion estimation encoding; video frame 2 is encoded relative to video frame 1; and so on. Every tenth video frame, i.e. video frames 10, 20, 30 . . . , is transmitted as an intra-frame. It will be understood that other intervals may be selected for sending intra-frames during motion estimation, e.g. every 16th frame. Further, intra-frames may be sent at other times as well, for example at scene changes. If a video processing algorithm detects a scene change from video frame 4 to video frame 5, for example, video frame 5 may be encoded and transmitted as an intra-frame rather than utilizing a difference from frame 4.
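The intra-frame scheduling described above reduces to a small decision rule. The sketch follows the every-tenth-frame example with a scene-change override; the function name and the scene-change input are hypothetical.

```python
def frame_type(index: int, scene_change: bool, intra_interval: int = 10) -> str:
    """Send an intra-frame for frame 0, every intra_interval-th frame
    (every tenth here, as in the text), or on a detected scene change;
    otherwise predict from the previous frame."""
    if index == 0 or scene_change or index % intra_interval == 0:
        return "I"
    return "P"

scene_changes = {5}   # e.g. a cut detected between frames 4 and 5
print([frame_type(i, i in scene_changes) for i in range(12)])
# ['I', 'P', 'P', 'P', 'P', 'I', 'P', 'P', 'P', 'P', 'I', 'P']
```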
Elementary streams (ES) are the raw bit streams of MPEG-1 audio and video, output by an encoder. The System Clock Reference (SCR) is a timing value stored in a 33-bit header field of each elementary stream, at a frequency/precision of 90 kHz, with an extra 9-bit extension that stores additional timing data with a precision of 27 MHz. These are inserted by the encoder, derived from the system time clock (STC). Presentation time stamps (PTS) exist in program streams to correct the inevitable disparity between audio and video system clock reference values (time-base correction). 90 kHz PTS values in the PS header tell the decoder which video SCR values match which audio SCR values. The PTS determines when to display a portion of an MPEG program, and is also used by the decoder to determine when data can be discarded from the buffer. Either video or audio will be delayed by the decoder until the corresponding segment of the other arrives and can be decoded.
Decoding Time Stamps (DTS), additionally, are required because of B-frames. With B-frames in the video stream, adjacent frames have to be encoded and decoded out of order (re-ordered frames). The DTS is quite similar to the PTS, but instead of just handling sequential frames, it contains the proper time-stamps to tell the decoder when to decode and display the next B-frame (frame types are explained above), ahead of its anchor (P- or I-) frame. Without B-frames in the video, PTS and DTS values are identical. Further information regarding coding of video signals may be found in ITU-T Recommendation H.262, Transmission of Non-Telephone Signals; Information Technology - Generic Coding of Moving Pictures and Associated Audio Information: Video.
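The PTS/DTS distinction can be illustrated with a toy re-ordering; the timestamps are invented 90 kHz ticks (3600 per frame at 25 fps), and this is a sketch, not a stream parser.

```python
# Decode order vs. display order with B-frames: the decoder receives the
# anchor (I- or P-) frame before the B-frames that reference it forward,
# so the DTS order differs from the PTS order.
stream_in_decode_order = [
    {"frame": "I", "pts": 0},
    {"frame": "P", "pts": 10800},   # sent early: the B-frames below need it
    {"frame": "B", "pts": 3600},
    {"frame": "B", "pts": 7200},
]

display = sorted(stream_in_decode_order, key=lambda f: f["pts"])
print([f["frame"] for f in display])   # ['I', 'B', 'B', 'P']
```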
The term "video sequence" as used herein includes a series of encoded frames.
The term "area" as used herein refers to an area in image space within one or more pictures.
The term "encoded" as used herein is in the context of an "encoded video stream" and refers to any technique of compressing video data such as but not limited to known MPEG-I to MPEG 4 standards.
The term "encoded video" and "encoded video stream" are used herein interchangeably.
The term "embed" as used herein refers to inserting content as blocks, e.g. MPEG macroblocks, constituting a portion of a picture. The blocks are inserted within the picture.
The term "real-time" as used herein also includes "pseudo-real-time" or "real time on the average" so that a brief time latency may be added to the data stream which is compensated for a moment later.
BRIEF SUMMARY
According to an aspect of the present invention there is provided a computerized method for embedding encoded video content in real time into an encoded digital video data stream. The encoded video content includes multiple blocks, e.g. macroblocks, of encoded video. An area (in image space) is selected as a placeholder in a video sequence of the encoded digital video data stream. Changes in the selected area are tracked by calculating a motion vector for the selected area from picture to picture in the video sequence. An identifier is assigned to the placeholder. Metadata is stored related to the motion vector. The blocks of the encoded video content are subsequently embedded in the video sequence at the placeholder. The embedding may be performed while streaming in real time to a video receiver-decoder. The blocks may be Motion Picture Experts Group (MPEG) macroblocks. The video sequence may include a Motion Picture Experts Group (MPEG) group of pictures. The video sequence may include two or more MPEG groups of pictures, and the tracking of changes and the embedding are performed over the MPEG groups of pictures. The tracking of changes may be performed for one or more anchor points of the area, and the motion vectors are calculated for the anchor points. The identifier is embedded in the video sequence, thereby marking the blocks and producing a modified encoded video data stream. Alternatively, the identifier externally references the blocks of the encoded digital video data stream. Typically, the video content is selected for the embedding into the encoded digital video data stream, and the video content and the metadata are loaded for processing. The video content may be modified by applying initial state and time development characteristics to produce processed video content of the initial state and/or the time development characteristics based on the metadata. The encoded video data stream and the identifier are input in real time, and blocks of the video content are replaced with corresponding blocks of the processed video content in the selected area within the video sequence.
Tracking from picture to picture may be performed according to type of the anchor points, position, size, orientation, luminance and chrominance (color). The calculation of the motion vector may include calculation of a master motion vector from multiple motion vectors for anchor points of the same type or of different types.
According to the present invention there is provided a computer system configured to perform any of the methods disclosed herein. The computer system includes storage and a processor. The storage is adapted for storing replacement blocks of encoded video content. The processor is operatively connected to the storage. The processor is configured to input in real time an encoded digital video data stream, to load the encoded video content, and to replace original blocks of the encoded digital video data stream with the replacement blocks to produce as output in real time a processed encoded digital video data stream. The original blocks and the replacement blocks contain portions of pictures of an encoded video sequence. Initial state and time-development characteristics of the original blocks are typically previously stored as metadata in the storage, and the processor is configured to load the metadata to produce the processed encoded digital video data stream. According to the present invention there is provided a computer readable medium encoded with processing instructions for causing a processor to execute any of the methods disclosed herein.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention is herein described, by way of example only, with reference to the accompanying drawings, wherein:
Figure 1 illustrates a high level flow diagram of a process for embedding video content or message, e.g. advertisement, into an encoded video stream, according to an aspect of the present invention.
Figure 2 illustrates a sub-process of the process of Figure 1, for generating metadata and identifiers for video frames.
Figure 2A illustrates in further detail a sub-process of the process of Figure 1, the sub- process including tracking of the placeholder within the video sequence, according to an embodiment of the present invention.
Figure 2B illustrates in further detail calculation of the master motion vector, according to an embodiment of the present invention.
Figure 3 illustrates a sub-process for associating video content with placeholders for later embedding the video content, e.g. advertisement, into an encoded video, according to an embodiment of the present invention.
Figure 4 illustrates a sub-process for real-time replacement of original blocks with replacement blocks of the video content, according to an embodiment of the present invention.
Figure 4A illustrates in more detail the real time processing performed by the embedding engine, according to an embodiment of the present invention.
Figure 5 illustrates a simplified system drawing for implementing methods according to embodiments of the present invention.
Figure 6 illustrates schematically a simplified computer system for implementing methods according to embodiments of the present invention.
The foregoing and/or other aspects will become apparent from the following detailed description when considered in conjunction with the accompanying drawing figures.
DETAILED DESCRIPTION
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout. The embodiments are described below to explain the present invention by referring to the figures.
Before explaining embodiments of the invention in detail, it is to be understood that the invention is not limited in its application to the details of design and the arrangement of the components set forth in the following description or illustrated in the drawings. The invention is capable of other embodiments or of being practiced or carried out in various ways. Also, it is to be understood that the phraseology and terminology employed herein is for the purpose of description and should not be regarded as limiting.
The embodiments of the present invention may comprise a general-purpose or special-purpose computer system including various computer hardware components, which are discussed in greater detail below. Embodiments within the scope of the present invention also include computer-readable media for carrying or having computer-executable instructions, computer-readable instructions, or data structures stored thereon. Such computer-readable media may be any available media accessible by a general-purpose or special-purpose computer system. By way of example, such computer-readable media may include physical storage media such as RAM, ROM, EPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other media which can be used to carry or store desired program code means in the form of computer-executable instructions, computer-readable instructions, or data structures and which may be accessed by a general-purpose or special-purpose computer system.
In this description and in the following claims, a "computer system" is defined as one or more software modules, one or more hardware modules, or combinations thereof, which work together to perform operations on electronic data. For example, the definition of computer system includes the hardware components of a personal computer, as well as software modules, such as the operating system of the personal computer. The physical layout of the modules is not important. A computer system may include one or more computers coupled via a computer network. Likewise, a computer system may include a single physical device (such as a mobile phone or Personal Digital Assistant "PDA") where internal modules (such as a memory and processor) work together to perform operations on electronic data.
In this description and in the following claims, a "network" is defined as any architecture where two or more computer systems may exchange data. Exchanged data may be in the form of electrical signals that are meaningful to the two or more computer systems. When data is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired or wireless) to a computer system or computer device, the connection is properly viewed as a computer-readable medium. Thus, any such connection is properly termed a computer-readable medium. Combinations of the above should also be included within the scope of computer-readable media. Computer-executable instructions comprise, for example, instructions and data which cause a general-purpose computer system or special-purpose computer system to perform a certain function or group of functions.
By way of introduction, embodiments of the present invention are intended to allow an advertisement agency to dynamically activate an advertisement campaign which may be targeted and addressed to individual viewers based on viewer profiles. The targeted advertisements are embedded within the context of the encoded (or compressed) video stream, as either a broadcast video stream, multicast, e.g. video on-demand (VOD), or unicast. Specifically, the advertisement is embedded in the video stream by replacing original macroblocks within a group of encoded or compressed video frames with new macroblocks within the frames. It should be noted that throughout the description the term "advertisement" is used by way of example only for the embedded video content. According to different features of the present invention, the embedded video content may be any type of content or message for any number of commercial or non-commercial purposes, not just advertisement.
Referring now to the drawings, Figure 1 illustrates a high level flow diagram 10 of a process for embedding video content, e.g. an advertisement, into an encoded video stream 19, according to an aspect of the present invention. Three stages or sub-processes 20, 30, 40 are shown. In sub-process 20, placement opportunities within an encoded video stream are prepared. An example of a placement opportunity in a video may be a restaurant table with an empty place for a bottle. An output of sub-process 20 is placeholder metadata 23. Placeholder metadata 23 includes, for instance, initial state characteristics, e.g. initial position in image space, and time development characteristics, e.g. motion in image space of the placeholder on the restaurant table. The placeholder metadata is input into sub-process 30, in which any number of bottles, for instance, are assigned as advertisements to fill the placeholder (the place on the restaurant table). The advertisements are each adapted in terms of size, background and color to the context in the video stream, with respect to the restaurant table in this example. Advertisements with different bottles, for example, may be stored as processed advertisements 37. In sub-process 40, processed advertisements 37 may be embedded in real time into the encoded video stream during broadcast or multicast to produce an encoded video stream 46 with embedded advertisements. Optionally, sub-process 20 may also be performed in real time or pseudo-real time given sufficient processing power.
Reference is now made to Figure 2, which illustrates sub-process 20 in more detail. An encoded (or compressed) video 19 is typically decoded and viewed by a user. The user selects (step 21) a sequence of image frames and marks (step 22) an area or graphical object in image space in the selected group of pictures. The marked area serves as a placeholder for later real-time embedding of video content, e.g. an advertisement, shown in sub-process 40. Changes of the placeholder within the group of pictures are tracked (step 24) and the placeholder is assigned (step 25) a unique identifier (ID). Placeholder metadata are stored (step 26). The placeholder metadata may include information as to the initial state and time development characteristics of the placeholder. The unique identifiers may be embedded (step 27) into the encoded video stream to produce an encoded video 29 with the placeholders within it uniquely identified. Alternatively, a separate external list of placeholder identifiers is included, with references to specific frames inside encoded video 19.
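By way of a non-limiting illustration, the placeholder metadata stored in step 26 may be represented as a simple record, as in the following Python sketch; the field names, types and units are hypothetical assumptions of this example and are not defined by the present disclosure:

```python
# Hedged sketch only: one possible layout for placeholder metadata (step 26).
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class PlaceholderMetadata:
    placeholder_id: str                       # unique ID assigned in step 25
    initial_position: Tuple[int, int]         # initial state: (x, y) in image space
    initial_size: Tuple[int, int]             # initial state: (width, height) in macroblocks
    master_motion_vectors: List[Tuple[float, float]] = field(default_factory=list)
    # time development: one master motion vector per picture-to-picture
    # transition within the tracked video sequence (step 24)

meta = PlaceholderMetadata("table-top-001", (22, 15), (3, 2))
meta.master_motion_vectors.append((2.0, 1.0))   # area moved right and down
```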
Reference is now also made to Figure 2A, which illustrates in further detail the tracking (step 24) of the placeholder within the video sequence, according to an embodiment of the present invention. Sub-process 24 preferably starts (step 201) with an intra-frame, typically the first intra-frame of the selected video sequence, e.g. an MPEG group of pictures. Anchor points are located (step 205) in the selected area or placeholder.
According to a feature of the present invention, an anchor point is a selected macroblock just outside the selected area, used to anchor the selected area to an element of the original scene which serves as a reference for positioning, sizing, etc. For positioning the selected area, one macroblock in the frame is typically sufficient as a reference for positioning every macroblock within the selected area. For sizing the selected area, more than one anchor point may be required, such as at the corners of a rectangular area; the mutual time behavior of the anchor points allows determination of the changes in size of the selected area. Anchor points may also be used for other characteristics, including angle, rotation and orientation. Hence, anchor points may be classified (step 207) according to type. Anchor point types may include position, size, orientation, luminance and chrominance (color). The initial state characteristics of the anchor points and the time development characteristics of the anchor points are determined and summarized in a motion vector for each of the anchor points. The motion vectors for all the anchor points are used to calculate (step 209) a master motion vector. In the next frame (step 211), the new anchor point properties (position, size, orientation, luminance and/or chrominance) are estimated (step 213) by applying the master motion vector previously calculated in step 209. Steps 209-213 repeat until the end (step 215) of the selected group of pictures (step 21) or until another intra-frame is reached. According to a feature of the present invention, tracking (step 24) may also be performed between groups of pictures, i.e. inter-GOP tracking, by, for example, decoding the last frame of one GOP and the first frame of the following GOP and applying existing picture-analysis methods that locate an object in two different pictures.
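The tracking loop of Figure 2A may be sketched as follows, by way of non-limiting example only. Frames are reduced here to observed anchor-point positions and the master motion vector is taken as a simple mean; a real implementation would operate on encoded macroblock data and use per-type formulas:

```python
# Self-contained toy version of steps 205-215 of Figure 2A.
def track_placeholder(observed_positions_per_frame):
    """observed_positions_per_frame: list of lists of (x, y) anchor positions,
    one inner list per frame, starting at the intra-frame (step 201)."""
    anchors = observed_positions_per_frame[0]            # step 205: locate anchor points
    masters = []
    for observed in observed_positions_per_frame[1:]:    # step 211: next frame
        # per-anchor motion vectors relative to the current estimate
        vectors = [(ox - ax, oy - ay)
                   for (ax, ay), (ox, oy) in zip(anchors, observed)]
        # step 209: master motion vector, here the mean of the anchor vectors
        n = len(vectors)
        master = (sum(dx for dx, _ in vectors) / n,
                  sum(dy for _, dy in vectors) / n)
        masters.append(master)
        # step 213: estimate new anchor properties by applying the master vector
        anchors = [(ax + master[0], ay + master[1]) for ax, ay in anchors]
    return masters                                       # step 215: end of the GOP

frames = [[(0, 0), (16, 0)], [(2, 1), (18, 1)], [(4, 2), (20, 2)]]
print(track_placeholder(frames))   # -> [(2.0, 1.0), (2.0, 1.0)]
```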
Reference is now also made to Figure 2B, which illustrates in further detail the calculation (step 209) of the master motion vector. In step 210, motion vectors of the same type (position, size, orientation, luminance and/or chrominance) are grouped. A formula is selected (step 212) depending on the type of motion vector. The formula is applied (step 214) to all the selected motion vectors to calculate the master motion vector.
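By way of non-limiting illustration, steps 210-214 may be sketched as follows. The per-type formulas shown (a simple mean for every type) are assumptions of this example, since the present disclosure only specifies that the formula depends on the motion-vector type:

```python
# Hedged sketch of Figure 2B: group vectors by type, then apply a per-type formula.
def mean_vector(vectors):
    n = len(vectors)
    return (sum(dx for dx, _ in vectors) / n, sum(dy for _, dy in vectors) / n)

FORMULAS = {                      # step 212: formula selected by type (assumed)
    "position": mean_vector,
    "size": mean_vector,          # e.g. relative drift of corner anchor points
    "luminance": mean_vector,
}

def master_motion_vector(typed_vectors):
    """typed_vectors: list of (type, (dx, dy)) pairs for one frame transition."""
    grouped = {}                  # step 210: group motion vectors of the same type
    for vtype, vec in typed_vectors:
        grouped.setdefault(vtype, []).append(vec)
    # step 214: apply the type-specific formula to each group
    return {vtype: FORMULAS[vtype](vecs) for vtype, vecs in grouped.items()}

print(master_motion_vector([("position", (2, 1)), ("position", (4, 1)),
                            ("size", (0, 0))]))
# -> {'position': (3.0, 1.0), 'size': (0.0, 0.0)}
```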
Reference is now made to Figure 3, which illustrates sub-process 30 for associating advertisements with placeholders for later embedding of video content, e.g. an advertisement, into encoded video 29. A user selects (step 32) from potential advertisements 31 an advertisement 33, e.g. a name-label wine bottle, to be adapted to a specific placeholder, e.g. for placement on a table top. The use of a still image as an advertisement is by way of non-limiting example only. The video content 31 for embedding into encoded video frames 29 may, in different embodiments of the present invention, be a still graphical image, video or animation in any available format, and may be used for advertisement or any other purpose. Selected advertisement 33 and placeholder metadata 23 are loaded (step 34) for video processing. Selected advertisement 33 is modified (step 35) or adapted according to size, orientation and color by applying initial state characteristics, and further modified (step 36) by applying time development characteristics, in order to produce a processed advertisement 37. Processed ads 37 are stored for later use in ad storage 38.
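As a non-limiting sketch, steps 34-36 may be implemented along the following lines using the Pillow imaging library; the library choice, the metadata keys and the affine-translation model of the time development are all assumptions of this example, not part of the present disclosure:

```python
# Illustrative sketch of sub-process 30 (steps 34-36).
from PIL import Image

def process_advertisement(ad_path, meta):
    ad = Image.open(ad_path)                                # step 34: load ad
    # step 35: adapt to the placeholder's initial state (size, orientation)
    ad = ad.resize(meta["initial_size_px"]).rotate(meta["initial_angle"])
    frames = [ad]
    # step 36: apply the stored time development, one translation per picture
    for dx, dy in meta["master_motion_vectors"]:
        frames.append(frames[-1].transform(
            frames[-1].size, Image.AFFINE, (1, 0, -dx, 0, 1, -dy)))
    return frames   # stored as a processed advertisement 37 in ad storage 38
```

Each resulting frame holds the advertisement pre-positioned for one picture of the tracked video sequence, ready to be encoded into replacement macroblocks.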
Reference is now also made to Figure 5, which illustrates a simplified system drawing for implementing methods according to embodiments of the present invention. A source 51 of streaming encoded video content 19, for instance a video on demand (VOD) server, is connected over a high bandwidth connection 52 to a wide area data network 58, e.g. the Internet. The streaming encoded video 19 is conventionally routed to connection 55 and then to an integrated receiver/decoder (IRD) 57, which decodes streaming encoded video signal 19 for viewing. According to an embodiment of the present invention, streaming source 51 transmits encoded video stream 29, previously prepared by sub-process 20, with placeholder identifiers embedded therein or referenced thereto for embedding advertisement content. Encoded video stream 29 is routed to an embedding engine 59, designed and built according to an embodiment of the present invention, which includes a video processor 601 and ad storage 38. Embedding engine 59 is configured to embed advertisements or other video content by replacing blocks, e.g. macroblocks, within frames of encoded video stream 29, not by splicing or replacing whole frames. The processed streaming video signal with the added content, e.g. advertisements, is transmitted to IRD 57 over data network 58 through high bandwidth connections 53 and 54. In other embodiments of the present invention, embedding engine 59 may be co-located or connected directly with IRD 57, or embedding engine 59 may be co-located or connected directly with streaming source 51.
Reference is now also made to Figure 4, which illustrates process 40, according to an embodiment of the present invention. Encoded video stream 29 undergoes encoded-video-domain real time processing (step 45), which retrieves processed advertisements or other video content 37 stored in ad storage 38 and places them at the placeholders previously embedded in or referenced by encoded video stream 29. The output of real time processing (step 45) is an encoded video stream 46 with embedded advertisements. Typically, campaign data 41 is available, which includes which customers have active campaigns and whom each campaign is intended to reach. An advertisement decision 49 is made to proceed with a specific advertisement campaign for a specific targeted audience. A placement decision (step 42) is made which selects one or more specific images as part of the advertisement campaign. The specific images are input to real time processing (step 45). According to a feature of the present invention, session/viewer/group-of-viewers properties/characteristics 43 may be available as an output of real time processing (step 45) to facilitate advertisement decision 49 and/or placement decision 42, for instance as to which video 29 to place advertisements in, the time of day/day of week, and the desired temporal/numerical density of advertisements in processed video stream 46.
Reference is now also made to Figure 4A which illustrates in more detail real time processing 45 typically performed by embedding engine 59. Placement decision 42 results in a selection of specific macroblocks to replace within an image frame of encoded video 29.
Embedding engine 59 captures encoded video stream 29 and analyzes it in order to find placeholder identifiers (IDs). The IDs may either be marks inside encoded video stream 29 or be determined externally, for example via a one-to-many relationship between Decoding Time Stamps (DTS) and placeholder IDs.
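A minimal sketch of such an external lookup follows; the table contents and the 90 kHz DTS values are invented for illustration only:

```python
# Hypothetical external DTS-to-placeholder mapping (one DTS, many IDs).
DTS_TO_PLACEHOLDERS = {
    90000: ["table-top-001"],
    93600: ["table-top-001", "wall-007"],
}

def placeholders_for(dts):
    """Return the placeholder IDs active at a given decoding time stamp."""
    return DTS_TO_PLACEHOLDERS.get(dts, [])

print(placeholders_for(93600))   # -> ['table-top-001', 'wall-007']
```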
When embedding engine 59 identifies a placeholder, it looks up potential ads 37 to be embedded. Considering contextual properties and viewer/target characterization, an ad 37 that matches the available placeholder is selected from ad storage 38. Based on the placement decision (step 42), original macroblocks are selected (step 401) to be replaced within frames 29, and replacement blocks, e.g. macroblocks, are retrieved (step 405) from ad storage 38. In step 403, original macroblocks are replaced with the replacement macroblocks within each frame of the group of pictures in which the advertisement content appears. Processed frames, each containing replacement macroblocks, are output as processed encoded video stream 46. When processed ad 37 is embedded into encoded video stream 29 by replacing a set of macroblocks with another set of macroblocks, embedding engine 59 maintains the integrity of the stream as standard-compliant, so that IRDs 57 (set-top boxes or any other decoders) require no configuration change, software update or new hardware. The video layer, the transport layer, the structure of the sequence of image frames, motion estimation vectors, the DTS/PTS sequence, the PCR, the buffer model and all other required specifications are maintained while replacing macroblocks (step 403) in the sequence of image frames. In step 403, other frames may reference the selected area, and compensation is required when a replaced macroblock is referenced from outside the selected area.
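Steps 401-405 may be sketched, in a deliberately simplified form, as follows. Frames are modelled here as mappings from macroblock coordinates to encoded payloads; a real embedding engine operates on the bitstream itself and must also preserve the transport-layer fields listed above:

```python
# Toy sketch of steps 401-405; coordinates and payloads are invented.
def replace_macroblocks(frame, selected_coords, replacements):
    """Return a copy of `frame` with the selected macroblocks replaced."""
    out = dict(frame)                       # leave untouched blocks as-is
    for coord in selected_coords:           # step 401: original blocks to replace
        out[coord] = replacements[coord]    # step 403: swap in the ad macroblock
    return out

frame = {(0, 0): b"orig00", (1, 0): b"orig10", (2, 0): b"orig20"}
ad    = {(1, 0): b"ad10", (2, 0): b"ad20"}  # step 405: retrieved from ad storage
print(replace_macroblocks(frame, [(1, 0), (2, 0)], ad))
```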
In a simple example of replacing macroblocks (step 403), replacement macroblock Rn has an initial coordinate relative to an anchor point of position type:
Rn(anchorx + offsetnx, anchory + offsetny)
where n is an index over the macroblocks in the selected area, anchorx and anchory are the coordinates of the anchor point, and offsetnx and offsetny are offsets in image space from the anchor point. Time development of replacement macroblock Rn follows the time development of the master motion vector.
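Written out as code, the placement rule reads as follows (a non-limiting sketch; coordinates are assumed to be in macroblock units):

```python
# Each replacement macroblock Rn starts at the anchor position plus its stored
# offset, then follows the master motion vector from picture to picture.
def place_replacement_blocks(anchor, offsets, master_vectors):
    """Yield per-frame coordinates for each replacement macroblock Rn."""
    ax, ay = anchor
    positions = [(ax + ox, ay + oy) for ox, oy in offsets]  # initial coordinates
    yield positions
    for dx, dy in master_vectors:        # time development of the master vector
        positions = [(x + dx, y + dy) for x, y in positions]
        yield positions

for frame_positions in place_replacement_blocks((10, 5), [(0, 0), (1, 0)],
                                                [(2, 1), (2, 1)]):
    print(frame_positions)
# -> [(10, 5), (11, 5)]  then  [(12, 6), (13, 6)]  then  [(14, 7), (15, 7)]
```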
The master motion vector expresses how the selected area behaves in time, i.e. the time-dependent characteristics of the selected area from picture to picture. The master motion vector is calculated as a function of all the motion vectors of the selected area. As an example, consider a video taken through a transparent window of a moving train. The original video shows a moving background scene through the train window. In this example, the master motion vector is calculated as a weighted average over all the motion vectors in the selected area and all the motion vectors of the window frame. The weights of the motion vectors over the entire selected area inside the window are set to zero, while the weights of the motion vectors outside the window, for instance on the window frame, are set to the same value, e.g. 1. Use of the master motion vector thus allows the motion of the moving background seen through the transparent window to be disregarded; instead, the selected area inside the train window moves according to the original motion of the window frame in the original video. The message or advertisement may then be embedded within the window frame by replacing original blocks of the moving background with blocks of the advertisement.
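The train-window example may be sketched as follows, with illustrative values only; the zero/one weighting mirrors the description above:

```python
# Weighted-average master motion vector: background vectors get weight 0,
# window-frame vectors get weight 1, so the area follows the frame, not the scenery.
def weighted_master_vector(vectors, weights):
    total = sum(weights)
    return (sum(w * dx for (dx, _), w in zip(vectors, weights)) / total,
            sum(w * dy for (_, dy), w in zip(vectors, weights)) / total)

background = [(-8, 0), (-9, 0), (-7, 0)]   # scenery rushing past: disregarded
window_frame = [(0, 1), (0, 1)]            # slight camera motion: followed
vectors = background + window_frame
weights = [0, 0, 0] + [1, 1]
print(weighted_master_vector(vectors, weights))   # -> (0.0, 1.0)
```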
Reference is now made to Figure 6, which illustrates schematically a simplified computer system, or embedding engine, 59 according to an embodiment of the present invention. Computer system 59 includes a video processor 601 and a storage mechanism including a memory bus 607 to store information in memory 609 and in ad storage 38. Network interfaces 53 and 54 are operatively connected to processor 601 via a peripheral bus 603. Computer system 59 typically includes a data input mechanism 611, e.g. a disk drive for a computer-readable medium 613, e.g. an optical disk. Data input mechanism 611 is operatively connected to processor 601 via peripheral bus 603.
The indefinite articles "a" and "an" as used herein, such as in "a placeholder", "an identifier" and "an anchor point", have the meaning of "one or more", that is, "one or more placeholders", "one or more identifiers" and "one or more anchor points".
Although selected embodiments of the present invention have been shown and described, it is to be understood that the present invention is not limited to the described embodiments. Instead, it is to be appreciated that changes may be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and the equivalents thereof.

Claims

1. A computerized method for embedding encoded video content in real time into an encoded digital video data stream, wherein the encoded video content includes a plurality of blocks of encoded video, the method comprising: selecting an area as a placeholder in a video sequence of the encoded digital video data stream; tracking changes of said selected area by calculating a motion vector for said area from picture to picture in said video sequence; and assigning an identifier to said placeholder and storing metadata related to said motion vector; wherein the blocks of encoded video content are subsequently embedded in said video sequence at the placeholder.
2. The method of claim 1, wherein the embedding is performed while streaming in real time to a video receiver-decoder.
3. The method of claim 1, wherein said blocks are Motion Picture Experts Group (MPEG) macroblocks.
4. The method of claim 1, wherein said video sequence includes a Motion Picture Experts Group (MPEG) group of pictures.
5. The method of claim 1, wherein said video sequence includes at least two MPEG groups of pictures and wherein said tracking changes and said embedding are performed over said at least two MPEG groups of pictures.
6. The method of claim 1, wherein said tracking changes is performed for an anchor point of said area and said motion vector is calculated for said anchor point.
7. The method of claim 1, wherein said identifier is embedded in said video sequence thereby marking said blocks and producing a modified encoded video data stream.
8. The method of claim 1, wherein said identifier externally references said blocks of the encoded digital video data stream.
9. The method of claim 7 or 8, further comprising: selecting the video content for the embedding into the encoded digital video data stream; loading the video content and said metadata for processing; and modifying the video content by applying initial state and time development characteristics to produce processed video content wherein at least one of said initial state and said time development characteristics is based on said metadata.
10. The method of claim 9, further comprising: inputting in real time said modified encoded video data stream and said identifier; and replacing in real time a plurality of blocks of the video content with corresponding blocks of the processed video content in said selected area within said video sequence.
11. The method of claim 9, further comprising: inputting in real time the encoded digital video data stream; and replacing in real time a plurality of blocks of the video content with corresponding blocks of the processed video content in said selected area within said video sequence.
12. The method of claim 6, wherein said tracking is performed according to a type of said anchor point, wherein said type is selected from the group consisting of: position, size, orientation, luminance and color.
13. The method of claim 1, wherein said calculating said motion vector includes calculation of a master motion vector from a plurality of motion vectors for anchor points of different types.
14. A computer system configured to execute the method of claim 1.
15. A computer readable medium encoded with processing instructions for causing a processor to execute the method of claim 1.
16. A computer system comprising: storage adapted for storing replacement blocks of encoded video content; and a processor operatively connected to said storage; wherein the processor is configured to input in real time an encoded digital video data stream, to load the encoded video content, and to replace original blocks of said encoded digital video data stream with said replacement blocks to produce as output in real time a processed encoded digital video data stream, wherein said original blocks and said replacement blocks contain portions of pictures of an encoded video sequence.
17. The computer system of claim 16, wherein initial state and time-development characteristics of said original blocks are previously stored as metadata in said storage and wherein said processor is configured to load said metadata to produce said processed encoded digital video data stream.
18. A computerized method for embedding encoded video content into an encoded digital video data stream, wherein the encoded video content contains a plurality of blocks of encoded video, the method comprising: in a first stage, selecting an area as a placeholder in a video sequence of the encoded digital video data stream; tracking changes of said selected area by calculating a motion vector for said area from picture to picture in said video sequence; assigning an identifier to said placeholder and storing metadata related to said motion vector; in a second stage, selecting the video content for the embedding into the encoded digital video data stream; loading the video content and said metadata for processing; modifying the video content by applying initial state and time development characteristics to produce processed video content, wherein at least one of said initial state and said time development characteristics is based on said metadata; in a third stage, inputting in real time said encoded video data stream and said identifier; and replacing in real time the blocks of the video content with corresponding blocks of the processed video content in said selected area within said video sequence.
PCT/IB2009/056005 2009-01-01 2009-12-31 Embedding of addressable and context based messages in encoded video streams Ceased WO2010076769A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US14220409P 2009-01-01 2009-01-01
US61/142,204 2009-01-01

Publications (1)

Publication Number Publication Date
WO2010076769A1

Family

ID=42309884

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2009/056005 Ceased WO2010076769A1 (en) 2009-01-01 2009-12-31 Embedding of addressable and context based messages in encoded video streams

Country Status (1)

Country Link
WO (1) WO2010076769A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6973130B1 (en) * 2000-04-25 2005-12-06 Wee Susie J Compressed video signal including information for independently coded regions
US20080226253A1 (en) * 2005-08-17 2008-09-18 Orad Hi-Tec Systems Ltd. System and Method For Managing the Visual Effects Insertion in a Video Stream
US20070162571A1 (en) * 2006-01-06 2007-07-12 Google Inc. Combining and Serving Media Content

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3076681A1 (en) * 2015-04-03 2016-10-05 Mirriad Advertising Limited Producing video data
US10841667B2 (en) 2015-04-03 2020-11-17 Mirriad Advertising Plc Producing video data
US10943265B2 (en) 2017-03-14 2021-03-09 At&T Intellectual Property I, L.P. Targeted user digital embedded advertising


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 09836170; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 09836170; Country of ref document: EP; Kind code of ref document: A1)