HK40005301B - Multimode synchronous rendering of audio and video - Google Patents
- Publication number: HK40005301B (application HK19128690.5A)
- Authority: HK (Hong Kong)
- Prior art keywords: audio, electronic device, mode, video, video data
Description
Background
A network system may have various connected devices, such as computers, smart phones, tablets, and televisions. Devices connected to the network may play media. For example, a computer on a network may download media from the internet to display video through a display and output audio through speakers or headphones. Recently, smart televisions have become available with built-in networking functionality that enables media to be streamed directly to the television. Despite this and other advances, effective options for listening to television audio are still largely limited to wired speakers.
Disclosure of Invention
One aspect features a method for selecting a mode for synchronizing audio playback between a first electronic device and a second electronic device. The method includes receiving video data and audio data at a first electronic device, the first electronic device comprising a television or a media source coupled to a television; wirelessly communicating clock information associated with the first electronic device to the second electronic device over a wireless network to establish a synchronized clock between the first electronic device and the second electronic device, the second electronic device being a mobile device; programmably selecting, using a hardware processor of the first electronic device, an audio synchronization mode based at least in part on the video data, wherein the audio synchronization mode is selected between a first mode and a second mode, the first mode comprising delaying video if the video data is below a threshold in size, the second mode comprising compressing audio data if the video data is above the threshold in size; and transmitting the audio data from the first electronic device to the second electronic device according to the selected audio synchronization mode.
One aspect features a system for selecting a mode for synchronizing audio playback between a first electronic device and a second electronic device. The system includes a first electronic device comprising: a memory comprising processor-executable instructions; a hardware processor configured to execute the processor-executable instructions; and a wireless transmitter in communication with the hardware processor. The processor-executable instructions are configured to: receive video data and audio data; cause the wireless transmitter to wirelessly transmit clock information associated with the first electronic device to a second electronic device over a wireless network to establish a synchronized clock between the first electronic device and the second electronic device; programmably select an audio synchronization mode based at least in part on one or more of the video data, buffer characteristics, and network characteristics; and cause the wireless transmitter to transmit the audio data from the first electronic device to the second electronic device according to the selected audio synchronization mode.
One aspect features a non-transitory physical electronic storage including processor-executable instructions stored thereon that, when executed by a processor, are configured to implement a system for selecting a mode for synchronizing audio playback between a first electronic device and a second electronic device. The system is configured to: receive, at a first electronic device, audio data associated with a video; wirelessly communicate clock information associated with the first electronic device to a second electronic device over a wireless network to establish a synchronized clock between the first electronic device and the second electronic device; programmably select an audio synchronization mode based at least in part on one or more video or network characteristics; and transmit the audio data from the first electronic device to the second electronic device according to the selected audio synchronization mode.
For the purpose of summarizing the disclosure, certain aspects, advantages, and novel features of the invention have been described herein. It is to be understood that not necessarily all such advantages may be achieved in accordance with any particular embodiment of the invention. Thus, the invention may be embodied or carried out in a manner that achieves or optimizes one advantage or group of advantages as taught herein without necessarily achieving other advantages as may be taught or suggested herein.
Drawings
Fig. 1 illustrates an example system configured for synchronized media playback between a video device and two audio devices.
Fig. 2 illustrates an example system configured for synchronized media playback between a video device and an audio device.
Fig. 3 illustrates an example process for selecting a synchronization mode for transmitting audio.
FIG. 4 shows a more detailed example process for selecting a synchronization mode for transmitting audio.
Fig. 5 illustrates an example process for media delivery processing according to semi-isochronous mode.
Figs. 6A, 6B, and 6C illustrate example processes for receiving and rendering audio according to a semi-isochronous mode.
Fig. 7 illustrates an example process for media delivery processing according to a deterministic mode.
FIG. 8 illustrates an example process for receiving and rendering audio according to a deterministic mode.
FIG. 9 shows an example process for initializing clock synchronization.
FIG. 10 shows an example process for re-synchronizing and correcting clock drift.
Detailed Description
Introduction
While televisions typically include built-in speakers for rendering audio, some viewers desire to watch the television while listening to the audio through wireless headphones. Bluetooth wireless headsets can be used for this purpose, but the Bluetooth protocol is not satisfactory because Bluetooth wireless headsets are often not synchronized with the television video. Viewers are thus subjected to the undesirable experience of hearing dialogue and other audio that is disconnected in time from the corresponding video. Therefore, Bluetooth headsets are not preferred by consumers.
Unfortunately, the use of WiFi protocols (e.g., IEEE 802.11x) to deliver television audio has not adequately addressed this synchronization problem. The main reason for this is that the wireless protocols are asynchronous. The media source and the audio receiver may use separate clocks and their respective clock signals may not be synchronized or drift away during playback of the media file. Furthermore, the wireless network performance, including the latency level, may vary such that the audio data is more or less unpredictably delayed.
Certain embodiments of the present disclosure describe improved synchronization techniques for synchronizing video on one device (e.g., a Television (TV)) with corresponding audio played on another device (e.g., a phone or tablet). For example, through headphones connected to the mobile device, the listener can listen to audio that is correctly (or approximately) synchronized with the corresponding video played on the TV. A television, or more generally any media source (such as a set-top box or computer), may perform synchronization with a mobile application installed on a mobile device. Thus, some or all of the problems encountered by users of bluetooth headsets may be overcome.
Also, because the synchronization may be accomplished using mobile devices, the media source may provide the desired audio to different listeners by wirelessly transmitting individual audio streams to each listener's mobile device. In this way, each listener can listen through separate headphones and adjust various audio settings, such as volume or equalization (e.g., bass-to-treble balance).
In some embodiments, the systems and methods described herein can perform synchronized playback by selecting from at least two different modes (a deterministic mode and a semi-isochronous mode). The deterministic mode can include, among other features, delaying the video and playing back audio on the mobile device based on this known delay. The semi-isochronous mode, which may be used when deterministic processing is not available or based on other criteria, may include compressing the audio (even using lossy compression techniques) so that the audio can be transmitted as quickly as possible. These playback modes may be selected based on one or more factors, such as hardware buffering capabilities, network performance, or the type of media being played. Various clock synchronization, buffering, clock drift correction, and data processing methods may also be employed to improve synchronization.
Although many of the following examples are described for convenience from the perspective of TVs and smartphones (see, e.g., fig. 1), it should be appreciated that these concepts generally extend to media sources and audio sinks (see, e.g., fig. 2). Furthermore, as used herein, the term "synchronize" and its derivatives refer to actual or approximate synchronization in addition to having their conventional meaning. Although the described systems and methods may achieve better synchronization than existing systems, a small delay between video and audio may exist, but is not noticeable to the listener. Thus, as described herein, "synchronization" may include approximate synchronization with no perceptible delay. Even such synchronization may have a small delay in some cases noticeable to the user due to network conditions, but the delay may generally be less intrusive than is achieved by currently available systems.
Detailed example System
Fig. 1 and 2 provide an overview of an example system in which the synchronization features described above may be implemented. Subsequent figures, fig. 3-10, describe embodiments of synchronization processes that may be implemented in systems such as those shown in fig. 1 and 2.
Fig. 1 illustrates an example system 100 configured for synchronized media playback between a video device and two audio devices, which may implement any of the methods described herein. The system 100 includes a set top box 101, a television system 103, a mobile device 105, a mobile device 107, and a network 109, which may be part of a wireless local area network or a wide area network.
The set top box 101 includes a processor 111, a media library 113, an interface 115, a control interface 117, and an external media interface 121 for receiving user input 119. The TV system includes a processor 123, a video buffer 125, a display 127, an interface 129, and a Wi-Fi transceiver 131. The first phone includes a processor 133, a user interface 135, a Wi-Fi transceiver 137 and a wireless transmitter 139 for transmitting to a wireless speaker 141 (optionally a wired speaker or headset may be used). The second phone includes a processor 143, a user interface 145, a Wi-Fi transceiver 147, an audio driver 149 and an auxiliary output 151 that can output audio to headphones 153.
The set top box 101 provides a source of media, such as a movie, video clip, video stream, or video game. Media may be received through the external media interface 121. The external media interface 121 may be a coaxial connection, HDMI connection, DVI connection, VGA connection, component connection, cable connection, ethernet connection, wireless connection, etc., to the internet, game console, computer, cable provider, broadcaster, etc. Alternatively, the set-top box 101 may play media from a local media library 113 stored on a local computer-readable storage medium, such as a hard disk or Blu-ray disc (not shown).
A user may provide user input 119 using a remote control, smart phone, or other device through control interface 117. One or more processors 111 in the set-top box may process the user input to communicate the selected media information to the TV system through interface 115. The interface 115 may be a coaxial connection, HDMI connection, DVI connection, VGA connection, component connection, cable connection, ethernet connection, bus, etc. to interface 129 on the TV system.
The TV system 103 receives media for playback (in other embodiments, the TV system is a media source) through the interface 129. The one or more processors 123 may process the media to manipulate the audio data and the video data. The video data may be buffered in a video buffer 125 and then rendered on a display screen 127. However, the video buffering duration, and whether to buffer video at all, may be determined according to the methods described herein. The audio data may be transmitted over the network 109 to the Wi-Fi transceiver 137 on the mobile device 105 and the Wi-Fi transceiver 147 on the mobile device 107 via the Wi-Fi transceiver 131 of the TV system 103. The same or different audio data may be transmitted to both mobile devices 105, 107. In some embodiments, the same audio data may be transmitted to both mobile devices 105, 107, and then both devices 105, 107 may individually adjust audio parameters such as volume or equalization (e.g., bass/treble balance or more detailed multi-band adjustment). In some embodiments, the TV system 103 may send audio data in different languages to the two devices 105, 107, such as English audio to the mobile device 105 and Spanish audio to the mobile device 107. In some embodiments, one or both of the mobile devices 105, 107 may output enhanced audio, such as narration for the blind, personally processed audio, or locally stored audio data.
The mobile device 105 receives audio data over the Wi-Fi connection 137. The audio data is processed by the one or more processors 133 and rendered for output over the wireless interface 139 for playback over the wireless speakers 141. The user interface 135 of the mobile device 105 allows a user to interact with the system 100. For example, a user may select media content to play using an application on device 105; issuing playback commands such as start, stop, fast forward, skip and rewind; or otherwise interact with the set top box 101, TV system 103, or mobile device 105. In some embodiments, the user interface may be used to select local media on device 105 to be played by other devices on network 109, such as by TV system 103.
The mobile device 107 receives audio data over the Wi-Fi connection 147. The audio data is processed by the one or more processors 143 and driven through an audio driver 149 for playback through a headset 153 connected to the secondary output port 151 of the second phone. The user interface 145 of the mobile device 107 allows a user to interact with the system 100.
In various embodiments, the audio data received by mobile device 107 may be the same as or different from the audio data received by mobile device 105. When the audio data received by mobile device 105 and mobile device 107 are the same, the system may behave in a broadcast or multicast configuration.
More general example System
Fig. 2 illustrates an example system 200 configured for synchronized media playback between a video device and an audio device, which can implement any of the methods described herein. The system includes a media source 201 and a media player 203 connected to an audio receiver 207 via a network 205. The media source 201 includes media content 209. Media player 203 includes a receiver 211, one or more processors 213, memory, storage or memory and storage 215, a video buffer 217, a video renderer 219, and a wireless broadcaster 221. The audio receiver 207 includes a wireless receiver 223, a processor 225, a memory, storage or memory and storage 227, and an audio output 229.
The media source 201 provides media content 209 to the media player 203. The media source 201 may be a set-top box (e.g., satellite or cable), cable box, television, smart television, internet provider, broadcaster, smart phone, media streaming device (e.g., Google Chromecast™, etc.), media player (e.g., Roku™ device, etc.), video game console, Blu-ray player, another computer, a media server, an antenna, combinations thereof, and the like. In some embodiments, the media source 201 may be a portion of the media player 203, such as a locally stored media library in a hard drive of the media player 203, or the media source 201 may be the media player 203.
Media player 203 receives media content 209 for playback through receiver 211. The media player may be, for example, a TV, a computer, an audio-video receiver ("AVR"), etc. The one or more processors 213 may process the media content to manipulate the audio data and the video data. The video data may be buffered in video buffer 217 and then rendered by video renderer 219. In some embodiments, the video renderer 219 may be a display screen, a monitor, a projector, a virtual reality headset, and the like. However, the video buffering duration, and whether to buffer the video at all, may be determined according to the methods described herein. In some embodiments, the media player 203 may have limited video buffering or may lack support or hardware for video buffering altogether.
The audio receiver 207 receives audio data through the wireless receiver 223. The audio data is processed by one or more processors 225 and rendered for output through an audio output 229, such as headphones, wired speakers, or wireless speakers.
Overview of an example synchronization Process
Fig. 3 illustrates an example process 300 for selecting a synchronization mode for transmitting video and audio. The process may be implemented by any of the systems described herein. For example, software executing on a media source or media player may implement process 300.
At block 301, a video player (or other media source) and an audio player are identified on a network. This may occur, for example, when the video player and audio player connect to a network and receive a unique connection identification or address. Each of the video player and the audio player may run an application configured to facilitate synchronized audio and video playback.
At block 303, criteria for selecting a synchronization mode are evaluated. For example, the criteria discussed below with respect to FIG. 4 may be used. Example synchronization modes include, but are not limited to, deterministic and semi-isochronous modes (see block 307).
At block 305, clock synchronization may be established between the video player and the audio player. This may be performed using one-way or two-way communication between the video player and the audio player. One example of a two-way communication clock synchronization system is Precision Time Protocol (PTP). In some embodiments, clock synchronization may be performed using a one-way broadcast over the network from the video player to one or more audio players. Embodiments that synchronize clocks based on a unidirectional broadcast may avoid the clock miscalculations caused by asymmetric uplink and downlink network times that affect some types of bidirectional clock synchronization schemes. Further, embodiments that synchronize clocks based on a unidirectional broadcast may perform a single synchronization with multiple devices, rather than waiting for each device to respond individually. An example method for establishing clock synchronization is described below with respect to fig. 9.
The video player and the audio player may initially start with clocks that are neither synchronized nor known to each other. Once clock synchronization has been established, the synchronized clock signal can be used to synchronize audio and video playback between the video player and one or more audio players. As described in more detail below, the synchronized clock may be used to time or track various parameters, such as delay periods, playback times, staging times, margin times, timestamps, and other parameters.
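The one-way broadcast clock synchronization described above can be sketched as follows. This is an illustrative assumption, not the patent's implementation: the receiver treats each broadcast timestamp as a lower-latency bound and keeps the offset estimate implied by the fastest-observed packet.

```python
# Hypothetical sketch of one-way broadcast clock synchronization.
# The class and method names are illustrative, not from the patent.

class OneWayClockSync:
    """Estimate the offset between the sender's (video player's) clock
    and the local clock from timestamps the sender broadcasts."""

    def __init__(self):
        self.offset = None  # estimated (sender_clock - local_clock)

    def on_broadcast(self, sender_timestamp, local_receive_time):
        # Network latency only delays a packet, so each broadcast gives
        # a lower bound on the true offset. Keeping the maximum of
        # (sender_timestamp - local_receive_time) converges toward the
        # offset implied by the fastest-observed packet.
        candidate = sender_timestamp - local_receive_time
        if self.offset is None or candidate > self.offset:
            self.offset = candidate

    def to_sender_time(self, local_time):
        """Map a local clock reading onto the sender's timeline."""
        return local_time + self.offset
```

For example, a broadcast stamped 1000 arriving at local time 400 yields a candidate offset of 600; a faster packet stamped 1010 arriving at 408 refines it to 602.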
At block 307, a deterministic or semi-isochronous synchronization mode is selected based on the evaluated criteria. At block 309, audio and video data is transmitted according to the selected synchronization mode.
Example procedures for selecting synchronization mode
Fig. 4 illustrates an example process 400 for selecting a synchronization mode (deterministic or semi-isochronous) for transmitting audio. The process may be implemented by any of the systems described herein. For example, software executing on a media source or media player (hereinafter generally referred to as a "system") may implement process 400. In this embodiment, the system evaluates the various criteria depicted in blocks 401-409 to select either semi-isochronous or deterministic synchronization. This process may be performed by the media source or media player at a time after manufacture (e.g., in the listener's home). In one embodiment, the system defaults to one mode (e.g., deterministic), but switches to the other mode in response to one or more criteria being met. In another embodiment, fewer than all of the criteria are considered.
At block 401, the system determines whether the buffer of the video device (or of the media source) is sufficient for deterministic mode. Some videos, if uncompressed, may be too large to buffer, so it may not be possible (or may be less desirable, due to degradation in video performance) to delay those videos for deterministic processing. Other video is delivered in compressed form and is more easily buffered. This step may involve identifying whether the video is compressed, identifying the corresponding size of the buffer, and so on. Alternatively, the buffering capability of the system may be encoded at manufacture with metadata or the like to indicate whether deterministic mode is available, or to what extent it is available, depending on the bandwidth of the incoming video.
In another embodiment, this step (401) further comprises determining whether the video device even supports buffering. This determination may be based on the presence of buffering hardware or based on the capabilities of the firmware and software supporting buffering. In some embodiments, this determination may be made based on a lookup of the model of the system in a list of models known to have or not have buffering capabilities. In some embodiments, this determination may be made by attempting to issue a test command to buffer the video. If the video device does not support buffering, then a semi-isochronous mode is selected in block 411.
If the video device supports buffering at block 401, then at block 403 the system determines whether a particular media type classified as unsuitable for buffering is detected as a media source, such as a video game. Many video games are not suitable for video buffering used in deterministic mode because any delay in the video may be undesirable to the user. For example, the type of media used may be determined based on the presence of a video game console connected to the system through certain ports, based on the names of one or more running processes detected in the system, combinations of the above, and the like. If a video game (or some other particular type of media) is detected, then a semi-isochronous mode is selected in block 411.
If media other than a video game (or other media not suitable for buffering) is detected as the media source at block 403, the network may be tested for stability at block 405. This may be done, for example, by sending a ping over the network and measuring changes in ping time, by sending packets over the network and measuring changes in packet transfer time, by transmitting data to or receiving data from a remote host and measuring changes in transmission and reception speeds, etc. The test may be from the video player to the network, between the media player and the audio player, or to a remote internet source. The test measurements may be based on the synchronized clock. If the network transit time is not stable (e.g., changes by more than a certain amount or percentage, the minimum speed drops below a certain amount, etc.), then the semi-isochronous mode is selected in block 411.
If the network transfer time is stable at block 405, then the network is tested for bandwidth at block 407. This may be accomplished, for example, by sending packets (e.g., typical amounts of audio data) over a network to an audio device or other destination. The rate at which data is received may be used to determine the bandwidth. The test measurements may be based on the synchronized clocks described above with respect to fig. 3 (see also fig. 9). If the network bandwidth is insufficient (e.g., average bandwidth is below a certain threshold, minimum bandwidth drops below a certain amount, etc.), then a semi-isochronous mode is selected in block 411.
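The stability test of block 405 and the bandwidth test of block 407 can be sketched with two small checks. The threshold values and function names below are assumptions for illustration; the patent leaves the specific thresholds open.

```python
# Illustrative sketches of the block 405 (stability) and block 407
# (bandwidth) checks. Thresholds are hypothetical defaults.

def is_network_stable(ping_times_ms, max_jitter_ms=50.0):
    """Block 405: transfer time is 'stable' if the spread of measured
    ping times stays within a fixed jitter budget."""
    return (max(ping_times_ms) - min(ping_times_ms)) <= max_jitter_ms

def has_sufficient_bandwidth(bytes_received, elapsed_s, min_kbps=256.0):
    """Block 407: compute the average receive rate from a test transfer
    (e.g., a typical amount of audio data) and compare to a threshold."""
    kbps = (bytes_received * 8 / 1000.0) / elapsed_s
    return kbps >= min_kbps
```

Receiving 64 kB of test audio in one second, for instance, implies 512 kbps, which would pass the hypothetical 256 kbps floor.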
If it is determined at block 407 that the network has sufficient bandwidth, then at block 409, the transmitter and receiver may determine whether both the video player and the audio player support playback via deterministic mode. This may be based on whether clocks are synchronized or can be synchronized between the video player and the audio player, and whether applications in both the video device and the audio device have configured the respective devices to support deterministic mode. If not, the semi-isochronous mode is selected at block 411. If both the video player and the audio player are synchronized for deterministic mode, then the deterministic mode is selected at block 413. When network conditions change, one or more of the blocks shown in fig. 4 may be performed again to re-evaluate the network conditions and change the playback mode. Thus, the playback mode may change in real-time based on real-world network conditions.
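The decision flow of blocks 401-413 can be condensed into a single selection function. This is a sketch of the flowchart's logic under the assumption that each criterion has already been evaluated to a boolean:

```python
# Sketch of the FIG. 4 decision flow; parameter names are illustrative.

def select_sync_mode(supports_buffering, buffer_sufficient, is_video_game,
                     network_stable, bandwidth_sufficient,
                     both_support_deterministic):
    """Walk the criteria of blocks 401-409; any failure falls back to
    semi-isochronous (block 411), otherwise deterministic (block 413)."""
    if not (supports_buffering and buffer_sufficient):   # block 401
        return "semi-isochronous"
    if is_video_game:                                    # block 403
        return "semi-isochronous"
    if not network_stable:                               # block 405
        return "semi-isochronous"
    if not bandwidth_sufficient:                         # block 407
        return "semi-isochronous"
    if not both_support_deterministic:                   # block 409
        return "semi-isochronous"
    return "deterministic"                               # block 413
```

Because conditions can change, this function would be re-run as network conditions are re-evaluated, allowing the mode to switch in real time.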
Semi-isochronous mode example procedure
Fig. 5 illustrates an example process 500 for media processing according to a semi-isochronous mode. The process may be implemented by any of the systems described herein. For example, software executing on a media source or media player (hereinafter generally referred to as a "system") may implement process 500.
The semi-isochronous mode may be selected to quickly transmit and render audio. In a semi-isochronous system, audio data may be compressed by a media player, transmitted to an audio player, and if received within a reasonable time delay, the audio data may be rendered.
By compressing the data, the semi-isochronous mode may perform better than the deterministic mode when the network connection is congested and has low available bandwidth. Also, the semi-isochronous mode may perform better than the deterministic mode when network speeds are highly variable and audio packets may not reach the target destination before the delay time, because the semi-isochronous mode does not involve buffering received audio to be played at a particular delay time (in some embodiments). Nonetheless, the deterministic mode may provide better performance in other scenarios where a fast, stable network connection is available.
At block 501, audio data is packetized using, for example, currently available internet protocols (such as TCP/IP).
At block 503, the audio data is compressed. Because the compressed audio data uses less bandwidth, it can be transmitted faster over congested networks. Compression schemes may be lossless or lossy, where some lossy algorithms may provide faster performance (due to higher compression rates) at the expense of audio signal degradation. At block 505, a packet loss rate is optionally measured.
At block 507, Forward Error Correction (FEC) packets are optionally generated based on the loss rate. FEC may provide improved packet loss recovery for unidirectional communication such as the audio transport described herein. In addition, FEC may be particularly useful because the short time within which audio data must be rendered may not leave sufficient time to transmit a retransmission request (followed by the retransmitted payload). Although many FEC algorithms may be used, in one embodiment, the system generates FEC packets by applying an XOR operation to one or more packets, as described, for example, in RFC 5109, "RTP Payload Format for Generic Forward Error Correction" (2007) (which is incorporated herein by reference in its entirety). At block 509, the FEC packets may be interleaved with the audio packets. At block 511, the packets may be transmitted to a receiver.
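The XOR-based FEC generation referenced above (block 507) can be sketched as follows. This illustrates the generic XOR parity idea of RFC 5109 for equal-length payloads; real RTP FEC adds headers and handles unequal lengths, which are omitted here.

```python
# Sketch of XOR parity FEC over equal-length audio payloads,
# in the spirit of RFC 5109 (headers and padding omitted).

def make_fec_packet(packets):
    """XOR a group of equal-length payloads into one parity packet.
    If exactly one packet of the group is lost, XOR-ing the survivors
    with the parity recovers it."""
    parity = bytearray(len(packets[0]))
    for pkt in packets:
        for i, b in enumerate(pkt):
            parity[i] ^= b
    return bytes(parity)

def recover_lost(surviving_packets, fec_packet):
    """Reconstruct the single missing packet of a protected group:
    the XOR of all survivors and the parity equals the lost payload."""
    return make_fec_packet(list(surviving_packets) + [fec_packet])
```

For block 509, these parity packets would then be interleaved among the audio packets so that a burst loss does not take out both a packet and its protection.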
At block 513, video may be rendered on a first device, such as a TV system. In some embodiments, the video is rendered without buffering. In some embodiments, the video may be buffered briefly based on an average or minimum expected transfer and processing time. The buffering time may be estimated based on network tests (e.g., the tests discussed with respect to blocks 405 and 407 of fig. 4), and may be selected such that the audio will play at or just after the video (e.g., within 1 to 3 frames). For example, if the network ping typically ranges from 200 to 300 ms (occasionally dropping as low as 100 ms), then the buffer may be 100 ms (optionally plus the fastest audio rendering time).
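The buffer-duration estimate in the example above amounts to holding video for roughly the fastest observed network time, optionally plus the fastest audio rendering time. A minimal sketch (function name assumed):

```python
# Hypothetical sketch of the short video-buffer estimate for the
# semi-isochronous mode, matching the worked example in the text.

def semi_iso_buffer_ms(ping_samples_ms, fastest_render_ms=0.0):
    """Buffer the video for about the fastest observed network transfer
    time so the audio plays at or just after the video."""
    return min(ping_samples_ms) + fastest_render_ms
```

With pings of 200-300 ms and an occasional 100 ms fast path, this yields the 100 ms buffer from the example.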
Fig. 6A, 6B, and 6C illustrate example processes 600, 620, and 650, respectively, for receiving and rendering audio according to a semi-isochronous mode. For example, software executing on an audio receiver such as a mobile device may implement processes 600, 620, and 650.
Fig. 6A illustrates a first example process 600 for receiving and rendering audio according to a semi-isochronous mode. At block 601, an audio receiver receives compressed audio packets. At block 603, the receiver determines whether the compressed audio packets are received within a first threshold. If the compressed audio packets are not received within the first threshold, the audio receiver discards the compressed audio packets at block 605. The first threshold may be approximately the time it would take to render two video frames at the TV (about 66ms in some systems). If the compressed audio packets are received within the first threshold, the audio receiver stores the compressed audio packets in a holding buffer at block 607.
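Process 600 reduces to an accept-or-discard decision on each arriving packet. A sketch of blocks 601-607, using the two-video-frame threshold (about 66 ms) mentioned above; the function and parameter names are assumptions:

```python
# Sketch of process 600 (blocks 601-607): accept or discard a
# compressed audio packet based on its arrival delay.

TWO_FRAMES_MS = 66  # roughly two video frames, per the text

def handle_compressed_packet(packet, arrival_delay_ms, holding_buffer,
                             first_threshold_ms=TWO_FRAMES_MS):
    """Store a packet in the holding buffer only if it arrived within
    the first threshold (block 603); otherwise discard it (block 605).
    Returns True if the packet was stored (block 607)."""
    if arrival_delay_ms > first_threshold_ms:
        return False  # block 605: too late, discard
    holding_buffer.append(packet)  # block 607: store for staging
    return True
```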
Fig. 6B illustrates a second example process 620 for receiving and rendering audio according to a semi-isochronous mode. At block 621, the audio receiver searches for the first packet in the holding buffer. The first packet may be a compressed audio packet stored in the holding buffer, for example, at block 607. At block 623, a determination is made whether the audio receiver found the first packet in the holding buffer. If the first packet is not found, then at block 625, a determination is made as to whether FEC or redundant data is available to construct the first packet. The receiver may attempt to identify redundant or FEC correction data. For example, redundancy or correction data may be obtained from previously transmitted error correction packets (such as FEC packets). Correction data may also be derived for a dropped packet when previous and subsequent packets are available, by estimating the missing packet data using curve-fitting or other techniques. If FEC data is determined to be available, then at block 629, the first packet is reconstructed from the FEC data.
If the first packet is found or reconstructed in the holding buffer, then at block 631, a determination is made as to whether the packet arrival time threshold has expired. If the packet arrival time threshold has not expired, then at block 633, the first packet is decompressed to generate decompressed audio data. At block 635, the decompressed audio data is added to a scratch buffer of the audio receiver. At block 637, the audio receiver increments the packet index to find the next packet (e.g., after processing the first packet, to find the second packet).
If no FEC data is available for reconstructing the first packet at block 625, a determination is made at block 627 as to whether the packet arrival time has expired to find the first packet. If the packet arrival time has not expired, process 620 may again proceed to 621 so that the first packet may continue to be searched from the holding buffer.
If the threshold arrival time has expired for the first packet at either of blocks 627 or 631, the process may proceed to block 639, where a silence packet is inserted into the scratch buffer. In some embodiments, duplicate packets may be inserted into the scratch buffer instead of silence packets. After some delay, such as a delay of two video frames, the audio packets may be detectably out of sync with the video, so it may be preferable to drop such audio packets. Then, at block 637, the packet index may be incremented and process 620 may be repeated for the next audio packet.
Fig. 6C illustrates a third example process 650 for receiving and rendering audio according to a semi-isochronous mode. At block 651, it may be determined whether a threshold time has expired to render the next item in the scratch buffer. If at block 651 the threshold time for rendering the next item in the scratch buffer has not expired, the process 650 may repeat block 651 until the threshold time has expired. When the threshold time expires, the audio receiver may begin rendering the next item in the scratch buffer at block 653. The next item may be decompressed audio data added to the scratch buffer of the audio receiver, for example at block 635. The process may then return to block 651.
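The receive-side flow of processes 600, 620, and 650 may be sketched as follows. This is an illustrative simplification only, not the claimed implementation: the class name, the decompress placeholder, the silence representation, and the 66 ms threshold default are all assumptions made for the sketch.

```python
from collections import deque

TWO_VIDEO_FRAMES_S = 0.066  # assumed ~two frames at 30 fps (block 603 threshold)


class SemiIsochronousReceiver:
    """Sketch of the holding-buffer / scratch-buffer flow of Figs. 6A-6B."""

    def __init__(self, arrival_threshold_s=TWO_VIDEO_FRAMES_S):
        self.arrival_threshold_s = arrival_threshold_s
        self.holding = {}       # packet index -> compressed audio packet
        self.scratch = deque()  # decompressed audio ready to render
        self.next_index = 0     # index of the next packet to process

    def on_packet(self, index, compressed, age_s):
        # Blocks 603/605: discard packets older than ~two video frames.
        if age_s > self.arrival_threshold_s:
            return False
        self.holding[index] = compressed  # block 607: store in holding buffer
        return True

    def process_next(self, fec_lookup=None, deadline_expired=False):
        # Blocks 621/623: search the holding buffer for the next packet.
        packet = self.holding.pop(self.next_index, None)
        if packet is None and fec_lookup is not None:
            packet = fec_lookup(self.next_index)  # blocks 625/629: FEC rebuild
        if packet is not None and not deadline_expired:
            # Blocks 633/635: decompress and append to the scratch buffer.
            self.scratch.append(decompress(packet))
        elif deadline_expired:
            self.scratch.append(SILENCE)  # block 639: insert silence
        else:
            return False  # block 627: deadline not expired, keep waiting
        self.next_index += 1  # block 637: advance to the next packet
        return True


def decompress(packet):
    """Placeholder for a real audio codec."""
    return ("pcm", packet)


SILENCE = ("pcm", b"\x00")
```

In use, `on_packet` would be driven by the wireless receiver and `process_next` by a timer tied to the synchronized clock, with the renderer of Fig. 6C draining `scratch` at each render deadline.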
Deterministic mode example process
Fig. 7 illustrates an example process 700 for media processing according to a deterministic mode. The process may be implemented by any of the systems described herein. For example, software executing on a media source or media player (hereinafter generally referred to as a "system") may implement process 700.
In deterministic mode, the delay time may be determined to allow sufficient time for audio delivery and rendering in the presence of normal network fluctuations, while taking into account the delay in video rendering. The media source may deliver the audio payload once before the audio is scheduled for rendering. Once transferred to the second device, the audio payload may be buffered until the playback time, thus rendered in synchronization with the video. While audio is being transferred to and processed in the second device, video data may be buffered in the first device until playback time, and then the video data may be rendered. The media source and the receiving device may use the synchronizing clock signal to synchronize audio and video output (see, e.g., fig. 9 and 10).
At block 701, a delay time is determined. The delay time may be long enough so that the audio packets may be received by the audio device and rendered at the delayed time in synchronization with the video rendered by the video device. The maximum delay time may be determined by the video buffer capacity. The delay time may be set to be greater than the average or maximum expected delivery and processing time. The delay time may be set to be greater than the average transmit and process time plus a multiple of the standard deviation. If the delay time is long enough so that most packets can be rendered simultaneously, then the remaining small number of packets can be concealed using error concealment or correction techniques.
The delay time may be predetermined (e.g., at manufacture) or may be estimated based on network testing (e.g., the testing discussed with respect to blocks 405 and 407 of fig. 4). For example, if the network ping typically ranges from 400 to 500ms (with occasional lags of up to 900ms), then the delay time may be 900ms (optionally plus the time it takes to render the audio after receiving the audio packet), if this is supported by the video buffer.
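A hedged sketch of the delay-time estimation described above, using the mean-plus-standard-deviations rule and the worst observed ping, capped by the video buffer capacity. The function name, the `k_std` multiplier, and the default values are illustrative assumptions, not values from the disclosure.

```python
import statistics


def estimate_delay_time(ping_samples_ms, render_time_ms=0.0,
                        k_std=3.0, video_buffer_cap_ms=5000.0):
    """Pick a deterministic-mode delay time from network test samples.

    The candidate is the larger of (mean + k_std standard deviations) and
    the worst observed ping, plus the time to render after receipt, and is
    limited by the video buffer capacity. All defaults are assumptions.
    """
    mean = statistics.mean(ping_samples_ms)
    std = statistics.pstdev(ping_samples_ms)
    candidate = max(mean + k_std * std, max(ping_samples_ms)) + render_time_ms
    return min(candidate, video_buffer_cap_ms)
```

For the example above (pings of 400-500 ms with occasional lags near 900 ms), this yields a delay of at least 900 ms, provided the video buffer supports it.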
The delay time may be measured based on the synchronized clock and may take the form of an amount of time to delay after a timestamp (e.g., 2 seconds and 335 milliseconds). In some embodiments, the delay time is set in the form of the presentation time at which the audio and video are to be played (e.g., the video is buffered at 12:30:00.000pm and playback is set for 12:30:02.335pm). In embodiments featuring multiple devices configured to play back audio, the measurement may be based on a worst-case measurement among all of the multiple devices (while still within the buffering hardware capability). In some embodiments, the delay time may be 0.5, 1, 2, 3, 4, 5, or 6 seconds, or another similar time.
At block 703, the video is delayed for a delay time. The delay may be implemented using a video buffer, such as video buffer 217 of fig. 2. Some embodiments buffer the compressed video.
At block 705, the video may be played on the first device at the delayed time, which is determined based on the synchronized clock. In some embodiments, playing the video may include decompressing the video to render it. Playback of video from the buffer can be very fast, but not instantaneous; thus, in some embodiments, to further refine performance, minor adjustments to the playback time may be made when determining the timing parameters (e.g., the delay time).
At block 707, the audio data is packetized. The packets may be formed using timestamps based on the synchronous clock, sequence numbers of the packets, bit rates, buffer time, slack time, other header information, and an audio payload including compressed audio data. The timestamp may be initialized to the current time T0 of the media device that transmits the packet. For a subsequent packet S, the subsequent timestamp or scratch time may be calculated by adding (S x D) to the initial timestamp or initial scratch time, where D represents the playback time of one packet and S is an integer. Calculating subsequent timestamps or scratch times, instead of reporting the actual time, may in some embodiments improve the algorithm that corrects for clock drift. For example, when the current time measured by the clock of the audio device has drifted outside of an expected range compared to the timestamp or the scratch time, a clock resynchronization may be performed to correct the clock drift.
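The T0 + (S x D) timestamp rule and the drift check just described might be sketched as follows; the function names and the 50 ms drift tolerance are illustrative assumptions.

```python
def packet_timestamps(t0_ms, packet_duration_ms, count):
    """Compute the timestamp series T0 + (S x D) rather than sampling the
    clock per packet: t0_ms is the sender's current time when the first
    packet is built, packet_duration_ms (D) is one packet's playback time."""
    return [t0_ms + s * packet_duration_ms for s in range(count)]


def drift_exceeds(expected_ms, measured_ms, tolerance_ms=50.0):
    """Flag when the receiver's clock has drifted outside the expected
    range relative to the computed timestamp, indicating that a clock
    resynchronization should be performed (tolerance is an assumed value)."""
    return abs(measured_ms - expected_ms) > tolerance_ms
```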
At block 709, FEC packets may be generated as described above and interleaved with the packetized audio data. This may be done at the payload level, at the packet level, or at another level. FEC packets may be generated based on the loss rate.
At block 711, the audio packet is transmitted over the network to the second device.
Fig. 8 illustrates an example process 800 for receiving and rendering audio according to a deterministic mode. The process may be implemented by any of the systems described herein. For example, software executing on an audio receiver such as a mobile device may implement process 800.
At block 801, audio packets are received. For example, with respect to fig. 2, the wireless receiver 223 of the audio receiver 207 may receive audio packets transmitted over the network 205.
At block 805, the audio packet is unpacked, and optionally a capture time for the audio packet may be determined. The capture time may be measured based on the synchronized clock. Various embodiments may unpack the packet at different levels of hardware/firmware/software. The synchronized-clock-based timestamp, the sequence number of the packet, the bit rate, the buffer time, the slack time, other header information, and the audio payload including compressed audio data may be determined by unpacking the audio packet.
At block 807, a scratch time for the audio payload is determined. The scratch time may be measured based on the synchronized clock signal and identifies the time or frame at which the video is or is to be played.
At block 809, the allowance time for the audio payload is determined. The allowance time may be an amount of time before or after a scratch time at which the audio payload may be rendered. In some embodiments, the slack time is determined and then communicated in a separate packet (e.g., at the beginning of a series of packets, or periodically inserted into a series of packets). The allowance time may be, for example, 20, 30, 33, 46, 50, 60, 66, 80, 99, or 100 milliseconds; 0.5, 1, 1.5, 2, 2.5, or 3 frames; and so on.
At block 811, a comparison is made to determine whether the audio packet was captured within the scratch time plus (or minus) the allowance time. If the audio packet is captured within the scratch time plus the allowance time, the process proceeds to block 813, where the audio payload is buffered (block 814) and rendered (block 816).
If the time at which the audio packet is captured at block 811 is after the scratch time plus the allowance time, then redundancy or correction data may be obtained and the payload constructed at block 815. Block 815 may be performed at a time that is still within the scratch time plus or minus the allowance time. For example, redundancy or correction data may be obtained from previously transmitted error correction packets (such as FEC packets). Correction data may also be extracted for dropped packets if subsequent packets are available, by estimating missing packet data based on data in previous and subsequent packets using curve fitting techniques. At block 817, an audio payload constructed from the redundancy or correction data is rendered.
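The decision made across blocks 811-819 can be summarized in a small sketch. The function name and the (action, payload) return convention are illustrative, and the buffering step between the decision and rendering is omitted here.

```python
def handle_audio_packet(capture_time, scratch_time, allowance,
                        payload, fec_payload=None):
    """Decide what to render for one packet in deterministic mode.

    Returns an (action, payload) pair: 'render' corresponds to blocks
    813-816, 'render_fec' to blocks 815/817, and 'fill' to block 819
    (silence or a copy of the previous packet).
    """
    # Block 811: captured within the scratch time plus/minus the allowance?
    if abs(capture_time - scratch_time) <= allowance:
        return ("render", payload)
    # Block 815: redundancy or FEC correction data, if available.
    if fec_payload is not None:
        return ("render_fec", fec_payload)
    # Block 819: no data available in time; render a filler packet.
    return ("fill", None)
```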
If no redundancy or correction data is available within the scratch time plus the allowance time at block 815, silence or a copy of the previous packet may be rendered in place of the missing packet at block 819.
Example clock synchronization processing
Fig. 9 illustrates an example process 900 of an example method for initializing clock synchronization. Fig. 10 illustrates an example process 1000 of an example method for resynchronizing and correcting clock drift. These processes may be implemented by any of the systems described herein. Figs. 9 and 10 are discussed in U.S. Pat. No. 9,237,324, entitled "Playback Synchronization," filed on September 12, 2013, the entire contents of which are incorporated herein by reference.
The example method 900 may be performed by any media device having a wireless receiver (e.g., the audio receiver 207 of fig. 2). The example method 900 may be performed as part of block 305 of the example method 300 or at any time when synchronization of timing parameters between media devices is appropriate. Method 900 provides an example of clock synchronization based on unidirectional communication. The media device may repeatedly transmit ("flood") the clock signal as beacon messages over the network to the audio receiver in clock "ticks" at a constant rate (e.g., 100 ms). Due to variations in network performance, the audio receiver may receive the clock signal at varying times. However, the audio receiver can determine when a set of clock signals are received at an approximately constant rate (e.g., signals received at 100ms intervals) and use those timings as a basis for performing further statistical analysis to perfect clock synchronization. The audio receiver may do this without transmitting timing information to the media player.
The example method 900 begins at block 905 and proceeds to block 910, where the receiving device initializes a minimum offset variable "MinO" used to maintain a running minimum offset value as new messages are received or processed. Next, in block 915, the receiving device receives a beacon message from the transmitting device. Then, in block 920, the receiving device generates a timestamp based on the time currently represented by the clock of the receiving device. Such a timestamp may be referred to as a "receiver timestamp," R(x). The time elapsed between blocks 915 and 920 forms part of the fixed delay of the clock offset value to be calculated by the receiving device. As such, various implementations of method 900 strive to reduce or minimize the number of operations that occur between blocks 915 and 920.
In block 925, the receiving device extracts the sender timestamp "S(x)" from the beacon message. The sender timestamp is inserted into the beacon message by the sending device shortly before transmission of the beacon message. In block 930, the receiving device determines whether the sending device is a media source of the virtual media network. If so, the method 900 proceeds to block 935, where the receiving device translates the sender timestamp from the time domain of the sending device to the time domain of the virtual media network. Such a translation may involve adding or subtracting an offset previously negotiated between the two devices, and may be performed according to any method known to those skilled in the art. In some alternative embodiments, the source device and the media node maintain clocks in the same time domain; in some such embodiments, blocks 930 and 935 are omitted.
After translating the sender timestamp into the virtual media network domain in block 935, or after determining that the sender is not a media source in block 930, the method 900 proceeds to block 940, where the receiving device calculates an offset value, such as, for example, the difference between the two timestamps, based on the sender timestamp and the receiver timestamp. This current offset value "CurO" is equivalent to the true offset between the sender and receiver clocks plus any delay encountered by the beacon message between the creation of the two timestamps S(x) and R(x). As mentioned above, this delay consists of two components. The first component is a fixed delay associated with traversing the hardware and software components of the network, such as, for example, a constant delay associated with the circuits and data paths the message travels, along with the time taken by the OS between transmission/reception of the message and generation of the associated timestamp. Such a fixed delay may already be accounted for as part of the rendering process. The second component is a variable network delay that changes over time. For example, a shared-medium network such as Wi-Fi may wait for the medium to be clear before transmitting, and as such may introduce different delays at different times.
Since the variable delay introduces only additional delay (and does not remove the delay), a better estimate of the real clock offset is obtained from the least delayed message. Thus, the method 900 searches for the smallest offset value obtained during the flooding of the beacon message as the best available estimate of the true offset. In block 945, the receiving device compares the current offset CurO with the previously located minimum offset or, if the current iteration of the loop is the first time, the minimum offset value MinO initialized in block 910. If CurO is less than MinO, then it is known that CurO represents a closer estimate of the true offset between the sender and receiver clocks, and in block 950, the receiver device overwrites the value of MinO with the value of CurO.
In block 955, the receiving device determines whether the sending device has completed flooding the beacon message. For example, the recipient device may determine whether a timeout has occurred while waiting for additional beacon messages, may determine that the sender device has begun sending media messages, may determine that a predetermined number of beacon messages have been received, or may determine that the sending device has transmitted a special message indicating the end of the flooding. In various embodiments, the recipient device determines whether flooding is sufficient to establish the desired accuracy of the drift. For example, the receiving device may track the interval at which the beacon message is received and, based on a comparison of the measured interval to a known time interval, may determine whether the network is stable enough to produce the desired accuracy of the offset value. If the network is not stable enough, the receiving device transmits a message to the sending device indicating that additional flooding should be performed. Various modifications will be apparent. It will be apparent in light of the teachings herein that various combinations of these and other methods for determining the sufficiency of beacon message flooding may be employed.
If the receiving device determines that additional flooding is being or will be performed, the method 900 loops from block 955 back to block 915 to process additional beacon messages. Otherwise, the method 900 proceeds to block 960, where the receiving device resets the local clock based on the determined minimum offset. For example, the receiving device may subtract MinO from the current clock value to set the local clock to a new value estimated to be closer to the actual clock value of the transmitting device. In some embodiments where the fixed delay of the network is known or estimated, the receiving device subtracts MinO from the current clock value and adds back in the fixed delay value, in an attempt to isolate the true clock offset component of the calculated offset value. In some embodiments, the receiving device does not alter the local clock at all, but instead maintains the minimum offset value MinO for use in comparing timestamps received from the sending device to the local clock. For example, the receiving device may add MinO to a timestamp prior to any such comparison. Various other modifications will be apparent. Method 900 may then proceed to end in block 965. The clock as reset at the completion of method 900 may be considered a synchronized clock.
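The minimum-offset estimation of blocks 940-960 reduces to a small computation. The following sketch uses illustrative names and assumes the beacon flood has already been collected as (sender timestamp, receiver timestamp) pairs in a common time domain.

```python
def min_offset_from_flood(beacons):
    """Estimate the clock offset from a flood of beacon messages.

    Each offset R(x) - S(x) equals the true clock offset plus a
    non-negative variable network delay, so the minimum over the flood
    (blocks 940-950) is the best available estimate of the true offset.
    """
    return min(r - s for s, r in beacons)


def resynchronized_clock(local_time, min_offset, fixed_delay=0.0):
    """Block 960: subtract the minimum offset from the local clock,
    optionally adding back a known fixed delay to isolate the true
    clock offset component of the calculated offset."""
    return local_time - min_offset + fixed_delay
```

A lower-bound check, as in the alternative embodiments described below, would simply reject a new minimum that exceeds a previously established bound.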
In various alternative embodiments, the receiving device utilizes a previously established lower bound offset to help ensure that an unreasonably large offset calculated during the flooding period is not used to reset the clock. For example, if the flooding period coincides with a period of highly variable network delay, the calculated offset may be much larger than the actual offset between the sender and receiver clocks. In some such embodiments, the receiver first compares the minimum offset calculated in blocks 940-950 to a previously established lower bound offset to determine whether the minimum offset is greater than the lower bound offset. If so, the receiver declines to update the clock based on the minimum offset and continues to use the previously established lower bound. Otherwise, the receiver updates the clock as detailed in block 960, because the minimum offset value is less than the lower bound (and is therefore a better estimate).
In various embodiments, the receiving device periodically performs the method 900 to re-establish synchronization. In some such embodiments, the receiving device resets the clock to its original value, deletes stored offset values, or otherwise "rolls back" any changes made based on previous executions of method 900, thereby "restarting" when the clock offset is determined. By periodically re-establishing the clock offset, the receiving device may better account for clock drift between the clocks of the transmitting device and the receiving device.
It will be apparent in view of the teachings herein that while method 900 is described as a real-time method of processing each beacon message as it is received, various alternative embodiments utilize a method of processing the beacon messages as a batch. For example, in some such embodiments, the receiving device receives multiple beacon messages, time stamps the messages as they are received, and processes the received messages in sequence at a later time to locate the minimum offset in a manner similar to that described with respect to blocks 925 and 960.
It will be appreciated that, although the foregoing method attempts to generate the best estimate of the clock offset between the two devices, network conditions may temporarily improve after the initial flooding period, and a better estimate may be obtained later. Thus, methods may be employed after initial timing parameter establishment to attempt to better estimate the clock offset. Such an approach may also address the possibility of clock drift, where differences in crystals, temperature, or other parameters may cause the transmitting device clock and the receiving device clock to operate at slightly different rates.
FIG. 10 illustrates an example process 1000 of an example method for resynchronizing and correcting clock drift. Due to imperfections, the local clock of any device in the system may drift slowly. The example method may be used to improve playback synchronization for a receiving media device during media streaming. The example method 1000 may be performed by any media device acting as a receiving media device. The example method 1000 may be performed as part of block 309 of fig. 3 or at any time when synchronization of timing parameters between media devices is appropriate.
The example method 1000 begins in block 1005 and proceeds to block 1010, where the receiving device receives a media data packet from the sending device. Next, in block 1015, the receiving device generates a timestamp R(x) based on the time currently represented by the clock of the receiving device. In block 1020, the receiving device extracts the sender timestamp "S(x)" from the media data message. The sender timestamp may have been inserted into the media data message by the sending device shortly before transmission. In block 1025, the receiving device determines whether the sending device is a media source of the virtual media network. If so, the method 1000 proceeds to block 1030, where the receiving device translates the sender timestamp from the time domain of the sending device to the time domain of the virtual media network. Such a translation may involve adding or subtracting an offset previously negotiated between the two devices, and may be performed according to any method known to those skilled in the art. In some alternative embodiments, the source device and the media node maintain clocks in the same time domain; in some such embodiments, blocks 1025 and 1030 are omitted.
After translating the sender timestamp into the virtual media network domain in block 1030, or after determining that the sender is not a media source in block 1025, the method 1000 proceeds to block 1035, where the receiving device calculates an offset value, such as, for example, the difference between the two timestamps, based on the sender timestamp and the receiver timestamp. In the case where the sender timestamp has been translated, the translated timestamp is used in calculating the offset. This offset value "O" is equivalent to the true offset between the sender and receiver clocks plus any delay encountered by the media data message between the creation of the two timestamps S(x) and R(x), including both fixed and variable delays. In block 1040, the receiving device determines whether the offset value is a better estimate of the offset between the clocks than the previously utilized estimate. For example, in various embodiments where the previously determined minimum offset was used to reset the clock of the receiving device, the receiving device determines whether the current offset O is less than zero. A positive result of this comparison indicates that the previously used minimum offset may have incorporated some variable network delay, and subtracting it moved the local clock "beyond" the ideal set point, thereby setting the local clock behind the sender's clock. The current offset O may reveal this overshoot by being negative, having incorporated less (or zero) variable delay than the previously used minimum. In such a case, the current offset O represents the new best estimate of the true clock offset and may be used to reset the local clock again in block 1045, at least partially correcting the previous overshoot. Various modifications to other embodiments will be apparent.
For example, in embodiments where the previously determined minimum offset is not used to modify the local clock and instead is persisted for timestamp comparison, block 1040 determines whether the current offset O is less than the previous minimum offset MinO, and if so, the receiving device sets MinO equal to O in block 1045. Various other modifications will be apparent.
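Both variants of the block 1040/1045 decision, the negative-offset check when the clock was already reset and the persisted-MinO comparison, can be sketched as follows. Names are illustrative, and the functions return the adjustment or updated value rather than mutating a clock.

```python
def maybe_correct_override(current_offset):
    """Blocks 1040/1045, clock-reset variant: after the clock was reset by
    the earlier minimum offset, a negative current offset O = R(x) - S(x)
    means that minimum included variable delay and overshot the ideal set
    point; return the (negative) adjustment to apply, or 0 for no change."""
    return current_offset if current_offset < 0 else 0


def update_min_offset(min_offset, current_offset):
    """Blocks 1040/1045, persisted-MinO variant: keep the smaller offset,
    which incorporates less variable delay and is the better estimate."""
    return min(min_offset, current_offset)
```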
In various alternative embodiments, the receiving device utilizes a previously established lower bound offset to help ensure that an unreasonably large offset calculated during the flooding period is not used to reset the clock. In some such embodiments, the receiver first compares the offset calculated in block 1035 to a previously established lower bound offset to determine whether the offset represents a better estimate of the true offset than the lower bound offset. If so, the receiver rejects updating the clock based on the minimum offset and continues to use the previously established lower bound. Otherwise, the receiver updates the clock as detailed in block 1045, since the offset value is a better estimate than the lower bound.
In block 1050, the receiving device continues to process the received media packets, for example, to render the media output at the appropriate time. For example, the receiving device may extract or calculate a presentation time from the media data packet that is separate from the sender timestamp and the receiver timestamp. Such a presentation time indicates a time at which the media data carried by the message should be rendered. After extracting the presentation time, the receiving device causes the media data to be rendered at a time that matches the presentation time. For example, a receiving device may buffer media data for playback by a local playback device, or may forward a message to another media node for playback. It will be understood that "matching" the current time of the presentation time may encompass equivalence between the current time and the presentation timestamp, but may also encompass other forms of matching. For example, in various embodiments, the current time matches when the current time minus a persistent minimum offset value equals the presentation timestamp. Additionally or alternatively, a fixed delay value is added, subtracted, or otherwise considered for a comparison of matches. Various other methods for determining an appropriate time for playback based on a local clock, presentation time stamps, and other potentially available values will be apparent. Furthermore, the concept that the current time matches the presentation time based on the minimum offset value will be understood to encompass comparisons with local clocks that have previously been modified by the minimum offset value but do not explicitly consider the minimum offset value. Various embodiments perform such comparisons immediately prior to output to ensure that the data is output at the appropriate time. 
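The notion of the current time "matching" the presentation time, corrected by a persisted minimum offset and a fixed delay as described above, might be expressed as follows; the function name, parameters, and the tolerance default are illustrative assumptions.

```python
def is_presentation_time(current_time, presentation_time,
                         min_offset=0.0, fixed_delay=0.0, tolerance=1.0):
    """Return True when the local clock, corrected by any persisted
    minimum offset and known fixed delay, matches the presentation
    timestamp within a small tolerance (all units/defaults assumed)."""
    corrected = current_time - min_offset - fixed_delay
    return abs(corrected - presentation_time) <= tolerance
```

A renderer would evaluate this immediately prior to output, or use it to place media data at the appropriate location in a playback buffer.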
Other embodiments use such comparisons to insert media data into the playback buffer at a location where the media is likely to be played back at the presentation time. Such insertion may involve inserting "dummy" data to adjust the timing of playback before inserting the media data. Various additional methods of controlling the playback timing of the data in the buffer will be apparent.
Further embodiments
As shown in fig. 1, the telephone 105 may output audio through a wireless speaker 141 or through a wired headset 153. Audio can be rendered faster through wired headphones than through wireless speakers, so such variations in rendering time may be taken into account during audio synchronization. For example, in the deterministic mode, if rendering and transfer through the wireless speaker 141 takes 25ms, the phone 105 may transfer data to the wireless speaker 25ms before the scratch time. In another example, in the deterministic mode, if rendering and transmission through the wireless speaker takes a variable 25-50ms, the phone 105 may implement the deterministic mode such that the phone transmits audio to the wireless speaker at least 50ms before the audio is scheduled to play, and also transmits a delay time indicating when the audio should be played. The wireless speaker may receive the audio, buffer it until the delay time has elapsed, and then play the audio.
In some embodiments, the audio may be passed through a plurality of intermediary devices communicating via a network before the audio is finally rendered. For each intermediate step of the transfer over the network, the sending device and the receiving device may perform the above method such that the final audio rendering will be synchronized with the video playback.
One aspect features a method for multimode synchronized media playback between an audio player and a video player, the method including identifying a video player connected to a wireless network and an audio player connected to the wireless network; synchronizing a clock signal between the video player and the audio player; determining a deterministic mode or a semi-isochronous mode as an audio synchronization mode; receiving an audio packet; unpacking the audio packet to extract a timestamp and an audio payload; determining a reception time of the audio packet measured by the synchronized clock signal; and rendering audio output according to the audio synchronization mode.
In some embodiments, the audio synchronization mode is a deterministic mode, and the method further comprises determining an expected playback time measured by the synchronized clock signal based at least in part on the timestamp; buffering the audio payload until the expected playback time; and rendering the audio payload at the expected playback time.
In some embodiments, the audio synchronization mode is a deterministic mode, and the method further comprises determining an expected playback time measured by the synchronized clock signal based at least in part on the timestamp; determining that the audio payload will not be available by the expected playback time; and rendering a filler packet at the expected playback time.
In some embodiments, the audio synchronization mode is a deterministic mode, and the method further comprises: determining an expected playback time measured by the synchronized clock signal based at least in part on the timestamp; determining that the audio payload will not be available by the expected playback time; constructing the audio payload from error correction data; and rendering the constructed audio payload at the expected playback time.
In some embodiments, the audio synchronization mode is a semi-isochronous mode, and the method further comprises determining a time of receipt of the audio packet using the synchronized clock signal; determining an expected playback time based at least in part on the timestamp; and rendering the audio payload in response to determining that the expected playback time has not elapsed. Determining the expected playback time may include adding an allowance time to a scratch time, where the scratch time is given by the timestamp.
In some embodiments, the audio synchronization mode is a semi-isochronous mode, and the method further comprises determining a time to receive the audio packet using the synchronization clock signal; determining an expected playback time based at least in part on the timestamp; and rendering the fill packet in response to determining that the expected playback time has elapsed.
In some embodiments, the audio synchronization mode is a semi-isochronous mode and the method further comprises determining a time to receive the audio packet using the synchronization clock signal; determining an expected playback time based at least in part on the timestamp; constructing an audio payload from the error correction data; and rendering the constructed audio payload in response to determining that the expected playback time has elapsed.
In some embodiments, the method further includes testing the wireless network to determine a stability of the network and a bandwidth of the network, and determining the audio synchronization mode based at least in part on the stability of the network and the bandwidth of the network. In some embodiments, the method further comprises correcting for drift of the clock signal by resynchronizing the clock signal.
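Combining this network-testing embodiment with the two modes defined in the disclosure (delay video when the video data is below a size threshold; compress audio when it is above), the selection logic can be sketched as below. All thresholds and parameter names are illustrative assumptions, not values from the patent.

```python
def select_audio_sync_mode(video_bitrate_bps, video_threshold_bps,
                           network_stable, network_bandwidth_bps,
                           min_bandwidth_bps=10_000_000):
    """Select between the two audio synchronization modes.

    The first mode delays video when the video data is small; the second mode
    compresses audio when the video data is large, or when a network test
    reports an unstable or bandwidth-constrained link.
    """
    if not network_stable or network_bandwidth_bps < min_bandwidth_bps:
        return "compress_audio"    # second mode: shrink the audio stream
    if video_bitrate_bps < video_threshold_bps:
        return "delay_video"       # first mode: buffer/delay the video
    return "compress_audio"        # second mode
```

Folding the network test into the same selector reflects claim 6, where the second mode may be chosen based on the wireless network's stability or bandwidth rather than the video data alone.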
In some embodiments, synchronizing the clock signal between the video player and the audio player is performed using one-way communication from the video player to the audio player.
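One way to realize one-way clock synchronization (and the equally spaced packets of claim 7) is a minimum-filter offset estimate: the sender stamps each packet, and the receiver takes the smallest observed (arrival − stamp) difference as the clock offset, since that sample suffered the least network delay. This is a common technique offered as an illustration, not the patent's prescribed algorithm; the function names are assumptions.

```python
def estimate_clock_offset(samples):
    """Estimate the receiver-minus-sender clock offset from one-way packets.

    `samples` is a list of (sender_timestamp, receiver_arrival_time) pairs from
    packets sent at roughly equal intervals. The minimum difference is used: it
    comes from the packet with the least network delay, so the estimate exceeds
    the true offset by at most that minimal delay.
    """
    return min(arrival - sent for sent, arrival in samples)

def to_sender_time(receiver_time, offset):
    """Map a receiver-clock reading onto the sender's clock."""
    return receiver_time - offset
```

Because no reply channel is needed, the audio player can keep refining (and re-running) this estimate to correct for drift, as in the resynchronization embodiment above.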
Terminology
In the foregoing embodiments, an apparatus, system, and method for multimodal synchronized rendering of video and audio have been described in connection with specific embodiments. However, it will be appreciated that the principles and advantages of the embodiments may be applied to any other system, apparatus or method across network devices to improve synchronization of video and audio. Although certain embodiments are described with reference to a telephone, smart television, or other specific device, it will be understood that the principles and advantages described herein may be applied to a variety of devices. Although some disclosed embodiments may be described with reference to particular wireless protocols or networks, it will be understood that the principles and advantages herein may be applied to a variety of networks and protocols. Also, while some equations and timing are provided for illustrative purposes, other similar equations or timing may alternatively be implemented to achieve the functionality described herein.
The principles and advantages described herein may be implemented in various devices. Examples of such devices may include, but are not limited to, consumer electronics, components of consumer electronics, electronic test equipment, and the like. The components of the electronic device may also include memory chips, memory modules, optical network circuits or other communication circuits, and driver circuits. Other examples of devices in a network with audio or video capabilities may include mobile phones (e.g., smart phones), healthcare monitoring devices, in-vehicle electronic systems such as automotive electronic systems, telephones, televisions, computer monitors, computers, handheld computers, tablet computers, notebook computers, Personal Digital Assistants (PDAs), microwave ovens, refrigerators, stereos, cassette recorders or players, DVD players, CD players, Digital Video Recorders (DVRs), video recorders, MP3 players, radios, portable cameras, digital cameras, portable memory chips, copiers, facsimile machines, scanners, multifunction peripherals, wristwatches, clocks, and so forth. In addition, the devices may include unfinished products.
Unless the context clearly requires otherwise, throughout the description and the claims, the words "comprise", "comprising", and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is, these words should be interpreted as "including, but not limited to". As generally used herein, the terms "coupled" or "connected" refer to two or more elements that may be connected directly or through one or more intermediate elements. Furthermore, the words "herein," "above," "below," and words of similar import, when used in this application, shall refer to this application as a whole and not to any particular portions of this application. Words in the detailed description that use the singular or plural number may also include the plural or singular number, respectively, where the context permits. The word "or" in reference to a list of two or more items is intended to cover all of the following interpretations of the word: any item in the list, all items in the list, and any combination of items in the list. All numerical values provided herein are intended to include similar values within the error of measurement.
Moreover, conditional language, such as "may," "might," "for example," "such as," and the like, as used herein, is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements, and/or states, unless specifically stated otherwise or understood otherwise within the context of the usage.
The teachings of the invention provided herein may be applied to other systems, not necessarily the systems described above. The elements and acts of the various embodiments described above can be combined to provide further embodiments. In variations of the above-described method embodiments, some blocks may be omitted or reordered, or performed out of order, in series, or in parallel.
While certain embodiments of the present invention have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the present disclosure. Indeed, the novel methods and systems described herein may be embodied in a variety of other forms. Furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the disclosure. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the disclosure. Various examples of the systems and methods described herein may include many advantages, no single one of which is solely responsible for its desirable attributes. Rather, the invention is defined by the claims.
Claims (18)
1. A method for selecting a mode for synchronizing audio playback between a first electronic device and a second electronic device, the method comprising:
receiving video data and audio data at a first electronic device, the first electronic device comprising a television or a media source coupled to a television;
wirelessly communicating clock information associated with the first electronic device to the second electronic device over a wireless network to establish a synchronized clock between the first electronic device and the second electronic device, the second electronic device being a mobile device;
programmatically selecting, using a hardware processor of the first electronic device, an audio synchronization mode based at least in part on the video data, wherein the audio synchronization mode is selected between a first mode and a second mode, the first mode comprising delaying video data if the video data is below a threshold in size, the second mode comprising compressing audio data if the video data is above the threshold in size; and
transmitting the audio data from the first electronic device to the second electronic device according to the selected audio synchronization mode.
2. The method of claim 1, further comprising delaying the video data in response to selecting the first mode.
3. The method of claim 1 or 2, further comprising compressing the audio data in response to selecting the second mode.
4. The method of claim 1 or 2, further comprising compressing the audio data and delaying the video data in response to selecting the first mode.
5. The method of claim 1 or 2, wherein selecting the audio synchronization mode comprises selecting the second mode in response to detecting a video game in the video data.
6. The method of claim 1 or 2, wherein selecting the audio synchronization mode comprises selecting the second mode based on a stability of the wireless network or a bandwidth of the wireless network.
7. The method of claim 1 or 2, wherein wirelessly transmitting the clock information comprises transmitting a plurality of packets, each packet spaced at approximately equal time intervals.
8. The method of claim 1 or 2, further comprising determining a delay time for the first mode, the delay time being determined based on a buffer capacity of the first electronic device.
9. The method of claim 8, wherein the determining a delay time comprises selecting a delay time that is longer than an average transit time for packets to be sent from the first electronic device to the second electronic device.
10. The method of claim 1 or 2, further comprising transmitting interleaved forward error correction information to an audio player.
11. The method of claim 1 or 2, wherein synchronizing a clock signal between a video player and an audio player comprises resynchronizing the synchronized clock after a period of audio rendering to account for clock drift.
12. A system for selecting a mode for synchronizing audio playback between a first electronic device and a second electronic device, the system comprising:
a first electronic device, the first electronic device comprising:
a memory comprising processor-executable instructions;
a hardware processor configured to execute the processor-executable instructions; and
a wireless transmitter in communication with the hardware processor;
wherein the processor-executable instructions are configured to:
receive video data and audio data;
cause the wireless transmitter to wirelessly transmit clock information associated with the first electronic device to a second electronic device over a wireless network to establish a synchronized clock between the first and second electronic devices;
programmatically select an audio synchronization mode based at least in part on the video data, wherein the audio synchronization mode is selected between a first mode and a second mode, the first mode comprising delaying video data if the video data is below a threshold in size, the second mode comprising compressing audio data if the video data is above the threshold in size; and
cause the wireless transmitter to transmit the audio data from the first electronic device to the second electronic device according to the selected audio synchronization mode.
13. The system of claim 12, wherein the first electronic device is a television or a set-top box.
14. The system of claim 12, wherein the selection of the audio synchronization mode comprises evaluating a bandwidth size of the video data.
15. The system of claim 14, wherein the second mode is selected in response to the bandwidth size exceeding a buffer capacity of the first electronic device.
16. A non-transitory physical electronic storage comprising processor-executable instructions stored thereon that, when executed by a processor, are configured to implement a system for selecting a mode for synchronizing audio playback between a first electronic device and a second electronic device, the system configured to:
receive video data and audio data at a first electronic device;
wirelessly communicate clock information associated with the first electronic device to a second electronic device over a wireless network to establish a synchronized clock between the first electronic device and the second electronic device;
programmatically select an audio synchronization mode based at least in part on the video data, wherein the audio synchronization mode is selected between a first mode and a second mode, the first mode comprising delaying video data if the video data is below a threshold in size, the second mode comprising compressing audio data if the video data is above the threshold in size; and
transmit the audio data from the first electronic device to the second electronic device according to the selected audio synchronization mode.
17. The non-transitory physical electronic storage of claim 16, wherein the second mode comprises applying lossy compression to the audio data.
18. The non-transitory physical electronic storage of claim 16 or 17, wherein forward error correction is employed in the first mode or second mode.
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US15/265,609 | 2016-09-14 | | |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK40005301A HK40005301A (en) | 2020-05-08 |
| HK40005301B true HK40005301B (en) | 2023-05-19 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN109906613B (en) | Multi-mode synchronized rendering of audio and video | |
| US20200014969A1 (en) | User interface for multimode synchronous rendering of headphone audio and video | |
| US9832507B2 (en) | System and method for synchronizing media output devices connected on a network | |
| US9479584B2 (en) | Synchronous media rendering of demuxed media components across multiple devices | |
| US9648073B2 (en) | Streaming control for real-time transport protocol | |
| RU2510587C2 (en) | Synchronising remote audio with fixed video | |
| US12165667B2 (en) | Voice recognition with timing information for noise cancellation | |
| KR102132309B1 (en) | Playback synchronization | |
| US20080040759A1 (en) | System And Method For Establishing And Maintaining Synchronization Of Isochronous Audio And Video Information Streams in Wireless Multimedia Applications | |
| US9843489B2 (en) | System and method for synchronous media rendering over wireless networks with wireless performance monitoring | |
| KR101845186B1 (en) | Apparatuses and methods for wireless synchronization of multiple multimedia devices using a common timing framework | |
| JP6232870B2 (en) | Wireless communication system, wireless communication method, program, and recording medium | |
| EP3281317B1 (en) | Multi-layer timing synchronization framework | |
| WO2013189435A2 (en) | Processing method, system, and related device based on play state information synchronization | |
| HK40005301B (en) | Multimode synchronous rendering of audio and video | |
| HK40005301A (en) | Multimode synchronous rendering of audio and video | |
| CN103533005B (en) | Processing method, system and relevant apparatus based on broadcast state synchronizing information | |
| WO2016134186A1 (en) | Synchronous media rendering over wireless networks with wireless performance monitoring | |
| US11170820B1 (en) | Method and apparatus for recording audio information |