
WO2025160185A1 - Distributed live media production system - Google Patents

Distributed live media production system

Info

Publication number
WO2025160185A1
Authority
WO
WIPO (PCT)
Prior art keywords
instances
cloud
compute
instance
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2025/012607
Other languages
French (fr)
Inventor
Mike Coleman
Jialu WANG
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Port 9 LLC
Original Assignee
Port 9 LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Port 9 LLC
Publication of WO2025160185A1 (en)

Classifications

    • G - PHYSICS
    • G11 - INFORMATION STORAGE
    • G11B - INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 - Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02 - Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031 - Electronic editing of digitised analogue information signals, e.g. audio or video signals


Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

A distributed, asynchronous, cloud-native media production method is introduced that requires a minimum of resources while providing highly reliable, realtime, distributed editing capabilities. Atomic instances having flow buffers and shared memory are spawned by a control plane as necessary to perform transformations upon data streams, such as (for instance) transitioning between two video streams.

Description

DISTRIBUTED LIVE MEDIA PRODUCTION SYSTEM
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of priority of U.S. Provisional Patent Application No. 63/623,812, filed January 23, 2024, the contents of which are incorporated herein by reference in their entirety.
BACKGROUND OF THE INVENTION
[0002] The invention relates to the field of live video production and editing, particularly to systems and methods for cloud-based distributed live media production. These methods encompass processes of modifying and enhancing video content for various purposes, including entertainment and reporting.
[0003] Traditional media production involves resource-intensive tasks that are often performed on local hardware and software, for example in dedicated editing rooms forming the ‘nerve center’ of television studios. The usual editing workflows typically involve the use of specialized software and hardware installed in such editing rooms or on a user's local computer or workstation in communication with a network. The software applications provide a range of editing tools and features, allowing users to cut, trim, merge, add effects, and enhance video clips.
[0004] As will be appreciated, local media production has several inherent limitations, as described in the following sections.
[0005] Cost: Outfitting a local studio with the necessary hardware and software can be a multi-million-dollar project. The equipment may have to be sized for worst-case maximum requirements, resulting in expensive hardware that largely sits idle.
[0006] Hardware Limitations: Media production is a resource-intensive task, requiring substantial processing power, memory, and storage capacity. Many users will experience performance bottlenecks and long rendering times when working with high-definition or 4K video content, with even more severe problems occurring when editing over the network in communication with a remote server. In these cases, the limited upstream and downstream bandwidths will make realtime or near-realtime work impossible, especially when dealing with high-definition video.
[0007] Distributed work: Collaborative media production projects, especially those with production staff at different locations, require tight synchronization between team members, and inevitable network delays and limited bandwidth will make this extremely challenging.
[0008] Furthermore, scaling media production resources to accommodate fluctuating workloads is not easily handled by such systems, which generally must be sized for the worst-case scenario (e.g. the maximum number of simultaneous video streams to be handled), making the hardware requirements needlessly expensive, especially for production companies with widely varying editing needs. In these cases, expensive hardware is underused, representing an expense that is in principle unnecessary.
[0009] To address some of these challenges and leverage the advantages of cloud computing, cloud-based media production solutions have emerged. Cloud media production offers numerous benefits, including scalability and the offloading of hardware costs, maintenance, and operations to the cloud provider. These platforms offer a range of features, from basic video trimming to advanced editing and collaboration tools. However, there is a need for further innovation to improve the efficiency, flexibility, and accessibility of cloud media production, which is the primary focus of the present invention. Generally speaking, media production software that has been moved to the cloud is not particularly appropriate for professional use: it operates on low-quality compressed media and has resiliency problems. The so-called "blast radius" - namely, how much functionality will be lost if an instance fails - is large.
[0010] For example, US9185321B1 “Apparatus, system and method for processing video data, audio data and ancillary data” provides a method for processing video and other data for broadcast, having a plurality of processing components and a ‘control engine’ configured to receive topologies from a host system. The topologies are groupings of processing components adapted for processing the video and other data. The control engine determines and sends commands for execution by the processing components on a frame-by-frame basis. There are a number of unresolved issues left by the system, however, including bandwidth requirements between editors and the server, as well as unavoidable delays due to the use of a ‘genlock’ or synchronous timer for processing. Furthermore, the network bandwidth needs to support the entire number of sources, since decisions about which sources to use are made internal to the switcher. Finally, the CPU capability of the server will limit the number of processing functions available.
[0011] US10966001B2 “Remote cloud-based video production system in an environment where there is network delay” similarly describes a cloud-based video production system. In this system, video sources communicate with a cloud-based video production server and a remote user interface via a network. A control unit is in communication with the production server and the remote user interface. A buffer is located between each of the video sources and the control unit to allow for network delays. Commands for selecting and manipulating video content from the video sources are sent from the user interface to the control unit, each of the commands containing a command timestamp corresponding to the video timestamp of the video frame displayed on the user interface when the command is issued. The control unit executes each command at a time when the video timestamp of the frame in the corresponding buffer corresponds to the command timestamp. The control unit outputs a video program in accordance with the commands. However, this method will in general require a fixed set of cloud hardware for its operation, a requirement that, as we will see below, is not a necessary limitation. As before, the network bandwidth needs to support the entire number of sources, since decisions about which sources to use are made internal to the switcher, and likewise the CPU capability of the server will limit the number of processing functions available.
[0012] EP2683155A1 describes a system for processing video having a control unit and a processing unit which exchange data in a packetized format. The system allows splitting the processing ability of a large vision mixer into smaller subunits, and also allows for realtime processing behavior. Command signals are sent from the control unit to one or more of the subunits, and an extremely complex scheduling scheme is implemented to control the timing of the execution of command signals received in the processing units to compensate for signal latencies and processing latencies. However, the system does not allow for spawning subunits on the fly, nor does it proactively introduce delays that would allow the system to recover gracefully from instance failure, communication breakdown, or other unexpected problems. Furthermore, this system assumes synchronous processing and real-time network transfer, and requires a specific hardware implementation.
[0013] As a final example, consider US20150117839A1 “Network-based rendering and steering of visual effects”. This is a system for applying visual effects to video in a client/server or cloud-based system. Using a web browser, for example, users can apply and control both simple and sophisticated effects with dynamic real-time previews without downloading client-side software. Rendering and previewing of the effects is done on the server, and rendered images can be sent in rapid succession to the client, allowing simple, real-time feedback and control. Other aspects of the technology are also described. This application does not appear to allow for realtime or near-realtime output but rather allows users to see near-realtime results of their editing operations in their browser. Furthermore, the previously mentioned limitations concerning the server hardware having to support the ‘worst case’ scenario and limiting the processing capability of the system also apply.
[0014] Thus, despite the advantages of cloud media production, there remain opportunities for innovation in this field. The present invention addresses specific challenges and limitations associated with cloud-based media production, providing novel solutions to enhance the overall user experience and expand the capabilities of media production in the cloud.
SUMMARY OF THE INVENTION
[0015] The present invention seeks to address the limitations and challenges associated with conventional media production methods by intelligently leveraging the capabilities of cloud computing. A distributed, asynchronous media production method is introduced that requires a minimum of resources while providing highly reliable, near-realtime, distributed editing capabilities.
BRIEF DESCRIPTION OF DRAWINGS
[0016] Figure 1 depicts a block diagram of a commercial implementation of a realtime distributed editing system.
[0017] Figure 2 presents a simplified block diagram of one possible implementation of the invention.
[0018] Figure 3 presents a block diagram of a generic vision mixer implemented on a single cloud instance.
[0019] Figure 4 presents a block diagram of an ‘atomic unit’ of the invention implemented on a single cloud instance.
[0020] Figure 5 presents a block diagram of several of the atomic units of the invention connected in a network.
[0021] Figure 6 shows details of queuing in one of the atomic units of the invention.
[0022] Figure 7 shows a latency histogram for typical cross-availability-zone latencies, for modern (ca. 2022) cloud services.
[0023] Figures 8A, B, C show timing diagrams detailing delays for one possible embodiment of the invention.
[0024] Figure 9 shows a simplified system diagram of a possible embodiment of a system allowing for studio-based control over system operation.
[0025] Figure 10 shows a system diagram of a possible embodiment of the invention allowing for ground-based control over system operation.
[0026] Figure 11 shows a system diagram of a possible embodiment of the invention allowing for both ground- and cloud-based control over system operation, where the cloud-based operation occurs through a web interface.
DETAILED DESCRIPTION OF THE INVENTION
[0027] The invention introduces a solution for cloud-based media production that fully exploits some unique advantages that use of cloud infrastructure permits. A key aspect of the invention is that it introduces the use of an internally asynchronous process, without the timing ‘genlock’ of standard video hardware and software. This allows the system to exploit the full power of available processors, eliminating waiting loops and leveraging the possibility of ‘faster-than-video’ processing (where it takes less than one frame time to process a given frame). An induced, fixed delay (1 second, for example) from input to output is maintained by the system. All outputs are available synchronously after this fixed delay. Internally, all operations are performed asynchronously, as soon and as quickly as possible. The induced fixed latency creates a large “repair window” allowing for recovery from packet loss, instance failure, communications breakdowns, and so on. By means of this delay, there is enough time for operations to be redone and outages to be routed around. The fixed delay also makes it easier to integrate into users’ workflows, which typically do not expect variable processing time.
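As a purely illustrative sketch of the fixed input-to-output delay described in [0027] (the 1-second figure is the example above, but the Frame type and all function names are assumptions, not taken from the patent), the following Go code processes each frame as soon as it arrives and then withholds the result until exactly the induced delay after the frame's capture timestamp; whatever time is left over is the "repair window" available for retries.

```go
package production

import "time"

// Frame is a hypothetical uncompressed media frame carrying its capture timestamp.
type Frame struct {
	Timestamp time.Time
	Data      []byte
}

// inducedDelay is the fixed input-to-output delay promised by the system (example value).
const inducedDelay = 1 * time.Second

// processAndRelease works on each frame immediately (asynchronously, with no genlock),
// then holds the result until exactly inducedDelay after the frame's capture time.
// Any unused time before the release deadline is slack for redoing failed work.
func processAndRelease(in <-chan Frame, out chan<- Frame, process func(Frame) Frame) {
	for f := range in {
		result := process(f) // may finish in much less than one frame time
		releaseAt := f.Timestamp.Add(inducedDelay)
		if wait := time.Until(releaseAt); wait > 0 {
			time.Sleep(wait) // outputs become available synchronously at the fixed delay
		}
		out <- result
	}
}
```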
[0028] Editing operations like overlays/transitions are performed remotely, by independent cloud instances which can be created and destroyed as needed, on the fly. A ‘topology on demand’ is implemented that uses minimal resources. This topology makes use of ‘atomic units’ which may be implemented by cloud or local compute instances such as that shown in Fig. 4. The cloud instances may each be implemented separately, and called into existence and destroyed as needed. As seen in Fig. 4, each atomic unit comprises a flow router handling input and output (the flow routers are primarily concerned with moving full-resolution uncompressed media between cloud instances), some locally shared memory, and one or more processing functions (for example, digital video effects such as mixes, cuts, keys, and so on). The flow router and processing functions are controlled externally by means of a control plane, with control being handled by means of a message-passing paradigm such as NATS.
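The sketch below models the atomic unit of Fig. 4 in Go under stated assumptions: the ControlBus interface stands in for a message-passing system such as NATS, and the subject name, field names, and types are invented for illustration only.

```go
package atomicunit

// ControlBus abstracts a message-passing system such as NATS; only the two
// operations this sketch needs are modeled.
type ControlBus interface {
	Subscribe(subject string, handler func(payload []byte)) error
	Publish(subject string, payload []byte) error
}

// MediaFrame is a hypothetical timestamped frame held in locally shared memory.
type MediaFrame struct {
	TimestampMs int64
	Data        []byte
}

// ProcessingFunction is one digital video effect (mix, cut, key, and so on).
type ProcessingFunction func(inputs []MediaFrame) MediaFrame

// AtomicUnit bundles the pieces shown in Fig. 4: a flow router's connection
// lists, locally shared memory, and one or more processing functions.
type AtomicUnit struct {
	SharedMem  map[string][]MediaFrame // per-source frame queues in shared memory
	Process    ProcessingFunction
	Upstream   []string // flow-router input connections, set by the control plane
	Downstream []string // flow-router output connections, set by the control plane
}

// ListenForTopology lets the control plane rewire the unit's flow router at
// runtime; connections may change in realtime as switcher operations occur.
func (u *AtomicUnit) ListenForTopology(bus ControlBus, unitID string) error {
	return bus.Subscribe("control."+unitID+".topology", func(payload []byte) {
		// Parsing of the upstream/downstream lists from payload is omitted;
		// on receipt the unit would update u.Upstream and u.Downstream.
	})
}
```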
[0029] The control plane of the invention is responsible for:
i. breaking processing into atomic processing units,
ii. scheduling media flow between units,
iii. scheduling work on instances,
iv. maintaining resilience, providing oversight and fallback in cases of instance failure.
Cost optimization and quality control
[0030] The control plane calculates the topology needed to implement the desired switcher and launches the necessary instances, possibly launching a given instance just prior to the point when it is needed. The control plane instructs each instance’s flow routers of their upstream and downstream connections, which can change in realtime as operations are performed on the switcher.
[0031] In order to optimize costs, the control plane is constantly organizing the type and number of instances with respect to their availability. It always attempts to use the least expensive instances currently available that are suited to each task. When work increases and more instances need to be added, the control plane may temporarily use higher-priced instances if no lower-cost instances are available, but it will also keep trying to start cheaper instances. Due to the way the system is architected, expensive instances can simply be killed when necessary and the work will be repaired and continued on the new, cheaper instances.
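A minimal sketch of the cost-optimization behavior in [0031], assuming an abstract launch call and made-up instance types: the control plane sorts candidate types by price and starts the cheapest suitable type that actually has capacity, falling back to pricier types only when nothing cheaper can be started (the background loop that keeps retrying cheaper types is omitted).

```go
package scheduling

import "sort"

// InstanceType describes one cloud instance type; names and prices are illustrative.
type InstanceType struct {
	Name         string
	HourlyPrice  float64
	SuitedToTask bool
}

// launchFunc is a placeholder for the cloud provider's launch call; it is
// assumed to return an error when capacity for that type is unavailable.
type launchFunc func(t InstanceType) (instanceID string, err error)

// startCheapestSuitable launches the least expensive suitable instance type
// that is available right now, falling back to pricier types only if needed.
func startCheapestSuitable(candidates []InstanceType, launch launchFunc) (string, error) {
	sorted := append([]InstanceType(nil), candidates...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].HourlyPrice < sorted[j].HourlyPrice })
	var lastErr error
	for _, t := range sorted {
		if !t.SuitedToTask {
			continue
		}
		id, err := launch(t)
		if err == nil {
			return id, nil // cheapest available suitable type wins
		}
		lastErr = err // no capacity; try the next cheapest type
	}
	return "", lastErr
}
```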
[0032] Figure 7 shows a histogram of latency for several cloud providers, for traffic between servers across availability zones. Similar histograms can be expected for latency between the cloud and end-users, generally with larger latencies to be expected. With increasing latency beyond some point, there will be a decrease in the probability of occurrence. By means of access to such data (and similar data such as the number of packets dropped over time, etc.), the budget/quality algorithm can better plan its instance requisition, for example using systems with only 0.1% occurrence of cloud-to-ground latency over 250ms. It is within provision of the invention to dynamically make such measurements so as to take advantage of the dynamic nature of internet communications.
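The following sketch shows one way measured latency samples could feed the budget/quality algorithm mentioned in [0032]; the 250ms budget and 0.1% threshold mirror the example above, the function names are invented, and the measurement source is deliberately left abstract.

```go
package latency

import "sort"

// fractionOver returns the fraction of latency samples (milliseconds) exceeding budgetMs.
func fractionOver(samplesMs []float64, budgetMs float64) float64 {
	if len(samplesMs) == 0 {
		return 0
	}
	over := 0
	for _, s := range samplesMs {
		if s > budgetMs {
			over++
		}
	}
	return float64(over) / float64(len(samplesMs))
}

// meetsBudget reports whether at most maxFraction of samples exceed budgetMs,
// e.g. meetsBudget(samples, 250, 0.001) for the 0.1%-over-250ms example above.
func meetsBudget(samplesMs []float64, budgetMs, maxFraction float64) bool {
	return fractionOver(samplesMs, budgetMs) <= maxFraction
}

// percentile returns the p-th percentile (0-100) of the samples, by nearest rank.
func percentile(samplesMs []float64, p float64) float64 {
	if len(samplesMs) == 0 {
		return 0
	}
	sorted := append([]float64(nil), samplesMs...)
	sort.Float64s(sorted)
	idx := int(p / 100 * float64(len(sorted)-1))
	return sorted[idx]
}
```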
[0033] A more sophisticated example of the capabilities of the invention is shown in Fig. 5, where a number of cloud instances are shown connected in a network. Each of these may be used to perform a different operation on a video stream, with one for instance responsible for transitions, another for overlays, a third for video effect A, a fourth for video effect B, and so on. Due to the nature of the setup, instances that ‘die’ can be replaced by new instances seamlessly.
[0034] Furthermore, due to the fine-grained, multiply-connected design and faster-than-realtime operation, greater resilience than ‘red/blue’ systems is possible; e.g. dead instances may be recreated on the fly, and single or even multiple dead communication links won’t affect performance.
[0035] A key advantage of the invention is due to its granular nature. Expensive instances (e.g. with GPUs and/or large memory capacity, low latency, high bandwidth, large amounts of processing power, or other special capabilities) are used only when absolutely necessary, thus potentially saving money. This is in direct contrast to conventional systems such as that shown in Fig. 3, which place all blocks on one instance that must be sized for worst-case scenarios (e.g. the maximum possible number of streams to be dealt with and the minimum possible latency requirements) and, once requisitioned, is generally in use for months or years. Upgrading involves requisitioning an even larger instance, with the same fundamental issues. Furthermore, most if not all functionality will be lost if a problem arises with the instance (communication is lost, or the instance goes down, for example).
[0036] As will be appreciated by those skilled in the art, all of these problems are solved by the granular solution of the invention, which allows the control plane to dynamically allocate computing resources based on the user's or application’s requirements, ensuring optimal performance while minimizing rendering times, resources allocated, cost, and the impact of communication or instance problems.
[0037] Another provision of the invention is that large data transfers are broken into smaller ‘flowlets’ or independent communications channels, which eases requirements for minimum bandwidths since multiple such channels can be combined in parallel for a larger aggregate bandwidth.
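As a rough illustration of the 'flowlet' idea in [0037] (the chunking scheme below is an assumption; the patent does not specify how transfers are split), one large payload is divided into roughly equal chunks that could travel over independent channels in parallel and be reassembled in order on the receiving side.

```go
package flowlets

// splitIntoFlowlets divides one large payload into at most n roughly equal
// chunks, one per parallel channel; the receiver reassembles them in order.
func splitIntoFlowlets(payload []byte, n int) [][]byte {
	if n <= 0 {
		n = 1
	}
	chunkSize := (len(payload) + n - 1) / n // ceiling division
	chunks := make([][]byte, 0, n)
	for start := 0; start < len(payload); start += chunkSize {
		end := start + chunkSize
		if end > len(payload) {
			end = len(payload)
		}
		chunks = append(chunks, payload[start:end])
	}
	return chunks
}

// reassemble is the inverse operation on the receiving side.
func reassemble(chunks [][]byte) []byte {
	var out []byte
	for _, c := range chunks {
		out = append(out, c...)
	}
	return out
}
```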
[0038] Yet another practical provision of the invention is that full-resolution video is only sent where needed. Thus, editing operations are performed on preview/downsampled versions of video, this being accomplished by means of the aforementioned proxy generator, which has instructions from the control plane as to what type of signal to send, where.
[0039] As mentioned, a message passing system such as NATS is used to send information between instances and between instances and the control plane. In one useful implementation of the invention, edits are sent in the form of messages from an editing station (e.g. an instance running on an editor’s laptop) to instances dealing with the full resolution video (e.g. running on high-bandwidth cloud servers). Thus, edits are created using downsampled video but operate on full resolution video.
[0040] A block diagram of such a setup is shown in Fig. 2, where the cloud service is shown in the upper large rectangle and a local (e.g. laptop) editing station in the lower large rectangle. A set of sources feed into the cloud instance at full resolution. Necessary elements of the incoming video streams are sent from the cloud instance to the local instance, but at lower resolution and with some fixed latency such as 250ms, the frames being sent with timestamps. The local instance runs an app suited for media production, such as a switcher app with an associated control surface that allows the editor to control the effects being applied. These effects are sent in the form of a series of commands, which may be thought of as taking the form, for instance, of ‘Transition from stream A to stream B in 5 seconds starting from timestamp 0123:45:67’. These commands (and not the edited video) are sent from the switcher app and/or control surface to the cloud instance. The cloud instance can now apply the effects to the full-resolution video at its disposal. Since the local instance only has to deal with low-resolution proxy video, the bandwidth demands and latency will generally be low enough to allow the cloud instance to finish processing the video well before the one-second latency deadline, even given the 250ms voluntary latency introduced by the system into the ground processing sequence. The bandwidth requirements are also reduced; for instance, the proxy media stream may require 1-2 Mb/s of network bandwidth, as opposed to 25 Mb/s for a remote desktop solution.
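A sketch of the kind of command message described in [0040], serialized as JSON for transport over the message bus; the struct, its field names, and the encoding are assumptions made for illustration, since the patent only gives the example wording of the command.

```go
package commands

import "encoding/json"

// EditCommand is a hypothetical message carrying one editing operation, e.g.
// "Transition from stream A to stream B in 5 seconds starting from timestamp ...".
type EditCommand struct {
	Op         string `json:"op"`          // e.g. "transition"
	FromStream string `json:"from_stream"` // e.g. "A"
	ToStream   string `json:"to_stream"`   // e.g. "B"
	DurationMs int64  `json:"duration_ms"` // e.g. 5000
	StartTS    string `json:"start_ts"`    // timestamp of the proxy frame shown when the command was issued
}

// Encode serializes the command for the message-passing layer; only this small
// message, never the edited video, travels from the ground to the cloud.
func Encode(c EditCommand) ([]byte, error) {
	return json.Marshal(c)
}
```

On the cloud side, the instance holding the full-resolution sources would decode such a message and apply the named effect at the indicated timestamp.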
[0041] Proxy streams as seen in Fig. 2 (‘proxy gen and buffering’) are created on the cloud gateway instances (top large rectangle of Fig. 2) that receive the full resolution sources. The gateway machines:
• receive full resolution sources to be used for production (e.g. from cameras on the ground)
• buffer all sources to correct for timing delay mismatch from the ground
• create proxy media to be sent to the ground editing station
• send full resolution media to other instances (through the flow router) after delaying the media to compensate for the editing delay.
[0042] The proxy streams may in some embodiments be sent through an intermediate instance (not shown), for increased resilience.
[0043] Induced latencies allow the system to operate in realtime while still allowing for synchronization of editing stations. Consider for example the case of an editor and producer at disparate locations, using landline phones for realtime communications; since they are synchronized (at worst, to within a few tens of milliseconds required for telephone signal propagation), they will be able to edit in unison effectively only if both see synchronized video streams. The provision of induced latency allows for this, while still allowing enough time for the slightly-delayed editing decisions to be relayed to the cloud, applied to full resolution video streams, and output to consumers within the one-second output latency deadline.
[0044] The induced latency may be, for example, 250ms for all ground instances, and 400ms for full-resolution video on source instances. As mentioned, this induced latency allows multiple ground stations to operate in synchrony despite any differences in latencies from cloud to ground for the different stations. Furthermore, since the editing operations are performed locally on the proxy video, there is no ‘button delay’ (where an editing command would be performed on the ground, transmitted to the cloud, and transmitted back to the ground for viewing) - the edit is performed and shown immediately, and in the meantime sent to the cloud to be performed on the full-resolution video at the appropriate time.
[0045] In one embodiment of the invention, the editing functions are made available by means of a web interface. With this implementation, users can access their media production projects from any device with an internet connection, eliminating the constraints of local software and hardware installations and making media production more cost-effective for individuals and businesses.
[0046] Fig. 6 shows a more detailed example of the asynchronous processing used in the invention. The media is processed asynchronously, without use of a genlock or other clocking mechanism other than time stamps for all media (e.g. each frame of video and audio). Frames are thus processed upon arrival (or upon arrival of all inputs to a given block). Typically, the processing function will be combining or transitioning between two sources. This usually requires taking media frames from each source that have the same time stamp, combining them, and writing the output using the same timestamp. Since the processing can often happen much faster than real time, and the connections are from other networked instances, there is buffering in shared memory to enable queuing the samples on and off the instance.
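The queuing behavior of Fig. 6 might look roughly like the following, assuming two sources with per-source queues in shared memory (all names are illustrative): a combination fires as soon as both queues hold a frame with the same timestamp, and the output keeps that timestamp.

```go
package combine

// MediaFrame is a hypothetical timestamped frame.
type MediaFrame struct {
	TimestampMs int64
	Data        []byte
}

// Combiner buffers frames from two sources and mixes timestamp-matched pairs.
type Combiner struct {
	queueA, queueB []MediaFrame
	Mix            func(a, b MediaFrame) MediaFrame // e.g. a transition between the two sources
}

// Offer adds an incoming frame from source "A" or "B" and returns any outputs
// that became ready; matched frames are combined and emitted with the same timestamp.
func (c *Combiner) Offer(source string, f MediaFrame) []MediaFrame {
	if source == "A" {
		c.queueA = append(c.queueA, f)
	} else {
		c.queueB = append(c.queueB, f)
	}
	var out []MediaFrame
	for len(c.queueA) > 0 && len(c.queueB) > 0 {
		a, b := c.queueA[0], c.queueB[0]
		switch {
		case a.TimestampMs == b.TimestampMs:
			out = append(out, c.Mix(a, b))
			c.queueA, c.queueB = c.queueA[1:], c.queueB[1:]
		case a.TimestampMs < b.TimestampMs:
			c.queueA = c.queueA[1:] // discard the unmatched older frame (one possible policy)
		default:
			c.queueB = c.queueB[1:]
		}
	}
	return out
}
```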
[0047] The control plane supplies static configuration for the processing function, and timed commands corresponding to operational changes (wiper position, pattern changes, etc.). The control plane arranges to have the media show up some time after its associated timed command, so when the command is read there is still time to operate on it. This typically means there is a small queue for timed commands as well; commands from the ground editing station are queued by the control plane in the cloud.
[0048] A media processing function has media as inputs, and access to this command queue. Since the commands were delayed by the proxy process the source gateways also delay the full resolution media, so it does not appear at the processing functions until after the timestamped commands are available from the ground.
[0049] Figs. 8-10 show some timing charts illustrating the various time delays (both intentional and unavoidable) of one implementation of the system. Fig. 8 row 1 shows time in frames since the input of a video sequence to a cloud instance of the invention. As shown in Fig. 2, there is a fixed delay of 250ms between the cloud and ground (e.g. the laptop production station). This delay can be enforced if, for instance, the actual cloud-ground latency has a form such as shown in Fig. 7 with the vast majority of data arriving within (for example) 10ms. To bring up the total to 250ms, the remaining 240ms is added at the ground. A different ground production station suffering from a 20ms cloud-ground delay would correspondingly have 230ms added at the ground, thus keeping both ground stations in sync. As shown in Fig. 8 this 250ms delay corresponds to 15 frames for the case of 60 fps video. Thus row 2 of Fig. 8 shows ground processing beginning at frame 15 of row 1. The ground producer may now perform some operation on this frame. A command corresponding to this operation is now sent back to the cloud, with some inevitable delay equivalent to (for example) 8 frames. Thus, the cloud instance will receive the command for operating on Frame 1 at the time when the original video (row 1) has hit frame 23. This is seen when comparing row 1 with row 3.
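The arithmetic in the walkthrough above can be written out directly; the constants below (250ms target, 60 fps) simply restate the example numbers, and the helper names are invented. A 10ms link gets 240ms added on the ground, a 20ms link gets 230ms, and 250ms corresponds to 15 frames at 60 fps.

```go
package timing

const (
	targetGroundDelayMs = 250.0 // fixed cloud-to-ground delay used in the example
	framesPerSecond     = 60.0  // example frame rate
)

// addedGroundDelayMs tops a station's measured cloud-ground latency up to the
// fixed target, so that every ground station sees video at the same total delay.
func addedGroundDelayMs(measuredLatencyMs float64) float64 {
	if measuredLatencyMs >= targetGroundDelayMs {
		return 0 // link already slower than the target; cannot be compensated here
	}
	return targetGroundDelayMs - measuredLatencyMs
}

// delayInFrames converts a delay in milliseconds into whole frames at the example rate.
func delayInFrames(delayMs float64) int {
	return int(delayMs / 1000.0 * framesPerSecond) // 250 ms -> 15 frames
}
```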
[0050] Now that the video frame and corresponding command are ready, it would in principle be possible to operate on the video frame immediately. However, as shown in row 4 of Fig. 9, a further delay equivalent to (for instance) 8 frames is introduced in order to account for different ground-cloud delays. This extra delay reduces the probability that the video frame arrives before the command has. The command and frame now being reliably available, the operation that the ground station requested may now be performed, with some typical processing time (shown as equivalent to 4 frames, see row 5). Comparing row 5 to row 1, we see that a total equivalent of 34 frames have passed. Since the system has promised to deliver its output at exactly 1s (60 frames) after the input was received, the flow router will wait an extra 27 frames before releasing the operated-upon frame 1 to its output. It should be appreciated that this buffer allows for long or short latencies from cloud to ground and ground to cloud without affecting either the synchronous operation of multiple ground stations or the exact timing of the output with respect to the input.
[0051] It is within provision of the invention that the inventive system be provided as a cloud service, allowing broadcasters to set up a live production studio in the cloud, fully managed and paid by the minute. The broadcaster does not have to deal with any infrastructure or maintenance.
[0052] This cloud service supports the use case shown for example in Fig. 9, where a sports production is controlled by both users at their homes and a central studio, managed by sending and processing the audio and video in the cloud. Many broadcasters have partial solutions in this direction, implemented for example by running prosumer Windows applications in the cloud, controlled and monitored using a remote desktop connection.
[0053] However, such solutions, as mentioned above, are not ‘cloud native’ in the same sense as the inventive video production service. Some performance characteristics made possible by the granular, multiple-minimal-instance based solution of the invention include:
• High reliability: Targeting a 99.99% SLA. There are no single points of failure and the system is self-healing against multiple instance losses.
• Best quality: The system operates entirely in an uncompressed domain, so the video quality is the same or better than existing studio systems.
• Low price: The invention exploits cloud infrastructure to provide reliable systems at low cost. For example, most of the processing can be done on low cost CPU instances.
[0054] Fig. 10 shows a more detailed view of the cloud, event, and ground parts of one possible implementation of the system. The event will generally have one or more cameras covering various aspects of the event. Audio-only channels may also be employed for commentators or the like. Each of these channels may send its data independently to the cloud service of the invention, through any means of connectivity found convenient and of sufficient bandwidth. The system determines the amount of buffering needed for each source in order to align them all temporally.
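One plausible way the per-source buffering mentioned in [0054] could be derived is sketched below: every source is delayed by the difference between its own measured latency and that of the slowest source, so that all feeds line up in time. The latency figures are treated as given inputs; the patent does not prescribe this particular rule.

```go
package align

// bufferPerSource returns, for each source, the extra delay (in milliseconds)
// to add so that all sources share the same total ingest latency.
func bufferPerSource(latencyMs map[string]float64) map[string]float64 {
	var slowest float64
	for _, l := range latencyMs {
		if l > slowest {
			slowest = l
		}
	}
	buffers := make(map[string]float64, len(latencyMs))
	for src, l := range latencyMs {
		buffers[src] = slowest - l // faster feeds are held back to match the slowest
	}
	return buffers
}
```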
[0055] Once this data has arrived at the cloud server(s), proxy generation and buffering occur as explained previously, and the proxy video is sent to the editing stations with a fixed (e.g. 250ms) latency. Editing is done at the ground editing stations (which may be widely distributed and require only a minimum of bandwidth) and the time-stamped editing commands are sent back to the cloud processor(s). The full-resolution operations corresponding to the editing commands made on the ground are performed in the cloud, and the resulting fully-edited full-resolution video is sent on to subsequent services at a fixed (e.g. 1 second) delay from start to finish. The ground stations may be operated by technical directors receiving verbal commands from a producer, over phone or other connection.
[0056] In order for event commentators and crew to be able to see the ‘final result’ with a minimal delay, a ‘backhaul’ link is provided that sends edited proxy video back to the event.
[0057] A table comparing conventional and inventive solutions is presented below.
[0058] In order to provide for users not possessing specialized control equipment, web-based control over the system may also be provided by means of a simplified GUI. In such implementations the proxy switching or other editing operations can be performed in the cloud. The cloud-based proxy editor would present a web interface viewed and controlled, for instance, over a WebRTC connection. Several of these are instantiated in the cloud, ideally physically located near the operator to reduce latency.
[0059] There are a few drawbacks to the web-based editing method described, and hence this implementation is more suited for lower-budget operation or programming with less severe constraints. Drawbacks include:
a. The web GUIs are not as full-featured as a native application.
b. There is "button press" latency, where the operator's presses get delayed by the network round-trip.
c. It is not as easy to keep exact sync across a distributed team.
[0060] It is within provision of the invention to allow users to integrate existing NDI-format editing material on the ground, by converting the proxy content of the invention as described above into NDI format on the ground, and using a headless back end in the cloud (if necessary). Thus existing replay, switching, or processing products may be integrated with the invention, using the proxy content of the invention in NDI format on the ground and sending commands to the user’s back end in the cloud, operating on uncompressed media.
[0061] An implementation providing for both ground-based control surfaces and web-based control surfaces (using for instance a web GUI) is shown in Fig. 11. This shows an arrangement useful for switching on the ground, as before (and shown e.g. in Fig. 10), now supplemented with a separate location and operator for replay/highlights. The replay system may comprise an existing product that uses standard inputs and outputs such as NDI. For this situation the inventive system would provide a gateway that converts to/from NDI and communicates with the cloud system of the invention as described above. With reference to Fig. 11, the green boxes are operated by the existing product vendor, while the rest of the system is provided by the invention. The provider of the existing editing product need only make two changes to their product to fit with the inventive service:
1. The application must be refactored into front end / back end if it is not already capable. The processing back end could be as simple as making a "headless" version that can run in the cloud and accept commands via remote control.
2. Any commands received from the user operating on the ground are sent from the modified ground product to the REST API of the gateway where they are forwarded to their processing back end in the cloud.
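A minimal sketch of the command path in item 2, assuming a hypothetical /api/commands route and back-end URL (neither is specified in the patent): the gateway simply relays the ground product's REST calls to the vendor's headless back end in the cloud and echoes back the result.

```go
package gateway

import (
	"bytes"
	"io"
	"net/http"
)

// backendURL is a placeholder for the vendor's headless cloud back end.
const backendURL = "https://cloud.example.com/backend/commands"

// RegisterCommandForwarder installs a handler that relays ground commands,
// unchanged, from the gateway's REST API to the cloud back end.
func RegisterCommandForwarder(mux *http.ServeMux) {
	mux.HandleFunc("/api/commands", func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		resp, err := http.Post(backendURL, r.Header.Get("Content-Type"), bytes.NewReader(body))
		if err != nil {
			http.Error(w, "backend unreachable", http.StatusBadGateway)
			return
		}
		defer resp.Body.Close()
		w.WriteHeader(resp.StatusCode) // echo the back end's status to the ground product
	})
}
```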
[0062] These additions are practically useful to include, as the NDI "ecosystem" is growing, with many products and users. The method is sufficient for dealing with proxy-grade content. By means of the gateway described, users can easily add the cloud service of the invention to their existing ground NDI workflow.

Claims

1. A system for asynchronous, cloud media production operations, comprising: a. a control plane adapted to calculate a necessary topology, spawn compute instances required for said topology and destroy compute instances not required for said topology at any given time, said compute instances including a flow router, locally shared memory, and one or more processing functions for digital video effects; b. one or more editing instances having control surfaces, adapted to allow users to edit video by means of a sequence of video edit commands; c. a message passing paradigm for communications between said control plane, said compute instances, and said editing instances; wherein said control plane causes low-resolution video with induced latency to be sent to said editing instances, while said video edit commands are sent to relevant compute instances which perform said commands on full resolution video.
2. The system of claim 1 wherein a subset of said compute instances are redundant, and wherein upon failure of any of said redundant compute instances, their corresponding redundant instances are used to replace said failed compute instances.
3. The system of claim 1 wherein, upon failure of any of said compute instances, replacement instances are used to replace said failed compute instances without loss of data, by means of keeping old input and output media on each instance for a predetermined time period such that said media can be reused on said replacement instances, copied by said flow routers under control of said control plane.
4. The system of claim 1 wherein, upon failure of any of said compute instances, the instance to which the output of said failed compute instance was directed is used to perform the operation intended to be performed by said failed compute instance.
5. The system of claim 1 wherein, upon failure of any of said compute instances, the instance to which the output of said failed compute instance was directed is used to perform the operation intended to be performed by said failed compute instance.
6. The system of claim 1 wherein said editing instances are local instances and wherein said control surfaces are manipulated directly by users.
7. The system of claim 1 wherein said editing instances are cloud instances, and wherein said control surfaces are GUIs manipulated by users controlling said cloud instances.
PCT/US2025/012607 - priority date 2024-01-23 - filing date 2025-01-22 - Distributed live media production system - Pending - WO2025160185A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202463623812P 2024-01-23 2024-01-23
US63/623,812 2024-01-23

Publications (1)

Publication Number Publication Date
WO2025160185A1 (en) 2025-07-31

Family

ID=96545818

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2025/012607 Pending WO2025160185A1 (en) 2024-01-23 2025-01-22 Distributed live media production system

Country Status (1)

Country Link
WO (1) WO2025160185A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130308436A1 (en) * 2012-05-18 2013-11-21 Futurewei Technologies, Inc. System and Method for Cloud-Based Live Media Ingestion and Transcoding
US20140033042A1 (en) * 2009-04-14 2014-01-30 Avid Technology Canada Corp. Rendering in a multi-user video editing system
US20180349168A1 (en) * 2017-05-30 2018-12-06 Magalix Corporation Systems and methods for managing a cloud computing environment
US20210327472A1 (en) * 2013-05-20 2021-10-21 Intel Corporation Elastic cloud video editing and multimedia search

Similar Documents

Publication Publication Date Title
US11356493B2 (en) Systems and methods for cloud storage direct streaming
JP6928038B2 (en) Systems and methods for frame copying and frame expansion in live video encoding and streaming
US10440403B2 (en) System and method for controlling media content capture for live video broadcast production
US10856029B2 (en) Providing low and high quality streams
US10154320B2 (en) Dynamic time synchronization
US7197535B2 (en) System and method for frame image capture
US20180069950A1 (en) Scalable, Live Transcoding with Support for Adaptive Streaming and Failover
US10484737B2 (en) Methods and systems for instantaneous asynchronous media sharing
US20230291779A1 (en) System and method for advanced data management with video enabled software tools for video broadcasting environments
US11895352B2 (en) System and method for operating a transmission network
EP3908006A2 (en) Systems and methods for real time control of a remote video production with multiple streams
US10848538B2 (en) Synchronized source selection for adaptive bitrate (ABR) encoders
US9118947B2 (en) Multi-vision virtualization system and method
EP3891999B1 (en) Just after broadcast media content
WO2025160185A1 (en) Distributed live media production system
KR101877034B1 (en) System and providing method for multimedia virtual system
KR102268167B1 (en) System for Providing Images
US12192048B2 (en) Systems and methods for processing data streams
Daami et al. Client based synchronization control of coded data streams
Kanellopoulos Group synchronization for multimedia systems
GB2553597A (en) Multimedia processing in IP networks
Fitz tgw: a webcast transcoding gateway
JP2008160481A (en) Content editing apparatus, content editing method, and content editing program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 25745615

Country of ref document: EP

Kind code of ref document: A1