WO2025027641A1 - A stream selection system for automated switching of video streams and methods thereof - Google Patents
- Publication number
- WO2025027641A1 (PCT/IN2024/051402)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sequence
- frames
- processor
- primary object
- defined number
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/10—Image acquisition
- G06V10/16—Image acquisition using multiple overlapping images; Image stitching
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/18—Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
- H04N7/181—Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast for receiving images from a plurality of remote sources
Definitions
- the present invention generally relates to the field of real time monitoring, analysis, and broadcasting of video and audio sources. More specifically, the present invention relates to an automated system for switching between one or more live video and audio sources.
- Telecasting or broadcasting live events such as sporting events, music shows, live news broadcasts, etc. is a tumultuous task.
- the final stream of video and audio that reaches the audience requires capture, pre-processing, editing, encoding, and then streaming of the original data, which require immense manpower and equipment.
- video and audio monitoring systems, for example closed-circuit video surveillance systems, also demand extensive setups and continuous monitoring by one or more persons, depending on the size and the type of system. Even if such a system is not being monitored continuously, the review of the video and audio data has to be done manually in most commercially available systems. For commercial events such as games or concerts, profitability hinges on the ability to offer high-quality and captivating streaming / broadcasting production.
- the present description provides an automated stream selection system and methods thereof for transitioning from a primary video source to a secondary video source from a plurality of secondary video sources.
- the stream selection system and the methods thereof rely on artificial intelligence and machine learning to achieve their objectives.
- the stream selection system also utilizes a cloud-based processing and encoding infrastructure which allows direct transmission to the viewer, and / or remote monitoring and control, without significant ground level infrastructure. Therefore, the present system minimizes human interaction, and thus the delays and errors resulting therefrom.
- a stream selection system for transitioning between one or more sources.
- the stream selection system comprises a processing module, a memory module, and a streaming module.
- the memory module is configured to store a plurality of selected frames for the streaming system.
- the processing module is configured to receive a plurality of frames from a primary source and a plurality of frames from one or more secondary sources.
- the processing module is further configured to configure the plurality of frames from the primary source into a first sequence, and the plurality of frames from the one or more secondary sources into one or more respective secondary sequences, on a first in first out basis.
- the processing module is further configured to divide each frame of each sequence into a pre-defined number of identification areas.
- the processing module is further configured to determine at least a primary object and one or more secondary objects with respect to pre-defined spatial-temporal markers in one of the identification areas.
- the processing module is further configured to determine a spatial coordinate of the primary object and the one or more secondary objects in each frame of the stream continuation sequence and the stream transition sequence.
- the processing module is further configured to compare the difference between the spatial coordinate of the primary object in a current frame of the stream continuation sequence to a next frame.
- the processing module is further configured to determine a transition factor indicating a movement of the primary object towards one of the pre-defined number of identification areas by at least a pre-defined number of pixels.
- the processing module is further configured to select a switching direction towards one of the pre-defined number of identification areas.
- the switching direction selected by the processor is one of a direction towards the moving primary object and a direction away from the moving primary object.
- the switching direction is selected when the primary object is determined to move towards the identification area for a pre-defined number of consecutive frames, and the transition factor is equal to a pre-defined value for each of the pre-defined number of consecutive frames.
- the pre-defined number of consecutive frames is between 1 and 10, and the pre-defined value of the transition factor is equal to one.
- the pre-defined number of identification areas include a left inner area, a left outer area, a right inner area, a right outer area, a bottom inner area, a bottom outer area, a top inner area, and a top outer area.
- a gap of 60 pixels is introduced between the right inner area and the right outer area, and between the left inner area and the left outer area.
- the transition factor is determined as one when the predefined number of pixels is more than or equal to 30 in any direction.
- the transition factor is negative when the primary object is determined to have moved leftwards or downwards.
- the processing module is further configured to select the secondary sequence from the stream transition sequence based on the selected switching direction. In another embodiment, the processing module is further configured to switch from a first sequence to the selected secondary sequence.
- the processing module is further configured to adjust an opacity of a pre-defined number of next frames of the first sequence to an opacity of a pre-defined number of next frames of the selected secondary sequence.
- the pre-defined number of next frames of the selected secondary sequence is between 3 and 10.
- the primary and secondary sources comprise movable all-weather video capturing devices.
- a method for automated switching from a current video source to a next video source comprises the step of receiving, by a processor, a plurality of frames from a primary source and a plurality of frames from one or more secondary sources.
- the method further comprises the step of configuring, by the processor, the plurality of frames from the primary source into a first sequence, and the plurality of frames from the one or more secondary sources into one or more respective secondary sequences, on a first in first out basis.
- the method further comprises the step of dividing, by the processor, each frame of each sequence into a pre-defined number of identification areas.
- the method further comprises the step of determining, by the processor, at least a primary object and one or more secondary objects with respect to pre-defined spatial-temporal markers in one of the identification areas.
- the method further comprises the step of determining, by the processor, a spatial coordinate of the primary object and the one or more secondary objects in each frame of the stream continuation sequence and the stream transition sequence.
- the method further comprises the step of comparing, by the processor, the difference between the spatial coordinate of the primary object in a current frame of the stream continuation sequence to a next frame.
- the method further comprises the step of determining, by the processor, a transition factor indicating a movement of the primary object towards one of the pre-defined number of identification areas by at least a pre-defined number of pixels.
- the method further comprises the step of selecting, by the processor, a switching direction towards one of the pre-defined number of identification areas.
- the switching direction selected by the processor is one of a direction towards the moving primary object and a direction away from the moving primary object.
- the switching direction is selected when the primary object is determined to move towards the identification area for a pre-defined number of consecutive frames, and the transition factor is equal to a pre-defined value for each of the pre-defined number of consecutive frames.
- the pre-defined number of consecutive frames is between 1 and 10
- the pre-defined value of the transition factor is equal to one
- the pre-defined number of identification areas include a left inner area, a left outer area, a right inner area, a right outer area, a bottom inner area, a bottom outer area, a top inner area, and a top outer area.
- a gap of 60 pixels is introduced between the right inner area and the right outer area, and between the left inner area and the left outer area.
- the transition factor is determined as one when the predefined number of pixels is more than or equal to 30 in any direction.
- the transition factor is negative when the primary object is determined to have moved leftwards or downwards.
- the method further comprises the step of selecting, by the processor, the secondary sequence from the stream transition sequence based on the selected switching direction. In another embodiment, the method further comprises the step of switching, by the processor, from a first sequence to the selected secondary sequence.
- the method further comprises the step of adjusting, by the processor, an opacity of a pre-defined number of next frames of the first sequence to an opacity of a pre-defined number of next frames of the selected secondary sequence.
- the pre-defined number of next frames of the selected secondary sequence is between 3 and 10.
- the method further comprises the step of configuring, by the processor, a pre-determined delay between a current frame and a next frame in the stream continuation sequence.
- the method further comprises the step of inducing, by the processor, a pre-determined audio track corresponding to the stream continuation sequence.
- the method further comprises the step of transmitting, by the processor, each frame of the stream continuation sequence to an encoder.
- the method further comprises the step of transmitting, by the processor, the encoded video stream to a broadcasting unit.
- the primary and secondary sources comprise movable all-weather video capturing devices.
- FIG. 1A illustrates an exemplary block diagram representation of the stream selection system as per an embodiment.
- Figure 1B illustrates an exemplary block diagram representation of the stream selection system as per an embodiment.
- Figure 1C illustrates an exemplary block diagram representation of the stream selection system as per an embodiment.
- Figure 2 illustrates a block diagram representation of the system enabling the automatic broadcasting by the stream selection system.
- Figure 3 illustrates an exemplary flow chart representing a method being implemented by the stream selection system.
- Figure 4 illustrates an exemplary flow chart representing a method being implemented by the stream selection system.
- Figure 5 illustrates an exemplary flow chart representing a method being implemented by the stream selection system.
- the term “exemplary” and/or “demonstrative” is used herein to mean serving as an example, instance, or illustration.
- the subject matter disclosed herein is not limited by such examples.
- any aspect or design described herein as “exemplary” and/or “demonstrative” is not necessarily to be construed as preferred or advantageous over other aspects or designs, nor is it meant to preclude equivalent exemplary structures and techniques known to those of ordinary skill in the art.
- to the extent that the terms “includes,” “has,” “contains,” and other similar words are used in the description, such terms are intended to be inclusive — in a manner similar to the term “comprising” as an open transition word — without precluding any additional or other elements.
- FIG. 1A is an exemplary block diagram representation of the stream selection system 100.
- the stream selection system 100 comprises a processing module 102, a memory module 103, and a streaming module 104.
- the processing module 102 comprises its own memory unit (not shown) separate from the memory module 103.
- the memory unit can be a permanent memory unit or a temporary memory unit, including one of a read only memory (ROM), a random access memory (RAM), a hard disk drive (HDD), and a solid state drive (SSD).
- the processing module is configured to receive input real time video feeds from one or more sources 101 (1 ... n).
- the stream selection system 100 selects one of the video feeds received from the one or more sources to be broadcasted.
- movable all weather camera setups are installed.
- the number of camera setups installed to cover the entire field of view is not a limitation on the system, and can range from one to any number of cameras.
- at least two of the camera setups have at least two cameras each: a telescopic camera with a larger focal length to capture distant objects and a wide-angle camera to capture a larger field of view; the other camera setups have only a wide-angle camera.
- the camera setups are situated at distances between 50 meters and 1000 meters from the area of interest (also referred to as the field of view).
- the processor 102 processes the incoming video feed from only one of the one or more sources 101. In another embodiment, the processor processes the incoming video feed from a plurality of the one or more sources 101.
- the processor (interchangeably referred to as the “processing module”) 102 is configured to receive a plurality of frames from a primary source 101 (1) and a plurality of frames from one or more secondary sources 101(2... n).
- the processing module 102 is further configured to configure the plurality of frames from the primary source into a first sequence, and the plurality of frames from the one or more secondary sources into one or more respective secondary sequences, on a first in first out basis.
- the processing module 102 is further configured to divide each frame of each sequence into a pre-defined number of identification areas.
- the processing module 102 is further configured to determine at least a primary object and one or more secondary objects with respect to predefined spatial-temporal markers in one of the identification areas.
- the processing module 102 is further configured to determine a spatial coordinate of the primary object and the one or more secondary objects in each frame of the stream continuation sequence and the stream transition sequence.
- the processing module 102 is further configured to compare the difference between the spatial coordinate of the primary object in a current frame of the stream continuation sequence to a next frame.
- the processing module is further configured to determine a transition factor indicating a movement of the primary object towards one of the pre-defined number of identification areas by at least a pre-defined number of pixels.
- the processing module is further configured to select a switching direction towards one of the pre-defined number of identification areas.
- the stream selection system is a cloud-based computing system. Activation of the system is initiated by the user upon pressing a designated button approximately 10 minutes prior to broadcasting. Upon activation, a new instance of the cloud-based computing environment is instantiated, requisite applications are installed, and synchronization with on-site equipment is established, rendering the system prepared for streaming operations. Subsequent to the conclusion of the stream, the user may activate another designated button, prompting a self-destruct mechanism which removes all traces of the system. This methodology ensures that the system is employed exclusively when required by the user, thereby minimizing idle usage.
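As an illustration only, this lifecycle might be orchestrated along the following lines; every helper below (provision_instance, install_apps, sync_with, destroy) is hypothetical, since the patent names no cloud provider or API:

```python
# Hypothetical sketch of the activation / self-destruct flow described above.
# None of these helpers are real APIs; they stand in for whatever cloud
# provider the deployment actually uses.
def start_stream_session(cloud, site):
    instance = cloud.provision_instance()  # new cloud computing instance
    instance.install_apps()                # install requisite applications
    instance.sync_with(site)               # synchronize with on-site equipment
    return instance                        # system is now prepared for streaming

def end_stream_session(instance):
    instance.destroy()                     # "self-destruct": remove all traces
```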
- the one or more sources 101 are movable all-weather camera setups called visual Poles (VP). These are installed on the ground. In an embodiment, a minimum of 1 and a maximum of 4 VPs are installed on the ground, and they are seamlessly integrated into the landscape. In an embodiment, at least 2 out of the (4 to N) pillars have two cameras: a telescopic camera with a larger focal length to capture distant objects and a wide-angle camera to capture a larger field of view; the rest of the pillars carry a wide-angle camera.
- VP visual Poles
- the stream selection system 100 includes the memory module 103.
- the memory module 103 includes a primary memory 109 and a replay memory 110.
- the frames selected and transmitted by the processor 102 are temporarily stored in the primary memory.
- selected frames are stored in the replay memory 110, to be induced in the output video stream.
- the frames stored in the replay memory 110 are induced in the output video stream by one of the relay module 108 and the streaming system 104.
- the streaming system 104 is configured to encode the video stream, transmit the video stream to one or more platforms, including cable television, OTT platforms, other APIs, etc.
- FIG. 1B is an exemplary block diagram representation of the stream selection system 100.
- the stream selection system 100 comprises a processing module 102, a memory module 103, and a streaming module 104.
- the processing module 102 further comprises a frame splitter module 105, an identifier module 106, a transition module 107, and a relay module 108.
- the processing module 102 comprises its own memory unit (not shown) separate from the memory module 103.
- the processing module is configured to receive input real time video feeds from one or more sources 101 (1 ... n).
- the stream selection system 100 selects one of the video feeds received from the one or more sources to be broadcasted.
- movable all weather camera setups are installed.
- the number of camera setups installed to cover the entire field of view is not a limitation on the system, and can range from one to any number of cameras.
- at least two of the camera setups have at least two cameras each: a telescopic camera with a larger focal length to capture distant objects and a wide-angle camera to capture a larger field of view; the other camera setups have only a wide-angle camera.
- the camera setups are situated at distances between 50 meters and 1000 meters from the area of interest.
- the video feed being broadcasted is the primary source, and the video feed from the other sources are secondary sources, that the system 100 may select. Since a video feed is a continuous stream of image frames, the video feed has been referred to as a plurality of frames in the rest of this description. Therefore, the processing module (also referred to as the processor) 102 receives a plurality of frames from a primary source and a plurality of frames from one or more secondary sources.
- the frame splitter module 105 of the processor 102 is configured to consolidate multiple real-time inputs of pluralities of frames. Within this consolidation, a first sequence of the plurality of frames from the primary source, and one or more secondary sequences of the respective pluralities of frames, are discerned from the collective pool of frames being received by the processor 102 in real time. The frame splitter module 105 then divides the consolidated plurality of frames into individual frames, arranges them in a first in first out (FIFO) sequence, and relays the first frame (of each sequence) to a frame processor module (not shown). In an embodiment, the plurality of frames being processed by any of the modules of the processor 102 is from the primary source 101.
- FIFO first in first out
- the frame processor module categorizes the received frames into their respective sequences, i.e., the first sequence and the one or more secondary sequences.
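A minimal sketch of the splitter and frame-processor stages described above, assuming frames arrive as opaque per-source objects; the class and method names are illustrative, not taken from the patent:

```python
from collections import deque

class FrameSplitter:
    """Consolidates real-time feeds into one FIFO sequence per source."""

    def __init__(self, source_ids, primary_id):
        self.primary_id = primary_id
        # One first-in-first-out queue per source; the left end holds the oldest frame.
        self.sequences = {sid: deque() for sid in source_ids}

    def ingest(self, source_id, frame):
        # Newest frames are appended on the right; frames leave from the left (FIFO).
        self.sequences[source_id].append(frame)

    def relay_next(self):
        # Relay the first (oldest) frame of each sequence, keyed by source so the
        # frame processor can categorize them into the first sequence (primary
        # source) and the one or more secondary sequences.
        return {sid: seq.popleft() for sid, seq in self.sequences.items() if seq}
```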
- the identifier module 106 is configured to recognize pre-defined objects within a frame using spatial-temporal markers. This identification is carried out in each frame of each of the first sequence and the one or more secondary sequences. In an embodiment, the identifier module 106 converts the colour code of each frame from RGB to BGR. In an embodiment, the identifier module 106 divides each frame of each sequence into a pre-defined number of identification areas. The identifier module 106 determines the one or more pre-defined objects relative to the one or more pre-defined spatial-temporal markers in each frame of each of the first sequence and the one or more respective secondary sequences. In an embodiment, there are at least 6 identification areas, the dimensions (in pixels) of which are as follows:
- R1 Left Inner: 60 x 1120
- R2 Left Outer: 60 x 1120
- R3 Right Inner: 60 x 1120
- R4 Right Outer: 60 x 1120
- R5 Bottom Inner: 1920 x 40
- R6 Bottom Outer: 1920 x 40
- R7 Top Inner: 1920 x 40
- R8 Top Outer: 1920 x 40
- a gap of 60 pixels is introduced between the right inner area and the right outer area, and between the left inner area and the left outer area.
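A sketch of this pre-processing step is given below. The patent specifies only the region sizes and the 60-pixel gap; the (x, y) origins and the 1920 x 1080 frame size are assumptions (the 1120-pixel side heights even exceed a 1080-row frame, so the actual placement may differ):

```python
import cv2  # OpenCV, assumed available

FRAME_W, FRAME_H = 1920, 1080  # assumed frame size; only the region sizes are given
GAP = 60                       # stated gap between inner and outer side areas

# (x, y, width, height) per identification area, anchored to the frame edges.
IDENTIFICATION_AREAS = {
    "R2_left_outer":   (0,                      0, 60, 1120),
    "R1_left_inner":   (60 + GAP,               0, 60, 1120),
    "R3_right_inner":  (FRAME_W - 2 * 60 - GAP, 0, 60, 1120),
    "R4_right_outer":  (FRAME_W - 60,           0, 60, 1120),
    "R8_top_outer":    (0, 0,                   1920, 40),
    "R7_top_inner":    (0, 40,                  1920, 40),
    "R5_bottom_inner": (0, FRAME_H - 80,        1920, 40),
    "R6_bottom_outer": (0, FRAME_H - 40,        1920, 40),
}

def preprocess(frame_rgb):
    # Convert the colour code of the frame from RGB to BGR, as the identifier does.
    return cv2.cvtColor(frame_rgb, cv2.COLOR_RGB2BGR)

def area_of(x, y):
    # Return the identification area (if any) containing the point (x, y).
    for name, (rx, ry, rw, rh) in IDENTIFICATION_AREAS.items():
        if rx <= x < rx + rw and ry <= y < ry + rh:
            return name
    return None
```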
- the pertinent frame is relayed to one of a stream continuation sequence and a stream transition sequence.
- the identifier module 106 determines a spatial coordinate (X-Y) of the primary object and the one or more secondary objects in each frame of the stream continuation sequence and the stream transition sequence.
- the identifier module 106 compares the difference between the spatial coordinate of the primary object in a current frame of the stream continuation sequence to a next frame in each of the secondary sequences.
- the decision to switch is based on the direction of movement of the primary object.
- the transition module 107 selects a pre-defined transition coefficient between a current frame in the stream continuation sequence and at least a pre-defined number of next frames in the stream transition sequence.
- the pre-defined transition coefficient is selected by determining a ratio of the size of the determined primary object in a previous frame to that in a next frame.
- the predefined number of next frames is between 3 and 10.
- the pre-defined number of next frames is 5.
- the transition coefficient is the transpose ratio between the previous frame and the succeeding frame for a span of 5 consecutive frames.
- the transition coefficient selected by the transition module is as per the condition given in the following table:
- the logic processor module selects a switching direction towards one of the pre-defined number of identification areas.
- the logic processor determines a transition factor indicating that the primary object has moved by at least a pre-defined number of pixels towards one of the pre-defined number of identification areas. If the difference in the coordinate of the primary object in a first frame with respect to the next frame is greater than 30 pixels or less than -30 pixels, a second transition factor is registered with the value ‘1’, otherwise ‘0’.
- the transition module 107 switches to one of the secondary sequences in the stream transition sequence, based on the following rules:
- Switch Rightwards: if the register has 3 entries of ‘1’ where the pixel difference is greater than 30 pixels.
- Switch Leftwards: if the register has 3 entries of ‘1’ where the pixel difference is less than -30 pixels.
- Switch Topwards: the primary object is detected in R7 and then in R8, or if the register has 3 entries of ‘1’ where the pixel difference is greater than -30 pixels.
- Switch Downwards: the primary object is detected in R5 and then in R6, or if the register has 3 entries of ‘1’ where the pixel difference is greater than -30 pixels.
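A minimal sketch of these rules, assuming horizontal movement is judged on the object's x-coordinate and that a below-threshold frame clears the register (the patent requires three entries of ‘1’ but does not state a reset rule); the topwards/downwards branches use the inner-then-outer detection order:

```python
THRESHOLD_PX = 30   # pixel difference that registers a '1'
REQUIRED = 3        # entries of '1' needed before switching

class SwitchLogic:
    def __init__(self):
        self.register = []     # signed pixel differences that cleared the threshold
        self.prev_x = None
        self.prev_area = None

    def update(self, x, area=None):
        """Feed the primary object's x-coordinate and (optionally) its current
        identification area; returns a switch direction or None."""
        direction = None
        if self.prev_x is not None:
            dx = x - self.prev_x
            if abs(dx) > THRESHOLD_PX:   # transition factor '1'
                self.register.append(dx)
            else:                        # transition factor '0': assumed to reset the run
                self.register.clear()
            if sum(d > THRESHOLD_PX for d in self.register) >= REQUIRED:
                direction = "right"
            elif sum(d < -THRESHOLD_PX for d in self.register) >= REQUIRED:
                direction = "left"
        # Topwards / downwards: object seen in the inner area, then the outer one.
        if self.prev_area == "R7" and area == "R8":
            direction = "up"
        elif self.prev_area == "R5" and area == "R6":
            direction = "down"
        self.prev_x, self.prev_area = x, area
        if direction:
            self.register.clear()
        return direction
```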
- the relay module 108 is configured to arrange the frames in sequence of their selected and determined output, and introduce a buffer delay time between frames.
- the relay module 108 is also configured to induce audio sequences corresponding to the video stream. In an embodiment, such audio comprises live audio, commentary, pre-defined audio etc.
- the relay module 108 is further configured to induce replay sequences in the output video stream.
- the stream selection system 100 includes the memory module 103.
- the memory module 103 includes a primary memory 109 and a replay memory 110.
- the frames selected by the processor 102 are temporarily stored in the primary memory.
- selected frames are stored in the replay memory 110, to be induced in the output video stream.
- the frames stored in the replay memory 110 are induced in the output video stream by one of the relay module 108 and the streaming system 104.
- the streaming system 104 is configured to encode the video stream, transmit the video stream to one or more platforms, including cable television, OTT platforms, other APIs, etc.
- the memory modules comprise one or more of a random access memory (RAM), a read only memory (ROM), a hard disk drive (HDD), and a solid state drive (SSD).
- the memory module comprise cloud based storage systems utilizing a server at a remote location.
- FIG. 1C is an exemplary block diagram representation of the stream selection system 100.
- the processor 102 of the stream selection system 100 is firstly configured to act as a splitter engine.
- the incoming video feed, originating from each of the one or more sources 101 is retrieved by the splitter engine over the public internet.
- the incoming video feed is retrieved over dedicated wired / wireless network between the one or more sources 101 and the processor 102.
- the splitter engine consolidates the data from the one or more sources, and distinguishes one as a primary source and the others as secondary sources.
- the feed from the primary source is then split into its individual content frames, which are arranged in a first in first out basis.
- the processor 102 of the stream selection system is secondly configured to act as a frame processor.
- the frame processor receives the individual frames from the splitter engine and categorizes them as per pre-defined criteria. It retains a duplicate of the most recent frame transmitted for broadcasting.
- the frame processor transmits the current frame in the first sequence to an identifier engine.
- the processor 102 is thirdly configured to act as an identifier engine.
- upon receipt of the frames from the first sequence, the identifier engine converts the colour codes of each frame from RGB to BGR. It further defines at least six rectangular regions on each frame:
- R1 Left Inner: 60 x 1120
- R2 Left Outer: 60 x 1120
- R3 Right Inner: 60 x 1120
- R4 Right Outer: 60 x 1120
- R5 Bottom Inner: 1920 x 40
- R6 Bottom Outer: 1920 x 40
- R7 Top Inner: 1920 x 40
- R8 Top Outer: 1920 x 40
- the identifier engine further detects one or more predefined objects, including at least a primary object and one or more secondary objects.
- the processor 102 is fourthly configured to act as a logic processor.
- the logic processor undertakes the task of categorizing and tallying frames based on the inputs from the identifier engine.
- the logic processor receives the coordinates of the primary object in each frame. If the difference between the coordinates in consecutive frames with positive identification of the primary object is more than 30 pixels or less than -30 pixels, then an entry of 1 is made against a transition factor. In an embodiment, this continues until three entries of 1 are determined for the transition factor.
- the switching decision is taken as follows:
- Switch Rightwards: if the register has 3 entries of ‘1’ where the pixel difference is greater than 30 pixels.
- Switch Leftwards: if the register has 3 entries of ‘1’ where the pixel difference is less than -30 pixels.
- Switch Topwards: the primary object is detected in R7 and then in R8, or if the register has 3 entries of ‘1’ where the pixel difference is greater than -30 pixels.
- Switch Downwards: the primary object is detected in R5 and then in R6, or if the register has 3 entries of ‘1’ where the pixel difference is greater than -30 pixels.
- the switching is based on the direction of movement of the primary object.
- the one or more secondary sources which becomes the primary source after switching is either displaying the primary object moving towards it, or it is displaying the primary object moving away from it.
- the processor 102 is configured to act as a transition manager. Once the decision to switch has been made, the plurality of frames from the selected secondary source are sequentially processed in a stream transition sequence. In an embodiment, once the decision to switch has been made, the plurality of frames from the primary source are sequentially processed in a stream continuation sequence.
- the transition manager adjusts the opacity of consecutive frames such that the transition from the stream continuation sequence to the stream transition sequence is not abrupt. This is done by the transition manager by selecting a pre-defined transition coefficient.
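A sketch of this opacity blend, assuming a linear ramp over five frame pairs (within the 3 to 10 range stated earlier); how the transition coefficient weights each step is not fully specified, so equal steps are used here:

```python
import cv2

def crossfade(cont_frames, trans_frames, n=5):
    """Blend n frame pairs so the stream-transition sequence fades in while the
    stream-continuation sequence fades out; frames are HxWx3 uint8 arrays."""
    blended = []
    for i, (out_f, in_f) in enumerate(zip(cont_frames[:n], trans_frames[:n]), start=1):
        alpha = i / (n + 1)  # opacity of the incoming sequence rises each frame
        blended.append(cv2.addWeighted(in_f, alpha, out_f, 1.0 - alpha, 0.0))
    return blended
```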
- a mitigation policy for reverse movement of the primary object is configured in the processor 102.
- One or more designated areas are predefined around determined pre-defined secondary objects which are not moving in the field of view.
- a first designated area is a rectangle with width 0-600 pixels and height 0-900 pixels.
- a second designated area is a rectangle with width 1520 - 1920 pixels and height 0-900 pixels.
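The two designated areas translate directly into pixel bounds; treating them as zones in which primary-object movement is ignored is an assumption, since the mitigation policy is named but not detailed:

```python
# (x_min, x_max, y_min, y_max) from the stated widths and heights.
DESIGNATED_AREAS = [
    (0, 600, 0, 900),      # first designated area
    (1520, 1920, 0, 900),  # second designated area
]

def in_designated_area(x, y):
    # Assumed policy: movement detected here does not feed the switch register.
    return any(x0 <= x < x1 and y0 <= y < y1 for x0, x1, y0, y1 in DESIGNATED_AREAS)
```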
- FIG. 2 is an exemplary illustration of a block diagram representation of a system 200 enabling the automatic broadcasting by the stream selection system 205, 100.
- the system includes one or more sources 201.
- Each of the one or more sources generate distinct video feeds from various angles and points of view of the overall field of view.
- the one or more sources 201 are cameras located at a location of which the video stream is being streamed.
- the input from the one or more sources 201 are encoded by an encoder 202, and transmitted over a network 203.
- the network may be wired or wireless, and the description herein is not limited by the type of network.
- the encoded video from the one or more sources transmitted is received at a decoder 204 and the stream selection system 205, 100.
- the stream selection system 205, 100 selects and outputs a sequence of frames which will be broadcasted, as described in the description above.
- the output of the stream selection system 205, 100 is then encoded by a transcoder 206, and transmitted over a network 207.
- the transcoder 206 is configured to encode the video stream as per a plurality of encoding standards, so that the video can be viewed on multiple types of devices.
- the encoding standards include HEVC, H.264, AV1, MP4, MPEG-2, MPEG-4 AVC, and VC-1.
- the encoded output video stream transmitted is then received by a streaming system 208, 104.
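As one possible realization of the transcoder stage, assuming the FFmpeg CLI is available (the URLs are placeholders; encoders for every listed standard, e.g. VC-1, would need additional tooling not shown):

```python
import subprocess

def transcode(src_url, dst_url, vcodec="libx264"):
    # libx264 produces H.264; libx265 produces HEVC. One process per output format.
    cmd = ["ffmpeg", "-i", src_url, "-c:v", vcodec, "-c:a", "aac",
           "-f", "mpegts", dst_url]
    return subprocess.Popen(cmd)
```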
- FIG. 3 is an exemplary illustration of a block diagram representing a method 300 being implemented by the stream selection system 100.
- the method 300 comprises a step of receiving, by a processor 102, a plurality of frames from a primary source and a plurality of frames from one or more secondary sources.
- the processor (interchangeable with ‘processing module 102’ described earlier) is configured to receive real time video feeds from one or more sources, one of which is being streamed and the others analysed as per the present method 300.
- the plurality of frames is received as a sequence from each source, since a video is a continuous sequence of frames.
- the method further comprises configuring, by the processor (102), the plurality of frames from the primary source into a first sequence, and the plurality of frames from the one or more secondary sources into one or more respective secondary sequences, on a first in first out basis.
- the frame splitter module 105 of the processor 102 is configured to consolidate multiple real time inputs of plurality of frames. Within this consolidation, a first sequence of the plurality of frames from the primary source, and one or more secondary sequences of the respective plurality of frames are discerned from the collective pool of the plurality of frames being received by the processor 102 in real time.
- the processor 102 then divides the consolidated plurality of frames into individual frames, arranges them in a first in first out (FIFO) sequence, and relays the first frame (of each sequence) to a frame processor module (not shown).
- the processor 102 categorizes the received frames into their respective sequences, i.e., the first sequence and the one or more secondary sequences.
- the method further comprises dividing, by the processor (102), each frame of each sequence into a pre-defined number of identification areas.
- the identifier module 106 then divides each frame of each sequence into a pre-defined number of identification areas. In an embodiment, there are at least 6 identification areas.
- these areas include a right inner area, a right outer area, a left inner area, a left outer area, a top inner area, a top outer area, a bottom inner area, and a bottom outer area.
- a gap of 60 pixels is introduced between the right inner area and the right outer area, and between the left inner area and the left outer area.
- the method further comprises determining, by the processor (102), at least a primary object and one or more secondary objects with respect to pre-defined spatial-temporal markers in one of the identification areas.
- the identifier module 106 determines the one or more pre-defined objects relative to the one or more pre-defined spatial-temporal markers in each frame of the first sequence.
- the one or more secondary sequences are not processed by the identifier module at this stage. In another embodiment, at least one of the one or more sequences is processed by the identifier module at this stage.
- the method further comprises determining, by the processor (102), a spatial coordinate of the primary object and the one or more secondary objects in each frame of the stream continuation sequence.
- the stream continuation sequence comprises a series of frames which are received from the primary source and processed.
- the identifier module 106 determines a spatial coordinate (X-Y) of the primary object and the one or more secondary objects in each frame of the stream continuation sequence.
- the method further comprises comparing, by the processor (102), the difference between the spatial coordinate of the primary object in a current frame of the stream continuation sequence to a next frame.
- the identifier module 106 compares the difference between the spatial coordinate of the primary object in a current frame of the stream continuation sequence to a next frame.
- the method further comprises determining, by the processor (102), a transition factor indicating a movement of the primary object towards one of the pre-defined number of identification areas by at least a pre-defined number of pixels.
- the logic processor determines a transition factor indicating that the primary object has moved by at least a pre-defined number of pixels towards one of the pre-defined number of identification areas. If the difference in the coordinate of the primary object in a first frame with respect to the next frame is greater than 30 pixels or less than -30 pixels, the transition factor is registered with the value ‘1’, otherwise ‘0’.
- the logic processor switches to one of the secondary sequences.
- the method further comprises selecting, by the processor (102), a switching direction towards one of the pre-defined number of identification areas. The switching is based on the direction of movement of the primary object.
- the one or more secondary sources which becomes the primary source after switching is either displaying the primary object moving towards it, or it is displaying the primary object moving away from it.
- Figure 4 is an exemplary flow chart illustrating a method implemented by the processor 102 in deciding to switch.
- the method comprises the step of selecting, by the processor, the secondary sequence based on the selected switching direction.
- the switching is based on the direction of movement of the primary object.
- the one or more secondary sources which becomes the primary source after switching is either displaying the primary object moving towards it, or it is displaying the primary object moving away from it.
- the logic processor determines a transition factor indicating that the primary object has moved by at least a predefined number of pixels towards one of the pre-defined number of identification areas.
- the transition factor is registered with the value ‘1’, otherwise ‘0’.
- the logic processor switches to one of the secondary sequences.
- the plurality of frames from the primary source is processed as a stream continuation sequence, and the plurality of frames from the selected secondary source is processed as the stream transition sequence.
- the method further comprises the step of switching, by the processor, from a first sequence to the selected secondary sequence.
- the transition module of the processor 102 selects a pre-defined transition coefficient between a current frame in the stream continuation sequence and at least a pre-defined number of next frames in the stream transition sequence.
- the pre-defined transition coefficient is selected by determining a ratio of the size of the determined primary object in a previous frame to that in a next frame.
- the predefined number of next frames is between 3 and 10.
- the pre-defined number of next frames is 5.
- the transition coefficient is the transpose ratio between the previous frame and the succeeding frame for a span of 5 consecutive frames.
- the transition coefficient selected by the transition module is as per the condition given in table 1 above.
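Reading the coefficient as a size ratio of the primary object's bounding boxes in consecutive frames, a sketch might look as follows; this interpretation of the "transpose ratio" is an assumption:

```python
def transition_coefficient(prev_box, next_box):
    """Each box is (width, height) of the primary object's bounding box in pixels."""
    prev_area = prev_box[0] * prev_box[1]
    next_area = next_box[0] * next_box[1]
    return prev_area / next_area if next_area else float("inf")
```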
- Figure 5 is an exemplary illustration of a method being implemented by the stream selection system.
- the method includes the step of configuring, by the processor, a pre-determined delay between a current frame and a next frame in the first sequence. The delay is induced so that the output stream is coherent and comprehensible to human vision, and so that the output video quality is maintained.
- the method further comprises the step of inducing, by the processor, a pre-determined audio track corresponding to the first sequence.
- the audio track may be live audio from the field of view of the one or more sources.
- the audio track may be a secondary live audio imposed upon the live audio from the field of view.
- the secondary live audio may be live commentary, or the like.
- the method further comprises the step of transmitting, by the processor, each frame of the first sequence to an encoder.
- the method further comprises the step of transmitting, by the processor, the encoded video stream to a broadcasting unit.
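A sketch of this relay stage, assuming the pre-determined delay is derived from a target frame rate; encoder.submit and audio_mixer.feed are hypothetical interfaces standing in for the encoder hand-off and the audio induction step:

```python
import time

def relay(first_sequence, encoder, audio_mixer, fps=25):
    delay = 1.0 / fps  # pre-determined delay between consecutive frames
    for frame in first_sequence:
        audio_mixer.feed(frame)   # induce the corresponding audio track
        encoder.submit(frame)     # transmit the frame to the encoder
        time.sleep(delay)         # keep the output coherent to human vision
```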
- the stream selection system 100 decides to switch from the first sequence of frames in the stream continuation sequence to a second sequence of frames in the stream transition sequence on the basis of both the transition factor and the transition coefficient.
- the processor selects the direction to switch based on the transition factor alone.
- the stream selection system 100 is a combination of image processing and artificial-intelligence-based techniques, such as artificial neural networks (ANN), and a heuristics-based system that emulates human classification, prediction, and decision-making skills.
- the stream selection system (100) consolidates and transmits the plurality of frames from the one or more sources by utilizing a public internet network, over one or more of a Wi-Fi or cellular network, using an open-source protocol known as RTSP (Real Time Streaming Protocol).
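A minimal sketch of such RTSP ingestion with OpenCV (the URL is a placeholder); this is one common way to pull each source's frames before handing them to the splitter engine:

```python
import cv2

cap = cv2.VideoCapture("rtsp://example.invalid/stream1")  # placeholder URL
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # hand `frame` to the splitter engine / processing module here
cap.release()
```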
- the processor 102 is a graphical processing unit (GPU).
- each module of the processor 102 is a GPU.
- each module of the processor 102 comprises a GPU and an additional central processing unit (CPU).
- the streaming system 104 utilizes a User Interface (UI) unit on an end device, consolidating and presenting all the collected data in an intuitive infographic.
- UI User Interface
- third party apps and APIs may be integrated into one of the streaming system 104 and the end device.
- the present disclosure of the stream selection system and the methods thereof has applications in several fields.
- this system can automate the switching between different camera feeds, providing a smooth and professional viewing experience.
- in security systems, it can switch between different camera feeds based on detected motion or specific objects, enhancing monitoring efficiency.
- in multi-participant video calls, it can manage the transitions between speakers or participants, ensuring that the focus is always on the active speaker.
- Automated switching can assist in live editing or creating dynamic video content from multiple sources.
- VR virtual reality
- AR augmented reality
- Further applications include medical and healthcare services, education and e-learning, industrial and manufacturing, public safety and emergency response, and transportation and traffic management, to name a few.
- multiple cameras capture different angles and close-ups of the surgical field.
- the system can automatically switch views based on the surgeon's actions or specific instruments, providing a comprehensive view for the medical team and for educational recordings.
- cameras can focus on the teacher, the whiteboard, or students.
- the system can switch views based on the teacher's movements, interactions, or student questions, providing an engaging learning experience for remote learners.
- the system can switch between wide shots of the stage, close-ups of performers, and audience reactions, providing a rich and immersive viewing experience.
- cameras monitor different stages of the assembly line.
- the system can switch views based on detected anomalies, specific assembly stages, or quality control checks, improving monitoring and control.
- in emergency situations like fires, floods, or accidents, multiple cameras capture different areas of the affected zone.
- the system can switch views based on detected activities or responder actions, providing comprehensive situational awareness to command centres.
- cameras monitor various intersections and road segments.
- the system can switch views based on traffic flow, detected incidents, or time of day, aiding in efficient traffic control and management.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The present invention relates to an automated stream selection system and method for transitioning between multiple video sources based on real-time object detection and tracking within video frames. The system comprises a processing module, a memory module, and a streaming module. The memory module stores frames received from primary and secondary video sources, organizing them into sequences. The processing module divides each frame into predefined identification areas, detects primary and secondary objects using spatial-temporal markers, and calculates the spatial coordinates of these objects across frames. By comparing the movement of the primary object, the system determines a transition factor indicating movement towards specific identification areas and selects a switching direction accordingly. The system then transitions from the primary video source to the selected secondary source, adjusting the opacity of frames to ensure smooth visual transitions.
Description
A STREAM SELECTION SYSTEM FOR AUTOMATED SWITCHING OF
VIDEO STREAMS AND METHODS THEREOF
FIELD OF INVENTION
[001] The present invention generally relates to the field of real time monitoring, analysis, and broadcasting of video and audio sources. More specifically, the present invention relates to an automated system for switching between one or more live video and audio sources.
BACKGROUND OF THE INVENTION
[002] The following description of related art is intended to provide background information pertaining to the field of the invention. This section may include certain aspects of the art that may be related to various features of the present invention. However, it should be appreciated that this section is used only to enhance the understanding of the reader with respect to the present invention, and not as admissions of prior art.
[003] Telecasting or broadcasting live events, such as sporting events, music shows, live news broadcasts, etc. is a tumultuous task. The final stream of video and audio that reaches the audience requires capture, pre-processing, editing, encoding, and then streaming of the original data, which require immense manpower and equipment. Similarly, video and audio monitoring systems, for example, in closed-circuit video surveillance systems, also demand extensive systems which must be set up, and continuous monitoring by one or more persons, depending on the size
and the type of system. Even if such a system is not being monitored continuously, the review of the video and audio data has to be done manually, in most commercially available systems. For commercial events such as games or concerts, the profitability hinges on the ability to offer high-quality and captivating streaming/ broadcasting production.
[004] Manual control of the editing and production process in a live telecast / broadcast, or the manual monitoring of a live source for identification of events / objects, introduces several factors in the outcome of the process, such as delay, human error, and other external factors. Most sporting events are covered by using professional videographers. This introduces yet another layer of delay, human error, and other external factors affecting the outcome of the telecast. Therefore, any such live telecast or live monitoring situation demands highly trained professionals in every aspect of the process. This increases the overall cost of any such outfit.
[005] Efforts have been made to automate various aspects of the process of telecasting and / or monitoring a live event, and extensive research has been dedicated to make both the process of production / editing and viewing streamlined and easier.
[006] The prior art document US10219009B2 discloses a system that allows viewers to manually switch between two video sources with different perspectives and replay key moments during a live broadcast making it more interactive.
[007] On the processing end, various prior art documents, including US11412303B2, US9437012B2, and US20170255832A1, have introduced
methods for analysing live stream content and extracting analytical data from the video such as position and velocity of elements, and detecting actions performed by objects in videos. These innovations laid the foundation for identifying and deriving data from images.
[008] Other prior art documents such as US10055847B2 and US2015/0117704A1 disclose other aspects of image processing being carried out on the live video source, in order to detect one or more objects and one or more events. There are several other prior art documents which discuss detecting and tracking objects, detecting events and tracking the objects thereafter, and so on. Although these inventions hold immense value, they have not led to widespread adoption due to the persistent core issue of heavy reliance on manual operations, resulting in high running / operating costs.
[009] There is, however, a gap between what exists as prior art and the requirements of an automated broadcasting system as described in this section. The common general knowledge and the prior arts fail to teach a system which performs a very basic and important task of a broadcasting / monitoring unit: transitioning between video / audio sources. The decision to show one source and then switch to another at a specific moment is at present made manually.
[0010] Therefore, there is a need for a system which can transition between various video sources automatically, which reduces manpower required in editing / production / monitoring of multiple video sources in real time and automates and streamlines the process.
SUMMARY OF THE INVENTION
[0011] The present description provides an automated stream selection system and methods thereof for transitioning from a primary video source to a secondary video source from a plurality of secondary video sources. The stream selection system and the methods thereof rely on artificial intelligence and machine learning to achieve their objectives. The stream selection system also utilizes a cloud-based processing and encoding infrastructure which allows direct transmission to the viewer, and / or remote monitoring and control, without significant ground level infrastructure. Therefore, the present system minimizes human interaction, and thus the delays and errors resulting therefrom.
[0012] In an aspect, a stream selection system for transitioning between one or more sources is disclosed. The stream selection system comprises a processing module, a memory module, and a streaming module. The memory module is configured to store a plurality of selected frames for the streaming system. The processing module is configured to receive a plurality of frames from a primary source and a plurality of frames from one or more secondary sources. The processing module is further configured to configure the plurality of frames from the primary source into a first sequence, and the plurality of frames from the one or more secondary sources into one or more respective secondary sequences, on a first in first out basis. The processing module is further configured to divide each frame of each sequence into a pre-defined number of identification areas. The processing module is further configured to determine at least a primary object and one or more secondary objects with respect to pre-defined spatial-temporal markers in one of the identification areas. The processing module is further configured to determine a spatial coordinate
of the primary object and the one or more secondary objects in each frame of the stream continuation sequence and the stream transition sequence. The processing module is further configured to compare the difference between the spatial coordinate of the primary object in a current frame of the stream continuation sequence to a next frame. The processing module is further configured to determine a transition factor indicating a movement of the primary object towards one of the pre-defined number of identification areas by at least a pre-defined number of pixels. The processing module is further configured to select a switching direction towards one of the pre-defined number of identification areas.
[0013] In an embodiment, the switching direction selected by the processor is one of a direction towards the moving primary object and a direction away from the moving primary object.
[0014] In an embodiment, the switching direction is selected when the primary object is determined to move towards the identification area for a pre-defined number of consecutive frames, and the transition factor is equal to a pre-defined value for each of the pre-defined number of consecutive frames.
[0015] In an embodiment, the pre-defined number of consecutive frames is between 1 and 10, and the pre-defined value of the transition factor is equal to one.
[0016] In an embodiment, the pre-defined number of identification areas include a left inner area, a left outer area, a right inner area, a right outer area, a bottom inner area, a bottom outer area, a top inner area, and a top outer area.
[0017] In an embodiment, a gap of 60 pixels is introduced between the right inner area and the right outer area, and between the left inner area and the left outer area.
[0018] In an embodiment, the transition factor is determined as one when the predefined number of pixels is more than or equal to 30 in any direction.
[0019] In an embodiment, the transition factor is negative when the primary object is determined to have moved leftwards or downwards.
[0020] In an embodiment, the processing module is further configured to select the secondary sequence from the stream transition sequence based on the selected switching direction. In another embodiment, the processing module is further configured to switch from a first sequence to the selected secondary sequence.
[0021] In an embodiment, the processing module is further configured to adjust an opacity of a pre-defined number of next frames of the first sequence to an opacity of a pre-defined number of next frames of the selected secondary sequence.
[0022] In an embodiment, the pre-defined number of next frames of the selected secondary sequence is between 3 and 10.
[0023] In an embodiment, the primary and secondary sources comprise movable all-weather video capturing devices.
[0024] In an aspect, a method for automated switching from a current video source to a next video source is disclosed. The method comprises the step of receiving, by a processor, a plurality of frames from a primary source and a plurality of frames from one or more secondary sources. The method further comprises the step of configuring, by the processor, the plurality of frames from the primary source into a first sequence, and the plurality of frames from the one or more secondary sources
into one or more respective secondary sequences, on a first in first out basis. The method further comprises the step of dividing, by the processor, each frame of each sequence into a pre-defined number of identification areas. The method further comprises the step of determining, by the processor, at least a primary object and one or more secondary objects with respect to pre-defined spatial-temporal markers in one of the identification areas. The method further comprises the step of determining, by the processor, a spatial coordinate of the primary object and the one or more secondary objects in each frame of the stream continuation sequence and the stream transition sequence. The method further comprises the step of comparing, by the processor, the difference between the spatial coordinate of the primary object in a current frame of the stream continuation sequence to a next frame. The method further comprises the step of determining, by the processor, a transition factor indicating a movement of the primary object towards one of the pre-defined number of identification areas by at least a pre-defined number of pixels. The method further comprises the step of selecting, by the processor, a switching direction towards one of the pre-defined number of identification areas.
[0025] In an embodiment, the switching direction selected by the processor is one of a direction towards the moving primary object and a direction away from the moving primary object.
[0026] In an embodiment, the switching direction is selected when the primary object is determined to move towards the identification area for a pre-defined number of consecutive frames, and the transition factor is equal to a pre-defined value for each of the pre-defined number of consecutive frames.
[0027] In an embodiment, the pre-defined number of consecutive frames is between 1 and 10, and the pre-defined value of the transition factor is equal to one.
[0028] In an embodiment, the pre-defined number of identification areas include a left inner area, a left outer area, a right inner area, a right outer area, a bottom inner area, a bottom outer area, a top inner area, and a top outer area.
[0029] In an embodiment, a gap of 60 pixels is introduced between the right inner area and the right outer area, and between the left inner area and the left outer area.
[0030] In an embodiment, the transition factor is determined as one when the pre-defined number of pixels is more than or equal to 30 in any direction.
[0031] In an embodiment, the transition factor is negative when the primary object is determined to have moved leftwards or downwards.
[0032] In an embodiment, the method further comprises the step of selecting, by the processor, the secondary sequence from the stream transition sequence based on the selected switching direction. In another embodiment, the method further comprises the step of switching, by the processor, from a first sequence to the selected secondary sequence.
[0033] In an embodiment, the method further comprises the step of adjusting, by the processor, an opacity of a pre-defined number of next frames of the first sequence to an opacity of a pre-defined number of next frames of the selected secondary sequence.
[0034] In an embodiment, the pre-defined number of next frames of the selected secondary sequence is between 3 and 10.
[0035] In an embodiment, the method further comprises the step of configuring, by the processor, a pre-determined delay between a current frame and a next frame in the stream continuation sequence. In another embodiment, the method further comprises the step of inducing, by the processor, a pre-determined audio track corresponding to the stream continuation sequence. In another embodiment, the method further comprises the step of transmitting, by the processor, each frame of the stream continuation sequence to an encoder. In another embodiment, the method further comprises the step of transmitting, by the processor, the encoded video stream to a broadcasting unit.
[0036] In an embodiment, the primary and secondary sources comprise movable all-weather video capturing devices.
BRIEF DESCRIPTION OF DRAWINGS
[0037] Reference will be made to embodiments of the invention, examples of which may be illustrated in accompanying figures. The accompanying figures, which are incorporated herein, and constitute a part of this invention, illustrate exemplary embodiments of the disclosed methods and systems in which like reference numerals refer to the same parts throughout the different drawings. Components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present invention. Although the invention is
generally described in context of these embodiments, it should be understood that it is not intended to limit the scope of the invention to these particular embodiments.
[0038] Figure 1A illustrates an exemplary block diagram representation of the stream selection system as per an embodiment.
[0039] Figure 1B illustrates an exemplary block diagram representation of the stream selection system as per an embodiment.
[0040] Figure 1C illustrates an exemplary block diagram representation of the stream selection system as per an embodiment.
[0041] Figure 2 illustrates a block diagram representation of the system enabling the automatic broadcasting by the stream selection system.
[0042] Figure 3 illustrates an exemplary flow chart representing a method being implemented by the stream selection system.
[0043] Figure 4 illustrates an exemplary flow chart representing a method being implemented by the stream selection system.
[0044] Figure 5 illustrates an exemplary flow chart representing a method being implemented by the stream selection system.
DETAILED DESCRIPTION OF INVENTION
[0045] Various features and embodiments of the present invention here will be discernible from the following further description thereof, set out hereunder. The ensuing description provides exemplary embodiments only and is not intended to limit the scope, applicability, or configuration of the invention. Rather, the ensuing description of the exemplary embodiments will provide those skilled in the art with
an enabling description for implementing an exemplary embodiment. It should be understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the invention as set forth.
[0046] The word “exemplary” and/or “demonstrative” is used herein to mean serving as an example, instance, or illustration. For the avoidance of doubt, the subject matter disclosed herein is not limited by such examples. In addition, any aspect or design described herein as “exemplary” and/or “demonstrative” is not necessarily to be construed as preferred or advantageous over other aspects or designs, nor is it meant to preclude equivalent exemplary structures and techniques known to those of ordinary skill in the art. Furthermore, to the extent that the terms “includes,” “has,” “contains,” and other similar words are used in either the detailed description or the claims, such terms are intended to be inclusive — in a manner similar to the term “comprising” as an open transition word — without precluding any additional or other elements.
[0047] Reference throughout this specification to “one embodiment” or “an embodiment” or “an instance” or “one instance” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
[0048] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting to the invention. As used herein, the singular forms “a”, “an”, and “the” are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.
[0049] Figure 1A is an exemplary block diagram representation of the stream selection system 100. The stream selection system 100 comprises a processing module 102, a memory module 103, and a streaming module 104. In an embodiment, the processing module 102 comprises its own memory unit (not shown) separate from the memory module 103. The memory unit can be a permanent memory unit or a temporary memory unit, including one of a read only memory (ROM), a random access memory (RAM), a hard disk drive (HDD), and a solid state drive (SSD). The processing module is configured to receive input real time video feeds from one or more sources 101 (1 ... n). The stream selection system 100 selects one of the video feeds received from the one or more sources to be broadcasted. In an embodiment, movable all-weather camera setups are installed. In an embodiment, the number of camera setups installed so that the entire field of view is covered is not a limitation on the system, and can range from one to any
number of cameras. In an embodiment, at least two of the camera setups have at least two cameras each: a telescopic camera to capture distant objects with a larger focal length and a wide-angle camera to capture a larger field of view, and the other camera setups have only a wide-angle camera. In another embodiment, the camera setups are situated at distances between 50 meters and 1000 meters from the area of interest (also referred to as the field of view).
[0050] In an embodiment, the processor 102 processes the incoming video feed from only one of the one or more sources 101. In another embodiment, the processor processes the incoming video feed from a plurality of the one or more sources 101. The processor (interchangeably referred to as the “processing module”) 102 is configured to receive a plurality of frames from a primary source 101 (1) and a plurality of frames from one or more secondary sources 101 (2 ... n). The processing module 102 is further configured to configure the plurality of frames from the primary source into a first sequence, and the plurality of frames from the one or more secondary sources into one or more respective secondary sequences, on a first in first out basis. The processing module 102 is further configured to divide each frame of each sequence into a pre-defined number of identification areas. The processing module 102 is further configured to determine at least a primary object and one or more secondary objects with respect to pre-defined spatial-temporal markers in one of the identification areas. The processing module 102 is further configured to determine a spatial coordinate of the primary object and the one or more secondary objects in each frame of the stream continuation sequence and the stream transition sequence. The processing module
102 is further configured to compare the difference between the spatial coordinate of the primary object in a current frame of the stream continuation sequence to a next frame. The processing module is further configured to determine a transition factor indicating a movement of the primary object towards one of the pre-defined number of identification areas by at least a pre-defined number of pixels. The processing module is further configured to select a switching direction towards one of the pre-defined number of identification areas. In an embodiment, the stream selection system is a cloud-based computing system. Activation of the system is initiated by the user upon pressing a designated button approximately 10 minutes prior to broadcasting. Upon activation, a new instance of the cloud-based computing environment is instantiated, requisite applications are installed, and synchronization with on-site equipment is established, thereby rendering the system prepared for streaming operations. Subsequent to the conclusion of the stream, the user may activate another designated button, prompting a self-destruct mechanism which effectively removes all traces of the system. This methodology ensures that the system is exclusively employed when required by the user, thereby minimizing any instances of idle usage.
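By way of illustration only, the receive-and-sequence step described above can be sketched as follows. This is a minimal sketch, assuming OpenCV-style capture objects with a read() method; the class name StreamSelector and all variable names are hypothetical and are not part of the claimed system.

```python
# Minimal sketch of the receive-and-sequence step; class and variable
# names are illustrative assumptions, not the claimed implementation.
from collections import deque

class StreamSelector:
    def __init__(self, primary_capture, secondary_captures):
        # One FIFO sequence per source: the first sequence for the
        # primary source, one secondary sequence per secondary source.
        self.first_sequence = deque()
        self.secondary_sequences = [deque() for _ in secondary_captures]
        self._captures = [primary_capture, *secondary_captures]

    def ingest(self):
        """Read one frame from every source and append it to the tail
        of its sequence; frames are consumed from the head (FIFO)."""
        ok, frame = self._captures[0].read()
        if ok:
            self.first_sequence.append(frame)
        for cap, seq in zip(self._captures[1:], self.secondary_sequences):
            ok, frame = cap.read()
            if ok:
                seq.append(frame)
```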
[0051] In an embodiment, the one or more sources 101 are movable all-weather camera setups called visual Poles (VP). These are installed on the ground. In an embodiment, a minimum of 1 and a maximum of 4 VPs are installed on the ground, and they are seamlessly integrated into the landscape. In an embodiment, at least 2 out of the (4 to N) pillars have two cameras: a telescopic camera to capture distant objects with a larger focal length and a wide-angle camera to capture a larger field of view, and the rest of the pillars carry a wide-angle camera.
[0052] The stream selection system 100 includes the memory module 103. The memory module 103 includes a primary memory 109 and a replay memory 110. The frames selected and transmitted by the processor 102 are temporarily stored in the primary memory. In an embodiment, selected frames are stored in the replay memory 110, to be induced in the output video stream. The frames stored in the replay memory 110 are induced in the output video stream by one of the relay module 108 and the streaming system 104. The streaming system 104 is configured to encode the video stream, transmit the video stream to one or more platforms, including cable television, OTT platforms, other APIs, etc.
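An illustrative sketch of the replay memory 110 as a bounded first-in-first-out buffer is given below; the capacity of 300 frames (roughly 10 seconds at 30 frames per second) and the class name ReplayMemory are assumptions made for the example only.

```python
# Illustrative sketch of a replay memory as a bounded FIFO buffer;
# the capacity and frame type are assumptions for the example.
from collections import deque

class ReplayMemory:
    def __init__(self, capacity_frames=300):   # e.g. ~10 s at 30 fps
        self._buffer = deque(maxlen=capacity_frames)

    def store(self, frame):
        self._buffer.append(frame)             # oldest frame is evicted

    def replay_clip(self):
        return list(self._buffer)              # frames to induce in output
```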
[0053] Figure 1B is an exemplary block diagram representation of the stream selection system 100. The stream selection system 100 comprises a processing module 102, a memory module 103, and a streaming module 104. The processing module 102 further comprises a frame splitter module 105, an identifier module 106, a transition module 107, and a relay module 108. In an embodiment, the processing module 102 comprises its own memory unit (not shown) separate from the memory module 103. The processing module is configured to receive input real time video feeds from one or more sources 101 (1 ... n). The stream selection system 100 selects one of the video feeds received from the one or more sources to be broadcasted. In an embodiment, movable all-weather camera setups are installed. In an embodiment, the number of camera setups installed so that the entire field of view is covered is not a limitation on the system, and can range from one to any
number of cameras. In an embodiment, at least two of the camera setups have at least two cameras each: a telescopic camera to capture distant objects with a larger focal length and a wide-angle camera to capture a larger field of view, and the other camera setups have only a wide-angle camera. In another embodiment, the camera setups are situated at distances between 50 meters and 1000 meters from the area of interest.
[0054] The video feed being broadcasted is the primary source, and the video feeds from the other sources are secondary sources that the system 100 may select. Since a video feed is a continuous stream of image frames, the video feed has been referred to as a plurality of frames in the rest of this description. Therefore, the processing module (also referred to as the processor) 102 receives a plurality of frames from a primary source and a plurality of frames from one or more secondary sources.
[0055] The frame splitter module 105 of the processor 102 is configured to consolidate multiple real time inputs of a plurality of frames. Within this consolidation, a first sequence of the plurality of frames from the primary source, and one or more secondary sequences of the respective plurality of frames, are discerned from the collective pool of the plurality of frames being received by the processor 102 in real time. The frame splitter module 105 then divides the consolidated plurality of frames into individual frames, arranges them in a first in first out (FIFO) sequence, and relays the first frame (of each sequence) to a frame processor module (not shown). In an embodiment, the plurality of frames being processed by any of the modules of the processor 102 is from the primary source
101. In an embodiment, while the video feed from the primary source is being streamed, the plurality of frames from the other video feeds are not processed. The frame processor module categorizes the received frames into their respective sequences, i.e., the first sequence and the one or more secondary sequences.
[0056] The identifier module 106 is configured to recognize pre-defined objects within a frame using spatial-temporal markers. This identification is carried out in each frame of each of the first sequence and the one or more secondary sequences. In an embodiment, the identifier module 106 converts the colour code of each frame from RGB to BGR. In an embodiment, the identifier module 106 divides each frame of each sequence into a pre-defined number of identification areas. The identifier module 106 determines the one or more pre-defined objects relative to the one or more pre-defined spatial-temporal markers in each frame of each of the first sequence and the one or more respective secondary sequences. In an embodiment, there are at least 6 identification areas, the dimensions (in pixels) of which are as follows:
a. R1: Left Inner: 60 x 1120
b. R2: Left Outer: 60 x 1120
c. R3: Right Inner: 60 x 1120
d. R4: Right Outer: 60 x 1120
e. R5: Bottom Inner: 1920 x 40
f. R6: Bottom Outer: 1920 x 40
g. R7: Top Inner: 1920 x 40
h. R8: Top Outer: 1920 x 40
In an embodiment, a gap of 60 pixels is introduced between the right inner area and the right outer area, and between the left inner area and the left outer area.
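The eight identification areas can be sketched as rectangles on the frame as follows. Only the listed dimensions and the 60-pixel gap come from the description; the 1920 x 1200 pixel frame size and the exact placement of the inner and outer areas are illustrative assumptions.

```python
# Sketch of the eight identification areas as (x, y, width, height)
# rectangles. Only the dimensions and the 60-pixel gap come from the
# description; the 1920 x 1200 frame and exact placement are assumed.
GAP = 60  # gap between inner and outer areas on the left and right

FRAME_W, FRAME_H = 1920, 1200  # assumed frame size

REGIONS = {
    "R2_left_outer":   (0, 0, 60, 1120),
    "R1_left_inner":   (60 + GAP, 0, 60, 1120),
    "R3_right_inner":  (FRAME_W - 120 - GAP, 0, 60, 1120),
    "R4_right_outer":  (FRAME_W - 60, 0, 60, 1120),
    "R8_top_outer":    (0, 0, FRAME_W, 40),
    "R7_top_inner":    (0, 40, FRAME_W, 40),
    "R5_bottom_inner": (0, FRAME_H - 80, FRAME_W, 40),
    "R6_bottom_outer": (0, FRAME_H - 40, FRAME_W, 40),
}

def region_of(x, y):
    """Return the first identification area containing point (x, y)."""
    for name, (rx, ry, w, h) in REGIONS.items():
        if rx <= x < rx + w and ry <= y < ry + h:
            return name
    return None
```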
[0057] Based on the identified primary and secondary objects, the pertinent frame is relayed to one of a stream continuation sequence and a stream transition sequence. The identifier module 106 determines a spatial coordinate (X-Y) of the primary object and the one or more secondary objects in each frame of the stream continuation sequence and the stream transition sequence. The identifier module 106 compares the difference between the spatial coordinate of the primary object in a current frame of the stream continuation sequence to a next frame in each of the secondary sequences.
[0058] In an embodiment, the decision to switch is based on the direction of movement of the primary object.
[0059] The transition module 107 selects a pre-defined transition coefficient between a current frame in the stream continuation sequence and at least a pre-defined number of next frames in the stream transition sequence. In an embodiment, the pre-defined transition coefficient is selected by determining a ratio of the size of the determined primary object in a previous frame with a next frame. In an embodiment, the pre-defined number of next frames is between 3 and 10. In an embodiment, the pre-defined number of next frames is 5. In an embodiment, the transition coefficient is the transpose ratio between the previous frame and the succeeding frame for a span of 5 consecutive frames. In an embodiment, the transition coefficient selected by the transition module is as per the condition given in the following table:
Table 1
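Since the contents of Table 1 are not reproduced in this text, the following is only a hedged sketch of selecting a transition coefficient as the ratio of the primary object's size in a previous frame to that in the next frame over the 5-frame span mentioned above; the function name and the averaging step are assumptions.

```python
# Hedged sketch: transition coefficient as the mean ratio of the primary
# object's size in a previous frame to the next frame over a 5-frame span.
def transition_coefficient(object_sizes, span=5):
    """object_sizes: pixel areas of the detected primary object in
    consecutive frames, oldest first."""
    pairs = zip(object_sizes, object_sizes[1:span + 1])
    ratios = [prev / curr for prev, curr in pairs if curr > 0]
    return sum(ratios) / len(ratios) if ratios else 1.0
```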
[0060] The logic processor module selects a switching direction towards one of the pre-defined number of identification areas. The logic processor determines a transition factor indicating that the primary object has moved by at least a pre-defined number of pixels towards one of the pre-defined number of identification areas. If the difference in the coordinate of the primary object in a first frame with respect to the next frame is greater than 30 pixels or less than -30 pixels, a second transition factor is registered with the value ‘1’, otherwise ‘0’. In an embodiment, when 3 consecutive entries of ‘1’ are registered against the second transition factor, the transition module 107 switches to one of the secondary sequences in the stream transition sequence, based on the following rules (see the sketch after this list):
a. Switch Rightwards: if the register has 3 entries of ‘1’ where the pixel difference is greater than 30 pixels.
b. Switch Leftwards: if the register has 3 entries of ‘1’ where the pixel difference is less than -30 pixels.
c. Switch Topwards: the primary object is detected in R7 and then in R8, or if the register has 3 entries of ‘1’ where the pixel difference is greater than -30 pixels.
d. Switch Downwards: the primary object is detected in R5 and then in R6, or if the register has 3 entries of ‘1’ where the pixel difference is greater than -30 pixels.
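A minimal sketch of these rules follows, assuming the register holds the last three (transition factor, pixel difference) pairs and that the last two identification areas in which the primary object was detected are tracked; all names are illustrative assumptions.

```python
# Sketch of the four switching rules. `register` holds the last three
# (transition_factor, pixel_difference) pairs; `region_hits` holds the
# identification areas in which the primary object was last detected.
def switching_decision(register, region_hits):
    if len(register) == 3 and all(f == 1 for f, _ in register):
        if all(dx > 30 for _, dx in register):
            return "rightwards"
        if all(dx < -30 for _, dx in register):
            return "leftwards"
    if region_hits[-2:] == ["R7", "R8"]:   # top inner, then top outer
        return "topwards"
    if region_hits[-2:] == ["R5", "R6"]:   # bottom inner, then bottom outer
        return "downwards"
    return None  # keep streaming the current primary source
```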
[0061] The relay module 108 is configured to arrange the frames in sequence of their selected and determined output, and introduce a buffer delay time between frames. The relay module 108 is also configured to induce audio sequences corresponding to the video stream. In an embodiment, such audio comprises live audio, commentary, pre-defined audio etc. The relay module 108 is further configured to induce replay sequences in the output video stream.
[0062] The stream selection system 100 includes the memory module 103. The memory module 103 includes a primary memory 109 and a replay memory 110. The frames selected by the processor 102 are temporarily stored in the primary memory. In an embodiment, selected frames are stored in the replay memory 110, to be induced in the output video stream. The frames stored in the replay memory 110 are induced in the output video stream by one of the relay module 108 and the streaming system 104. The streaming system 104 is configured to encode the video stream, transmit the video stream to one or more platforms, including cable television, OTT platforms, other APIs, etc. In an embodiment, the memory modules
comprise one or more of a random access memory (RAM), a read only memory (ROM), a hard disk drive (HDD), and a solid state drive (SSD). In another embodiment, the memory module comprises cloud-based storage systems utilizing a server at a remote location.
[0063] Figure 1C is an exemplary block diagram representation of the stream selection system 100. In an embodiment, the processor 102 of the stream selection system 100 is firstly configured to act as a splitter engine. In an embodiment, the incoming video feed, originating from each of the one or more sources 101, is retrieved by the splitter engine over the public internet. In another embodiment, the incoming video feed is retrieved over a dedicated wired/wireless network between the one or more sources 101 and the processor 102. The splitter engine consolidates the data from the one or more sources, and distinguishes one as a primary source and the others as secondary sources. The feed from the primary source is then split into its individual content frames, which are arranged on a first in first out basis. In another embodiment, the processor 102 of the stream selection system is secondly configured to act as a frame processor. The frame processor receives the individual frames from the splitter engine and categorizes them as per pre-defined criteria. It retains a duplicate of the most recent frame transmitted for broadcasting. The frame processor transmits the current frame in the first sequence to an identifier engine. In an embodiment, the processor 102 is thirdly configured to act as an identifier engine. Upon receipt of the frames from the first sequence, the identifier engine converts the colour codes of each frame from RGB to BGR. It further defines at least six rectangular regions on each frame:
a. R1: Left Inner: 60 x 1120
b. R2: Left Outer: 60 x 1120
c. R3: Right Inner: 60 x 1120
d. R4: Right Outer: 60 x 1120
e. R5: Bottom Inner: 1920 x 40
f. R6: Bottom Outer: 1920 x 40
g. R7: Top Inner: 1920 x 40
h. R8: Top Outer: 1920 x 40
The identifier engine further detects one or more pre-defined objects, including at least a primary object and one or more secondary objects. In another embodiment, the processor 102 is fourthly configured to act as a logic processor. The logic processor undertakes the task of categorizing and tallying frames based on the inputs from the identifier engine. In an embodiment, the logic processor receives the coordinates of the primary object in each frame. If the difference between the coordinates in consecutive frames with positive identification of the primary object is more than 30 pixels or less than -30 pixels, then an entry of 1 is made against a transition factor. In an embodiment, this continues till three entries of 1 are determined for the transition factor. In an embodiment, the switching decision is taken as follows:
a. Switch Rightwards: if the register has 3 entries of ‘1’ where the pixel difference is greater than 30 pixels.
b. Switch Leftwards: if the register has 3 entries of ‘1’ where the pixel difference is less than -30 pixels.
c. Switch Topwards: the primary object is detected in R7 and then in R8, or if the register has 3 entries of ‘1’ where the pixel difference is greater than -30 pixels.
d. Switch Downwards: the primary object is detected in R5 and then in R6, or if the register has 3 entries of ‘1’ where the pixel difference is greater than -30 pixels.
In another embodiment, the switching is based on the direction of movement of the primary object. The secondary source which becomes the primary source after switching is either displaying the primary object moving towards it, or displaying the primary object moving away from it.
In an embodiment, the processor 102, fifthly, is configured to act as a transition manager. Once the decision to switch has been made, the plurality of frames from the selected secondary source are sequentially processed in a stream transition sequence. In an embodiment, once the decision to switch has been made, the plurality of frames from the primary source are sequentially processed in a stream continuation sequence. The transition manager adjusts the opacity of consecutive frames such that the transition from the stream continuation sequence to the stream transition sequence is not abrupt. This is done by the transition manager by selecting a pre-defined transition coefficient.
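The opacity adjustment performed by the transition manager can be sketched as a linear cross-fade, assuming NumPy uint8 arrays as frames and the 5-frame span mentioned elsewhere in this description; since the coefficients of Table 1 are not reproduced, the linear ramp is an assumption.

```python
# Sketch of the opacity cross-fade between the stream continuation and
# stream transition sequences over n frames (e.g. 5), assuming frames
# are NumPy uint8 arrays of identical shape.
import numpy as np

def crossfade(continuation_frames, transition_frames):
    n = len(transition_frames)
    blended = []
    for i, (out_f, in_f) in enumerate(zip(continuation_frames, transition_frames), 1):
        alpha = i / (n + 1)  # incoming opacity rises, outgoing falls
        blended.append(((1 - alpha) * out_f + alpha * in_f).astype(np.uint8))
    return blended
```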
[0064] In an embodiment, a mitigation policy for reverse movement of the primary object is configured in the processor 102. One or more designated areas are pre-defined around determined secondary objects which are not moving in the field of view. In an embodiment, a first designated area is a rectangle with width 0-600 pixels and height 0-900 pixels. In another embodiment, a second designated area is a rectangle with width 1520-1920 pixels and height 0-900 pixels.
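A sketch of this mitigation policy follows, assuming the designated areas are axis-aligned rectangles given as (x, y, width, height) and that a transition is suppressed while the primary object lies inside one of them; the suppression behaviour is an assumption inferred from the stated mitigation purpose.

```python
# Sketch of the reverse-movement mitigation: rectangles (x, y, w, h)
# around static secondary objects within which transitions are suppressed.
DESIGNATED_AREAS = [
    (0,    0, 600, 900),   # first designated area: width 0-600, height 0-900
    (1520, 0, 400, 900),   # second designated area: width 1520-1920, height 0-900
]

def in_designated_area(x, y):
    """True when the primary object's coordinate falls inside a
    designated area, in which case a pending switch is held back."""
    return any(ax <= x < ax + w and ay <= y < ay + h
               for ax, ay, w, h in DESIGNATED_AREAS)
```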
[0065] Figure 2 is an exemplary illustration of a block diagram representation of a system 200 enabling the automatic broadcasting by the stream selection system 205, 100. The system includes one or more sources 201. Each of the one or more sources generates distinct video feeds from various angles and points of view of the overall field of view. In an embodiment, the one or more sources 201 are cameras located at the location from which the video stream is being streamed. The inputs from the one or more sources 201 are encoded by an encoder 202, and transmitted over a network 203. The network may be wired or wireless, and the description herein is not limited by the type of network. The transmitted encoded video from the one or more sources is received at a decoder 204 and the stream selection system 205, 100. The stream selection system 205, 100 selects and outputs a sequence of frames which will be broadcasted, as described in the description above. The output of the stream selection system 205, 100 is then encoded by a transcoder 206, and transmitted over a network 207. The transcoder 206 is configured to encode the video stream as per a plurality of encoding standards, so that the video can be viewed on multiple types of devices. The encoding standards include HEVC, H.264, AV1, MP4, MPEG-2, MPEG-4 AVC, and VC-1. The transmitted encoded output video stream is then received by a streaming system 208, 104.
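Purely as an illustration of a multi-standard transcoder such as transcoder 206, the sketch below drives FFmpeg from Python to produce H.264 and HEVC renditions; FFmpeg is not named in this description, so the tool choice, codec mapping, and output naming are assumptions for the example.

```python
# Illustrative multi-standard transcode with FFmpeg via subprocess; the
# codec mapping and output naming are assumptions for the example.
import subprocess

CODECS = {"h264": "libx264", "hevc": "libx265"}

def transcode(source, out_template="output_{}.mp4"):
    for label, codec in CODECS.items():
        subprocess.run(
            ["ffmpeg", "-y", "-i", source, "-c:v", codec,
             out_template.format(label)],
            check=True,  # raise if FFmpeg exits with an error
        )
```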
[0066] Figure 3 is an exemplary illustration of a flow chart representing a method 300 being implemented by the stream selection system 100. The method 300 comprises a step of receiving, by a processor 102, a plurality of frames from a
primary source and a plurality of frames from one or more secondary sources. The processor (interchangeable with ‘processing module 102’ described earlier) is configured to receive real time video feeds from one or more sources, one of which is being streamed and the others analysed as per the present method 300. The plurality of frames is received as a sequence from each source, since a video is a continuous sequence of frames. The method further comprises configuring, by the processor (102), the plurality of frames from the primary source into a first sequence, and the plurality of frames from the one or more secondary sources into one or more respective secondary sequences, on a first in first out basis. The frame splitter module 105 of the processor 102 is configured to consolidate multiple real time inputs of a plurality of frames. Within this consolidation, a first sequence of the plurality of frames from the primary source, and one or more secondary sequences of the respective plurality of frames, are discerned from the collective pool of the plurality of frames being received by the processor 102 in real time. The processor 102 then divides the consolidated plurality of frames into individual frames, arranges them in a first in first out (FIFO) sequence, and relays the first frame (of each sequence) to a frame processor module (not shown). The processor 102 categorizes the received frames into their respective sequences, i.e., the first sequence and the one or more secondary sequences. The method further comprises dividing, by the processor (102), each frame of each sequence into a pre-defined number of identification areas. The identifier module 106 then divides each frame of each sequence into a pre-defined number of identification areas. In an embodiment, there are at least 6 identification areas. In an embodiment, these areas
include a right inner area, a right outer area, a left inner area, a left outer area, a top inner area, a top outer area, a bottom inner area, and a bottom outer area. In an embodiment, a gap of 60 pixels is introduced between the right inner area and the right outer area, and between the left inner area and the left outer area. The method further comprises determining, by the processor (102), at least a primary object and one or more secondary objects with respect to pre-defined spatial-temporal markers in one of the identification areas. The identifier module 106 determines the one or more pre-defined objects relative to the one or more pre-defined spatial-temporal markers in each frame of the first sequence. In an embodiment, the one or more secondary sequences are not processed by the identifier module at this stage. In another embodiment, at least one of the one or more sequences is processed by the identifier module at this stage. The method further comprises determining, by the processor (102), a spatial coordinate of the primary object and the one or more secondary objects in each frame of the stream continuation sequence. In an embodiment, the stream continuation sequence comprises a series of frames which are received from the primary source and processed. The identifier module 106 determines a spatial coordinate (X-Y) of the primary object and the one or more secondary objects in each frame of the stream continuation sequence. The method further comprises comparing, by the processor (102), the difference between the spatial coordinate of the primary object in a current frame of the stream continuation sequence to a next frame. The identifier module 106 compares the difference between the spatial coordinate of the primary object in a current frame of the stream continuation sequence to a next frame. The method further comprises determining,
by the processor (102), a transition factor indicating a movement of the primary object towards one of the pre-defined number of identification areas by at least a pre-defined number of pixels. The logic processor determines a transition factor indicating that the primary object has moved by at least a pre-defined number of pixels towards one of the pre-defined number of identification areas. If the difference in the coordinate of the primary object in a first frame with respect to the next frame is greater than 30 pixels or less than -30 pixels, the transition factor is registered with the value ‘1’, otherwise ‘0’. In an embodiment, when at least 3 consecutive entries of ‘1’ are registered against the transition factor, the logic processor switches to one of the secondary sequences. The method further comprises selecting, by the processor (102), a switching direction towards one of the pre-defined number of identification areas. The switching is based on the direction of movement of the primary object. The secondary source which becomes the primary source after switching is either displaying the primary object moving towards it, or displaying the primary object moving away from it.
[0067] Figure 4 is an exemplary flow chart illustrating a method implemented by the processor 102 in the decision to switch. The method comprises the step of selecting, by the processor, the secondary sequence based on the selected switching direction. The switching is based on the direction of movement of the primary object. The secondary source which becomes the primary source after switching is either displaying the primary object moving towards it, or displaying the primary object moving away from it. The
logic processor determines a transition factor indicating that the primary object has moved by at least a pre-defined number of pixels towards one of the pre-defined number of identification areas. If the difference in the coordinate of the primary object in a first frame with respect to the next frame is greater than 30 pixels or less than -30 pixels, the transition factor is registered with the value ‘1’, otherwise ‘0’. In an embodiment, when at least 3 consecutive entries of ‘1’ are registered against the transition factor, the logic processor switches to one of the secondary sequences. In an embodiment, once the direction of switching is selected, the plurality of frames from the primary source is processed as a stream continuation sequence, and the plurality of frames from the selected secondary source is processed as the stream transition sequence. The method further comprises the step of switching, by the processor, from a first sequence to the selected secondary sequence. The transition module of the processor 102 selects a pre-defined transition coefficient between a current frame in the stream continuation sequence and at least a pre-defined number of next frames in the stream transition sequence. In an embodiment, the pre-defined transition coefficient is selected by determining a ratio of the size of the determined primary object in a previous frame with a next frame. In an embodiment, the pre-defined number of next frames is between 3 and 10. In an embodiment, the pre-defined number of next frames is 5. In an embodiment, the transition coefficient is the transpose ratio between the previous frame and the succeeding frame for a span of 5 consecutive frames. In an embodiment, the transition coefficient selected by the transition module is as per the condition given in table 1 above.
[0068] Figure 5 is an exemplary illustration of a method being implemented by the stream selection system. The method includes the step of configuring, by the processor, a pre-determined delay between a current frame and a next frame in the first sequence. The delay is induced so that the output stream is coherent and comprehensible to human vision, as well as maintaining the output video quality. The method further comprises the step of inducing, by the processor, a pre-determined audio track corresponding to the first sequence. In an embodiment, the audio track may be live audio from the field of view of the one or more sources. In another embodiment, the audio track may be a secondary live audio imposed upon the live audio from the field of view. The secondary live audio may be live commentary, or the like. The method further comprises the step of transmitting, by the processor, each frame of the first sequence to an encoder. The method further comprises the step of transmitting, by the processor, the encoded video stream to a broadcasting unit.
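The pre-determined inter-frame delay can be sketched as follows, assuming a 30 frames-per-second target, a deque as the first sequence, and a caller-supplied hand-off to the encoder; all three are assumptions made for the example.

```python
# Sketch of pacing output frames with a pre-determined inter-frame delay;
# the 30 fps target and the send_to_encoder callable are assumptions.
import time

FRAME_DELAY_S = 1 / 30  # pre-determined delay between consecutive frames

def relay(first_sequence, send_to_encoder):
    """first_sequence: a collections.deque of frames in FIFO order."""
    while first_sequence:
        send_to_encoder(first_sequence.popleft())  # FIFO head first
        time.sleep(FRAME_DELAY_S)                  # keeps output coherent
```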
[0069] In an embodiment of the present invention, the stream selection system 100 decides to switch from the first sequence of frames in the stream continuation sequence to a second sequence of frames in the stream transition sequence on the basis of both the transition factor and the transition coefficient. In an embodiment, the processor selects the direction to switch based on the transition factor alone.
[0070] In an embodiment, the stream selection system 100 is a combination of image processing and artificial intelligence based techniques, such as an artificial neural network (ANN), and a heuristics-based system that emulates human classification, prediction, and decision-making skills. In an embodiment, the stream selection
system (100) consolidates and transmits the plurality of frames from the one or more sources by utilizing a public internet network, over one or more of a Wi-Fi or cellular network, using an open-source protocol known as RTSP (Real Time Streaming Protocol). In an embodiment, the processor 102 is a graphical processing unit (GPU). In another embodiment, each module of the processor 102 is a GPU. In another embodiment, each module of the processor 102 comprises a GPU and an additional central processing unit (CPU). In an embodiment, the streaming system 104 utilizes a User Interface (UI) unit on an end device, consolidating and presenting all the collected data in an intuitive infographic. In an embodiment, third party apps and APIs may be integrated into one of the streaming system 104 and the end device.
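An RTSP pull of a single source feed can be sketched with OpenCV as below; the URL is a placeholder, and cv2.VideoCapture's RTSP support depends on the FFmpeg backend of the particular build, so this is an illustrative sketch rather than the claimed transport implementation.

```python
# Sketch of pulling one source feed over RTSP with OpenCV; the URL is a
# placeholder. RTSP support depends on the build's FFmpeg backend.
import cv2

cap = cv2.VideoCapture("rtsp://example.invalid/stream1")  # hypothetical URL
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # hand `frame` to the splitter engine / first sequence here
cap.release()
```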
[0071] The present disclosure of the stream selection system and the methods thereof has applications in several fields. During live sports, concerts, or news broadcasts, this system can automate the switching between different camera feeds, providing a smooth and professional viewing experience. In security systems, it can switch between different camera feeds based on detected motion or specific objects, enhancing monitoring efficiency. In multi-participant video calls, it can manage the transitions between speakers or participants, ensuring that the focus is always on the active speaker. Automated switching can assist in live editing or creating dynamic video content from multiple sources. In virtual reality (VR) / augmented reality (AR) applications, it can manage transitions between different views or scenes based on user interactions or detected objects.
[0072] Further applications include medical and healthcare services, education and e-learning, industrial and manufacturing, public safety and emergency response, and transportation and traffic management, to name a few. During surgeries, multiple cameras capture different angles and close-ups of the surgical field. The system can automatically switch views based on the surgeon's actions or specific instruments, providing a comprehensive view for the medical team and for educational recordings. In a smart classroom, cameras can focus on the teacher, the whiteboard, or students. The system can switch views based on the teacher's movements, interactions, or student questions, providing an engaging learning experience for remote learners. For live music concerts or theatre performances, the system can switch between wide shots of the stage, close-ups of performers, and audience reactions, providing a rich and immersive viewing experience. In manufacturing facilities, cameras monitor different stages of the assembly line. The system can switch views based on detected anomalies, specific assembly stages, or quality control checks, improving monitoring and control. During emergency situations like fires, floods, or accidents, multiple cameras capture different areas of the affected zone. The system can switch views based on detected activities or responder actions, providing comprehensive situational awareness to command centres. In traffic management systems, cameras monitor various intersections and road segments. The system can switch views based on traffic flow, detected incidents, or time of day, aiding in efficient traffic control and management.
Claims
1. A stream selection system for transitioning between one or more sources comprising a processing module, a memory module, and a streaming module, the memory module is configured to store a plurality of selected frames for the streaming system, and the processing module is configured to: receive a plurality of frames from a primary source and a plurality of frames from one or more secondary sources, configure the plurality of frames from the primary source into a first sequence, and the plurality of frames from the one or more secondary sources into one or more respective secondary sequences, on a first in first out basis, divide each frame of each sequence into a pre-defined number of identification areas, determine at least a primary object and one or more secondary objects with respect to pre-defined spatial-temporal markers in one of the identification areas, determine a spatial coordinate of the primary object and the one or more secondary objects in each frame of the stream continuation sequence, compare the difference between the spatial coordinate of the primary object in a current frame of the stream continuation sequence to a next frame,
determine a transition factor indicating a movement of the primary object towards one of the pre-defined number of identification areas by at least a predefined number of pixels, select a switching direction towards one of the pre-defined number of identification areas.
2. The stream selection system as claimed in claim 1, wherein the switching direction selected by the processor is one of a direction towards the moving primary object and a direction away from the moving primary object.
3. The stream selection system as claimed in claim 1, wherein the switching direction is selected when the primary object is determined to move towards the identification area for a pre-defined number of consecutive frames, and the transition factor is equal to a pre-defined value for each of the pre-defined number of consecutive frames.
4. The stream selection system as claimed in claim 3, wherein the pre-defined number of consecutive frames is between 1 and 10, and the pre-defined value of the transition factor is equal to one.
5. The stream selection system as claimed in claim 1, wherein the pre-defined number of identification areas include a left inner area, a left outer area, a right inner area, a right outer area, a bottom inner area, a bottom outer area, a top inner area, and a top outer area.
6. The stream selection system as claimed in claim 5, wherein a gap of 60 pixels is introduced between the right inner area and the right outer area, and between the left inner area and the left outer area.
7. The stream selection system as claimed in claim 1, wherein the transition factor is determined as one when the predefined number of pixels is more than or equal to 30 in any direction.
8. The stream selection system as claimed in claim 1, wherein the transition factor is negative when the primary object is determined to have moved leftwards or downwards.
9. The stream selection system as claimed in claim 1, wherein the processing module is further configured to: select the secondary sequence based on the selected switching direction, switch from a first sequence to the selected secondary sequence.
10. The stream selection system as claimed in claim 1, wherein the processing module is further configured to adjust an opacity of a pre-defined number of next frames of the first sequence to an opacity of a pre-defined number of next frames of the selected secondary sequence.
11. The stream selection system as claimed in claim 10, wherein the pre-defined number of next frames of the selected secondary sequence is between 3 and 10.
12. The stream selection system as claimed in claim 1, wherein the primary and secondary sources comprise movable all-weather video capturing devices.
13. A method for automated switching from a current video source to a next video source, the method comprising the steps of: receiving, by a processor (102), a plurality of frames from a primary source and a plurality of frames from one or more secondary sources, configuring, by the processor (102), the plurality of frames from the primary source into a first sequence, and the plurality of frames from the one or more secondary sources into one or more respective secondary sequences, on a first in first out basis, dividing, by the processor (102), each frame of each sequence into a predefined number of identification areas, determining, by the processor (102), at least a primary object and one or more secondary objects with respect to pre-defined spatial-temporal markers in one of the identification areas, determining, by the processor (102), a spatial coordinate of the primary object and the one or more secondary objects in each frame of the stream continuation sequence,
comparing, by the processor (102), the difference between the spatial coordinate of the primary object in a current frame of the stream continuation sequence to a next frame, determining, by the processor (102), a transition factor indicating a movement of the primary object towards one of the pre-defined number of identification areas by at least a pre-defined number of pixels, selecting, by the processor (102), a switching direction towards one of the pre-defined number of identification areas.
14. The method as claimed in claim 13, wherein the switching direction selected by the processor is one of a direction towards the moving primary object and a direction away from the moving primary object.
15. The method as claimed in claim 13, wherein the switching direction is selected when the primary object is determined to move towards the identification area for a pre-defined number of consecutive frames, and the transition factor is equal to a pre-defined value of each of the pre-defined number of consecutive frames.
16. The method as claimed in claim 15, wherein the pre-defined number of consecutive frames is between 1 and 10, and the pre-defined value of the transition factor is equal to one.
17. The method as claimed in claim 13, wherein the pre-defined number of identification areas include a left inner area, a left outer area, a right inner area,
a right outer area, a bottom inner area, a bottom outer area, a top inner area, and a top outer area.
18. The method as claimed in claim 17, wherein a gap of 60 pixels is introduced between the right inner area and the right outer area, and between the left inner area and the left outer area.
19. The method as claimed in claim 13, wherein the transition factor is determined as one when the predefined number of pixels is more than or equal to 30 in any direction.
20. The method as claimed in claim 13, wherein the transition factor is negative when the primary object is determined to have moved leftwards or downwards.
21. The method as claimed in claim 13 further comprising selecting, by the processor (102), the secondary sequence based on the selected switching direction, switching, by the processor (102), from the first sequence to the selected secondary sequence.
22. The method as claimed in claim 13 further comprising adjusting, by the processor, an opacity of a pre-defined number of next frames of the first sequence to an opacity of a pre-defined number of next frames of the selected secondary sequence.
23. The method as claimed in claim 22, wherein the pre-defined number of next frames of the selected secondary sequence is between 3 and 10.
24. The method as claimed in claim 13 further comprising configuring, by the processor (102), a pre-determined delay between a current frame and a next frame in the first sequence, inducing, by the processor (102), a pre-determined audio track corresponding to the first sequence, transmitting, by the processor (102), each frame of the first sequence to an encoder, transmitting, by the processor (102), the encoded video stream to a broadcasting unit.
25. The method as claimed in claim 13, wherein the primary and secondary sources comprise movable all-weather video capturing devices.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| IN202311051128 | 2023-07-29 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025027641A1 true WO2025027641A1 (en) | 2025-02-06 |
Family ID: 94394734
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/IN2024/051402 (WO2025027641A1, pending) | A stream selection system for automated switching of video streams and methods thereof | 2023-07-29 | 2024-07-29 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025027641A1 (en) |
Patent Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20150179219A1 (en) * | 2013-12-20 | 2015-06-25 | Qualcomm Incorporated | Selection and tracking of objects for display partitioning and clustering of video frames |
| US20170111595A1 (en) * | 2015-10-15 | 2017-04-20 | Microsoft Technology Licensing, Llc | Methods and apparatuses for controlling video content displayed to a viewer |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24848520; Country of ref document: EP; Kind code of ref document: A1 |