WO2019134499A1 - Method and device for real-time annotation of video frames - Google Patents
Method and device for real-time annotation of video frames
- Publication number
- WO2019134499A1 (PCT/CN2018/121730)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- frame
- user equipment
- video frame
- video
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/478—Supplemental services, e.g. displaying phone caller identification, shopping application
- H04N21/4788—Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/431—Generation of visual interfaces for content selection or interaction; Content or additional data rendering
- H04N21/4312—Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
- H04N21/4316—Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations for displaying supplemental content in a region of the screen, e.g. an advertisement in a separate window
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/472—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
- H04N21/47205—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/472—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
- H04N21/47214—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for content reservation or setting reminders; for requesting event notification, e.g. of sport results or stock market
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/141—Systems for two-way working between two video terminals, e.g. videophone
Definitions
- the present application relates to the field of computers, and in particular, to a technique for real-time annotation of video frames.
- in typical video streaming, the sender of the video stream encodes the video according to an encoding protocol and sends it to the video receiver over the network; the receiver receives the video, decodes it, takes a screenshot of the decoded video, and annotates the screenshot image; the annotated image is then encoded and sent back to the video sender over the network, or the annotated image is sent to a server from which the video sender receives it.
- in this approach, the decoded image with annotation information received by the sender suffers a certain loss of clarity, and because the annotated image is transmitted from the receiver to the video sender over the network, the speed of the process depends on the current network transmission rate; the transmission speed is therefore affected and delays occur, which is not conducive to real-time interaction between the two parties.
- a method for real-time annotation of a video frame on a first user equipment side comprising:
- a method for real-time annotation of a video frame on a second user equipment side comprising:
- a method for real-time annotation of a video frame on a third user equipment side comprising:
- a method for real-time annotation of a video frame on a network device side comprising:
- a method for real-time annotation of a video frame includes:
- the first user equipment sends a video stream to the second user equipment
- the first user equipment receives the labeling operation information, and presents a corresponding labeling operation in real time on the first video frame according to the labeling operation information.
- a method for real-time annotation of a video frame includes:
- the first user equipment sends a video stream to the network device
- the network device receives the labeling operation information of the second video frame by the second user equipment, and forwards the labeling operation information to the first user equipment;
- the first user equipment receives the labeling operation information, and presents a corresponding labeling operation in real time on the first video frame according to the labeling operation information.
- a method for real-time annotation of a video frame includes:
- the first user equipment sends a video stream to the second user equipment and the third user equipment;
- a method for real-time annotation of a video frame includes:
- the first user equipment sends a video stream to the network device
- the network device receives the labeling operation information of the second video frame by the second user equipment, and forwards the labeling operation information to the first user equipment and the third user equipment;
- the first user equipment receives the labeling operation information, and presents a corresponding labeling operation in real time on the first video frame according to the labeling operation information;
- a first user equipment for real-time annotation of a video frame comprising:
- a video sending module configured to send a video stream to the second user equipment
- a frame information receiving module configured to receive second frame related information of the second video frame that is intercepted by the second user equipment in the video stream;
- a video frame determining module configured to determine, according to the second frame related information, a first video frame corresponding to the second video frame in the video stream;
- An annotation receiving module configured to receive the labeling operation information of the second video frame by the second user equipment
- an annotation presentation module configured to present a corresponding annotation operation on the first video frame in real time according to the annotation operation information.
- a second user equipment for real-time annotation of a video frame comprising:
- a video receiving module configured to receive a video stream sent by the first user equipment
- a frame information determining module configured to send second frame related information of the intercepted second video frame to the first user equipment according to a screenshot operation of the user in the video stream;
- An annotation obtaining module configured to acquire the labeling operation information of the second video frame by the user
- an annotation sending module configured to send the labeling operation information to the first user equipment.
- a third user equipment for real-time annotation of a video frame comprising:
- a third video receiving module configured to receive a video stream that is sent by the first user equipment to the second user equipment and the third user equipment
- a third frame information receiving module configured to receive second frame related information of the second video frame that is intercepted by the second user equipment in the video stream;
- a third video frame determining module configured to determine, according to the second frame related information, a third video frame corresponding to the second video frame in the video stream;
- a third label receiving module configured to receive the labeling operation information of the second video frame by the second user equipment
- the third rendering module is configured to present a corresponding labeling operation on the third video frame in real time according to the labeling operation information.
- a network device for real-time annotation of a video frame comprising:
- a video forwarding module configured to receive and forward a video stream sent by the first user equipment to the second user equipment
- a frame information receiving module configured to receive second frame related information of the second video frame that is intercepted by the second user equipment in the video stream;
- a frame information forwarding module configured to forward the second frame related information to the first user equipment
- An annotation receiving module configured to receive the labeling operation information of the second video frame by the second user equipment
- an annotation forwarding module configured to forward the labeling operation information to the first user equipment.
- a system for real-time annotation of video frames comprising a first user device as described above and a second user device as described above.
- a system for real-time annotation of video frames including a first user device as described above, a second user device as described above, and a network device as described above.
- a system for real-time annotation of a video frame comprising a first user device as described above, a second user device as described above, and a third user device as described above.
- a system for real-time annotation of a video frame comprising a first user device as described above, a second user device as described above, a third user device as described above, and a network device as described above.
- a computer readable medium comprising instructions that, when executed, cause a system to perform the method on the first user equipment side as described above;
- a computer readable medium comprising instructions that, when executed, cause a system to perform the method on the second user equipment side as described above;
- a computer readable medium comprising instructions that, when executed, cause a system to perform the method on the third user equipment side as described above;
- a computer readable medium comprising instructions that, when executed, cause a system to perform the method on the network device side as described above.
- the present application buffers a certain number of video frames on the video sender side, determines the video sender's unencoded video frame image according to the video receiver's screenshot and the corresponding video frame related information, and transmits the receiver's annotation information on the screenshot to the video sender in real time.
- the annotation is displayed in real time on the corresponding video frame image at the video sender, so the sender can observe the video receiver's annotation process in real time, and the resolution is high because the annotated video frame has not been encoded; further, the scheme enables real-time display of annotations, with good practicability, strong interactivity, and improved user experience and bandwidth utilization.
- in addition, after determining the unencoded video frame, the video sender can send it to the video receiver, so the video receiver can annotate a high quality video frame, thereby greatly improving the user experience.
- FIG. 1 shows a system topology diagram for real-time annotation of video frames in accordance with an embodiment of the present application
- FIG. 2 shows a flow chart of a method for real-time annotation of video frames at a first user equipment end according to an aspect of the present application
- FIG. 3 is a flow chart showing a method for real-time annotation of video frames on a second user equipment side according to another aspect of the present application
- FIG. 4 is a flowchart of a method for real-time annotation of a video frame at a third user equipment end according to still another aspect of the present application;
- FIG. 5 is a flowchart of a method for real-time annotation of video frames on a network device side according to still another aspect of the present application.
- FIG. 6 shows a system method diagram for real-time annotation of video frames in accordance with an aspect of the present application
- FIG. 7 shows a system method diagram for real-time annotation of video frames in accordance with another aspect of the present application.
- FIG. 8 shows a schematic diagram of a first user equipment for real-time annotation of video frames in accordance with an aspect of the present application
- FIG. 9 shows a schematic diagram of a second user equipment for real-time annotation of video frames in accordance with another aspect of the present application.
- FIG. 10 shows a schematic diagram of a third user equipment for real-time annotation of video frames in accordance with another aspect of the present application
- FIG. 11 shows a schematic diagram of a network device for real-time annotation of video frames in accordance with still another aspect of the present application
- FIG. 12 illustrates an exemplary system that can be used to implement various embodiments described in this application.
- the terminal, the device of the service network, and the trusted party each include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
- the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory.
- Memory is an example of a computer readable medium.
- Computer readable media include persistent and non-persistent, removable and non-removable media.
- Information storage can be implemented by any method or technology.
- the information can be computer readable instructions, data structures, modules of programs, or other data.
- Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), and digital versatile disc (DVD) or other optical storage.
- the device referred to in the present application includes but is not limited to a user equipment, a network device, or a device formed by integrating a user equipment and a network device through a network.
- the user equipment includes, but is not limited to, any mobile electronic product that can perform human-computer interaction with the user (for example, through a touchpad), such as a smartphone or a tablet computer, and the mobile electronic product may run any operating system, such as the Android operating system or the iOS operating system.
- the network device includes an electronic device capable of automatically performing numerical calculation and information processing according to instructions set or stored in advance, and its hardware includes, but is not limited to, a microprocessor, an application specific integrated circuit (ASIC), a programmable logic device, and the like.
- the network device includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud composed of multiple servers; here, the cloud is composed of a large number of computers or network servers based on cloud computing (Cloud Computing).
- cloud computing is a kind of distributed computing, a virtual supercomputer composed of a group of loosely coupled computers.
- the network includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN network, a wireless ad hoc network (Ad Hoc network), and the like.
- the device may also be a program running on the user equipment, the network device, or a device formed by integrating the user equipment with the network device, or the network device with a touch terminal, through a network.
- FIG. 1 shows a typical scenario of the present application.
- the first user equipment, while in video communication with the second user equipment and the third user equipment, receives the annotation information sent by the second user equipment and presents the annotation information in real time on a locally stored, unencoded video frame.
- the process may be completed by the first user equipment and the second user equipment; by the first user equipment, the second user equipment, and the network device; by the first user equipment, the second user equipment, and the third user equipment; or by the first user equipment, the second user equipment, the third user equipment, and the network device.
- the first user equipment, the second user equipment, and the third user equipment are any electronic devices that can record and send video, such as smart glasses, mobile phones, tablets, notebooks, or smart watches; in the following embodiments, the first user equipment is described as smart glasses and the second and third user equipment as tablet computers. Those skilled in the art should understand that the embodiments are also applicable to other user equipment such as mobile phones, notebooks, and smart watches.
- in step S11, the first user equipment sends a video stream to the second user equipment; in step S12, the first user equipment receives second frame related information of the second video frame captured by the second user equipment in the video stream;
- in step S13, the first user equipment determines, according to the second frame related information, a first video frame corresponding to the second video frame in the video stream; in step S14, the first user equipment receives the second user equipment's annotation operation information for the second video frame; in step S15, the first user equipment presents a corresponding annotation operation in real time on the first video frame according to the annotation operation information.
- the first user equipment sends a video stream to the second user equipment.
- for example, the first user equipment establishes a communication connection with the second user equipment over a wired or wireless network, encodes the video, and sends the video stream to the second user equipment by way of video communication.
- the first user equipment receives the second frame related information of the second video frame captured by the second user equipment in the video stream. For example, the second user equipment determines, based on the second user's screenshot operation, the second frame related information of the video frame corresponding to the screenshot picture, and the first user equipment then receives the second frame related information of the second video frame sent by the second user equipment, where the second frame related information includes, but is not limited to: second video frame identification information, the second video frame encoding start time, the second video frame decoding end time, and the second video frame codec-and-transmission total duration information.
- the first user equipment determines, according to the second frame related information, the first video frame corresponding to the second video frame in the video stream. For example, the first user equipment locally stores a period of time's worth, or a certain number, of transmitted unencoded video frames, and, according to the second frame related information sent by the second user equipment, determines among the locally stored unencoded video frames the unencoded first video frame corresponding to the screenshot.
- in step S14, the first user equipment receives the second user equipment's annotation operation information for the second video frame.
- for example, the second user equipment generates corresponding annotation operation information based on the second user's annotation operation and sends it to the first user equipment in real time, and the first user equipment receives the annotation operation information.
- the first user equipment presents a corresponding annotation operation in real time on the first video frame according to the annotation operation information. For example, based on the received annotation operation information, the first user equipment displays the annotation operation on the first video frame in real time, such as displaying the first video frame in a small window in the current interface and presenting the corresponding annotation operation at the matching position on that frame, for example at a rate of one frame every 50 ms.
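To make the exchange in steps S11 through S15 concrete, the sketch below models the two messages the video receiver sends back to the sender. This is a minimal Python illustration only; the class and field names are assumptions chosen for exposition, not identifiers from the patent.

```python
from dataclasses import dataclass, field

@dataclass
class FrameRelatedInfo:
    """Second frame related information (step S12). The patent lists the
    fields as 'including but not limited to', so all are optional here."""
    frame_id: int | None = None              # second video frame identification information
    encode_start_ms: float | None = None     # second video frame encoding start time
    decode_end_ms: float | None = None       # second video frame decoding end time
    total_duration_ms: float | None = None   # codec-and-transmission total duration

@dataclass
class AnnotationOp:
    """One annotation operation sample (step S14), sent in real time."""
    frame_id: int                            # frame the annotation belongs to
    kind: str                                # e.g. "circle", "arrow", "text", "box"
    points: list[tuple[float, float]] = field(default_factory=list)  # brush path
```

With these structures, step S13 reduces to looking up the cached unencoded frame named by FrameRelatedInfo, and step S15 to replaying AnnotationOp messages on that frame as they arrive.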
- for example, user A holds smart glasses and user B holds tablet B; the smart glasses and tablet B establish video communication over a wired or wireless network. The smart glasses encode the currently captured picture, send it to tablet B, and cache the video frames locally.
- the tablet B receives the video stream, and based on the screen capture operation of the user B, determines a second video frame corresponding to the screen capture image, and sends the second frame related information corresponding to the second video frame to the smart glasses.
- the second frame related information includes, but is not limited to, a second video frame identification information, a second video frame encoding start time, a second video frame decoding end time, and a second video frame codec and transmission total duration information.
- the smart glasses receive the second frame related information of the second video frame and determine the corresponding unencoded first video frame among the locally stored video frames based on the second frame related information.
- the tablet computer B generates the real-time labeling operation information according to the labeling operation of the user B, and sends the labeling operation information to the smart glasses in real time.
- after receiving the annotation operation information, the smart glasses display the corresponding unencoded first video frame in a preset area of the smart glasses, and the corresponding annotation operation is presented in real time at the matching position on the first video frame.
- the method further includes step S16 (not shown).
- in step S16, the first user equipment stores video frames of the video stream; wherein, in step S13, the first user equipment determines, from the stored video frames according to the second frame related information, the first video frame corresponding to the second video frame.
- for example, the first user equipment sends the video stream to the second user equipment and locally stores a period of time's worth, or a certain number, of unencoded video frames, where the duration or number may be a preset fixed value or a threshold dynamically adjusted according to network conditions or the transmission rate; the first user equipment then determines the corresponding unencoded first video frame among the locally stored video frames based on the second frame related information of the second video frame sent by the second user equipment.
- the stored video frames satisfy, but are not limited to, at least one of the following: the time interval between a stored video frame's transmission time and the current time is less than or equal to a video frame storage duration threshold; the cumulative number of stored video frames is less than or equal to a predetermined video frame storage number threshold.
- for example, the smart glasses send the captured images to tablet B and locally store a period of time's worth, or a certain number, of unencoded video frames, where the duration or number may be a system-fixed or manually preset value, such as a duration or frame-count threshold obtained by statistical analysis of big data; the duration or number may also be a threshold dynamically adjusted according to network conditions or the transmission rate.
- the dynamically adjusted duration or number threshold may be determined according to the codec-and-transmission total duration information of the video frames, for example by calculating the codec-and-transmission total duration of the current video frame and taking that duration as a unit duration, or taking the number of video frames transmitted within that duration as a unit number, and then setting the dynamic duration or number threshold with reference to the current unit duration or unit number.
- the set predetermined or dynamic video frame storage duration threshold should be greater than or equal to one unit duration.
- the set predetermined or dynamic video frame storage number threshold should be greater than or equal to one unit number. The smart glasses then determine the corresponding unencoded first video frame among the stored video frames according to the second frame related information of the second video frame sent by tablet B, where the interval between a stored video frame's transmission time and the current time is less than or equal to the video frame storage duration threshold, or the cumulative number of stored video frames is less than or equal to the predetermined video frame storage number threshold.
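A sender-side cache implementing both retention rules of step S16 (age bounded by the storage duration threshold, count bounded by the storage number threshold) might look like the following sketch; the FrameCache name and the keying by encoding start time are assumptions for illustration.

```python
import time
from collections import deque

class FrameCache:
    """Cache of transmitted, unencoded video frames on the sender side."""

    def __init__(self, max_age_s: float, max_count: int):
        self.max_age_s = max_age_s     # video frame storage duration threshold
        self.max_count = max_count     # video frame storage number threshold
        self._frames: deque = deque()  # entries: (send_time, encode_start_ms, raw_frame)

    def store(self, encode_start_ms: float, raw_frame: bytes) -> None:
        """Record a frame at the moment it is sent, then evict stale entries."""
        self._frames.append((time.monotonic(), encode_start_ms, raw_frame))
        self._evict()

    def _evict(self) -> None:
        now = time.monotonic()
        while self._frames and (
            len(self._frames) > self.max_count
            or now - self._frames[0][0] > self.max_age_s
        ):
            self._frames.popleft()  # oldest frames leave first

    def lookup(self, encode_start_ms: float) -> bytes | None:
        """Find the unencoded frame matching the receiver's frame related info."""
        for _, start, frame in self._frames:
            if start == encode_start_ms:
                return frame
        return None
```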
- the method further includes step S17 (not shown).
- in step S17, the first user equipment acquires the codec-and-transmission total duration information of video frames in the video stream, and adjusts the video frame storage duration threshold or the video frame storage number threshold according to that information.
- for example, the first user equipment records the encoding start time of each video frame and, after encoding, sends the video frame to the second user equipment; the second user equipment receives the frame and records each video frame's decoding end time; subsequently, either the second user equipment sends the decoding end time to the first user equipment, which calculates the codec-and-transmission total duration of the current video frame from the encoding start time and the decoding end time, or the second user equipment calculates that duration itself from the encoding start time and the decoding end time and sends it to the first user equipment.
- the first user equipment adjusts the video frame storage duration threshold or the video frame storage number threshold according to the codec-and-transmission total duration information. For example, taking that duration as a unit time reference, a certain multiple of it is set as the video frame storage duration threshold; or the number of video frames that can be sent within that duration is calculated from the duration and the rate at which the first user equipment sends video frames, that number is taken as a unit number, and a certain multiple of it is set as the video frame storage number threshold.
- for example, the smart glasses record the i-th video frame's encoding start time as T_si and, after encoding, send the video frame to tablet B; tablet B receives the frame and records its decoding end time as T_ei. Subsequently, tablet B sends the decoding end time T_ei to the smart glasses, and the smart glasses calculate the codec-and-transmission total duration T_i = T_ei - T_si from the received decoding end time of the i-th video frame and the locally recorded encoding start time T_si; alternatively, tablet B calculates T_i and returns it to the smart glasses.
- for example, based on the codec-and-transmission total duration T_i of the i-th video frame, the smart glasses dynamically retain the video frames within a 1.3·T_i window, a multiplier determined from big data statistics.
- the multiplier may also be adjusted dynamically according to the network transmission rate, for example setting the buffer duration threshold to (1+k)·T_i, where k is a coefficient adjusted according to network fluctuation: when fluctuation is large, k is set to 0.5; when fluctuation is small, k is set to 0.2, and so on.
- similarly, the smart glasses can dynamically adjust the multiplier for the frame count according to the current network transmission rate, for example setting the buffer number threshold to (1+k)·N, where k is a coefficient adjusted according to network fluctuation: when fluctuation is large, k is set to 0.5; when fluctuation is small, k is set to 0.2, and so on.
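The adjustment of step S17 condenses to a few lines. The sketch below assumes the two devices' clocks are synchronized (a point the patent does not address) and uses the example coefficients k = 0.5 and k = 0.2 quoted above.

```python
def codec_and_transmission_duration(encode_start_s: float,
                                    decode_end_s: float) -> float:
    """T_i = T_ei - T_si for the i-th video frame."""
    return decode_end_s - encode_start_s

def adjust_thresholds(t_i: float, send_rate_fps: float,
                      network_fluctuation_large: bool) -> tuple[float, int]:
    """Return ((1+k)*T_i, (1+k)*N): the dynamic duration and count thresholds."""
    k = 0.5 if network_fluctuation_large else 0.2
    duration_threshold = (1 + k) * t_i
    n = max(1, round(t_i * send_rate_fps))  # N = frames sendable within T_i
    count_threshold = round((1 + k) * n)
    return duration_threshold, count_threshold
```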
- the first user equipment sends the video stream, and the frame identification information of the transmitted video frames in the video stream, to the second user equipment; wherein, in step S13, the first user equipment determines, according to the second frame related information, a first video frame corresponding to the second video frame in the video stream, where the frame identification information of the first video frame corresponds to the second frame related information.
- the frame identification information of a video frame may be the codec time corresponding to the video frame, or a number corresponding to the video frame.
- for example, the first user equipment encodes a plurality of video frames to be transmitted and sends the corresponding video stream, together with the frame identification information of the transmitted video frames, to the second user equipment. For example, the first user equipment encodes the plurality of video frames to be transmitted, acquires their encoding start times, and sends the video frames and their encoding start times to the second user equipment; that is, the frame identification information of a transmitted video frame includes the frame's encoding start time information.
- for example, the smart glasses record the encoding start time of each video frame and, after encoding, send the video frame and the encoding start times of the transmitted video frames to tablet B, where the transmitted video frames include the frame to be sent once the current encoding completes as well as frames already sent. The encoding start times of the transmitted video frames may be sent to tablet B at certain times or at intervals of a certain number of video frames, or the encoding start time of each video frame may be sent to tablet B directly together with that frame.
- based on user B's screenshot operation, the tablet determines the video frame corresponding to the screenshot picture and sends the second frame related information of the corresponding second video frame to the smart glasses, where the second frame related information corresponds to the second frame identification information and includes, but is not limited to, at least one of the following: the second video frame's encoding start time, the second video frame's decoding end time, the second video frame's codec-and-transmission total duration information, and the number or image corresponding to the second video frame.
- the smart glasses receive the second frame related information and determine the correspondingly stored unencoded first video frame from it: for example, determining the encoding start time of the unencoded first video frame corresponding to the second video frame from the second video frame's encoding start time or decoding end time and thereby locating the first video frame; or directly locating the first video frame with the same number from the number corresponding to the second video frame; or determining the corresponding first video frame among the stored unencoded video frames by image recognition on the second video frame.
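The three matching strategies just listed (encoding start time, frame number, image recognition) differ only in how the lookup key is derived. Below is a sketch of the timing-based strategy; the nearest-match tolerance is an added assumption, since a start time recovered as decode_end - total_duration will rarely be bit-exact.

```python
def find_first_frame(cached: list[tuple[float, bytes]],
                     encode_start_ms: float | None = None,
                     decode_end_ms: float | None = None,
                     total_duration_ms: float | None = None,
                     tolerance_ms: float = 20.0) -> bytes | None:
    """cached holds (encode_start_ms, raw_frame) pairs kept by the sender."""
    if encode_start_ms is not None:
        target = encode_start_ms
    else:
        # recover the encoding start time from the receiver's timing fields
        target = decode_end_ms - total_duration_ms
    best = min(cached, key=lambda item: abs(item[0] - target), default=None)
    if best is not None and abs(best[0] - target) <= tolerance_ms:
        return best[1]
    return None
```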
- the method further includes step S18 (not shown).
- in step S18, the first user equipment presents the first video frame; wherein, in step S15, the first user equipment superimposes a corresponding annotation operation on the first video frame according to the annotation operation information. For example, the first user equipment determines the unencoded first video frame and displays it at a preset position in the current interface or in a small window; subsequently, according to the annotation operation information received in real time, it superimposes the corresponding annotation operation at the matching position on the first video frame.
- the smart glasses determine a corresponding uncoded first video frame according to the second frame related information sent by the tablet B, and display the first video frame at a preset position of the interface of the smart glasses. Then, the smart glasses receive the real-time annotation operation sent by the tablet B. The smart glasses determine the corresponding position of the annotation operation in the currently displayed first video frame, and present the current annotation operation in real time at the corresponding location.
- the method further includes step S19 (not shown).
- in step S19, the first user equipment sends the first video frame to the second user equipment as a preferred frame for presenting the annotation operation. For example, the first user equipment determines the unencoded first video frame and sends it to the second user equipment, so that the second user equipment can present a higher quality first video frame.
- the smart glasses determine a corresponding unencoded first video frame according to the second frame related information sent by the tablet B, and send the first video frame as a preferred frame to the tablet B, such as by lossless compression.
- Tablet B receives the first video frame and presents the first video frame.
- the first user equipment sends a video stream to the second user equipment and the third user equipment.
- for example, a communication connection is established among the first user equipment, the second user equipment, and the third user equipment, where the first user equipment is the current video frame sender and the second user equipment and the third user equipment are the current video frame receivers.
- the first user equipment sends a video stream to the second user equipment and the third user equipment through the communication connection.
- for example, user A holds smart glasses, user B holds tablet B, and user C holds tablet C; the smart glasses, tablet B, and tablet C establish video communication over a wired or wireless network, and the smart glasses encode the currently captured picture, send it to tablet B and tablet C, and buffer a period of time's worth, or a certain number, of video frames.
- tablet B receives and decodes the video stream and, based on user B's screenshot operation, determines the second video frame corresponding to the screenshot image and sends the second frame related information of the second video frame to the smart glasses and tablet C, where the second frame related information includes, but is not limited to: the second video frame identification information, the second video frame encoding start time, the second video frame decoding end time, and the second video frame codec-and-transmission total duration information.
- the smart glasses receive the second frame related information of the second video frame and determine the corresponding unencoded first video frame among the locally stored video frames based on the second frame related information.
- the tablet computer B generates real-time labeling operation information according to the labeling operation of the user B, and sends the labeling operation information to the smart glasses and the tablet computer C in real time.
- after receiving the annotation operation information, the smart glasses display the corresponding unencoded first video frame in a preset area of the smart glasses, and the corresponding annotation operation is presented in real time at the matching position on the first video frame.
- tablet C receives the second frame related information and the annotation operation information, finds the corresponding third video frame in its locally cached decoded video frames according to the second frame related information, and presents the corresponding annotation operation on the third video frame based on the third video frame and the annotation operation information.
- the method further includes step S010 (not shown).
- in step S010, the first user equipment sends the first video frame, as a preferred frame for presenting the annotation operation, to the second user equipment and/or the third user equipment.
- the first user equipment determines a corresponding first video frame in the locally cached video frame according to the second frame related information, and sends the first video frame to the second user equipment and/or the third user equipment.
- the second user equipment and/or the third user equipment receives the uncoded first video frame, the first video frame is presented, and the second user and/or the third user may perform an annotation operation based on the first video frame.
- for example, the unencoded first video frame is sent to tablet B and/or tablet C through lossless compression or high-quality compression, where tablet B and tablet C determine whether to obtain the first video frame according to the quality of the current communication network connection, or select the transmission mode of the first video frame according to that quality: lossless compression when network quality is good, high-quality lossy compression when network quality is poor.
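The transfer-mode selection described above amounts to a single branch on the measured connection quality; in the sketch below, the bandwidth test is an assumed stand-in for "quality of the current communication network connection".

```python
def choose_transfer_mode(bandwidth_mbps: float,
                         good_quality_mbps: float = 10.0) -> str:
    """Lossless compression on a good network, high-quality lossy otherwise."""
    if bandwidth_mbps >= good_quality_mbps:
        return "lossless"
    return "high-quality-lossy"
```

The receiving tablet could apply the same test to decide whether to request the preferred frame at all.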
- the first user equipment sends the first video frame and the second frame related information to the second user equipment and/or the third user equipment, where the first video frame is used as a preferred frame for presenting the annotation operation on the second user equipment or the third user equipment.
- for example, tablet B performs multiple screenshot operations in response to user B's operations; tablet B then determines which screenshot operation the first video frame corresponds to according to the second frame related information, for example by the screenshot time in the second frame related information, and presents the second frame related information on the first video frame.
- the tablet computer C receives the second frame related information and the first video frame, and presents the second frame related information in the window in which the first video frame is presented while presenting the first video frame.
- FIG. 3 illustrates a method for real-time annotation of a video frame at a second user equipment end according to another aspect of the present application, wherein the method includes step S21, step S22, step S23, and step S24.
- in step S21, the second user equipment receives the video stream sent by the first user equipment; in step S22, the second user equipment sends, to the first user equipment, the second frame related information of the second video frame captured according to the user's screenshot operation in the video stream; in step S23, the second user equipment acquires the user's annotation operation information for the second video frame; in step S24, the second user equipment sends the annotation operation information to the first user equipment.
- for example, the second user equipment receives and presents the video stream sent by the first user equipment; based on the second user's screenshot operation, it determines the second video frame corresponding to the current screenshot and sends the second frame related information of the second video frame to the first user equipment. The second user equipment then generates annotation operation information based on the second user's annotation operation and sends it to the first user equipment.
- for example, user B holds tablet B and user A holds smart glasses; tablet B and the smart glasses communicate video over a wired or wireless network.
- the tablet computer B receives and presents the video stream sent by the smart glasses, and determines the second video frame corresponding to the screen capture screen according to the screen capture operation of the user B. Then, the tablet B sends the second frame related information corresponding to the second video frame to the smart glasses, and the smart glasses receive the second frame related information and determine a corresponding first video frame based on the second frame related information.
- tablet B generates the corresponding annotation operation information from user B's annotation operation and sends it to the smart glasses in real time.
- the smart glasses present a first video frame at a preset position of the interface according to the first video frame and the labeled operation information, and present a corresponding labeling operation in real time in the corresponding position in the first video frame.
- the second user equipment receives the video stream sent by the first user equipment, and the frame identification information of the transmitted video frames in the video stream; wherein the second frame related information includes at least one of the following: frame identification information of the second video frame; frame related information generated based on the frame identification information of the second video frame.
- for example, the first user equipment sends the video stream to the second user equipment and also sends the frame identification information of the transmitted video frames in the video stream; the second user equipment receives the video stream and the frame identification information of the transmitted video frames.
- the second user equipment determines, based on the second user's screenshot operation, the second video frame corresponding to the current screenshot and sends the second frame related information of the second video frame to the first user equipment, where the second frame related information of the second video frame includes, but is not limited to: frame identification information of the second video frame; frame related information generated based on the frame identification information of the second video frame.
- the smart glasses send the frame identification information corresponding to the video frames in the already transmitted video stream to the tablet B while transmitting the video stream.
- tablet B detects user B's screenshot operation, determines from the current screenshot picture that it corresponds to the second video frame, and sends the second frame related information of the second video frame to the smart glasses, where the second frame related information includes, but is not limited to, the frame identification information of the second video frame and frame related information generated based on that frame identification information; the frame identification information of the second video frame may be the video frame's encoding start time or the number corresponding to the video frame, and the frame related information generated from the frame identification information may be the video frame's decoding end time or the codec-and-transmission total duration information, and the like.
- the frame identification information includes encoding start time information of the second video frame.
- for example, the first user equipment encodes the video frames and sends the corresponding video stream, together with the frame identification information of the transmitted video frames, to the second user equipment, where the frame identification information of a video frame includes the frame's encoding start time.
- the second frame related information includes the decoding end time information and the codec-and-transmission total duration information of the second video frame. The second user equipment receives and presents the video stream, records the corresponding decoding end times, determines the corresponding second video frame based on the screenshot operation, and determines the corresponding codec-and-transmission total duration information from the second video frame's encoding start time and decoding end time.
- the smart glasses record the encoding start time of each video frame, and after encoding, send the encoding start time of the video frame and the transmitted video frame to the tablet B.
- Tablet B receives and presents the video frame and records the decoding end time of the video frame.
- based on user B's screenshot operation, the corresponding second video frame is determined, and the codec-and-transmission total duration information of the second video frame is determined from the encoding start time and decoding end time corresponding to the second video frame.
- tablet B sends the second frame related information of the second video frame to the smart glasses, where the second frame related information includes, but is not limited to, the encoding start time of the second video frame and the codec-and-transmission total duration information of the second video frame.
- the second user equipment acquires the user's annotation operation information for the second video frame in real time; accordingly, in step S24, the second user equipment sends the annotation operation information to the first user equipment in real time.
- the second user equipment acquires the corresponding labeling operation information in real time based on the operation of the second user, for example, collecting the corresponding labeling operation information at a certain time interval. Then, the second user equipment sends the obtained annotation operation information to the first user equipment in real time.
- the tablet computer B collects the labeling operation of the user B on the screen capture screen, for example, the user B draws a circle, an arrow, a text, a box and the like on the screen.
- tablet B records the position and path of the annotation brush: for example, from the multiple points the brush passes through on the screen, the position corresponding to each point is obtained, and the path connecting those points constitutes the annotation.
- tablet B obtains the corresponding annotation operation in real time and sends it to the smart glasses in real time, for example collecting and sending annotation operations once every 50 ms.
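Collecting and forwarding annotation operations at a fixed cadence, as in the 50 ms example above, can be sketched as a small sampler; the AnnotationSender name and the send callback are assumptions standing in for the real network layer.

```python
import threading
import time

class AnnotationSender:
    """Collect brush points and flush them as one annotation operation
    message every interval_s seconds (0.05 s = 50 ms in the example)."""

    def __init__(self, send, interval_s: float = 0.05):
        self.send = send                 # callable taking one message dict
        self.interval_s = interval_s
        self._points: list[tuple[float, float]] = []
        self._lock = threading.Lock()

    def on_touch(self, x: float, y: float) -> None:
        """Called by the UI layer for every point the brush passes through."""
        with self._lock:
            self._points.append((x, y))

    def run(self, stop: threading.Event) -> None:
        """Flush loop; run on a background thread until stop is set."""
        while not stop.is_set():
            time.sleep(self.interval_s)
            with self._lock:
                batch, self._points = self._points, []
            if batch:
                self.send({"points": batch})  # one real-time annotation operation
```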
- the method further includes step S25 (not shown).
- in step S25, the second user equipment receives the first video frame sent by the first user equipment, where the first video frame is used as a preferred frame for presenting the annotation operation, and loads the first video frame in the display window of the second video frame to replace the second video frame, so that the annotation operation is displayed on the first video frame.
- for example, the second user equipment determines the second video frame corresponding to the current screenshot and sends the second frame related information of the second video frame to the first user equipment; based on the second frame related information, the first user equipment determines the corresponding unencoded first video frame and sends it to the second user equipment.
- for example, tablet B enters screenshot mode through the user's operation, determines the second video frame corresponding to the current picture, and sends the second frame related information of the second video frame to the smart glasses, where the second frame related information includes, but is not limited to: the encoding start time of the second video frame or the number corresponding to the video frame.
- the smart glasses determine the corresponding unencoded first video frame according to the second frame related information sent by tablet B and send the first video frame to tablet B, for example through lossless compression, or through lossy compression whose quality is still guaranteed to be higher than that of the video frames locally buffered by the tablet.
- the tablet B receives and presents the first video frame, such as being presented in the form of a small window next to the current video, or displaying the first video frame on a large screen, presenting the current video in the form of a small window, and the like. Subsequently, the tablet computer B obtains the labeling operation information and the like regarding the first video frame according to the operation of the second user.
- the method further includes step S26 (not shown).
- in step S26, the second user equipment receives the first video frame and the second frame related information sent by the first user equipment, where the first video frame is used as a preferred frame for presenting the annotation operation; the second user equipment determines, according to the second frame related information, that the first video frame is to replace the second video frame, and loads the first video frame in the display window of the second video frame to replace the second video frame, so that the annotation operation is displayed on the first video frame.
- for example, tablet B receives the unencoded first video frame and the second frame related information sent by the smart glasses, where the second frame related information includes the screenshot time of the second video frame, the frame number of the second video frame, and the like.
- tablet B performs multiple screenshot operations in response to user B's operations; tablet B determines which screenshot operation the first video frame corresponds to according to the second frame related information, for example by the screenshot time in the second frame related information, and presents the second frame related information on the first video frame.
- tablet B determines the currently corresponding screenshot operation according to the second frame related information and presents the first video frame in a small window next to the current video, or displays the first video frame on the large screen and presents the current video in a small window, and so on; tablet B also presents the second frame related information on the presented first video frame, such as the frame's screenshot time or the frame's number in the video stream. Subsequently, tablet B obtains the annotation operation information regarding the first video frame according to the second user's operation.
- in step S21, the second user equipment receives the video stream sent by the first user equipment to the second user equipment and the third user equipment; wherein, in step S24, the second user equipment sends the annotation operation information to the first user equipment and the third user equipment.
- for example, user A holds smart glasses, user B holds tablet B, and user C holds tablet C; the smart glasses, tablet B, and tablet C establish video communication over a wired or wireless network, and the smart glasses encode the currently captured picture, send it to tablet B and tablet C, and buffer a period of time's worth, or a certain number, of video frames.
- tablet B receives and decodes the video stream and, based on user B's screenshot operation, determines the second video frame corresponding to the screenshot image and sends the second frame related information of the second video frame to the smart glasses and tablet C, where the second frame related information includes, but is not limited to: the second video frame identification information, the second video frame encoding start time, the second video frame decoding end time, and the second video frame codec-and-transmission total duration information.
- the smart glasses receive the second frame related information of the second video frame and determine the corresponding unencoded first video frame among the locally stored video frames based on the second frame related information.
- tablet B generates real-time annotation operation information according to user B's annotation operation and sends the annotation operation information to the smart glasses and tablet C in real time.
- FIG. 4 illustrates a method for real-time annotation of a video frame at a third user equipment end according to still another aspect of the present application, wherein the method includes step S31, step S32, step S33, step S34, and step S35.
- In step S31, the third user equipment receives the video stream sent by the first user equipment to the second user equipment and the third user equipment; in step S32, the third user equipment receives the second frame related information of the second video frame intercepted by the second user equipment in the video stream; in step S33, the third user equipment determines, according to the second frame related information, a third video frame corresponding to the second video frame in the video stream; in step S34, the third user equipment receives the labeling operation information of the second video frame by the second user equipment; in step S35, the third user equipment presents a corresponding labeling operation in real time on the third video frame according to the labeling operation information.
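- A minimal Python sketch of steps S31 through S35 at the third user equipment, assuming a local cache of decoded frames keyed by frame number; the message shapes and the rendering call are illustrative stand-ins, with transport abstracted away.

```python
class ThirdUserEquipment:
    """Sketch of steps S31..S35; transport and rendering are abstracted."""

    def __init__(self):
        self.decoded_frames = {}   # frame number -> decoded frame (S31)
        self.current_frame = None  # the matched third video frame (S33)

    def on_video_frame(self, number, frame):
        # Step S31: receive the video stream and keep decoded frames.
        self.decoded_frames[number] = frame

    def on_frame_info(self, frame_info):
        # Steps S32/S33: locate the third video frame corresponding to
        # the second frame related information (here: by frame number).
        self.current_frame = self.decoded_frames.get(frame_info["number"])

    def on_annotation(self, annotation):
        # Steps S34/S35: present the labeling operation in real time on
        # the third video frame (rendering reduced to a print here).
        if self.current_frame is not None:
            print("draw", annotation, "on frame", self.current_frame)
```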
- For example, user A holds smart glasses, user B holds tablet B, and user C holds tablet C. The smart glasses, tablet B, and tablet C establish video communication through a wired or wireless network; the smart glasses encode the currently collected picture, send it to tablet B and tablet C, and buffer a period of time or a certain number of video frames.
- After receiving and decoding the video stream, tablet B determines, based on user B's screen capture operation, the second video frame corresponding to the screen capture, and sends the second frame related information corresponding to the second video frame to the smart glasses and tablet C, where the second frame related information includes but is not limited to: second video frame identification information, second video frame encoding start time, second video frame decoding end time, and second video frame codec and transmission total duration information, etc.
- The smart glasses receive the second frame related information of the second video frame and determine the corresponding unencoded first video frame in the locally stored video frames based on it.
- Tablet B generates real-time labeling operation information according to user B's labeling operation and sends it to the smart glasses and tablet C in real time.
- After receiving the labeling operation information, the smart glasses display the corresponding unencoded first video frame in a preset area of the smart glasses and present the corresponding labeling operation in real time at the position corresponding to the first video frame.
- Tablet C receives the second frame related information and the labeling operation information, finds the corresponding third video frame in the locally cached decoded video frames according to the second frame related information, and presents the corresponding labeling operation in the third video frame based on the third video frame and the labeling operation information.
- the method further includes step S36 (not shown).
- In step S36, the third user equipment receives the first video frame sent by the first user equipment, where the first video frame is used as a preferred frame for presenting the labeling operation, and loads the first video frame in a display window of the third video frame to replace the third video frame, wherein the labeling operation is displayed on the first video frame.
- For example, tablet B enters the screen capture mode through the user's operation, determines the second video frame corresponding to the current picture, and sends the second frame related information of the second video frame to the smart glasses, where the second frame related information includes but is not limited to: the encoding start time of the second video frame or the number corresponding to the video frame.
- The smart glasses determine the corresponding unencoded first video frame according to the second frame related information sent by tablet B and send the first video frame to tablet B and tablet C, for example through lossless compression, or through lossy compression whose quality is guaranteed to be higher than that of the video frames cached locally by tablet B and tablet C.
- Tablet C receives and presents the first video frame, for example in the form of a small window next to the current video, or by displaying the first video frame on a large screen while presenting the current video in a small window. Subsequently, tablet C receives the labeling operation information sent by tablet B and presents the labeling operation in the first video frame.
- the method further includes step S37 (not shown).
- In step S37, the third user equipment receives the first video frame and the second frame related information sent by the first user equipment, where the first video frame is used as a preferred frame for presenting the labeling operation; the third user equipment determines, according to the second frame related information, that the first video frame is used to replace the third video frame, and loads the first video frame in a display window of the third video frame to replace the third video frame, wherein the labeling operation is displayed on the first video frame.
- For example, tablet C receives the unencoded first video frame and the second frame related information sent by the smart glasses, where the second frame related information includes the screenshot time of the second video frame, the frame number of the second video frame, and the like.
- Tablet C receives and presents the first video frame, for example as a small window next to the current video, or by displaying the first video frame on a large screen while presenting the current video in a small window; while presenting the first video frame, tablet C presents the second frame related information in it, such as the screenshot time of the frame or the frame number of the frame in the video stream.
- Tablet C receives the labeling operation information sent by tablet B and presents the labeling operation in the first video frame.
- FIG. 5 illustrates a method for real-time annotation of a video frame at a network device end according to still another aspect of the present application, wherein the method includes step S41, step S42, step S43, step S44, and step S45.
- In step S41, the network device receives and forwards the video stream sent by the first user equipment to the second user equipment; in step S42, the network device receives the second frame related information of the second video frame intercepted by the second user equipment in the video stream; in step S43, the network device forwards the second frame related information to the first user equipment; in step S44, the network device receives the labeling operation information of the second video frame by the second user equipment; in step S45, the network device forwards the labeling operation information to the first user equipment.
- For example, user A holds smart glasses and user B holds tablet B; the smart glasses and tablet B communicate video through the cloud. The smart glasses encode the currently collected picture and send it to the cloud, which forwards it to tablet B; the smart glasses cache a period of time or a certain number of video frames while sending the video.
- After receiving and decoding the video stream, tablet B determines, based on user B's screen capture operation, the second video frame corresponding to the screen capture, and sends the second frame related information corresponding to the second video frame to the cloud, which forwards it to the smart glasses, where the second frame related information includes but is not limited to: second video frame identification information, second video frame encoding start time, second video frame decoding end time, and second video frame codec and transmission total duration information, etc.
- The smart glasses receive the second frame related information of the second video frame and determine the corresponding unencoded first video frame in the locally stored video frames based on it.
- Tablet B generates real-time annotation operation information according to user B's labeling operation and sends it to the cloud, which forwards it to the smart glasses.
- After receiving the labeling operation information, the smart glasses display the corresponding unencoded first video frame in a preset area of the smart glasses and present the corresponding labeling operation in real time at the position corresponding to the first video frame.
- In step S41, the network device receives and forwards the video stream sent by the first user equipment to the second user equipment, together with the frame identification information of the transmitted video frames in the video stream.
- For example, the first user equipment encodes the video frames and sends the corresponding video stream and the frame identification information of the transmitted video frames in the video stream to the network device; the network device forwards the video stream and the frame identification information of the transmitted video frames to the second user equipment, where the frame identification information includes the encoding start time of each video frame.
- In step S43, the network device determines, according to the second frame related information, the frame identification information of the video frame corresponding to the second video frame in the video stream, and sends the frame identification information of the video frame corresponding to the second video frame to the first user equipment.
- For example, the cloud receives the video stream sent by the smart glasses and the frame identification information of the transmitted video frames in the video stream, such as the encoding start time of each video frame, and forwards the video stream and the corresponding frame identification information to tablet B.
- Tablet B receives and presents the video frames and records the decoding end time of each video frame.
- Based on user B's operation, tablet B performs a screen capture operation, determines the corresponding second video frame, and sends the second frame related information of the second video frame to the cloud, where the second frame related information includes the decoding end time corresponding to the second video frame, the video number of the second video frame, and the like.
- The cloud receives the second frame related information of the second video frame sent by tablet B and determines the frame identification information of the corresponding second video frame based on it, for example determining the encoding start time or the video number of the second video frame from the decoding end time or the video number carried in the second frame related information.
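- How the cloud can resolve the frame identification from such second frame related information comes down to a lookup over the records it kept while forwarding; the following Python sketch illustrates this under assumed message shapes (the key names, the linear scan, and the tolerance parameter are illustrative, not part of this application).

```python
def resolve_frame_id(sent_frames, frame_info, tolerance=0.05):
    """Map second frame related information back to the frame
    identification recorded while forwarding the stream.

    sent_frames: list of (frame_number, encode_start) tuples kept by
                 the network device for the transmitted video frames.
    frame_info:  dict with either 'frame_number', or 'decode_end' plus
                 'total_duration' (key names are illustrative).
    """
    if not sent_frames:
        return None
    if "frame_number" in frame_info:
        # Direct lookup when the receiver reported the frame number.
        for number, encode_start in sent_frames:
            if number == frame_info["frame_number"]:
                return number, encode_start
        return None
    # Otherwise estimate the encoding start time as decode end minus the
    # measured codec-and-transmission total duration, then pick the
    # closest recorded frame.
    estimate = frame_info["decode_end"] - frame_info["total_duration"]
    number, encode_start = min(sent_frames, key=lambda f: abs(f[1] - estimate))
    if abs(encode_start - estimate) <= tolerance:
        return number, encode_start
    return None
```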
- In step S41, the network device receives and forwards the video stream sent by the first user equipment to the second user equipment and the third user equipment; in step S43, the network device forwards the second frame related information to the first user equipment and the third user equipment; in step S45, the network device forwards the labeling operation information to the first user equipment and the third user equipment.
- For example, user A holds smart glasses, user B holds tablet B, and user C holds tablet C. The smart glasses, tablet B, and tablet C establish video communication through the network device; the smart glasses encode the currently collected picture, send it to the network device, and buffer a period of time or a certain number of video frames, while the network device sends the video stream to tablet B and tablet C.
- After receiving and decoding the video stream, tablet B determines, based on user B's screen capture operation, the second video frame corresponding to the screen capture, and sends the second frame related information corresponding to the second video frame to the network device, where the second frame related information includes but is not limited to: second video frame identification information, second video frame encoding start time, second video frame decoding end time, and second video frame codec and transmission total duration information.
- The network device forwards the second frame related information to the first user equipment and the third user equipment.
- The smart glasses receive the second frame related information of the second video frame and determine the corresponding unencoded first video frame in the locally stored video frames based on it.
- Tablet B generates real-time annotation operation information according to user B's labeling operation and transmits it to the smart glasses and tablet C through the network device in real time; the smart glasses receive the labeling operation information, display the corresponding unencoded first video frame in a preset area of the smart glasses, and present the corresponding labeling operation in real time at the position corresponding to the first video frame.
- Tablet C receives the second frame related information and the labeling operation information, finds the corresponding third video frame in the locally cached decoded video frames according to the second frame related information, and presents the corresponding labeling operation in the third video frame based on the third video frame and the labeling operation information.
- FIG. 6 illustrates a method for real-time annotation of video frames in accordance with an aspect of the present application, wherein the method includes:
- the first user equipment sends a video stream to the second user equipment
- the first user equipment receives the labeling operation information, and presents a corresponding labeling operation in real time on the first video frame according to the labeling operation information.
- FIG. 7 illustrates a method for real-time annotation of a video frame according to another aspect of the present application, wherein the method includes:
- the first user equipment sends a video stream to the network device
- the first user equipment receives the labeling operation information, and presents a corresponding labeling operation in real time on the first video frame according to the labeling operation information.
- a method for real-time annotation of a video frame according to still another aspect of the present application, wherein the method comprises:
- the first user equipment sends a video stream to the second user equipment and the third user equipment;
- a method for real-time annotation of a video frame includes:
- the first user equipment sends a video stream to the network device
- the network device receives the labeling operation information of the second video frame by the second user equipment, and forwards the labeling operation information to the first user equipment and the third user equipment;
- the first user equipment receives the labeling operation information, and presents a corresponding labeling operation in real time on the first video frame according to the labeling operation information;
- FIG. 8 illustrates a first user equipment for real-time annotation of a video frame according to an aspect of the present application, wherein the device includes a video sending module 11, a frame information receiving module 12, a video frame determining module 13, an annotation receiving module 14, and an annotation presentation module 15.
- The video sending module 11 is configured to send a video stream to the second user equipment; the frame information receiving module 12 is configured to receive the second frame related information of the second video frame intercepted by the second user equipment in the video stream; the video frame determining module 13 is configured to determine, according to the second frame related information, the first video frame corresponding to the second video frame in the video stream; the annotation receiving module 14 is configured to receive the labeling operation information of the second video frame by the second user equipment; the annotation presentation module 15 is configured to present a corresponding labeling operation in real time on the first video frame according to the labeling operation information.
- the video sending module 11 is configured to send a video stream to the second user equipment.
- For example, the first user equipment establishes a communication connection with the second user equipment through a wired or wireless network, encodes the video frames, and sends the video stream to the second user equipment by way of video communication.
- The frame information receiving module 12 is configured to receive the second frame related information of the second video frame intercepted by the second user equipment in the video stream. For example, the second user equipment determines the second frame related information of the video frame corresponding to the screen capture based on the second user's screen capture operation, and the first user equipment then receives the second frame related information of the second video frame sent by the second user equipment, where the second frame related information includes but is not limited to: second video frame identification information, second video frame encoding start time, second video frame decoding end time, and second video frame codec and transmission total duration information.
- The video frame determining module 13 is configured to determine, according to the second frame related information, the first video frame corresponding to the second video frame in the video stream. For example, the first user equipment locally stores a period of time or a certain number of transmitted unencoded video frames, and determines, according to the second frame related information sent by the second user equipment, the unencoded first video frame corresponding to the screen capture among the locally stored unencoded video frames.
- The annotation receiving module 14 is configured to receive the labeling operation information of the second video frame by the second user equipment. For example, the second user equipment generates the corresponding labeling operation information based on the second user's labeling operation and sends it to the first user equipment in real time, and the first user equipment receives the labeling operation information.
- the annotation presentation module 15 is configured to present a corresponding annotation operation on the first video frame in real time according to the annotation operation information.
- For example, the first user equipment displays the corresponding labeling operation in real time on the first video frame based on the received labeling operation information, such as displaying the first video frame in a small window in the current interface and presenting the corresponding labeling operation at the corresponding position at a rate of one frame every 50 ms.
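- As a Python sketch of the presenting side, the loop below replays annotation samples at the same 50 ms cadence; the (x, y) sample format and the draw callback are assumptions, not interfaces defined by this application.

```python
import time

def replay_annotation(points, draw, interval=0.05):
    """Present an annotation path on the first video frame in real time,
    one sample every 50 ms, mirroring the sender's sampling rate.

    points: iterable of (x, y) positions in first-video-frame coordinates.
    draw:   callback that renders one stroke segment (stand-in for the UI).
    """
    previous = None
    for point in points:
        if previous is not None:
            draw(previous, point)  # extend the stroke by one segment
        previous = point
        time.sleep(interval)       # pace the replay at the capture rate
```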
- For example, user A holds smart glasses and user B holds tablet B; the smart glasses and tablet B establish video communication via a wired or wireless network. The smart glasses encode the currently collected picture, send it to tablet B, and cache the sent video frames.
- Tablet B receives the video stream, determines, based on user B's screen capture operation, the second video frame corresponding to the screen capture, and sends the second frame related information corresponding to the second video frame to the smart glasses, where the second frame related information includes but is not limited to: second video frame identification information, second video frame encoding start time, second video frame decoding end time, and second video frame codec and transmission total duration information.
- The smart glasses receive the second frame related information of the second video frame and determine the corresponding unencoded first video frame in the locally stored video frames based on it.
- Tablet B generates real-time labeling operation information according to user B's labeling operation and sends it to the smart glasses in real time.
- After receiving the labeling operation information, the smart glasses display the corresponding unencoded first video frame in a preset area of the smart glasses and present the corresponding labeling operation in real time at the position corresponding to the first video frame.
- The device also includes a storage module 16 (not shown), configured to store video frames in the video stream, where the video frame determining module 13 is configured to determine, from the stored video frames according to the second frame related information, the first video frame corresponding to the second video frame.
- For example, the first user equipment sends the video stream to the second user equipment and locally stores a period of time or a certain number of unencoded video frames, where the duration or number may be a preset fixed value or a threshold dynamically adjusted according to network conditions or the transmission rate; the first user equipment then determines, according to the second frame related information of the second video frame sent by the second user equipment, the corresponding unencoded first video frame among the locally stored video frames.
- The stored video frames satisfy, but are not limited to, at least one of the following: the time interval between a stored video frame's transmission time and the current time is less than or equal to a video frame storage duration threshold; the cumulative number of stored video frames is less than or equal to a predetermined video frame storage number threshold.
- For example, the smart glasses send the collected pictures to tablet B and locally store a period of time or a certain number of unencoded video frames, where the duration or number may be a fixed system value or a manually preset one, such as a duration or frame-count threshold obtained by statistical analysis of big data; the duration or number may also be a threshold dynamically adjusted according to network conditions or the transmission rate.
- The dynamically adjusted duration or number threshold may be determined according to the codec and transmission total duration information of a video frame, for example by taking the codec and transmission total duration of the current video frame as a unit duration, or the number of video frames transmitted within that duration as a unit number, and setting the dynamic duration or number threshold with reference to the current unit duration or unit number.
- The predetermined or dynamic video frame storage duration threshold should be greater than or equal to one unit duration, and the predetermined or dynamic video frame storage number threshold should be greater than or equal to one unit number. The smart glasses then determine the corresponding unencoded first video frame among the stored video frames according to the second frame related information of the second video frame sent by tablet B, where the interval between a stored video frame's transmission time and the current time is less than or equal to the video frame storage duration threshold, or the cumulative number of stored video frames is less than or equal to the predetermined video frame storage number threshold.
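- The sender-side cache this implies is a small bounded buffer. Below is a minimal Python sketch honoring both eviction rules; the threshold values and the record layout are illustrative assumptions.

```python
import time
from collections import deque

class FrameCache:
    """Cache of sent, unencoded frames bounded by age and by count."""

    def __init__(self, max_age=2.0, max_count=60):
        self.max_age = max_age      # video frame storage duration threshold (s)
        self.max_count = max_count  # video frame storage number threshold
        self.frames = deque()       # entries: (send_time, frame_id, frame)

    def store(self, frame_id, frame):
        self.frames.append((time.monotonic(), frame_id, frame))
        self._evict()

    def _evict(self):
        now = time.monotonic()
        # Drop frames older than the duration threshold, or frames beyond
        # the count threshold, always from the oldest end.
        while self.frames and (now - self.frames[0][0] > self.max_age
                               or len(self.frames) > self.max_count):
            self.frames.popleft()

    def find(self, frame_id):
        for _, fid, frame in self.frames:
            if fid == frame_id:
                return frame
        return None
```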
- the device further includes a threshold adjustment module 17 (not shown).
- The threshold adjustment module 17 is configured to acquire the codec and transmission total duration information of a video frame in the video stream, and to adjust the video frame storage duration threshold or the video frame storage number threshold according to that information.
- For example, the first user equipment records the encoding start time of each video frame and, after encoding, sends the video frame to the second user equipment, which receives it and records the decoding end time of each video frame; subsequently, the second user equipment sends the decoding end time to the first user equipment, and the first user equipment calculates the codec and transmission total duration of the current video frame from the encoding start time and the decoding end time, or the second user equipment calculates that duration itself and sends it to the first user equipment.
- The first user equipment adjusts the video frame storage duration threshold or the video frame storage number threshold according to the codec and transmission total duration information, for example by taking the duration as a unit time reference and setting a certain multiple of it as the video frame storage duration threshold; or by calculating, from the duration and the rate at which the first user equipment sends video frames, the number of video frames that can be sent within that duration, taking that number as a unit number, and setting a certain multiple of it as the video frame storage number threshold.
- For example, the smart glasses record the encoding start time of the i-th video frame as T_si and, after encoding, send the video frame to tablet B; tablet B receives it and records the decoding end time as T_ei. Subsequently, tablet B sends the decoding end time T_ei to the smart glasses, and the smart glasses calculate the codec and transmission total duration of the i-th video frame as T_i = T_ei - T_si from the received decoding end time T_ei and the locally recorded encoding start time T_si.
- According to the codec and transmission total duration T_i of the i-th video frame, the smart glasses dynamically determine how long video frames are saved, for example keeping frames within a duration of 1.3*T_i based on big data statistics, or dynamically adjusting the multiplier according to the network transmission rate, such as setting the buffer duration threshold to (1+k)*T_i, where k is a coefficient adjusted according to network fluctuation: when the network fluctuation is large, k is set to 0.5; when the fluctuation is small, k is set to 0.2, and so on.
- Likewise, the smart glasses can dynamically adjust the multiplier for the buffer count according to the current network transmission rate, such as setting the buffer number threshold to (1+k)*N, where k is a coefficient adjusted according to network fluctuation: when the network fluctuation is large, k is set to 0.5; when the fluctuation is small, k is set to 0.2, and so on.
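- Put together, the dynamic thresholds above amount to a few lines of arithmetic. The Python sketch below assumes a boolean network-fluctuation flag and a frames-per-second send rate as inputs; both are illustrative stand-ins for whatever measurements the device actually uses.

```python
def adjust_thresholds(encode_start, decode_end, send_rate, high_fluctuation):
    """Derive dynamic cache thresholds from one frame's measurements.

    encode_start / decode_end: timestamps bounding codec + transmission.
    send_rate:        frames per second currently being sent.
    high_fluctuation: True when the network is judged unstable.
    """
    t_i = decode_end - encode_start        # T_i = T_ei - T_si
    k = 0.5 if high_fluctuation else 0.2   # coefficient from network state
    duration_threshold = (1 + k) * t_i     # seconds of frames to keep
    n = send_rate * t_i                    # unit number: frames sent in T_i
    count_threshold = (1 + k) * n          # number of frames to keep
    return duration_threshold, count_threshold

# e.g. a 300 ms codec-and-transmission duration at 30 fps, stable network:
print(adjust_thresholds(0.0, 0.3, 30, high_fluctuation=False))
# -> approximately (0.36, 10.8)
```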
- The video sending module 11 is configured to send, to the second user equipment, the video stream and the frame identification information of the transmitted video frames in the video stream, and the video frame determining module 13 is configured to determine, according to the second frame related information, the first video frame corresponding to the second video frame in the video stream, where the frame identification information of the first video frame corresponds to the second frame related information.
- For example, the frame identification information of a video frame may be the codec time corresponding to the video frame, or the number corresponding to the video frame.
- The video sending module 11 is configured to encode multiple video frames to be transmitted and to send the corresponding video stream and the frame identification information of the transmitted video frames in the video stream to the second user equipment.
- For example, the first user equipment encodes multiple video frames to be transmitted, acquires the encoding start times of those frames, and sends the video frames and their encoding start times to the second user equipment; the frame identification information of the transmitted video frames in the video stream includes the encoding start time information of the transmitted video frames.
- For example, the smart glasses record the encoding start time of each video frame and, after encoding, send the video frames and the encoding start times of the transmitted video frames to tablet B, where the transmitted video frames include the video frames to be sent after the current encoding is completed as well as the already transmitted video frames; the encoding start times may be sent to tablet B periodically or every certain number of video frames, or the encoding start time of each video frame may be sent to tablet B directly together with that video frame.
- Based on user B's screen capture operation, tablet B determines the video frame corresponding to the screen capture and sends the second frame related information of the corresponding second video frame to the smart glasses, where the second frame related information corresponds to the frame identification information and includes but is not limited to at least one of the following: the encoding start time of the second video frame, the second video frame decoding end time, the second video frame codec and transmission total duration information, the number or image corresponding to the second video frame, etc.
- The smart glasses receive the second frame related information and determine the correspondingly stored unencoded first video frame according to it, for example by determining, from the encoding start time or decoding end time of the second video frame, the encoding start time of the unencoded first video frame corresponding to the second video frame; by directly determining the first video frame with the same number from the number corresponding to the second video frame; or by image recognition of the second video frame among the stored unencoded video frames.
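- A Python sketch of this matching step, covering the three strategies just named; the dictionary keys are illustrative, and the image-recognition branch is reduced to an exact pixel comparison for brevity.

```python
def match_first_frame(cached, frame_info):
    """Find the unencoded first video frame matching second frame info.

    cached:     list of dicts with 'encode_start', 'number', 'pixels'.
    frame_info: dict carrying whichever identifiers the receiver sent
                (key names are illustrative).
    """
    if "number" in frame_info:
        # Same numbering on both sides: direct lookup by frame number.
        for frame in cached:
            if frame["number"] == frame_info["number"]:
                return frame
    if "encode_start" in frame_info:
        # Match on the recorded encoding start time.
        for frame in cached:
            if frame["encode_start"] == frame_info["encode_start"]:
                return frame
    if "pixels" in frame_info:
        # Fallback: image recognition, reduced here to exact comparison.
        for frame in cached:
            if frame["pixels"] == frame_info["pixels"]:
                return frame
    return None
```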
- the device also includes a video frame rendering module 18 (not shown).
- the video frame presentation module 18 is configured to present the first video frame.
- the annotation presentation module 15 is configured to superimpose a corresponding annotation operation on the first video frame according to the annotation operation information.
- For example, the first user equipment determines the unencoded first video frame and displays it at a preset position in the current interface or in a small window; subsequently, the first user equipment superimposes the corresponding labeling operation at the corresponding position of the first video frame according to the labeling operation information received in real time.
- For example, the smart glasses determine the corresponding unencoded first video frame according to the second frame related information sent by tablet B and display the first video frame at a preset position of the smart glasses interface. The smart glasses then receive the real-time annotation operations sent by tablet B, determine the position of each annotation operation in the currently displayed first video frame, and present the current annotation operation in real time at that position.
- the device also includes a first preferred frame module 19 (not shown).
- The first preferred frame module 19 is configured to send the first video frame to the second user equipment as a preferred frame for presenting the labeling operation. For example, the first user equipment determines the unencoded first video frame and sends it to the second user equipment, so that the second user equipment can present a first video frame of higher quality.
- For example, the smart glasses determine the corresponding unencoded first video frame according to the second frame related information sent by tablet B and send the first video frame as a preferred frame to tablet B, such as through lossless compression; tablet B receives and presents the first video frame.
- the video sending module 11 is configured to send a video stream to the second user equipment and the third user equipment.
- For example, a communication connection is established among the first user equipment, the second user equipment, and the third user equipment, where the first user equipment is the current video frame sender and the second user equipment and the third user equipment are the current video frame receivers; the first user equipment sends the video stream to the second user equipment and the third user equipment through the communication connection.
- For example, user A holds smart glasses, user B holds tablet B, and user C holds tablet C. The smart glasses, tablet B, and tablet C establish video communication through a wired or wireless network; the smart glasses encode the currently collected picture, send it to tablet B and tablet C, and buffer a period of time or a certain number of video frames.
- After receiving and decoding the video stream, tablet B determines, based on user B's screen capture operation, the second video frame corresponding to the screen capture, and sends the second frame related information corresponding to the second video frame to the smart glasses and tablet C, where the second frame related information includes but is not limited to: second video frame identification information, second video frame encoding start time, second video frame decoding end time, and second video frame codec and transmission total duration information, etc.
- The smart glasses receive the second frame related information of the second video frame and determine the corresponding unencoded first video frame in the locally stored video frames based on it.
- Tablet B generates real-time labeling operation information according to user B's labeling operation and sends it to the smart glasses and tablet C in real time.
- After receiving the labeling operation information, the smart glasses display the corresponding unencoded first video frame in a preset area of the smart glasses and present the corresponding labeling operation in real time at the position corresponding to the first video frame.
- Tablet C receives the second frame related information and the labeling operation information, finds the corresponding third video frame in the locally cached decoded video frames according to the second frame related information, and presents the corresponding labeling operation in the third video frame based on the third video frame and the labeling operation information.
- the device further includes a second preferred frame module 010 (not shown).
- the second preferred frame module 010 is configured to send the first video frame to the second user equipment and/or the third user equipment as a preferred frame for presenting the labeling operation.
- For example, the first user equipment determines the corresponding first video frame among the locally cached video frames according to the second frame related information and sends the first video frame to the second user equipment and/or the third user equipment; the second user equipment and/or the third user equipment receives and presents the unencoded first video frame, and the second user and/or the third user may perform an annotation operation based on it.
- For example, the unencoded first video frame is sent to tablet B and tablet C through lossless compression or high-quality compression, where tablet B and tablet C determine whether to obtain the first video frame according to the quality of the current communication network connection, or select the transmission mode of the first video frame according to that quality, such as lossless compression when the network quality is good and high-quality lossy compression when the network quality is poor.
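- A Python sketch of that choice; the bandwidth cutoffs are hypothetical stand-ins for whatever connection-quality measure the devices actually use.

```python
def choose_transfer_mode(bandwidth_mbps, floor=2.0, min_for_lossless=20.0):
    """Pick how to ship the preferred (unencoded) first video frame.

    bandwidth_mbps:   measured quality of the current connection.
    floor:            hypothetical cutoff below which the receiver
                      declines the preferred frame entirely.
    min_for_lossless: hypothetical cutoff above which lossless is used.
    """
    if bandwidth_mbps < floor:
        return None                  # too poor: keep the local cached frame
    if bandwidth_mbps >= min_for_lossless:
        return "lossless"            # good network: send losslessly
    return "high-quality-lossy"      # otherwise: lossy, but still better
                                     # than the receiver's cached frame
```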
- The second preferred frame module 010 is configured to send the first video frame and the second frame related information to the second user equipment and/or the third user equipment, where the first video frame is used as a preferred frame to present the annotation operation in the second user equipment or the third user equipment.
- For example, tablet B may perform multiple screen capture operations based on user B's operations; tablet B determines the screen capture operation corresponding to the first video frame according to the second frame related information, for example by matching the screenshot time carried in the second frame related information, and presents the second frame related information in the first video frame.
- Tablet C receives the second frame related information and the first video frame, and presents the second frame related information in the window in which the first video frame is presented while presenting the first video frame.
- FIG. 9 illustrates a second user equipment for real-time annotation of a video frame according to another aspect of the present application, wherein the apparatus includes a video receiving module 21, a frame information determining module 22, an annotation acquiring module 23, and an annotation transmitting module 24.
- The video receiving module 21 is configured to receive the video stream sent by the first user equipment; the frame information determining module 22 is configured to send, to the first user equipment, the second frame related information of the second video frame intercepted according to the user's screenshot operation in the video stream; the annotation obtaining module 23 is configured to acquire the labeling operation information of the second video frame by the user; the annotation sending module 24 is configured to send the labeling operation information to the first user equipment.
- For example, the second user equipment receives and presents the video stream sent by the first user equipment; the second user equipment determines, according to the second user's screen capture operation, the second video frame corresponding to the current screen capture, and sends the second frame related information of the second video frame to the first user equipment. Then, the second user equipment generates the labeling operation information based on the second user's labeling operation and sends it to the first user equipment.
- For example, user B holds tablet B and user A holds smart glasses; tablet B and the smart glasses communicate video over a wired or wireless network.
- Tablet B receives and presents the video stream sent by the smart glasses, and determines, according to user B's screen capture operation, the second video frame corresponding to the screen capture. Then, tablet B sends the second frame related information corresponding to the second video frame to the smart glasses, and the smart glasses receive the second frame related information and determine the corresponding first video frame based on it.
- Tablet B generates the corresponding labeling operation information from user B's labeling operation and sends it to the smart glasses in real time.
- The smart glasses present the first video frame at a preset position of the interface according to the first video frame and the labeling operation information, and present the corresponding labeling operation in real time at the corresponding position in the first video frame.
- The video receiving module 21 is configured to receive the video stream sent by the first user equipment and the frame identification information of the transmitted video frames in the video stream, where the second frame related information includes at least one of the following: the frame identification information of the second video frame; frame related information generated based on the frame identification information of the second video frame.
- For example, the first user equipment sends the video stream to the second user equipment together with the frame identification information of the transmitted video frames in the video stream, and the second user equipment receives the video stream and the frame identification information of the transmitted video frames.
- The second user equipment determines, according to the second user's screen capture operation, the second video frame corresponding to the current screen capture, and sends the second frame related information of the second video frame to the first user equipment, where the second frame related information of the second video frame includes but is not limited to: the frame identification information of the second video frame; frame related information generated based on the frame identification information of the second video frame.
- For example, the smart glasses send the frame identification information corresponding to the video frames in the already transmitted video stream to tablet B while transmitting the video stream.
- Tablet B detects user B's screen capture operation, determines, based on the current screen capture, the second video frame to which the screen capture corresponds, and sends the second frame related information corresponding to the second video frame to the smart glasses, where the second frame related information includes but is not limited to: the frame identification information of the second video frame, or frame related information generated based on that frame identification information; the frame identification information of the second video frame may be the encoding start time of the video frame or the number corresponding to the video frame, and the frame related information generated based on the frame identification information may be the decoding end time of the video frame or the codec and transmission total duration information, etc.
- the frame identification information includes encoding start time information of the second video frame.
- For example, the first user equipment encodes the video frames and sends the corresponding video stream and the frame identification information of the transmitted video frames in the video stream to the second user equipment, where the frame identification information of a video frame includes the encoding start time of that video frame.
- The second frame related information includes the decoding end time information and the codec and transmission total duration information of the second video frame. For example, the second user equipment receives and presents the video stream and records the corresponding decoding end times; it determines the corresponding second video frame based on the screen capture operation and determines the corresponding codec and transmission total duration information from the encoding start time and the decoding end time of the second video frame.
- For example, the smart glasses record the encoding start time of each video frame and, after encoding, send the video frames and their encoding start times to tablet B.
- Tablet B receives and presents the video frames and records the decoding end time of each video frame.
- Based on user B's screen capture operation, tablet B determines the corresponding second video frame and determines the codec and transmission total duration information of the second video frame from the encoding start time and the decoding end time corresponding to that frame.
- Tablet B sends the second frame related information of the second video frame to the smart glasses, where the second frame related information includes but is not limited to: the encoding start time of the second video frame, the codec and transmission total duration information of the second video frame, etc.
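- On the receiver this bookkeeping is just two timestamps and a subtraction; a minimal Python sketch follows, with illustrative key names (and ignoring clock skew between the two devices).

```python
def build_second_frame_info(encode_start, decode_end, frame_number):
    """Assemble second frame related information on the receiver.

    encode_start: T_si, received alongside the video frame.
    decode_end:   T_ei, recorded locally when decoding finished.
    """
    return {
        "frame_number": frame_number,
        "encode_start": encode_start,
        "decode_end": decode_end,
        # codec and transmission total duration T_i = T_ei - T_si
        "total_duration": decode_end - encode_start,
    }
```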
- The annotation obtaining module 23 is configured to acquire the labeling operation information of the second video frame by the user in real time, and the annotation sending module 24 is configured to send the labeling operation information to the first user equipment in real time. For example, the second user equipment acquires the corresponding labeling operation information in real time based on the second user's operation, for example collecting it at a certain time interval, and then sends the acquired labeling operation information to the first user equipment in real time.
- For example, tablet B collects user B's labeling operations on the screen capture, such as user B drawing circles, arrows, text, boxes, and the like on the screen.
- Tablet B records the position and path of the labeling brush, for example by sampling multiple points along the stroke on the screen, obtaining the position corresponding to each point, and connecting those points to form the labeled path.
- Tablet B obtains the corresponding labeling operations in real time and sends them to the smart glasses in real time, such as collecting and sending labeling operations every 50 ms.
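- A Python sketch of such a capture loop on tablet B; read_touch_point() and send() are hypothetical helpers standing in for the touch input source and the transport.

```python
import time

def capture_annotation(read_touch_point, send, interval=0.05, duration=5.0):
    """Sample the labeling brush every 50 ms and forward each point.

    read_touch_point: returns the current (x, y) of the brush, or None
                      when the user is not drawing (assumed helper).
    send:             transport callback toward the annotated peers.
    """
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        point = read_touch_point()
        if point is not None:
            send({"type": "stroke_point", "pos": point,
                  "t": time.monotonic()})
        time.sleep(interval)  # 50 ms sampling-and-sending cadence
```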
- the device also includes a first video frame replacement module 25 (not shown).
- The first video frame replacement module 25 is configured to receive the first video frame sent by the first user equipment, where the first video frame is used as a preferred frame for presenting the labeling operation, and to load the first video frame in a display window of the second video frame to replace the second video frame, wherein the labeling operation is displayed on the first video frame.
- For example, the second user equipment determines the second video frame corresponding to the current screen capture and sends the second frame related information of the second video frame to the first user equipment; the first user equipment determines, based on the second frame related information, the unencoded first video frame corresponding to the second video frame and sends it to the second user equipment, which receives and presents the first video frame and obtains the second user's labeling operation information on it.
- For example, tablet B enters the screen capture mode through the user's operation, determines the second video frame corresponding to the current picture, and sends the second frame related information of the second video frame to the smart glasses, where the second frame related information includes but is not limited to: the encoding start time of the second video frame or the number corresponding to the video frame.
- The smart glasses determine the corresponding unencoded first video frame according to the second frame related information sent by tablet B and send the first video frame to tablet B, for example through lossless compression, or through lossy compression whose quality is guaranteed to be higher than that of the video frames cached locally by tablet B.
- Tablet B receives and presents the first video frame, for example in the form of a small window next to the current video, or by displaying the first video frame on a large screen while presenting the current video in a small window. Subsequently, tablet B obtains the labeling operation information regarding the first video frame according to the second user's operations.
- the device further includes a first video frame labeling module 26 (not shown).
- The first video frame labeling module 26 is configured to receive the first video frame and the second frame related information sent by the first user equipment, where the first video frame is used as a preferred frame for presenting the labeling operation; to determine, according to the second frame related information, that the first video frame is used to replace the second video frame; and to load the first video frame in a display window of the second video frame to replace the second video frame, wherein the labeling operation is displayed on the first video frame.
- For example, tablet B receives the unencoded first video frame and the second frame related information sent by the smart glasses, where the second frame related information includes the screenshot time of the second video frame, the frame number of the second video frame, and the like.
- Tablet B may perform multiple screen capture operations based on user B's operations; tablet B determines the screen capture operation corresponding to the first video frame according to the second frame related information, for example by matching the screenshot time carried in the second frame related information, and presents the second frame related information in the first video frame.
- Tablet B determines the currently corresponding screen capture operation according to the second frame related information, presents the first video frame in the form of a small window next to the current video, or displays the first video frame on a large screen while presenting the current video in a small window, etc.; tablet B presents the second frame related information in the presented first video frame, such as the screenshot time of the frame or the frame number of the frame in the video stream. Subsequently, tablet B obtains the labeling operation information regarding the first video frame according to the second user's operations.
- The video receiving module 21 is configured to receive the video stream sent by the first user equipment to the second user equipment and the third user equipment, and the annotation sending module 24 is configured to send the labeling operation information to the first user equipment and the third user equipment.
- For example, user A holds smart glasses, user B holds tablet B, and user C holds tablet C. The smart glasses, tablet B, and tablet C establish video communication through a wired or wireless network; the smart glasses encode the currently collected picture, send it to tablet B and tablet C, and buffer a period of time or a certain number of video frames.
- After receiving and decoding the video stream, tablet B determines, based on user B's screen capture operation, the second video frame corresponding to the screen capture, and sends the second frame related information corresponding to the second video frame to the smart glasses and tablet C, where the second frame related information includes but is not limited to: second video frame identification information, second video frame encoding start time, second video frame decoding end time, and second video frame codec and transmission total duration information, etc.
- The smart glasses receive the second frame related information of the second video frame and determine the corresponding unencoded first video frame in the locally stored video frames based on it.
- Tablet B generates real-time labeling operation information according to user B's labeling operation and sends it to the smart glasses and tablet C in real time.
- FIG. 10 illustrates an apparatus for real-time labeling of a video frame at a third user equipment end according to still another aspect of the present application, wherein the apparatus includes a third video receiving module 31, a third frame information receiving module 32, a third video frame determining module 33, a third annotation receiving module 34, and a third rendering module 35.
- The third video receiving module 31 is configured to receive the video stream sent by the first user equipment to the second user equipment and the third user equipment; the third frame information receiving module 32 is configured to receive the second frame related information of the second video frame intercepted by the second user equipment in the video stream; the third video frame determining module 33 is configured to determine, according to the second frame related information, the third video frame corresponding to the second video frame in the video stream; the third annotation receiving module 34 is configured to receive the labeling operation information of the second video frame by the second user equipment; the third rendering module 35 is configured to present the corresponding labeling operation in real time on the third video frame according to the labeling operation information.
- For example, user A holds smart glasses, user B holds tablet B, and user C holds tablet C. The smart glasses, tablet B, and tablet C establish video communication through a wired or wireless network; the smart glasses encode the currently collected picture, send it to tablet B and tablet C, and buffer a period of time or a certain number of video frames.
- After receiving and decoding the video stream, tablet B determines, based on user B's screen capture operation, the second video frame corresponding to the screen capture, and sends the second frame related information corresponding to the second video frame to the smart glasses and tablet C, where the second frame related information includes but is not limited to: second video frame identification information, second video frame encoding start time, second video frame decoding end time, and second video frame codec and transmission total duration information, etc.
- The smart glasses receive the second frame related information of the second video frame and determine the corresponding unencoded first video frame in the locally stored video frames based on it.
- Tablet B generates real-time labeling operation information according to user B's labeling operation and sends it to the smart glasses and tablet C in real time.
- After receiving the labeling operation information, the smart glasses display the corresponding unencoded first video frame in a preset area of the smart glasses and present the corresponding labeling operation in real time at the position corresponding to the first video frame.
- Tablet C receives the second frame related information and the labeling operation information, finds the corresponding third video frame in the locally cached decoded video frames according to the second frame related information, and presents the corresponding labeling operation in the third video frame based on the third video frame and the labeling operation information.
- the device also includes a preferred frame reception presentation module 36 (not shown).
- a preferred frame receiving presentation module 36 configured to receive a first video frame sent by the first user equipment, where the first video is used as a preferred frame for presenting the labeling operation, in the third video frame Loading the first video frame to replace the third video frame in a display window, wherein the labeling operation is displayed on the first video frame.
- For example, tablet B enters screen capture mode upon the user's operation, determines the second video frame corresponding to the current picture, and sends the second frame related information of the second video frame to the smart glasses, where the second frame related information includes but is not limited to: the encoding start time of the second video frame or the number of the video frame.
- The smart glasses determine the corresponding unencoded first video frame according to the second frame related information sent by tablet B, and send the first video frame to tablet B and tablet C, either losslessly compressed or lossily compressed, where the lossy compression is constrained so that the resulting frame quality is still higher than that of the video frames cached locally by tablet B and tablet C.
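- The compression choice could be sketched as follows; the PNG/JPEG formats and the quality-90 floor are assumptions standing in for "lossless compression" and "lossy compression whose output still beats the locally cached frame".

```python
import io

from PIL import Image


def encode_preferred_frame(frame: Image.Image, bandwidth_ok: bool,
                           min_jpeg_quality: int = 90) -> bytes:
    """Send losslessly when bandwidth allows; otherwise compress lossily but
    keep the quality pinned above what the stream codec delivers."""
    buf = io.BytesIO()
    if bandwidth_ok:
        frame.save(buf, format='PNG')                             # lossless path
    else:
        frame.save(buf, format='JPEG', quality=min_jpeg_quality)  # lossy path
    return buf.getvalue()
```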
- Tablet C receives and presents the first video frame, for example in a small window next to the current video, or by displaying the first video frame full screen with the current video in a small window. Subsequently, tablet C receives the labeling operation information sent by tablet B and presents the labeling operation on the first video frame.
- In some embodiments, the apparatus further includes a preferred frame annotation presentation module 37 (not shown).
- The preferred frame annotation presentation module 37 is configured to receive a first video frame and the second frame related information sent by the first user equipment, where the first video frame serves as a preferred frame for presenting the labeling operation; to determine, according to the second frame related information, that the first video frame is to replace the third video frame; and to load the first video frame in the display window of the third video frame so as to replace the third video frame, wherein the labeling operation is displayed on the first video frame.
- For example, tablet C receives the unencoded first video frame and the second frame related information sent by the smart glasses, where the second frame related information includes the screenshot time of the second video frame, the frame number of the second video frame, and the like.
- Tablet C receives and presents the first video frame, for example in a small window next to the current video, or by displaying the first video frame full screen with the current video in a small window. While presenting the first video frame, tablet C also presents the second frame related information on it, such as the screenshot time of the frame or its frame number within the video stream.
- Tablet C then receives the labeling operation information sent by tablet B and presents the labeling operation on the first video frame.
- FIG. 11 illustrates a network device for real-time annotation of a video frame according to still another aspect of the present application, wherein the device includes a video forwarding module 41, a frame information receiving module 42, a frame information forwarding module 43, an annotation receiving module 44, and an annotation forwarding module 45.
- The video forwarding module 41 is configured to receive and forward a video stream sent by the first user equipment to the second user equipment; the frame information receiving module 42 is configured to receive second frame related information of a second video frame intercepted by the second user equipment in the video stream; the frame information forwarding module 43 is configured to forward the second frame related information to the first user equipment; the annotation receiving module 44 is configured to receive labeling operation information of the second video frame from the second user equipment; and the annotation forwarding module 45 is configured to forward the labeling operation information to the first user equipment.
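- Functionally, the network device is a set of forwarding loops; the asyncio-queue sketch below is one possible shape for modules 41, 43, and 45, with queues standing in for real network channels (an assumption for illustration).

```python
import asyncio


async def relay(src: asyncio.Queue, dst: asyncio.Queue) -> None:
    """Generic forwarding loop reused for modules 41, 43 and 45."""
    while True:
        item = await src.get()
        await dst.put(item)  # forward unchanged


async def network_device(video_in, video_out, info_in, info_out,
                         ops_in, ops_out) -> None:
    await asyncio.gather(
        relay(video_in, video_out),  # module 41: video stream to receivers
        relay(info_in, info_out),    # module 43: frame info to the sender
        relay(ops_in, ops_out),      # module 45: labeling ops to the sender
    )
```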
- For example, user A holds smart glasses and user B holds tablet B; the smart glasses and tablet B communicate video through the cloud. The smart glasses encode the currently collected picture and send it to the cloud, which forwards it to tablet B; when sending the video, the smart glasses cache a period of time or a certain number of video frames.
- After receiving and decoding the video stream, tablet B determines, based on the screen capture operation of user B, a second video frame corresponding to the captured image, and sends the second frame related information corresponding to the second video frame to the cloud, which forwards it to the smart glasses; the second frame related information includes but is not limited to: second video frame identification information, the second video frame encoding start time, the second video frame decoding end time, and the total codec and transmission duration of the second video frame.
- The smart glasses receive the second frame related information of the second video frame and, based on it, determine the corresponding unencoded first video frame among the locally stored video frames.
- Tablet B generates real-time annotation operation information according to the labeling operation of user B and sends it to the cloud, which forwards it to the smart glasses.
- After receiving the labeling operation information, the smart glasses display the corresponding unencoded first video frame in a preset area of the smart glasses and present the corresponding labeling operation in real time at the corresponding position on the first video frame.
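- The labeling operation information exchanged in this scenario might be serialized as in the sketch below; the JSON field names are illustrative assumptions, not a format defined by the present application.

```python
import json
import time


def make_annotation_message(frame_info: dict, op_type: str, points: list) -> bytes:
    """Serialize one labeling operation for real-time forwarding."""
    msg = {
        'frame': frame_info,   # ties the operation to the second video frame
        'op': op_type,         # e.g. 'line', 'rect', 'text'
        'points': points,      # stroke coordinates in frame space
        'ts': time.time(),     # send time, so receivers can order operations
    }
    return json.dumps(msg).encode('utf-8')
```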
- In some embodiments, the video forwarding module 41 is configured to receive and forward the video stream sent by the first user equipment to the second user equipment, together with frame identification information of the transmitted video frames in the video stream. The first user equipment encodes the video frames and sends the corresponding video stream, along with the frame identification information of the transmitted video frames, to the network device; the network device forwards the video stream and the frame identification information to the second user equipment, wherein the frame identification information includes the encoding start time of each video frame.
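- One way for the sender to attach such frame identification information is sketched below, using the encoding start time as the per-frame tag; the encoder callable is a placeholder for whatever codec is actually in use.

```python
import time
from typing import Callable


def encode_and_tag(raw_frame: bytes, encoder: Callable[[bytes], bytes]) -> dict:
    """Encode a frame and attach its frame identification information."""
    start = time.time()           # encoding start time doubles as the tag
    payload = encoder(raw_frame)  # 'encoder' is a placeholder codec callable
    return {'encode_start_ts': start, 'payload': payload}
```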
- In some embodiments, the frame information forwarding module 43 is configured to determine, according to the second frame related information, the frame identification information of the video frame in the video stream corresponding to the second video frame, and to send that frame identification information to the first user equipment.
- For example, the cloud receives the video stream sent by the smart glasses and the frame identification information of the transmitted video frames, such as the encoding start time of each video frame. The cloud forwards the video stream and the corresponding frame identification information to tablet B, which receives and presents the video frames and records the decoding end time of each frame.
- Tablet B performs a screen capture upon the operation of user B, determines the corresponding second video frame, and sends the second frame related information of the second video frame to the cloud, where the second frame related information includes the decoding end time of the second video frame, the video number of the second video frame, and the like.
- The cloud receives the second frame related information of the second video frame sent by tablet B and determines the frame identification information of the corresponding frame based on it, for example deriving the encoding start time or the video number of the second video frame from its decoding end time or video number.
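- The cloud-side mapping could be sketched as follows, estimating the encoding start time by subtracting the measured total codec-and-transmission latency from the reported decoding end time; this offset-based estimate is an assumption for illustration.

```python
from typing import List


def identify_frame(decode_end_ts: float, sent_frames: List[dict],
                   total_latency: float) -> dict:
    """sent_frames: [{'encode_start_ts': ..., 'frame_no': ...}, ...] logged
    by the cloud while forwarding the stream; total_latency is the measured
    codec-plus-transmission duration for this link."""
    target = decode_end_ts - total_latency  # estimated encoding start time
    return min(sent_frames,
               key=lambda f: abs(f['encode_start_ts'] - target))
```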
- In some embodiments, the video forwarding module 41 is configured to receive and forward a video stream sent by the first user equipment to the second user equipment and the third user equipment, and the frame information forwarding module 43 is configured to forward the second frame related information to the first user equipment and the third user equipment.
- The annotation forwarding module 45 is configured to forward the labeling operation information to the first user equipment and the third user equipment.
- For example, user A holds smart glasses, user B holds tablet B, and user C holds tablet C. The smart glasses, tablet B, and tablet C establish video communication through a network device: the smart glasses encode the currently collected picture, send it to the network device, and buffer a period of time or a certain number of video frames, and the network device sends the video stream to tablet B and tablet C.
- After receiving and decoding the video stream, tablet B determines, based on the screen capture operation of user B, a second video frame corresponding to the capture, and sends the second frame related information corresponding to the second video frame to the network device; the second frame related information includes but is not limited to: second video frame identification information, the second video frame encoding start time, the second video frame decoding end time, and the total codec and transmission duration of the second video frame.
- The smart glasses receive the second frame related information of the second video frame and, based on it, determine the corresponding unencoded first video frame among the locally stored video frames.
- The network device forwards the second frame related information to the first user equipment and the third user equipment.
- Tablet B generates real-time annotation operation information according to the labeling operation of user B and transmits it to the smart glasses and tablet C through the network device in real time. On receiving the labeling operation information, the smart glasses display the corresponding unencoded first video frame in a preset area and present the corresponding labeling operation in real time at the corresponding position on the first video frame.
- Tablet C receives the second frame related information and the labeling operation information, finds the corresponding third video frame in its locally cached decoded video library according to the second frame related information, and presents the corresponding labeling operation on the third video frame based on the third video frame and the labeling operation information.
- According to another aspect of the present application, a system for real-time annotation of a video frame comprises the first user equipment according to any of the above embodiments and the second user equipment according to any of the above embodiments; in other embodiments, the system further comprises the network device according to any of the above embodiments.
- According to yet another aspect, a system for real-time annotation of a video frame comprises the first user equipment, the second user equipment, and the third user equipment according to any of the above embodiments; in other embodiments, the system further includes the network device according to any of the above embodiments.
- The present application also provides a computer readable storage medium storing computer code which, when executed, performs the method of any of the preceding embodiments.
- The present application also provides a computer program product which, when executed by a computer device, performs the method of any of the preceding embodiments.
- The present application also provides a computer device, comprising: one or more processors; and a memory for storing one or more computer programs, wherein when the one or more computer programs are executed by the one or more processors, the one or more processors are caused to implement the method of any of the preceding embodiments.
- Figure 12 illustrates an exemplary system that can be used to implement various embodiments described in this application.
- In some embodiments, system 300 can serve as the device for real-time annotation of video frames according to any of the described embodiments.
- In some embodiments, system 300 can include one or more computer readable media (e.g., system memory or NVM/storage device 320) having instructions, and one or more processors (e.g., processor(s) 305) coupled to the computer readable media and configured to execute the instructions to implement the modules and perform the actions described herein.
- For one embodiment, system control module 310 can include any suitable interface controller to provide any suitable interface to at least one of the processor(s) 305 and/or to any suitable device or component in communication with system control module 310.
- System control module 310 can include a memory controller module 330 to provide an interface to system memory 315.
- the memory controller module 330 can be a hardware module, a software module, and/or a firmware module.
- System memory 315 can be used, for example, to load and store data and/or instructions for system 300.
- system memory 315 can include any suitable volatile memory, such as a suitable DRAM.
- system memory 315 can include double data rate type quad synchronous dynamic random access memory (DDR4 SDRAM).
- system control module 310 can include one or more input/output (I/O) controllers to provide an interface to NVM/storage device 320 and communication interface(s) 325.
- NVM/storage device 320 can be used to store data and/or instructions.
- NVM/storage device 320 may comprise any suitable non-volatile memory (eg, flash memory) and/or may include any suitable non-volatile storage device(s) (eg, one or more hard disk drives (HDD), one or more compact disc (CD) drives and/or one or more digital versatile disc (DVD) drives).
- the NVM/storage device 320 can include storage resources that are physically part of the device on which the system 300 is installed, or that can be accessed by the device without having to be part of the device.
- NVM/storage device 320 can be accessed over a network via communication interface(s) 325.
- the communication interface(s) 325 can provide an interface to the system 300 to communicate over one or more networks and/or with any other suitable device.
- System 300 can wirelessly communicate with one or more components of a wireless network in accordance with any of one or more wireless network standards and/or protocols.
- For one embodiment, at least one of the processor(s) 305 can be packaged together with logic of one or more controllers of the system control module 310 (e.g., the memory controller module 330). For one embodiment, at least one of the processor(s) 305 can be packaged together with logic of one or more controllers of the system control module 310 to form a system in package (SiP). For one embodiment, at least one of the processor(s) 305 can be integrated on the same die with logic of one or more controllers of the system control module 310. For one embodiment, at least one of the processor(s) 305 can be integrated on the same die with logic of one or more controllers of the system control module 310 to form a system on a chip (SoC).
- system 300 can be, but is not limited to, a server, workstation, desktop computing device, or mobile computing device (eg, a laptop computing device, a handheld computing device, a tablet, a netbook, etc.).
- system 300 can have more or fewer components and/or different architectures.
- In various embodiments, system 300 includes one or more cameras, a keyboard, a liquid crystal display (LCD) screen (including a touch screen display), a non-volatile memory port, multiple antennas, a graphics chip, an application specific integrated circuit (ASIC), and speakers.
- the present application can be implemented in software and/or a combination of software and hardware, for example, using an application specific integrated circuit (ASIC), a general purpose computer, or any other similar hardware device.
- the software program of the present application can be executed by a processor to implement the steps or functions described above.
- the software programs (including related data structures) of the present application can be stored in a computer readable recording medium such as a RAM memory, a magnetic or optical drive or a floppy disk and the like.
- some of the steps or functions of the present application may be implemented in hardware, for example, as a circuit that cooperates with a processor to perform various steps or functions.
- A portion of the present application can be embodied as a computer program product, such as computer program instructions which, when executed by a computer, can invoke or provide a method and/or technical solution in accordance with the present application.
- The form of computer program instructions in a computer readable medium includes, but is not limited to, source files, executable files, installation package files, and the like; accordingly, the manner in which the computer program instructions are executed by a computer includes, but is not limited to: the computer directly executing the instructions, the computer compiling the instructions and then executing the corresponding compiled program, the computer reading and executing the instructions, or the computer reading and installing the instructions and then executing the corresponding installed program.
- the computer readable medium can be any available computer readable storage medium or communication medium that can be accessed by a computer.
- Communication media include media by which communication signals containing, for example, computer readable instructions, data structures, program modules or other data are transferred from one system to another system.
- Communication media can include conductive transmission media such as cables and wires (eg, fiber optics, coaxial, etc.) and wireless (unguided transmission) media capable of propagating energy waves, such as acoustic, electromagnetic, RF, microwave, and infrared.
- Computer readable instructions, data structures, program modules or other data may be embodied, for example, as modulated data signals in a wireless medium, such as a carrier wave or a similar mechanism, such as embodied in a portion of a spread spectrum technique.
- A modulated data signal is a signal that has one or more of its characteristics altered or set in such a manner as to encode information in the signal. The modulation may be an analog, digital, or hybrid modulation technique.
- By way of example and not limitation, the computer readable storage medium may comprise volatile and non-volatile media implemented in any method or technology for storing information such as computer readable instructions, data structures, program modules or other data.
- Computer readable storage media include, but are not limited to: volatile memory, such as random access memory (RAM, DRAM, SRAM); non-volatile memory, such as flash memory, various read-only memories (ROM, PROM, EPROM, EEPROM), and magnetic and ferromagnetic/ferroelectric memories (MRAM, FeRAM); magnetic and optical storage devices (hard disks, tapes, CDs, DVDs); and other media, currently known or developed in the future, capable of storing computer readable information/data for use by computer systems.
- An embodiment according to the present application includes a device comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the device is triggered to operate based on the aforementioned methods and/or technical solutions according to the various embodiments of the present application.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Business, Economics & Management (AREA)
- Databases & Information Systems (AREA)
- Human Computer Interaction (AREA)
- Finance (AREA)
- Strategic Management (AREA)
- General Engineering & Computer Science (AREA)
- Marketing (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
The present application relates to a method and a device for labeling video frames in real time. The method comprises: sending a video stream to a second user equipment; receiving second frame related information of a second video frame intercepted by the second user equipment in the video stream; determining, according to the second frame related information, a first video frame in the video stream corresponding to the second video frame; receiving labeling operation information of the second video frame from the second user equipment; and presenting the corresponding labeling operation on the first video frame in real time according to the labeling operation information. According to the present application, the labeling information is directly superimposed on an unencoded or undecoded video frame image from the video sender; since the labeled video frame undergoes no encoding or decoding operations, its definition is high. In addition, the solution achieves real-time display of the label, so that it is highly practicable and interactive, and both the user experience and the bandwidth utilization rate are improved.
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201810011908 | 2018-01-05 | ||
| CN201810011908.0 | 2018-01-05 | ||
| CN201810409977.7A CN108401190B (zh) | 2018-01-05 | 2018-05-02 | 一种用于对视频帧进行实时标注的方法与设备 |
| CN201810409977.7 | 2018-05-02 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2019134499A1 true WO2019134499A1 (fr) | 2019-07-11 |
Family
ID=63101425
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2018/121730 Ceased WO2019134499A1 (fr) | 2018-01-05 | 2018-12-18 | Procédé et dispositif d'étiquetage en temps réel de trames vidéo |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN108401190B (fr) |
| WO (1) | WO2019134499A1 (fr) |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108401190B (zh) * | 2018-01-05 | 2020-09-04 | 亮风台(上海)信息科技有限公司 | 一种用于对视频帧进行实时标注的方法与设备 |
| CN112950951B (zh) * | 2021-01-29 | 2023-05-02 | 浙江大华技术股份有限公司 | 智能信息显示方法、电子装置和存储介质 |
| CN113596517B (zh) * | 2021-07-13 | 2022-08-09 | 北京远舢智能科技有限公司 | 一种基于混合现实的图像冻结标注方法及系统 |
| CN114201645A (zh) * | 2021-12-01 | 2022-03-18 | 北京百度网讯科技有限公司 | 对象标注方法、装置、电子设备以及存储介质 |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106412622A (zh) * | 2016-11-14 | 2017-02-15 | 百度在线网络技术(北京)有限公司 | 在播放视频内容时显示弹幕信息的方法和装置 |
| CN106603537A (zh) * | 2016-12-19 | 2017-04-26 | 广东威创视讯科技股份有限公司 | 一种移动智能终端标注视频信号源的系统及方法 |
| CN107277641A (zh) * | 2017-07-04 | 2017-10-20 | 上海全土豆文化传播有限公司 | 一种弹幕信息的处理方法及客户端 |
| CN107333087A (zh) * | 2017-06-27 | 2017-11-07 | 京东方科技集团股份有限公司 | 一种基于视频会话的信息共享方法和装置 |
| CN108401190A (zh) * | 2018-01-05 | 2018-08-14 | 亮风台(上海)信息科技有限公司 | 一种用于对视频帧进行实时标注的方法与设备 |
Family Cites Families (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7810121B2 (en) * | 2002-05-03 | 2010-10-05 | Time Warner Interactive Video Group, Inc. | Technique for delivering network personal video recorder service and broadcast programming service over a communications network |
| US20060104601A1 (en) * | 2004-11-15 | 2006-05-18 | Ati Technologies, Inc. | Method and apparatus for programming the storage of video information |
| CN103716586A (zh) * | 2013-12-12 | 2014-04-09 | 中国科学院深圳先进技术研究院 | 一种基于三维空间场景的监控视频融合系统和方法 |
| CN104935861B (zh) * | 2014-03-19 | 2019-04-19 | 成都鼎桥通信技术有限公司 | 一种多方多媒体通信方法 |
| CN104954812A (zh) * | 2014-03-27 | 2015-09-30 | 腾讯科技(深圳)有限公司 | 一种视频同步播放的方法、装置及系统 |
| CN104536661A (zh) * | 2014-12-17 | 2015-04-22 | 深圳市金立通信设备有限公司 | 一种终端截屏方法 |
| US9516255B2 (en) * | 2015-01-21 | 2016-12-06 | Microsoft Technology Licensing, Llc | Communication system |
| CN104883515B (zh) * | 2015-05-22 | 2018-11-02 | 广东威创视讯科技股份有限公司 | 一种视频标注处理方法及视频标注处理服务器 |
2018
- 2018-05-02 CN CN201810409977.7A patent/CN108401190B/zh active Active
- 2018-12-18 WO PCT/CN2018/121730 patent/WO2019134499A1/fr not_active Ceased
Also Published As
| Publication number | Publication date |
|---|---|
| CN108401190B (zh) | 2020-09-04 |
| CN108401190A (zh) | 2018-08-14 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US9445150B2 (en) | Asynchronously streaming video of a live event from a handheld device | |
| US10484737B2 (en) | Methods and systems for instantaneous asynchronous media sharing | |
| WO2019134499A1 (fr) | Procédé et dispositif d'étiquetage en temps réel de trames vidéo | |
| US20200241835A1 (en) | Method and apparatus of audio/video switching | |
| CN115243074A (zh) | 视频流的处理方法及装置、存储介质、电子设备 | |
| CN113938470B (zh) | 一种浏览器播放rtsp数据源的方法、装置以及流媒体服务器 | |
| CN111385576A (zh) | 视频编码方法、装置、移动终端及存储介质 | |
| CN113424487B (zh) | 用于视频显示的方法、装置及计算机存储介质 | |
| CN114051120A (zh) | 视频告警方法、装置、存储介质及电子设备 | |
| CN110855645B (zh) | 流媒体数据播放方法、装置 | |
| WO2024051823A1 (fr) | Procédé de gestion d'informations de réception et dispositif dorsal | |
| CN115834918B (zh) | 视频直播方法、装置、电子设备及可读存储介质 | |
| JP2014075735A (ja) | 画像処理装置および画像処理方法 | |
| WO2019149066A1 (fr) | Procédé de lecture vidéo, appareil terminal et support d'informations | |
| WO2021057697A1 (fr) | Procédé et appareils de codage et de décodage vidéo, support de stockage et dispositif informatique | |
| CN113965779B (zh) | 云游戏数据的传输方法、装置、系统及电子设备 | |
| CN113079386A (zh) | 一种视频在线播放方法、装置、电子设备及存储介质 | |
| CN115665502B (zh) | 视频数据处理方法、注入方法、系统、设备及存储介质 | |
| CN112866745B (zh) | 流媒体视频数据处理方法、装置、计算机设备和存储介质 | |
| US9872060B1 (en) | Write confirmation of a digital video record channel | |
| US11743478B2 (en) | Video stream transcoding with reduced latency and memory transfer | |
| CN109788357B (zh) | 一种播放媒体文件的方法及装置 | |
| CN110798700B (zh) | 视频处理方法、视频处理装置、存储介质与电子设备 | |
| CN115942000A (zh) | 基于h.264格式的视频流转码方法及装置、设备及介质 | |
| US10354695B2 (en) | Data recording control device and data recording control method |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18898158; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 18898158; Country of ref document: EP; Kind code of ref document: A1 |