
WO2017211206A1 - Video marking method and device, and video monitoring method and system - Google Patents


Info

Publication number
WO2017211206A1
WO2017211206A1 (PCT/CN2017/086325)
Authority
WO
WIPO (PCT)
Prior art keywords
event
video
audio
marking
video file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2017/086325
Other languages
English (en)
Chinese (zh)
Inventor
韦薇
王启贵
谢思远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp filed Critical ZTE Corp
Publication of WO2017211206A1 publication Critical patent/WO2017211206A1/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/18Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7834Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/76Television signal recording

Definitions

  • This application relates to, but is not limited to, the field of communication technology.
  • in the related art, tagging a video file requires manually viewing it and deciding whether to mark it and where to place the mark.
  • this kind of marking is not only inefficient; because both the decision to mark and the choice of marking position depend on human judgment, marking accuracy may also be poor.
  • the present invention provides a video marking method and device, and a video monitoring method and system, which solve the problems of low efficiency and poor accuracy of manual video marking in the related art.
  • a video marking method including:
  • the extracted sound features are matched to each audio event in the audio event library; each of the audio events is established based on a sound characteristic of the audio signal generated at the time of the event;
  • an event flag is generated for the audio event that occurs at a corresponding location in the video file.
  • the extracting the sound characteristics of the audio signal in the video file includes:
  • the sound characteristics of the audio signal in the video file are extracted during video recording.
  • the extracting the sound characteristics of the audio signal includes:
  • the matching the extracted sound features with each of the audio events comprises:
  • marking the audio event that occurred at the corresponding location in the video file includes performing one or more of the following:
  • the acquiring the severity level corresponding to the audio event that occurs includes:
  • the severity level corresponding to the audio event is determined according to one or more of recording location information of the video file and duration after the audio event occurs.
  • marking the audio event that occurred at the corresponding position in the video file includes:
  • the marking format corresponding to the severity level of the audio event is selected according to the correspondence table of the severity level and the marking format.
  • the extracting the sound feature of the audio signal in the video file includes: extracting a sound feature of the audio signal in the video file according to a preset detection period;
  • the method further includes:
  • when it is determined that merging should be performed, the start time of the audio event in the previous detection period is taken as the start time of the audio event in the current detection period;
  • when it is determined that merging should not be performed, the end time of the audio event in the previous detection period, and the start time of the audio event in the current detection period, are both set to the start time of the current detection period.
  • the embodiment of the invention further provides a video monitoring method, including:
  • an alarm display is performed for the event-marked portion of the video file.
  • the embodiment of the invention further provides a video marking device, comprising:
  • a feature extraction module configured to: extract a sound feature of the audio signal in the video file
  • a processing module configured to: match the sound feature extracted by the feature extraction module with each audio event in the audio event library; each audio event is established based on a sound feature of an audio signal generated when the event occurs;
  • a marking module configured to: when the processing result of the processing module is that the sound feature is successfully matched with at least one of the audio events, mark the audio event that occurred at the corresponding location in the video file.
  • the device further includes:
  • the video recording module is set to: perform video recording
  • the feature extraction module is configured to: extract a sound feature of an audio signal in the video file during video recording by the video recording module.
  • the marking module performs event marking on the occurrence of the audio event in a corresponding position in the video file, including performing one or more of the following markings:
  • the embodiment of the invention further provides a video monitoring system, comprising: a monitoring processing device and the video marking device according to any one of the preceding claims;
  • the video tagging device is configured to: perform event marking on the video file recorded during the video monitoring process, and notify the monitoring processing device after completing the event marking of the video file;
  • the monitoring processing device is configured to: after receiving the alarm of the video marking device, perform an alarm display on the video file of the event marking portion.
  • the embodiment of the present invention further provides a computer readable storage medium, where the computer readable storage medium stores computer executable instructions, and when the processor executes the computer executable instructions, the following operations are performed:
  • the extracted sound features are matched to each audio event in the audio event library; each of the audio events is established based on a sound characteristic of the audio signal generated at the time of the event;
  • an event flag is generated for the audio event that occurs at a corresponding location in the video file.
  • the embodiment of the present invention further provides a computer readable storage medium, where the computer readable storage medium stores computer executable instructions, and when the processor executes the computer executable instructions, the following operations are performed:
  • an alarm display is performed for the event-marked portion of the video file.
  • the video marking method, device, and video monitoring method and system provided by the embodiments of the present invention extract the sound features of the audio signal in the video file and match the extracted sound features with each audio event in the audio event library. When the extracted sound features are successfully matched with at least one audio event, this indicates that the audio event occurred in the video file, and the audio event is marked at the corresponding position in the video file; each audio event is established in advance based on the sound features of the audio
  • signal generated when the event occurs.
  • by setting up audio events in advance and then matching the sound features of the audio signal in the video file against each audio event to determine whether a mark is needed, it is no longer necessary to manually view the video content to decide whether to mark.
  • the efficiency and accuracy of marking video files can be greatly improved.
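The overall flow summarized above can be sketched in a few lines of Python. All names, features, and the matching threshold below are illustrative assumptions, not taken from the patent: extract sound features, match them against a pre-built audio event library, and record a mark when a match succeeds.

```python
# Minimal sketch of the marking flow: match extracted sound features against a
# library of pre-established audio events and mark the event when one matches.

def match_event(sound_features, event_library, threshold=0.8):
    """Return the best-matching audio event, or None if nothing clears the threshold."""
    best_event, best_score = None, 0.0
    for event_name, event_features in event_library.items():
        # Toy similarity: fraction of the event's features present in the extracted set.
        score = len(set(sound_features) & set(event_features)) / len(event_features)
        if score > best_score:
            best_event, best_score = event_name, score
    return best_event if best_score >= threshold else None

def mark_video(timestamp, sound_features, event_library, marks):
    event = match_event(sound_features, event_library)
    if event is not None:                      # match succeeded: the event occurred
        marks.append({"event": event, "start_time": timestamp})
    return event

# Hypothetical event library built in advance from per-event sound features.
event_library = {
    "smoke_alarm": ["beep_3khz", "periodic_pulse"],
    "gunshot": ["impulse", "broadband_burst"],
}
marks = []
mark_video(12.5, ["impulse", "broadband_burst", "echo"], event_library, marks)
print(marks)  # [{'event': 'gunshot', 'start_time': 12.5}]
```

Because the judgment is a library lookup rather than a human decision, the same input always produces the same mark at the same position.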
  • FIG. 1 is a flowchart of a video marking method according to an embodiment of the present invention
  • FIG. 2 is a flowchart of a video monitoring method according to an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a video marking apparatus according to an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of another video marking apparatus according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of still another video marking apparatus according to an embodiment of the present disclosure.
  • FIG. 6 is a schematic structural diagram of a video monitoring system according to an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of a component of a video monitoring system according to an embodiment of the present disclosure.
  • FIG. 8 is a flowchart of still another video monitoring method according to an embodiment of the present invention.
  • in the embodiment of the present invention, audio events are set in advance; the sound features of the audio signal in the video file to be processed are then extracted and matched with each audio event to automatically determine whether a corresponding mark is needed. There is no need to manually view
  • the video content to decide whether to mark, which can greatly improve the efficiency and accuracy of marking video files.
  • FIG. 1 is a flowchart of a video marking method according to an embodiment of the present invention.
  • the video marking method provided by the embodiment of the present invention may include the following steps, namely, S101 to S105:
  • the video file in the embodiment of the present invention may include an audio signal and a synchronously recorded video signal.
  • S101 in the embodiment of the present invention may be performed after the video file has been recorded, or during the recording process. Performing it during recording improves the timeliness of marking and makes real-time marking possible. In some fields, especially video surveillance, seeing marked alarm content one minute or even one second earlier can make a great difference to how the alarmed event subsequently unfolds. Therefore, performing S101 during video recording is of great significance for the field of video surveillance.
  • Each audio event in the embodiment of the invention is established based on the sound features of the audio signal produced when the event occurs. For example, a smoke alarm event produces a smoke alarm sound, and extracting the sound features of that sound yields the audio event for a smoke alarm. As another example, a robbery or a violent or aggressive incident may involve calls for help, such as screams; extracting the sound features of these sounds yields the audio event for a robbery or a violent or aggressive incident. Beyond these examples, each type of event generally has one or more corresponding sound features: a gunshot event may correspond to the sound of gunfire, while other events may correspond to the sound of breaking glass, crying, a car horn, and so on; these are not enumerated again here. Those skilled in the art should understand that the audio events in the embodiments of the present invention can be flexibly configured according to the requirements of practical applications.
  • S103: Determine whether the extracted sound features are successfully matched with at least one audio event; if so, perform S104; otherwise, perform S105.
  • S104: Mark the audio event that occurred at the corresponding position in the video file.
  • S105: No audio event has currently occurred; continue to wait for the next detection.
  • through the flow shown in FIG. 1, the embodiment of the present invention presets audio events, automatically extracts the sound features of the sound signal in the video file, and matches them against each audio event. If the extracted sound features are successfully matched with one of the audio events, that audio event has occurred in the video file, and it is automatically marked at the corresponding position in the video file.
  • the above judgment and marking process requires no manual participation at all, which improves both the efficiency and the accuracy of marking.
  • the foregoing process may also be performed during video recording, in which case the recorded video file can be marked in real time, greatly improving the timeliness of
  • marking. In the video surveillance field, this allows some malicious events to be stopped in time, or even avoided altogether, helping to protect users' property and lives.
  • the implementation of extracting the sound feature of the audio signal and matching the extracted sound feature with each audio event may be performed by using the following process:
  • the background signal and the foreground signal of the audio signal are extracted.
  • the background signal and the foreground signal can be separated by behavioral modeling based on the neural mechanisms of human hearing, which eliminates the influence of the background signal.
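The patent attributes the separation to behavioral modeling of human hearing; as a much simpler illustrative stand-in, the sketch below estimates the background as the per-frequency median magnitude and subtracts it (plain spectral subtraction). Names and values are invented for illustration.

```python
import numpy as np

def separate_foreground(magnitude_spectrogram):
    """Split a (freq_bins, frames) magnitude spectrogram into background and foreground.

    Background is estimated per frequency bin as the median over time; the
    foreground is whatever remains after subtracting it (clipped at zero).
    """
    background = np.median(magnitude_spectrogram, axis=1, keepdims=True)
    foreground = np.maximum(magnitude_spectrogram - background, 0.0)
    return background, foreground

# Toy spectrogram: a constant hum (background) plus a burst in frame 2.
spec = np.ones((4, 5))
spec[:, 2] += 5.0
bg, fg = separate_foreground(spec)
print(fg[:, 2])  # the burst survives subtraction: [5. 5. 5. 5.]
```

Feature extraction would then operate on `fg` only, so the steady hum cannot trigger a spurious event match.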
  • performing event marking on the audio event that occurs in the corresponding position in the video file includes performing one or more of the following:
  • the start time of the occurrence of the audio event is marked at the key video frame position of the video file; the key video frame in the embodiment of the present invention refers to the video frame at the moment when the audio event occurs.
  • the event mark can be applied after a clear sound source direction (which can also be expressed as an angle) and/or distance is obtained, so that unclear sound
  • source information does not mislead subsequent processing.
  • a corresponding severity level may be set for each audio event.
  • for an ordinary audio event, the severity level may be set to general, characterized by a coefficient of 0; for a vicious audio event, the severity level is set to more severe, characterized by a coefficient of 1; for a dangerous audio event, the severity level is set to very severe, characterized by a coefficient of 2.
  • when performing a severity level mark, the coefficient of the corresponding severity level can be marked directly.
  • the severity level of an audio event may be related not only to the event itself, but also closely to where the event occurred (e.g., hotels, shops, schools, residential areas, within the home) and to how long the event lasts.
  • the implementation of obtaining the severity level corresponding to the audio event may include: determining the severity level according to the recording location information of the video file (that is, the location where the audio event occurred) and/or the duration after the audio event occurs.
  • that is, the severity level corresponding to the audio event is determined by combining the acquired location information and/or duration, so that the determination takes more comprehensive factors into account
  • and the result obtained is more accurate.
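A hedged sketch of how location and duration might feed into the severity level. The escalation rules, location names, and duration threshold below are invented purely for illustration; the patent does not specify them.

```python
def severity_level(event_base_level, location, duration_s):
    """Combine an event's base severity with where and how long it occurred.

    Levels: 0 = general, 1 = more severe, 2 = very severe (capped at 2).
    The escalation rules here are illustrative assumptions.
    """
    level = event_base_level
    if location in {"school", "residential"}:  # sensitive locations escalate severity
        level += 1
    if duration_s > 60:                        # long-running events escalate severity
        level += 1
    return min(level, 2)

print(severity_level(0, "school", 90))  # escalates twice, capped at 2
```

The point of the combination is that the same sound (e.g., a scream) can warrant different responses at a school than at a cinema, and a sustained event more than a momentary one.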
  • the implementation of the event marking of the audio event in the corresponding position in the video file may include: marking the severity level corresponding to the audio event
  • the tag format corresponding to the severity level of the audio event may be selected according to a preset correspondence table between the severity level and the tag format.
  • the mark format in the embodiment of the present invention includes, but is not limited to, different colors/styles for the text box and the text. An example is shown in Table 1.
  • audio events of different severity levels are marked in different mark formats, so that when the marked portions of the video file are displayed, the marks can be shown in different formats, giving the user different prompts; this helps users respond correctly and quickly to events of different severity levels.
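Since Table 1 itself is not reproduced here, the following sketch shows only the shape of such a severity-to-format correspondence table; the concrete colors and styles are placeholders, not the patent's.

```python
# Placeholder severity-to-format correspondence table (Table 1 analogue).
MARK_FORMATS = {
    0: {"box_color": "green",  "text_style": "normal"},      # general
    1: {"box_color": "orange", "text_style": "bold"},        # more severe
    2: {"box_color": "red",    "text_style": "bold_blink"},  # very severe
}

def mark_format_for(severity):
    """Look up the mark format for a severity level coefficient."""
    return MARK_FORMATS[severity]

print(mark_format_for(2)["box_color"])  # red
```

A table lookup keeps the format policy in one place, so changing how a severity level is rendered never touches the marking logic.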
  • the process of marking the video file may be performed periodically.
  • the embodiment of the present invention may preset a detection period, for example 10 seconds, meaning detection is performed every 10 seconds; it can also be set to 30 seconds, 1 minute, and so on.
  • the value of the detection period can be flexibly set according to actual needs.
  • during each detection period, the audio signal and its sound features are extracted from the recorded video file, and the extracted sound features are matched with each audio event. The following situation can then arise: suppose, for example, a robbery occurs in which threats, intimidation, and crying for help continue for several minutes with no obvious long interruption; the same audio event may then be detected in several consecutive detection periods.
  • for this reason, the embodiment of the present invention can set an event merging rule that merges such detections into one audio event, improving the intelligence and accuracy of detection and marking.
  • the merging rule can be set to either of the following rules:
  • the same audio event detected in consecutive detection periods within a span of M detection periods is merged, where M is greater than or equal to 2; for example, if the detection period is 1 minute and M is 10, the same audio event is allowed to be merged within 10 minutes;
  • associated audio events detected in consecutive detection periods within a span of N detection periods are merged, where N is greater than or equal to 2.
  • when an audio event is first detected in a detection period, its start time is marked in the video file; the end time need not be marked yet.
  • instead, the next detection period is awaited: if no audio event, or no identical or associated audio event, is detected in the next detection period, the end time of the audio event is marked as the start time of the next detection period.
  • if the sound features of the audio signals extracted in adjacent detection periods are each successfully matched with at least one audio event, that is, audio events occur in two adjacent detection periods, whether to merge them is judged according to the preset event merging rule.
  • when it is judged that merging should be performed, the start time of the audio event in the previous detection period is used as the start time of the audio event in the current detection period; the end time is not marked yet, pending subsequent detection results.
  • when it is judged that merging should not be performed, the end time of the audio event in the previous detection period is set to the start time of the current detection period,
  • and the start time of the audio event in the current detection period is set to the start time of the current detection period.
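The period-merging behavior described above can be sketched as follows (structure and names are illustrative): a mark stays open while the same event keeps being detected in consecutive periods, and is closed at the period boundary otherwise.

```python
def update_marks(marks, period_start, detected_event):
    """Process one detection period's result, merging with an open mark if possible."""
    open_mark = marks[-1] if marks and marks[-1]["end"] is None else None
    if detected_event is None:
        if open_mark is not None:            # nothing detected: close the open mark
            open_mark["end"] = period_start
        return
    if open_mark is not None and open_mark["event"] == detected_event:
        return                               # same event continues: mark stays open
    if open_mark is not None:                # different event: close the old mark
        open_mark["end"] = period_start
    marks.append({"event": detected_event, "start": period_start, "end": None})

marks = []
update_marks(marks, 0, "robbery")    # first detection: open a mark at t=0
update_marks(marks, 10, "robbery")   # same event next period: merged, stays open
update_marks(marks, 20, None)        # nothing detected: close at period start
print(marks)  # [{'event': 'robbery', 'start': 0, 'end': 20}]
```

A three-minute robbery detected over many 10-second periods thus yields one mark spanning the whole incident rather than a dozen fragments.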
  • the method provided by the embodiment of the present invention combines the generated audio events.
  • after merging, the severity level corresponding to the audio event may be re-acquired, and when it has changed, the mark may be updated accordingly.
  • the marking of the video file may also include the end time and/or duration of the audio event.
  • the segment of the video file between the start time and the end time can be referred to as a tagged video. In subsequent alarm display, this tagged video can be shown in a targeted manner.
  • FIG. 2 is a flowchart of a video monitoring method according to an embodiment of the present invention.
  • the video monitoring method provided by the embodiment of the present invention may include the following steps, that is, S201 to S203:
  • S201: Perform monitoring video recording; in practical applications, video acquisition and synchronized audio collection can be performed by an image collector (such as a camera) and a sound pickup.
  • the monitoring personnel can thus view the video content of the audio event portion in the most timely manner and make the corresponding response promptly.
  • at that point the event may still be occurring and not yet have ended, or it may already have ended, depending on the event duration and the event detection period.
  • to implement the alarm display of the marked portion of the video file, an alarm may be sent to the background server.
  • if real-time video from the image collector is being displayed on the display device in the embodiment of the present invention, the display device can switch to the video content of the marked portion and display the corresponding event mark alongside it;
  • alternatively, an alarm message with a video link to the event-marked portion may be sent to the display device, and the user can play the video by clicking the video link.
  • a function for switching back to real-time video at any time can also be provided in the embodiment of the present invention.
  • corresponding alarm processing may be required for audio events that occur (e.g., robbery, gunshots). Therefore, when displaying on the display device, the embodiment of the present invention can also provide an alarm option bar, which can be integrated with the time-point mark on the video progress bar; the alarm options pop up when the user clicks it. Considering that the user may need to view a key event several times before making a determination, the embodiment of the present invention can also provide a look-back function, likewise integrated at a position on the video progress bar (for example, embodied as a corresponding identifier integrated with the time-point mark); when the user needs to look back, they click the identifier at the corresponding position.
  • in this way, audio events can be observed in a timely and accurate manner during monitoring, and a timely and accurate response can be made to protect users' property and lives.
  • FIG. 3 is a schematic structural diagram of a video marking apparatus according to an embodiment of the present invention.
  • the video marking device 30 provided by the embodiment of the present invention may include: a feature extraction module 31, a processing module 32, and a marking module 33.
  • the feature extraction module 31 is configured to: extract sound features of the audio signal in the video file.
  • the video file in the embodiment of the present invention may include an audio signal and a synchronously recorded video signal.
  • the processing module 32 is configured to: match the sound features extracted by the feature extraction module 31 with each audio event in the audio event library.
  • Each audio event in the embodiment of the invention is established based on the sound features of the audio signal produced when the event occurs. Each type of event generally has one or more corresponding sound features: for example, a gunshot event may correspond to the sound of gunfire, while other events may correspond to the sound of breaking glass, crying, a car horn, and so on; these are not enumerated again here. Those skilled in the art should understand that the audio events in the embodiments of the present invention can be flexibly configured according to the requirements of practical applications.
  • the marking module 33 is configured to: when the processing result of the processing module 32 is that the sound features are successfully matched with at least one audio event, mark the audio event that occurred at the corresponding location in the video file.
  • the foregoing functions of the feature extraction module 31, the processing module 32, and the marking module 33 in the embodiment of the present invention may be implemented by a single processor, or each module may be implemented by its own processor.
  • the feature extraction module 31 automatically extracts the sound features of the sound signal in the video file, and the processing module 32 completes the matching with each audio event.
  • Event marks are then made automatically at the appropriate locations in the video file. The entire process requires no manual participation, so marking efficiency and accuracy can be well guaranteed.
  • FIG. 4 is a schematic structural diagram of another video marking apparatus according to an embodiment of the present invention. Based on the structure of the device shown in FIG. 3, the video tagging device 30 in the embodiment of the present invention may further include:
  • the video recording module 34 is configured to: perform video recording.
  • the video recording module 34 can include a video capture device and a sound pickup. That is, the video tagging device 30 itself can serve as a monitoring device, cooperating with the monitoring platform to complete video monitoring in various scenarios.
  • the feature extraction module 31 is configured to: extract the sound features of the audio signal in the video file during the video recording performed by the video recording module 34, after which the processing module 32 and the marking module complete the subsequent marking flow. This improves the timeliness of marking the video file and essentially realizes real-time marking; in the video monitoring field, seeing marked alarm content one minute or even one second earlier can make a great difference to how the alarmed event subsequently unfolds.
  • the feature extraction module 31 extracts the sound features of the audio signal, and the processing module 32 matches the extracted sound features with each audio event; this can be performed by the following process:
  • after transforming the audio signal into the time-frequency domain, the feature extraction module 31 extracts the background signal and the foreground signal of the audio signal, and extracts the sound feature set from the foreground signal.
  • the feature extraction module 31 can separate the background signal and the foreground signal by behavioral modeling based on the neural mechanisms of human hearing, which eliminates the influence of the background signal.
  • the processing module 32 reads the audio events from the audio event library and calculates the similarity between the sound feature set extracted by the feature extraction module 31 and each audio event. When the similarity between the sound feature set and an audio event exceeds the set similarity threshold, the match is deemed successful, that is, the audio event is determined to have occurred in the video file. The audio event can then be marked in the video file.
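One plausible realization of this similarity test, assuming the sound feature set is represented as a numeric vector: cosine similarity against each stored event template, accepted only when it exceeds a threshold. The threshold, template names, and values below are illustrative assumptions, not specified by the patent.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def best_match(features, event_templates, threshold=0.9):
    """Return the event whose template is most similar, or None below the threshold."""
    scored = {name: cosine_similarity(features, t) for name, t in event_templates.items()}
    name, score = max(scored.items(), key=lambda kv: kv[1])
    return name if score >= threshold else None

# Hypothetical per-event feature templates from the audio event library.
templates = {"glass_break": [1.0, 0.2, 0.0], "car_horn": [0.0, 0.1, 1.0]}
print(best_match([0.9, 0.25, 0.05], templates))  # glass_break
```

The threshold trades false alarms against missed events, so in practice it would be tuned per deployment.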
  • the marking module 33 performs event marking on the generated audio event in the corresponding position in the video file, including performing one or more of the following markings:
  • the start time of the occurrence of the audio event is marked at the key video frame position of the video file; the key video frame in the embodiment of the present invention refers to the video frame at the moment when the audio event occurs.
  • a corresponding severity level may be set for each audio event.
  • for an ordinary audio event, the severity level may be set to general, characterized by a coefficient of 0; for a vicious audio event, the severity level is set to more severe, characterized by a coefficient of 1; for a dangerous audio event, the severity level is set to very severe, characterized by a coefficient of 2.
  • when performing a severity level mark, the coefficient of the corresponding severity level can be marked directly.
  • the severity level of an audio event may be related not only to the event itself, but also closely to where the event occurred (such as hotels, shops, schools, residential areas, homes) and to the duration of the event.
  • the implementation of obtaining the severity level corresponding to the audio event may include: determining the severity level according to the recording location information of the video file (that is, the location where the audio event occurred) and/or the duration after the audio event occurs.
  • that is, the severity level corresponding to the audio event is determined by combining the acquired location information and/or duration, so that the determination takes more comprehensive factors into account
  • and the result obtained is more accurate.
  • the implementation of the event marking of the audio event in the corresponding position in the video file may include: marking the severity level corresponding to the audio event
  • the tag format corresponding to the severity level of the audio event may be selected according to a preset correspondence table between the severity level and the tag format.
  • the mark format in the embodiment of the present invention includes, but is not limited to, a different color/format adopted by the text box and the text.
  • FIG. 5 is a schematic structural diagram of still another video marking apparatus according to an embodiment of the present invention.
  • the video tagging apparatus 30 provided by the embodiment of the present invention may further include:
  • the cache module 35 is configured to: store the audio event library, which may be obtained from another server (for example, a monitoring server) or set directly by the user.
  • the cache module 35 is further configured to: cache the audio data and video data collected by the video recording module 34, and the data marked by the marking module 33.
  • the process of marking the video file may be performed periodically.
  • the embodiment of the present invention may preset a detection period, for example 10 seconds, meaning detection is performed every 10 seconds; it can also be set to 30 seconds, 1 minute, and so on. In practical applications, the value of the detection period can be flexibly set according to actual needs.
  • in each detection period, the feature extraction module 31 extracts the audio signal and its sound features from the recorded video file, and the processing module 32 matches the extracted sound features with each audio event. In this way, the same or associated audio events may be matched over multiple detection periods.
  • for this reason, the embodiment of the present invention can set an event merging rule, by which the marking module 33 can merge such detections into one audio event, improving the intelligence and accuracy of detection and marking.
  • the marking module 33 can process events according to either of the following merge rules:
  • the same audio event detected in M consecutive detection periods is merged, where M is greater than or equal to 2; for example, if the detection period is 2 minutes and M is 5, the same audio event is allowed to be merged within 10 minutes;
  • associated audio events detected within N detection periods are merged, where N is greater than or equal to 2.
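A sketch of the two merge rules above; the association table and default M/N values are illustrative assumptions, since the source leaves them configurable:

```python
# Hypothetical association table: pairs of event types treated as related.
ASSOCIATED = {
    frozenset({"gunshot", "screaming"}),
    frozenset({"robbery", "cry_for_help"}),
}

def should_merge(prev_event: str, cur_event: str,
                 gap_periods: int, M: int = 5, N: int = 2) -> bool:
    """Decide whether two detections belong to one merged audio event.

    Rule 1: the same event type recurring within M detection periods merges.
    Rule 2: associated event types detected within N periods merge.
    """
    if prev_event == cur_event:
        return gap_periods <= M
    return frozenset({prev_event, cur_event}) in ASSOCIATED and gap_periods <= N
```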
  • when the processing module 32 detects an audio event for the first time in a certain detection period, the marking module 33 first marks the start time of the event in the video file; the end time is not marked yet, pending the detection result of the next detection period. If no audio event, or no identical or associated audio event, is detected in the next detection period, the end time of the audio event is marked as the start time of the next detection period.
  • when the sound features of the audio signals extracted by the processing module 32 in two adjacent detection periods each match at least one audio event, that is, audio events occur in both of the two adjacent detection periods,
  • the marking module 33 may determine, according to the above-mentioned preset event merging rule, whether to merge the audio events occurring in the two adjacent detection periods; when it is determined to merge, the start time of the audio event in the previous detection period is used as the start time of the audio event in the current detection period, and the end time is left unmarked pending subsequent detection results; when it is determined not to merge, the end time of the audio event in the previous detection period is set to the start time of the current detection period, and the start time of the audio event in the current detection period is also set to the start time of the current detection period.
  • after the marking module 33 merges audio events, it can re-determine whether the severity level corresponding to the merged audio event has changed and, when it has, update the mark accordingly; therefore, in the embodiment of the present invention, the marks made by the marking module 33 on the video file may further include the end time and/or duration of the audio event;
  • the segment of the video file between the marked start time and end time can be called the tag video.
  • FIG. 6 is a schematic structural diagram of a video monitoring system according to an embodiment of the present invention.
  • the video monitoring system 60 provided by the embodiment of the present invention may include: a monitoring processing device 61 and a video marking device 62 in any of the embodiments shown in FIGS. 3 to 5.
  • the video tagging device 62 is configured to: perform event marking on the video file recorded during the video monitoring process and, after marking, alert the monitoring processing device 61; the alarm process is also the event-mark activation process, which can be performed by a tag activation module set in the video tagging device 62.
  • the monitoring processing device 61 is configured to: after receiving the alarm from the video marking device 62, display the event-marked portion of the video file. It should be noted that, according to the above description, the event may still be occurring and not yet ended, or may already have ended, depending on factors such as the event's duration and the detection period.
  • the monitoring processing device 61 may be implemented by a background server combined with a corresponding display device; the background server includes a storage medium for storing the audio event library, as well as the video data, alarm information, etc. received from the video marking device 62.
  • the video tagging device 62 may further include an interaction unit; the interaction unit may be a display unit configured to: present the tag video and real-time video provided by the marking module 33 for viewing, or receive and deliver various interactive messages.
  • the video marking device 62 sends the event-marked portion of the video file to the monitoring processing device 61 for alarm. If the monitoring processing device 61 was already displaying the real-time video of the image collector in the embodiment of the present invention, it now displays the video content of the marked portion together with the corresponding event mark. If it was not displaying the real-time video, an alarm message and a link to the event-marked video segment may be sent to the monitoring processing device 61, and the user can click the link to play the video. A function for switching to the real-time video at any time can also be provided in the embodiment of the present invention.
  • corresponding alarm processing may be required for audio events that occur (e.g., robbery, shooting, etc.); therefore, in the embodiment of the present invention, an alarm option bar can also be provided on the monitoring processing device;
  • the alarm option bar can be integrated with the time point mark on the video progress bar, and the alarm option can pop up when the user clicks the mark.
  • the embodiment of the invention can also provide a look-back (replay) function, which can likewise be integrated at a position on the video progress bar; the user only needs to click the identifier at the corresponding location.
  • the monitoring processing device 61 and the video marking device 62 can be combined to form a monitoring system; the video marking device 62 provides a real-time marking function for the video, so that audio events can be viewed timely and accurately during monitoring and timely, accurate responses can be made to ensure the safety of users' property and lives.
  • FIG. 7 is a schematic structural diagram of a video surveillance system according to an embodiment of the present invention.
  • the components of the video surveillance system may include:
  • camera and pickup module 71, also referred to as the monitoring module or monitoring device;
  • audio event object 72, which may be presented on a display or on a separate display terminal, such as a mobile phone, pad, etc.;
  • background server 73.
  • the camera and pickup module 71 may be a camera with a built-in pickup or a camera with an external pickup; if the pickup is external, audio and video synchronization is required.
  • the camera and pickup module 71 further includes a feature extraction module, a processing module, a marking module, and a cache module;
  • the feature extraction module and the processing module are configured to: detect the audio event object 72 according to the collected audio signal; the marking module is configured to: according to the detected audio event object 72, acquire the mark attributes of the time point mark for the video event, edit the real-time video, add the time point mark, and add an eye-catching text box annotation on the tag video frame; the cache module is configured to: cache the audio event library, the acquired audio and video signals, the event marks, and so on;
  • the feature extraction module is configured to: first separate the foreground signal and background signal of the audio signal and perform feature extraction on the foreground signal; the processing module is further configured to: compare the foreground signal with the audio events in the event detection model library in the cache module, and if the similarity exceeds a set threshold, one or more types of audio events are detected. The marking module can locate the sound source, obtain the sound source distance and direction, and then determine the severity. The processing module is further configured to: first determine whether to merge the audio event; if so, merge it, obtain the start time, end time, and duration, integrate the audio detection conclusion, sound source angle, and sound source distance, and re-determine the severity level.
  • an audio event within a merged time period generates only one time point marker, including the marker start time and marker end time, which is saved to the cache module and synchronized to the database of the background server 73.
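The time point marker described above bundles several attributes (start/end time, severity, sound source distance and direction). A hedged sketch of such a record — field names are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TimePointMark:
    """One time point marker for a (possibly merged) audio event.

    `end_time` stays None while the event is still ongoing, mirroring
    the 'empty end time' convention in the text.
    """
    event_type: str                            # e.g. "robbery"
    start_time: str                            # e.g. "19:50:00"
    end_time: Optional[str]                    # None = event not yet ended
    severity: str                              # e.g. "high"
    source_distance_m: Optional[float] = None  # from sound-source location
    source_direction_deg: Optional[float] = None
```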
  • the camera and pickup module 71 is connected to the background server 73 via a network and sends alarms to it. If the camera's real-time video was already being displayed, display continues, now showing the tag video with its mark attributes; if not, the background server sends an alarm message and a video link, and clicking the link displays the tag video with its tag attributes, with the option to switch to the real-time video at any time. The time point mark appears on the video progress bar; clicking the mark lets the user choose to alarm or look back. If alarm is selected, the specified alarm number is dialed and the tag video marked with time and location is shared.
  • management station 74, which may include an audio event management module, an audio event severity level determination management module, and a merge rule management module, through which the audio features of specific events, severity level determination rules, merge rules, etc. can be entered and managed.
  • the monitoring device (including the camera and the pickup) is disposed in an elevator. After the monitoring device performs audio event detection on the data collected by the pickup in the current detection period, the detected audio event is "in-elevator robbery";
  • the audio event E1 is pre-registered, and the mark attributes of the corresponding time point mark S1 are recorded, including the mark start time (i.e., the current time), the severity level, the sound source distance, and the sound source direction; for example, mark start time: 19:50:00, mark attribute: robbery.
  • the audio event E1 is officially registered, and the video in the time period from the mark start time to the mark end time is called the tag video; the tag video's boundaries can be appropriately shifted, for example, the start time of the tag video is pushed forward by n seconds and the end time pushed back by n seconds, to get a more complete picture of the event;
  • the start time of the tag video is thus n seconds before 19:50:00; if n is 5, the start time is 19:49:55, and the end time is empty, indicating that the audio event is still occurring and has not ended.
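The n-second padding in the example above can be sketched as follows; the function name and time format are illustrative assumptions:

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

def padded_clip(start: str, end: Optional[str], n: int = 5,
                fmt: str = "%H:%M:%S") -> Tuple[str, Optional[str]]:
    """Pad a tag video's boundaries by n seconds on each side.

    An empty (None) end time means the event is still ongoing, so only
    the start is shifted, matching the worked example in the text
    (19:50:00 with n=5 -> clip starts at 19:49:55).
    """
    s = datetime.strptime(start, fmt) - timedelta(seconds=n)
    e = datetime.strptime(end, fmt) + timedelta(seconds=n) if end else None
    return s.strftime(fmt), e.strftime(fmt) if e else None
```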
  • the next detection period starts at 19:50:11, and audio/video acquisition and audio event detection are performed as in the previous steps;
  • the detected event is again "robbery";
  • the audio event is pre-registered as E2, and the current time, severity level, sound source distance, and sound source direction are recorded; for example, mark start time: 19:50:11, mark attribute: robbery.
  • event merging is performed on E2 and E1 according to the event merging decision rule;
  • the attributes of the time point mark S1 of audio event E1 are updated, for example the duration, and the severity level is re-determined according to the severity level determination rule.
  • the start time of the next detection cycle is 19:51:10.
  • audio/video acquisition and audio event detection are still performed as in the previous steps. After audio event detection on the data collected by the camera's pickup, no event is detected; since the mark end time of the time point mark S1 of the last audio event E1 is empty, the mark end time of S1 is set to 19:51:09;
  • the normal video is played on the camera's corresponding display screen, with no text annotation in the center of the video; on the displayed progress bar, at 19:50:00, there is an orange-font annotation for "robbery: 1 minute 09 seconds", and the time point mark is no longer highlighted.
  • FIG. 8 is a flowchart of still another video monitoring method according to an embodiment of the present invention.
  • the method provided by the embodiment of the present invention may include the following steps, that is, S801 to S820:
  • S801: an event detection period starts, and at time T1, audio event detection is performed according to the above-mentioned video marking method (that is, matching of audio events is performed);
  • S802 determining whether an audio event is detected; when it is determined that an audio event is not detected, executing S803; when it is determined that an audio event is detected, executing S805;
  • S803 determining whether the marking end time of the last audio event E0 is empty; when it is determined that it is empty, executing S804; when it is determined that it is not empty, executing S801 (waiting for the arrival of the next event detecting period);
  • S804 set the mark end time of the last audio event E0 to the second before T1, re-determine the severity level of E0, update the event mark S0 of audio event E0, and activate the mark S0; then execute S811;
  • S806 determining whether the audio event E1 is integrated with the last audio event E0; if integrated, executing S807; if not, executing S808;
  • S808 determining whether the end time of the marking of the audio event E0 is empty; when it is judged to be empty, executing S809; when it is determined that it is not empty, executing S810;
  • S810 formally register the audio event E1; the event mark S1 starts at time T1 with an empty end time; activate the mark S1; then execute S811;
  • S811 determining whether the video of the camera is being played; when it is determined that the video is being played, executing S814; when it is determined that the video is not playing, executing S812;
  • the display manner may be various, for example, alarm messages may be displayed in the right area of the screen and sorted by event start time from latest to earliest; if the monitoring person has not viewed an audio event's tag video link for a period of time, multiple event alarm messages may accumulate, and for the same audio event whose tag attributes are updated multiple times, the alarm messages need to be merged);
  • S814 play the tag video, simultaneously displaying the event tag attribute and the start time of the corresponding audio event on the screen, and display the progress bar, with an audio event flag shown at the start time of the time point mark corresponding to the audio event;
  • S816 judging whether to click "alarm"; if clicked, executing S817; if not clicking, executing S818;
  • S817 Alert the designated terminal by phone or SMS or other specified means, share the tag video link of the audio event; then end the process.
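The register/merge/close decisions in steps S802–S810 above can be sketched as one function per detection period; the data shapes and helper names are illustrative assumptions, not the patent's implementation:

```python
from typing import Callable, List, Optional

def detection_cycle(marks: List[dict], detected_event: Optional[str],
                    t1: str, merge: Callable[[str, str], bool],
                    prev_second: str) -> List[dict]:
    """One pass of the S802-S810 flow.

    `marks` holds dicts with 'event', 'start', 'end' keys; an open mark
    has end=None. `merge` is the event-merging decision (cf. S806).
    """
    last = marks[-1] if marks else None
    if detected_event is None:
        # S803/S804: no event this period -> close a still-open mark
        if last and last["end"] is None:
            last["end"] = prev_second
        return marks
    if last and last["end"] is None and merge(last["event"], detected_event):
        # S806/S807: merged into the previous event; its start time is kept
        return marks
    if last and last["end"] is None:
        # S808/S809: not merged; close the previous mark first
        last["end"] = prev_second
    # S810: formally register the new event with an open end time
    marks.append({"event": detected_event, "start": t1, "end": None})
    return marks
```

Running this over the worked example (robbery at 19:50:00, merged at 19:50:11, closed at 19:51:09) reproduces the single merged mark described earlier.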
  • the video marking method and the video monitoring method provided by the embodiments of the present invention can quickly locate the moment at which a specific behavior or specific event occurs in a video during video monitoring, enabling video monitoring personnel to quickly find problems and improving their working efficiency.
  • embodiments of the present invention also provide a computer readable storage medium storing computer executable instructions which, when executed, perform the following operations, namely S11 to S13:
  • embodiments of the present invention also provide a computer readable storage medium storing computer executable instructions which, when executed, perform the following operations, namely S21 to S23:
  • all or part of the steps of the above embodiments may also be implemented using integrated circuits; these steps may be separately fabricated as individual integrated circuit modules, or multiple modules or steps among them may be fabricated as a single integrated circuit module;
  • the devices/function modules/functional units in the above embodiments can be implemented by general-purpose computing devices; they can be concentrated on a single computing device or distributed over a network of multiple computing devices;
  • when the devices/function modules/functional units in the above embodiments are implemented in the form of software function modules and sold or used as stand-alone products, they can be stored in a computer readable storage medium;
  • the above-mentioned computer readable storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.
  • the sound features of the audio signal in the video file are extracted and matched against each audio event in the audio event library; a successful match with at least one audio event indicates that the audio event occurs in the video file, and the audio event occurring in the video file is then event-marked; the audio events are established in advance based on the sound characteristics of the audio signals generated when the corresponding events occur;
  • by setting the audio events in advance and then matching the sound features of the audio signal in the video file against each audio event to determine whether marking is needed, it is not necessary to manually view the video content, so the efficiency and accuracy of marking video files can be greatly improved.
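As a minimal sketch of the matching step summarized above — assuming sound features are vectors compared by cosine similarity against a threshold, neither of which the source specifies:

```python
import math
from typing import Dict, List, Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def match_events(feature: Sequence[float],
                 event_library: Dict[str, Sequence[float]],
                 threshold: float = 0.8) -> List[str]:
    """Return every audio event whose reference feature matches `feature`.

    Event names, reference vectors, and the 0.8 threshold are all
    illustrative assumptions; a non-empty result corresponds to the
    'successfully matched with at least one audio event' condition.
    """
    return [name for name, ref in event_library.items()
            if cosine_similarity(feature, ref) >= threshold]
```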

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Closed-Circuit Television Systems (AREA)
  • Alarm Systems (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

Disclosed are a video marking method and device, and a video monitoring method and system. The video marking method comprises: extracting a sound feature of an audio signal from a video file; performing matching based on the extracted sound feature and each audio event in an audio event library; and, if the extracted sound feature matches at least one audio event, adding an event marker at a corresponding position in the video file to indicate the occurrence of an audio event.
PCT/CN2017/086325 2016-06-08 2017-05-27 Video marking method and device, and video monitoring method and system Ceased WO2017211206A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610405207.6A CN107483879B (zh) 2016-06-08 2016-06-08 视频标记方法、装置及视频监控方法和系统
CN201610405207.6 2016-06-08

Publications (1)

Publication Number Publication Date
WO2017211206A1 true WO2017211206A1 (fr) 2017-12-14

Family

ID=60577623

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/086325 Ceased WO2017211206A1 (fr) 2016-06-08 2017-05-27 Procédé et dispositif de marquage vidéo, et procédé et système de surveillance vidéo

Country Status (2)

Country Link
CN (1) CN107483879B (fr)
WO (1) WO2017211206A1 (fr)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110992984A (zh) * 2019-12-02 2020-04-10 新华智云科技有限公司 音频处理方法及装置、存储介质
CN111950332A (zh) * 2019-05-17 2020-11-17 杭州海康威视数字技术股份有限公司 视频时序定位方法、装置、计算设备和存储介质
CN113038265A (zh) * 2021-03-01 2021-06-25 创新奇智(北京)科技有限公司 视频标注处理方法、装置、电子设备及存储介质
CN113435433A (zh) * 2021-08-30 2021-09-24 广东电网有限责任公司中山供电局 一种基于作业现场的音视频数据提取处理系统
CN114363660A (zh) * 2021-12-24 2022-04-15 腾讯科技(武汉)有限公司 视频合集确定方法、装置、电子设备及存储介质
US11722763B2 (en) 2021-08-06 2023-08-08 Motorola Solutions, Inc. System and method for audio tagging of an object of interest

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11074292B2 (en) * 2017-12-29 2021-07-27 Realwear, Inc. Voice tagging of video while recording
CN109246467A (zh) * 2018-08-15 2019-01-18 上海蔚来汽车有限公司 标记待分享视频的方法、装置、摄像机和智能手机
CN119440325A (zh) * 2018-08-22 2025-02-14 深圳市欢太科技有限公司 一种速记方法及装置、终端、存储介质
CN112513800B (zh) * 2018-08-22 2024-03-12 深圳市欢太科技有限公司 一种速记方法及装置、终端、存储介质
CN109121022B (zh) * 2018-09-28 2020-05-05 百度在线网络技术(北京)有限公司 用于标记视频片段的方法及装置
CN109640112B (zh) * 2019-01-15 2021-11-23 广州虎牙信息科技有限公司 视频处理方法、装置、设备及存储介质
CN110083085A (zh) * 2019-03-15 2019-08-02 杭州钱袋金融信息服务有限公司 一种具有语音标记功能的金融双录系统
CN110223715B (zh) * 2019-05-07 2021-05-25 华南理工大学 一种基于声音事件检测的独居老人家中活动估计方法
CN110211319B (zh) * 2019-06-05 2021-05-14 深圳市梦网视讯有限公司 一种安防监控预警事件跟踪方法和系统
CN110942766A (zh) * 2019-11-29 2020-03-31 厦门快商通科技股份有限公司 音频事件检测方法、系统、移动终端及存储介质
CN111327855B (zh) * 2020-03-10 2022-08-05 网易(杭州)网络有限公司 一种视频录制方法、装置以及视频定位方法、装置
CN112116328B (zh) * 2020-09-25 2023-12-19 维沃移动通信有限公司 提醒方法、装置及电子设备
CN112420077B (zh) * 2020-11-19 2022-08-16 展讯通信(上海)有限公司 声音定位方法和装置、测试方法和系统、设备及存储介质
WO2022265629A1 (fr) * 2021-06-16 2022-12-22 Hewlett-Packard Development Company, L.P. Scores de qualité de signal audio
CN113593619B (zh) * 2021-07-30 2022-08-09 北京百度网讯科技有限公司 用于录制音频的方法、装置、设备和介质
CN115730091A (zh) * 2021-08-31 2023-03-03 华为技术有限公司 批注展示方法、装置、终端设备及可读存储介质
CN113573136B (zh) * 2021-09-23 2021-12-07 腾讯科技(深圳)有限公司 视频处理方法、装置、计算机设备和存储介质
CN114979745B (zh) * 2022-05-06 2025-01-24 维沃移动通信有限公司 视频处理方法、装置、电子设备及可读存储介质
CN115184867A (zh) * 2022-06-15 2022-10-14 成都市联洲国际技术有限公司 一种声源定位方法、装置、存储介质及终端设备
CN115129927B (zh) * 2022-08-17 2023-05-02 广东龙眼数字科技有限公司 一种监控视频流回溯方法、电子设备及存储介质
CN115866358A (zh) * 2022-11-15 2023-03-28 浙江大华技术股份有限公司 音视频处理系统和方法
CN116887011A (zh) * 2023-06-12 2023-10-13 广州开得联软件技术有限公司 一种视频标记方法、装置、设备及介质
CN119277184A (zh) * 2024-04-15 2025-01-07 荣耀终端有限公司 视频处理方法、电子设备、芯片系统及存储介质
CN118338096A (zh) * 2024-05-15 2024-07-12 中科世通亨奇(北京)科技有限公司 可视化的视频标注方法、系统、电子设备及存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101819770A (zh) * 2010-01-27 2010-09-01 武汉大学 音频事件检测系统及方法
WO2011025085A1 (fr) * 2009-08-25 2011-03-03 Axium Technologies, Inc. Procédé et système de surveillance audiovisuelle combinée
CN102044242A (zh) * 2009-10-15 2011-05-04 华为技术有限公司 语音激活检测方法、装置和电子设备
CN102176746A (zh) * 2009-09-17 2011-09-07 广东中大讯通信息有限公司 一种用于局部小区域安全进入的智能监控系统及实现方法

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011025085A1 (fr) * 2009-08-25 2011-03-03 Axium Technologies, Inc. Procédé et système de surveillance audiovisuelle combinée
CN102176746A (zh) * 2009-09-17 2011-09-07 广东中大讯通信息有限公司 一种用于局部小区域安全进入的智能监控系统及实现方法
CN102044242A (zh) * 2009-10-15 2011-05-04 华为技术有限公司 语音激活检测方法、装置和电子设备
CN101819770A (zh) * 2010-01-27 2010-09-01 武汉大学 音频事件检测系统及方法

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111950332A (zh) * 2019-05-17 2020-11-17 杭州海康威视数字技术股份有限公司 视频时序定位方法、装置、计算设备和存储介质
CN111950332B (zh) * 2019-05-17 2023-09-05 杭州海康威视数字技术股份有限公司 视频时序定位方法、装置、计算设备和存储介质
CN110992984A (zh) * 2019-12-02 2020-04-10 新华智云科技有限公司 音频处理方法及装置、存储介质
CN110992984B (zh) * 2019-12-02 2022-12-06 新华智云科技有限公司 音频处理方法及装置、存储介质
CN113038265A (zh) * 2021-03-01 2021-06-25 创新奇智(北京)科技有限公司 视频标注处理方法、装置、电子设备及存储介质
CN113038265B (zh) * 2021-03-01 2022-09-20 创新奇智(北京)科技有限公司 视频标注处理方法、装置、电子设备及存储介质
US11722763B2 (en) 2021-08-06 2023-08-08 Motorola Solutions, Inc. System and method for audio tagging of an object of interest
CN113435433A (zh) * 2021-08-30 2021-09-24 广东电网有限责任公司中山供电局 一种基于作业现场的音视频数据提取处理系统
CN114363660A (zh) * 2021-12-24 2022-04-15 腾讯科技(武汉)有限公司 视频合集确定方法、装置、电子设备及存储介质
CN114363660B (zh) * 2021-12-24 2023-09-08 腾讯科技(武汉)有限公司 视频合集确定方法、装置、电子设备及存储介质

Also Published As

Publication number Publication date
CN107483879B (zh) 2020-06-09
CN107483879A (zh) 2017-12-15

Similar Documents

Publication Publication Date Title
WO2017211206A1 (fr) Video marking method and device, and video monitoring method and system
CN109753920B (zh) 一种行人识别方法及装置
CN109040824B (zh) 视频处理方法、装置、电子设备和可读存储介质
US10141025B2 (en) Method, device and computer-readable medium for adjusting video playing progress
WO2021164644A1 (fr) Procédé et appareil de détection d'événement de violation, dispositif électronique et support de stockage
TW202105199A (zh) 資料更新方法、電子設備和儲存介質
KR20200116158A (ko) 이미지 처리 방법 및 장치, 전자 기기 및 저장 매체
WO2021093375A1 (fr) Procédé, appareil et système pour détecter des personnes marchant ensemble, dispositif électronique et support de stockage
KR101677607B1 (ko) 동영상 브라우징 방법, 장치, 프로그램 및 기록매체
CN111814629A (zh) 人员检测方法及装置、电子设备和存储介质
TWI724546B (zh) 用於手機的智慧化報警方法、裝置及包括其的系統
WO2021036382A1 (fr) Procédé et appareil de traitement d'image, dispositif électronique et support de stockage
US20190259116A1 (en) Systems And Methods For Generating An Audit Trail For Auditable Devices
WO2021093427A1 (fr) Procédé et appareil de gestion d'informations de visiteur, dispositif électronique et support d'enregistrement
CN112101216A (zh) 人脸识别方法、装置、设备及存储介质
WO2023094894A1 (fr) Procédé et appareil de suivi de cible, procédé et appareil de détection d'événement et dispositif électronique et support de stockage
US10887628B1 (en) Systems and methods for adaptive livestreaming
CN104050785A (zh) 基于虚拟化边界与人脸识别技术的安全警戒方法
CN109614181A (zh) 移动终端的安全态势展示方法、装置及存储介质
WO2023155484A1 (fr) Dispositif cible anti-retrait
CN111814627B (zh) 人员检测方法及装置、电子设备和存储介质
WO2012146273A1 (fr) Procédé et système pour l'insertion d'un marqueur vidéo
CN112486770B (zh) 客户端打点上报方法、装置、电子设备和存储介质
CN111339964A (zh) 图像处理方法及装置、电子设备和存储介质
CN118585671A (zh) 视频检索方法、装置、电子设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17809645

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17809645

Country of ref document: EP

Kind code of ref document: A1