WO2018149376A1 - Video abstract generation method and device - Google Patents

Video abstract generation method and device

Info

Publication number
WO2018149376A1
WO2018149376A1 PCT/CN2018/076290 CN2018076290W WO2018149376A1 WO 2018149376 A1 WO2018149376 A1 WO 2018149376A1 CN 2018076290 W CN2018076290 W CN 2018076290W WO 2018149376 A1 WO2018149376 A1 WO 2018149376A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
trajectory
target object
retrieval
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2018/076290
Other languages
French (fr)
Chinese (zh)
Inventor
潘志敏
车军
向杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd
Publication of WO2018149376A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70: Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73: Querying
    • G06F16/738: Presentation of query results
    • G06F16/739: Presentation of query results in form of a video summary, e.g. the video summary being a video sequence, a composite still image or having synthesized frames

Definitions

  • The present application relates to the field of video processing technologies and, in particular, to a method and an apparatus for generating a video summary.
  • Video summary technology analyzes the structure and content of a video, extracts the meaningful part of the original video (that is, the moving targets), and combines the moving targets with the background scene in a specific way to form a concise video that fully conveys the video content.
  • A video summary is a concise digest of long video content, usually represented by a sequence of static or dynamic images, in which the original information is preserved.
  • The related technique generates a video summary based on target objects and includes the following steps. First, the input video is analyzed to generate a video structured description file, and a corresponding database is established according to that file; the video structured description file includes attribute information of the target objects in the video and trajectory information of the target objects. Second, the established database is retrieved to extract the trajectory information of the moving targets. Finally, the trajectory of each target object is analyzed, the target trajectories are translated along the time axis, and the trajectories of target objects at different times are arranged in the same picture to generate a summary video.
  • This technique can meet the user's need to generate a video summary for specific target objects, and it produces a video summary with a short duration, compact motion, and a high concentration ratio.
  • However, when the trajectories of target objects are arranged, the actual situation is complicated, and the trajectories of multiple target objects may overlap. Because the above technique excludes the trajectories of overlapping target objects when it rearranges trajectories, the summary video loses the information of the associated target objects whenever the trajectories of multiple target objects overlap. Owing to this loss, target objects frequently appear and disappear in the generated video summary, resulting in a poor visual effect.
  • the purpose of the embodiment of the present application is to provide a method and a device for generating a video summary, so as to improve the visual effect of the video summary generated when the target trajectory is complicated.
  • the specific technical solutions are as follows:
  • the embodiment of the present application provides a method for generating a video summary, where the method includes:
  • the first target original image is spliced with the first abstract background image to generate a video summary.
  • the target retrieval condition includes: a retrieval time period and/or retrieval attribute information of the target object;
  • the searching for the established database includes:
  • the searching for the established database includes:
  • the searching for the established database includes:
  • before acquiring the target retrieval condition, the method further includes:
  • trajectory information includes: movement information of the trajectory and overlapping state information with other trajectories;
  • the original image, the mask map, and the target original image are stored in the database.
  • extracting, from the video frame that includes the target object, the mask map of each frame corresponding to the track information includes:
  • translating, along the time axis, the to-be-translated trajectories that satisfy the preset translation condition to the same target time period includes:
  • a track in the first track set that is not in the target time period is used as a to-be-translated track and stored in a to-be-translated queue, where the to-be-translated tracks are the combined tracks and the non-overlapping tracks in the first track set that are not in the target time period;
  • the current to-be-translated trajectory is translated to the target time period, and stored in the summary queue;
  • the step of acquiring, from the database, the first target original image corresponding to each track in the target time period includes:
  • the track information of each track further includes: a target box information set of the target object;
  • splicing the first target original image with the first abstract background image to generate a video summary includes:
  • copying the first target original image to the corresponding first position in the first summary background image includes:
  • the pixel value of the overlapping portion of the corresponding trajectories is set to the mean of the target original image pixel values of the target objects, and the pixel value of the non-overlapping portion is the pixel value of the target original image of each target object, so that the image to be copied is obtained;
  • copying each first target original image to the corresponding first position in the first summary background image to generate a video summary includes:
  • before storing the original image, the mask map, and the target original image in the database, the method further includes:
  • before acquiring the target retrieval condition, the method further includes:
  • the method further includes:
  • the embodiment of the present application provides a device for generating a video summary, where the device includes:
  • a retrieval module configured to acquire a target retrieval condition and retrieve the established database to obtain a first trajectory set comprising the trajectories that meet the target retrieval condition, wherein the database stores the trajectory information of each trajectory extracted from video frames containing a target object, together with the target original images, and the trajectory information of each trajectory includes overlapping state information with other trajectories;
  • a combination module configured to divide, according to the overlapping state information, the trajectories in the first trajectory set that overlap one another into groups of at least two, each group being determined as a combined trajectory;
  • a translation module configured to translate, along the time axis, the to-be-translated trajectories that satisfy the preset translation condition to the same target time period, wherein the to-be-translated trajectories comprise the combined trajectories and/or the trajectories in the first trajectory set that do not overlap;
  • a first acquiring module configured to acquire, from the database, a first target original image corresponding to each track in the target time period;
  • a second acquiring module configured to obtain a first abstract background image for generating a video summary;
  • a splicing module configured to splice each of the first target original images with the first abstract background image to generate a video summary.
  • the target retrieval condition includes: a retrieval time period and/or retrieval attribute information of the target object;
  • the retrieval module is specifically configured to:
  • the retrieval module is specifically configured to:
  • the retrieval module is specifically configured to:
  • the device further includes:
  • a first extraction module configured to extract each target object from the input video
  • a second extraction module configured to extract trajectory information and attribute information of each target object, where the trajectory information includes: movement information of the trajectory, and overlapping state information with other trajectories;
  • a first storage module configured to store track information and attribute information of each target object into a video structured target description file
  • a database generating module configured to generate the database according to the video structured target description file
  • a third extraction module configured to extract, from the video frame that includes the target object, an original image and a mask map of each frame corresponding to the trajectory information, and to determine, according to the original image and the mask map, the target original image of each frame corresponding to the trajectory information;
  • a second storage module configured to store the original image, the mask map, and the target original image into the database.
  • the third extraction module includes:
  • a first extraction submodule configured to extract a motion mask of the target object from a video frame that includes the target object
  • a first determining submodule configured to determine an initial mask according to the motion mask
  • a second determining submodule configured to determine an edge point set of the initial mask map
  • a second extraction submodule configured to extract the convex hull points from the edge point set to form the convex hull point set of the mask map
  • a filling submodule configured to fill the convex hull corresponding to the convex hull point set to obtain the final mask map.
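  • The mask refinement performed by these submodules can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the initial mask is a small binary grid, uses Andrew's monotone chain algorithm (a standard convex hull method chosen here for brevity), and takes all foreground pixels as input, since their convex hull equals that of the edge points. The names `convex_hull` and `fill_convex_hull` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates


def cross(o: Point, a: Point, b: Point) -> int:
    """Z component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns the hull vertices in a consistent order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def fill_convex_hull(mask: List[List[int]]) -> List[List[int]]:
    """Return a mask in which the convex hull of the foreground pixels is filled."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    points = [(x, y) for y in range(height) for x in range(width) if mask[y][x]]
    hull = convex_hull(points)
    filled = [[0] * width for _ in range(height)]
    if len(hull) < 3:
        # Degenerate hull (no pixels, one pixel, or collinear pixels): keep the input pixels.
        for x, y in points:
            filled[y][x] = 1
        return filled
    n = len(hull)
    for y in range(height):
        for x in range(width):
            # A pixel is inside when it lies on the same side of every hull edge.
            if all(cross(hull[i], hull[(i + 1) % n], (x, y)) >= 0 for i in range(n)):
                filled[y][x] = 1
    return filled
```

  • Filling the hull closes holes and concavities in the motion mask, at the cost of including some background pixels around non-convex objects.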
  • the translation module includes:
  • a queue creation submodule configured to establish a to-be-translated queue and a summary queue
  • a first storage submodule configured to take each trajectory in the first trajectory set that is not in the target time period as a to-be-translated trajectory and store it in the to-be-translated queue, where the to-be-translated trajectories are the combined trajectories and the non-overlapping trajectories in the first trajectory set that are not in the target time period;
  • a second storage submodule configured to store the trajectories in the first trajectory set that are in the target time period into the summary queue
  • a third extraction submodule configured to take the current to-be-translated trajectory from the to-be-translated queue in sequence and obtain, according to the original images in the database, the rectangular frame of each target object in the video frames corresponding to the current to-be-translated trajectory;
  • an operation submodule configured to calculate the overlapping area between the rectangular frame of each such target object and the rectangular frames of the target objects in the video frames corresponding to each trajectory already stored in the summary queue;
  • a third storage submodule configured to translate the current to-be-translated trajectory to the target time period and store it in the summary queue when the overlapping area is less than or equal to a preset overlapping parameter threshold;
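  • The queue handling described by these submodules can be sketched as below. The sketch rests on several assumptions that the text does not state: each trajectory is reduced to a list of rectangular frames indexed by frame offset within the target time period, the overlap of two trajectories is taken as the largest per-frame overlap area, and trajectories that exceed the threshold are simply returned as deferred. All names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), with x2 > x1 and y2 > y1
Track = List[Box]                # one box per frame offset in the target period


def overlap_area(a: Box, b: Box) -> int:
    """Area of the intersection of two axis-aligned rectangles (0 if disjoint)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return width * height if width > 0 and height > 0 else 0


def max_overlap(track: Track, placed: List[Track]) -> int:
    """Largest per-frame overlap between `track` and any already placed track."""
    worst = 0
    for other in placed:
        for box_a, box_b in zip(track, other):
            worst = max(worst, overlap_area(box_a, box_b))
    return worst


def fill_summary_queue(pending: List[Track], summary: List[Track],
                       threshold: int) -> List[Track]:
    """Move tracks from the to-be-translated queue into the summary queue.

    A track is accepted when its overlap with the tracks already in the
    summary queue does not exceed `threshold`. Rejected tracks are returned;
    what to do with them is left open here (an assumption of this sketch).
    """
    deferred: List[Track] = []
    for track in pending:
        if max_overlap(track, summary) <= threshold:
            summary.append(track)
        else:
            deferred.append(track)
    return deferred
```

  • With `summary = [[(0, 0, 10, 10)]]` and a threshold of 10, a pending track `[(5, 5, 15, 15)]` overlaps by 25 and is deferred, while `[(20, 20, 30, 30)]` does not overlap and is accepted.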
  • the first acquiring module is specifically configured to:
  • the track information of each track further includes: a target box information set of the target object;
  • the splicing module includes:
  • a third determining submodule configured to determine, according to the target box information set of the target object in the trajectory information, a first position of the first target original image in the first abstract background image
  • the video summary generation sub-module is configured to copy each first target original image to a corresponding first position in the first summary background image to generate a video summary.
  • the third determining submodule is specifically configured to:
  • the pixel value of the overlapping portion of the corresponding trajectories is set to the mean of the target original image pixel values of the target objects, and the pixel value of the non-overlapping portion is the pixel value of the target original image of each target object, so that the image to be copied is obtained;
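  • The averaging rule for overlapping targets can be illustrated with a short sketch. It assumes single-channel images stored as nested lists, one binary mask per target aligned to the same canvas, and integer (floor) averaging; none of these details come from the text, and `blend_targets` is a hypothetical name.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale pixel values
Mask = List[List[int]]   # 1 where the target covers the pixel, else 0


def blend_targets(layers: List[Tuple[Image, Mask]],
                  height: int, width: int) -> Tuple[Image, Mask]:
    """Build the image to be copied from several target images.

    Pixels covered by several targets receive the mean of their values;
    pixels covered by one target keep that target's value.
    Returns the blended image and the combined coverage mask.
    """
    total = [[0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for image, mask in layers:
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    total[y][x] += image[y][x]
                    count[y][x] += 1
    blended = [
        [total[y][x] // count[y][x] if count[y][x] else 0 for x in range(width)]
        for y in range(height)
    ]
    coverage = [[1 if c else 0 for c in row] for row in count]
    return blended, coverage
```

  • The coverage mask tells the splicing step which pixels of the background image should be overwritten.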
  • the splicing module is specifically configured to:
  • the device further includes:
  • An operation module configured to obtain a summary background image according to a preset period
  • a third storage module configured to store the acquired summary background image of each period into the database
  • the second acquiring module includes:
  • a sub-module configured to divide the target time segment into a time sub-segment corresponding to the preset period according to a time corresponding to each preset period;
  • a fourth determining sub-module configured to determine a first preset period corresponding to the time sub-segment including the most trajectory in the target time period
  • the background image obtaining sub-module is configured to obtain, from the database, a first abstract background image corresponding to the first preset period.
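  • The background selection performed by these sub-modules can be sketched as follows. The sketch assumes that times are plain numbers, that each trajectory is described by a (start, end) interval, and that a trajectory counts toward every time sub-segment it intersects; the function name and these conventions are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def pick_background_period(track_intervals: List[Tuple[float, float]],
                           segment_start: float, segment_end: float,
                           period: float) -> int:
    """Return the index of the time sub-segment that contains the most tracks.

    The target time period [segment_start, segment_end) is cut into
    sub-segments of length `period`; index 0 is the first sub-segment.
    Ties are resolved in favor of the earliest sub-segment.
    Requires segment_start < segment_end and period > 0.
    """
    if not (segment_start < segment_end and period > 0):
        raise ValueError("empty target time period or non-positive period")
    counts: Dict[int, int] = {}
    index = 0
    start = segment_start
    while start < segment_end:
        end = min(start + period, segment_end)
        # Count tracks whose interval intersects this sub-segment.
        counts[index] = sum(1 for t0, t1 in track_intervals if t0 < end and t1 > start)
        start = end
        index += 1
    return max(counts, key=counts.get)
```

  • The returned index identifies the preset period whose stored background image would then be fetched from the database.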
  • the device further includes:
  • a display module configured to display a user interaction interface according to a user instruction
  • a receiving module configured to receive and save a target retrieval condition input by the user through the user interaction interface, a preset translation condition, and a preset period for generating a summary background image
  • An execution module configured to perform the step of searching the established database when receiving a startup request input by the user through the user interaction interface
  • the ending module is configured to end the process of generating the video summary when receiving the interrupt request input by the user through the user interaction interface.
  • an embodiment of the present application provides a computer device, including a processor and a memory, where:
  • the memory is configured to store a computer program;
  • the processor is configured to implement the method of the first aspect of the embodiments of the present application when executing the computer program stored in the memory.
  • the embodiment of the present application provides a storage medium for storing executable code, where the executable code is used to execute at the runtime: the method steps described in the first aspect of the embodiment of the present application.
  • an embodiment of the present application provides an application program for performing, at runtime, the method steps described in the first aspect of the embodiments of the present application.
  • With the method and device for generating a video summary provided by the embodiments of the present application, the established database is searched to obtain the trajectories that satisfy the target retrieval condition; the trajectories among them that overlap are combined into combined trajectories; the combined trajectories and the non-overlapping trajectories from different time periods are then translated to the same target time period; and finally the target original images corresponding to the trajectories in the target time period are spliced with the abstract background image to generate a video summary.
  • Because the overlapping trajectories of multiple target objects are combined into one combined trajectory that is shifted along the time axis as a whole, no trajectory among the overlapping trajectories is lost during translation, which improves the visual effect of the generated video summary.
  • FIG. 1 is a schematic flowchart of a method for generating a video summary according to an embodiment of the present application;
  • FIG. 2 is a schematic flowchart of a method for generating a video summary according to another embodiment of the present application;
  • FIG. 3 is a schematic diagram of a specific process of S205 in the embodiment shown in FIG. 2;
  • FIG. 4 is a schematic diagram of a specific process of S105 in the embodiment shown in FIG. 2;
  • FIG. 5 is a schematic diagram of a specific process of S103 in the embodiment shown in FIG. 2;
  • FIG. 6 is a schematic flowchart of a method for generating a video summary according to still another embodiment of the present application;
  • FIG. 7 is a schematic structural diagram of a device for generating a video summary according to an embodiment of the present application;
  • FIG. 8 is a schematic structural diagram of a device for generating a video summary according to another embodiment of the present application;
  • FIG. 9 is a schematic diagram showing a specific structure of a third extraction module 850 in the embodiment shown in FIG. 8;
  • FIG. 10 is a schematic diagram showing a specific structure of the translation module 730 in the embodiment shown in FIG. 8;
  • FIG. 11 is a schematic structural diagram of a second acquiring module 750 in the embodiment shown in FIG. 8;
  • FIG. 12 is a schematic structural diagram of a device for generating a video summary according to still another embodiment of the present application.
  • the embodiment of the present application provides a method and an apparatus for generating a video summary.
  • A method for generating a video summary provided by the embodiment of the present application is introduced first.
  • The execution body of the method for generating a video summary provided by an embodiment of the present application may be a video summary controller capable of generating a video summary.
  • The method for generating a video summary provided by the embodiment of the present application may be implemented by at least one of software, a hardware circuit, and a logic circuit disposed in the video summary controller.
  • the video summary controller can be applied to a video surveillance system or to a server side of a video website.
  • a method for generating a video summary may include the following steps:
  • The database stores the trajectory information of each trajectory extracted from the video frames containing a target object, together with the target original images; the trajectory information of each trajectory includes overlapping state information with other trajectories.
  • The database may further include: attribute information of each target object, the original images of the video frames, a mask map of the target object in each original frame image, and/or a background image.
  • The overlapping state information may be the identifiers of the trajectories of the overlapping target objects; an identifier may be the name of the track, the track number, or another symbol used to characterize the track.
  • The track information may include: an identifier of the track of the target object, the number of track points of the target object, the frame number of each track point of the target object, the time information of the trajectory of the target object, the spatial information of the trajectory of the target object, and/or the overlapping state information with other trajectories. The attribute information may include: the appearance time of the target object, the moving direction of the target object, the license plate number, the vehicle model, the vehicle brand, the vehicle color, the color of a person's clothing, the person's age, the person's height, whether the person wears glasses, and/or whether the person carries a backpack or bag.
  • the target original image may be an original image of the video frame, or may be an image obtained by combining the mask image with the original image of the video frame.
  • the target retrieval condition may be any combination of all the information in the attribute information of the target object, or may be a certain time period.
  • the established database is retrieved, and the trajectory of the target object that meets the target retrieval condition is retrieved from the database.
  • The target retrieval condition includes: a retrieval time period and/or retrieval attribute information of the target object.
  • The step of searching the established database may include:
  • the database is retrieved according to the retrieval time period, and the trajectory of each target object in the retrieval time period is obtained.
  • When the target retrieval condition is a retrieval time period, the trajectory of each target object within that period is acquired. For example, given a 5-hour video running from 7:00 to 12:00 and a target retrieval condition whose retrieval time period is 8:00 to 9:00, the trajectory of each target object between 8:00 and 9:00 is extracted.
  • By restricting the time interval of the original video in this way, the video of the period in which the target objects are active can be extracted, which reduces the input data for subsequent video summary generation and the amount of calculation.
  • the step of searching the established database may include:
  • the database is searched according to the retrieval attribute information of the target object, and the trajectory of each target object matching the retrieval attribute information is obtained.
  • When the target retrieval condition is any combination of the items in the attribute information of the target object, the established database is searched and the trajectories of the target objects that match the target retrieval condition are retrieved from the database.
  • For example, if the target retrieval condition is a man who is 1.75 meters tall, between 40 and 45 years old, and wearing a white down jacket, the trajectory of each target object that satisfies the condition can be extracted from the database.
  • Because the target retrieval condition defines the target objects in the video, it ensures that the right target objects are selected and makes it easier to extract the trajectories of the target objects that meet the requirement.
  • the step of searching the established database may include:
  • the database is searched according to the retrieval time period and the retrieval attribute information of the target object, and the trajectory of each target object that matches the retrieval attribute information within the retrieval time period is obtained.
  • In this embodiment, the trajectories of the target objects that match the attribute information within the retrieval time period are obtained. This both ensures that the target objects are active in the selected period and constrains the attributes of the target objects, so the obtained trajectories are more accurate.
  • The target retrieval condition may be preset, that is, set before the video is obtained. This is mostly used for fixed scenes, for example when the target objects of the same time period are retrieved for many days in a row; in such a case, using a preset target retrieval condition avoids repeatedly setting the same condition. The target retrieval condition may also be input by the user according to the actual situation. For example, if the user needs to retrieve the trajectory of a specific target object within a certain time period, the user can set the target retrieval condition according to the attribute information of that target object. Both are reasonable.
  • A SQL (Structured Query Language) query statement can be generated according to the target retrieval condition and used to retrieve the established database.
  • SQL is a database query and programming language for accessing data and for querying, updating, and managing databases, and the query statement is its most commonly used statement.
  • A SQL query statement uses the SELECT command to select, from a table of the database, the data that satisfy the target retrieval condition. The data may be the track identifier of a target object, through which track information such as the time information and spatial information of the target object's trajectory can be determined.
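  • One way to turn a target retrieval condition into such a query is sketched below with Python's built-in `sqlite3` module. The schema is an assumption made only for illustration: a table named `tracks` with columns `track_id`, `start_time`, `end_time`, `object_type`, `color` and `direction`, and times stored as minutes since midnight. The patent does not specify any of these.

```python
import sqlite3
from typing import Dict, List, Optional, Tuple

# Hypothetical attribute columns that may appear in a retrieval condition.
ALLOWED_COLUMNS = {"object_type", "color", "direction"}


def build_track_query(time_range: Optional[Tuple[int, int]] = None,
                      attributes: Optional[Dict[str, str]] = None
                      ) -> Tuple[str, List[object]]:
    """Build a parameterized SELECT statement from the retrieval condition."""
    clauses: List[str] = []
    params: List[object] = []
    if time_range is not None:
        start, end = time_range
        # Keep tracks whose time span intersects the retrieval time period.
        clauses.append("end_time >= ? AND start_time <= ?")
        params.extend([start, end])
    for column, value in (attributes or {}).items():
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown attribute column: {column}")
        clauses.append(f"{column} = ?")
        params.append(value)
    sql = "SELECT track_id FROM tracks"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY track_id", params


def find_tracks(conn: sqlite3.Connection,
                time_range: Optional[Tuple[int, int]] = None,
                attributes: Optional[Dict[str, str]] = None) -> List[int]:
    """Run the query and return the matching track identifiers."""
    sql, params = build_track_query(time_range, attributes)
    return [row[0] for row in conn.execute(sql, params)]
```

  • Values are passed as query parameters rather than formatted into the statement, and column names are checked against a fixed list, so user-supplied retrieval conditions cannot alter the structure of the query.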
  • The overlapping state information may be the identifiers of the trajectories of the overlapping target objects; an identifier may be the name of the track, the track number, or another symbol used to represent the track.
  • The first track set is the set formed by the trajectories of the target objects, extracted from the database, that satisfy the target retrieval condition.
  • The overlapping trajectories are treated as a whole: the trajectories in the first trajectory set that overlap are combined to obtain a combined trajectory, and the order of the trajectories on the time axis within the combined trajectory must remain unchanged.
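  • Grouping the overlapping trajectories can be sketched with a union-find structure, so that chains of overlaps (A overlaps B, B overlaps C) end up in one combined trajectory. The input layout, a dictionary mapping each track identifier to its start frame and the identifiers it overlaps, is an assumption for illustration, as is the choice of union-find.

```python
from typing import Dict, List, Tuple


def group_overlapping(tracks: Dict[str, Tuple[int, List[str]]]
                      ) -> Tuple[List[List[str]], List[str]]:
    """Split a track set into combined trajectories and non-overlapping tracks.

    `tracks` maps track id -> (start frame, ids of overlapping tracks).
    Overlap references to tracks outside the set are ignored.
    Each combined trajectory lists its members in time order.
    """
    parent = {track_id: track_id for track_id in tracks}

    def find(track_id: str) -> str:
        # Find the group representative, halving the path along the way.
        while parent[track_id] != track_id:
            parent[track_id] = parent[parent[track_id]]
            track_id = parent[track_id]
        return track_id

    for track_id, (_, overlapping) in tracks.items():
        for other in overlapping:
            if other in parent:
                parent[find(track_id)] = find(other)

    groups: Dict[str, List[str]] = {}
    for track_id in tracks:
        groups.setdefault(find(track_id), []).append(track_id)

    combined = [sorted(members, key=lambda t: tracks[t][0])
                for members in groups.values() if len(members) >= 2]
    singles = [members[0] for members in groups.values() if len(members) == 1]
    return combined, singles
```

  • Sorting each group by start frame keeps the members in their original order on the time axis, which the translation step must preserve.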
  • The to-be-translated trajectories include: combined trajectories, each composed of at least two trajectories in the first trajectory set, and/or trajectories in the first trajectory set that do not overlap. The target time period may be preset by the user, or it may be a certain period within the playback time of the entire video.
  • The video summary to be generated is not a simple splice of motion segments; it is condensed by shifting the trajectories of target objects that appear in different time periods to the same time period.
  • The translation of a trajectory is a translation in time only and does not include any translation of its spatial position.
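  • A time-only translation can be sketched as below: every track point of a combined trajectory receives the same frame offset, so the relative timing inside the group is preserved and the boxes are left untouched. The `TrackPoint` type and the function name are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class TrackPoint:
    frame: int                       # frame number on the time axis
    box: Tuple[int, int, int, int]   # spatial position, never modified here


def translate_group(group: List[List[TrackPoint]],
                    target_start: int) -> List[List[TrackPoint]]:
    """Shift a group of tracks so that its earliest point lands on target_start.

    One common offset is applied to the whole group, which keeps the order
    and spacing of the member tracks on the time axis unchanged.
    A single non-overlapping track is simply a group with one member.
    """
    earliest = min(point.frame for track in group for point in track)
    offset = target_start - earliest
    return [[replace(point, frame=point.frame + offset) for point in track]
            for track in group]
```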
  • The target original image may be the original image of the video frame, or an image obtained from the mask map and the original image of the video frame; the first target original image is the target original image of any track.
  • The first target original image may be obtained according to the trajectory information of the target object, or according to the attribute information of the target object.
  • The image formed by the content other than the target objects is the background image of the video; the generated video summary cannot include only the first target original images and should also include the first abstract background image.
  • the first abstract background image may be a static abstract background image calculated by a static background determining method, or may be a dynamic abstract background image determined by a dynamic background determining method.
  • the background image of the video at different times will be very different.
  • the background image of the segment's video is taken as a summary background image.
  • S106: each first target original image is spliced with the first abstract background image to generate a video summary.
  • To splice the first target original image with the first abstract background image, the first target original image may be copied to the position in the first abstract background image where the target object is located.
  • the first target original image cannot be simply copied and pasted to obtain a video summary.
  • In this embodiment, the established database is searched to obtain the trajectories that satisfy the target retrieval condition; the overlapping trajectories among them are combined into combined trajectories; the combined trajectories and the non-overlapping trajectories from different time periods are translated to the same target time period; and finally the target original images corresponding to the trajectories in the target time period are spliced with the abstract background image to generate a video summary. When the video summary is generated, the overlapping trajectories of multiple target objects are combined into one combined trajectory that is shifted along the time axis as a whole, so that no trajectory among the overlapping trajectories is lost during translation, which improves the visual effect of the generated video summary.
  • the method for generating the video summary further includes:
  • When an interrupt request input by the user is received, the flow of video summary generation is ended; when a startup request input by the user is received, the step of acquiring the target retrieval condition and retrieving the established database is performed.
  • The user can input an interrupt request at any time. For example, if the user finds that the target retrieval condition was set incorrectly, the process of generating the video summary is ended after the interrupt request is received; the user can then reset the target retrieval condition as required and input a startup request, after which the established database is searched again according to the new target retrieval condition.
  • the method for generating the video summary may further include:
  • The target object is a target with characteristic information, such as a person, a car, or a ship.
  • The trajectory information may include: an identifier of the trajectory of the target object, the number of trajectory points of the target object, the frame number of each trajectory point of the target object, the time information of the trajectory of the target object, the spatial information of the trajectory of the target object, and/or the overlapping state information with other tracks. The attribute information may include: the time of appearance of the target object, the direction of movement of the target object, the license plate number, the vehicle model, the vehicle brand, the vehicle color, the color of a person's clothing, the person's age, the person's height, whether the person wears glasses, and/or whether the person carries a backpack or bag.
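  • As an illustration of how this information might be held in memory, the sketch below defines two record types. The class and field names are hypothetical and only mirror the items listed above; the actual layout of the video structured target description file is not given in the text. Tracks are assumed to contain at least one point.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TrackRecord:
    """Trajectory information of one target object."""
    track_id: str                                    # identifier of the trajectory
    frame_numbers: List[int]                         # frame number of each track point
    boxes: List[Tuple[int, int, int, int]]           # spatial information per track point
    overlapping_track_ids: List[str] = field(default_factory=list)  # overlapping state

    @property
    def point_count(self) -> int:
        """Number of track points."""
        return len(self.frame_numbers)

    @property
    def time_span(self) -> Tuple[int, int]:
        """First and last frame of the trajectory (requires a non-empty track)."""
        return self.frame_numbers[0], self.frame_numbers[-1]


@dataclass
class TargetAttributes:
    """Attribute information of one target object."""
    appearance_time: str
    moving_direction: str
    # Type-specific attributes, e.g. license plate or clothing color.
    extra: Dict[str, str] = field(default_factory=dict)
```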
  • the video structured target extraction is performed on the video.
  • the video structured target extraction includes target object extraction and target attribute extraction, and the target trajectory description file is obtained by the target object extraction;
  • the target attribute description file, the target track description file and the target attribute description file are included in the video structured object description file.
  • the target object extraction and the target attribute extraction can be performed synchronously.
  • Target attribute extraction extracts one or more video frame images containing the target object by using a preset video frame extraction method, and then combines the results of the attribute classifiers to obtain the target attribute description file for the target object.
  • An attribute classifier is used to identify a certain class of attribute of the target object, and the information for that class of attribute can be obtained through the internal analysis of the attribute classifier.
  • Target object extraction performs multi-target tracking by combining the specific attributes and the motion attributes of the target object obtained through target attribute extraction, and associates the specific attributes of each tracked target object with its motion attributes.
  • the video structured target description file is used to store attribute information and track information of the target object.
  • The existing video structured target description file is usually generated on an industrial computer or a server, and can also be generated on an embedded platform, such as a DSP (Digital Signal Processor) or ARM (Advanced RISC Machines, a reduced instruction set microprocessor).
  • A database is established, and all the attribute information is managed by the database, including: the appearance time of the target object, the moving direction of the target object, the license plate number, the vehicle model, the vehicle brand, the color of the vehicle, the color of the person's clothing, the age of the person, the height of the person, whether the person wears glasses, and/or whether the person carries a backpack or the like.
  • the trajectory of the target object is analyzed, and the trajectory information of the target object is extracted.
  • The trajectory information of the target object can be used to extract the overlapping state information of the target object, so that the overlapping state information of the target object is saved in the database.
  • the target original image is an image of the target object.
  • The target original image may be obtained by combining the original image of each video frame with the mask map. The mask map represents only the outline of the target object, not the image content. Combining it with the original image of the video frame yields the image of the target object within the area of the mask map, which is more accurate than extracting the image of the target object directly from the original image of the video frame.
  • The mask map can be extracted by a preset background modeling method, which can be any one of the color background model method, the average background model method, the Gaussian background model method, or the CodeBook background model method.
  • In the method for generating a video summary in the embodiment of the present application, the step of extracting a mask map of each frame corresponding to the trajectory information from the video frame that includes the target object may include:
  • The motion mask is the two-dimensional data constituting the mask map.
  • By extracting the motion mask of the target object, the mask map of the target object can be determined according to the two-dimensional data of the motion mask and the preset background modeling method.
  • The mask map characterizes the outline of the target object and distinguishes the target original image from the background image in the original video frame.
  • S2054 Extract a convex set in the edge point set to form a convex hull point set of the mask map.
  • The extracted mask map may be incomplete, so the mask map can be post-processed. That is, according to the edge point set of the mask map, the convex hull point set of the mask map is formed and the convex hull corresponding to the convex hull point set is filled, thereby completing the mask map so that it covers the contour of the target object as fully as possible.
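The post-processing described above (edge point set, convex hull point set, filled convex hull) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the mask is a list of rows of 0/1 values, every foreground pixel is used as a candidate point instead of running a dedicated edge detector, Andrew's monotone chain is swapped in as the hull algorithm, and the names `convex_hull` and `fill_convex_hull` are hypothetical.

```python
def cross(o, a, b):
    """Z component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the hull vertices of a set of (x, y) points (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def fill_convex_hull(mask):
    """Return a copy of the mask with every pixel inside the hull set to 1."""
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    hull = convex_hull(points)
    filled = [row[:] for row in mask]
    if len(hull) < 3:  # degenerate hull: nothing to fill
        return filled
    n = len(hull)
    for y in range(len(mask)):
        for x in range(len(mask[0])):
            # A pixel is inside when it lies on the same side of every hull edge.
            if all(cross(hull[i], hull[(i + 1) % n], (x, y)) >= 0 for i in range(n)):
                filled[y][x] = 1
    return filled
```

For example, a 3x3 ring-shaped mask with an empty center pixel comes back fully filled, while a mask with only two foreground pixels is returned unchanged.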
  • the original image, the mask map and the target original image corresponding to the target object are stored in the database, so that in the step of generating the video summary, the corresponding target original image can be quickly searched from the database according to the attribute information of the target object.
  • the method for generating the video summary may further include:
  • the summary background image is obtained according to the preset period.
  • the preset period is a period for saving the summary background image, and may be set by the user according to actual needs, or may be an empirical value preset by a person skilled in the art.
  • The target original image can be obtained from the mask map and the original image of the video frame. The mask map represents only the outline of the target object and does not contain the image content. Combining it with the original image of the video frame yields the image of the target object within the area of the mask map, which is more accurate than extracting the image of the target object directly from the original image of the video frame: the target object is extracted completely and the background part of the target frame is removed, which improves the stitching of the target original image with the summary background image.
  • During video analysis, a background image is maintained from beginning to end and is updated every frame. Each time the preset period elapses, the background image is automatically saved once.
  • the summary background image of each cycle obtained is stored in the database.
  • Although the background image changes during the playback of the video, it changes little compared with the target objects. Therefore, it is only necessary to save the background image periodically, which preserves the authenticity of the background without adding too much computation.
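The idea of maintaining one background image, updating it every frame, and saving a copy once per preset period can be sketched as below. The text does not say how the background is updated, so the exponential running average, the `alpha` parameter, the period measured in frames, and the function name `save_period_backgrounds` are all assumptions made for illustration; frames are small grayscale images given as lists of rows.

```python
def save_period_backgrounds(frames, period, alpha=0.5):
    """Update a running-average background per frame; save it every `period` frames.

    Returns a dict mapping the period index to the saved background image.
    """
    background = None
    saved = {}
    for index, frame in enumerate(frames):
        if background is None:
            background = [[float(v) for v in row] for row in frame]
        else:
            background = [
                [(1 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
                for b_row, f_row in zip(background, frame)
            ]
        if (index + 1) % period == 0:
            # Store a copy so later updates do not modify the saved image.
            saved[(index + 1) // period - 1] = [row[:] for row in background]
    return saved
```

In a real system the saved images would be written to the database keyed by period; a dictionary stands in for that here.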
  • In the method for generating a video summary in the embodiment of the present application, the step of obtaining a first summary background image for generating a video summary may include:
  • the target time segment is divided into time sub-segments corresponding to the preset period according to the time corresponding to each preset period.
  • Dividing the target time segment into the corresponding time sub-segments ensures that the corresponding summary background image can be obtained directly when the summary background image for generating the video summary is later acquired, without requiring additional algorithms to determine a better summary background image, which effectively saves computation.
  • S1052 Determine a first preset period corresponding to a time sub-segment containing the most track in the target time period.
  • The background image of the time sub-segment containing the most trajectories is used as the summary background image of the video summary, so that only a small part of the trajectories do not match the actual background. Compared with a static background image, this reflects the actual trajectories of the target objects more realistically and improves the effect of the video summary.
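Choosing the preset period whose time sub-segment contains the most trajectories can be sketched as follows. This is only an illustration: the text does not define how a trajectory is counted, so the sketch assumes each trajectory is a `(start, end)` time span and counts it in every sub-segment it intersects. The function name `busiest_sub_segment` and the integer time units are hypothetical.

```python
def busiest_sub_segment(tracks, segment_start, segment_end, period):
    """Split [segment_start, segment_end) into sub-segments of length `period`
    and return (index of the sub-segment with the most tracks, per-segment counts).

    Assumes segment_end > segment_start and period > 0.
    """
    count = -(-(segment_end - segment_start) // period)  # ceiling division
    counts = [0] * count
    for start, end in tracks:
        for i in range(count):
            sub_start = segment_start + i * period
            sub_end = min(sub_start + period, segment_end)
            if start < sub_end and end > sub_start:  # spans intersect
                counts[i] += 1
    best = max(range(count), key=lambda i: counts[i])  # first index wins ties
    return best, counts
```

The returned index would then be used to look up the summary background image saved for that period.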
  • the trajectories are combined into a combined trajectory, and the whole trajectory is shifted on the time axis to avoid losing some trajectories in the overlapping trajectories during translation, thereby improving the visual effect of generating a video summary.
  • The target original image is obtained by combining the mask map with the original image of the video frame. The mask map represents only the outline of the target object and does not contain the image content. After the mask map is combined with the original image of the video frame, the image of the target object within the area of the mask map is obtained, which is more accurate than extracting the image of the target object directly from the original video frame.
  • a method for generating a video summary may include:
  • S1031 Establish a to-be-translated queue and a summary queue.
  • The to-be-translated queue is a queue for storing target trajectories that have not yet been allocated. The trajectories stored in it are trajectories for which it has not yet been determined whether the preset condition is satisfied, or trajectories in the first trajectory set that are not in the target time segment.
  • the summary queue is a queue for storing tracks for generating video summaries.
  • a track that is not in the target time segment in the first track set is used as a track to be translated, and is stored in the queue to be translated.
  • the to-be-translated trajectory is a combined trajectory in the first trajectory set that is not in the target time segment and a trajectory that does not overlap.
  • S1034 Extract the current to-be-translated trajectory from the queue to be translated, and obtain a rectangular frame of each target object in the video frame corresponding to the current to-be-translated trajectory according to the original image in the database.
  • the position of the target object in the video frame is actually contained in a rectangular sub-picture.
  • The size of the rectangular sub-picture depends on the target extraction method. For example, if head targets are extracted, the size of the rectangular sub-picture is determined from the size of each head target so that it covers the area of any individual head target as far as possible; if pedestrian targets are extracted, the size of the rectangular sub-picture is determined from the size of each pedestrian target so that it covers the area of any pedestrian target as far as possible.
  • The trajectory of a target object is formed over a plurality of consecutive video frames, so the trajectory of one target object has a plurality of rectangular sub-pictures, which form the rectangular frames to be translated. The rectangular frame is the rectangular box of smallest area that contains the target object.
  • An overlap parameter threshold is preset; it may be set according to actual requirements and the specific attributes of the target object.
  • The trajectories in the first trajectory set that are not in the target time segment are stored in the to-be-translated queue, and the trajectories in the first trajectory set that are in the target time segment are stored in the summary queue; trajectories in the to-be-translated queue that meet the preset overlap condition are then stored in the summary queue.
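The queue-based translation steps can be sketched as below. This is a minimal illustration under stated assumptions rather than the method of the application: each trajectory is a list of rectangular frames `(x, y, w, h)`, one per video frame, already aligned to the start of the target time segment; the comparison is made frame by frame against every trajectory already in the summary queue; and trajectories that exceed the threshold are simply set aside in a `deferred` list, since the text here does not say what happens to them. The names `overlap_area` and `fill_summary_queue` are hypothetical.

```python
from collections import deque


def overlap_area(a, b):
    """Area of intersection of two (x, y, w, h) rectangles."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    w = min(ax + aw, bx + bw) - max(ax, bx)
    h = min(ay + ah, by + bh) - max(ay, by)
    return max(w, 0) * max(h, 0)


def fill_summary_queue(in_segment, to_translate, threshold):
    """Move trajectories into the summary queue when their overlap is small enough."""
    summary = deque(in_segment)      # trajectories already in the target segment
    pending = deque(to_translate)    # the to-be-translated queue
    deferred = []
    while pending:
        track = pending.popleft()
        # Largest overlap between this trajectory and any placed trajectory.
        worst = max(
            (overlap_area(box, other)
             for placed in summary
             for box, other in zip(track, placed)),
            default=0,
        )
        if worst <= threshold:
            summary.append(track)
        else:
            deferred.append(track)
    return list(summary), deferred
```

A larger threshold packs more trajectories into the same target time segment at the cost of more visible overlap between target objects.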
  • the step of obtaining the first target original image corresponding to each track of the target time segment from the database may include:
  • the trajectory information has a corresponding relationship with the target object and the trajectory of the target object.
  • the trajectory information also has a corresponding relationship with the target original image of the target object. Therefore, the first target original image corresponding to the trajectory information can be extracted from the database according to the trajectory information of the trajectory to be translated.
  • the first target original image is spliced with the first abstract background image to generate a video summary, including:
  • the first position of the first target original image in the first summary background image is determined according to the target frame information set of the target object in the trajectory information.
  • The trajectory information of each trajectory further includes a target frame information set of the target object. The target frame information includes the coordinates of the upper left corner of the rectangular frame of the target object together with its width and height. For example, the target frame information is (x, y, w, h), where x is the abscissa of the upper left corner of the rectangular frame of the target object, y is the ordinate of the upper left corner, w is the width of the rectangular frame, and h is the height of the rectangular frame.
  • The target frame information may also include the coordinates of the center point of the rectangular frame of the target object together with its width and height. For example, the target frame information is (m, n, p, q), where m is the abscissa of the center point of the rectangular frame of the target object, n is the ordinate of the center point, p is the width of the rectangular frame, and q is the height of the rectangular frame.
  • As the target object moves, it forms a target frame information set, which includes information such as the coordinates, length, and direction of the first target original image in the first summary background image. Therefore, the first position of the first target original image in the first summary background image can be determined according to the target frame information set of the target object in the trajectory information.
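The two target frame representations, upper-left corner (x, y, w, h) and center point (m, n, p, q), carry the same information and can be converted into each other. The helper names below are hypothetical and the sketch assumes image coordinates with the origin at the top left.

```python
def center_to_top_left(box):
    """Convert (m, n, p, q) center-point form to (x, y, w, h) upper-left form."""
    m, n, p, q = box
    return (m - p / 2, n - q / 2, p, q)


def top_left_to_center(box):
    """Convert (x, y, w, h) upper-left form to (m, n, p, q) center-point form."""
    x, y, w, h = box
    return (x + w / 2, y + h / 2, w, h)
```

Either form is enough to determine where the target original image is placed in the summary background image.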
  • the first target original image is copied to the corresponding first position in the first abstract background image to generate a video summary.
  • The method used for splicing may introduce an error between the position of the target original image in the summary background image and its actual position.
  • If the target original image includes part of the background image, and that part differs from the background image being stitched, the target original image does not match the summary background image at its position. Therefore, in this embodiment, the target original image may be the image obtained by combining the original image of the video frame with the mask map.
  • In that case the target original image does not include the background image, and the mask map reflects the true position of the target object in the summary background image.
  • When the target original image and the summary background image are spliced, the target original image can be copied accurately into the contour area outlined by the mask map, which ensures the quality of the splicing and thereby improves the visual effect of the video summary.
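Copying only the pixels inside the mask contour into the summary background image can be sketched as follows. This is a minimal illustration with assumed data layouts: the background, the original video frame, and the mask map are lists of rows of the same size, the target frame `(x, y, w, h)` lies within the image, and `paste_target` is a hypothetical name.

```python
def paste_target(background, frame, mask, box):
    """Copy frame pixels into a copy of the background wherever the mask is set.

    Only pixels inside the target frame (x, y, w, h) are examined.
    """
    x, y, w, h = box
    result = [row[:] for row in background]
    for dy in range(h):
        for dx in range(w):
            if mask[y + dy][x + dx]:
                result[y + dy][x + dx] = frame[y + dy][x + dx]
    return result
```

Pixels inside the rectangular frame but outside the mask keep the summary background value, so no stale background from the source frame is carried over.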
  • the step of copying each of the first target original images to the corresponding first location in the first summary background image may include:
  • the pixel value of the overlapping portion of the corresponding trajectories is set to the mean of the target original image pixel values of the target objects, and the pixel value of the non-overlapping portion is the pixel value of the target original image of each target object, to obtain the image to be copied;
  • the image to be copied and each first target original image of the target objects that do not overlap are copied to the corresponding first position in the first summary background image to generate a video summary.
  • The trajectories of the target objects used to generate the video summary may or may not overlap.
  • When no overlap occurs, the target original image can be copied directly into the summary background image.
  • When overlap occurs, the average of the pixel values of the target original images of the target objects is taken as the pixel value of the overlapping portion, and the result is then spliced to generate the video summary.
  • Alternatively, the overlapping portion may take a weighted value of the pixel values of the target original images of the target objects.
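Averaging the overlapping portion can be sketched as below. The sketch assumes single-channel images stored as lists of rows, each target given as an `(image, mask)` pair of the same size as the background, and an unweighted mean; `blend_targets` is a hypothetical name.

```python
def blend_targets(background, targets):
    """Composite targets onto the background, averaging where masks overlap.

    `targets` is a list of (image, mask) pairs with the background's dimensions.
    """
    height, width = len(background), len(background[0])
    sums = [[0] * width for _ in range(height)]
    hits = [[0] * width for _ in range(height)]
    for image, mask in targets:
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    sums[y][x] += image[y][x]
                    hits[y][x] += 1
    return [
        [sums[y][x] / hits[y][x] if hits[y][x] else background[y][x]
         for x in range(width)]
        for y in range(height)
    ]
```

A weighted variant would multiply each contribution by a per-target weight and divide by the sum of the weights instead of the hit count.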
  • In the method for generating a video summary provided by an embodiment of the present application, before the step of acquiring a target retrieval condition, the method may further include:
  • S601 Display a user interaction interface according to a user instruction.
  • The user interaction interface is an interface for interaction between the user and the system, and may be a dialog box or a selection screen in a webpage. It is used to prompt the user to input a target retrieval condition, a preset translation condition, and a preset period, where the preset period is used to generate the summary background image.
  • S602. Receive and save a target retrieval condition input by the user through the user interaction interface, a preset translation condition, and a preset period for generating a summary background image.
  • the trajectories are combined into a combined trajectory, and the whole trajectory is shifted on the time axis to avoid losing some trajectories in the overlapping trajectories during translation, thereby improving the visual effect of generating a video summary.
  • the user is allowed to set the target retrieval condition and setting parameters, thereby improving the flexibility of the application and bringing convenience to the user.
  • the method for generating the video summary may further include:
  • the step of retrieving the established database is performed; and when the interrupt request input by the user through the user interaction interface is received, the process of generating the video summary is ended.
  • The user can input an interrupt request at any time. For example, if the user finds that the target retrieval condition was set incorrectly, the process of generating the video summary is ended once the interrupt request is received. The user can then reset the target retrieval condition as required and input a startup request; after the startup request is received, the established database is searched again according to the target retrieval condition.
  • the embodiment of the present application provides a device for generating a video summary.
  • the device for generating a video summary may include:
  • the retrieval module 710 is configured to acquire a target retrieval condition, and perform a retrieval on the established database to obtain a first trajectory set including a trajectory that meets the target retrieval condition, where the database stores a video frame from the target object.
  • the combining module 720 is configured to divide at least two tracks that overlap in the first track set into a group according to the overlapping state information, and each group is determined as a combined track;
  • a translation module 730 configured to translate, along the time axis, the to-be-translated trajectories that satisfy a preset translation condition to the same target time segment, where the to-be-translated trajectories include: the combined trajectories and/or the trajectories in the first trajectory set that do not overlap;
  • a first acquiring module 740 configured to acquire, from the database, a first target original image corresponding to each track in the target time period
  • a second obtaining module 750 configured to obtain a first abstract background image for generating a video summary
  • the splicing module 760 is configured to splicing the first target original image and the first abstract background image to generate a video summary.
  • In this embodiment, the established database is searched to obtain the trajectories that satisfy the target retrieval condition; the trajectories among them that overlap are combined into combined trajectories; the combined trajectories of different time periods and the trajectories that do not overlap are then translated to the same target time segment; and finally the target original images corresponding to the trajectories in the target time segment are spliced with the summary background image to generate a video summary. When generating the video summary, the embodiment of the present application combines the trajectories of multiple overlapping target objects into one combined trajectory and translates it as a whole on the time axis, which avoids losing some of the overlapping trajectories during translation and thereby improves the visual effect of the generated video summary.
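Grouping overlapping trajectories into combined trajectories and shifting each group as a whole can be sketched as follows. The sketch makes several assumptions: the overlapping state information is available as pairs of trajectory identifiers, overlap is treated as transitive (if A overlaps B and B overlaps C, all three form one group), a union-find structure is swapped in for the grouping, and trajectories are `(start, end)` time spans. The names `combine_tracks` and `translate_group` are hypothetical.

```python
def combine_tracks(track_ids, overlaps):
    """Group trajectory ids connected by overlap pairs.

    Returns (combined groups with at least two ids, ids that do not overlap).
    """
    parent = {t: t for t in track_ids}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for a, b in overlaps:
        parent[find(a)] = find(b)

    groups = {}
    for t in track_ids:
        groups.setdefault(find(t), []).append(t)
    combined = [g for g in groups.values() if len(g) > 1]
    single = [g[0] for g in groups.values() if len(g) == 1]
    return combined, single


def translate_group(spans, target_start):
    """Shift every (start, end) span in a group by one common offset."""
    offset = target_start - min(start for start, _ in spans)
    return [(start + offset, end + offset) for start, end in spans]
```

Because one offset is applied to the whole group, the relative timing of the overlapping target objects is preserved and none of them is dropped during translation.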
  • the target search condition may include: retrieving a time period and/or retrieval attribute information of the target object;
  • the retrieval module may specifically be used to:
  • the search module may specifically be used to:
  • the retrieval module may specifically be used to:
  • a device for generating a video summary may further include:
  • a first extraction module 810 configured to extract each target object from the input video
  • the second extraction module 820 is configured to extract the trajectory information and the attribute information of each target object, where the trajectory information includes: movement information of the trajectory and overlapping state information with other trajectories;
  • a first storage module 830 configured to store the trajectory information and the attribute information of each target object into the video structured target description file
  • a database generating module 840 configured to generate the database according to the video structured target description file
  • a third extraction module 850 configured to extract, from a video frame that includes the target object, an original image and a mask map of each frame corresponding to the trajectory information, and determine, according to the original image and the mask map, the target original image of each frame corresponding to the trajectory information;
  • the second storage module 860 is configured to store the original image, the mask map, and the target original image into the database.
  • the trajectories are combined into a combined trajectory, and the whole trajectory is shifted on the time axis to avoid losing some trajectories in the overlapping trajectories during translation, thereby improving the visual effect of generating a video summary.
  • The target original image is obtained by combining the mask map with the original image of the video frame. The mask map represents only the outline of the target object and does not contain the image content. After the mask map is combined with the original image of the video frame, the image of the target object within the area of the mask map is obtained, which is more accurate than extracting the image of the target object directly from the original video frame.
  • the third extraction module 850 may include:
  • a first extraction submodule 851 configured to extract a motion mask of the target object from a video frame that includes the target object
  • a first determining submodule 852 configured to determine an initial mask according to the motion mask
  • a second determining submodule 853 configured to determine an edge point set of the initial mask map
  • a second extraction sub-module 854 configured to extract a convex set in the set of edge points, and form a convex hull point set of the mask map;
  • the filling sub-module 855 is configured to fill the convex hull corresponding to the convex hull point set to obtain a final mask map.
  • the translation module 730 can include:
  • a queue establishment sub-module 731 configured to establish a queue to be translated and a summary queue
  • a first storage sub-module 732 configured to use a trajectory in the first trajectory set that is not in the target time period as a to-be-translated trajectory, and store the to-be-translated trajectory into the to-be-translated queue, where the to-be-translated trajectory is a combined trajectory in the first trajectory set that is not in the target time period and a trajectory that does not overlap;
  • a second storage submodule 733 configured to store the trajectories in the first trajectory set that are in the target time period into the summary queue;
  • a third extraction sub-module 734 configured to sequentially extract a current to-be-translated trajectory from the to-be-translated queue, and obtain, according to an original image in the database, a rectangular frame of each target object in the video frame corresponding to the current to-be-translated trajectory;
  • the operation sub-module 735 is configured to calculate an overlapping area between the rectangular frame of each target object and the rectangular frame of the target object in the video frame corresponding to each track that has been stored in the summary queue;
  • the third storage sub-module 736 is configured to: when the overlapping area is less than or equal to a preset overlapping parameter threshold, translate the current to-be-translated trajectory to the target time segment, and store the trajectory to the summary queue;
  • the first obtaining module 740 is specifically configured to:
  • the track information of each track may further include: a target box information set of the target object;
  • the splicing module 760 can include:
  • a third determining submodule configured to determine, according to the target box information set of the target object in the trajectory information, a first position of the first target original image in the first abstract background image
  • the video summary generation sub-module is configured to copy each first target original image to a corresponding first position in the first summary background image to generate a video summary.
  • the third determining submodule is specifically configured to:
  • the pixel value of the overlapping portion of the corresponding trajectories is set to the mean of the target original image pixel values of the target objects, and the pixel value of the non-overlapping portion is the pixel value of the target original image of each target object, to obtain the image to be copied;
  • the splicing module 760 can be specifically configured to:
  • the device may further include:
  • An operation module configured to obtain a summary background image according to a preset period
  • a third storage module configured to store the acquired summary background image of each period into the database
  • the second obtaining module 750 may include:
  • the dividing sub-module 751 is configured to divide the target time segment into time sub-segments corresponding to the preset period according to the time corresponding to each preset period;
  • a fourth determining sub-module 752 configured to determine a first preset period corresponding to the time sub-segment including the most track in the target time period
  • the background image obtaining sub-module 753 is configured to obtain, from the database, a first summary background image corresponding to the first preset period.
  • the apparatus may further include:
  • the display module 1210 is configured to display a user interaction interface according to a user instruction
  • the receiving module 1220 is configured to receive and save a target retrieval condition input by the user through the user interaction interface, a preset translation condition, and a preset period for generating a summary background image.
  • the device shown may further include:
  • An execution module configured to perform the step of searching the established database when receiving a startup request input by the user through the user interaction interface
  • the ending module is configured to end the process of generating the video summary when receiving the interrupt request input by the user through the user interaction interface.
  • the video summary generating apparatus in another embodiment of the embodiment of the present application may include: a retrieval module 710, a combination module 720, a translation module 730, a first acquisition module 740, a second acquisition module 750, and a splicing module.
  • the embodiment of the present application further provides a computer device, including a processor and a memory, where
  • a memory for storing a computer program
  • the processor when used to execute the computer program stored in the memory, implements all the steps of the method for generating the video summary provided by the embodiment of the present application.
  • the above image collector may include an IPC (IP Camera), a smart camera, and the like.
  • the above memory may include a RAM (Random Access Memory), and may also include NVM (Non-Volatile Memory), such as at least one disk storage.
  • the memory may also be at least one storage device located away from the aforementioned processor.
  • The processor may be a general-purpose processor, including a CPU (Central Processing Unit), an NP (Network Processor), or the like; it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • The processor can thus: search the established database to obtain the trajectories that satisfy the target retrieval condition; combine the trajectories among them that overlap into combined trajectories; translate the combined trajectories of different time periods and the trajectories that do not overlap to the same target time period; and finally splice the target original images corresponding to the trajectories in the target time period with the summary background image to generate a video summary. When generating the video summary, the embodiment of the present application combines the trajectories of multiple overlapping target objects into one combined trajectory and translates it as a whole on the time axis, which avoids losing some of the overlapping trajectories during translation and thereby improves the visual effect of the generated video summary.
  • The embodiment of the present application provides a storage medium for storing a computer program; when the computer program is executed by a processor, all the steps of the method for generating the video summary are implemented.
  • The storage medium stores an application that, at runtime, executes the method for generating the video summary provided by the embodiment of the present application, and can thus: search the established database to obtain the trajectories that satisfy the target retrieval condition; combine the trajectories among them that overlap into combined trajectories; translate the combined trajectories of different time periods and the trajectories that do not overlap to the same target time period; and finally splice the target original images corresponding to the trajectories in the target time period with the summary background image to generate a video summary. When generating the video summary, the trajectories of multiple overlapping target objects are combined into one combined trajectory and translated as a whole on the time axis, which avoids losing some of the overlapping trajectories during translation.
  • The embodiment of the present application provides an application program for performing, at runtime, the steps of the method for generating the video summary provided by the embodiment of the present application.
  • At runtime, the application performs the method for generating the video summary provided by the embodiment of the present application, so that: the trajectories that satisfy the target retrieval condition are obtained by searching the established database; the overlapping trajectories among them are combined into combined trajectories; the combined trajectories of different time periods and the non-overlapping trajectories are translated to the same target time period; and finally the target original images corresponding to the trajectories in the target time period are spliced with the summary background image to generate a video summary.
  • When generating a video summary, the embodiment of the present application combines the trajectories of multiple overlapping target objects into one combined trajectory and translates it as a whole along the time axis. This avoids losing some of the overlapping trajectories during translation and improves the visual effect of the generated video summary.
  • the description is relatively brief; for the relevant parts, refer to the description of the method embodiment.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

A video abstract generation method and device. The video abstract generation method comprises: obtaining a target search condition, and searching an established database to obtain a first track set containing tracks that satisfy the target search condition (S101); classifying at least two tracks in the first track set that overlap with each other into a group according to overlapping state information, each group being determined as a combined track (S102); translating, along the timeline, those tracks among the tracks to be translated that satisfy a preset translation condition to the same target time period (S103); obtaining from the database a first target original image for each track in the target time period (S104); obtaining a first abstract background image for generating a video abstract (S105); and splicing the first target original images and the first abstract background image to generate the video abstract (S106). The method can improve the visual effect of a video abstract generated when the target tracks are complex.

Description

Method and device for generating a video summary

This application claims priority to Chinese Patent Application No. 201710087044.6, filed with the Chinese Patent Office on February 17, 2017 and entitled "Method and Device for Generating a Video Summary", the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to the field of video processing technologies, and in particular to a method and a device for generating a video summary.

Background

As the requirements on video data processing keep rising and the volume of video data keeps growing, users want to be able to build a summary of a long video so that it can be browsed quickly and put to better use, for example in video surveillance used to maintain public security and combat crime. Video summarization technology emerged to meet this need. It analyzes the structure and content of a video, extracts the meaningful parts of the original video, namely the moving targets, and combines the moving targets with the background scene in a specific way to form a concise synopsis that fully conveys the video content. A video summary is a brief synopsis of the content of a long video, usually represented by a static or dynamic image sequence, and it preserves the original information.

The corresponding technique for generating video summaries is target-object-based video summary generation, which comprises the following steps. First, the input video is analyzed to generate a video structured description file, and a database is built from that file; the video structured description file contains the attribute information and the trajectory information of the target objects in the video. Second, the database is searched and the trajectory information of the moving targets is extracted. Finally, the trajectories of the target objects are analyzed and shifted along the time axis so that the trajectories of target objects from different times are arranged in the same picture, which produces the summary video. This technique meets the user's need to generate a video summary for specific target objects, and the generated summary is short, compact in motion, and has a high condensation ratio.

However, real scenes are complex, and when the trajectories of target objects are arranged, the trajectories of multiple target objects may overlap. Because the above technique rearranges the trajectories of the target objects and excludes the trajectories of overlapping target objects, the summary video loses the information of the associated target objects whenever the trajectories of multiple target objects overlap. Because this information is lost, target objects frequently appear and vanish abruptly in the generated video summary, which degrades the visual effect.

Summary of the invention

The purpose of the embodiments of the present application is to provide a method and a device for generating a video summary, so as to improve the visual effect of the video summary generated when the target trajectories are complex. The specific technical solutions are as follows:

In a first aspect, an embodiment of the present application provides a method for generating a video summary, the method comprising:

Obtaining a target retrieval condition and searching an established database to obtain a first trajectory set containing the trajectories that satisfy the target retrieval condition, wherein the database stores, for each trajectory extracted from video frames containing target objects, its trajectory information and target original images, and the trajectory information of each trajectory includes information on its overlap state with other trajectories;

According to the overlap state information, dividing at least two trajectories in the first trajectory set that overlap one another into a group, each group being determined as one combined trajectory;

Translating, along the time axis, those of the trajectories to be translated that satisfy a preset translation condition to the same target time period, wherein the trajectories to be translated include the combined trajectories and/or the trajectories in the first trajectory set that do not overlap;

Obtaining from the database the first target original images corresponding to the trajectories in the target time period;

Obtaining a first summary background image for generating the video summary;

Splicing the first target original images with the first summary background image to generate the video summary.
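The grouping step above (dividing overlapping trajectories into groups, each of which becomes one combined trajectory) can be sketched as follows. This is only an illustrative sketch, not the method as claimed: the function name `group_overlapping`, the representation of the overlap state information as pairs of trajectory identifiers, and the choice to merge overlaps transitively with a union-find structure are all assumptions made for the example.

```python
from collections import defaultdict


def group_overlapping(track_ids, overlap_pairs):
    """Group trajectories that overlap into combined trajectories.

    track_ids: identifiers of the trajectories in the first trajectory set
               (hypothetical representation).
    overlap_pairs: (a, b) pairs read from the stored overlap state
                   information (hypothetical representation).
    Returns (combined, singles): groups of two or more trajectories, and
    the trajectories that overlap with nothing.
    """
    parent = {t: t for t in track_ids}

    def find(x):
        # Follow parent links to the group representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in overlap_pairs:
        # Ignore overlaps with trajectories outside the retrieved set.
        if a in parent and b in parent:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for t in parent:
        groups[find(t)].append(t)

    combined = sorted(sorted(g) for g in groups.values() if len(g) >= 2)
    singles = sorted(g[0] for g in groups.values() if len(g) == 1)
    return combined, singles
```

Under these assumptions, both the combined groups and the single trajectories are candidates for the subsequent translation step; a combined group is moved as one unit so that none of its member trajectories is dropped.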

Optionally, the target retrieval condition includes a retrieval time period and/or retrieval attribute information of a target object;

When the retrieval condition includes only the retrieval time period, searching the established database comprises:

Searching the database according to the retrieval time period to obtain the trajectories of the target objects within the retrieval time period;

When the retrieval condition includes only the retrieval attribute information of the target object, searching the established database comprises:

Searching the database according to the retrieval attribute information of the target object to obtain the trajectories of the target objects that match the retrieval attribute information;

When the retrieval condition includes both the retrieval time period and the retrieval attribute information of the target object, searching the established database comprises:

Searching the database according to the retrieval time period and the retrieval attribute information of the target object to obtain the trajectories, within the retrieval time period, of the target objects that match the retrieval attribute information.
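The three retrieval cases above can be illustrated with one filter in which either condition may be absent. The sketch below is hypothetical: the record layout (`id`, `start`, `end`, `attrs`), the function name `search_tracks`, and the rule that a trajectory counts as being "within" the retrieval time period when its time span intersects the period are assumptions, since the text does not specify them.

```python
def search_tracks(tracks, period=None, attributes=None):
    """Return the ids of the trajectories that satisfy the retrieval condition.

    tracks: list of dicts with keys 'id', 'start', 'end', 'attrs'
            (hypothetical schema standing in for the database).
    period: optional (start, end) retrieval time period.
    attributes: optional dict of retrieval attribute values to match.
    """
    result = []
    for t in tracks:
        if period is not None:
            lo, hi = period
            # Assumption: keep a trajectory if it intersects the period.
            if t["end"] < lo or t["start"] > hi:
                continue
        if attributes is not None:
            # Every requested attribute must match exactly.
            if any(t["attrs"].get(k) != v for k, v in attributes.items()):
                continue
        result.append(t["id"])
    return result
```

Passing only `period`, only `attributes`, or both corresponds to the three cases described above.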

Optionally, before obtaining the target retrieval condition, the method further comprises:

Extracting the target objects from an input video;

Extracting trajectory information and attribute information of each target object, wherein the trajectory information includes movement information of the trajectory and information on its overlap state with other trajectories;

Storing the trajectory information and attribute information of each target object in a video structured target description file;

Generating the database according to the video structured target description file;

Extracting, from the video frames containing the target object, the original image and the mask image of each frame corresponding to the trajectory information, and determining, according to the original image and the mask image, the target original image of each frame corresponding to the trajectory information;

Storing the original image, the mask image and the target original image in the database.

Optionally, extracting, from the video frames containing the target object, the mask image of each frame corresponding to the trajectory information comprises:

Extracting a motion mask of the target object from the video frame containing the target object;

Determining an initial mask image according to the motion mask;

Determining the edge point set of the initial mask image;

Extracting the convex set from the edge point set to form the convex hull point set of the mask image;

Filling the convex hull corresponding to the convex hull point set to obtain the final mask image.
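The mask refinement steps above (edge points, convex hull, hull filling) can be sketched in plain Python on a small binary mask. The text does not name a hull algorithm; the sketch below uses Andrew's monotone chain as one possible choice, treats a foreground pixel as an edge point when one of its 4-neighbours is background or outside the image, and fills the hull with a cross-product inside test. All function names are hypothetical.

```python
def cross(o, a, b):
    """2-D cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def edge_points(mask):
    """Foreground pixels (x, y) with a background or out-of-image 4-neighbour."""
    h, w = len(mask), len(mask[0])
    pts = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if not (0 <= nx < w and 0 <= ny < h) or not mask[ny][nx]:
                    pts.append((x, y))
                    break
    return pts


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices without collinear points."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def fill_hull(hull, width, height):
    """Set to 1 every pixel lying inside or on the convex hull."""
    out = [[0] * width for _ in range(height)]
    if len(hull) < 3:  # degenerate hull: mark only the hull points themselves
        for x, y in hull:
            out[y][x] = 1
        return out
    n = len(hull)
    for y in range(height):
        for x in range(width):
            # Inside means on the same side of every hull edge.
            if all(cross(hull[i], hull[(i + 1) % n], (x, y)) >= 0 for i in range(n)):
                out[y][x] = 1
    return out


def final_mask(initial_mask):
    """Edge points -> convex hull -> filled hull, as in the steps above."""
    hull = convex_hull(edge_points(initial_mask))
    return fill_hull(hull, len(initial_mask[0]), len(initial_mask))
```

For an L-shaped initial mask, the filled hull closes the concave corner, which is the intended effect of replacing a ragged motion mask with its convex hull. A production system would more likely rely on an image-processing library for these operations.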

Optionally, translating, along the time axis, those of the trajectories to be translated that satisfy the preset translation condition to the same target time period comprises:

Establishing a to-be-translated queue and a summary queue;

Taking the trajectories in the first trajectory set that are not in the target time period as trajectories to be translated and storing them in the to-be-translated queue, wherein the trajectories to be translated are the combined trajectories and the non-overlapping trajectories in the first trajectory set that are not in the target time period;

Storing the trajectories in the first trajectory set that are in the target time period in the summary queue;

Extracting the current trajectory to be translated from the to-be-translated queue in turn, and obtaining, according to the original images in the database, the rectangular box of each target object in the video frames corresponding to the current trajectory to be translated;

Calculating the overlap area between the rectangular box of each target object and the rectangular box of the target object in the video frames corresponding to each trajectory already stored in the summary queue;

When the overlap area is less than or equal to a preset overlap parameter threshold, translating the current trajectory to be translated to the target time period and storing it in the summary queue;

The step of obtaining from the database the first target original images corresponding to the trajectories in the target time period comprises:

Obtaining the first target original images corresponding to the trajectories in the summary queue.
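The queue-based translation above can be sketched as follows. The sketch makes several simplifying assumptions that are not in the text: each trajectory carries one rectangular box per frame, indexed from the start of the target time period after translation; boxes are `(x1, y1, x2, y2)` tuples; and a trajectory whose overlap exceeds the threshold is simply returned in a `deferred` list, because this passage does not say what happens to it. The names `overlap_area` and `translate_tracks` are hypothetical.

```python
from collections import deque


def overlap_area(a, b):
    """Overlap area of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0


def translate_tracks(pending, summary, threshold):
    """Move trajectories into the summary queue when they fit.

    pending: trajectories outside the target time period.
    summary: trajectories already inside the target time period.
    Each trajectory is a dict with 'id' and 'boxes' (one box per frame,
    hypothetical layout). Frames are compared index by index; zip stops
    at the shorter trajectory.
    """
    to_translate = deque(pending)
    summary_queue = list(summary)
    deferred = []  # assumption: rejected trajectories are set aside
    while to_translate:
        current = to_translate.popleft()
        worst = 0
        for placed in summary_queue:
            for box_a, box_b in zip(current["boxes"], placed["boxes"]):
                worst = max(worst, overlap_area(box_a, box_b))
        if worst <= threshold:
            summary_queue.append(current)  # translated to the target period
        else:
            deferred.append(current)
    return summary_queue, deferred
```

Because each accepted trajectory joins the summary queue immediately, later candidates are also checked against it, which matches the "already stored in the summary queue" wording above.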

Optionally, the trajectory information of each trajectory further includes a target box information set of the target object;

Splicing the first target original images with the first summary background image to generate the video summary comprises:

Determining, according to the target box information set of the target object in the trajectory information, the first position of each first target original image in the first summary background image;

Copying each first target original image to its corresponding first position in the first summary background image to generate the video summary.

Optionally, copying each first target original image to its corresponding first position in the first summary background image comprises:

If the target objects in the first target original images overlap, setting the pixel values of the overlapping part of the corresponding trajectories to the mean of the pixel values of the target original images of the target objects, and the pixel values of the non-overlapping part to the pixel values of the target original image of the respective target object, to obtain an image to be copied;

Copying each first target original image to its corresponding first position in the first summary background image to generate the video summary comprises:

Copying each image to be copied, and each first target original image whose target object does not overlap, to the corresponding first position in the first summary background image to generate the video summary.
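The pixel rule above (mean value where target objects overlap, the object's own value elsewhere, background where nothing is pasted) can be illustrated with a small grayscale example. This is a sketch under assumptions: images are 2-D lists of integers, each target is a rectangular patch placed at the top-left position taken from its target box, and the mean is computed with integer division. The function name `paste_targets` is hypothetical.

```python
def paste_targets(background, targets):
    """Paste target patches onto a background, averaging where they overlap.

    background: 2-D list of grayscale values.
    targets: list of (x, y, patch), where patch is a 2-D list and (x, y)
             is its top-left position in the background (hypothetical
             layout derived from the target box information).
    """
    h, w = len(background), len(background[0])
    total = [[0] * w for _ in range(h)]
    count = [[0] * w for _ in range(h)]
    for x0, y0, patch in targets:
        for dy, row in enumerate(patch):
            for dx, value in enumerate(row):
                x, y = x0 + dx, y0 + dy
                if 0 <= x < w and 0 <= y < h:  # clip to the background
                    total[y][x] += value
                    count[y][x] += 1
    return [
        [
            total[y][x] // count[y][x] if count[y][x] else background[y][x]
            for x in range(w)
        ]
        for y in range(h)
    ]
```

Averaging keeps both overlapping objects visible as a blend instead of letting one patch overwrite the other.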

Optionally, before storing the original image, the mask image and the target original image in the database, the method further comprises:

Obtaining a summary background image in each preset period;

Storing the summary background image obtained for each period in the database;

Obtaining the first summary background image for generating the video summary comprises:

Dividing the target time period into time sub-segments corresponding to the preset periods, according to the time corresponding to each preset period;

Determining the first preset period, which corresponds to the time sub-segment in the target time period that contains the most trajectories;

Obtaining from the database the first summary background image corresponding to the first preset period.
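The background selection above can be sketched as a counting problem: split the target time period into sub-segments of one preset period each, count how many trajectories touch each sub-segment, and pick the busiest one. The sketch assumes integer timestamps (for example frame numbers or seconds), sub-segments aligned to the start of the target time period, and ties resolved in favour of the earliest sub-segment; none of these details is stated in the text, and the function name `pick_background_period` is hypothetical.

```python
def pick_background_period(target_start, target_end, period, tracks):
    """Return the index of the sub-segment that contains the most trajectories.

    target_start, target_end: bounds of the target time period (integers).
    period: length of one preset period (integer, > 0).
    tracks: list of (start, end) times of the trajectories in the target
            time period (hypothetical layout).
    """
    length = target_end - target_start
    n = max(1, -(-length // period))  # ceiling division, at least one segment
    counts = [0] * n
    for start, end in tracks:
        first = max(0, (start - target_start) // period)
        last = min(n - 1, (end - target_start) // period)
        for i in range(first, last + 1):
            counts[i] += 1  # the trajectory is present in sub-segment i
    return max(range(n), key=counts.__getitem__)
```

The returned index would then be used to look up the summary background image stored for that preset period.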

Optionally, before obtaining the target retrieval condition, the method further comprises:

Displaying a user interaction interface according to a user instruction;

Receiving and saving the target retrieval condition, the preset translation condition and the preset period for generating summary background images that the user inputs through the user interaction interface;

The method further comprises:

Upon receiving a start request input by the user through the user interaction interface, performing the step of searching the established database;

Upon receiving an interrupt request input by the user through the user interaction interface, ending the video summary generation process.

In a second aspect, an embodiment of the present application provides a device for generating a video summary, the device comprising:

A retrieval module, configured to obtain a target retrieval condition and search an established database to obtain a first trajectory set containing the trajectories that satisfy the target retrieval condition, wherein the database stores, for each trajectory extracted from video frames containing target objects, its trajectory information and target original images, and the trajectory information of each trajectory includes information on its overlap state with other trajectories;

A combination module, configured to divide, according to the overlap state information, at least two trajectories in the first trajectory set that overlap one another into a group, each group being determined as one combined trajectory;

A translation module, configured to translate, along the time axis, those of the trajectories to be translated that satisfy a preset translation condition to the same target time period, wherein the trajectories to be translated include the combined trajectories and/or the trajectories in the first trajectory set that do not overlap;

A first obtaining module, configured to obtain from the database the first target original images corresponding to the trajectories in the target time period;

A second obtaining module, configured to obtain a first summary background image for generating the video summary;

A splicing module, configured to splice the first target original images with the first summary background image to generate the video summary.

Optionally, the target retrieval condition includes a retrieval time period and/or retrieval attribute information of a target object;

When the retrieval condition includes only the retrieval time period, the retrieval module is specifically configured to:

Search the database according to the retrieval time period to obtain the trajectories of the target objects within the retrieval time period;

When the retrieval condition includes only the retrieval attribute information of the target object, the retrieval module is specifically configured to:

Search the database according to the retrieval attribute information of the target object to obtain the trajectories of the target objects that match the retrieval attribute information;

When the retrieval condition includes both the retrieval time period and the retrieval attribute information of the target object, the retrieval module is specifically configured to:

Search the database according to the retrieval time period and the retrieval attribute information of the target object to obtain the trajectories, within the retrieval time period, of the target objects that match the retrieval attribute information.

Optionally, the device further comprises:

A first extraction module, configured to extract the target objects from an input video;

A second extraction module, configured to extract trajectory information and attribute information of each target object, wherein the trajectory information includes movement information of the trajectory and information on its overlap state with other trajectories;

A first storage module, configured to store the trajectory information and attribute information of each target object in a video structured target description file;

A database generation module, configured to generate the database according to the video structured target description file;

A third extraction module, configured to extract, from the video frames containing the target object, the original image and the mask image of each frame corresponding to the trajectory information, and to determine, according to the original image and the mask image, the target original image of each frame corresponding to the trajectory information;

A second storage module, configured to store the original image, the mask image and the target original image in the database.

Optionally, the third extraction module comprises:

A first extraction submodule, configured to extract a motion mask of the target object from the video frame containing the target object;

A first determining submodule, configured to determine an initial mask image according to the motion mask;

A second determining submodule, configured to determine the edge point set of the initial mask image;

A second extraction submodule, configured to extract the convex set from the edge point set to form the convex hull point set of the mask image;

A filling submodule, configured to fill the convex hull corresponding to the convex hull point set to obtain the final mask image.

Optionally, the translation module comprises:

A queue establishing submodule, configured to establish a to-be-translated queue and a summary queue;

A first storage submodule, configured to take the trajectories in the first trajectory set that are not in the target time period as trajectories to be translated and store them in the to-be-translated queue, wherein the trajectories to be translated are the combined trajectories and the non-overlapping trajectories in the first trajectory set that are not in the target time period;

A second storage submodule, configured to store the trajectories in the first trajectory set that are in the target time period in the summary queue;

A third extraction submodule, configured to extract the current trajectory to be translated from the to-be-translated queue in turn, and to obtain, according to the original images in the database, the rectangular box of each target object in the video frames corresponding to the current trajectory to be translated;

An operation submodule, configured to calculate the overlap area between the rectangular box of each target object and the rectangular box of the target object in the video frames corresponding to each trajectory already stored in the summary queue;

A third storage submodule, configured to translate the current trajectory to be translated to the target time period and store it in the summary queue when the overlap area is less than or equal to a preset overlap parameter threshold;

The first obtaining module is specifically configured to:

Obtain the first target original images corresponding to the trajectories in the summary queue.

Optionally, the trajectory information of each trajectory further includes a target box information set of the target object;

The splicing module comprises:

A third determining submodule, configured to determine, according to the target box information set of the target object in the trajectory information, the first position of each first target original image in the first summary background image;

A video summary generation submodule, configured to copy each first target original image to its corresponding first position in the first summary background image to generate the video summary.

Optionally, the third determining submodule is specifically configured to:

If the target objects in the first target original images overlap, set the pixel values of the overlapping part of the corresponding trajectories to the mean of the pixel values of the target original images of the target objects, and the pixel values of the non-overlapping part to the pixel values of the target original image of the respective target object, to obtain an image to be copied;

The splicing module is specifically configured to:

Copy each image to be copied, and each first target original image whose target object does not overlap, to the corresponding first position in the first summary background image to generate the video summary.

Optionally, the device further comprises:

An operation module, configured to obtain a summary background image in each preset period;

A third storage module, configured to store the summary background image obtained for each period in the database;

The second obtaining module comprises:

A dividing submodule, configured to divide the target time period into time sub-segments corresponding to the preset periods, according to the time corresponding to each preset period;

A fourth determining submodule, configured to determine the first preset period, which corresponds to the time sub-segment in the target time period that contains the most trajectories;

A background image obtaining submodule, configured to obtain from the database the first summary background image corresponding to the first preset period.

Optionally, the device further comprises:

A display module, configured to display a user interaction interface according to a user instruction;

A receiving module, configured to receive and save the target retrieval condition, the preset translation condition and the preset period for generating summary background images that the user inputs through the user interaction interface;

An execution module, configured to perform the step of searching the established database upon receiving a start request input by the user through the user interaction interface;

An ending module, configured to end the video summary generation process upon receiving an interrupt request input by the user through the user interaction interface.

In a third aspect, an embodiment of the present application provides a computer device comprising a processor and a memory, wherein:

The memory is configured to store a computer program;

The processor is configured to implement the method steps described in the first aspect of the embodiments of the present application when executing the computer program stored in the memory.

In a fourth aspect, an embodiment of the present application provides a storage medium for storing executable code, the executable code being used to perform, at runtime, the method steps described in the first aspect of the embodiments of the present application.

In a fifth aspect, an embodiment of the present application provides an application program for performing, at runtime, the method steps described in the first aspect of the embodiments of the present application.

In the method and device for generating a video summary provided by the embodiments of the present application, the established database is searched to obtain the trajectories that satisfy the target retrieval condition; the overlapping trajectories among them are combined into combined trajectories; the combined trajectories of different time periods and the non-overlapping trajectories are then translated to the same target time period; and finally the target original images corresponding to the trajectories in the target time period are spliced with the summary background image to generate the video summary. When generating the video summary, the embodiments of the present application combine the trajectories of multiple overlapping target objects into one combined trajectory and translate it as a whole along the time axis, which avoids losing some of the overlapping trajectories during translation and improves the visual effect of the generated video summary.

Brief Description of the Drawings

To describe the technical solutions of the embodiments of the present application and of the prior art more clearly, the drawings needed for the embodiments and the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of a method for generating a video summary according to an embodiment of the present application;

FIG. 2 is a schematic flowchart of a method for generating a video summary according to another embodiment of the present application;

FIG. 3 is a detailed schematic flowchart of S205 in the embodiment shown in FIG. 2;

FIG. 4 is a detailed schematic flowchart of S105 in the embodiment shown in FIG. 2;

FIG. 5 is a detailed schematic flowchart of S103 in the embodiment shown in FIG. 2;

FIG. 6 is a schematic flowchart of a method for generating a video summary according to still another embodiment of the present application;

FIG. 7 is a schematic structural diagram of a device for generating a video summary according to an embodiment of the present application;

FIG. 8 is a schematic structural diagram of a device for generating a video summary according to another embodiment of the present application;

FIG. 9 is a detailed schematic structural diagram of the third extraction module 850 in the embodiment shown in FIG. 8;

FIG. 10 is a detailed schematic structural diagram of the translation module 730 in the embodiment shown in FIG. 8;

FIG. 11 is a detailed schematic structural diagram of the second acquisition module 750 in the embodiment shown in FIG. 8;

FIG. 12 is a schematic structural diagram of a device for generating a video summary according to still another embodiment of the present application.

Detailed Description

To make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the drawings and embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.

To improve the visual effect of a video summary generated when target trajectories are complex, the embodiments of the present application provide a method and a device for generating a video summary.

The method for generating a video summary provided by the embodiments of the present application is introduced first.

The method may be executed by a video summary controller capable of generating video summaries. The method may be implemented by at least one of software, a hardware circuit, and a logic circuit provided in the video summary controller. The video summary controller may be applied to a video surveillance system or to the server side of a video website.

As shown in FIG. 1, a method for generating a video summary provided by an embodiment of the present application may include the following steps:

S101: Acquire a target retrieval condition and search an established database to obtain a first trajectory set containing the trajectories that meet the target retrieval condition.

The database stores the trajectory information and the target original image of each trajectory extracted from the video frames that contain target objects, and the trajectory information of each trajectory includes its overlap state information with respect to other trajectories. The database may further include the attribute information of each target object, the original video frames, and the mask map of the target object and/or the background image in each original frame. The overlap state information may be the identifiers of the trajectories of the target objects that overlap; an identifier may be the name of a trajectory, a trajectory number, or any other symbol that characterizes the trajectory. The trajectory information may include: the identifier of the trajectory of the target object, the number of trajectory points of the target object, the frame number of each trajectory point, the time information of the trajectory, the spatial information of the trajectory, and/or the overlap state information with other trajectories. The attribute information may include: the appearance time of the target object, the moving direction of the target object, the license plate number, the vehicle model, the vehicle brand, the vehicle color, the color of a person's clothing, the person's age, the person's height, whether the person wears glasses, and/or whether the person carries a backpack or bag. The target original image may be the original image of the video frame, or the image obtained by ANDing the mask map with the original image of the video frame.
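As an illustration of how one database record might carry the trajectory information listed above, the following is a minimal sketch. The class name `TrajectoryRecord` and all field names are assumptions made for this example; the application does not prescribe a schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TrajectoryRecord:
    """Hypothetical record for one trajectory; field names are illustrative only."""
    track_id: str                            # identifier of the trajectory
    frame_indices: List[int]                 # frame number of each trajectory point
    start_time: float                        # time information: first appearance
    end_time: float                          # time information: last appearance
    boxes: List[Tuple[int, int, int, int]]   # spatial information: (x, y, w, h) per point
    overlapping_ids: List[str] = field(default_factory=list)  # overlap state information
    attributes: Dict[str, str] = field(default_factory=dict)  # e.g. {"vehicle_color": "red"}

    @property
    def num_points(self) -> int:
        """Number of trajectory points."""
        return len(self.frame_indices)


record = TrajectoryRecord("t1", [10, 11, 12], 0.40, 0.48, [(0, 0, 5, 5)] * 3, ["t2"])
```

Storing the overlap state as a list of trajectory identifiers matches the description that the overlap state information may be the identifiers of the overlapping trajectories.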

The target retrieval condition may be any combination of the items in the attribute information of the target object, or it may be a time period. The established database is searched, and the trajectories of the target objects that meet the target retrieval condition are retrieved from it.

Optionally, the target retrieval condition includes a retrieval time period and/or retrieval attribute information of the target object.

When the retrieval condition includes only the retrieval time period, the step of searching the established database may include:

searching the database according to the retrieval time period to obtain the trajectory of each target object in the retrieval time period.

When the target retrieval condition is a retrieval time period, the trajectory of each target object within that period is obtained. For example, given a 5-hour video running from 7:00 to 12:00 and a retrieval time period of 8:00 to 9:00, the trajectories of the target objects between 8:00 and 9:00 are extracted. This embodiment amounts to cutting a time segment out of the original video: the period in which target objects are most active can be extracted, which reduces the input data for the subsequent video summary generation and lowers the amount of computation.

When the retrieval condition includes only the retrieval attribute information of the target object, the step of searching the established database may include:

searching the database according to the retrieval attribute information of the target object to obtain the trajectory of each target object that matches the retrieval attribute information.

When the target retrieval condition is any combination of the items in the attribute information of the target object, the established database is searched and the trajectories of the target objects that match the target retrieval condition are retrieved. For example, if the target retrieval condition is a man 1.75 meters tall, aged between 40 and 45, and wearing a white down jacket, the trajectories of the target objects that satisfy this condition can be extracted from the database. In this embodiment, the target retrieval condition restricts which target objects in the video are considered, which ensures the accuracy of the target objects and makes it easier to extract the trajectories of the target objects that meet the requirement.

When the retrieval condition includes both the retrieval time period and the retrieval attribute information of the target object, the step of searching the established database may include:

searching the database according to the retrieval time period and the retrieval attribute information of the target object to obtain the trajectory of each target object that matches the retrieval attributes within the retrieval time period.

Combining the two embodiments above, this embodiment obtains the trajectories of the target objects that match the attribute information within the retrieval time period. It both selects a period in which target objects are active and restricts the attributes of the target objects, so the trajectories obtained are more accurate than in either of the two embodiments above.

The target retrieval condition may be preset, that is, set before the video is obtained. This is mostly used for fixed scenes, for example when a target object appears at the same place during the same time period on many consecutive days; for such scenes, a preset target retrieval condition avoids setting the same condition repeatedly. The target retrieval condition may also be entered by the user according to the actual situation; for example, when the user needs to retrieve the trajectory of a specific target object within a specific time period, the user can set the target retrieval condition according to the attribute information of that target object. Both are reasonable.

When the established database is searched, an SQL (Structured Query Language) query statement may be generated from the target retrieval condition and used to search the database. The query statement is the most commonly used statement in SQL, which is a database query and programming language for accessing data and for querying, updating, and managing databases. An SQL query statement uses a select command to pick, from the tables of the database, the data that satisfies the target retrieval condition. That data may be the trajectory identifier of a target object, from which trajectory information such as the time information and spatial information of the trajectory can be determined.
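The query generation described above could look roughly like the following sketch, which uses Python's standard `sqlite3` module. The table name `trajectories`, its columns, and the helper `build_query` are assumptions made for illustration, and so is the choice to keep every trajectory whose time span intersects the retrieval time period. Times are given in hours to mirror the 8:00 to 9:00 example.

```python
import sqlite3
from typing import Dict, List, Optional, Tuple

# Hypothetical attribute columns. Column names cannot be bound as SQL parameters,
# so they are checked against this whitelist before being placed in the statement.
ALLOWED_COLUMNS = {"clothing_color", "vehicle_color", "plate_number"}


def build_query(time_range: Optional[Tuple[float, float]] = None,
                attributes: Optional[Dict[str, str]] = None) -> Tuple[str, List[object]]:
    """Build a parameterized SELECT from a retrieval period and/or attribute filters."""
    clauses: List[str] = []
    params: List[object] = []
    if time_range is not None:
        # Assumption: a trajectory matches if its time span intersects the period.
        clauses.append("end_time >= ? AND start_time <= ?")
        params.extend(time_range)
    for column, value in (attributes or {}).items():
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown attribute: {column}")
        clauses.append(f"{column} = ?")
        params.append(value)
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    return "SELECT track_id FROM trajectories" + where + " ORDER BY start_time", params


# Small in-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trajectories "
             "(track_id TEXT, start_time REAL, end_time REAL, clothing_color TEXT)")
conn.executemany("INSERT INTO trajectories VALUES (?, ?, ?, ?)", [
    ("t1", 7.5, 8.2, "white"),
    ("t2", 8.5, 8.7, "black"),
    ("t3", 10.0, 10.5, "white"),
    ("t4", 8.9, 9.3, "white"),
])

sql, params = build_query(time_range=(8.0, 9.0), attributes={"clothing_color": "white"})
first_track_set = [row[0] for row in conn.execute(sql, params)]  # track identifiers
```

The same helper covers all three cases in the text: a time period only, attributes only, or both. Values are passed as bound parameters rather than formatted into the statement.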

S102: According to the overlap state information, divide the trajectories in the first trajectory set that overlap into groups of at least two trajectories, and determine each group as one combined trajectory.

The overlap state information may be the identifiers of the trajectories of the target objects that overlap; an identifier may be the name of a trajectory, a trajectory number, or any other symbol that characterizes the trajectory. The first trajectory set is the set formed by the trajectories of the target objects that were extracted from the database and satisfy the target retrieval condition.

To improve the video summary when the trajectories of the target objects are complex, overlapping trajectories are treated as a whole in the step of translating trajectories along the time axis: the overlapping trajectories in the first trajectory set are combined into a combined trajectory. The order of the trajectories on the time axis within a combined trajectory must remain unchanged.
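One way to form the groups in S102 is to treat the overlap state information as edges of a graph and take its connected components. The sketch below does this with a small union-find structure; treating overlap as transitive (if A overlaps B and B overlaps C, all three form one group) is an assumption of this example, as is the function name `group_overlapping`.

```python
from typing import Dict, List


def group_overlapping(overlaps: Dict[str, List[str]]) -> List[List[str]]:
    """Group trajectory ids so that overlapping trajectories end up together.

    `overlaps` maps each trajectory id in the first trajectory set to the ids
    it overlaps with (the overlap state information).
    """
    parent = {tid: tid for tid in overlaps}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for tid, others in overlaps.items():
        for other in others:
            if other in parent:            # ignore ids outside the first trajectory set
                parent[find(tid)] = find(other)

    groups: Dict[str, List[str]] = {}
    for tid in overlaps:                   # input order is preserved inside each group
        groups.setdefault(find(tid), []).append(tid)
    return list(groups.values())


groups = group_overlapping({"t1": ["t2"], "t2": ["t1", "t3"], "t3": ["t2"], "t4": []})
combined = [g for g in groups if len(g) >= 2]   # combined trajectories
singles = [g[0] for g in groups if len(g) == 1] # trajectories that do not overlap
```

Groups with two or more members correspond to combined trajectories; single-member groups are the trajectories that do not overlap with any other.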

S103: Translate, along the time axis, the trajectories among the trajectories to be translated that satisfy a preset translation condition to the same target time period.

The trajectories to be translated include the combined trajectories, each composed of at least two overlapping trajectories in the first trajectory set, and/or the trajectories in the first trajectory set that do not overlap. The target time period may be preset by the user, or it may be a segment of the playing time of the whole video. The video summary to be generated is not simply a concatenation of motion segments; it is a condensed video formed by translating the trajectories of target objects that appear in different time periods to the same time period. Translating the trajectory of a target object shifts the trajectory in time only and does not shift its spatial position.
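The time-only translation can be sketched as follows. This passage does not spell out the preset translation condition, so the example simply aligns the earliest frame of every group with the start of the target time period; that policy, the function name `shift_to_target`, and the representation of a group as a mapping from trajectory id to frame numbers are all assumptions.

```python
from typing import Dict, List


def shift_to_target(groups: List[Dict[str, List[int]]],
                    target_start: int) -> List[Dict[str, List[int]]]:
    """Shift every group in time so that its earliest frame lands on target_start.

    Each group is either a combined trajectory (several ids) or a single
    non-overlapping trajectory (one id). Only frame numbers change; spatial
    positions are untouched.
    """
    shifted = []
    for group in groups:
        earliest = min(min(frames) for frames in group.values())
        offset = target_start - earliest  # one offset per group keeps relative timing
        shifted.append({tid: [f + offset for f in frames]
                        for tid, frames in group.items()})
    return shifted


to_translate = [
    {"t1": [100, 101, 102], "t2": [101, 102, 103]},  # combined trajectory
    {"t4": [500, 501]},                              # non-overlapping trajectory
]
translated = shift_to_target(to_translate, target_start=0)
```

Because a single offset is applied to the whole group, the trajectories inside a combined trajectory keep their order and spacing on the time axis.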

S104: Obtain, from the database, the first target original image corresponding to each trajectory in the target time period.

The target original image may be the original image of the video frame, or the image obtained by ANDing the mask map with the original image of the video frame. The first target original image is the target original image of any one of the trajectories. When the first target original image is the original image of the video frame, it may be obtained according to the trajectory information of the target object or according to the attribute information of the target object.

S105: Obtain a first summary background image for generating the video summary.

In a video frame, the image formed by everything other than the first target original image of the target object is the background image of the video. The generated video summary cannot consist of the first target original images alone; it must also include the first summary background image.

In this embodiment, the first summary background image may be a static summary background image computed by a static background determination method, or a dynamic summary background image determined by a dynamic background determination method. In special scenes, or when the video is very long, the background of the video differs greatly at different times. To reduce the influence of the background image on the video summary and improve the result, the background images of the video at different time periods need to be saved as summary background images.

S106: Stitch each first target original image with the first summary background image to generate the video summary.

Stitching a first target original image with the first summary background image may consist of copying the first target original image to the position of the target object in the first summary background image. However, because the translated trajectories include both overlapping and non-overlapping trajectories, the video summary cannot be obtained by simply copying and pasting the first target original images.

With this embodiment, an established database is searched to obtain the trajectories that satisfy a target retrieval condition; the trajectories among them that overlap are combined into combined trajectories; the combined trajectories and the non-overlapping trajectories from different time periods are then translated to the same target time period; and finally the target original images corresponding to the trajectories in the target time period are stitched with the summary background image to generate the video summary. When generating the video summary, the embodiments of the present application combine the overlapping trajectories of multiple target objects into one combined trajectory and translate it as a whole along the time axis, which avoids losing some of the overlapping trajectories during translation and improves the visual effect of the generated video summary.

Optionally, the method for generating a video summary further includes:

ending the video summary generation process when an interrupt request entered by the user is received; and performing the step of acquiring a target retrieval condition and searching the established database when a start request entered by the user is received.

To improve the user experience while a video summary is being generated, the user may enter an interrupt request at any time, for example on finding that the target retrieval condition was set incorrectly. When the interrupt request is received, the video summary generation process ends. The user can then reset the target retrieval condition as needed and enter a start request; when the start request is received, the established database is searched again according to the target retrieval condition.

As shown in FIG. 2, in the method for generating a video summary provided by an embodiment of the present application, before the step of acquiring the target retrieval condition, the method may further include:

S201: Extract each target object from the input video.

A target object is a target that has characteristic information, such as a person, a car, or a ship.

S202: Extract the trajectory information and attribute information of each target object.

The trajectory information may include: the identifier of the trajectory of the target object, the number of trajectory points of the target object, the frame number of each trajectory point, the time information of the trajectory, the spatial information of the trajectory, and/or the overlap state information with other trajectories. The attribute information may include: the appearance time of the target object, the moving direction of the target object, the license plate number, the vehicle model, the vehicle brand, the vehicle color, the color of a person's clothing, the person's age, the person's height, whether the person wears glasses, and/or whether the person carries a backpack or bag.

After the input video is obtained, video structured target extraction is performed on it. Video structured target extraction includes target object extraction and target attribute extraction: target object extraction yields a target trajectory description file, and target attribute extraction yields a target attribute description file; both files are contained in the video structured target description file. Target object extraction and target attribute extraction may be performed simultaneously. Target attribute extraction uses a preset video frame extraction method to extract one or more video frame images containing the target object, applies attribute classifiers to them, and combines the classifier results to obtain the target attribute description file of the target object. An attribute classifier identifies one category of attribute of the target object and, through its internal analysis, yields the information for that category. Target object extraction takes the specific attributes and motion attributes of the target object obtained by target attribute extraction, performs multi-target tracking, and associates and fuses the specific attributes of each tracked target object with its motion attributes to obtain the target trajectory description file of the target object. This ensures that, when multiple targets appear at the same time, each target object corresponds one-to-one with its trajectory; it prevents the trajectory of another target object from affecting the trajectory of the current one and improves the accuracy of target object extraction.

S203: Store the trajectory information and attribute information of each target object in a video structured target description file.

The video structured target description file stores the attribute information and trajectory information of the target objects. Existing video structured target description files are usually generated on an industrial computer or a server, but they can also be generated on an embedded platform, such as a DSP (Digital Signal Processor) or an ARM (Advanced RISC Machines) processor.

S204: Generate the database according to the video structured target description file.

After the video structured target description file is obtained, the database is built from the attribute information in it, and the database manages all the attribute information, including: the appearance time of the target object, the moving direction of the target object, the license plate number, the vehicle model, the vehicle brand, the vehicle color, the color of a person's clothing, the person's age, the person's height, whether the person wears glasses, and/or whether the person carries a backpack or bag.

Before the database is built, the trajectories of the target objects are analyzed and their trajectory information is extracted. The overlap state information of a target object can be derived from its trajectory information, so the database stores the overlap state information of the target objects.

S205: From the video frames containing the target object, extract the original image and the mask map of each frame corresponding to the trajectory information, and determine the target original image of each such frame from the original image and the mask map.

The target original image is the image of the target object. In this embodiment, the target original image may be obtained by ANDing the mask map with the original image of each video frame. The mask map only represents the contour of the target object and contains no image content; after it is ANDed with the original video frame, the image of the target object within the mask region is obtained, which is more accurate than extracting the image of the target object directly from the original video frame. The mask map may be extracted by a preset background modeling method, which may be any one of the color background model method, the average background model method, the Gaussian background model method, the CodeBook background model method, and similar methods.
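The AND operation described above can be illustrated with a minimal sketch on a grayscale frame stored as nested lists. The function name `apply_mask` and the convention that a nonzero mask value marks the target object are assumptions for this example; a production system would more likely use an image library.

```python
from typing import List

Image = List[List[int]]


def apply_mask(frame: Image, mask: Image) -> Image:
    """Keep the frame pixel where the mask is set and zero it everywhere else."""
    return [[pixel if m else 0 for pixel, m in zip(frame_row, mask_row)]
            for frame_row, mask_row in zip(frame, mask)]


frame = [[10, 20, 30],
         [40, 50, 60]]
mask = [[0, 1, 1],
        [0, 0, 1]]
target_original = apply_mask(frame, mask)
```

Only the pixels inside the mask region survive, so background pixels that would fall inside a rectangular bounding box around the target are excluded.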

As shown in FIG. 3, in a method for generating a video summary according to an embodiment of the present application, the step of extracting, from the video frames containing the target object, the mask map of each frame corresponding to the trajectory information may include:

S2051: Extract the motion mask of the target object from the video frame containing the target object.

S2052: Determine an initial mask map according to the motion mask.

The motion mask is the two-dimensional data that makes up the mask map. By extracting the motion mask of the target object, the mask map of the target object can be determined from the two-dimensional data of the motion mask using any one of the preset background modeling methods mentioned above. The mask map characterizes the contour of the target object and is used to separate the target original image from the background image in the original video frame.

S2053: Determine the edge point set of the initial mask map.

S2054: Extract the convex set from the edge point set to form the convex hull point set of the mask map.

S2055: Fill the convex hull corresponding to the convex hull point set to obtain the final mask map.

In complex scenes, mask map extraction is prone to being incomplete. To complete the mask map and further improve the video summary, the mask map of the target object can be post-processed after it is extracted: the convex hull point set of the mask map is built from its edge point set, and the convex hull corresponding to that point set is filled. This completes the mask map so that it reflects the contour of the target object as fully as possible.
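Steps S2053 to S2055 can be sketched in plain Python as follows. The application does not name a hull algorithm, so the use of Andrew's monotone chain here is an assumption, as are the helper names and the 4-neighbour definition of an edge point.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y)
Mask = List[List[int]]


def cross(o: Point, a: Point, b: Point) -> int:
    """Z component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def edge_points(mask: Mask) -> List[Point]:
    """S2053: foreground pixels with a 4-neighbour that is background or off-image."""
    h, w = len(mask), len(mask[0])
    points = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            neighbours = ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1))
            if any(not (0 <= nx < w and 0 <= ny < h) or not mask[ny][nx]
                   for nx, ny in neighbours):
                points.append((x, y))
    return points


def convex_hull(points: List[Point]) -> List[Point]:
    """S2054: Andrew's monotone chain; hull vertices in one consistent orientation."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def half(sequence):
        chain: List[Point] = []
        for p in sequence:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]

    return half(pts) + half(reversed(pts))


def fill_hull(mask: Mask, hull: List[Point]) -> Mask:
    """S2055: set every pixel inside or on the hull; copy the mask if the hull is degenerate."""
    if len(hull) < 3:
        return [row[:] for row in mask]
    n = len(hull)
    return [[1 if all(cross(hull[i], hull[(i + 1) % n], (x, y)) >= 0 for i in range(n)) else 0
             for x in range(len(mask[0]))]
            for y in range(len(mask))]


initial_mask = [[0, 0, 0, 0, 0],
                [0, 1, 0, 1, 0],
                [0, 1, 0, 1, 0],
                [0, 1, 1, 1, 0],
                [0, 0, 0, 0, 0]]
hull = convex_hull(edge_points(initial_mask))
final_mask = fill_hull(initial_mask, hull)
```

In the example, the U-shaped initial mask has a gap in the middle; filling its convex hull closes the gap and yields a solid 3×3 block.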

S206: Store the original image, the mask map, and the target original image in the database.

The original image, mask map, and target original image corresponding to the target object are stored in the database so that, during video summary generation, the corresponding target original image can be found quickly in the database according to the attribute information of the target object.

Optionally, before the step of storing the original image, the mask map, and the target original image in the database, the method for generating a video summary may further include:

Step 1: Obtain a summary background image at a preset period.

The preset period is the period at which summary background images are saved. It may be set by the user according to actual needs, or it may be an empirical value preset by a person skilled in the art.

通过掩码图与视频帧的原图相与可以得到目标原图,由于掩码图体现了目标对象的轮廓,掩码图只是表示目标对象的轮廓,而不包含图像内容,在与视频帧原图相与之后,即得到掩码图的区域的目标对象的图像,相较于直接从视频帧原图中提取目标对象的图像更为准确,并且在保障目标对象提取完整的前提下,去除了目标框内的背景部分,提升目标原图与摘要背景图拼接效果。在视频分析的过程中,自始至终都维护了一张背景图,此背景图每帧都进行更新,在到达预设周期时间时,背景图自动保存一次。此背景图的更新方法是:对每帧图像提取运动掩码图,像素为运动前景的,则对应的背景图的像素不进行更新,否则根据更新公式更新背景图的像素,其中,更新公式为:a=b×k+c×(1-k),a为当前帧的背景图像素值,b为前一帧的背景 图像素值,k为预设值,k的取值范围为0至1的任意数,c为当前视频帧像素值。The target original image can be obtained by the mask map and the original image of the video frame. Since the mask map reflects the outline of the target object, the mask map only represents the outline of the target object, and does not contain the image content, and the original video frame. After the image phase and the image of the target object of the region obtained by the mask map, it is more accurate than extracting the image of the target object directly from the original image of the video frame, and the target object is completely extracted, and the image is removed. The background part of the target frame enhances the stitching effect of the original image and the abstract background image. In the process of video analysis, a background image is maintained from beginning to end. This background image is updated every frame. When the preset cycle time is reached, the background image is automatically saved once. The updating method of the background image is: extracting a motion mask map for each frame image, and the pixels are motion foreground, and the pixels of the corresponding background image are not updated, otherwise the pixels of the background image are updated according to the update formula, wherein the update formula is :a=b×k+c×(1-k), a is the background image pixel value of the current frame, b is the background image pixel value of the previous frame, k is the preset value, and k is in the range of 0 to Any number of 1, c is the current video frame pixel value.

In the second step, the summary background image obtained for each period is stored in the database.

The background changes as the video plays, but it changes far less than the target objects do. It is therefore sufficient to save the background image periodically, which keeps the background realistic without adding much computation.

As shown in FIG. 4, in a method for generating a video summary according to an embodiment of the present application, the step of obtaining a first summary background image for generating the video summary may include:

S1051: Divide the target time period into time sub-segments corresponding to the preset periods, according to the time corresponding to each preset period.

Because background images are saved periodically, dividing the target time period into the corresponding time sub-segments guarantees that a matching summary background image can always be obtained later when the summary background image for the video summary is needed. No further algorithm is required to search for a better summary background image, which saves computation.

S1052: Determine the first preset period, which corresponds to the time sub-segment of the target time period that contains the most trajectories.

S1053: Obtain, from the database, the first summary background image corresponding to the first preset period.

If a time sub-segment contains the largest number of trajectories, the target objects move most frequently during that sub-segment, so the background image of that sub-segment should serve as the summary background image of the video summary. Only a small share of the trajectories then differ from their actual background. Compared with a static background image, this reflects the actual trajectories of the target objects more faithfully and improves the video summary.
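Steps S1051 to S1053 can be sketched as follows. This is a minimal illustration under stated assumptions: each trajectory is reduced to a hypothetical `(start, end)` time span, a trajectory counts toward every sub-segment it overlaps, and ties go to the earliest sub-segment. None of these details is fixed by the text.

```python
import math


def pick_background_period(track_spans, target_start, target_end, period):
    """Return (index of the busiest sub-segment, per-segment track counts).

    track_spans: list of (start, end) times, one per trajectory.
    period: length of the preset period, in the same unit as the times.
    """
    if period <= 0 or target_end <= target_start:
        raise ValueError("need a positive period and a non-empty target period")
    # S1051: split the target time period into sub-segments of one period each.
    n_segments = math.ceil((target_end - target_start) / period)
    counts = [0] * n_segments
    for start, end in track_spans:
        for i in range(n_segments):
            seg_start = target_start + i * period
            seg_end = min(seg_start + period, target_end)
            # Assumption: a track is counted in every sub-segment it overlaps.
            if start < seg_end and end > seg_start:
                counts[i] += 1
    # S1052: the sub-segment with the most trajectories (earliest on a tie).
    best = max(range(n_segments), key=lambda i: counts[i])
    return best, counts
```

The returned index would then be used as the key for looking up the saved summary background image in the database (S1053).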

With this embodiment, the established database is searched to obtain the trajectories that satisfy the target retrieval condition; the trajectories among them that overlap are combined into combined trajectories; the combined trajectories and the non-overlapping trajectories from different time periods are then translated to the same target time period; and finally the target original images corresponding to the trajectories in the target time period are stitched with the summary background image to generate the video summary. When generating the video summary, the embodiment of the present application combines the overlapping trajectories of multiple target objects into one combined trajectory and translates it as a whole along the time axis. This avoids losing some of the overlapping trajectories during translation and improves the visual effect of the generated video summary. In addition, a database is established to store the attribute information, target original images, and other information of the target objects, which speeds up generation of the video summary. The target original image is obtained by ANDing the mask image with the original image of the video frame. The mask image represents only the contour of the target object and carries no image content, so the AND operation yields the image of the target object within the mask region, which is more accurate than extracting the image of the target object directly from the original video frame.
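One way to picture the combination of overlapping trajectories is a union-find pass over pairwise overlap records. The sketch below assumes that the overlapping state information can be reduced to pairs of track identifiers and that overlap is treated transitively (if A overlaps B and B overlaps C, all three form one combined trajectory). Both points are assumptions for illustration, and `group_overlapping_tracks` is a hypothetical name.

```python
def group_overlapping_tracks(track_ids, overlaps):
    """Split tracks into combined trajectories and non-overlapping tracks.

    track_ids: iterable of hashable track identifiers.
    overlaps: iterable of (id_a, id_b) pairs that overlap each other.
    Returns (combined, singles): a list of groups with at least two
    tracks each, and a list of tracks that overlap with nothing.
    """
    track_ids = list(track_ids)
    parent = {t: t for t in track_ids}

    def find(t):
        # Follow parent links to the group root, halving the path on the way.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in overlaps:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for t in track_ids:
        groups.setdefault(find(t), []).append(t)
    combined = [g for g in groups.values() if len(g) >= 2]
    singles = [g[0] for g in groups.values() if len(g) == 1]
    return combined, singles
```

Each group in `combined` would be translated along the time axis as a single unit, so no member trajectory is separated from the others.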

As shown in FIG. 5, in a method for generating a video summary according to an embodiment of the present application, the step of translating, along the time axis, the to-be-translated trajectories that satisfy the preset translation condition to the same target time period may include:

S1031: Establish a to-be-translated queue and a summary queue.

The to-be-translated queue stores trajectories that have not yet been arranged. It may hold trajectories in the database that have not yet been checked against the preset condition, or trajectories of the first trajectory set that are not in the target time period. The summary queue stores the trajectories used to generate the video summary.

S1032: Take the trajectories of the first trajectory set that are not in the target time period as to-be-translated trajectories, and store them in the to-be-translated queue.

The to-be-translated trajectories are the combined trajectories and the non-overlapping trajectories of the first trajectory set that are not in the target time period.

S1033: Store the trajectories of the first trajectory set that are in the target time period in the summary queue.

S1034: Extract the current to-be-translated trajectory from the to-be-translated queue in turn and, according to the original images in the database, obtain the rectangular frame of each target object in the video frames corresponding to the current to-be-translated trajectory.

The position of a target object in a video frame is contained in a rectangular sub-image whose size depends on the target extraction method. For example, if head targets are extracted, the size of the rectangular sub-image is determined by the size of each head target and by the region needed to contain any one head target as fully as possible; if pedestrian targets are extracted, it is determined in the same way by the size of each pedestrian target and the region needed to contain any one pedestrian target. Because the trajectory of a target object is formed by multiple consecutive video frames, one trajectory has multiple rectangular sub-images. Together they form the rectangular frame of the to-be-translated trajectory, which is the smallest-area rectangle that contains all the rectangular sub-images of the target object.

S1035: Calculate the overlapping area between the rectangular frame of each target object and the rectangular frame of the target object in the video frames corresponding to each trajectory already stored in the summary queue.

S1036: When the overlapping area is less than or equal to a preset overlap parameter threshold, translate the current to-be-translated trajectory to the target time period and store it in the summary queue.

If the overlapping area of the rectangular frames of two trajectories is too large, the two trajectories overlap too much and can be regarded approximately as the same trajectory, so a trajectory whose rectangular frame overlaps too much is not translated. In this embodiment an overlap parameter threshold is preset; it may be set according to actual needs and the specific attributes of the target objects. If the overlapping area of the rectangular frames exceeds the preset overlap parameter threshold, the overlap is considered excessive and the trajectory is not stored in the summary queue. The to-be-translated trajectory is stored in the summary queue only if the overlapping area is less than or equal to the preset overlap parameter threshold.
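Steps S1034 to S1036 can be sketched as follows, under simplifying assumptions: each trajectory is represented by a single rectangular frame `(x, y, w, h)` with the top-left corner as origin, the time shift itself is not modelled, and the names `overlap_area` and `fill_summary_queue` are hypothetical.

```python
from collections import deque


def overlap_area(a, b):
    """Overlapping area of two axis-aligned rectangles given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    width = min(ax + aw, bx + bw) - max(ax, bx)
    height = min(ay + ah, by + bh) - max(ay, by)
    # A negative extent means the rectangles do not intersect on that axis.
    return max(width, 0) * max(height, 0)


def fill_summary_queue(pending, summary, threshold):
    """Move pending tracks into the summary queue when they overlap little.

    pending: iterable of (track_id, box) waiting to be translated.
    summary: list of (track_id, box) already in the target time period.
    threshold: preset overlap parameter threshold (an area).
    Returns (new summary list, ids of tracks that were not stored).
    """
    pending = deque(pending)
    summary = list(summary)  # do not modify the caller's list
    skipped = []
    while pending:
        track_id, box = pending.popleft()
        # S1035/S1036: compare against every track already in the summary queue.
        if all(overlap_area(box, other) <= threshold for _, other in summary):
            summary.append((track_id, box))
        else:
            skipped.append(track_id)
    return summary, skipped
```

Because each accepted track is appended before the next one is checked, later tracks are also compared against tracks that were translated earlier in the same pass.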

This embodiment establishes two queues, a to-be-translated queue and a summary queue. Trajectories of the first trajectory set that are not in the target time period are stored in the to-be-translated queue, trajectories of the first trajectory set that are in the target time period are stored in the summary queue, and the trajectories in the to-be-translated queue that satisfy the preset overlap condition are then stored in the summary queue. Applying this embodiment avoids disorder when trajectories are translated.

Optionally, the step of obtaining, from the database, the first target original image corresponding to each trajectory in the target time period may include:

obtaining the first target original image corresponding to each trajectory in the summary queue.

The trajectory information corresponds both to the target object and to the trajectory of the target object, and it likewise corresponds to the target original image of the target object. The first target original image corresponding to a piece of trajectory information can therefore be extracted from the database according to the trajectory information of the to-be-translated trajectory.

Optionally, stitching each first target original image with the first summary background image to generate the video summary includes:

In the first step, determine the first position of the first target original image in the first summary background image according to the target frame information set of the target object in the trajectory information.

The trajectory information of each trajectory further includes the target frame information set of the target object. The target frame information may include the coordinates of the upper-left corner of the rectangular frame of the target object together with its width and height, for example (x, y, w, h), where x is the abscissa of the upper-left corner of the rectangular frame, y is the ordinate of the upper-left corner, w is the width of the rectangular frame, and h is its height. Alternatively, the target frame information may include the coordinates of the center point of the rectangular frame together with its width and height, for example (m, n, p, q), where m is the abscissa of the center point of the rectangular frame, n is the ordinate of the center point, p is the width of the rectangular frame, and q is its height. As the target moves, it forms a set of target frame information. This set contains information such as the coordinates, length, and direction of the first target original image in the first summary background image, so the first position of the first target original image in the first summary background image can be determined from the target frame information set of the target object in the trajectory information.
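The two target frame representations described above carry the same information, as the short sketch below shows by converting between them. The function names are illustrative, and the usual image convention is assumed: the origin is the top-left corner and y grows downward.

```python
def corner_to_center(box):
    """Convert (x, y, w, h) with a top-left corner into (m, n, p, q) with a center."""
    x, y, w, h = box
    return (x + w / 2, y + h / 2, w, h)


def center_to_corner(box):
    """Convert (m, n, p, q) with a center point into (x, y, w, h) with a top-left corner."""
    m, n, p, q = box
    return (m - p / 2, n - q / 2, p, q)
```

Whichever form is stored, the top-left form is the convenient one for copying a target original image into the summary background image, because it gives the paste position directly.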

In the second step, copy each first target original image to the corresponding first position in the first summary background image to generate the video summary.

When the target original image is stitched with the summary background image, stitching the two directly can go wrong in two ways: the stitching method may place the target original image at a position in the summary background image that deviates from its actual position, or the target original image may contain part of a background that differs from the background image being stitched, so that its position in the summary background image no longer matches the original frame. In this embodiment, therefore, the target original image may be the image obtained by ANDing the original image of the video frame with the mask image. Such a target original image contains no background, and the mask image reflects the true position of the target object in the summary background image. When the target original image is stitched with the summary background image, it can be copied accurately into the contour region outlined by the mask image, which secures the stitching result and improves the visual effect of the video summary.
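The AND operation and the copy into the contour region can be sketched as below. The sketch assumes 8-bit grayscale images held as nested lists, a mask whose pixels are 255 inside the target contour and 0 elsewhere, and a target that fits entirely inside the background at the given position. The function names are hypothetical.

```python
def extract_target(frame, mask):
    """AND each frame pixel with its mask pixel (255 keeps it, 0 clears it)."""
    return [
        [pixel & m for pixel, m in zip(frame_row, mask_row)]
        for frame_row, mask_row in zip(frame, mask)
    ]


def paste_target(background, target, mask, top, left):
    """Copy target pixels into a copy of the background where the mask is set.

    (top, left) is the row and column of the target's upper-left corner in
    the background. Assumes the target lies fully inside the background.
    """
    result = [row[:] for row in background]
    for r, (target_row, mask_row) in enumerate(zip(target, mask)):
        for c, (pixel, m) in enumerate(zip(target_row, mask_row)):
            if m:
                # Only pixels inside the contour overwrite the background.
                result[top + r][left + c] = pixel
    return result
```

Pasting through the mask rather than copying the whole rectangle is what keeps stray background pixels of the source frame out of the summary.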

Optionally, the step of copying each first target original image to the corresponding first position in the first summary background image may include:

In the first step, if target objects in the first target original images overlap, set the pixel values of the overlapping part of the corresponding trajectories to the mean of the pixel values of the target original images of the respective target objects, and set the pixel values of the non-overlapping parts to the pixel values of the target original image of the respective target object, to obtain an image to be copied;

In the second step, copy each image to be copied, and each first target original image whose target object does not overlap, to the corresponding first position in the first summary background image to generate the video summary.

When a video summary is generated, the trajectories of the target objects may or may not overlap. Where they do not overlap, the target original image can be copied directly into the summary background image. Where they overlap, the mean of the pixel values of the target original images of the respective target objects is taken as the pixel value of the overlapping part, and the result is then stitched to generate the video summary. The overlapping part may alternatively take a weighted value of the pixel values of the target original images of the respective target objects; either choice is reasonable.
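The averaging rule for overlapping targets can be sketched as follows. It assumes that all target original images have already been placed on canvases of the same size, each with a matching mask that is truthy where the target is present; `blend_targets` is a hypothetical name, and the unweighted mean is only one of the options mentioned above.

```python
def blend_targets(targets, masks):
    """Build the image to be copied from several overlapping targets.

    targets: list of 2-D images (nested lists) of identical size.
    masks: list of matching 2-D masks, truthy where the target is present.
    Returns (blended, covered): the blended image and a 2-D list of
    booleans marking which pixels belong to at least one target.
    """
    height, width = len(targets[0]), len(targets[0][0])
    blended = [[0] * width for _ in range(height)]
    covered = [[False] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            values = [t[r][c] for t, m in zip(targets, masks) if m[r][c]]
            if values:
                # One value: the target's own pixel. Several: their mean.
                blended[r][c] = sum(values) / len(values)
                covered[r][c] = True
    return blended, covered
```

Replacing the plain mean with a weighted sum would implement the weighted variant; the weights would then be an additional design parameter.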

As shown in FIG. 6, in a method for generating a video summary provided by an embodiment of the present application, before the step of obtaining the target retrieval condition, the method may further include:

S601: Display a user interaction interface according to a user instruction.

The user interaction interface is the interface through which the user and the system interact. It may be a dialog box or a selection screen in a web page, and it prompts the user to input the target retrieval condition, the preset translation condition, and the preset period, where the preset period is used to generate the summary background image.

S602: Receive and save the target retrieval condition, the preset translation condition, and the preset period for generating the summary background image that the user inputs through the user interaction interface.

With this embodiment, the established database is searched to obtain the trajectories that satisfy the target retrieval condition; the trajectories among them that overlap are combined into combined trajectories; the combined trajectories and the non-overlapping trajectories from different time periods are then translated to the same target time period; and finally the target original images corresponding to the trajectories in the target time period are stitched with the summary background image to generate the video summary. When generating the video summary, the embodiment of the present application combines the overlapping trajectories of multiple target objects into one combined trajectory and translates it as a whole along the time axis. This avoids losing some of the overlapping trajectories during translation and improves the visual effect of the generated video summary. In addition, the user interaction interface lets the user set the target retrieval condition and the parameters, which makes the application more flexible and more convenient for the user.

Optionally, the method for generating a video summary may further include:

when a start request input by the user through the user interaction interface is received, performing the step of searching the established database; and when an interrupt request input by the user through the user interaction interface is received, ending the video summary generation process.

To improve the user experience while a video summary is being generated, the user may input an interrupt request at any time, for example on noticing that the target retrieval condition was set incorrectly. Once the interrupt request is received, the video summary generation process ends. The user can then reset the target retrieval condition as needed and input a start request; once the start request is received, the established database is searched again according to the target retrieval condition.

Corresponding to the above method embodiments, an embodiment of the present application provides an apparatus for generating a video summary. As shown in FIG. 7, the apparatus may include:

a retrieval module 710, configured to obtain a target retrieval condition and search the established database to obtain a first trajectory set containing the trajectories that satisfy the target retrieval condition, where the database stores the trajectory information and the target original image of each trajectory extracted from the video frames containing target objects, and the trajectory information of each trajectory includes overlapping state information with respect to other trajectories;

a combination module 720, configured to divide, according to the overlapping state information, at least two overlapping trajectories of the first trajectory set into a group, each group being determined as one combined trajectory;

a translation module 730, configured to translate, along the time axis, the to-be-translated trajectories that satisfy a preset translation condition to the same target time period, where the to-be-translated trajectories include the combined trajectories and/or the non-overlapping trajectories of the first trajectory set;

a first obtaining module 740, configured to obtain, from the database, the first target original image corresponding to each trajectory in the target time period;

a second obtaining module 750, configured to obtain a first summary background image for generating the video summary;

a stitching module 760, configured to stitch each first target original image with the first summary background image to generate the video summary.

With this embodiment, the established database is searched to obtain the trajectories that satisfy the target retrieval condition; the trajectories among them that overlap are combined into combined trajectories; the combined trajectories and the non-overlapping trajectories from different time periods are then translated to the same target time period; and finally the target original images corresponding to the trajectories in the target time period are stitched with the summary background image to generate the video summary. When generating the video summary, the embodiment of the present application combines the overlapping trajectories of multiple target objects into one combined trajectory and translates it as a whole along the time axis. This avoids losing some of the overlapping trajectories during translation and improves the visual effect of the generated video summary.

Optionally, the target retrieval condition may include a retrieval time period and/or retrieval attribute information of the target object;

when the retrieval condition includes only the retrieval time period, the retrieval module may be specifically configured to:

search the database according to the retrieval time period to obtain the trajectory of each target object in the retrieval time period;

when the retrieval condition includes only the retrieval attribute information of the target object, the retrieval module may further be specifically configured to:

search the database according to the retrieval attribute information of the target object to obtain the trajectory of each target object that matches the retrieval attribute information;

when the retrieval condition includes both the retrieval time period and the retrieval attribute information of the target object, the retrieval module may further be specifically configured to:

search the database according to the retrieval time period and the retrieval attribute information of the target object to obtain the trajectory of each target object in the retrieval time period that matches the retrieval attribute information.
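The three retrieval cases can be sketched with an in-memory list standing in for the database. The record layout (`start`, `end`, and `attributes` keys), the rule that a trajectory is "in" the retrieval time period when its time span overlaps it, and exact-match comparison of attributes are all assumptions made for this illustration.

```python
def search_tracks(tracks, time_range=None, attributes=None):
    """Return the tracks that satisfy the retrieval condition.

    tracks: list of dicts with "start", "end" and "attributes" keys
            (a hypothetical record layout for this sketch).
    time_range: optional (start, end) retrieval time period.
    attributes: optional dict of retrieval attribute names and values.
    """
    if time_range is None and attributes is None:
        raise ValueError("give a time period, attribute information, or both")
    results = []
    for track in tracks:
        if time_range is not None:
            start, end = time_range
            # Assumption: a track matches if its span overlaps the period.
            if track["end"] < start or track["start"] > end:
                continue
        if attributes is not None:
            stored = track["attributes"]
            if any(stored.get(name) != value for name, value in attributes.items()):
                continue
        results.append(track)
    return results
```

For example, `search_tracks(tracks, time_range=(15, 30), attributes={"type": "person"})` would return only the person tracks active between times 15 and 30.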

Furthermore, in addition to the retrieval module 710, the combination module 720, the translation module 730, the first obtaining module 740, the second obtaining module 750, and the stitching module 760, as shown in FIG. 8, the apparatus for generating a video summary provided by an embodiment of the present application may further include:

a first extraction module 810, configured to extract each target object from an input video;

a second extraction module 820, configured to extract the trajectory information and attribute information of each target object, where the trajectory information includes movement information of the trajectory and overlapping state information with respect to other trajectories;

a first storage module 830, configured to store the trajectory information and attribute information of each target object in a video structured target description file;

a database generation module 840, configured to generate the database according to the video structured target description file;

a third extraction module 850, configured to extract, from the video frames containing the target object, the original image and the mask image of each frame corresponding to the trajectory information, and to determine, according to the original image and the mask image, the target original image of each frame corresponding to the trajectory information;

a second storage module 860, configured to store the original image, the mask image, and the target original image in the database.

With this embodiment, the established database is searched to obtain the trajectories that satisfy the target retrieval condition; the trajectories among them that overlap are combined into combined trajectories; the combined trajectories and the non-overlapping trajectories from different time periods are then translated to the same target time period; and finally the target original images corresponding to the trajectories in the target time period are stitched with the summary background image to generate the video summary. When generating the video summary, the embodiment of the present application combines the overlapping trajectories of multiple target objects into one combined trajectory and translates it as a whole along the time axis. This avoids losing some of the overlapping trajectories during translation and improves the visual effect of the generated video summary. In addition, a database is established to store the attribute information, target original images, and other information of the target objects, which speeds up generation of the video summary. The target original image is obtained by ANDing the mask image with the original image of the video frame. The mask image represents only the contour of the target object and carries no image content, so the AND operation yields the image of the target object within the mask region, which is more accurate than extracting the image of the target object directly from the original video frame.

As shown in FIG. 9, the third extraction module 850 may include:

a first extraction submodule 851, configured to extract a motion mask of the target object from the video frames containing the target object;

a first determining submodule 852, configured to determine an initial mask image according to the motion mask;

a second determining submodule 853, configured to determine the edge point set of the initial mask image;

a second extraction submodule 854, configured to extract the convex set from the edge point set to form the convex hull point set of the mask image;

a filling submodule 855, configured to fill the convex hull corresponding to the convex hull point set to obtain the final mask image.
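The text does not name a convex hull algorithm, so the sketch below uses Andrew's monotone chain as one common choice for turning an edge point set into a convex hull point set. The function name and the representation of points as `(x, y)` tuples are assumptions, and the filling step is not shown.

```python
def convex_hull(points):
    """Return the convex hull vertices of a set of (x, y) edge points.

    Uses Andrew's monotone chain. Vertices are returned starting at the
    lexicographically smallest point and turning from the x axis toward
    the y axis; collinear points on the hull boundary are dropped.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Z component of the cross product of vectors OA and OB.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]
```

Filling the polygon spanned by the returned vertices would then give a final mask without the holes and notches that a raw motion mask often has.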

As shown in FIG. 10, the translation module 730 may include:

a queue establishment sub-module 731, configured to establish a to-be-translated queue and a summary queue;

a first storage sub-module 732, configured to take the trajectories in the first trajectory set that are not in the target time period as to-be-translated trajectories and store them in the to-be-translated queue, where the to-be-translated trajectories are the combined trajectories and the non-overlapping trajectories in the first trajectory set that are not in the target time period;

a second storage sub-module 733, configured to store the trajectories in the first trajectory set that are in the target time period into the summary queue;

a third extraction sub-module 734, configured to sequentially extract a current to-be-translated trajectory from the to-be-translated queue and, according to the original images in the database, obtain the rectangular frame of each target object in the video frames corresponding to the current to-be-translated trajectory;

an operation sub-module 735, configured to calculate the overlapping area between the rectangular frame of each target object and the rectangular frame of the target object in the video frames corresponding to each trajectory already stored in the summary queue;

a third storage sub-module 736, configured to, when the overlapping area is less than or equal to a preset overlap parameter threshold, translate the current to-be-translated trajectory to the target time period and store it in the summary queue;

The first acquisition module 740 may be specifically configured to:

acquire the first target original image corresponding to each trajectory in the summary queue.
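The queue-based translation described above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: a trajectory is modeled as a mapping from frame index to one rectangle `(x, y, width, height)`, a trajectory is shifted so that its first frame coincides with the start of the target time period, and trajectories that exceed the threshold are simply returned in a `skipped` list (the source does not say what happens to them). The names `Track`, `overlap_area`, `shift`, `max_overlap`, and `build_summary_queue` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)


@dataclass
class Track:
    track_id: str
    boxes: Dict[int, Rect]  # frame index -> rectangular frame of the target object


def overlap_area(a: Rect, b: Rect) -> int:
    """Area of the intersection of two axis-aligned rectangles (0 if disjoint)."""
    w = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    h = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)


def shift(track: Track, target_start: int) -> Track:
    """Translate a trajectory along the time axis so it starts at target_start."""
    offset = target_start - min(track.boxes)
    return Track(track.track_id, {f + offset: r for f, r in track.boxes.items()})


def max_overlap(candidate: Track, placed: List[Track]) -> int:
    """Largest per-frame overlap between the candidate and any placed trajectory."""
    best = 0
    for frame, rect in candidate.boxes.items():
        for other in placed:
            if frame in other.boxes:
                best = max(best, overlap_area(rect, other.boxes[frame]))
    return best


def build_summary_queue(pending: List[Track], summary: List[Track],
                        target_start: int, threshold: int):
    """Move pending trajectories into the summary queue when overlap <= threshold."""
    summary = list(summary)  # do not mutate the caller's queue
    skipped: List[Track] = []
    for track in pending:
        moved = shift(track, target_start)
        if max_overlap(moved, summary) <= threshold:
            summary.append(moved)
        else:
            skipped.append(track)
    return summary, skipped
```

Because each accepted trajectory is appended before the next one is checked, later candidates are compared against everything already placed, which matches the "already stored in the summary queue" wording above.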

Optionally, the trajectory information of each trajectory may further include a target box information set of the target object;

The splicing module 760 may include:

a third determining sub-module, configured to determine, according to the target box information set of the target object in the trajectory information, a first position of each first target original image in the first summary background image;

a video summary generation sub-module, configured to copy each first target original image to the corresponding first position in the first summary background image to generate the video summary.

Optionally, the third determining sub-module may be specifically configured to:

if target objects in the first target original images overlap, set the pixel values of the overlapping portion of the corresponding trajectories to the mean of the pixel values of the target original images of the overlapping target objects, and set the pixel values of the non-overlapping portions to the pixel values of the target original image of the respective target object, to obtain a to-be-copied image;

The splicing module 760 may be specifically configured to:

copy each to-be-copied image, and each first target original image whose target object does not overlap any other, to the corresponding first position in the first summary background image to generate the video summary.
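The blending rule above (mean in overlapping regions, the object's own pixels elsewhere, background where no object lies) can be sketched for a single grayscale frame. The patch layout `(x, y, pixels, mask)`, the integer mean, and the function name `compose_frame` are assumptions made for this illustration; the source specifies only the averaging rule and the copy to the first position.

```python
from typing import List, Tuple

Image = List[List[int]]                 # grayscale image as rows of pixel values
Patch = Tuple[int, int, Image, Image]   # (x, y, pixels, mask) placed at (x, y)


def compose_frame(background: Image, patches: List[Patch]) -> Image:
    """Paste target patches onto the background, averaging where they overlap."""
    h, w = len(background), len(background[0])
    total = [[0] * w for _ in range(h)]   # sum of object pixel values per position
    count = [[0] * w for _ in range(h)]   # number of objects covering each position
    for x0, y0, pixels, mask in patches:
        for dy, row in enumerate(pixels):
            for dx, value in enumerate(row):
                y, x = y0 + dy, x0 + dx
                if mask[dy][dx] and 0 <= y < h and 0 <= x < w:
                    total[y][x] += value
                    count[y][x] += 1
    # One covering object: its own pixel. Several: their (integer) mean.
    # None: the background pixel is kept.
    return [
        [total[y][x] // count[y][x] if count[y][x] else background[y][x]
         for x in range(w)]
        for y in range(h)
    ]
```

Accumulating a sum and a count per pixel handles any number of overlapping objects in one pass, so the two-object and many-object cases need no separate code paths.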

Optionally, the device may further include:

an operation module, configured to acquire a summary background image at each preset period;

a third storage module, configured to store the summary background image acquired in each period into the database;

As shown in FIG. 11, the second acquisition module 750 may include:

a dividing sub-module 751, configured to divide the target time period into time sub-segments corresponding to the preset periods according to the time covered by each preset period;

a fourth determining sub-module 752, configured to determine the first preset period corresponding to the time sub-segment that contains the most trajectories in the target time period;

a background image acquisition sub-module 753, configured to obtain, from the database, the first summary background image corresponding to the first preset period.
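The background selection above can be sketched as follows. The sketch assumes integer timestamps, preset periods of equal length aligned at multiples of `period_length`, half-open intervals, and that a trajectory counts toward a sub-segment whenever its time interval intersects it; none of these details is fixed by the source, and `pick_background_period` is a hypothetical name.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open time interval [start, end)


def pick_background_period(target: Interval, period_length: int,
                           tracks: List[Interval]) -> int:
    """Return the index of the preset period whose sub-segment holds the most tracks."""
    t0, t1 = target
    first = t0 // period_length          # period containing the start of the target
    last = (t1 - 1) // period_length     # period containing the last instant
    best_period, best_count = first, -1
    for k in range(first, last + 1):
        # Sub-segment = part of period k that lies inside the target time period.
        seg_start = max(t0, k * period_length)
        seg_end = min(t1, (k + 1) * period_length)
        n = sum(1 for s, e in tracks if s < seg_end and e > seg_start)
        if n > best_count:               # ties keep the earliest period
            best_period, best_count = k, n
    return best_period
```

The returned index would then be used as the key for looking up the stored summary background image, for example `backgrounds[pick_background_period(...)]` with a hypothetical `backgrounds` mapping.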

As shown in FIG. 12, the device may further include:

a display module 1210, configured to display a user interaction interface according to a user instruction;

a receiving module 1220, configured to receive and save the target retrieval condition, the preset translation condition, and the preset period for generating summary background images that the user inputs through the user interaction interface.

Optionally, the device may further include:

an execution module, configured to perform the step of retrieving the established database upon receiving a start request input by the user through the user interaction interface;

an ending module, configured to end the video summary generation process upon receiving an interrupt request input by the user through the user interaction interface.

It can be understood that, in another embodiment of the present application, the video summary generation device may simultaneously include: the retrieval module 710, the combination module 720, the translation module 730, the first acquisition module 740, the second acquisition module 750, the splicing module 760, the first extraction module 810, the second extraction module 820, the first storage module 830, the database generation module 840, the third extraction module 850, the second storage module 860, the operation module, the third storage module, the display module 1210, the receiving module 1220, the execution module, and the ending module.

An embodiment of the present application further provides a computer device, including a processor and a memory, where

the memory is configured to store a computer program;

the processor is configured to implement all the steps of the video summary generation method provided by the embodiments of the present application when executing the computer program stored in the memory.

The image collector mentioned above may include an IPC (IP Camera, network camera), a smart camera, and the like.

The memory may include RAM (Random Access Memory) and may also include NVM (Non-Volatile Memory), for example at least one disk storage. Optionally, the memory may also be at least one storage device located remotely from the processor.

The processor may be a general-purpose processor, including a CPU (Central Processing Unit), an NP (Network Processor), and the like; it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

In this embodiment, by reading and running the computer program stored in the memory, the processor can: retrieve the established database to obtain the trajectories that satisfy the target retrieval condition; combine the overlapping trajectories among them into combined trajectories; translate the combined trajectories and the non-overlapping trajectories of different time periods to the same target time period; and finally splice the target original images corresponding to the trajectories in the target time period with the summary background image to generate the video summary. When generating the video summary, the embodiments of the present application combine the trajectories of multiple overlapping target objects into one combined trajectory and translate it as a whole along the time axis. This avoids losing some of the overlapping trajectories during translation and improves the visual effect of the generated video summary.
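The grouping of overlapping trajectories into combined trajectories can be illustrated with a small union-find sketch. It assumes the overlap state information is available as pairs of trajectory identifiers and that overlap is treated transitively (if A overlaps B and B overlaps C, all three form one group); both points are assumptions for this example, and `group_tracks` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def group_tracks(track_ids: List[str],
                 overlaps: List[Tuple[str, str]]) -> Tuple[List[List[str]], List[str]]:
    """Split trajectories into combined groups (size >= 2) and non-overlapping ones."""
    parent: Dict[str, str] = {t: t for t in track_ids}

    def find(t: str) -> str:
        # Follow parent links to the group representative, halving the path as we go.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in overlaps:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two groups

    groups: Dict[str, List[str]] = {}
    for t in track_ids:
        groups.setdefault(find(t), []).append(t)

    combined = [g for g in groups.values() if len(g) >= 2]
    single = [g[0] for g in groups.values() if len(g) == 1]
    return combined, single
```

Each list in `combined` would then be shifted along the time axis as one unit, which is what keeps overlapping objects together in the summary.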

In addition, corresponding to the video summary generation method provided by the foregoing embodiments, an embodiment of the present application provides a storage medium for storing a computer program; when the computer program is executed by a processor, all the steps of the video summary generation method are implemented.

In this embodiment, the storage medium stores an application program that, at runtime, executes the video summary generation method provided by the embodiments of the present application, and can therefore: retrieve the established database to obtain the trajectories that satisfy the target retrieval condition; combine the overlapping trajectories among them into combined trajectories; translate the combined trajectories and the non-overlapping trajectories of different time periods to the same target time period; and finally splice the target original images corresponding to the trajectories in the target time period with the summary background image to generate the video summary. When generating the video summary, the embodiments of the present application combine the trajectories of multiple overlapping target objects into one combined trajectory and translate it as a whole along the time axis. This avoids losing some of the overlapping trajectories during translation and improves the visual effect of the generated video summary.

Furthermore, corresponding to the video summary generation method provided by the foregoing embodiments, an embodiment of the present application provides an application program configured to execute, at runtime, the steps of the video summary generation method provided by the embodiments of the present application.

In this embodiment, the application program executes the video summary generation method provided by the embodiments of the present application at runtime, and can therefore: retrieve the established database to obtain the trajectories that satisfy the target retrieval condition; combine the overlapping trajectories among them into combined trajectories; translate the combined trajectories and the non-overlapping trajectories of different time periods to the same target time period; and finally splice the target original images corresponding to the trajectories in the target time period with the summary background image to generate the video summary. When generating the video summary, the embodiments of the present application combine the trajectories of multiple overlapping target objects into one combined trajectory and translate it as a whole along the time axis. This avoids losing some of the overlapping trajectories during translation and improves the visual effect of the generated video summary.

For the computer device, application program, and storage medium embodiments, since the method content they involve is substantially similar to the foregoing method embodiments, the description is relatively brief; for relevant details, refer to the corresponding parts of the method embodiments.

It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.

The embodiments in this specification are described in a related manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, since the system embodiments are substantially similar to the method embodiments, their description is relatively brief; for relevant details, refer to the corresponding parts of the method embodiments.

The above are merely preferred embodiments of the present application and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within its scope of protection.

Claims (21)

一种视频摘要的生成方法,其特征在于,所述方法包括:A method for generating a video summary, the method comprising: 获取目标检索条件,对已建立的数据库进行检索,得到包含符合所述目标检索条件的轨迹的第一轨迹集合,其中,所述数据库中存储有从包含目标对象的视频帧中提取的每条轨迹的轨迹信息和目标原图,所述每条轨迹的轨迹信息中包括:与其他轨迹间的交叠状态信息;Obtaining a target retrieval condition, and retrieving the established database to obtain a first trajectory set including a trajectory that meets the target retrieval condition, wherein the database stores each trajectory extracted from a video frame containing the target object Trajectory information and target original image, wherein the trajectory information of each trajectory includes: overlapping state information with other trajectories; 根据所述交叠状态信息,将所述第一轨迹集合中发生交叠的至少两条轨迹划分为一组,每组确定为一条组合轨迹;And dividing, according to the overlapping state information, at least two tracks that overlap in the first set of tracks into a group, each group being determined as a combined track; 沿时间轴将待平移轨迹中满足预设平移条件的轨迹平移至同一目标时间段,其中,所述待平移轨迹包括:所述组合轨迹和/或所述第一轨迹集合中未发生交叠的轨迹;Translating a trajectory of the to-be-translated trajectory that meets the preset translation condition to the same target time period along the time axis, wherein the trajectory to be translated includes: the combined trajectory and/or the overlapping of the first trajectory set Trajectory 从所述数据库中获取所述目标时间段中的各轨迹对应的第一目标原图;Acquiring, from the database, a first target original image corresponding to each track in the target time period; 获得用于生成视频摘要的第一摘要背景图;Obtaining a first summary background image for generating a video summary; 将各第一目标原图与所述第一摘要背景图进行拼接,生成视频摘要。The first target original image is spliced with the first abstract background image to generate a video summary. 
根据权利要求1所述的方法,其特征在于,所述目标检索条件包括:检索时间段和/或目标对象的检索属性信息;The method according to claim 1, wherein the target retrieval condition comprises: retrieving a time period and/or retrieval attribute information of the target object; 当检索条件仅包括检索时间段时,所述对已建立的数据库进行检索,包括:When the retrieval condition includes only the retrieval time period, the searching for the established database includes: 根据所述检索时间段,对所述数据库进行检索,获得所述检索时间段中各目标对象的轨迹;Searching the database according to the retrieval time period to obtain a trajectory of each target object in the retrieval time period; 当检索条件仅包括目标对象的检索属性信息时,所述对已建立的数据库进行检索,包括:When the retrieval condition includes only the retrieval attribute information of the target object, the searching for the established database includes: 根据所述目标对象的检索属性信息,对所述数据库进行检索,获得与所述检索属性信息匹配的各目标对象的轨迹;Searching the database according to the retrieval attribute information of the target object, and obtaining a trajectory of each target object that matches the retrieval attribute information; 当检索条件包括检索时间段和目标对象的检索属性信息时,所述对已建立的数据库进行检索,包括:When the retrieval condition includes the retrieval time period and the retrieval attribute information of the target object, the searching for the established database includes: 根据所述检索时间段和所述检索目标对象的检索属性信息,对所述数据库进行检索,获得所述检索时间段中,与所述检索属性匹配的各目标对象的轨迹。And searching the database according to the retrieval time period and the retrieval attribute information of the retrieval target object, and obtaining a trajectory of each target object that matches the retrieval attribute in the retrieval time period. 
根据权利要求1所述的方法,其特征在于,在所述获取目标检索条件之前,所述方法还包括:The method according to claim 1, wherein before the obtaining the target retrieval condition, the method further comprises: 从输入的视频中提取各目标对象;Extract each target object from the input video; 提取各目标对象的轨迹信息和属性信息,其中,轨迹信息中包括:轨迹的移动信息、及与其他轨迹间的交叠状态信息;Extracting trajectory information and attribute information of each target object, where the trajectory information includes: movement information of the trajectory and overlapping state information with other trajectories; 将各目标对象的轨迹信息和属性信息存储至视频结构化目标描述文件中;Storing the track information and attribute information of each target object into a video structured target description file; 根据所述视频结构化目标描述文件,生成所述数据库;Generating the database according to the video structured target description file; 从包含目标对象的视频帧中,提取轨迹信息对应的每一帧的原图和掩码图,根据所述原图及所述掩码图确定与所述轨迹信息对应的每一帧的目标原图;Extracting, from a video frame that includes the target object, an original image and a mask map of each frame corresponding to the trajectory information, and determining, according to the original image and the mask map, a target original of each frame corresponding to the trajectory information. Figure 将所述原图、所述掩码图及所述目标原图存储至所述数据库中。The original image, the mask map, and the target original image are stored in the database. 
根据权利要求3所述的方法,其特征在于,所述从包含目标对象的视频帧中,提取轨迹信息对应的每一帧的掩码图,包括:The method according to claim 3, wherein the extracting a mask map of each frame corresponding to the trajectory information from the video frame including the target object comprises: 从包含目标对象的视频帧中,提取所述目标对象的运动掩码;Extracting a motion mask of the target object from a video frame containing the target object; 根据所述运动掩码,确定初始的掩码图;Determining an initial mask map according to the motion mask; 确定所述初始的掩码图的边缘点集;Determining an edge point set of the initial mask map; 提取所述边缘点集中的凸集,构成所述掩码图的凸包点集;Extracting a convex set in the set of edge points to form a convex set of points of the mask map; 填充所述凸包点集对应的凸包,得到最终的掩码图。Filling the convex hull corresponding to the convex hull point set to obtain a final mask map. 根据权利要求3所述的方法,其特征在于,所述沿时间轴将待平移轨迹中满足预设平移条件的轨迹平移至同一目标时间段,包括:The method according to claim 3, wherein the shifting the trajectory of the to-be-translated trajectory that satisfies the preset translation condition to the same target time period along the time axis comprises: 建立待平移队列及摘要队列;Establish a queue to be translated and a summary queue; 将所述第一轨迹集合中未在所述目标时间段中的轨迹作为待平移轨迹, 33并存储至待平移队列中,其中,所述待平移轨迹为所述第一轨迹集合中的未在所述目标时间段中的组合轨迹和未发生交叠的轨迹;And a track that is not in the target time segment in the first track set is used as a to-be-translated track, and is stored in a queue to be translated, wherein the to-be-translated track is not in the first track set. 
a combined trajectory in the target time period and a trajectory in which no overlap occurs; 将所述第一轨迹集合中的在所述目标时间段中的轨迹存储至所述摘要队列中;Storing the trajectory in the target time period in the first trajectory set into the summary queue; 依次从所述待平移队列中提取当前待平移轨迹,并根据所述数据库中的原图,得到所述当前待平移轨迹对应的视频帧中各目标对象的矩形框;Extracting the current to-be-translated trajectory from the to-be-translated queue, and obtaining a rectangular frame of each target object in the video frame corresponding to the current to-be-translated trajectory according to the original image in the database; 计算各目标对象的矩形框分别与已存储至所述摘要队列的每条轨迹对应的视频帧中目标对象的矩形框之间的重叠面积;Calculating an overlapping area between a rectangular frame of each target object and a rectangular frame of the target object in the video frame corresponding to each track that has been stored in the summary queue; 在所述重叠面积小于或等于预设重叠参数阈值时,将所述当前待平移轨迹平移至所述目标时间段,并存储至所述摘要队列;When the overlapping area is less than or equal to a preset overlapping parameter threshold, the current to-be-translated trajectory is translated to the target time period, and stored in the summary queue; 所述从所述数据库中获取所述目标时间段的各轨迹对应的第一目标原图的步骤,包括:And the step of acquiring the first target original image corresponding to each track of the target time period from the database, including: 获取所述摘要队列中的各轨迹对应的第一目标原图。Obtaining a first target original image corresponding to each track in the summary queue. 
根据权利要求3所述的方法,其特征在于,所述每条轨迹的轨迹信息中还包括:所述目标对象的目标框信息集合;The method according to claim 3, wherein the track information of each track further comprises: a target frame information set of the target object; 所述将所述各第一目标原图与所述第一摘要背景图进行拼接,生成视频摘要,包括:The splicing the first target original image and the first abstract background image to generate a video summary, including: 根据所述轨迹信息中所述目标对象的目标框信息集合,确定所述第一目标原图在所述第一摘要背景图中的第一位置;Determining, according to the target frame information set of the target object in the trajectory information, a first position of the first target original image in the first abstract background image; 将各第一目标原图复制到所述第一摘要背景图中对应的第一位置,生成视频摘要。Copying each first target original image to a corresponding first position in the first abstract background image to generate a video summary. 根据权利要求6所述的方法,其特征在于,所述将各第一目标原图复制到所述第一摘要背景图中对应的第一位置,包括:The method according to claim 6, wherein the copying each of the first target original images to the corresponding first position in the first abstract background image comprises: 若各第一目标原图中目标对象有交叠,则设置所对应的轨迹的交叠部分的像素值为各目标对象的目标原图像素值的均值、不交叠部分的像素值为各目标对象的目标原图的像素值,得到待复制图;If the target objects in the first target original image overlap, the pixel value of the overlapping portion of the corresponding trajectory is set to be the mean value of the target original image pixel value of each target object, and the pixel value of the non-overlapping portion is each target. The pixel value of the target original image of the object, and the image to be copied is obtained; 所述将各第一目标原图复制到所述第一摘要背景图中对应的第一位置,生成视频摘要,包括:Copying each first target original image to a corresponding first location in the first summary background image to generate a video summary, including: 将各待复制图和目标对象没有交叠的各第一目标原图复制到所述第一摘要背景图中对应的第一位置,生成视频摘要。Copying each of the first target original images that do not overlap each of the to-be-copied image and the target object to a corresponding first position in the first summary background image to generate a video summary. 
根据权利要求3所述的方法,其特征在于,在将所述原图、所述掩码图及所述目标原图存储至所述数据库中之前,所述方法还包括:The method according to claim 3, wherein before the storing the original image, the mask map and the target original image in the database, the method further comprises: 按预设周期,获取摘要背景图;Obtain a summary background image according to a preset period; 将获取的每个周期的摘要背景图存储至所述数据库中;Storing a summary background image of each cycle acquired into the database; 所述获得用于生成视频摘要的第一摘要背景图,包括:The obtaining a first abstract background image for generating a video summary includes: 按各预设周期所对应的时间,将所述目标时间段划分为与预设周期对应的时间子段;Dividing the target time period into time sub-segments corresponding to preset periods according to time corresponding to each preset period; 确定所述目标时间段中、包含轨迹最多的时间子段对应的第一预设周期;Determining, in the target time period, a first preset period corresponding to a time sub-segment containing the most track; 从所述数据库中,获得所述第一预设周期对应的第一摘要背景图。Obtaining, from the database, a first summary background image corresponding to the first preset period. 根据权利要求8所述的方法,其特征在于,在所述获取目标检索条件之前,所述方法还包括:The method according to claim 8, wherein before the obtaining the target retrieval condition, the method further comprises: 按照用户指令,显示用户交互界面;Display the user interaction interface according to user instructions; 接收并保存用户通过所述用户交互界面输入的目标检索条件、预设平移条件和用于生成摘要背景图的预设周期;Receiving and saving a target retrieval condition input by the user through the user interaction interface, a preset translation condition, and a preset period for generating a summary background image; 所述方法还包括:The method further includes: 在接收到用户通过所述用户交互界面输入的启动请求时,执行所述对已建立的数据库进行检索的步骤;Performing the step of retrieving the established database when receiving a startup request input by the user through the user interaction interface; 在接收到用户通过所述用户交互界面输入的中断请求时,结束视频摘要生成的流程。When the interrupt request input by the user through the user interaction interface is received, the process of generating the video summary is ended. 
一种视频摘要的生成装置,其特征在于,所述装置包括:A device for generating a video summary, the device comprising: 检索模块,用于获取目标检索条件,对已建立的数据库进行检索,得到 包含符合所述目标检索条件的轨迹的第一轨迹集合,其中,所述数据库中存储有从包含目标对象的视频帧中提取的每条轨迹的轨迹信息和目标原图,所述每条轨迹的轨迹信息中包括:与其他轨迹间的交叠状态信息;a retrieval module, configured to acquire a target retrieval condition, and retrieve the established database to obtain a first trajectory set including a trajectory that meets the target retrieval condition, wherein the database stores the video frame from the target object Extracting the trajectory information of each trajectory and the target original image, wherein the trajectory information of each trajectory includes: overlapping state information with other trajectories; 组合模块,用于根据所述交叠状态信息,将所述第一轨迹集合中发生交叠的至少两条轨迹划分为一组,每组确定为一条组合轨迹;a combination module, configured to divide at least two tracks that overlap in the first set of tracks into a group according to the overlapping state information, and each group is determined as a combined track; 平移模块,用于沿时间轴将待平移轨迹中满足预设平移条件的轨迹平移至同一目标时间段,其中,所述待平移轨迹包括:所述组合轨迹和/或所述第一轨迹集合中未发生交叠的轨迹;a panning module for translating a trajectory of the to-be-translated trajectory that satisfies the preset panning condition to the same target time segment along the time axis, wherein the to-be-translated trajectory comprises: the combined trajectory and/or the first trajectory set No overlapping trajectories occur; 第一获取模块,用于从所述数据库中获取所述目标时间段中的各轨迹对应的第一目标原图;a first acquiring module, configured to acquire, from the database, a first target original image corresponding to each track in the target time period; 第二获取模块,用于获得用于生成视频摘要的第一摘要背景图;a second obtaining module, configured to obtain a first abstract background image for generating a video summary; 拼接模块,用于将各第一目标原图与所述第一摘要背景图进行拼接,生成视频摘要。a splicing module, configured to splicing each of the first target original images and the first abstract background image to generate a video summary. 
根据权利要求10所述的装置,其特征在于,所述目标检索条件包括:检索时间段和/或目标对象的检索属性信息;The apparatus according to claim 10, wherein said target retrieval condition comprises: retrieving a time period and/or retrieval attribute information of the target object; 当检索条件仅包括检索时间段时,所述检索模块,具体用于:When the retrieval condition includes only the retrieval time period, the retrieval module is specifically configured to: 根据所述检索时间段,对所述数据库进行检索,获得所述检索时间段中各目标对象的轨迹;Searching the database according to the retrieval time period to obtain a trajectory of each target object in the retrieval time period; 当检索条件仅包括目标对象的检索属性信息时,所述检索模块,具体用于:When the search condition includes only the search attribute information of the target object, the search module is specifically configured to: 根据所述目标对象的检索属性信息,对所述数据库进行检索,获得与所述检索属性信息匹配的各目标对象的轨迹;Searching the database according to the retrieval attribute information of the target object, and obtaining a trajectory of each target object that matches the retrieval attribute information; 当检索条件包括检索时间段和目标对象的检索属性信息时,所述检索模块,具体用于:When the retrieval condition includes the retrieval time period and the retrieval attribute information of the target object, the retrieval module is specifically configured to: 根据所述检索时间段和所述检索目标对象的检索属性信息,对所述数据库进行检索,获得所述检索时间段中,与所述检索属性匹配的各目标对象的 轨迹。And searching the database according to the retrieval time period and the retrieval attribute information of the retrieval target object, and obtaining a trajectory of each target object that matches the retrieval attribute in the retrieval time period. 
根据权利要求10所述的装置,其特征在于,所述装置还包括:The device according to claim 10, wherein the device further comprises: 第一提取模块,用于从输入的视频中提取各目标对象;a first extraction module, configured to extract each target object from the input video; 第二提取模块,用于提取各目标对象的轨迹信息和属性信息,其中,轨迹信息中包括:轨迹的移动信息、及与其他轨迹间的交叠状态信息;a second extraction module, configured to extract trajectory information and attribute information of each target object, where the trajectory information includes: movement information of the trajectory, and overlapping state information with other trajectories; 第一存储模块,用于将各目标对象的轨迹信息和属性信息存储至视频结构化目标描述文件中;a first storage module, configured to store track information and attribute information of each target object into a video structured target description file; 数据库生成模块,用于根据所述视频结构化目标描述文件,生成所述数据库;a database generating module, configured to generate the database according to the video structured target description file; 第三提取模块,用于从包含目标对象的视频帧中,提取轨迹信息对应的每一帧的原图和掩码图,根据所述原图及所述掩码图确定与所述轨迹信息对应的每一帧的目标原图;a third extraction module, configured to extract an original image and a mask map of each frame corresponding to the trajectory information from the video frame that includes the target object, and determine, according to the original image and the mask map, the trajectory information corresponding to the trajectory information The original image of each frame of the frame; 第二存储模块,用于将所述原图、所述掩码图及所述目标原图存储至所述数据库中。And a second storage module, configured to store the original image, the mask map, and the target original image into the database. 
根据权利要求12所述的装置,其特征在于,所述第三提取模块,包括:The apparatus according to claim 12, wherein the third extraction module comprises: 第一提取子模块,用于从包含目标对象的视频帧中,提取所述目标对象的运动掩码;a first extraction submodule, configured to extract a motion mask of the target object from a video frame that includes the target object; 第一确定子模块,用于根据所述运动掩码,确定初始的掩码图;a first determining submodule, configured to determine an initial mask according to the motion mask; 第二确定子模块,用于确定所述初始的掩码图的边缘点集;a second determining submodule, configured to determine an edge point set of the initial mask map; 第二提取子模块,用于提取所述边缘点集中的凸集,构成所述掩码图的凸包点集;a second extraction submodule, configured to extract a convex set in the set of edge points, and form a convex punctual point set of the mask map; 填充子模块,用于填充所述凸包点集对应的凸包,得到最终的掩码图。The filler submodule is configured to fill the convex hull corresponding to the convex hull point set to obtain a final mask map. 根据权利要求10所述的装置,其特征在于,所述平移模块,包括:The device of claim 10, wherein the translation module comprises: 队列建立子模块,用于建立待平移队列及摘要队列;a queue creation sub-module for establishing a queue to be translated and a summary queue; 第一存储子模块,用于将所述第一轨迹集合中未在所述目标时间段中的轨迹作为待平移轨迹,并存储至待平移队列中,其中,所述待平移轨迹为所述第一轨迹集合中的未在所述目标时间段中的组合轨迹和未发生交叠的轨迹;a first storage sub-module, configured to use a trajectory in the target trajectory that is not in the target time segment as a trajectory to be translated, and store the trajectory in the to-be-translated trajectory, where the trajectory to be translated is the a combined trajectory in a set of trajectories that is not in the target time period and a trajectory that does not overlap; 第二存储子模块,用于将所述第一轨迹集合中的在所述目标时间段中的轨迹存储至所述摘要队列中;a second storage submodule, configured to store, in the first time set, a track in the target time period into the summary queue; 第三提取子模块,用于依次从所述待平移队列中提取当前待平移轨迹,并根据所述数据库中的原图,得到所述当前待平移轨迹对应的视频帧中各目标对象的矩形框;a third extraction sub-module, configured to sequentially extract a current to-be-translated trajectory from the to-be-translated queue, and obtain a rectangular frame 
of each target object in the video frame corresponding to the current to-be-translated trajectory according to the original image in the database ; 运算子模块,用于计算各目标对象的矩形框分别与已存储至所述摘要队列的每条轨迹对应的视频帧中目标对象的矩形框之间的重叠面积;An operation submodule, configured to calculate an overlapping area between a rectangular frame of each target object and a rectangular frame of the target object in a video frame corresponding to each track that has been stored in the summary queue; 第三存储子模块,用于在所述重叠面积小于或等于预设重叠参数阈值时,将所述当前待平移轨迹平移至所述目标时间段,并存储至所述摘要队列;a third storage submodule, configured to: when the overlapping area is less than or equal to a preset overlapping parameter threshold, translate the current to-be-translated trajectory to the target time segment, and store the trajectory to the summary queue; 所述第一获取模块,具体用于:The first acquiring module is specifically configured to: 获取所述摘要队列中的各轨迹对应的第一目标原图。Obtaining a first target original image corresponding to each track in the summary queue. 根据权利要求12所述的装置,其特征在于,所述每条轨迹的轨迹信息中还包括:所述目标对象的目标框信息集合;The apparatus according to claim 12, wherein the track information of each track further comprises: a target frame information set of the target object; 所述拼接模块,包括:The splicing module includes: 第三确定子模块,用于根据所述轨迹信息中所述目标对象的目标框信息集合,确定所述第一目标原图在所述第一摘要背景图中的第一位置;a third determining submodule, configured to determine, according to the target box information set of the target object in the trajectory information, a first position of the first target original image in the first abstract background image; 视频摘要生成子模块,用于将各第一目标原图复制到所述第一摘要背景图中对应的第一位置,生成视频摘要。The video summary generation sub-module is configured to copy each first target original image to a corresponding first position in the first summary background image to generate a video summary. 
The apparatus according to claim 15, wherein the third determining submodule is specifically configured to:
if target objects in the first target original images overlap, set the pixel values of the overlapping portion of the corresponding trajectories to the mean of the pixel values of the target original images of the respective target objects, and set the pixel values of the non-overlapping portions to the pixel values of the target original image of the respective target object, to obtain a to-be-copied image;
the splicing module is specifically configured to:
copy each to-be-copied image, and each first target original image whose target object does not overlap, to the corresponding first position in the first summary background image to generate the video summary.

The apparatus according to claim 10, wherein the apparatus further comprises:
an operation module, configured to obtain a summary background image at each preset period;
a third storage module, configured to store the obtained summary background image of each period into the database;
the second acquiring module comprises:
a dividing submodule, configured to divide the target time period into time sub-segments corresponding to the preset periods according to the time corresponding to each preset period;
a fourth determining submodule, configured to determine the first preset period corresponding to the time sub-segment that contains the most trajectories within the target time period;
a background image obtaining submodule, configured to obtain, from the database, the first summary background image corresponding to the first preset period.

The apparatus according to claim 10, wherein the apparatus further comprises:
a display module, configured to display a user interaction interface according to a user instruction;
a receiving module, configured to receive and save the target retrieval condition, the preset translation condition, and the preset period for generating the summary background image that the user inputs through the user interaction interface;
an execution module, configured to perform the step of retrieving from the established database upon receiving a start request input by the user through the user interaction interface;
an ending module, configured to end the video summary generation process upon receiving an interrupt request input by the user through the user interaction interface.

A computer device, comprising a processor and a memory, wherein
the memory is configured to store a computer program;
the processor is configured to implement the method steps of any one of claims 1-9 when executing the computer program stored in the memory.

A storage medium, configured to store executable code, the executable code being configured to perform, when run, the method steps of any one of claims 1-9.

An application, configured to perform, when run, the method steps of any one of claims 1-9.

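The background selection recited in the claims above (divide the target time period into sub-segments at the preset-period boundaries, find the sub-segment containing the most trajectories, and fetch the background stored for that period) can be sketched as follows. This is an illustrative sketch only. The function names, the representation of times as seconds, the rule that a trajectory counts toward every sub-segment its time span intersects, and the dictionary standing in for the database are assumptions that the claims do not fix.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # assumed: (start, end) in seconds


def split_into_subsegments(target: Interval, period: float) -> List[Tuple[int, Interval]]:
    """Split the target period at preset-period boundaries.

    Returns (period_index, sub_segment) pairs, where period_index identifies
    the preset period that the sub-segment falls into.
    """
    start, end = target
    if end <= start or period <= 0:
        raise ValueError("target must be a non-empty interval and period positive")
    segments = []
    index = int(start // period)
    while index * period < end:
        seg_start = max(start, index * period)
        seg_end = min(end, (index + 1) * period)
        segments.append((index, (seg_start, seg_end)))
        index += 1
    return segments


def busiest_period(target: Interval, period: float, trajectories: List[Interval]) -> int:
    """Index of the preset period whose sub-segment contains the most trajectories."""
    best_index, best_count = -1, -1
    for index, (seg_start, seg_end) in split_into_subsegments(target, period):
        # Assumption: a trajectory counts if its time span intersects the sub-segment.
        count = sum(1 for (t_start, t_end) in trajectories
                    if t_start < seg_end and t_end > seg_start)
        if count > best_count:  # ties keep the earliest sub-segment
            best_index, best_count = index, count
    return best_index


def pick_background(target: Interval, period: float,
                    trajectories: List[Interval], backgrounds: Dict[int, str]) -> str:
    """Look up the stored background for the busiest preset period."""
    return backgrounds[busiest_period(target, period, trajectories)]
```

Choosing the background from the busiest sub-segment is presumably meant to make lighting and scene state match most of the pasted targets; the claims themselves do not state the rationale.
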
PCT/CN2018/076290 2017-02-17 2018-02-11 Video abstract generation method and device Ceased WO2018149376A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710087044.6 2017-02-17
CN201710087044.6A CN108460032A (en) 2017-02-17 2017-02-17 A kind of generation method and device of video frequency abstract

Publications (1)

Publication Number Publication Date
WO2018149376A1 true WO2018149376A1 (en) 2018-08-23

Family

ID=63170088

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/076290 Ceased WO2018149376A1 (en) 2017-02-17 2018-02-11 Video abstract generation method and device

Country Status (2)

Country Link
CN (1) CN108460032A (en)
WO (1) WO2018149376A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110519532A (en) * 2019-09-02 2019-11-29 中移物联网有限公司 A kind of information acquisition method and electronic equipment
CN110704606A (en) * 2019-08-19 2020-01-17 中国科学院信息工程研究所 Generation type abstract generation method based on image-text fusion
CN111464882A (en) * 2019-01-18 2020-07-28 杭州海康威视数字技术股份有限公司 Video abstract generation method, device, equipment and medium
CN111694984A (en) * 2020-06-12 2020-09-22 百度在线网络技术(北京)有限公司 Video searching method and device, electronic equipment and readable storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114679564B (en) * 2020-12-24 2025-08-22 浙江宇视科技有限公司 Video summary processing method, device, electronic device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103617234A (en) * 2013-11-26 2014-03-05 公安部第三研究所 Device and method for active video concentration
CN104301699A (en) * 2013-07-16 2015-01-21 浙江大华技术股份有限公司 Image processing method and device
CN104469547A (en) * 2014-12-10 2015-03-25 西安理工大学 A method of video summarization based on tree-like moving target trajectories
CN104639994A (en) * 2013-11-08 2015-05-20 杭州海康威视数字技术股份有限公司 Video abstraction generating method, system and network storage equipment based on moving objects
CN104717573A (en) * 2015-03-05 2015-06-17 广州市维安电子技术有限公司 Video abstract generation method

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101249839B1 (en) * 2008-06-30 2013-04-11 퍼듀 리서치 파운데이션 Image processing apparatus and image processing method thereof
CN102254144A (en) * 2011-07-12 2011-11-23 四川大学 Robust method for extracting two-dimensional code area in image
US9412025B2 (en) * 2012-11-28 2016-08-09 Siemens Schweiz Ag Systems and methods to classify moving airplanes in airports
US9141866B2 (en) * 2013-01-30 2015-09-22 International Business Machines Corporation Summarizing salient events in unmanned aerial videos
KR101804383B1 (en) * 2014-01-14 2017-12-04 한화테크윈 주식회사 System and method for browsing summary image
TW201605239A (en) * 2014-07-22 2016-02-01 鑫洋國際股份有限公司 Video analysis method and video analysis apparatus
CN104657712B (en) * 2015-02-09 2017-11-14 惠州学院 Masked man's detection method in a kind of monitor video
CN104717574B (en) * 2015-03-17 2017-11-24 华中科技大学 The fusion method of event and background in a kind of video frequency abstract

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111464882A (en) * 2019-01-18 2020-07-28 杭州海康威视数字技术股份有限公司 Video abstract generation method, device, equipment and medium
CN111464882B (en) * 2019-01-18 2022-03-25 杭州海康威视数字技术股份有限公司 Video abstract generation method, device, equipment and medium
CN110704606A (en) * 2019-08-19 2020-01-17 中国科学院信息工程研究所 Generation type abstract generation method based on image-text fusion
CN110704606B (en) * 2019-08-19 2022-05-31 中国科学院信息工程研究所 A generative summary generation method based on image-text fusion
CN110519532A (en) * 2019-09-02 2019-11-29 中移物联网有限公司 A kind of information acquisition method and electronic equipment
CN111694984A (en) * 2020-06-12 2020-09-22 百度在线网络技术(北京)有限公司 Video searching method and device, electronic equipment and readable storage medium
CN111694984B (en) * 2020-06-12 2023-06-20 百度在线网络技术(北京)有限公司 Video searching method, device, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
CN108460032A (en) 2018-08-28

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18754210; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 18754210; Country of ref document: EP; Kind code of ref document: A1)