
WO2018149376A1 - Method and device for generating a video summary - Google Patents

Method and device for generating a video summary Download PDF

Info

Publication number
WO2018149376A1
WO2018149376A1 PCT/CN2018/076290 CN2018076290W WO2018149376A1 WO 2018149376 A1 WO2018149376 A1 WO 2018149376A1 CN 2018076290 W CN2018076290 W CN 2018076290W WO 2018149376 A1 WO2018149376 A1 WO 2018149376A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
trajectory
target object
retrieval
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2018/076290
Other languages
English (en)
Chinese (zh)
Inventor
潘志敏
车军
向杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Publication of WO2018149376A1 publication Critical patent/WO2018149376A1/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73Querying
    • G06F16/738Presentation of query results
    • G06F16/739Presentation of query results in form of a video summary, e.g. the video summary being a video sequence, a composite still image or having synthesized frames

Definitions

  • the present application relates to the field of video processing technologies, and in particular, to a method and an apparatus for generating a video summary.
  • Video summary technology analyzes the structure and content of a video, extracts the meaningful parts, that is, the moving targets, from the original video, and combines the moving targets with the background scene in a specific way to form a short video that still fully expresses the video content.
  • A video summary is thus a concise condensation of long video content, usually represented by a sequence of static or dynamic images, in which the original information is preserved.
  • A corresponding technique generates a video summary based on target objects and includes the following steps. First, the input video is analyzed to generate a video structured description file, and a database is established according to that file, where the file includes the attribute information of each target object in the video and the trajectory information of each target object. Second, the established database is retrieved to extract the trajectory information of the moving targets. Finally, the trajectory of each target object is analyzed, the target trajectories are translated on the time axis, and the trajectories of the target objects at different times are arranged in the same picture to generate the summary video.
  • the technology can meet the user's need to generate a video summary for a specific target object, and generate a video summary with a short duration, a compact motion, and a high concentration ratio.
  • However, when the trajectories of the target objects are arranged, the actual situation is complicated and the trajectories of multiple target objects may overlap. Because the above technique excludes the trajectories of overlapping target objects when rearranging trajectories, the summary video loses the information of the associated target objects when trajectories overlap; and because of this loss, target objects frequently appear and disappear abruptly in the generated video summary, resulting in a poor visual effect.
  • the purpose of the embodiment of the present application is to provide a method and a device for generating a video summary, so as to improve the visual effect of the video summary generated when the target trajectory is complicated.
  • the specific technical solutions are as follows:
  • the embodiment of the present application provides a method for generating a video summary, where the method includes:
  • the first target original image is spliced with the first abstract background image to generate a video summary.
  • the target retrieval condition includes: retrieving a time period and/or retrieval attribute information of the target object;
  • the searching for the established database includes:
  • the searching for the established database includes:
  • the searching for the established database includes:
  • before the acquiring of the target retrieval condition, the method further includes:
  • trajectory information includes: movement information of the trajectory and overlapping state information with other trajectories;
  • the original image, the mask map, and the target original image are stored in the database.
  • the mask map of each frame corresponding to the track information is extracted from the video frame that includes the target object, including:
  • the translating of the to-be-translated trajectories that meet the preset translation condition to the same target time period along the time axis includes:
  • a track in the first track set that is not in the target time period is used as a to-be-translated track and is stored in a to-be-translated queue, where the to-be-translated track is a combined track in the first track set that is not in the target time period and/or a non-overlapping track;
  • the current to-be-translated trajectory is translated to the target time period, and stored in the summary queue;
  • the step of acquiring the first target original image corresponding to each track in the target time period from the database includes:
  • the track information of each track further includes: a target box information set of the target object;
  • the splicing of the first target original image and the first abstract background image to generate a video summary includes:
  • the copying of the first target original image to the corresponding first location in the first summary background image includes:
  • the pixel value of the overlapping portion of the corresponding trajectories is set to the mean of the target original image pixel values of the target objects, and the pixel value of each non-overlapping portion is the pixel value of the target original image of the corresponding target object, so as to obtain the image to be copied;
  • the copying of each first target original image to the corresponding first location in the first summary background image to generate a video summary includes:
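The copy-and-blend rule above (mean pixel values where trajectories overlap, each target's own pixels elsewhere) can be sketched as follows. This is an illustrative grayscale sketch, not the patent's implementation; the function name and array shapes are assumptions.

```python
import numpy as np

def blend_targets(background, targets):
    """Paste target original images into the summary background.

    `targets` is a list of (image, mask) pairs aligned with the background.
    Where masks overlap, the output pixel is the mean of the overlapping
    targets' pixels; where only one mask covers a pixel, that target's pixel
    is used; elsewhere the background shows through.
    """
    acc = np.zeros(background.shape, dtype=np.float64)   # sum of target pixels
    cnt = np.zeros(background.shape, dtype=np.int64)     # how many targets cover a pixel
    for img, mask in targets:
        acc[mask] += img[mask]
        cnt[mask] += 1
    out = background.astype(np.float64).copy()
    covered = cnt > 0
    out[covered] = acc[covered] / cnt[covered]           # mean over overlapping targets
    return out

# Two targets overlapping at pixel (1, 1): its value becomes the mean (10+30)/2.
bg = np.zeros((4, 4))
img1, m1 = np.full((4, 4), 10.0), np.zeros((4, 4), bool)
img2, m2 = np.full((4, 4), 30.0), np.zeros((4, 4), bool)
m1[0:2, 0:2] = True   # first target covers the top-left 2x2 block
m2[1:3, 1:3] = True   # second target overlaps it at pixel (1, 1)
out = blend_targets(bg, [(img1, m1), (img2, m2)])
```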
  • before the storing of the original image, the mask map, and the target original image in the database, the method further includes:
  • before the acquiring of the target retrieval condition, the method further includes:
  • the method further includes:
  • the embodiment of the present application provides a device for generating a video summary, where the device includes:
  • a retrieval module configured to acquire a target retrieval condition and retrieve the established database to obtain a first trajectory set containing the trajectories that meet the target retrieval condition, where the database stores the trajectory information of each trajectory and the target original images extracted from the video frames containing the target objects, and the trajectory information of each trajectory includes: overlapping state information with other trajectories;
  • a combination module configured to divide at least two tracks that overlap in the first set of tracks into a group according to the overlapping state information, and each group is determined as a combined track;
  • a translation module configured to translate the to-be-translated trajectories that satisfy the preset translation condition to the same target time period along the time axis, where the to-be-translated trajectories include: the combined trajectories and/or the trajectories in the first trajectory set that do not overlap;
  • a first acquiring module configured to acquire, from the database, a first target original image corresponding to each track in the target time period
  • a second obtaining module configured to obtain a first abstract background image for generating a video summary
  • a splicing module configured to splicing each of the first target original images and the first abstract background image to generate a video summary.
  • the target retrieval condition includes: retrieving a time period and/or retrieval attribute information of the target object;
  • the retrieval module is specifically configured to:
  • the search module is specifically configured to:
  • the retrieval module is specifically configured to:
  • the device further includes:
  • a first extraction module configured to extract each target object from the input video
  • a second extraction module configured to extract trajectory information and attribute information of each target object, where the trajectory information includes: movement information of the trajectory, and overlapping state information with other trajectories;
  • a first storage module configured to store track information and attribute information of each target object into a video structured target description file
  • a database generating module configured to generate the database according to the video structured target description file
  • a third extraction module configured to extract the original image and the mask map of each frame corresponding to the trajectory information from the video frames that include the target object, and to determine, according to the original image and the mask map, the target original image of each frame corresponding to the trajectory information;
  • a second storage module configured to store the original image, the mask map, and the target original image into the database.
  • the third extraction module includes:
  • a first extraction submodule configured to extract a motion mask of the target object from a video frame that includes the target object
  • a first determining submodule configured to determine an initial mask according to the motion mask
  • a second determining submodule configured to determine an edge point set of the initial mask map
  • a second extraction submodule configured to extract the convex subset of the set of edge points to form the convex hull point set of the mask map;
  • the filler submodule is configured to fill the convex hull corresponding to the convex hull point set to obtain a final mask map.
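The mask construction described by the sub-modules above (edge points, convex hull point set, filled hull) can be sketched as follows. This is a minimal pure-Python illustration under assumed data shapes, not the patent's implementation: Andrew's monotone-chain algorithm extracts the convex hull of the edge points, and a point-in-polygon test rasterises the filled hull into the final mask.

```python
def convex_hull(points):
    """Andrew's monotone-chain convex hull; returns hull vertices in order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:                       # build lower hull left-to-right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):             # build upper hull right-to-left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]      # endpoints shared, drop duplicates

def fill_hull(hull, width, height):
    """Rasterise the hull polygon into a binary mask via ray casting."""
    def inside(x, y):
        cnt, n = 0, len(hull)
        for i in range(n):
            (x1, y1), (x2, y2) = hull[i], hull[(i+1) % n]
            if (y1 > y) != (y2 > y) and x < x1 + (y-y1)*(x2-x1)/(y2-y1):
                cnt += 1
        return cnt % 2 == 1
    return [[1 if inside(x, y) else 0 for x in range(width)]
            for y in range(height)]

# Example: edge points of a square plus one interior point; the interior
# point is discarded by the hull and the square is filled in the mask.
hull = convex_hull([(0, 0), (4, 0), (4, 4), (0, 4), (2, 2)])
mask = fill_hull(hull, 6, 6)
```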
  • the translation module includes:
  • a queue creation sub-module for establishing a queue to be translated and a summary queue
  • a first storage sub-module configured to use a trajectory in the first trajectory set that is not in the target time period as a trajectory to be translated and store it in the to-be-translated queue, where the trajectory to be translated is a combined trajectory in the first trajectory set that is not in the target time period and/or a trajectory that does not overlap;
  • a second storage submodule configured to store, in the first time set, a track in the target time period into the summary queue
  • a third extraction sub-module configured to sequentially extract a current to-be-translated trajectory from the to-be-translated queue, and obtain a rectangular frame of each target object in the video frame corresponding to the current to-be-translated trajectory according to the original image in the database ;
  • An operation submodule configured to calculate an overlapping area between a rectangular frame of each target object and a rectangular frame of the target object in a video frame corresponding to each track that has been stored in the summary queue;
  • a third storage submodule configured to: when the overlapping area is less than or equal to a preset overlapping parameter threshold, translate the current to-be-translated trajectory to the target time segment, and store the trajectory to the summary queue;
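The queue-based translation described by the sub-modules above can be sketched as a greedy scheduler: a trajectory is admitted to the summary queue only if, frame by frame, its target boxes overlap the already-admitted boxes by no more than the preset threshold. Names and data shapes here are illustrative assumptions, not the patent's implementation.

```python
def overlap_area(r1, r2):
    """Intersection area of two (x, y, w, h) rectangles."""
    dx = min(r1[0]+r1[2], r2[0]+r2[2]) - max(r1[0], r2[0])
    dy = min(r1[1]+r1[3], r2[1]+r2[3]) - max(r1[1], r2[1])
    return max(dx, 0) * max(dy, 0)

def schedule(tracks, threshold):
    """Greedily translate trajectories to the target time period.

    Each track is a dict {frame_offset: rect}: its target boxes indexed by
    frame offset after translation to the start of the target period. A track
    is admitted to the summary queue only if, on every shared frame, its boxes
    overlap the already-admitted boxes by at most `threshold` pixels.
    """
    summary = []   # admitted tracks (the summary queue)
    deferred = []  # tracks that collide too much and must wait
    for track in tracks:
        ok = all(overlap_area(rect, prev[f]) <= threshold
                 for prev in summary
                 for f, rect in track.items() if f in prev)
        (summary if ok else deferred).append(track)
    return summary, deferred

# Two single-frame tracks whose boxes overlap by 25 px: with threshold 20
# the second track is deferred rather than placed on top of the first.
t1 = {0: (0, 0, 10, 10)}
t2 = {0: (5, 5, 10, 10)}
admitted, deferred = schedule([t1, t2], threshold=20)
```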
  • the first acquiring module is specifically configured to:
  • the track information of each track further includes: a target box information set of the target object;
  • the splicing module includes:
  • a third determining submodule configured to determine, according to the target box information set of the target object in the trajectory information, a first position of the first target original image in the first abstract background image
  • the video summary generation sub-module is configured to copy each first target original image to a corresponding first position in the first summary background image to generate a video summary.
  • the third determining submodule is specifically configured to:
  • the pixel value of the overlapping portion of the corresponding trajectories is set to the mean of the target original image pixel values of the target objects, and the pixel value of each non-overlapping portion is the pixel value of the target original image of the corresponding target object, so as to obtain the image to be copied;
  • the splicing module is specifically configured to:
  • the device further includes:
  • An operation module configured to obtain a summary background image according to a preset period
  • a third storage module configured to store the acquired summary background image of each period into the database
  • the second acquiring module includes:
  • a sub-module configured to divide the target time segment into a time sub-segment corresponding to the preset period according to a time corresponding to each preset period;
  • a fourth determining sub-module configured to determine a first preset period corresponding to the time sub-segment including the most trajectory in the target time period
  • the background image obtaining sub-module is configured to obtain, from the database, a first abstract background image corresponding to the first preset period.
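The period-selection logic of the sub-modules above (divide the target time period into sub-segments, pick the preset period whose sub-segment contains the most trajectories, and use its stored background) can be sketched as below; the function and key names are illustrative assumptions.

```python
def pick_background_period(track_times, segments):
    """Choose the preset period whose time sub-segment contains the most
    trajectories; its stored background image then serves as the first
    summary background image.

    `segments` maps a period id to a (start, end) sub-segment of the target
    time period; `track_times` is a list of (start, end) trajectory intervals.
    """
    def count(seg):
        s, e = seg
        # a trajectory falls in a sub-segment if their intervals intersect
        return sum(1 for ts, te in track_times if ts <= e and te >= s)
    return max(segments, key=lambda pid: count(segments[pid]))

# Two trajectories are active in the 8:00-9:00 sub-segment and only one
# around noon, so the morning period's background is selected.
chosen = pick_background_period(
    [(8.1, 8.3), (8.5, 8.6), (11.2, 11.4)],
    {"morning": (8, 9), "noon": (11, 12)},
)
```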
  • the device further includes:
  • a display module configured to display a user interaction interface according to a user instruction
  • a receiving module configured to receive and save a target retrieval condition input by the user through the user interaction interface, a preset translation condition, and a preset period for generating a summary background image
  • An execution module configured to perform the step of searching the established database when receiving a startup request input by the user through the user interaction interface
  • the ending module is configured to end the process of generating the video summary when receiving the interrupt request input by the user through the user interaction interface.
  • an embodiment of the present application provides a computer device, including a processor and a memory, where
  • a memory for storing a computer program
  • a processor configured to implement, when executing the computer program stored in the memory, the method of the first aspect of the embodiments of the present application.
  • the embodiment of the present application provides a storage medium for storing executable code, where the executable code, when run, performs the method steps described in the first aspect of the embodiments of the present application.
  • an embodiment of the present application provides an application program for performing, at runtime, the method steps described in the first aspect of the embodiments of the present application.
  • In the method and device for generating a video summary provided by the embodiments of the present application, an established database is searched to obtain the trajectories that satisfy a target retrieval condition, the trajectories that overlap are combined into combined trajectories, the combined trajectories and non-overlapping trajectories of different time periods are then translated to the same target time period, and finally the target original images corresponding to the trajectories in the target time period are spliced with the summary background image to generate the video summary.
  • Because the overlapping trajectories of multiple target objects are combined into one combined trajectory that is translated as a whole on the time axis, no trajectory among the overlapping trajectories is lost during translation, thereby improving the visual effect of the generated video summary.
  • FIG. 1 is a schematic flowchart diagram of a method for generating a video summary according to an embodiment of the present application
  • FIG. 2 is a schematic flowchart of a method for generating a video summary according to another embodiment of the present application
  • FIG. 3 is a schematic diagram of a specific process of S205 in the embodiment shown in FIG. 2;
  • FIG. 4 is a schematic diagram of a specific process of S105 in the embodiment shown in FIG. 2;
  • FIG. 5 is a schematic diagram of a specific process of S103 in the embodiment shown in FIG. 2;
  • FIG. 6 is a schematic flowchart diagram of a method for generating a video summary according to still another embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a device for generating a video summary according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a device for generating a video summary according to another embodiment of the present application.
  • FIG. 9 is a schematic diagram showing a specific structure of a third extraction module 850 in the embodiment shown in FIG. 8;
  • FIG. 10 is a schematic diagram showing a specific structure of the translation module 730 in the embodiment shown in FIG. 8;
  • FIG. 11 is a schematic structural diagram of a second acquiring module 750 in the embodiment shown in FIG. 8;
  • FIG. 12 is a schematic structural diagram of a device for generating a video summary according to still another embodiment of the present application.
  • the embodiment of the present application provides a method and an apparatus for generating a video summary.
  • A method for generating a video summary provided by the embodiments of the present application is first introduced.
  • An execution body of a method for generating a video summary provided by an embodiment of the present application may be a video summary controller having a function of generating a video summary.
  • the manner of implementing the method for generating a video digest provided by the embodiment of the present application may be at least one of software, a hardware circuit, and a logic circuit that are disposed in the video summary controller.
  • the video summary controller can be applied to a video surveillance system or to a server side of a video website.
  • a method for generating a video summary may include the following steps:
  • the trajectory information of each trajectory extracted from the video frame containing the target object and the target original image are stored in the database, and the trajectory information of each trajectory includes overlapping state information with other trajectories;
  • the database may further include: attribute information of each target object, the video frame original images, and the mask map and/or background image of the target object in each frame's original image;
  • the overlapping state information may be an identifier of the trajectory of an overlapping target object; the identifier may be the name of the track, the track number, or another feature symbol used to characterize the track;
  • the track information may include: the identifier of the track of the target object, the number of track points of the target object, the frame number of each track point of the target object, the time information of the track of the target object, the spatial information of the track of the target object, and/or the overlapping state information with other tracks; the attribute information may include: the appearance time of the target object, the moving direction of the target object, the license plate number, the vehicle model, the vehicle brand, the vehicle color, the color of a person's clothing, a person's age, a person's height, whether a person wears glasses, and/or whether a person carries a backpack or bag, and other information.
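As a hypothetical sketch (field and class names are illustrative, not from the patent), the trajectory record described above might be modeled as:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TrackPoint:
    frame_no: int   # frame number of this trajectory point
    x: int          # top-left corner of the target box
    y: int
    w: int          # width of the target box
    h: int          # height of the target box

@dataclass
class Trajectory:
    track_id: str                                 # identifier of the track
    points: List[TrackPoint] = field(default_factory=list)
    # overlapping state information: identifiers of overlapping tracks
    overlaps_with: List[str] = field(default_factory=list)

    @property
    def start_frame(self) -> int:
        """Time information: first frame of the trajectory."""
        return min(p.frame_no for p in self.points)

    @property
    def end_frame(self) -> int:
        """Time information: last frame of the trajectory."""
        return max(p.frame_no for p in self.points)

# A track with two points, overlapping hypothetical track "t2".
traj = Trajectory("t1",
                  [TrackPoint(3, 0, 0, 8, 8), TrackPoint(7, 2, 0, 8, 8)],
                  overlaps_with=["t2"])
```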
  • the target original image may be an original image of the video frame, or may be an image obtained by combining the mask image with the original image of the video frame.
  • the target retrieval condition may be any combination of all the information in the attribute information of the target object, or may be a certain time period.
  • the established database is retrieved, and the trajectory of the target object that meets the target retrieval condition is retrieved from the database.
  • the target retrieval condition includes: retrieving the time period and/or the retrieval attribute information of the target object.
  • the steps of searching the established database may include:
  • the database is retrieved, and the trajectory of each target object in the retrieval time period is obtained.
  • the trajectory of each target object in the retrieval time period is acquired. For example, given a video with a duration of 5 hours, from 7:00 to 12:00, and a target retrieval condition whose retrieval time period is 8:00 to 9:00, the trajectory of each target object appearing between 8:00 and 9:00 is extracted.
  • By intercepting this time interval of the original video, only the video of the target objects' active time period is processed, which reduces the input data for the subsequent video summary generation and reduces the amount of calculation.
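The time-period retrieval above reduces to an interval-intersection filter. A minimal sketch, assuming trajectories are stored with start/end times (the dict keys here are invented for illustration):

```python
def retrieve_by_time(trajectories, period_start, period_end):
    """Return trajectories active within [period_start, period_end].

    Each trajectory is a dict with 'id', 'start' and 'end' times; a
    trajectory matches if its time interval intersects the retrieval period.
    """
    return [t for t in trajectories
            if t['start'] <= period_end and t['end'] >= period_start]

# Example: a 5-hour video from 7:00 to 12:00 (times in hours),
# retrieval time period 8:00 to 9:00.
tracks = [
    {'id': 'a', 'start': 7.2, 'end': 7.8},   # entirely before the period
    {'id': 'b', 'start': 8.1, 'end': 8.5},   # inside the period
    {'id': 'c', 'start': 8.9, 'end': 10.0},  # straddles the period boundary
]
hits = retrieve_by_time(tracks, 8.0, 9.0)
```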
  • the step of searching the established database may include:
  • the database is searched according to the retrieval attribute information of the target object, and the trajectory of each target object matching the retrieval attribute information is obtained.
  • the target search condition is any combination of all the information in the attribute information of the target object
  • the established database is searched, and the trajectory of the target object having the same target search condition is retrieved from the database.
  • For example, the target retrieval condition may be: a man 1.75 meters in height, between 40 and 45 years old, wearing a white down jacket; according to this condition, the trajectory of each target object that satisfies it can be extracted from the database.
  • Because the target retrieval condition defines the target object in the video, the accuracy of the target object can be ensured and the trajectories of the target objects that satisfy the requirement are easier to extract.
  • the step of searching the established database may include:
  • the database is searched according to the retrieval time period and the retrieval attribute information of the retrieval target object, and the trajectory of each target object matching the retrieval attribute in the retrieval time period is obtained.
  • the trajectory of the target object that matches the attribute information in the retrieval time period can be obtained in this embodiment.
  • This embodiment not only ensures the activity level of the target object but also defines the attributes of the target object.
  • the obtained trajectory is more accurate.
  • The target retrieval condition may be preset, that is, set before the video is obtained; this is mostly used for fixed scenes, for example when the same target objects recur in the same time period over many consecutive days. For a fixed scene, using a preset target retrieval condition avoids repeatedly setting the same condition. The target retrieval condition may also be input by the user according to the actual situation; for example, when the user needs to retrieve the trajectory of a specific target object within a certain time period, the user can set the target retrieval condition according to the attribute information of that target object. Both approaches are reasonable.
  • An SQL (Structured Query Language) query can be generated according to the target retrieval condition, and the established database is then retrieved with it.
  • SQL is a database query and programming language for accessing data and for querying, updating, and managing databases; the SELECT statement is its most commonly used query form.
  • The data matching the target retrieval condition is selected from the tables of the database; this data may be the track identifier of a target object, through which track information such as the time information and spatial information of the target object's track can be determined.
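The SQL retrieval above can be sketched with an in-memory SQLite database. The table and column names are invented for illustration (the patent does not specify a schema); the attribute values are bound as parameters rather than interpolated.

```python
import sqlite3

# Hypothetical attribute table; schema and names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE targets (
    track_id TEXT, appear_time REAL, height_m REAL, age INTEGER,
    coat_color TEXT)""")
conn.executemany("INSERT INTO targets VALUES (?, ?, ?, ?, ?)", [
    ("t1", 8.2, 1.75, 42, "white"),
    ("t2", 8.5, 1.60, 30, "black"),
    ("t3", 10.1, 1.75, 44, "white"),
])

def build_query(conditions):
    """Generate a parameterised SELECT from a dict of retrieval conditions
    (keys are trusted column names; values are bound as SQL parameters)."""
    clauses = " AND ".join(f"{col} = ?" for col in conditions)
    sql = f"SELECT track_id FROM targets WHERE {clauses}"
    return sql, tuple(conditions.values())

# Retrieve track identifiers for a 1.75 m target wearing a white coat;
# the returned ids are then used to look up the full track information.
sql, params = build_query({"height_m": 1.75, "coat_color": "white"})
track_ids = [row[0] for row in conn.execute(sql, params)]
```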
  • the overlapping state information may be an identifier of a track of the overlapping target object, the identifier may be a name of the track, or a track number, or other feature symbols used to represent the track;
  • The first trajectory set is the set of trajectories, extracted from the database, of the target objects that satisfy the target retrieval condition.
  • the overlapping trajectories are taken as a whole, and the trajectories that overlap in the first trajectory set are combined to obtain a combined trajectory. Moreover, the order of the trajectories on the time axis in the combined trajectory needs to remain unchanged.
  • the trajectory to be translated includes: a combined trajectory composed of at least two trajectories in the first trajectory set and/or a trajectory in the first trajectory set that does not overlap; the target time period may be preset by the user, or it may be a certain period within the playback time of the entire video.
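Grouping overlapping trajectories into combined trajectories (including transitive overlaps: if track 1 overlaps track 2 and track 2 overlaps track 3, all three form one combined trajectory) can be sketched with a simple union-find over the overlap state information; names here are illustrative, not the patent's implementation.

```python
def group_overlapping(track_ids, overlaps):
    """Group trajectories that overlap (directly or transitively) into
    combined trajectories, using union-find with path compression.

    `overlaps` is a list of (id_a, id_b) pairs taken from each trajectory's
    overlapping state information.
    """
    parent = {t: t for t in track_ids}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path compression
            t = parent[t]
        return t

    for a, b in overlaps:
        parent[find(a)] = find(b)          # union the two groups

    groups = {}
    for t in track_ids:
        groups.setdefault(find(t), []).append(t)
    return sorted(sorted(g) for g in groups.values())

# Tracks 1 and 2 overlap, and 2 and 3 overlap, so {1, 2, 3} becomes one
# combined trajectory; track 4 never overlaps and stays on its own.
combined = group_overlapping([1, 2, 3, 4], [(1, 2), (2, 3)])
```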
  • the video summary to be generated is not simply spliced by motion segments, but the resulting video summary is concentrated by shifting the trajectory of the target object appearing at different time segments to the same time period.
  • the translation of the trajectory of the target object is a translation of the time of the trajectory and does not include the translation of the spatial position.
  • the target original image may be the original image of the video frame, or may be an image obtained by combining the mask map with the original image of the video frame; the first target original image is the target original image of any track.
  • the first target original image may be obtained according to the trajectory information of the target object, or may be obtained according to the attribute information of the target object.
  • the image formed by the remaining content is the background image of the video; the generated video summary cannot include only the first target original images and should also include the first abstract background image.
  • the first abstract background image may be a static abstract background image calculated by a static background determining method, or may be a dynamic abstract background image determined by a dynamic background determining method.
  • Because the background image of the video can differ greatly at different times, the background image of each time segment's video is taken as the summary background image for that segment.
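One common way to obtain a static summary background for a time segment, consistent with the static background determination mentioned above, is a per-pixel temporal median over sampled frames; moving targets that occupy a pixel only briefly are suppressed by the median. This is a sketch of that general technique, not necessarily the patent's method.

```python
import numpy as np

def static_background(frames):
    """Estimate a static summary background as the per-pixel temporal
    median of sampled video frames (grayscale arrays of equal shape)."""
    return np.median(np.stack(frames, axis=0), axis=0)

# Three sampled frames of a static scene (value 5 everywhere); a moving
# target briefly covers pixel (0, 0) in the middle frame.
frames = [np.full((2, 2), 5.0) for _ in range(3)]
frames[1][0, 0] = 100.0
bg = static_background(frames)
```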
  • S106: splice each first target original image with the first abstract background image to generate a video summary.
  • the first target original image is spliced with the first abstract background image, and the first target original image may be copied to a position where the target object is located in the first abstract background image.
  • the first target original image cannot be simply copied and pasted to obtain a video summary.
  • In this embodiment, by searching the established database, the trajectories satisfying the target retrieval condition are obtained, the overlapping trajectories among them are combined into combined trajectories, the combined trajectories and non-overlapping trajectories of different time periods are translated to the same target time period, and finally the target original images corresponding to the trajectories in the target time period are spliced with the summary background image to generate a video summary. In the embodiment of the present application, when the video summary is generated, the overlapping trajectories of multiple target objects are combined into one combined trajectory that is translated as a whole on the time axis, which avoids losing any trajectory among the overlapping trajectories during translation and thereby improves the visual effect of the generated video summary.
  • the method for generating the video summary further includes:
  • the flow of the video summary generation is ended; when the startup request input by the user is received, the step of acquiring the target retrieval condition and performing the retrieval on the established database is performed.
  • The user can input an interrupt request at any time; for example, if the user finds that the target retrieval condition was set incorrectly, the process of generating the video summary is ended after the interrupt request is received. The user can then reset the target retrieval condition as required and input a startup request; after the startup request is received, the established database is searched again according to the new target retrieval condition.
  • the method for generating the video summary may further include:
  • the target object is a target with characteristic information, such as a person, a car, or a ship.
  • the trajectory information may include: an identifier of a trajectory of the target object, a number of trajectory points of the target object, a frame number of each trajectory point of the target object, time information of the trajectory of the target object, spatial information of the trajectory of the target object, and / or information such as overlapping state information with other tracks; attribute information may include: the time of occurrence of the target object, the direction of movement of the target object, the license plate number, the model, the brand of the vehicle, the color of the vehicle, the color of the person's dress, the person Information such as the age, the height of the person, whether the person wears glasses and/or whether the person has a backpack or a bag.
  • the video structured target extraction is performed on the video.
  • the video structured target extraction includes target object extraction and target attribute extraction, and the target trajectory description file is obtained by the target object extraction;
  • the target attribute description file, the target track description file and the target attribute description file are included in the video structured object description file.
  • the target object extraction and the target attribute extraction can be performed synchronously.
  • In the target attribute extraction, one or more video frame images containing the target object are extracted by a preset video frame extraction method, and attribute classifiers are then applied to obtain the target attribute description file of the target object. Each attribute classifier identifies one class of attributes of the target object, and the information of that class of attributes is obtained through the internal analysis of the classifier.
  • In the target object extraction, the specific attributes and motion attributes of the target object are extracted, multi-target tracking is performed by combining the specific attributes with the motion attributes, and the specific attributes of each tracked target object are associated with its motion attributes.
  • the video structured target description file is used to store attribute information and track information of the target object.
  • The existing video structured target description file is usually generated on an industrial computer or server, and can also be generated on an embedded platform, such as a DSP (Digital Signal Processor) or ARM (Advanced RISC Machines, a reduced-instruction-set microprocessor) platform.
  • A database is established, and all the attribute information is managed by the database, including: the appearance time of the target object, the moving direction of the target object, the license plate number, model, brand and color of a vehicle, the color of a person's clothing, the person's age and height, whether the person wears glasses, and/or whether the person carries a backpack or the like.
  • the trajectory of the target object is analyzed, and the trajectory information of the target object is extracted.
  • The trajectory information of the target object can be used to extract the overlapping state information of the target object, so that the overlapping state information of the target object is saved in the database.
  • the target original image is an image of the target object.
  • The target original image may be obtained by matching the original image of each video frame with a mask map. Since the mask map reflects only the contour of the target object and not the image content, matching it against the original image of the video frame yields the image of the target object within the mask-map area, which is more accurate than extracting the image of the target object directly from the original image of the video frame.
  • the mask map can be extracted by a preset background modeling method, and the preset background modeling method can be any of the color background model method, the average background model method, the Gaussian background model method or the CodeBook background model method. One.
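As a minimal sketch of one of the background modeling methods named above (the average background model), the mask map can be derived by differencing a frame against the per-pixel mean of a frame buffer. The threshold value and function names here are illustrative assumptions, not part of the patent.

```python
import numpy as np

def average_background(frames):
    """Average background model: the per-pixel mean over a buffer of frames."""
    return np.mean(np.stack(frames), axis=0)

def motion_mask(frame, background, threshold=30):
    """Binary mask marking pixels that differ from the background model."""
    diff = np.abs(frame.astype(np.float64) - background)
    return (diff > threshold).astype(np.uint8)

# Toy example: a static 4x4 background with one bright 2x2 "target object".
bg_frames = [np.zeros((4, 4)) for _ in range(5)]
frame = np.zeros((4, 4))
frame[1:3, 1:3] = 255
mask = motion_mask(frame, average_background(bg_frames))
print(mask.sum())  # 4 foreground pixels
```

In practice a Gaussian mixture or CodeBook model, as the text notes, handles lighting changes far better than this fixed-threshold sketch.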
  • In the method for generating a video summary in the embodiment of the present application, the step of extracting the mask map of each frame corresponding to the trajectory information from the video frames that contain the target object may include:
  • The motion mask is the two-dimensional data constituting the mask map. The mask map of the target object can be determined by extracting the motion mask of the target object and applying the preset background modeling method to the two-dimensional data of the motion mask. The mask map characterizes the contour of the target object and distinguishes the target original image from the background image in the original video frame.
  • S2054: Extract the convex set in the edge point set to form the convex hull point set of the mask map.
  • The mask map extraction may be incomplete. In that case, the mask map can be post-processed: according to the edge point set of the mask map, the convex hull point set of the mask map is formed, and the convex hull corresponding to the convex hull point set is filled, thereby completing the mask map so that it captures the contour of the target object as fully as possible.
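The hull-and-fill post-processing can be sketched as follows, using Andrew's monotone chain for the convex hull and a half-plane test to rasterise it. This is an illustrative implementation under the assumption that edge points are given as (x, y) tuples; the patent does not prescribe a specific hull algorithm.

```python
import numpy as np

def convex_hull(points):
    """Andrew's monotone chain: convex hull of 2-D points, counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def fill_convex_hull(shape, hull):
    """Rasterise the hull: a pixel is set if it lies on the inner side of every edge."""
    mask = np.zeros(shape, dtype=np.uint8)
    n = len(hull)
    for y in range(shape[0]):
        for x in range(shape[1]):
            inside = all(
                (hull[(i+1) % n][0]-hull[i][0]) * (y - hull[i][1])
                - (hull[(i+1) % n][1]-hull[i][1]) * (x - hull[i][0]) >= 0
                for i in range(n))
            if inside:
                mask[y, x] = 1
    return mask

# Edge points of an incomplete mask; the interior point (2, 2) is dropped.
hull = convex_hull([(0, 0), (4, 0), (0, 4), (4, 4), (2, 2)])
print(hull)  # [(0, 0), (4, 0), (4, 4), (0, 4)]
```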
  • the original image, the mask map and the target original image corresponding to the target object are stored in the database, so that in the step of generating the video summary, the corresponding target original image can be quickly searched from the database according to the attribute information of the target object.
  • the method for generating the video summary may further include:
  • the summary background image is obtained according to the preset period.
  • the preset period is a period for saving the summary background image, and may be set by the user according to actual needs, or may be an empirical value preset by a person skilled in the art.
  • The target original image can be obtained from the mask map and the original image of the video frame. Since the mask map reflects only the contour of the target object and not the image content, matching it against the original video frame yields the image of the target object within the mask-map area; this is more accurate than extracting the target object directly from the original image, the target object is extracted completely, and the background part around the target is removed, which enhances the effect of splicing the target original image with the summary background image. During video analysis, one background image is maintained throughout; this background image is updated every frame, and when the preset cycle time is reached, the background image is automatically saved once.
  • the summary background image of each cycle obtained is stored in the database.
  • Although the background image changes during video playback, it changes little compared to the target objects. Therefore, it is sufficient to save the background image periodically, which guarantees the authenticity of the background without adding too much computation.
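The per-frame update with periodic saving described above can be sketched with a simple running average. The learning rate `alpha` and the function name are illustrative assumptions; the patent only specifies that one background image is updated every frame and snapshotted once per preset period.

```python
import numpy as np

def run_summary_backgrounds(frames, period, alpha=0.05):
    """Maintain one running-average background, updated every frame,
    and snapshot it once per preset period (period = frames per save)."""
    background = frames[0].astype(np.float64)
    saved = []
    for i, frame in enumerate(frames, start=1):
        background = (1 - alpha) * background + alpha * frame  # per-frame update
        if i % period == 0:
            saved.append(background.copy())                    # periodic snapshot
    return saved

frames = [np.full((2, 2), v, dtype=np.float64) for v in range(10)]
snapshots = run_summary_backgrounds(frames, period=5)
print(len(snapshots))  # 2 saved backgrounds for 10 frames with period 5
```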
  • In the method for generating a video summary in the embodiment of the present application, the step of obtaining the first summary background image for generating the video summary may include:
  • the target time segment is divided into time sub-segments corresponding to the preset period according to the time corresponding to each preset period.
  • The target time segment is divided into corresponding time sub-segments, which ensures that a suitable summary background image can be obtained in the subsequent step without requiring a more complex algorithm to determine a better summary background image, thereby effectively saving computation.
  • S1052: Determine the first preset period corresponding to the time sub-segment containing the most tracks in the target time segment.
  • The background image of the period whose time sub-segment contains the most tracks is used as the summary background image of the video summary, so that only a small part of the tracks does not match the actual background. Compared with a static background image, this more realistically reflects the actual trajectories of the target objects and improves the effect of the video summary.
  • When the video summary is generated, the trajectories of overlapping target objects are combined into one combined trajectory and translated as a whole on the time axis, which avoids losing some of the overlapping trajectories during translation and improves the visual effect of the generated video summary.
  • The target original image is obtained by matching the mask map with the original image of the video frame. Since the mask map reflects only the contour of the target object and not the image content, matching it against the original image of the video frame yields the image of the target object within the mask-map area, which is more accurate than extracting the image of the target object directly from the original video frame.
  • a method for generating a video summary may include:
  • S1031 Establish a to-be-translated queue and a summary queue.
  • The queue to be translated is a queue for storing target trajectories that have not yet been placed, that is, trajectories for which it has not yet been determined whether the preset condition is satisfied, or trajectories in the first trajectory set that are not yet in the target time segment.
  • the summary queue is a queue for storing tracks for generating video summaries.
  • a track that is not in the target time segment in the first track set is used as a track to be translated, and is stored in the queue to be translated.
  • the to-be-translated trajectory is a combined trajectory in the first trajectory set that is not in the target time segment and a trajectory that does not overlap.
  • S1034 Extract the current to-be-translated trajectory from the queue to be translated, and obtain a rectangular frame of each target object in the video frame corresponding to the current to-be-translated trajectory according to the original image in the database.
  • The target object in a video frame is actually contained in a rectangular sub-picture. The size of the rectangular sub-picture depends on the target extraction method: if head targets are extracted, the size of the rectangular sub-picture is determined based on the size of each head target so as to cover, as far as possible, the area of any individual head target; if pedestrian targets are extracted, the size is determined based on the size of each pedestrian target so as to cover any pedestrian target as far as possible. Since the trajectory of a target object is formed by a plurality of consecutive video frames, one target object's trajectory has a plurality of rectangular sub-pictures; these form the rectangular frame to be translated, the rectangular frame being the smallest-area rectangle containing the target object.
  • An overlapping parameter threshold is preset; the preset overlapping parameter threshold may be set according to actual demand and the specific attributes of the target object.
  • The tracks in the first track set that are not in the target time segment are stored in the queue to be translated, the tracks in the first track set that are in the target time segment are stored in the summary queue, and the tracks in the queue to be translated that meet the preset overlap condition are then stored in the summary queue.
  • the step of obtaining the first target original image corresponding to each track of the target time segment from the database may include:
  • the trajectory information has a corresponding relationship with the target object and the trajectory of the target object.
  • the trajectory information also has a corresponding relationship with the target original image of the target object. Therefore, the first target original image corresponding to the trajectory information can be extracted from the database according to the trajectory information of the trajectory to be translated.
  • the first target original image is spliced with the first abstract background image to generate a video summary, including:
  • the first position of the first target original image in the first summary background image is determined according to the target frame information set of the target object in the trajectory information.
  • The track information of each track further includes a target frame information set of the target object. The target frame information includes the coordinates of the top-left corner of the rectangular frame of the target object together with its width and height; for example, the target frame information is (x, y, w, h), where x is the abscissa of the top-left corner of the rectangular frame, y is the ordinate of the top-left corner, w is the width of the rectangular frame, and h is its height. The target frame information may also include the coordinates of the center point of the rectangular frame together with its width and height; for example, the target frame information is (m, n, p, q), where m is the abscissa of the center point of the rectangular frame, n is the ordinate of the center point, p is the width of the rectangular frame, and q is its height.
  • The target object forms a set of target frame information during its movement, and this set contains information such as the coordinates, size, and direction of the first target original image in the first summary background image. Therefore, the first position of the first target original image in the first summary background image is determined according to the target frame information set of the target object in the trajectory information.
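The two target-frame formats described above are interchangeable; a conversion from the top-left (x, y, w, h) form to the center-point (m, n, p, q) form is a one-line computation:

```python
def topleft_to_center(box):
    """Convert (x, y, w, h), with (x, y) the top-left corner of the target's
    rectangular frame, into (m, n, p, q), with (m, n) the center point."""
    x, y, w, h = box
    return (x + w / 2, y + h / 2, w, h)

print(topleft_to_center((10, 20, 4, 8)))  # (12.0, 24.0, 4, 8)
```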
  • the first target original image is copied to the corresponding first position in the first abstract background image to generate a video summary.
  • The method used for splicing may cause the position of the target original image in the summary background image to deviate from its actual position.
  • the target original image includes a partial background image, which is different from the background image to be stitched, so that the position of the target original image in the abstract background image does not match the original image; therefore, in this embodiment, the target original image may be The original image of the video frame and the mask image are combined with the obtained image.
  • The target original image does not include the background image, and the true position of the target object in the summary background image is reflected by the mask map. When the target original image and the summary background image are spliced, the target original image can be accurately copied into the contour area outlined by the mask map, which guarantees the effect of splicing the target original image with the summary background image and thereby improves the visual effect of the video summary.
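The mask-guided copy described above can be sketched with boolean indexing: only the pixels selected by the mask are written into the background, so the target's own background pixels are never pasted. The (row, col) placement convention is an assumption of this sketch.

```python
import numpy as np

def splice(background, target_image, mask, top_left):
    """Copy only the mask-selected pixels of the target original image into the
    summary background image at the given (row, col) position."""
    out = background.copy()
    r, c = top_left
    h, w = mask.shape
    region = out[r:r+h, c:c+w]           # view into the output background
    region[mask > 0] = target_image[mask > 0]
    return out

bg = np.zeros((4, 4), dtype=np.uint8)
target = np.full((2, 2), 200, dtype=np.uint8)
mask = np.array([[1, 0], [1, 1]], dtype=np.uint8)  # contour of the target
result = splice(bg, target, mask, (1, 1))
print(int(result.sum()))  # 3 masked pixels * 200 = 600
```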
  • the step of copying each of the first target original images to the corresponding first location in the first summary background image may include:
  • For target objects whose trajectories overlap, the pixel value of the overlapping portion is set to the mean of the pixel values of the target original images of the overlapping target objects, while the pixel value of each non-overlapping portion is the pixel value of the corresponding target original image, thereby obtaining the image to be copied;
  • The image to be copied and each first target original image whose target object does not overlap with other target objects are copied to the corresponding first positions in the first summary background image to generate the video summary.
  • Among the trajectories of the target objects used to generate the video summary, some overlap and some do not. Where there is no overlap, the target original image can be copied directly into the summary background image; where overlap occurs, the mean of the pixel values of the target original images of the overlapping target objects is taken as the pixel value of the overlapping portion before splicing to generate the video summary.
  • The overlapping portion can also take a weighted value of the pixel values of the target original images of the target objects, which is also reasonable.
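The mean-of-overlaps rule can be sketched by accumulating, per pixel, the sum of mask-selected values and the number of contributing targets, then dividing. Function and variable names are illustrative assumptions.

```python
import numpy as np

def blend_targets(images, masks):
    """Compose several target images over one canvas: where masks overlap,
    take the mean of the overlapping pixel values; elsewhere, keep each
    target's own pixel values."""
    total = np.zeros(images[0].shape, dtype=np.float64)
    count = np.zeros(images[0].shape, dtype=np.float64)
    for img, m in zip(images, masks):
        total += img * (m > 0)
        count += (m > 0)
    out = np.zeros_like(total)
    np.divide(total, count, out=out, where=count > 0)  # mean only where targets exist
    return out

a = np.full((2, 2), 100.0); b = np.full((2, 2), 200.0)
ma = np.array([[1, 1], [0, 0]]); mb = np.array([[1, 0], [0, 1]])
blended = blend_targets([a, b], [ma, mb])
print(blended)  # overlap pixel is (100 + 200) / 2 = 150; others keep their own value
```

A weighted variant, as the text notes, would simply replace the uniform count with per-target weights.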
  • Before the step of acquiring the target retrieval condition, the method for generating a video summary provided by the embodiment of the present application may further include:
  • S601 Display a user interaction interface according to a user instruction.
  • The user interaction interface is an interface for interaction between the user and the system, and may be a dialog box or a selection screen in a webpage. It is used to prompt the user to input the target retrieval condition, the preset translation condition, and the preset period, where the preset period is used to generate the summary background image.
  • S602. Receive and save a target retrieval condition input by the user through the user interaction interface, a preset translation condition, and a preset period for generating a summary background image.
  • When the video summary is generated, the trajectories of overlapping target objects are combined into one combined trajectory and translated as a whole on the time axis, which avoids losing some of the overlapping trajectories during translation and improves the visual effect of the generated video summary.
  • the user is allowed to set the target retrieval condition and setting parameters, thereby improving the flexibility of the application and bringing convenience to the user.
  • the method for generating the video summary may further include:
  • When a startup request input by the user through the user interaction interface is received, the step of retrieving the established database is performed; and when an interrupt request input by the user through the user interaction interface is received, the process of generating the video summary is ended.
  • the user can input an interrupt request at any time. For example, the user finds that the target retrieval condition is set incorrectly, and after receiving the interrupt request, the process of generating the video summary is ended; Then, the user can reset the target retrieval condition according to the requirement, and input a startup request, and after receiving the startup request, re-search the established database according to the target retrieval condition.
  • the embodiment of the present application provides a device for generating a video summary.
  • the device for generating a video summary may include:
  • The retrieval module 710 is configured to acquire a target retrieval condition and retrieve the established database to obtain a first trajectory set containing the trajectories that meet the target retrieval condition, where the database stores the trajectory information and attribute information of the target objects extracted from video frames.
  • the combining module 720 is configured to divide at least two tracks that overlap in the first track set into a group according to the overlapping state information, and each group is determined as a combined track;
  • a panning module 730 configured to translate, according to a time axis, a trajectory that satisfies a preset panning condition in the trajectory to be translated to the same target time segment, where the trajectory to be translated includes: the combined trajectory and/or the first trajectory set There is no overlapping trajectory in the middle;
  • a first acquiring module 740 configured to acquire, from the database, a first target original image corresponding to each track in the target time period
  • a second obtaining module 750 configured to obtain a first abstract background image for generating a video summary
  • the splicing module 760 is configured to splicing the first target original image and the first abstract background image to generate a video summary.
  • In this embodiment, by searching the established database, trajectories satisfying the target retrieval condition are obtained, the overlapping trajectories among them are combined into combined trajectories, the combined trajectories of different time periods and the non-overlapping trajectories are then translated to the same target time segment, and finally the target original images corresponding to the trajectories in the target time segment are spliced with the summary background image to generate the video summary. In the embodiment of the present application, when the video summary is generated, the trajectories of multiple overlapping target objects are combined into one combined trajectory and translated as a whole on the time axis, which avoids losing some of the overlapping trajectories during translation and improves the visual effect of the generated video summary.
  • the target search condition may include: retrieving a time period and/or retrieval attribute information of the target object;
  • the retrieval module may specifically be used to:
  • the search module may specifically be used to:
  • the retrieval module may specifically be used to:
  • a device for generating a video summary may further include:
  • a first extraction module 810 configured to extract each target object from the input video
  • the second extraction module 820 is configured to extract the trajectory information and the attribute information of each target object, where the trajectory information includes: movement information of the trajectory and overlapping state information with other trajectories;
  • a first storage module 830 configured to store the trajectory information and the attribute information of each target object into the video structured target description file
  • a database generating module 840 configured to generate the database according to the video structured target description file
  • a third extraction module 850 configured to extract, from a video frame that includes the target object, an original image and a mask map of each frame corresponding to the trajectory information, and determine the trajectory information according to the original image and the mask map. Corresponding target image of each frame;
  • the second storage module 860 is configured to store the original image, the mask map, and the target original image into the database.
  • When the video summary is generated, the trajectories of overlapping target objects are combined into one combined trajectory and translated as a whole on the time axis, which avoids losing some of the overlapping trajectories during translation and improves the visual effect of the generated video summary.
  • The target original image is obtained by matching the mask map with the original image of the video frame. Since the mask map reflects only the contour of the target object and not the image content, matching it against the original image of the video frame yields the image of the target object within the mask-map area, which is more accurate than extracting the image of the target object directly from the original video frame.
  • the third extraction module 850 may include:
  • a first extraction submodule 851 configured to extract a motion mask of the target object from a video frame that includes the target object
  • a first determining submodule 852 configured to determine an initial mask according to the motion mask
  • a second determining submodule 853 configured to determine an edge point set of the initial mask map
  • a second extraction sub-module 854 configured to extract a convex set in the set of edge points, and form a convex punctual point set of the mask map;
  • the filling sub-module 855 is configured to fill the convex hull corresponding to the convex punctual point set to obtain a final mask map.
  • the translation module 730 can include:
  • a queue establishment sub-module 731 configured to establish a queue to be translated and a summary queue
  • A first storage sub-module 732, configured to take the trajectories in the first trajectory set that are not in the target time segment as the trajectories to be translated and store them into the queue to be translated, where the trajectories to be translated are the combined trajectories in the first trajectory set that are not in the target time segment and the trajectories that do not overlap;
  • A second storage sub-module 733, configured to store the tracks in the first trajectory set that are in the target time segment into the summary queue;
  • A third extraction sub-module 734, configured to sequentially extract the current to-be-translated trajectory from the queue to be translated, and obtain, according to the original images in the database, the rectangular frame of each target object in the video frames corresponding to the current to-be-translated trajectory;
  • the operation sub-module 735 is configured to calculate an overlapping area between the rectangular frame of each target object and the rectangular frame of the target object in the video frame corresponding to each track that has been stored in the summary queue;
  • the third storage sub-module 736 is configured to: when the overlapping area is less than or equal to a preset overlapping parameter threshold, translate the current to-be-translated trajectory to the target time segment, and store the trajectory to the summary queue;
  • the first obtaining module 740 is specifically configured to:
  • the track information of each track may further include: a target box information set of the target object;
  • the splicing module 760 can include:
  • a third determining submodule configured to determine, according to the target box information set of the target object in the trajectory information, a first position of the first target original image in the first abstract background image
  • the video summary generation sub-module is configured to copy each first target original image to a corresponding first position in the first summary background image to generate a video summary.
  • the third determining submodule is specifically configured to:
  • The pixel value of the overlapping portion of the corresponding trajectories is set to the mean of the pixel values of the target original images of the overlapping target objects, and the pixel value of each non-overlapping portion is the pixel value of the corresponding target object's original image, to obtain the image to be copied;
  • the splicing module 760 can be specifically configured to:
  • the device may further include:
  • An operation module configured to obtain a summary background image according to a preset period
  • a third storage module configured to store the acquired summary background image of each period into the database
  • the second obtaining module 750 may include:
  • the dividing sub-module 751 is configured to divide the target time segment into time sub-segments corresponding to the preset period according to the time corresponding to each preset period;
  • a fourth determining sub-module 752 configured to determine a first preset period corresponding to the time sub-segment including the most track in the target time period
  • the background image obtaining sub-module 753 is configured to obtain, from the database, a first summary background image corresponding to the first preset period.
  • the apparatus may further include:
  • the display module 1210 is configured to display a user interaction interface according to a user instruction
  • the receiving module 1220 is configured to receive and save a target retrieval condition input by the user through the user interaction interface, a preset translation condition, and a preset period for generating a summary background image.
  • the device shown may further include:
  • An execution module configured to perform the step of searching the established database when receiving a startup request input by the user through the user interaction interface
  • the ending module is configured to end the process of generating the video summary when receiving the interrupt request input by the user through the user interaction interface.
  • The video summary generating apparatus in another embodiment of the present application may include: a retrieval module 710, a combination module 720, a translation module 730, a first acquisition module 740, a second acquisition module 750, and a splicing module 760.
  • the embodiment of the present application further provides a computer device, including a processor and a memory, where
  • a memory for storing a computer program
  • The processor is configured to, when executing the computer program stored in the memory, implement all the steps of the method for generating a video summary provided by the embodiment of the present application.
  • the above image collector may include an IPC (IP Camera), a smart camera, and the like.
  • the above memory may include a RAM (Random Access Memory), and may also include NVM (Non-Volatile Memory), such as at least one disk storage.
  • the memory may also be at least one storage device located away from the aforementioned processor.
  • The processor may be a general-purpose processor, including a CPU (Central Processing Unit), an NP (Network Processor), or the like; or a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
  • The processor can thereby realize: by searching the established database, obtaining the trajectories that satisfy the target retrieval condition; combining the overlapping trajectories among them into combined trajectories; translating the combined trajectories of different time periods and the non-overlapping trajectories to the same target time segment; and finally splicing the target original images corresponding to the trajectories in the target time segment with the summary background image to generate the video summary. In the embodiment of the present application, when the video summary is generated, the trajectories of multiple overlapping target objects are combined into one combined trajectory and translated as a whole on the time axis, which avoids losing some of the overlapping trajectories during translation and improves the visual effect of the generated video summary.
  • The embodiment of the present application further provides a storage medium for storing a computer program; when the computer program is executed by a processor, all the steps of the method for generating a video summary provided by the embodiment of the present application are implemented.
  • The storage medium stores an application that, when run, executes the method for generating a video summary provided by the embodiment of the present application, and thus can realize: by searching the established database, obtaining the trajectories that satisfy the target retrieval condition; combining the overlapping trajectories among them into combined trajectories; translating the combined trajectories of different time periods and the non-overlapping trajectories to the same target time segment; and finally splicing the target original images corresponding to the trajectories in the target time segment with the summary background image to generate the video summary. When the video summary is generated, the trajectories of multiple overlapping target objects are combined into one combined trajectory and translated as a whole on the time axis, which avoids losing some of the overlapping trajectories during translation.
  • The embodiment of the present application further provides an application program for performing the steps of the method for generating a video summary provided by the embodiment of the present application.
  • The application program, when run, executes the method for generating a video summary provided by the embodiment of the present application, so that by searching the established database, the trajectories that satisfy the target retrieval condition are obtained; the overlapping trajectories among them are combined into combined trajectories; the combined trajectories of different time periods and the non-overlapping trajectories are translated to the same target time segment; and finally the target original images corresponding to the trajectories in the target time segment are spliced with the summary background image to generate the video summary. When generating the video summary, the embodiment of the present application combines the trajectories of multiple overlapping target objects into one combined trajectory and translates them as a whole on the time axis, avoiding the loss of some of the overlapping trajectories during translation and improving the visual effect of the generated video summary.
  • the description is relatively simple, and the relevant parts can be referred to the description of the method embodiment.
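The trajectory merging and time-axis translation described above can be sketched as follows. This is an illustrative reconstruction in Python, not the patented implementation; the names `Trajectory`, `combine_overlapping`, and `shift_to_target` are hypothetical, and trajectories are reduced to their frame intervals for clarity.

```python
from dataclasses import dataclass, replace as dc_replace
from typing import List

@dataclass(frozen=True)
class Trajectory:
    """The frame interval a moving target occupies on the original time axis."""
    obj_id: int
    start: int  # first frame index
    end: int    # last frame index, inclusive

def overlaps(a: Trajectory, b: Trajectory) -> bool:
    """Two trajectories overlap when their frame intervals intersect."""
    return a.start <= b.end and b.start <= a.end

def combine_overlapping(trajectories: List[Trajectory]) -> List[List[Trajectory]]:
    """Group trajectories that overlap one another; each group plays the role
    of one combined trajectory that will later be moved as a whole."""
    groups: List[List[Trajectory]] = []
    for t in sorted(trajectories, key=lambda t: t.start):
        for group in groups:
            if any(overlaps(t, member) for member in group):
                group.append(t)
                break
        else:
            groups.append([t])  # no overlap: t starts its own (singleton) group
    return groups

def shift_to_target(group: List[Trajectory], target_start: int) -> List[Trajectory]:
    """Translate a combined trajectory along the time axis so its earliest
    frame lands on target_start; the relative offsets between its members,
    and hence their overlap relationships, are preserved."""
    offset = target_start - min(t.start for t in group)
    return [dc_replace(t, start=t.start + offset, end=t.end + offset) for t in group]
```

Shifting the group by a single shared offset is what prevents the loss described above: no member of an overlapping cluster is translated independently of the others.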

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The present invention relates to a method and a device for generating a video summary. The method for generating a video summary comprises: obtaining a target retrieval condition, and searching an established database to obtain a first trajectory set containing trajectories that satisfy the target retrieval condition (S101); classifying, according to overlap status information, at least two trajectories in the first trajectory set that overlap one another into a group, each group being determined as a combined trajectory (S102); translating, along the time axis, the trajectories among the trajectories to be translated that satisfy a preset translation condition to the same target time period (S103); obtaining from the database a first target original image for each trajectory within the target time period (S104); obtaining a first summary background image for generating a video summary (S105); and splicing the first target original images with the first summary background image to generate the video summary (S106). The method can improve the visual effect of a generated video summary when the target trajectories are complex.
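Step S106 above (splicing the target original images with the summary background image) amounts to compositing cropped object patches onto a shared background frame. Below is a minimal NumPy sketch under simplifying assumptions: axis-aligned bounding boxes that fit inside the frame, and no blending at patch borders. The function name `splice_frame` and its signature are illustrative, not from the patent.

```python
import numpy as np

def splice_frame(background: np.ndarray, patches) -> np.ndarray:
    """Compose one frame of the summary video: paste each target's original
    image patch onto the summary background at its bounding-box position.
    `patches` is an iterable of (x, y, patch) tuples, where `patch` is an
    H x W x C crop taken from the original video frame."""
    frame = background.copy()  # keep the shared background untouched
    for x, y, patch in patches:
        h, w = patch.shape[:2]
        frame[y:y + h, x:x + w] = patch  # later patches draw over earlier ones
    return frame
```

A real system would also clip patches at the frame borders and feather patch edges into the background; those refinements are omitted here.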
PCT/CN2018/076290 2017-02-17 2018-02-11 Method and device for generating a video summary Ceased WO2018149376A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710087044.6 2017-02-17
CN201710087044.6A CN108460032A (zh) Method and apparatus for generating a video summary

Publications (1)

Publication Number Publication Date
WO2018149376A1 true WO2018149376A1 (fr) 2018-08-23

Family

ID=63170088

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/076290 Ceased WO2018149376A1 (fr) 2017-02-17 2018-02-11 Method and device for generating a video summary

Country Status (2)

Country Link
CN (1) CN108460032A (fr)
WO (1) WO2018149376A1 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110519532A (zh) * 2019-09-02 2019-11-29 China Mobile IoT Co., Ltd. Information acquisition method and electronic device
CN110704606A (zh) * 2019-08-19 2020-01-17 Institute of Information Engineering, Chinese Academy of Sciences Generative summary generation method based on image-text fusion
CN111464882A (zh) * 2019-01-18 2020-07-28 Hangzhou Hikvision Digital Technology Co., Ltd. Video summary generation method, apparatus, device, and medium
CN111694984A (zh) * 2020-06-12 2020-09-22 Baidu Online Network Technology (Beijing) Co., Ltd. Video search method, apparatus, electronic device, and readable storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114679564B (zh) * 2020-12-24 2025-08-22 Zhejiang Uniview Technologies Co., Ltd. Video summary processing method, apparatus, electronic device, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103617234A (zh) * 2013-11-26 2014-03-05 The Third Research Institute of the Ministry of Public Security Active video condensation apparatus and method
CN104301699A (zh) * 2013-07-16 2015-01-21 Zhejiang Dahua Technology Co., Ltd. Image processing method and apparatus
CN104469547A (zh) * 2014-12-10 2015-03-25 Xi'an University of Technology Video summary generation method based on tree-structured moving-object trajectories
CN104639994A (zh) * 2013-11-08 2015-05-20 Hangzhou Hikvision Digital Technology Co., Ltd. Method, system, and network storage device for generating a video summary based on moving objects
CN104717573A (zh) * 2015-03-05 2015-06-17 Guangzhou Weian Electronic Technology Co., Ltd. Method for generating a video summary

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101249839B1 (ko) * 2008-06-30 2013-04-11 Purdue Research Foundation Image processing apparatus and image processing method thereof
CN102254144A (zh) * 2011-07-12 2011-11-23 Sichuan University Robust method for extracting two-dimensional code regions from images
US9412025B2 (en) * 2012-11-28 2016-08-09 Siemens Schweiz Ag Systems and methods to classify moving airplanes in airports
US9141866B2 (en) * 2013-01-30 2015-09-22 International Business Machines Corporation Summarizing salient events in unmanned aerial videos
KR101804383B1 (ko) * 2014-01-14 2017-12-04 Hanwha Techwin Co., Ltd. Summary video browsing system and method
TW201605239A (zh) * 2014-07-22 2016-02-01 Xinyang International Co., Ltd. Video analysis method and apparatus
CN104657712B (zh) * 2015-02-09 2017-11-14 Huizhou University Method for detecting masked persons in surveillance video
CN104717574B (zh) * 2015-03-17 2017-11-24 Huazhong University of Science and Technology Method for fusing events and background in a video summary

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104301699A (zh) * 2013-07-16 2015-01-21 Zhejiang Dahua Technology Co., Ltd. Image processing method and apparatus
CN104639994A (zh) * 2013-11-08 2015-05-20 Hangzhou Hikvision Digital Technology Co., Ltd. Method, system, and network storage device for generating a video summary based on moving objects
CN103617234A (zh) * 2013-11-26 2014-03-05 The Third Research Institute of the Ministry of Public Security Active video condensation apparatus and method
CN104469547A (zh) * 2014-12-10 2015-03-25 Xi'an University of Technology Video summary generation method based on tree-structured moving-object trajectories
CN104717573A (zh) * 2015-03-05 2015-06-17 Guangzhou Weian Electronic Technology Co., Ltd. Method for generating a video summary

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111464882A (zh) * 2019-01-18 2020-07-28 Hangzhou Hikvision Digital Technology Co., Ltd. Video summary generation method, apparatus, device, and medium
CN111464882B (zh) * 2019-01-18 2022-03-25 Hangzhou Hikvision Digital Technology Co., Ltd. Video summary generation method, apparatus, device, and medium
CN110704606A (zh) * 2019-08-19 2020-01-17 Institute of Information Engineering, Chinese Academy of Sciences Generative summary generation method based on image-text fusion
CN110704606B (zh) * 2019-08-19 2022-05-31 Institute of Information Engineering, Chinese Academy of Sciences Generative summary generation method based on image-text fusion
CN110519532A (zh) * 2019-09-02 2019-11-29 China Mobile IoT Co., Ltd. Information acquisition method and electronic device
CN111694984A (zh) * 2020-06-12 2020-09-22 Baidu Online Network Technology (Beijing) Co., Ltd. Video search method, apparatus, electronic device, and readable storage medium
CN111694984B (zh) * 2020-06-12 2023-06-20 Baidu Online Network Technology (Beijing) Co., Ltd. Video search method, apparatus, electronic device, and readable storage medium

Also Published As

Publication number Publication date
CN108460032A (zh) 2018-08-28

Similar Documents

Publication Publication Date Title
Braun et al. Eurocity persons: A novel benchmark for person detection in traffic scenes
US11055535B2 (en) Method and device for video classification
US10540772B2 (en) Feature trackability ranking, systems and methods
US9912874B2 (en) Real-time visual effects for a live camera view
WO2018149376A1 (fr) Method and device for generating a video summary
US11842514B1 (en) Determining a pose of an object from rgb-d images
Braun et al. The eurocity persons dataset: A novel benchmark for object detection
CN104376576B (zh) Target tracking method and apparatus
US9798949B1 (en) Region selection for image match
US20170352162A1 (en) Region-of-interest extraction device and region-of-interest extraction method
US10891019B2 (en) Dynamic thumbnail selection for search results
US10998007B2 (en) Providing context aware video searching
CN110413816 (zh) Color sketch image search
CN110796701 (zh) Marker point recognition method, apparatus, device, and storage medium
CN114972599 (zh) Method for virtualizing a scene
CN107832331 (zh) Method, apparatus, and device for generating visualized objects
US11961249B2 (en) Generating stereo-based dense depth images
CN110009662 (zh) Face tracking method, apparatus, electronic device, and computer-readable storage medium
US11158122B2 (en) Surface geometry object model training and inference
US20250166135A1 (en) Fine-grained controllable video generation
CN115601672 (zh) Deep learning-based VR intelligent store-inspection method and apparatus
CN119180997 (zh) Object detection model training method, apparatus, electronic device, and storage medium
CN117455972 (zh) UAV ground-target localization method based on monocular depth estimation
CN115240077 (zh) Method and apparatus for arbitrary-orientation object detection in remote sensing images using anchor-box-independent corner regression
CN119088997B (zh) Image query method, device, and program product

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18754210

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18754210

Country of ref document: EP

Kind code of ref document: A1