CN109376603A - Video recognition method and apparatus, computer device and storage medium - Google Patents
Video recognition method and apparatus, computer device and storage medium
- Publication number
- CN109376603A (application number CN201811113391.2A)
- Authority
- CN
- China
- Prior art keywords
- video
- recognition result
- subfile
- identification
- key frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The embodiments of the present invention disclose a video recognition method and apparatus, a computer device and a storage medium. The method includes: obtaining a pure video subfile and a pure audio subfile corresponding to a video file to be recognized, and obtaining a key frame set and a video segment set corresponding to the pure video subfile; performing multi-modal image recognition on the key frame set to obtain a first recognition result, and performing video recognition on the video segment set to obtain a second recognition result; performing audio recognition on the pure audio subfile to obtain a third recognition result; and obtaining an integrated recognition result corresponding to the video file according to the first, second and third recognition results. The technical solution of the embodiments improves the richness, accuracy, efficiency and real-time performance of video recognition while reducing recognition cost.
Description
Technical field
The embodiments of the present invention relate to the technical field of video processing, and in particular to a video recognition method and apparatus, a computer device and a storage medium.
Background technique
With the spread of the global Internet and of communications, people all over the world can exchange and transmit multimedia information online using a variety of communication devices. People upload pictures, text, voice and video to network platforms to share their status, moods, scenery and so on. Video, with the rich content information it carries, lets people understand content more intuitively and clearly, and is transmitted and stored on network platforms in large quantities. However, among the videos people upload there are many that local laws and morals do not allow, such as pornographic, gambling, gory, vulgar, violent-terrorist and extremist-religious videos. Downloading and spreading such videos can do great harm to viewers' minds, especially teenagers'. Manually reviewing the massive volume of video on the Internet is time-consuming, laborious and impractical. Video review technology emerged in this context.
Early video review technology generally used traditional machine learning methods with hand-designed features targeted at a specific library, and therefore lacked generalization (features that work on one library perform much worse on another). Later, manual review was combined with conventional video review technology: uninterrupted 24/7 human inspection plus machine assistance reduced the appearance of illegal and non-compliant video content. In recent years deep learning has developed rapidly in the fields of video, image and voice. Machine-intelligence review based on deep learning, image recognition and cloud technology has therefore become the main development trend; it can greatly reduce the cost enterprises invest in manual review while yielding better review results. At present, domestic technology companies such as Baidu, NetEase, Tupu and SenseTime have launched their own video review systems, and abroad Google, Facebook, Amazon, Valossa and others have likewise launched video review systems with their own characteristics.
In implementing the present invention, the inventor found that the prior art has the following defects:
Although machine learning methods can identify some non-compliant content, they cannot achieve accurate content recognition for short videos, live videos and similar content, and when facing massive volumes of video the algorithms cannot identify video content well. Manual review combined with conventional video review technology requires a huge review team, which must be further expanded when the accuracy of machine review is not high. Moreover, reviewers become fatigued during uninterrupted review, leading to missed and false detections on some videos, and enterprises must spend considerable time training reviewers, so that the cost invested in manual review far exceeds the cost of a machine learning algorithm. Existing machine-intelligence review technologies based on deep learning, image recognition and cloud technology cannot detect the large number of vulgar, worthless videos on the current network well; the recognition systems recognize relatively little content over a small range, and once the number of recognition dimensions increases, the amount of computation grows exponentially, making the demand on computing power too high.
Summary of the invention
The embodiments of the present invention provide a video recognition method and apparatus, a computer device and a storage medium, which improve the richness, accuracy, efficiency and real-time performance of video recognition while reducing recognition cost.
In a first aspect, an embodiment of the present invention provides a video recognition method, comprising:
obtaining a pure video subfile and a pure audio subfile corresponding to a video file to be recognized, and obtaining a key frame set and a video segment set corresponding to the pure video subfile;
performing multi-modal image recognition on the key frame set to obtain a first recognition result, and performing video recognition on the video segment set to obtain a second recognition result;
performing audio recognition on the pure audio subfile to obtain a third recognition result;
obtaining an integrated recognition result corresponding to the video file according to the first recognition result, the second recognition result and the third recognition result.
In a second aspect, an embodiment of the present invention further provides a video recognition apparatus, comprising:
a subfile obtaining module, configured to obtain a pure video subfile and a pure audio subfile corresponding to a video file to be recognized, and to obtain a key frame set and a video segment set corresponding to the pure video subfile;
a first recognition module, configured to perform multi-modal image recognition on the key frame set to obtain a first recognition result, and to perform video recognition on the video segment set to obtain a second recognition result;
a second recognition module, configured to perform audio recognition on the pure audio subfile to obtain a third recognition result;
a recognition result obtaining module, configured to obtain an integrated recognition result corresponding to the video file according to the first recognition result, the second recognition result and the third recognition result.
In a third aspect, an embodiment of the present invention further provides a computer device, comprising:
one or more processors; and
a storage device for storing one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the video recognition method provided by any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer storage medium on which a computer program is stored, the program, when executed by a processor, implementing the video recognition method provided by any embodiment of the present invention.
The embodiments of the present invention obtain a pure video subfile and a pure audio subfile corresponding to a video file to be recognized, together with a key frame set and a video segment set corresponding to the pure video subfile; perform multi-modal image recognition on the key frame set to obtain a first recognition result, perform video recognition on the video segment set to obtain a second recognition result, and perform audio recognition on the pure audio subfile to obtain a third recognition result; finally, the three recognition results are integrated into an integrated recognition result for the video file. This solves the problems of existing video review technology, namely that the recognized content is limited and the recognition range is small; it enriches the recognition types, refines the recognized content, and performs multi-dimensional recognition of video content, improving the richness, accuracy, efficiency and real-time performance of video recognition while reducing recognition cost.
Detailed description of the invention
Fig. 1 is a flowchart of a video recognition method provided by Embodiment 1 of the present invention;
Fig. 2 is a flowchart of a video recognition method provided by Embodiment 2 of the present invention;
Fig. 3a is a flowchart of a video recognition method provided by Embodiment 3 of the present invention;
Fig. 3b is a schematic diagram of bounding-box size and position prediction provided by Embodiment 3 of the present invention;
Fig. 3c is a schematic diagram of a face detection result provided by Embodiment 3 of the present invention;
Fig. 3d is a schematic diagram of facial key point localization provided by Embodiment 3 of the present invention;
Fig. 3e is a flowchart of a video recognition method provided by Embodiment 3 of the present invention;
Fig. 3f is a schematic diagram of a video recognition system provided by Embodiment 3 of the present invention;
Fig. 3g is a flowchart of a video recognition method provided by Embodiment 3 of the present invention;
Fig. 3h is a schematic diagram of a log-Mel spectrogram feature provided by Embodiment 3 of the present invention;
Fig. 3i is an architecture diagram of a video recognition algorithm provided by Embodiment 3 of the present invention;
Fig. 4 is a schematic diagram of a video recognition apparatus provided by Embodiment 4 of the present invention;
Fig. 5 is a schematic structural diagram of a computer device provided by Embodiment 5 of the present invention.
Specific embodiment
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention and not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the complete structure. Before exemplary embodiments are discussed in greater detail, it should be mentioned that some of them are described as processes or methods depicted as flowcharts. Although a flowchart describes operations (or steps) as a sequential process, many of these operations can be performed in parallel, concurrently or simultaneously, and the order of operations can be rearranged. A process may be terminated when its operations are completed, and may also have additional steps not included in the drawing. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, and so on.
Embodiment one
Fig. 1 is a flowchart of a video recognition method provided by Embodiment 1 of the present invention. This embodiment is applicable to accurately and quickly recognizing a video file. The method can be executed by a video recognition apparatus, which can be implemented in software and/or hardware and can generally be integrated in a computer device. Accordingly, as shown in Fig. 1, the method includes the following operations:
S110: obtain a pure video subfile and a pure audio subfile corresponding to a video file to be recognized, and obtain a key frame set and a video segment set corresponding to the pure video subfile.
Here, the video file to be recognized may include two kinds of data resources, video and audio. The pure video subfile may be a file containing only the video resource; similarly, the pure audio subfile may be a file containing only the audio resource. The key frame set may be used to store the key frames of the pure video subfile, a key frame being the most representative video frame in the pure video subfile, where representative refers to the representativeness of the semantic content of a video segment, with complete content and obvious semantics. The video segment set may be used to store the video segments of the pure video subfile.
In the embodiments of the present invention, after the video file to be recognized is obtained, audio-video demultiplexing may be performed on it to obtain the pure video subfile and the pure audio subfile. When the video file is recognized, the pure video subfile and the pure audio subfile can be recognized separately. Specifically, the pure video subfile can be recognized by two schemes, image recognition and video recognition. For image recognition, each key frame in the key frame set corresponding to the pure video subfile is recognized; for video recognition, each video segment in the video segment set corresponding to the pure video subfile is recognized.
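For illustration, the audio-video demultiplexing step could be performed with a tool such as ffmpeg. The sketch below only builds the command lines; the use of ffmpeg, the stream-copy options and the output file names are assumptions for illustration, as the patent does not name a specific tool.

```python
# Sketch: build ffmpeg command lines that split a source video file into a
# pure video subfile (audio removed, -an) and a pure audio subfile (video
# removed, -vn). File names and codec-copy settings are illustrative.

def build_demux_commands(src):
    video_cmd = ["ffmpeg", "-i", src, "-an", "-c:v", "copy", "pure_video.mp4"]
    audio_cmd = ["ffmpeg", "-i", src, "-vn", "-c:a", "copy", "pure_audio.aac"]
    return video_cmd, audio_cmd

if __name__ == "__main__":
    v, a = build_demux_commands("input.mp4")
    print(v)
    print(a)
```

The two command lists could then be run with `subprocess.run`; downstream steps operate on the two subfiles independently.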
S120: perform multi-modal image recognition on the key frame set to obtain a first recognition result, and perform video recognition on the video segment set to obtain a second recognition result.
Here, multi-modal image recognition may integrate or fuse two or more kinds of image recognition features. The first recognition result may be an image recognition result, and the second recognition result may be a video recognition result.
In the embodiments of the present invention, when the key frames in the key frame set corresponding to the pure video subfile are recognized, image recognition may be performed on them in a multi-modal manner to obtain the first recognition result. Recognizing each video segment in the video segment set corresponding to the pure video subfile yields the second recognition result.
It should be noted that in the embodiments of the present invention the acquisition of the first recognition result and of the second recognition result are mutually independent and do not affect each other; that is, multi-modal image recognition and video recognition of the video segments are independent links.
S130: perform audio recognition on the pure audio subfile to obtain a third recognition result.
Here, the third recognition result may be an audio recognition result. Accordingly, performing audio recognition on the pure audio subfile yields the corresponding third recognition result.
S140: obtain an integrated recognition result corresponding to the video file according to the first recognition result, the second recognition result and the third recognition result.
Here, the integrated recognition result may be a recognition result obtained by integrating the first, second and third recognition results according to a set rule. In the embodiments of the present invention, after the three recognition results are obtained, they can be integrated to obtain the integrated recognition result corresponding to the video file. Optionally, the union of the first, second and third recognition results may be taken directly as the integrated recognition result.
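The optional union rule above can be sketched as follows; the label names are hypothetical, and real systems may apply a more elaborate set rule than a plain union.

```python
# Sketch: integrate the three recognition results by taking their union,
# as the optional rule in the text suggests. Labels are illustrative.

def integrate_results(first, second, third):
    """Return the union of the image, video and audio recognition labels."""
    return set(first) | set(second) | set(third)

if __name__ == "__main__":
    merged = integrate_results({"violence"}, {"violence", "gambling"}, {"vulgar"})
    print(sorted(merged))  # ['gambling', 'violence', 'vulgar']
```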
The video recognition method provided by the embodiments of the present invention can be used to review whether a video contains content that laws and morals do not allow, and to perform intelligent video review of non-compliant content in short videos, live videos and long videos, so as to build a good Internet transmission and storage environment. It can well solve the problems currently faced by short-video and live-streaming platforms, and greatly reduce the investment enterprises make in manual review. At the same time, the method is highly customizable and flexible, and can be tailored according to user behavior to meet user needs. It can also support advertisement placement strongly associated with video content, improving the effect of advertising.
The embodiments of the present invention obtain a pure video subfile and a pure audio subfile corresponding to a video file to be recognized, together with a key frame set and a video segment set corresponding to the pure video subfile; perform multi-modal image recognition on the key frame set to obtain a first recognition result, perform video recognition on the video segment set to obtain a second recognition result, and perform audio recognition on the pure audio subfile to obtain a third recognition result; finally, the three recognition results are integrated into an integrated recognition result for the video file. This solves the problems of existing video review technology, namely that the recognized content is limited and the recognition range is small; it enriches the recognition types, refines the recognized content, and performs multi-dimensional recognition of video content, improving the richness, accuracy, efficiency and real-time performance of video recognition while reducing recognition cost.
Embodiment two
Fig. 2 is a flowchart of a video recognition method provided by Embodiment 2 of the present invention. This embodiment is a concrete elaboration of the above embodiment and gives a specific implementation of obtaining the key frame set and the video segment set corresponding to the pure video subfile. Accordingly, as shown in Fig. 2, the method of this embodiment may include:
S210: obtain a pure video subfile and a pure audio subfile corresponding to a video file to be recognized, and obtain a key frame set and a video segment set corresponding to the pure video subfile.
Accordingly, S210 may specifically include:
S211: filter the pure video subfile using a coarse video-frame filtering technique to obtain a filtered video frame set.
Here, the filtered video frame set may be used to store the video frames obtained after the pure video subfile is filtered. It will be appreciated that processing every frame image of an entire video stream is very time-consuming and wastes computing resources. Common video processing systems generally subsample the video stream at uniform time intervals to reduce the number of video frames, but this easily loses certain key frames.
To improve the accuracy of key frame extraction, the embodiments of the present invention first filter the pure video subfile with a coarse video-frame filtering technique to effectively reduce the number of video frames. Specifically, dark frames, blurred frames and low-quality frames in the pure video subfile can be filtered out, leaving mostly frames of good overall quality, so that clear, bright and high-quality video frames can be selected as key frames from the resulting filtered video frame set.
Specifically, dark frames can be filtered out by the following formula:
Luminance(I_rgb) = 0.2126·I_r + 0.7152·I_g + 0.0722·I_b
where Luminance(·) denotes image luminance, I_rgb denotes the three-channel RGB natural image, I_r the red-channel image, I_g the green-channel image, I_b the blue-channel image, and r, g and b denote the red, green and blue channels. After the image luminance of each video frame is calculated by the above formula, video frames whose luminance does not meet the requirement can be filtered out by setting a threshold.
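The dark-frame filter can be sketched in a few lines; the luminance coefficients are the ones given above, while the threshold value and the flat pixel-list frame representation are illustrative assumptions.

```python
# Sketch: per-pixel luma (coefficients from the formula in the text),
# averaged over a frame; frames below a threshold are treated as dark.
# The threshold of 40 and the frame layout are illustrative.

def luminance(r, g, b):
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def frame_luminance(frame):
    """Mean luminance of a frame given as a list of (r, g, b) pixels."""
    return sum(luminance(*px) for px in frame) / len(frame)

def drop_dark_frames(frames, threshold=40.0):
    return [f for f in frames if frame_luminance(f) >= threshold]

if __name__ == "__main__":
    bright = [(200, 200, 200)] * 4
    dark = [(10, 10, 10)] * 4
    print(len(drop_dark_frames([bright, dark])))  # 1
```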
Blurred frames can be filtered out by the following formula:
Sharpness(I_gray) = Σ_x Σ_y sqrt( (Δ_x I_gray)² + (Δ_y I_gray)² )
where Sharpness(·) denotes image sharpness, I_gray denotes the grayscale image, Δ_x denotes the horizontal gradient, Δ_y the vertical gradient, and x and y denote the horizontal and vertical directions. After the image sharpness of each video frame in the filtered video frame set is calculated by the above formula, video frames whose sharpness does not meet the requirement can be filtered out by setting a threshold.
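A gradient-magnitude sharpness score consistent with the variables above can be sketched as follows; since the patent's original formula appears only as a figure, this particular summation is an assumption.

```python
import math

# Sketch: sum of gradient magnitudes over a grayscale image (list of rows),
# as a sharpness score; a flat image scores 0, an edge-rich image scores high.

def sharpness(gray):
    h, w = len(gray), len(gray[0])
    total = 0.0
    for y in range(h - 1):
        for x in range(w - 1):
            dx = gray[y][x + 1] - gray[y][x]  # horizontal gradient
            dy = gray[y + 1][x] - gray[y][x]  # vertical gradient
            total += math.sqrt(dx * dx + dy * dy)
    return total

if __name__ == "__main__":
    flat = [[128] * 4 for _ in range(4)]
    edge = [[0, 0, 255, 255]] * 4
    print(sharpness(flat))        # 0.0
    print(sharpness(edge) > 0.0)  # True
```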
Low-quality frames can be filtered out by the following formula:
δ = (1 / (M·N)) · Σ_{i=1..M} Σ_{j=1..N} [ P(i, j) > μ ]
where δ denotes image quality, M denotes the number of horizontal pixels, N the number of vertical pixels, i the horizontal coordinate, j the vertical coordinate, P(·) the pixel value, and μ a threshold. After the image quality of each video frame in the filtered video frame set is calculated by the above formula, video frames whose quality does not meet the requirement can be filtered out by setting a threshold.
In addition, a large number of blurred frames can appear during shot changes in a video. Therefore, unqualified video frames can be further filtered out using shot boundary detection.
S212: calculate a feature vector corresponding to each filtered video frame in the filtered video frame set, and perform clustering on the filtered video frames according to their feature vectors to obtain at least two clusters, each cluster containing at least one filtered video frame.
The most common key frame extraction method is clustering: the visual similarity between video frames is calculated, and from each cluster the video frame closest to the cluster center is selected as a key frame. In the embodiments of the present invention, key frames can be extracted according to the feature vectors of the filtered video frames. Specifically, the feature vector of each filtered video frame is calculated, and the filtered video frames are clustered according to these feature vectors to obtain multiple clusters, each containing at least one filtered video frame.
In an optional embodiment of the present invention, calculating the feature vectors of the filtered video frames may include: performing feature extraction on each filtered video frame in the filtered video frame set using a convolutional neural network model; or performing feature extraction on each filtered video frame using local binary patterns (LBP), and forming a statistical histogram from the extraction results as the LBP feature vector of each filtered video frame.
In the embodiments of the present invention, a convolutional neural network (CNN) model can be used to extract features from each filtered video frame in the filtered video frame set. Specifically, a classic CNN architecture such as AlexNet, VGGNet or Inception can be selected to obtain a high-dimensional feature vector representation of each video frame.
It will be appreciated that many frames in a video are highly similar, so features that are easy to compute, such as color and edge histogram features or LBP (Local Binary Pattern) features, can effectively distinguish the similarity between different video frames. Optionally, in the embodiments of the present invention, the LBP feature is used as the feature descriptor of a video frame. First the LBP-transformed matrix is obtained, and then the statistical histogram of the LBP codes is used as the feature vector of the frame. To take the location information of features into account, the video frame is divided into several small regions; a histogram is computed in each region by counting the number of pixels belonging to each pattern, and finally the histograms of all regions are concatenated into a single feature vector that is passed to the next stage.
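The LBP descriptor described above can be sketched for a single region; the bit ordering of the neighbours is a common convention rather than one the patent specifies, and a full system would concatenate one such histogram per region.

```python
# Sketch: minimal 8-neighbour LBP and its 256-bin histogram for one region
# of a grayscale frame; neighbours >= the centre pixel set a bit.

def lbp_code(gray, y, x):
    c = gray[y][x]
    nbrs = [gray[y - 1][x - 1], gray[y - 1][x], gray[y - 1][x + 1],
            gray[y][x + 1], gray[y + 1][x + 1], gray[y + 1][x],
            gray[y + 1][x - 1], gray[y][x - 1]]
    code = 0
    for bit, v in enumerate(nbrs):
        if v >= c:
            code |= 1 << bit
    return code

def lbp_histogram(gray):
    """Histogram of LBP codes over all interior pixels of the region."""
    hist = [0] * 256
    for y in range(1, len(gray) - 1):
        for x in range(1, len(gray[0]) - 1):
            hist[lbp_code(gray, y, x)] += 1
    return hist

if __name__ == "__main__":
    img = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
    h = lbp_histogram(img)
    print(h[255])  # 1: every neighbour equals the centre, so all bits are set
```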
S213: from each cluster, take the filtered video frame with the highest static degree to form the key frame set.
In the embodiments of the present invention, key frames are optionally extracted from the different clusters according to the static degree of the pictures. Since the motion compensation used in video compression leads to blur artifacts, pictures with high motion energy are usually also more blurred; selecting pictures with low motion energy therefore ensures that the extracted key frames are of higher quality. Specifically, the feature vectors of the extracted video frames can first be clustered with the K-means algorithm, with the number of clusters set to the number of shots in the video to obtain a better clustering result. Frames in the same cluster share the same subset ID, and the static degree of each picture is calculated separately. The static degree is the reciprocal of the sum of squared pixel differences between adjacent pictures; from each cluster, the picture with the highest static degree is selected as the key frame of that cluster.
It should be noted that in the embodiments of the present invention, the purpose of selecting the filtered video frame with the highest static degree is to pick the most representative video frame in each cluster as the key frame. Besides using the static degree to pick the most representative frame of each cluster, any other method that can pick the most representative frame from a cluster can also be used to extract key frames; the embodiments of the present invention place no limit on the method used.
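The static-degree rule above can be sketched directly; the small epsilon that guards against division by zero for perfectly static frames is an implementation assumption.

```python
# Sketch: static degree = reciprocal of the sum of squared pixel differences
# between a frame and the next frame; per cluster, the frame with the
# highest static degree is chosen as the key frame.

def static_degree(frame, next_frame, eps=1e-9):
    diff_sq = sum((a - b) ** 2 for a, b in zip(frame, next_frame))
    return 1.0 / (diff_sq + eps)

def key_frame_index(cluster_indices, frames):
    """Index (into frames) of the most static frame in one cluster."""
    return max(
        (i for i in cluster_indices if i + 1 < len(frames)),
        key=lambda i: static_degree(frames[i], frames[i + 1]),
    )

if __name__ == "__main__":
    frames = [[0, 0], [0, 0], [9, 9], [0, 0]]  # frames as flat pixel lists
    print(key_frame_index([0, 2], frames))      # 0: frame 0 barely changes
```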
S214: determine, according to the time parameters in the pure video subfile of the filtered video frames contained in each cluster, a start time and a duration corresponding to each cluster, and slice the pure video subfile according to the start times and durations to obtain the video segment set.
Accordingly, after the video frames are clustered, not only can key frames be extracted, but the video can also be divided into different segments by category: from the category boundaries and the number of video frames in each category, the start time and duration of each video segment are obtained, so that the pure video subfile can be decomposed into short video segments with specific characteristics. This completes the slicing processing and yields the video segment set, which can be used for video recognition.
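The mapping from a cluster's frame indices to a segment's start time and duration can be sketched as follows; it assumes each cluster covers a contiguous run of frames at a known frame rate, which is an illustrative simplification.

```python
# Sketch: derive a segment's (start_time, duration) in seconds from the
# frame indices assigned to one cluster and the frame rate; assumes the
# cluster spans a contiguous run of frames, as after shot-based clustering.

def segment_times(cluster_frame_indices, fps=25.0):
    first, last = min(cluster_frame_indices), max(cluster_frame_indices)
    start = first / fps
    duration = (last - first + 1) / fps
    return start, duration

if __name__ == "__main__":
    # e.g. a cluster holding frames 50..99 at 25 fps
    print(segment_times(range(50, 100)))  # (2.0, 2.0)
```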
S220: perform multi-modal image recognition on the key frame set to obtain a first recognition result, and perform video recognition on the video segment set to obtain a second recognition result.
S230: perform audio recognition on the pure audio subfile to obtain a third recognition result.
S240: obtain an integrated recognition result corresponding to the video file according to the first recognition result, the second recognition result and the third recognition result.
With the above technical solution, key frames are extracted through operations such as coarse video-frame filtering, video-frame feature extraction and key frame extraction, which guarantees that the key frames meet performance indicators highly relevant to the video content; slicing the pure video subfile yields the video segment set used for video recognition, thereby realizing multi-dimensional recognition of video content.
Embodiment three
Fig. 3 a is a kind of flow chart for video frequency identifying method that the embodiment of the present invention three provides, and Fig. 3 b is the embodiment of the present invention
A kind of three bounding box sizes and position prediction effect diagram provided, Fig. 3 c are a kind of faces that the embodiment of the present invention three provides
Detection effect schematic diagram, Fig. 3 e are a kind of flow charts for video frequency identifying method that the embodiment of the present invention three provides, and Fig. 3 g is this hair
A kind of flow chart for video frequency identifying method that bright embodiment three provides.The present embodiment is carried out specifically based on above-described embodiment
Change, in the present embodiment, gives the specific implementation for obtaining each recognition result.Correspondingly, as shown in Figure 3a, the present embodiment
Method may include:
S310, obtaining a simple video subfile and a simple audio subfile corresponding to a video file to be identified, and obtaining a key frame set and a video clip set corresponding to the simple video subfile.
Wherein, the first recognition result and the second recognition result are obtained by identifying the simple video subfile of the video file to be identified, and the third recognition result is obtained by identifying the simple audio subfile.
Correspondingly, performing multi-modal picture recognition on the key frame set to obtain the first recognition result may specifically include the following two operations:
S320, performing picture classification on each key frame in the key frame set using a preset picture classification model, and taking the classification result as the first recognition result.
Wherein, the preset picture classification model may be a network model trained in advance to perform picture classification on key frames.
In the embodiments of the present invention, the training data used to train the preset picture classification model mainly comes from two sources: first, a background database containing a self-annotated label data set of more than 20,000 classes; second, public data sets such as ImageNet. Since pictures are rich and varied in content, it is difficult to accurately discriminate all categories with a single model. Therefore, the embodiments of the present invention can solve the problem of precise identification with a multistage classification model: the first stage separates major classes, such as quotations, sports and dishes; the second stage performs finer classification, for example subdividing the sports class into basketball and football. A third-stage classification can also be applied where appropriate, for example identifying which two teams are playing in a basketball game. Each stage of classifiers can, according to the actual situation, complete its classification with CNN-based methods such as classification, object detection and OCR (Optical Character Recognition). The construction of the training data set, including a series of processes such as task formulation, picture crawling, picture calibration and quality inspection, is completed according to the actual picture content, so as to guarantee recognition quality.
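The multistage cascade described above can be sketched as follows. The stage classifiers are stubs standing in for trained CNN models, and the class names and routing table are illustrative assumptions, not the patent's actual taxonomy.

```python
# Minimal sketch of a two-stage (coarse -> fine) classification cascade.

def coarse_classifier(image):
    # stub: a real system would run a first-stage CNN here
    return "sports"

def fine_classifiers():
    # one fine-grained classifier per major class (stubs)
    return {
        "sports": lambda img: "basketball",
        "dishes": lambda img: "noodles",
    }

def classify(image):
    major = coarse_classifier(image)          # first-level label
    fine = fine_classifiers().get(major)      # route to second stage
    minor = fine(image) if fine else None
    return major, minor

label = classify(object())  # any image placeholder
```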
A ResNet network uses residual learning to solve the degradation problem: the residual mapping to be learned is comparatively simple, so the learning difficulty is small and good results are easy to obtain. Experiments show that as its depth increases, a ResNet network performs much better than earlier traditional networks. ResNet performs well not only on the ImageNet data set but also on data sets such as COCO, which illustrates that ResNet can serve as a general-purpose model. Therefore, in the embodiments of the present invention, ResNet can be used as the CNN network model for picture classification. Further, a trained ResNet-34 can be used as the base model.
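Residual learning can be illustrated with a toy numpy block: the output is F(x) + x, so the layers only need to learn the residual F. The identity/zero weights below are placeholders, not a trained ResNet-34.

```python
# Toy residual block: y = relu(F(x) + x) with F(x) = w2 @ relu(w1 @ x).
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    f = w2 @ relu(w1 @ x)   # the residual mapping F(x)
    return relu(f + x)      # skip connection adds the input back

x = np.array([1.0, -2.0, 3.0])
w_zero = np.zeros((3, 3))
# With zero weights F(x) = 0, so the block passes x through its skip
# connection (negative entries clipped by the final ReLU).
y = residual_block(x, w_zero, w_zero)
```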
S330, separately inputting each key frame in the key frame set into a YOLOv3 model trained in advance; and obtaining, as output of the YOLOv3 model, a target object identifier corresponding to each key frame and the position coordinates of the target object in the key frame as the first recognition result.
Wherein, the target object may be an object other than a face, such as an animal, an automobile or a knife. The target object identifier may be a label in a label list for picture or video identification. Illustratively, the label list includes but is not limited to: (1) obscene and pornographic: including real-person pornography, real-person sexiness, animation pornography, animation sexiness and some special categories; (2) bloody violence: including publicizing violent or terrorist organizations, bloody violent scenes and fighting; (3) politically sensitive: including politically sensitive figures and scenes; (4) prohibited goods: including drug trafficking, controlled knives and army or police articles; (5) vulgar content: including exposed upper bodies, smoking, vulgar venues and tattoos. In the embodiments of the present invention, the label list can be updated. Completely training the entire video identification model usually takes several weeks. Since the label list is updated frequently, retraining the video identification model with every label list update is clearly very time-consuming. In order to shorten the training time, the model can be iterated with transfer learning, in which part of the neural network layers of a completely trained model are fine-tuned so as to identify new classes; this greatly saves training time and training resources. The specific steps are as follows: (1) change the number of nodes of the softmax layer to the new number of labels, leaving the other network structures unchanged; (2) load the weights of the previously trained model; (3) retrain the model, which substantially reduces the training time.
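The three transfer-learning steps above can be sketched on a toy two-layer numpy model: keep the trained hidden-layer weights, replace only the final (softmax) layer to match the new label count, and retraining would then mainly update that layer. All layer sizes are illustrative assumptions.

```python
# Sketch of transfer learning: swap the classification head, keep the rest.
import numpy as np

rng = np.random.default_rng(0)

# (2) weights of a "previously trained" model: 8 features -> 5 old labels
trained = {
    "hidden": rng.normal(size=(16, 8)),
    "head": rng.normal(size=(5, 16)),
}

def rebuild_for_new_labels(model, n_new_labels):
    # (1) replace the head with one sized for the new label count;
    #     hidden layers are kept as-is and would be fine-tuned in (3)
    new_model = dict(model)
    new_model["head"] = np.zeros((n_new_labels, model["hidden"].shape[0]))
    return new_model

model = rebuild_for_new_labels(trained, n_new_labels=7)
```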
Target detection locates and classifies multiple objects in a picture: location marks the position where each object lies in the picture, and classification gives the category of each object in the picture. Target detection handles multiple targets, improving the speed and accuracy of video identification. In the embodiments of the present invention, the target objects in each key frame can be identified using a YOLOv3 model trained in advance as the target detection network. The YOLOv3 model performs end-to-end detection without region proposals, combining target discrimination and target identification into one, which substantially improves recognition performance. After the YOLOv3 model trained in advance identifies the target object corresponding to each key frame, the object can be identified with a label from the label list, and the recognition result of the target object in a key frame can be matched, according to its position coordinates in the key frame, to the corresponding video clip in the video clip set. Illustratively, suppose the YOLOv3 model identifies a controlled knife in the 3rd key frame and recognizes that this key frame belongs to the 2nd video clip; the recognition result of the 3rd key frame can then be matched to the 2nd video clip.
The YOLOv3 model used in the embodiments of the present invention introduces a residual structure to construct a new Darknet-53; it performs detection repeatedly, setting three different anchors on feature maps of three different scales; and it does not use a single-label softmax classifier, instead performing classification with per-class cross entropy losses. The main process of target detection with the YOLOv3 model includes the following aspects:
(1) Bounding box prediction
The input picture is divided into S*S grid cells, fixed anchor boxes are obtained by a clustering method, and four coordinate values (tx, ty, tw, th) are then predicted for each bounding box. For the predicting cell, the bounding box can be predicted from the offset (cx, cy) of that cell from the upper left corner of the image, together with the previously obtained anchor box width pw and height ph. A mean square error loss function can be used for these coordinate values when training the YOLOv3 model, and an objectness score is predicted for each bounding box by logistic regression. If the predicted bounding box overlaps the ground truth box more than all other predictions, its objectness score is 1. If the overlap does not reach a threshold (the threshold can be set to 0.5), the predicted bounding box is ignored, i.e. it incurs no loss.
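The 0.5 overlap threshold above is an intersection-over-union (IoU) test; a minimal sketch, with illustrative box coordinates in (x1, y1, x2, y2) form:

```python
# Minimal IoU computation for the overlap threshold described above.

def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# A prediction overlapping the truth by less than 0.5 would be ignored
keep = iou((0, 0, 2, 2), (1, 1, 3, 3)) >= 0.5
```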
Fig. 3 b is a kind of bounding box size and position prediction effect diagram that the embodiment of the present invention three provides, with reference to figure
3b, predicted boundary frame can use following formula:
bx=σ (tx)+cx
by=σ (ty)+cy
Wherein, bxIndicate the upper left corner boundingbox abscissa, byIndicate the upper left corner boundingbox ordinate, bwTable
Show boundingbox width, bhIndicate boundingbox height, tx、ty、twAnd thIt is expressed as generating bounding box net
Four coordinate values of network model prediction, cxAnd cyIndicate deviant, pwAnd phIndicate the width and height of the bounding box of priori, σ
() indicates activation primitive.
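These formulas can be exercised numerically as below. The bw/bh exponential form follows the standard YOLO decoding (the text defines pw, ph, tw and th but leaves the width/height formulas implicit), and the cell offsets and priors are illustrative values.

```python
# Numeric sketch of YOLO-style bounding box decoding.
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    bx = sigmoid(tx) + cx       # box x relative to the grid cell
    by = sigmoid(ty) + cy       # box y relative to the grid cell
    bw = pw * math.exp(tw)      # width scales the prior
    bh = ph * math.exp(th)      # height scales the prior
    return bx, by, bw, bh

# With zero predictions the box sits half a cell past (cx, cy) at
# exactly the prior's size: sigmoid(0) = 0.5 and e^0 = 1.
box = decode_box(0.0, 0.0, 0.0, 0.0, cx=1.0, cy=2.0, pw=3.0, ph=4.0)
```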
(2) Class prediction
Each bounding box is classified with multiple labels. The single-label multi-class softmax layer is therefore replaced with a logistic regression layer for multi-label classification, which performs a binary classification for each category. The logistic regression layer mainly uses the sigmoid function, which constrains its input to the range 0 to 1; when the output of a picture for a certain category, after feature extraction and the sigmoid constraint, is greater than 0.5, the picture is deemed to belong to that category.
(3) Cross-scale prediction
The YOLOv3 model predicts by fusing multiple scales, predicting boxes at three different scales. The bounding box priors are obtained by clustering: 9 clustering clusters and 3 scales are selected, and the 9 clustering clusters are evenly distributed over these scales. The feature extraction model is changed through an FPN (Feature Pyramid Network), and the prediction finally yields a 3-d tensor containing bounding box information, objectness information and predictions for multiple classes. In this way the YOLOv3 model gains access to more semantic information.
(4) Feature extraction
In the embodiments of the present invention, the YOLOv3 model uses the DarkNet-53 network as its feature extraction layer. On the one hand it is essentially fully convolutional and downsamples the feature map with convolutional layers; on the other hand it introduces a residual structure, which reduces the training difficulty so that the network can reach 53 layers, and uses multiple 3*3 and 1*1 convolutional layers to improve network accuracy.
The YOLOv3 model in the embodiments of the present invention can improve the recognition effect for multi-target multi-label cases and for small targets.
(5) Training
In the embodiments of the present invention, various methods such as data augmentation can be used when training the YOLOv3 model.
S340, performing face recognition on each key frame of the key frame set to obtain the first recognition result.
Correspondingly, S340 can specifically include:
S341, performing face detection on each key frame in the key frame set using the S3FD algorithm.
It can be understood that face detection, as the first step of face recognition, is particularly important to it. Traditional face detection algorithms include face detection based on geometric features, face detection based on eigenfaces, face detection based on elastic graph matching, and face detection based on SVM (Support Vector Machine). Although these methods can detect faces, they produce many false detections and missed detections, perform very poorly against complex backgrounds, and do not adapt to changes in illumination, angle and the like. To solve these problems, the embodiments of the present invention use the deep-learning-based S3FD (Single Shot Scale-invariant Face Detector) algorithm, which is especially suitable for small face detection.
Specifically, the S3FD algorithm detects faces of different scales by exploiting the different receptive fields of different convolutional layers. Its base network is VGG16, and a VGG16 pre-trained model can be loaded to accelerate network training. In order to detect faces at more scales, the S3FD algorithm adds 6 convolutional layers on the basis of VGG16, which are ultimately used as the face detection layers. The S3FD algorithm brings mainly the following two improvements: 1) based on the difference between the theoretical and the practical receptive field, it improves the way anchors are proposed; 2) in order to better detect small faces, it adds more layers and scales. The embodiments of the present invention train the VGG16 network on the open-source face data set WIDER FACE and on face data collected according to their own needs; the detection effect is shown in Fig. 3c, from which it can be seen that the face detection of the embodiments of the present invention works well.
S342, performing face key point location on the detected faces by the MTCNN algorithm to obtain face key points.
Face key point location is the key to face alignment; the positions of the left and right eyes, the left and right mouth corners and the nose need to be located, and the accuracy of face key point location greatly influences the effect of face feature extraction. Traditional face key point location methods mostly locate based on local features of the face; their locating effect is unsatisfactory, their generalization ability is poor, and they do not adapt to changes in influencing factors such as angle and illumination. To solve these problems, the embodiments of the present invention use the MTCNN (Multi-task Cascaded Convolutional Networks) algorithm to realize face key point location. MTCNN is a convolutional neural network with a cascade structure, divided into three parts: p-net, r-net and o-net. MTCNN can be regarded as a series connection of three independent convolutional neural networks; the tasks completed by the three networks are the same, and they differ only slightly in network structure. The main idea of the MTCNN algorithm is to use the cascade of multiple networks to continuously optimize the same task: p-net obtains a rough result, r-net then improves it, and finally o-net improves the result of r-net again. Through this continuous refinement, key point location becomes more and more accurate. The embodiments of the present invention train the MTCNN network on the open-source data sets WIDER FACE and CelebA and on face data collected according to their own needs. Fig. 3d is a schematic diagram of the face key point location effect provided by Embodiment Three of the present invention; the detection effect of the MTCNN algorithm is shown in Fig. 3d, from which it can be seen that the face key point location effect of the embodiments of the present invention is good.
S343, performing feature extraction on the face image by the Arcface algorithm according to the face key points.
Face features are extracted primarily so that faces can be compared: the face features of the same person should be very similar, while the features of different faces should have very low similarity. Because the extracted features all belong to the single large category "face", making the extracted features as similar as possible for faces of the same person while as discriminative as possible between different faces is the key to feature extraction. In order to increase the discrimination between different faces, the embodiments of the present invention use the Arcface algorithm to improve the loss function of the classification network (i.e. a deep neural network), increasing the discrimination between different classes so that a good classification effect is maintained even when the training samples are unbalanced and the classes are numerous.
The Arcface algorithm directly maximizes the classification boundary in angular space; that is, it modifies the original softmax loss function of the classification network into a loss that represents the classification network in angular space. The original softmax loss calculation formula is as follows:
L1 = -(1/m)·Σ_{i=1..m} log( e^(W_yi^T·x_i + b_yi) / Σ_{j=1..n} e^(W_j^T·x_i + b_j) )
The loss calculation formula after the Arcface algorithm improvement is as follows:
L2 = -(1/m)·Σ_{i=1..m} log( e^(s·cos(θ_yi + t)) / ( e^(s·cos(θ_yi + t)) + Σ_{j≠yi} e^(s·cosθ_j) ) )
Wherein, L1 and L2 indicate the loss functions, m indicates the batch size, n indicates the number of classes, i and j are natural numbers, W_yi indicates the y_i-th column of the weights of the last fully connected layer for the i-th sample, x and y indicate the feature vector and the class, x_i indicates the deep learning feature of the i-th sample and y_i indicates the class to which the i-th sample belongs, T indicates the transposition operation, b_yi indicates the y_i-th component of the bias term of the last fully connected layer, b indicates the bias term of the last fully connected layer, W_j indicates the j-th column of the weights of the last fully connected layer, b_j indicates the j-th component of the bias term of the last fully connected layer, s indicates ||x|| after normalization, t indicates the additive angular margin, θ_yi indicates the angle between W_yi and x_i, and θ_j indicates the angle between W_j and x_i.
Compared with the original softmax loss, the features extracted with the Arcface algorithm possess better performance and larger inter-class distances; even when the number of classes is large, the features still discriminate well.
S344, matching the extracted face features with the features in a feature database, identifying the person information corresponding to each key frame according to the matching result, and taking the identification result of the face information as the first recognition result.
In the embodiments of the present invention, after the face features in the key frame pictures are extracted, a face feature database of the persons to be identified needs to be built, with the person information stored in correspondence with the face features. In the recognition phase the features extracted from the face to be detected are matched against the face features in the feature database, and the recognition result is given according to the matching similarity. Under normal conditions there are two measures of feature vector similarity: Euclidean distance and cosine distance. Because the Euclidean distance fluctuates over a larger range, it is difficult to define similarity with a definite threshold, so the embodiments of the present invention use the cosine distance to describe feature similarity. The cosine distance ranges within [-1, 1], so a demarcation threshold can be determined very conveniently. As for the matching principle, the embodiments of the present invention use a matching algorithm combining the nearest-match method with the threshold method: the algorithm first calculates the similarity between the feature to be identified and the features in the feature database, takes the person class of the most similar feature as the class of the feature to be identified, and then judges whether this similarity is greater than a set threshold; if it is greater than the threshold, the feature is identified as that person class, and if it is less than the threshold, the feature is determined not to belong to any person class in the feature database.
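The nearest-match-plus-threshold rule above can be sketched with cosine similarity as follows; the gallery vectors, names and the 0.5 threshold are illustrative assumptions.

```python
# Sketch of nearest-match face identification with a cosine threshold.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match(query, gallery, threshold=0.5):
    """gallery: dict name -> feature vector. Returns a name or None."""
    best_name, best_sim = None, -1.0
    for name, feat in gallery.items():
        sim = cosine(query, feat)
        if sim > best_sim:
            best_name, best_sim = name, sim
    # accept the nearest match only if it clears the threshold
    return best_name if best_sim >= threshold else None

gallery = {"alice": np.array([1.0, 0.0]), "bob": np.array([0.0, 1.0])}
who = match(np.array([0.9, 0.1]), gallery)       # closest to alice
nobody = match(np.array([-1.0, -1.0]), gallery)  # below threshold
```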
It should be noted that before multi-modal recognition is performed on the key frame pictures, the input key frame pictures also need to be pre-processed. Pre-processing mainly means standardizing the pictures and scaling them to the same size as model input. The first stage feeds the pictures into the models of the different categories for first-level label classification, and the second stage performs finer identification for one or several major classes. Such a classification framework is very easy to extend, and some labels can be assisted by the target detection model and the face recognition model.
It should also be noted that Fig. 3a is only a schematic diagram of one implementation; there is no sequential relationship between S320 and S330. S320 can be implemented first and then S330, or S330 first and then S320; the two can also be implemented in parallel, or only one of them implemented.
Correspondingly, as shown in Fig. 3e, obtaining the second recognition result can specifically include the following operations:
S350, performing video identification on the video clip set to obtain the second recognition result.
Specifically, S350 may include the following operations:
S351, performing time-domain down-sampling on each video clip in the video clip set respectively to obtain a sampled video frame set corresponding to each video clip.
Wherein, the sampled video frame set can be used to store the video frames sampled according to a set rule.
Fig. 3f is a schematic diagram of a video identification system provided by Embodiment Three of the present invention. As shown in Fig. 3f, in the embodiments of the present invention, accurate identification of different action types in video clips is realized with the 3D convolutional neural network (3DCNN) technique. The identified actions may include more than 20 kinds of objectionable vulgar actions such as fighting, smoking, drinking, "society shake" and the seaweed dance, and may also include more than 100 kinds of conventional actions such as eating, rock climbing, jumping, playing football and kissing. Different actions may require different resolution accuracy; for example, dancing is more likely a global action, while smoking is more likely a local action. In order to meet different resolution demands, the embodiments of the present invention can construct a high-resolution 3DCNN network and a low-resolution 3DCNN network. Under normal conditions there are two ways of using the time-domain information: one is to use the original picture frames directly as the input of the 3DCNN; the other is to extract the x gradient, y gradient and optical flow features between picture frames as the input of the 3DCNN. It should be noted that when the 3DCNN is trained, for multi-class problems the 3DCNN can use the multi-class cross entropy loss.
A video sequence is a time-correlated sequence of images. In the time domain the interval between consecutive frames is very small; especially at higher time-domain sample rates such as 25fps, 30fps, 50fps and 60fps, the correlation between consecutive frames is very high, and each input sample of the 3DCNN also requires a fixed number of frames in the time domain. In the embodiments of the present invention, the time-domain frame number can be fixed at 16 frames, which provides the prerequisite for conversion between different frame rates. Specifically, as shown in Fig. 3f, the M1 module in the video identification system of the embodiments of the present invention can use the following two sampling modes:
Mode (1): Assuming the frame rate of the original video sequence is Q and the frame rate after sampling is P, down-sampling can be performed based on time distance, with the conversion formula σi=λθ_{k+1}+(1-λ)θ_k, wherein σi indicates the i-th down-sampled video frame, λ indicates the weighting parameter, θ_{k+1} indicates the (k+1)-th frame image of the original video, θ_k indicates the k-th frame image of the original video, i is the frame number index after down-sampling, and k and k+1 are the two adjacent frames of the original video sequence straddling the sampling position. The down-sampled video frame sequence is then σ=[σ1, σ2 … σM], where the value of M is 16.
Mode (2): Every 16 consecutive frames of the original video are taken, with two neighboring 16-frame fragments overlapping by 8 frames; that is, the original video segment can be divided into multiple 16-frame fragments that mutually overlap by 8 frames.
Mode (1) can guarantee the globality of the video clip, while mode (2) can guarantee the locality of the video clip and the integrity of its information. The samples generated in both modes can take the label of the original video segment, which facilitates training.
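The two sampling modes above can be sketched as follows. For mode (1), only the blending formula σi = λ·θ_{k+1} + (1-λ)·θ_k is given, so the even-spacing rule for choosing k and λ below is an assumption; mode (2) is the stated overlapping-window scheme (16-frame windows, 8-frame stride).

```python
# Sketch of the M1 module's two time-domain sampling modes.

def mode1_indices(n_orig, m=16):
    """Mode (1): evenly spaced positions, returned as (k, lam) blending pairs."""
    pairs = []
    for i in range(m):
        pos = i * (n_orig - 1) / (m - 1)   # assumed even spacing over the clip
        k = int(pos)                       # lower neighboring frame
        pairs.append((k, pos - k))         # lam weights frame k+1
    return pairs

def mode2_windows(n_orig, size=16, stride=8):
    """Mode (2): overlapping windows; neighbors share size - stride frames."""
    return [(s, s + size) for s in range(0, n_orig - size + 1, stride)]

samples = mode1_indices(61)   # 16 (k, lam) pairs for a 61-frame clip
windows = mode2_windows(48)   # 16-frame windows with 8-frame overlap
```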
S352, performing set processing operations on the sampled video frame set in space and time to obtain at least two types of input images; wherein the set processing operations include scaling, optical flow extraction and edge image extraction, and the types of input images include high-resolution images, low-resolution images, optical flow images and edge images.
Correspondingly, as shown in Fig. 3f, the M2 module in the video identification system of the embodiments of the present invention is responsible for the set processing of the video frames in the sampled video frame set. In the embodiments of the present invention, the M2 module provides 3 processing modes, namely scaling, optical flow extraction and edge image extraction. Scaling may further include two scaling manners: in order to meet the demand for high resolution, the original image is resized in the spatial domain to 224*224*3, while the low-resolution image is resized to 112*112*3. Optical flow is the significant information of object motion in the time domain; it is the correspondence between the previous frame and the current frame found from the correlation between consecutive frames and the changes of pixels of the image sequence in the time domain, and this correspondence between consecutive frames is regarded as the motion information of the object. Optionally, the embodiments of the present invention calculate the optical flow with the cv2.calcOpticalFlowPyrLK() function of opencv. The edge image carries the structural attributes of the image and the significant information of object motion in the spatial domain. Optionally, the embodiments of the present invention extract the edge image with the Canny operator, calculating edge features separately for the 3 RGB channels. The calculation process of Canny is: 1) filter out noise with a Gaussian filter to smooth the image; 2) calculate the gradient intensity and direction of each pixel in the image; 3) apply non-maximum suppression to eliminate the spurious responses brought by edge detection; 4) apply dual-threshold detection to determine true and potential edges; 5) finally complete edge detection by suppressing isolated weak edges.
S353, separately inputting each type of input image into the corresponding 3DCNN network, identifying the input images with the 3DCNN networks, and obtaining, as the output of the 3DCNN networks, the output probability values of the video labels corresponding to the input images.
Correspondingly, after the four types of input images are obtained, each type can be input into the corresponding 3DCNN network. As shown in Fig. 3f, the 3DCNN network may include modules such as M3, M4 and M5. The M3 module is the backbone network of the 3DCNN; the inputs of 3DCNN1, 3DCNN2, 3DCNN3 and 3DCNN4 are respectively the high-resolution image, the low-resolution image, the optical flow image and the edge image. For the convolution kernels of the 3DCNN network, the spatial size can be chosen as 3*3 and 5*5, in a series connection of multiple small convolution kernels. The pooling selects max_pooling; the time-domain pooling size starts at 1 and then increases gradually, being successively 2, 3 and 4. This setting prevents the time-domain information from being fused prematurely. The M4 module is the fully connected layers (FC); to prevent the network parameters from being excessive, the embodiments of the present invention use only one fully connected layer. The M5 module is essentially a fully connected layer whose number of nodes is the number of classes; its prediction is the output probability value of each class for each type of input image of the current 3DCNN.
S354, fusing the output probability values of the video labels according to a set fusion manner, and combining the video labels obtained after fusion into the second recognition result.
Correspondingly, as shown in Fig. 3f, the Output module fuses the output probability values of the 4 preceding 3DCNNs; the probability values of the 4 3DCNN networks for the corresponding class are combined into the fused output probability value of each class, and at prediction time the output of this layer is taken as the result.
Illustratively, suppose the classification model has two classes, and the output probability values of the 4 3DCNNs are respectively dancing: 0.9, smoking: 0.1; dancing: 0.8, smoking: 0.2; dancing: 0.9, smoking: 0.1; dancing: 0.7, smoking: 0.3. The weight corresponding to the output probability value of each 3DCNN defaults to 0.25, so the second recognition result corresponding to the video clip can be: dancing: 0.825, calculated as 0.9*0.25+0.8*0.25+0.9*0.25+0.7*0.25=0.825; smoking: 0.175, calculated as 0.1*0.25+0.2*0.25+0.1*0.25+0.3*0.25=0.175.
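The weighted fusion of the example above can be reproduced numerically; the per-network weight of 0.25 follows the text.

```python
# Weighted fusion of per-class probabilities from four 3DCNN networks.

def fuse(outputs, weights):
    """outputs: list of dicts label -> probability, one per 3DCNN."""
    labels = outputs[0].keys()
    return {
        label: sum(w * out[label] for w, out in zip(weights, outputs))
        for label in labels
    }

outputs = [
    {"dancing": 0.9, "smoking": 0.1},
    {"dancing": 0.8, "smoking": 0.2},
    {"dancing": 0.9, "smoking": 0.1},
    {"dancing": 0.7, "smoking": 0.3},
]
fused = fuse(outputs, [0.25] * 4)  # dancing: 0.825, smoking: 0.175
```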
It should be noted that in the embodiments of the present invention a video clip is a segment of the simple video subfile, obtained by slicing, whose content is relatively homogeneous. A long video segment may, for example, contain smoking in its first 10 seconds and playing football in its last 10 seconds; such a video can be cut into two video clips. For a video in which smoking and playing football occur within the same 10 seconds, no cut is needed, and the two labels "smoking" and "playing football" together serve as the second recognition result of that video.
Correspondingly, as shown in Fig. 3g, obtaining the third recognition result can specifically include the following operations:
S360, performing audio identification on the simple audio subfile to obtain the third recognition result.
Specifically, S360 may include the following operations:
S361, pre-processing the simple audio subfile and then performing a fast Fourier transform to obtain the frequency domain information of the simple audio subfile.
In the embodiments of the present invention, when audio identification is performed on the simple audio subfile, the simple audio subfile can first be pre-processed (including audio signal pre-emphasis, signal windowing and the like), after which the frequency domain information of the simple audio subfile is obtained with the fast Fourier transform. The fast Fourier transform is a fast algorithm for the discrete Fourier transform; it optimizes the computation according to the odd-even characteristics of the discrete Fourier transform, reducing the complexity from O(n²) to O(nlogn). The fast Fourier transform formula can be expressed as:
X(k) = Σ_{n=0..N-1} x(n)·e^(-j2πnk/N)
Wherein, X(k) indicates the signal sequence after the Fourier transform, x(n) indicates the discrete audio sequence after sampling, n indicates the audio sequence index, k indicates the frequency domain sequence index, and N indicates the Fourier transform interval length.
S362, calculating the energy spectrum corresponding to the frequency domain information of each frame of the simple audio subfile.
Correspondingly, after the frequency domain information of the simple audio subfile is obtained, a modulus-square operation can be performed on the frequency domain information to calculate the energy spectrum of each frame of the signal; the signal is then filtered with a Mel filter bank to calculate the Mel filtered energy. The signal is expressed in complex form as:
X(k) = a·e^(jθk) = a·cosθk + j·a·sinθk = a_k + j·b_k
Then the signal energy spectrum is expressed as:
C(k) = |X(k)|² = a_k² + b_k²
Wherein, C(k) indicates the energy of the k-th point of the signal, which is subsequently passed through the Mel filter bank to obtain the Mel filtered energy.
S363, obtaining the logarithmic Mel spectrum energy according to the energy spectrum.
Logarithmic spectral features retain high-frequency information well and make audio identification in complex scenes more stable. Therefore, in embodiments of the present invention, audio identification may be performed on the simple audio subfile according to the logarithmic Mel spectrum energy.
Specifically, the logarithmic Mel spectrum energy may be calculated by the following formula:
E(n) = log( Σ_k C(k)·H_n(k) )
where E(n) denotes the logarithmic Mel spectrum energy corresponding to the n-th Mel filter, C(k) denotes the energy of the k-th segment of the audio signal, and H_n(k) denotes the frequency response of the n-th Mel filter.
S364, extracting the logarithmic Mel spectrum feature according to the logarithmic Mel spectrum energy.
Correspondingly, in embodiments of the present invention, the Librosa library may be used to extract the audio features at the logarithmic Mel spectrum feature processing stage. The sample rate is set to 44100 Hz, the frame length is set to 30 ms, the pre-emphasis coefficient is 0.89, and a Hamming window function is used. The Mel spectrum feature coefficients have 32 dimensions; their first-order and second-order difference features are calculated, forming a 96-dimensional feature vector in total.
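The 32 → 96 dimension expansion is the standard delta / delta-delta stacking. A numpy sketch of that step (the actual implementation uses Librosa, where `librosa.feature.delta` provides the same computation; the padding choice below is an assumption):

```python
import numpy as np

def delta(features):
    """First-order difference along the frame axis, padding the first frame."""
    return np.diff(features, axis=0, prepend=features[:1])

def stack_with_deltas(mel_coeffs):
    """mel_coeffs: (n_frames, 32) -> (n_frames, 96) with delta and delta-delta."""
    d1 = delta(mel_coeffs)   # first-order difference feature
    d2 = delta(d1)           # second-order difference feature
    return np.concatenate([mel_coeffs, d1, d2], axis=1)

frames = np.random.default_rng(0).standard_normal((100, 32))
features = stack_with_deltas(frames)
print(features.shape)  # (100, 96)
```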
S365, reconstructing the logarithmic Mel spectrum feature to obtain two-dimensional audio features.
Fig. 3h is a schematic diagram of a logarithmic Mel spectrum feature provided by Embodiment III of the present invention. As shown in Fig. 3h, in the embodiment of the present invention, after the logarithmic Mel spectrum feature is obtained, the resulting one-dimensional logarithmic Mel spectrum audio feature is reconstructed to obtain a two-dimensional audio feature distribution, the feature map dimension being (number of frequency bands × audio frame length).
The operations of S361-S365 above belong to audio feature pre-processing; the operation of S366 below belongs to data classification based on deep learning.
S366, performing feature extraction and audio classification on the two-dimensional audio features through a CNN basic structural unit.
In embodiments of the present invention, deep feature extraction and audio classification may be realized using CNN basic structural units. Through its local-connection and parameter-sharing mechanisms, a convolutional layer not only reduces the computational cost but also preserves the spatial distribution pattern of the data. The basic structural unit mainly includes a convolutional layer, a pooling layer and an activation function; the classification layer uses the Softmax function, the loss function is the cross-entropy loss function, the initial learning rate is set to 0.001, the batch size is 128, and the stochastic gradient descent optimization algorithm is used.
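A minimal numpy forward pass of the basic structural unit just described (convolution → ReLU activation → max pooling → Softmax classification with cross-entropy loss). This is a toy sketch under assumed shapes: the 3×3 kernel, the four-class head and the 32×96 feature map are illustrative only, and real training would use a deep-learning framework with the stated SGD settings (learning rate 0.001, batch size 128).

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, w):
    """Valid 2-D correlation of a single-channel map x with kernel w."""
    kh, kw = w.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def max_pool(x, size=2):
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

# Toy 2-D audio feature map: 32 Mel bands x 96 frames
feature_map = rng.standard_normal((32, 96))
kernel = rng.standard_normal((3, 3)) * 0.1
fc = rng.standard_normal((4, 15 * 47)) * 0.01   # 4 hypothetical audio classes

hidden = np.maximum(conv2d(feature_map, kernel), 0.0)  # conv + ReLU -> (30, 94)
pooled = max_pool(hidden)                              # pooling layer -> (15, 47)
probs = softmax(fc @ pooled.ravel())                   # Softmax classification layer
loss = -np.log(probs[0] + 1e-12)                       # cross-entropy, true class 0
print(probs, loss)
```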
In an optional embodiment of the present invention, when the CNN basic structural unit uses multiple classifiers, the simple audio subfile is identified by means of voting.
Illustratively, when 3 classifiers are used, suppose the obtained recognition results are: label 1: smoking, confidence: 0.9, weight: 0.8; label 2: smoking, confidence: 0.8, weight: 0.1; label 3: dancing, confidence: 0.5, weight: 0.1. The voting may then take the label shared by label 1 and label 2, namely smoking, and the corresponding final recognition result may be: label: smoking, confidence: 0.8×0.9 + 0.1×0.8 + 0.1×0 = 0.8.
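The weighted voting in the example above can be expressed directly. A pure-Python sketch reproducing the worked numbers (the function name and the assumption that the weights sum to 1 are ours):

```python
from collections import defaultdict

def weighted_vote(results):
    """results: list of (label, confidence, weight); weights assumed to sum to 1.
    Each classifier adds weight * confidence to its label's score."""
    scores = defaultdict(float)
    for label, confidence, weight in results:
        scores[label] += weight * confidence
    best = max(scores, key=scores.get)
    return best, scores[best]

label, confidence = weighted_vote([
    ("smoking", 0.9, 0.8),
    ("smoking", 0.8, 0.1),
    ("dancing", 0.5, 0.1),
])
print(label, round(confidence, 4))  # smoking 0.8
```

This matches the specification's arithmetic: 0.8×0.9 + 0.1×0.8 = 0.8 for "smoking" versus 0.05 for "dancing".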
Fig. 3i is an architecture diagram of a video recognition algorithm provided by Embodiment III of the present invention. In a specific example, as shown in Fig. 3i, after the video file to be identified is obtained, it is demultiplexed to form a simple video subfile and a simple audio subfile, wherein the simple video subfile is a sequence of successive frames. The simple video subfile is processed by slicing and key frame extraction to form video clips and key frames. Multi-modal recognition may be performed on the key frame pictures to obtain the first recognition result corresponding to picture recognition; the multi-modal recognition may include picture classification, target detection, face detection and the like, wherein picture classification may be realized using OCR or NLP (Natural Language Processing) methods. For the simple video subfile, a 3D CNN network may be used during video identification to obtain the video classification, i.e. the second recognition result corresponding to video identification. When performing audio identification on the simple audio subfile, audio classification may also be carried out: text information may be extracted from the simple audio subfile and identified using NLP methods, while non-text information may be subjected to speech-audio or non-speech-audio identification, so as to obtain the third recognition result corresponding to audio identification. Finally, a post-processing module may integrate the three recognition results to form the integrated recognition result, the integration including the finally determined labels and confidence information in the recognition results.
By adopting the above technical solution for obtaining each recognition result, the problems of single identification content and small identification range existing in the prior video audit technology can be solved; the identification types are enriched, the identification content is refined, and multi-dimensional identification of the video content is realized, improving the richness, accuracy, efficiency and real-time performance of the video identification technology while reducing the identification cost.
Embodiment IV
Fig. 4 is a schematic diagram of a video identification device provided by Embodiment IV of the present invention. As shown in Fig. 4, the device includes: a subfile acquisition module 410, a first identification module 420, a second identification module 430 and a recognition result acquisition module 440, wherein:
the subfile acquisition module 410 is configured to obtain a simple video subfile and a simple audio subfile corresponding to a video file to be identified, and to obtain a key frame set and a video clip set corresponding to the simple video subfile;
the first identification module 420 is configured to perform multi-modal picture recognition on the key frame set to obtain a first recognition result, and to perform video identification on the video clip set to obtain a second recognition result;
the second identification module 430 is configured to perform audio identification on the simple audio subfile to obtain a third recognition result;
the recognition result acquisition module 440 is configured to obtain an integrated recognition result corresponding to the video file according to the first recognition result, the second recognition result and the third recognition result.
The embodiment of the present invention obtains a simple video subfile and a simple audio subfile corresponding to the video file to be identified, as well as a key frame set and a video clip set corresponding to the simple video subfile; performs multi-modal picture recognition on the key frame set to obtain the first recognition result; performs video identification on the video clip set to obtain the second recognition result; performs audio identification on the simple audio subfile to obtain the third recognition result; and finally integrates the first, second and third recognition results to obtain the integrated recognition result of the video file. This solves the problems of single identification content and small identification range in the prior video audit technology, enriches the identification types, refines the identification content, realizes multi-dimensional identification of the video content, and improves the richness, accuracy, efficiency and real-time performance of the video identification technology while reducing the identification cost.
Optionally, the subfile acquisition module 410 includes: a filtered video frame set acquisition unit, configured to filter the simple video subfile using a video frame coarse filtering technique to obtain a filtered video frame set; a clustering cluster acquisition unit, configured to calculate a feature vector corresponding to each filtered video frame in the filtered video frame set, and to perform clustering processing on the filtered video frames in the filtered video frame set according to the feature vectors to obtain at least two clustering clusters, wherein each clustering cluster includes at least one filtered video frame; a key frame set composition unit, configured to respectively obtain the filtered video frame with the highest static-degree score in each clustering cluster to form the key frame set; and a video clip set acquisition unit, configured to determine, according to the time parameters in the simple video subfile of the filtered video frames included in each clustering cluster, the start time and duration corresponding to each clustering cluster, and to perform slicing processing on the simple video subfile according to the start time and the duration to obtain the video clip set.
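The cluster → key frame → clip pipeline of this module can be sketched end to end. A toy numpy version with a minimal k-means (the clustering algorithm, the 2 fps timing and the random features are assumptions; the specification only requires *some* clustering over the feature vectors):

```python
import numpy as np

def kmeans(features, k, iters=20):
    """Minimal k-means; returns a cluster index for every frame feature."""
    # Deterministic init: k points spread across the sequence
    centers = features[np.linspace(0, len(features) - 1, k).astype(int)].copy()
    for _ in range(iters):
        labels = np.argmin(
            ((features[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = features[labels == c].mean(axis=0)
    return labels

# 60 filtered frames at 2 fps: a feature vector, a "static-degree" score,
# and a timestamp per frame (two well-separated synthetic scenes)
rng = np.random.default_rng(1)
features = np.vstack([rng.normal(0, 0.1, (30, 8)), rng.normal(5, 0.1, (30, 8))])
static_score = rng.random(60)
timestamps = np.arange(60) * 0.5

labels = kmeans(features, k=2)
for c in range(2):
    idx = np.flatnonzero(labels == c)
    keyframe = idx[np.argmax(static_score[idx])]   # highest static-degree score
    start = timestamps[idx].min()                  # clip start time
    duration = timestamps[idx].max() - start       # clip duration for slicing
    print(c, keyframe, start, duration)
```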
Optionally, the clustering cluster acquisition unit is specifically configured to perform feature extraction on each filtered video frame in the filtered video frame set using a convolutional neural network model; or to perform feature extraction on each filtered video frame in the filtered video frame set using local binary patterns (LBP), and to process each feature extraction result into a statistical histogram serving as the LBP feature vector corresponding to each filtered video frame.
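The LBP branch can be illustrated with a basic 8-neighbour implementation: each pixel's neighbours are thresholded against the centre, the 8 bits are packed into a code, and the codes are histogrammed into the statistical feature vector. This is a plain sketch of the classic operator (in practice a library such as scikit-image's `local_binary_pattern` would be used); the 64×64 random frame is only test input.

```python
import numpy as np

def lbp_histogram(gray):
    """Basic 8-neighbour LBP followed by a normalised 256-bin histogram."""
    g = gray.astype(np.int32)
    center = g[1:-1, 1:-1]
    # Offsets of the 8 neighbours, clockwise from the top-left corner
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        code += (neighbour >= center).astype(np.int32) << bit
    hist = np.bincount(code.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()   # the statistical histogram used as LBP feature

frame = np.random.default_rng(0).integers(0, 256, (64, 64))
vec = lbp_histogram(frame)
print(vec.shape)  # (256,)
```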
Optionally, the first identification module 420 is specifically configured to perform picture classification on each key frame in the key frame set using a preset picture classification model, and to take the classification results as the first recognition result;
and/or
to input each key frame in the key frame set separately into a pre-trained YOLOv3 model, and to obtain the output of the YOLOv3 model, namely the target object identifier corresponding to each key frame and the position coordinates of the target object in the key frame, as the first recognition result;
and/or
to perform face detection on each key frame in the key frame set using the S3FD algorithm; to perform face key point location on the detected faces by the MTCNN algorithm to obtain face key points; to perform feature extraction on the face images by the Arcface algorithm according to the face key points; and to match the extracted face features with the features in a feature database, identify the person information corresponding to each key frame according to the matching results, and take the identification result of the face information as the first recognition result.
Optionally, the first identification module 420 is further configured to perform time-domain down-sampling on each video clip in the video clip set respectively to obtain a sampled video frame set corresponding to each video clip; to perform set processing operations on the sampled video frame set in the spatial and temporal dimensions to obtain input images of at least two types, wherein the set processing operations include scaling, optical flow extraction and edge image extraction, and the types of input images include high-resolution images, low-resolution images, optical flow images and edge images; to input each type of input image separately into the corresponding 3D CNN network, identify the input images using the 3D CNN networks, and obtain the outputs of the 3D CNN networks, namely the output probability values of the video labels corresponding to the input images; and to fuse the output probability values of the video labels according to a set fusion mode, and combine the video labels obtained after fusion as the second recognition result.
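One possible "set fusion mode" for the per-stream label probabilities is a weighted average across the four input-image types. A sketch under assumed numbers: the stream weights, the label set and the probability vectors below are all hypothetical, since the specification leaves the fusion mode open.

```python
import numpy as np

labels = ["smoking", "playing football", "dancing"]

# Hypothetical per-stream label probabilities from four 3D CNN networks
stream_probs = {
    "high_res":     np.array([0.70, 0.20, 0.10]),
    "low_res":      np.array([0.60, 0.30, 0.10]),
    "optical_flow": np.array([0.50, 0.40, 0.10]),
    "edge":         np.array([0.55, 0.35, 0.10]),
}
# Assumed stream weights (sum to 1), e.g. trusting high-resolution input most
weights = {"high_res": 0.4, "low_res": 0.2, "optical_flow": 0.3, "edge": 0.1}

fused = sum(weights[s] * p for s, p in stream_probs.items())
second_result = labels[int(np.argmax(fused))]   # fused video label
print(second_result, fused)
```

Because every stream's probabilities sum to 1 and the weights sum to 1, the fused vector is again a valid distribution over the labels.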
Optionally, the second identification module 430 is specifically configured to pre-process the simple audio subfile and then perform a Fast Fourier Transform to obtain the frequency domain information of the simple audio subfile; to calculate the energy spectrum corresponding to the frequency domain information of each frame of the simple audio subfile; to obtain the logarithmic Mel spectrum energy according to the energy spectrum; to extract the logarithmic Mel spectrum feature according to the logarithmic Mel spectrum energy; to reconstruct the logarithmic Mel spectrum feature to obtain two-dimensional audio features; and to perform feature extraction and audio classification on the two-dimensional audio features through a CNN basic structural unit.
Optionally, when the CNN basic structural unit uses multiple classifiers, the simple audio subfile is identified by means of voting.
The above video identification device can execute the video identification method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the executed method. For technical details not described in detail in this embodiment, reference may be made to the video identification method provided by any embodiment of the present invention.
Embodiment V
Fig. 5 is a structural schematic diagram of a computer device provided by Embodiment V of the present invention, showing a block diagram of a computer device 512 suitable for implementing embodiments of the present invention. The computer device 512 shown in Fig. 5 is only an example and should not impose any restriction on the functions and scope of use of the embodiments of the present invention.
As shown in Fig. 5, the computer device 512 takes the form of a general-purpose computing device. The components of the computer device 512 may include, but are not limited to: one or more processors 516, a storage device 528, and a bus 518 connecting the different system components (including the storage device 528 and the processors 516).
The bus 518 represents one or more of several classes of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus and the Peripheral Component Interconnect (PCI) bus.
The computer device 512 typically comprises a variety of computer-system-readable media. These media can be any usable media that can be accessed by the computer device 512, including volatile and non-volatile media, and removable and non-removable media.
The storage device 528 may include computer-system-readable media in the form of volatile memory, such as Random Access Memory (RAM) 530 and/or cache memory 532. The computer device 512 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 534 may be used for reading and writing non-removable, non-volatile magnetic media (not shown in Fig. 5, commonly referred to as a "hard drive"). Although not shown in Fig. 5, a disk drive for reading and writing a removable non-volatile magnetic disk (such as a "floppy disk"), and an optical disc drive for reading and writing a removable non-volatile optical disc (such as a Compact Disc Read-Only Memory (CD-ROM), a Digital Video Disc (DVD-ROM) or other optical media), may be provided. In these cases, each drive can be connected to the bus 518 through one or more data media interfaces. The storage device 528 may include at least one program product having a set of (e.g., at least one) program modules configured to perform the functions of the embodiments of the present invention.
A program 536 having a set of (at least one) program modules 526 may be stored, for example, in the storage device 528. Such program modules 526 include, but are not limited to, an operating system, one or more application programs, other program modules and program data; each or some combination of these examples may include an implementation of a network environment. The program modules 526 generally perform the functions and/or methods in the embodiments described in the present invention.
The computer device 512 may also communicate with one or more external devices 514 (such as a keyboard, a pointing device, a camera, a display 524, etc.), with one or more devices that enable a user to interact with the computer device 512, and/or with any device (such as a network card, a modem, etc.) that enables the computer device 512 to communicate with one or more other computing devices. Such communication may be carried out through an input/output (I/O) interface 522. Moreover, the computer device 512 may also communicate through a network adapter 520 with one or more networks, such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network such as the Internet. As shown, the network adapter 520 communicates with the other modules of the computer device 512 through the bus 518. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the computer device 512, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, Redundant Arrays of Independent Disks (RAID) systems, tape drives and data backup storage systems.
The processor 516, by running the programs stored in the storage device 528, executes various functional applications and data processing, for example realizing the video identification method provided by the above embodiments of the present invention.
That is, when executing the program, the processing unit realizes: obtaining a simple video subfile and a simple audio subfile corresponding to a video file to be identified, and obtaining a key frame set and a video clip set corresponding to the simple video subfile; performing multi-modal picture recognition on the key frame set to obtain a first recognition result, and performing video identification on the video clip set to obtain a second recognition result; performing audio identification on the simple audio subfile to obtain a third recognition result; and obtaining an integrated recognition result corresponding to the video file according to the first recognition result, the second recognition result and the third recognition result.
The computer device obtains a simple video subfile and a simple audio subfile corresponding to the video file to be identified, as well as a key frame set and a video clip set corresponding to the simple video subfile; performs multi-modal picture recognition on the key frame set to obtain the first recognition result, and performs video identification on the video clip set to obtain the second recognition result; performs audio identification on the simple audio subfile to obtain the third recognition result; and finally integrates the first, second and third recognition results to obtain the integrated recognition result of the video file. This solves the problems of single identification content and small identification range in the prior video audit technology, enriches the identification types, refines the identification content, realizes multi-dimensional identification of the video content, and improves the richness, accuracy, efficiency and real-time performance of the video identification technology while reducing the identification cost.
Embodiment VI
Embodiment VI of the present invention also provides a computer storage medium storing a computer program which, when executed by a computer processor, executes the video identification method of any of the above embodiments of the present invention: obtaining a simple video subfile and a simple audio subfile corresponding to a video file to be identified, and obtaining a key frame set and a video clip set corresponding to the simple video subfile; performing multi-modal picture recognition on the key frame set to obtain a first recognition result, and performing video identification on the video clip set to obtain a second recognition result; performing audio identification on the simple audio subfile to obtain a third recognition result; and obtaining an integrated recognition result corresponding to the video file according to the first recognition result, the second recognition result and the third recognition result.
The computer storage medium of the embodiment of the present invention may adopt any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more conductors, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium containing or storing a program which can be used by, or in connection with, an instruction execution system, apparatus or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; it can send, propagate or transmit a program for use by, or in connection with, an instruction execution system, apparatus or device.
The program code contained on a computer-readable medium may be transmitted by any suitable medium, including but not limited to wireless, wire, optical cable, Radio Frequency (RF), etc., or any suitable combination of the above.
Computer program code for carrying out the operations of the present invention may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, C++ or Python, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will appreciate that the invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, it is not limited to the above embodiments; without departing from the inventive concept, it may also include more other equivalent embodiments, and the scope of the present invention is determined by the scope of the appended claims.
Claims (10)
1. A video identification method, characterized by comprising:
obtaining a simple video subfile and a simple audio subfile corresponding to a video file to be identified, and obtaining a key frame set and a video clip set corresponding to the simple video subfile;
performing multi-modal picture recognition on the key frame set to obtain a first recognition result, and performing video identification on the video clip set to obtain a second recognition result;
performing audio identification on the simple audio subfile to obtain a third recognition result;
obtaining an integrated recognition result corresponding to the video file according to the first recognition result, the second recognition result and the third recognition result.
2. The method according to claim 1, characterized in that obtaining the key frame set and the video clip set corresponding to the simple video subfile comprises:
filtering the simple video subfile using a video frame coarse filtering technique to obtain a filtered video frame set;
calculating a feature vector corresponding to each filtered video frame in the filtered video frame set, and performing clustering processing on the filtered video frames in the filtered video frame set according to the feature vectors to obtain at least two clustering clusters, wherein each clustering cluster includes at least one filtered video frame;
respectively obtaining the filtered video frame with the highest static-degree score in each clustering cluster to form the key frame set;
determining, according to the time parameters in the simple video subfile of the filtered video frames included in each clustering cluster, the start time and duration corresponding to each clustering cluster, and performing slicing processing on the simple video subfile according to the start time and the duration to obtain the video clip set.
3. The method according to claim 2, characterized in that calculating the feature vector corresponding to each filtered video frame in the filtered video frame set comprises:
performing feature extraction on each filtered video frame in the filtered video frame set using a convolutional neural network model; or
performing feature extraction on each filtered video frame in the filtered video frame set using local binary patterns (LBP), and processing each feature extraction result into a statistical histogram serving as the LBP feature vector corresponding to each filtered video frame.
4. The method according to claim 1, characterized in that performing multi-modal picture recognition on the key frame set to obtain the first recognition result comprises:
performing picture classification on each key frame in the key frame set using a preset picture classification model, and taking the classification results as the first recognition result;
and/or
inputting each key frame in the key frame set separately into a pre-trained YOLOv3 model, and obtaining the output of the YOLOv3 model, namely the target object identifier corresponding to each key frame and the position coordinates of the target object in the key frame, as the first recognition result;
and/or
performing face detection on each key frame in the key frame set using the S3FD algorithm;
performing face key point location on the detected faces by the MTCNN algorithm to obtain face key points;
performing feature extraction on the face images by the Arcface algorithm according to the face key points;
matching the extracted face features with the features in a feature database, identifying the person information corresponding to each key frame according to the matching results, and taking the identification result of the face information as the first recognition result.
5. The method according to claim 1, characterized in that performing video identification on the video clip set to obtain the second recognition result comprises:
performing time-domain down-sampling on each video clip in the video clip set respectively to obtain a sampled video frame set corresponding to each video clip;
performing set processing operations on the sampled video frame set in the spatial and temporal dimensions to obtain input images of at least two types, wherein the set processing operations include scaling, optical flow extraction and edge image extraction, and the types of input images include high-resolution images, low-resolution images, optical flow images and edge images;
inputting each type of input image separately into the corresponding 3D CNN network, identifying the input images using the 3D CNN networks, and obtaining the outputs of the 3D CNN networks, namely the output probability values of the video labels corresponding to the input images;
fusing the output probability values of the video labels according to a set fusion mode, and combining the video labels obtained after fusion as the second recognition result.
6. The method according to claim 1, characterized in that performing audio identification on the simple audio subfile comprises:
pre-processing the simple audio subfile and then performing a Fast Fourier Transform to obtain the frequency domain information of the simple audio subfile;
calculating the energy spectrum corresponding to the frequency domain information of each frame of the simple audio subfile;
obtaining the logarithmic Mel spectrum energy according to the energy spectrum;
extracting the logarithmic Mel spectrum feature according to the logarithmic Mel spectrum energy;
reconstructing the logarithmic Mel spectrum feature to obtain two-dimensional audio features;
performing feature extraction and audio classification on the two-dimensional audio features through a CNN basic structural unit.
7. The method according to claim 6, characterized in that:
when the CNN basic structural unit uses multiple classifiers, the simple audio subfile is identified by means of voting.
8. A video identification device, comprising:
a subfile obtaining module, configured to obtain a simple video subfile and a simple audio subfile corresponding to a video file to be identified, and to obtain a key frame set and a video clip set corresponding to the simple video subfile;
a first identification module, configured to perform multi-modal picture recognition on the key frame set to obtain a first recognition result, and to perform video identification on the video clip set to obtain a second recognition result;
a second identification module, configured to perform audio identification on the simple audio subfile to obtain a third recognition result;
a recognition result obtaining module, configured to obtain an integrated recognition result corresponding to the video file according to the first recognition result, the second recognition result, and the third recognition result.
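Claim 8 leaves open how the three per-modality results combine into the integrated result. One plausible, deliberately simple reading is a union of the label sets; the function and labels below are hypothetical, not taken from the patent:

```python
# Hypothetical combination rule for the recognition result obtaining module:
# the integrated result is the union of the key-frame (first), video-clip
# (second), and audio (third) recognition results.
def integrate_results(first, second, third):
    return sorted(set(first) | set(second) | set(third))

print(integrate_results(["violence"], ["violence", "crowd"], ["gunshot"]))
# ['crowd', 'gunshot', 'violence']
```

The embodiments could equally weight or cross-check the modalities; the claim wording covers any rule driven by the three results.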
9. A computer device, wherein the device comprises:
one or more processors; and
a storage device for storing one or more programs,
wherein when the one or more programs are executed by the one or more processors, the one or more processors implement the video identification method according to any one of claims 1-7.
10. A computer storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the video identification method according to any one of claims 1-7.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201811113391.2A CN109376603A (en) | 2018-09-25 | 2018-09-25 | A kind of video frequency identifying method, device, computer equipment and storage medium |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201811113391.2A CN109376603A (en) | 2018-09-25 | 2018-09-25 | A kind of video frequency identifying method, device, computer equipment and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN109376603A true CN109376603A (en) | 2019-02-22 |
Family
ID=65401655
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201811113391.2A Pending CN109376603A (en) | 2018-09-25 | 2018-09-25 | A kind of video frequency identifying method, device, computer equipment and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN109376603A (en) |
Cited By (78)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109829067A (en) * | 2019-03-05 | 2019-05-31 | 北京达佳互联信息技术有限公司 | Audio data processing method, device, electronic equipment and storage medium |
| CN109862394A (en) * | 2019-03-27 | 2019-06-07 | 北京周同科技有限公司 | Checking method, device, equipment and the storage medium of video content |
| CN109886241A (en) * | 2019-03-05 | 2019-06-14 | 天津工业大学 | Driver fatigue detection based on long short-term memory network |
| CN110110846A (en) * | 2019-04-24 | 2019-08-09 | 重庆邮电大学 | Auxiliary driver's vehicle exchange method based on convolutional neural networks |
| CN110147711A (en) * | 2019-02-27 | 2019-08-20 | 腾讯科技(深圳)有限公司 | Video scene recognition methods, device, storage medium and electronic device |
| CN110176027A (en) * | 2019-05-27 | 2019-08-27 | 腾讯科技(深圳)有限公司 | Video target tracking method, device, equipment and storage medium |
| CN110210430A (en) * | 2019-06-06 | 2019-09-06 | 中国石油大学(华东) | A kind of Activity recognition method and device |
| CN110298291A (en) * | 2019-06-25 | 2019-10-01 | 吉林大学 | Ox face and ox face critical point detection method based on Mask-RCNN |
| CN110334602A (en) * | 2019-06-06 | 2019-10-15 | 武汉市公安局视频侦查支队 | A kind of people flow rate statistical method based on convolutional neural networks |
| CN110490098A (en) * | 2019-07-31 | 2019-11-22 | 恒大智慧科技有限公司 | Smoking behavior automatic testing method, equipment and the readable storage medium storing program for executing of community user |
| CN110647831A (en) * | 2019-09-12 | 2020-01-03 | 华宇(大连)信息服务有限公司 | Court trial patrol method and system |
| CN110717428A (en) * | 2019-09-27 | 2020-01-21 | 上海依图网络科技有限公司 | Identity recognition method, device, system, medium and equipment fusing multiple features |
| CN110750770A (en) * | 2019-08-18 | 2020-02-04 | 浙江好络维医疗技术有限公司 | Method for unlocking electronic equipment based on electrocardiogram |
| CN110755108A (en) * | 2019-11-04 | 2020-02-07 | 合肥望闻健康科技有限公司 | Heart sound classification method, system and device based on intelligent stethoscope and readable storage medium |
| CN110798703A (en) * | 2019-11-04 | 2020-02-14 | 云目未来科技(北京)有限公司 | Method and device for detecting illegal video content and storage medium |
| CN110852231A (en) * | 2019-11-04 | 2020-02-28 | 云目未来科技(北京)有限公司 | Illegal video detection method and device and storage medium |
| CN110851148A (en) * | 2019-09-23 | 2020-02-28 | 上海意略明数字科技股份有限公司 | Analysis system and method for recognizing user behavior data based on intelligent image |
| CN110853636A (en) * | 2019-10-15 | 2020-02-28 | 北京雷石天地电子技术有限公司 | A system and method for generating verbatim lyrics files based on K-nearest neighbor algorithm |
| CN110879985A (en) * | 2019-11-18 | 2020-03-13 | 西南交通大学 | A face recognition model training method for anti-noise data |
| CN110909613A (en) * | 2019-10-28 | 2020-03-24 | Oppo广东移动通信有限公司 | Video person recognition method, device, storage medium and electronic device |
| CN110942011A (en) * | 2019-11-18 | 2020-03-31 | 上海极链网络科技有限公司 | Video event identification method, system, electronic equipment and medium |
| CN110956108A (en) * | 2019-11-22 | 2020-04-03 | 华南理工大学 | A Small Frequency Standard Detection Method Based on Feature Pyramid |
| CN110996123A (en) * | 2019-12-18 | 2020-04-10 | 广州市百果园信息技术有限公司 | Video processing method, device, equipment and medium |
| CN110991246A (en) * | 2019-10-31 | 2020-04-10 | 天津市国瑞数码安全系统股份有限公司 | Video detection method and system |
| CN111031330A (en) * | 2019-10-29 | 2020-04-17 | 中国科学院大学 | A method for content analysis of webcast based on multimodal fusion |
| CN111047879A (en) * | 2019-12-24 | 2020-04-21 | 苏州奥易克斯汽车电子有限公司 | Vehicle overspeed detection method |
| CN111157007A (en) * | 2020-01-16 | 2020-05-15 | 深圳市守行智能科技有限公司 | Indoor positioning method using cross vision |
| CN111191207A (en) * | 2019-12-23 | 2020-05-22 | 深圳壹账通智能科技有限公司 | Electronic file control method and device, computer equipment and storage medium |
| CN111356014A (en) * | 2020-02-18 | 2020-06-30 | 南京中新赛克科技有限责任公司 | Youtube video identification and matching method based on automatic learning |
| CN111414496A (en) * | 2020-03-27 | 2020-07-14 | 腾讯科技(深圳)有限公司 | Artificial intelligence-based multimedia file detection method and device |
| CN111428591A (en) * | 2020-03-11 | 2020-07-17 | 天津华来科技有限公司 | AI face image processing method, device, equipment and storage medium |
| CN111541940A (en) * | 2020-04-30 | 2020-08-14 | 深圳创维-Rgb电子有限公司 | Motion compensation method and device for display equipment, television and storage medium |
| CN111563551A (en) * | 2020-04-30 | 2020-08-21 | 支付宝(杭州)信息技术有限公司 | Multi-mode information fusion method and device and electronic equipment |
| CN111563488A (en) * | 2020-07-14 | 2020-08-21 | 成都市映潮科技股份有限公司 | Video subject content identification method, system and storage medium |
| CN111724810A (en) * | 2019-03-19 | 2020-09-29 | 杭州海康威视数字技术股份有限公司 | A kind of audio classification method and device |
| CN111741356A (en) * | 2020-08-25 | 2020-10-02 | 腾讯科技(深圳)有限公司 | Quality inspection method, device, device and readable storage medium for double-recording video |
| CN111753762A (en) * | 2020-06-28 | 2020-10-09 | 北京百度网讯科技有限公司 | Recognition method, device, device and storage medium for key identification in video |
| CN111783507A (en) * | 2019-07-24 | 2020-10-16 | 北京京东尚科信息技术有限公司 | Target search method, apparatus, and computer-readable storage medium |
| CN111783718A (en) * | 2020-07-10 | 2020-10-16 | 浙江大华技术股份有限公司 | Target object state identification method and device, storage medium and electronic device |
| CN111797762A (en) * | 2020-07-02 | 2020-10-20 | 北京灵汐科技有限公司 | A scene recognition method and system |
| CN111860222A (en) * | 2020-06-30 | 2020-10-30 | 东南大学 | Video action recognition method, system, computer equipment and storage medium based on dense-segmented frame sampling |
| CN111914759A (en) * | 2020-08-04 | 2020-11-10 | 苏州市职业大学 | Pedestrian re-identification method, device, equipment and medium based on video clip |
| CN111985345A (en) * | 2020-07-27 | 2020-11-24 | 腾讯科技(深圳)有限公司 | Play data processing method and medium |
| CN112052911A (en) * | 2020-09-23 | 2020-12-08 | 恒安嘉新(北京)科技股份公司 | Method and device for identifying riot and terrorist content in image, electronic equipment and storage medium |
| CN112052441A (en) * | 2020-08-24 | 2020-12-08 | 深圳市芯汇群微电子技术有限公司 | Data decryption method of solid state disk based on face recognition and electronic equipment |
| CN112149463A (en) * | 2019-06-27 | 2020-12-29 | 京东方科技集团股份有限公司 | Image processing method and device |
| CN112150431A (en) * | 2020-09-21 | 2020-12-29 | 京东数字科技控股股份有限公司 | UI visual walkthrough method and device, storage medium and electronic device |
| CN112231497A (en) * | 2020-10-19 | 2021-01-15 | 腾讯科技(深圳)有限公司 | Information classification method and device, storage medium and electronic equipment |
| CN112241673A (en) * | 2019-07-19 | 2021-01-19 | 浙江商汤科技开发有限公司 | Video method and device, electronic equipment and storage medium |
| CN112347821A (en) * | 2019-08-09 | 2021-02-09 | 飞思达技术(北京)有限公司 | Method for extracting IPTV (Internet protocol television) and OTT (over the top) video features based on convolutional neural network |
| CN112581438A (en) * | 2020-12-10 | 2021-03-30 | 腾讯科技(深圳)有限公司 | Slice image recognition method and device, storage medium and electronic equipment |
| CN112995666A (en) * | 2021-02-22 | 2021-06-18 | 天翼爱音乐文化科技有限公司 | Video horizontal and vertical screen conversion method and device combined with scene switching detection |
| CN113055666A (en) * | 2019-12-26 | 2021-06-29 | 武汉Tcl集团工业研究院有限公司 | Video quality evaluation method and device |
| CN113076566A (en) * | 2021-04-26 | 2021-07-06 | 深圳市三旺通信股份有限公司 | Display content detection method, device, computer program product and storage medium |
| CN113077470A (en) * | 2021-03-26 | 2021-07-06 | 天翼爱音乐文化科技有限公司 | Method, system, device and medium for cutting horizontal and vertical screen conversion picture |
| CN113220941A (en) * | 2021-06-01 | 2021-08-06 | 平安科技(深圳)有限公司 | Video type obtaining method and device based on multiple models and electronic equipment |
| CN113283515A (en) * | 2021-05-31 | 2021-08-20 | 广州宸祺出行科技有限公司 | Detection method and system for illegal passenger carrying for online taxi appointment |
| CN113435443A (en) * | 2021-06-28 | 2021-09-24 | 中国兵器装备集团自动化研究所有限公司 | Method for automatically identifying landmark from video |
| CN113705563A (en) * | 2021-04-13 | 2021-11-26 | 腾讯科技(深圳)有限公司 | Data processing method, device, equipment and storage medium |
| CN113779308A (en) * | 2021-11-12 | 2021-12-10 | 冠传网络科技(南京)有限公司 | Short video detection and multi-classification method, device and storage medium |
| CN113821675A (en) * | 2021-06-30 | 2021-12-21 | 腾讯科技(北京)有限公司 | Video identification method and device, electronic equipment and computer readable storage medium |
| CN113923472A (en) * | 2021-09-01 | 2022-01-11 | 北京奇艺世纪科技有限公司 | Video content analysis method and device, electronic equipment and storage medium |
| CN114155454A (en) * | 2020-09-07 | 2022-03-08 | 中国移动通信有限公司研究院 | Video processing method, device and storage medium |
| CN114189708A (en) * | 2021-12-07 | 2022-03-15 | 国网电商科技有限公司 | A kind of video content identification method and related device |
| CN114465737A (en) * | 2022-04-13 | 2022-05-10 | 腾讯科技(深圳)有限公司 | Data processing method and device, computer equipment and storage medium |
| CN114626024A (en) * | 2022-05-12 | 2022-06-14 | 北京吉道尔科技有限公司 | Internet infringement video low-consumption detection method and system based on block chain |
| CN114639164A (en) * | 2022-03-10 | 2022-06-17 | 平安科技(深圳)有限公司 | Behavior recognition method, device and equipment based on voting mechanism and storage medium |
| CN114821272A (en) * | 2022-06-28 | 2022-07-29 | 上海蜜度信息技术有限公司 | Image recognition method, image recognition system, image recognition medium, electronic device, and target detection model |
| CN114821401A (en) * | 2022-04-07 | 2022-07-29 | 腾讯科技(深圳)有限公司 | Video auditing method, device, equipment, storage medium and program product |
| CN115049953A (en) * | 2022-05-09 | 2022-09-13 | 中移(杭州)信息技术有限公司 | Video processing method, device, equipment and computer readable storage medium |
| CN115062186A (en) * | 2022-08-05 | 2022-09-16 | 北京远鉴信息技术有限公司 | Video content retrieval method, device, equipment and storage medium |
| CN115529475A (en) * | 2021-12-29 | 2022-12-27 | 北京智美互联科技有限公司 | Video stream content detection and risk control method and system |
| CN115705706A (en) * | 2021-08-13 | 2023-02-17 | 腾讯科技(深圳)有限公司 | Video processing method, video processing device, computer equipment and storage medium |
| CN115755059A (en) * | 2022-11-23 | 2023-03-07 | 中国船舶重工集团公司第七一五研究所 | Passive high-resolution processing method based on multi-scale deep convolution neural regression network |
| CN115908280A (en) * | 2022-11-03 | 2023-04-04 | 广东科力新材料有限公司 | Data processing-based performance determination method and system for PVC calcium zinc stabilizer |
| CN117173608A (en) * | 2023-08-23 | 2023-12-05 | 山东新一代信息产业技术研究院有限公司 | Video content review methods and systems |
| CN117319749A (en) * | 2023-10-27 | 2023-12-29 | 深圳金语科技有限公司 | Video data transmission method, device, equipment and storage medium |
| CN120550406A (en) * | 2025-07-31 | 2025-08-29 | 赛力斯汽车有限公司 | Game interactive control method, device, computer equipment and storage medium |
Citations (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20100284617A1 (en) * | 2006-06-09 | 2010-11-11 | Sony Ericsson Mobile Communications Ab | Identification of an object in media and of related media objects |
| WO2013122675A2 (en) * | 2011-12-16 | 2013-08-22 | The Research Foundation For The State University Of New York | Methods of recognizing activity in video |
| CN103854014A (en) * | 2014-02-25 | 2014-06-11 | 中国科学院自动化研究所 | A horror video recognition method and device based on context sparse representation |
| CN104021544A (en) * | 2014-05-07 | 2014-09-03 | 中国农业大学 | Greenhouse vegetable disease surveillance video key frame extracting method and extracting system |
| CN105550699A (en) * | 2015-12-08 | 2016-05-04 | 北京工业大学 | CNN-based video identification and classification method through time-space significant information fusion |
| CN106407960A (en) * | 2016-11-09 | 2017-02-15 | 浙江师范大学 | Multi-feature-based classification method and system for music genres |
| CN106599789A (en) * | 2016-07-29 | 2017-04-26 | 北京市商汤科技开发有限公司 | Video class identification method and device, data processing device and electronic device |
| CN107247919A (en) * | 2017-04-28 | 2017-10-13 | 深圳大学 | The acquisition methods and system of a kind of video feeling content |
| CN107590420A (en) * | 2016-07-07 | 2018-01-16 | 北京新岸线网络技术有限公司 | Scene extraction method of key frame and device in video analysis |
| CN107609497A (en) * | 2017-08-31 | 2018-01-19 | 武汉世纪金桥安全技术有限公司 | The real-time video face identification method and system of view-based access control model tracking technique |
| CN108053838A (en) * | 2017-12-01 | 2018-05-18 | 上海壹账通金融科技有限公司 | With reference to audio analysis and fraud recognition methods, device and the storage medium of video analysis |
- 2018-09-25 CN CN201811113391.2A patent/CN109376603A/en active Pending
Patent Citations (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20100284617A1 (en) * | 2006-06-09 | 2010-11-11 | Sony Ericsson Mobile Communications Ab | Identification of an object in media and of related media objects |
| WO2013122675A2 (en) * | 2011-12-16 | 2013-08-22 | The Research Foundation For The State University Of New York | Methods of recognizing activity in video |
| CN103854014A (en) * | 2014-02-25 | 2014-06-11 | 中国科学院自动化研究所 | A horror video recognition method and device based on context sparse representation |
| CN104021544A (en) * | 2014-05-07 | 2014-09-03 | 中国农业大学 | Greenhouse vegetable disease surveillance video key frame extracting method and extracting system |
| CN105550699A (en) * | 2015-12-08 | 2016-05-04 | 北京工业大学 | CNN-based video identification and classification method through time-space significant information fusion |
| CN107590420A (en) * | 2016-07-07 | 2018-01-16 | 北京新岸线网络技术有限公司 | Scene extraction method of key frame and device in video analysis |
| CN106599789A (en) * | 2016-07-29 | 2017-04-26 | 北京市商汤科技开发有限公司 | Video class identification method and device, data processing device and electronic device |
| CN106407960A (en) * | 2016-11-09 | 2017-02-15 | 浙江师范大学 | Multi-feature-based classification method and system for music genres |
| CN107247919A (en) * | 2017-04-28 | 2017-10-13 | 深圳大学 | The acquisition methods and system of a kind of video feeling content |
| CN107609497A (en) * | 2017-08-31 | 2018-01-19 | 武汉世纪金桥安全技术有限公司 | The real-time video face identification method and system of view-based access control model tracking technique |
| CN108053838A (en) * | 2017-12-01 | 2018-05-18 | 上海壹账通金融科技有限公司 | With reference to audio analysis and fraud recognition methods, device and the storage medium of video analysis |
Non-Patent Citations (1)
| Title |
|---|
| 王深 (Wang Shen): "Research on Key Frame Extraction Based on Multi-Feature Fusion and Clustering Algorithms", China Master's Theses Full-Text Database, Information Science and Technology Series * |
Cited By (110)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110147711A (en) * | 2019-02-27 | 2019-08-20 | 腾讯科技(深圳)有限公司 | Video scene recognition methods, device, storage medium and electronic device |
| CN110147711B (en) * | 2019-02-27 | 2023-11-14 | 腾讯科技(深圳)有限公司 | Video scene recognition method and device, storage medium and electronic device |
| CN109829067A (en) * | 2019-03-05 | 2019-05-31 | 北京达佳互联信息技术有限公司 | Audio data processing method, device, electronic equipment and storage medium |
| CN109886241A (en) * | 2019-03-05 | 2019-06-14 | 天津工业大学 | Driver fatigue detection based on long short-term memory network |
| CN111724810A (en) * | 2019-03-19 | 2020-09-29 | 杭州海康威视数字技术股份有限公司 | A kind of audio classification method and device |
| CN111724810B (en) * | 2019-03-19 | 2023-11-24 | 杭州海康威视数字技术股份有限公司 | An audio classification method and device |
| CN109862394A (en) * | 2019-03-27 | 2019-06-07 | 北京周同科技有限公司 | Checking method, device, equipment and the storage medium of video content |
| CN110110846A (en) * | 2019-04-24 | 2019-08-09 | 重庆邮电大学 | Auxiliary driver's vehicle exchange method based on convolutional neural networks |
| CN110176027A (en) * | 2019-05-27 | 2019-08-27 | 腾讯科技(深圳)有限公司 | Video target tracking method, device, equipment and storage medium |
| CN110176027B (en) * | 2019-05-27 | 2023-03-14 | 腾讯科技(深圳)有限公司 | Video target tracking method, device, equipment and storage medium |
| CN110334602A (en) * | 2019-06-06 | 2019-10-15 | 武汉市公安局视频侦查支队 | A kind of people flow rate statistical method based on convolutional neural networks |
| CN110210430A (en) * | 2019-06-06 | 2019-09-06 | 中国石油大学(华东) | A kind of Activity recognition method and device |
| CN110334602B (en) * | 2019-06-06 | 2021-10-26 | 武汉市公安局视频侦查支队 | People flow statistical method based on convolutional neural network |
| CN110298291A (en) * | 2019-06-25 | 2019-10-01 | 吉林大学 | Ox face and ox face critical point detection method based on Mask-RCNN |
| CN110298291B (en) * | 2019-06-25 | 2022-09-23 | 吉林大学 | Mask-RCNN-based cow face and cow face key point detection method |
| CN112149463A (en) * | 2019-06-27 | 2020-12-29 | 京东方科技集团股份有限公司 | Image processing method and device |
| CN112149463B (en) * | 2019-06-27 | 2024-04-23 | 京东方科技集团股份有限公司 | Image processing method and device |
| CN112241673A (en) * | 2019-07-19 | 2021-01-19 | 浙江商汤科技开发有限公司 | Video method and device, electronic equipment and storage medium |
| CN111783507A (en) * | 2019-07-24 | 2020-10-16 | 北京京东尚科信息技术有限公司 | Target search method, apparatus, and computer-readable storage medium |
| CN110490098A (en) * | 2019-07-31 | 2019-11-22 | 恒大智慧科技有限公司 | Smoking behavior automatic testing method, equipment and the readable storage medium storing program for executing of community user |
| CN112347821A (en) * | 2019-08-09 | 2021-02-09 | 飞思达技术(北京)有限公司 | Method for extracting IPTV (Internet protocol television) and OTT (over the top) video features based on convolutional neural network |
| CN110750770A (en) * | 2019-08-18 | 2020-02-04 | 浙江好络维医疗技术有限公司 | Method for unlocking electronic equipment based on electrocardiogram |
| CN110750770B (en) * | 2019-08-18 | 2023-10-03 | 浙江好络维医疗技术有限公司 | Electrocardiogram-based method for unlocking electronic equipment |
| CN110647831A (en) * | 2019-09-12 | 2020-01-03 | 华宇(大连)信息服务有限公司 | Court trial patrol method and system |
| CN110851148A (en) * | 2019-09-23 | 2020-02-28 | 上海意略明数字科技股份有限公司 | Analysis system and method for recognizing user behavior data based on intelligent image |
| CN110717428A (en) * | 2019-09-27 | 2020-01-21 | 上海依图网络科技有限公司 | Identity recognition method, device, system, medium and equipment fusing multiple features |
| CN110853636A (en) * | 2019-10-15 | 2020-02-28 | 北京雷石天地电子技术有限公司 | A system and method for generating verbatim lyrics files based on K-nearest neighbor algorithm |
| CN110853636B (en) * | 2019-10-15 | 2022-04-15 | 北京雷石天地电子技术有限公司 | System and method for generating word-by-word lyric file based on K nearest neighbor algorithm |
| WO2021082941A1 (en) * | 2019-10-28 | 2021-05-06 | Oppo广东移动通信有限公司 | Video figure recognition method and apparatus, and storage medium and electronic device |
| CN110909613B (en) * | 2019-10-28 | 2024-05-31 | Oppo广东移动通信有限公司 | Video character recognition method and device, storage medium and electronic equipment |
| CN110909613A (en) * | 2019-10-28 | 2020-03-24 | Oppo广东移动通信有限公司 | Video person recognition method, device, storage medium and electronic device |
| CN111031330A (en) * | 2019-10-29 | 2020-04-17 | 中国科学院大学 | A method for content analysis of webcast based on multimodal fusion |
| CN110991246A (en) * | 2019-10-31 | 2020-04-10 | 天津市国瑞数码安全系统股份有限公司 | Video detection method and system |
| CN110852231A (en) * | 2019-11-04 | 2020-02-28 | 云目未来科技(北京)有限公司 | Illegal video detection method and device and storage medium |
| CN110798703A (en) * | 2019-11-04 | 2020-02-14 | 云目未来科技(北京)有限公司 | Method and device for detecting illegal video content and storage medium |
| CN110755108A (en) * | 2019-11-04 | 2020-02-07 | 合肥望闻健康科技有限公司 | Heart sound classification method, system and device based on intelligent stethoscope and readable storage medium |
| CN110942011B (en) * | 2019-11-18 | 2021-02-02 | 上海极链网络科技有限公司 | Video event identification method, system, electronic equipment and medium |
| CN110942011A (en) * | 2019-11-18 | 2020-03-31 | 上海极链网络科技有限公司 | Video event identification method, system, electronic equipment and medium |
| CN110879985A (en) * | 2019-11-18 | 2020-03-13 | 西南交通大学 | A face recognition model training method for anti-noise data |
| CN110956108B (en) * | 2019-11-22 | 2023-04-18 | 华南理工大学 | Small frequency scale detection method based on characteristic pyramid |
| CN110956108A (en) * | 2019-11-22 | 2020-04-03 | 华南理工大学 | A Small Frequency Standard Detection Method Based on Feature Pyramid |
| CN110996123B (en) * | 2019-12-18 | 2022-01-11 | 广州市百果园信息技术有限公司 | Video processing method, device, equipment and medium |
| CN110996123A (en) * | 2019-12-18 | 2020-04-10 | 广州市百果园信息技术有限公司 | Video processing method, device, equipment and medium |
| CN111191207A (en) * | 2019-12-23 | 2020-05-22 | 深圳壹账通智能科技有限公司 | Electronic file control method and device, computer equipment and storage medium |
| CN111047879A (en) * | 2019-12-24 | 2020-04-21 | 苏州奥易克斯汽车电子有限公司 | Vehicle overspeed detection method |
| CN113055666B (en) * | 2019-12-26 | 2022-08-09 | 武汉Tcl集团工业研究院有限公司 | Video quality evaluation method and device |
| CN113055666A (en) * | 2019-12-26 | 2021-06-29 | 武汉Tcl集团工业研究院有限公司 | Video quality evaluation method and device |
| CN111157007A (en) * | 2020-01-16 | 2020-05-15 | 深圳市守行智能科技有限公司 | Indoor positioning method using cross vision |
| CN111356014A (en) * | 2020-02-18 | 2020-06-30 | 南京中新赛克科技有限责任公司 | Youtube video identification and matching method based on automatic learning |
| CN111428591A (en) * | 2020-03-11 | 2020-07-17 | 天津华来科技有限公司 | AI face image processing method, device, equipment and storage medium |
| CN111414496A (en) * | 2020-03-27 | 2020-07-14 | 腾讯科技(深圳)有限公司 | Artificial intelligence-based multimedia file detection method and device |
| CN111414496B (en) * | 2020-03-27 | 2023-04-07 | 腾讯科技(深圳)有限公司 | Artificial intelligence-based multimedia file detection method and device |
| CN111563551A (en) * | 2020-04-30 | 2020-08-21 | 支付宝(杭州)信息技术有限公司 | Multi-mode information fusion method and device and electronic equipment |
| CN111541940B (en) * | 2020-04-30 | 2022-04-08 | 深圳创维-Rgb电子有限公司 | Motion compensation method, device, television and storage medium for display device |
| CN111541940A (en) * | 2020-04-30 | 2020-08-14 | 深圳创维-Rgb电子有限公司 | Motion compensation method and device for display equipment, television and storage medium |
| CN111753762A (en) * | 2020-06-28 | 2020-10-09 | 北京百度网讯科技有限公司 | Recognition method, device, device and storage medium for key identification in video |
| CN111753762B (en) * | 2020-06-28 | 2024-03-15 | 北京百度网讯科技有限公司 | Method, device, equipment and storage medium for identifying key identification in video |
| CN111860222A (en) * | 2020-06-30 | 2020-10-30 | 东南大学 | Video action recognition method, system, computer equipment and storage medium based on dense-segmented frame sampling |
| CN111797762A (en) * | 2020-07-02 | 2020-10-20 | 北京灵汐科技有限公司 | A scene recognition method and system |
| CN111783718A (en) * | 2020-07-10 | 2020-10-16 | 浙江大华技术股份有限公司 | Target object state identification method and device, storage medium and electronic device |
| CN111563488A (en) * | 2020-07-14 | 2020-08-21 | 成都市映潮科技股份有限公司 | Video subject content identification method, system and storage medium |
| CN111985345A (en) * | 2020-07-27 | 2020-11-24 | 腾讯科技(深圳)有限公司 | Play data processing method and medium |
| CN111914759A (en) * | 2020-08-04 | 2020-11-10 | 苏州市职业大学 | Pedestrian re-identification method, device, equipment and medium based on video clip |
| CN111914759B (en) * | 2020-08-04 | 2024-02-13 | 苏州市职业大学 | Pedestrian re-identification method, device, equipment and medium based on video clips |
| CN112052441B (en) * | 2020-08-24 | 2021-09-28 | 深圳市芯汇群微电子技术有限公司 | Data decryption method of solid state disk based on face recognition and electronic equipment |
| CN112052441A (en) * | 2020-08-24 | 2020-12-08 | 深圳市芯汇群微电子技术有限公司 | Data decryption method of solid state disk based on face recognition and electronic equipment |
| CN111741356B (en) * | 2020-08-25 | 2020-12-08 | 腾讯科技(深圳)有限公司 | Quality inspection method, device, device and readable storage medium for double-recording video |
| CN111741356A (en) * | 2020-08-25 | 2020-10-02 | 腾讯科技(深圳)有限公司 | Quality inspection method, device, device and readable storage medium for double-recording video |
| CN114155454B (en) * | 2020-09-07 | 2025-04-04 | 中国移动通信有限公司研究院 | Video processing method, device and storage medium |
| CN114155454A (en) * | 2020-09-07 | 2022-03-08 | 中国移动通信有限公司研究院 | Video processing method, device and storage medium |
| CN112150431A (en) * | 2020-09-21 | 2020-12-29 | 京东数字科技控股股份有限公司 | UI visual walkthrough method and device, storage medium and electronic device |
| CN112052911A (en) * | 2020-09-23 | 2020-12-08 | 恒安嘉新(北京)科技股份公司 | Method and device for identifying riot and terrorist content in image, electronic equipment and storage medium |
| CN112231497A (en) * | 2020-10-19 | 2021-01-15 | 腾讯科技(深圳)有限公司 | Information classification method and device, storage medium and electronic equipment |
| CN112231497B (en) * | 2020-10-19 | 2024-04-09 | 腾讯科技(深圳)有限公司 | Information classification method and device, storage medium and electronic equipment |
| CN112581438A (en) * | 2020-12-10 | 2021-03-30 | 腾讯科技(深圳)有限公司 | Slice image recognition method and device, storage medium and electronic equipment |
| CN112581438B (en) * | 2020-12-10 | 2022-11-08 | 腾讯医疗健康(深圳)有限公司 | Slice image recognition method and device, storage medium and electronic equipment |
| CN112995666A (en) * | 2021-02-22 | 2021-06-18 | 天翼爱音乐文化科技有限公司 | Video horizontal and vertical screen conversion method and device combined with scene switching detection |
| CN113077470A (en) * | 2021-03-26 | 2021-07-06 | 天翼爱音乐文化科技有限公司 | Method, system, device and medium for cutting horizontal and vertical screen conversion picture |
| CN113077470B (en) * | 2021-03-26 | 2022-01-18 | 天翼爱音乐文化科技有限公司 | Method, system, device and medium for cutting horizontal and vertical screen conversion picture |
| CN113705563A (en) * | 2021-04-13 | 2021-11-26 | 腾讯科技(深圳)有限公司 | Data processing method, device, equipment and storage medium |
| CN113076566B (en) * | 2021-04-26 | 2024-02-27 | 深圳市三旺通信股份有限公司 | Display content detection method, apparatus, computer program product, and storage medium |
| CN113076566A (en) * | 2021-04-26 | 2021-07-06 | 深圳市三旺通信股份有限公司 | Display content detection method, device, computer program product and storage medium |
| CN113283515B (en) * | 2021-05-31 | 2024-02-02 | 广州宸祺出行科技有限公司 | Detection method and system for illegal passenger carrying of network appointment vehicle |
| CN113283515A (en) * | 2021-05-31 | 2021-08-20 | 广州宸祺出行科技有限公司 | Detection method and system for illegal passenger carrying for online taxi appointment |
| CN113220941B (en) * | 2021-06-01 | 2022-08-02 | 平安科技(深圳)有限公司 | Video type obtaining method and device based on multiple models and electronic equipment |
| CN113220941A (en) * | 2021-06-01 | 2021-08-06 | 平安科技(深圳)有限公司 | Video type obtaining method and device based on multiple models and electronic equipment |
| CN113435443B (en) * | 2021-06-28 | 2023-04-18 | 中国兵器装备集团自动化研究所有限公司 | Method for automatically identifying landmark from video |
| CN113435443A (en) * | 2021-06-28 | 2021-09-24 | 中国兵器装备集团自动化研究所有限公司 | Method for automatically identifying landmark from video |
| CN113821675A (en) * | 2021-06-30 | 2021-12-21 | 腾讯科技(北京)有限公司 | Video identification method and device, electronic equipment and computer readable storage medium |
| CN113821675B (en) * | 2021-06-30 | 2024-06-07 | 腾讯科技(北京)有限公司 | Video identification method, device, electronic equipment and computer readable storage medium |
| CN115705706A (en) * | 2021-08-13 | 2023-02-17 | 腾讯科技(深圳)有限公司 | Video processing method, video processing device, computer equipment and storage medium |
| CN113923472B (en) * | 2021-09-01 | 2023-09-01 | 北京奇艺世纪科技有限公司 | Video content analysis method, device, electronic equipment and storage medium |
| CN113923472A (en) * | 2021-09-01 | 2022-01-11 | 北京奇艺世纪科技有限公司 | Video content analysis method and device, electronic equipment and storage medium |
| CN113779308B (en) * | 2021-11-12 | 2022-02-25 | 冠传网络科技(南京)有限公司 | Short video detection and multi-classification method, device and storage medium |
| CN113779308A (en) * | 2021-11-12 | 2021-12-10 | 冠传网络科技(南京)有限公司 | Short video detection and multi-classification method, device and storage medium |
| CN114189708A (en) * | 2021-12-07 | 2022-03-15 | 国网电商科技有限公司 | A kind of video content identification method and related device |
| CN115529475A (en) * | 2021-12-29 | 2022-12-27 | 北京智美互联科技有限公司 | Video stream content detection and risk control method and system |
| CN114639164A (en) * | 2022-03-10 | 2022-06-17 | 平安科技(深圳)有限公司 | Behavior recognition method, device and equipment based on voting mechanism and storage medium |
| CN114639164B (en) * | 2022-03-10 | 2024-07-19 | 平安科技(深圳)有限公司 | Behavior recognition method, device equipment and storage medium based on voting mechanism |
| CN114821401A (en) * | 2022-04-07 | 2022-07-29 | 腾讯科技(深圳)有限公司 | Video auditing method, device, equipment, storage medium and program product |
| CN114465737A (en) * | 2022-04-13 | 2022-05-10 | 腾讯科技(深圳)有限公司 | Data processing method and device, computer equipment and storage medium |
| CN115049953A (en) * | 2022-05-09 | 2022-09-13 | 中移(杭州)信息技术有限公司 | Video processing method, device, equipment and computer readable storage medium |
| CN114626024A (en) * | 2022-05-12 | 2022-06-14 | 北京吉道尔科技有限公司 | Internet infringement video low-consumption detection method and system based on block chain |
| CN114821272A (en) * | 2022-06-28 | 2022-07-29 | 上海蜜度信息技术有限公司 | Image recognition method, image recognition system, image recognition medium, electronic device, and target detection model |
| CN115062186A (en) * | 2022-08-05 | 2022-09-16 | 北京远鉴信息技术有限公司 | Video content retrieval method, device, equipment and storage medium |
| CN115908280A (en) * | 2022-11-03 | 2023-04-04 | 广东科力新材料有限公司 | Data processing-based performance determination method and system for PVC calcium zinc stabilizer |
| CN115755059A (en) * | 2022-11-23 | 2023-03-07 | 中国船舶重工集团公司第七一五研究所 | Passive high-resolution processing method based on multi-scale deep convolution neural regression network |
| CN117173608A (en) * | 2023-08-23 | 2023-12-05 | 山东新一代信息产业技术研究院有限公司 | Video content review methods and systems |
| CN117319749A (en) * | 2023-10-27 | 2023-12-29 | 深圳金语科技有限公司 | Video data transmission method, device, equipment and storage medium |
| CN120550406A (en) * | 2025-07-31 | 2025-08-29 | 赛力斯汽车有限公司 | Game interactive control method, device, computer equipment and storage medium |
Similar Documents
| Publication | Title |
|---|---|
| CN109376603A (en) | A kind of video frequency identifying method, device, computer equipment and storage medium |
| Zhang | Deepfake generation and detection, a survey |
| Pan et al. | Deepfake detection through deep learning |
| Liu et al. | Learning human pose models from synthesized data for robust RGB-D action recognition |
| CN113762138B (en) | Identification method, device, computer equipment and storage medium for fake face pictures |
| Chen et al. | Chinesefoodnet: A large-scale image dataset for chinese food recognition |
| Deng et al. | Image aesthetic assessment: An experimental survey |
| CN113569895B (en) | Image processing model training method, processing method, device, equipment and medium |
| US8391617B2 (en) | Event recognition using image and location information |
| Ben Tamou et al. | Multi-stream fish detection in unconstrained underwater videos by the fusion of two convolutional neural network detectors |
| Xu et al. | Saliency prediction on omnidirectional image with generative adversarial imitation learning |
| CN113762041B (en) | Video classification method, device, computer equipment and storage medium |
| CN114360073B (en) | Image recognition method and related device |
| Park et al. | Performance comparison and visualization of ai-generated-image detection methods |
| Hou et al. | Text-aware single image specular highlight removal |
| Alfarano et al. | A novel convmixer transformer based architecture for violent behavior detection |
| Roy et al. | Unmasking deepfake visual content with generative AI |
| Daniilidis et al. | Computer Vision--ECCV 2010: 11th European Conference on Computer Vision, Heraklion, Crete, Greece, September 5-11, 2010, Proceedings, Part V |
| CN117156078B (en) | Video data processing method and device, electronic equipment and storage medium |
| CN116896654B (en) | Video processing method and related device |
| Khedkar et al. | Exploiting spatiotemporal inconsistencies to detect deepfake videos in the wild |
| CN119172634A (en) | A panoramic video navigation method driven by user subjective preference |
| Lahrache et al. | A survey on image memorability prediction: From traditional to deep learning models |
| Guo et al. | Generative model based data augmentation for special person classification |
| Mac | Learning efficient temporal information in deep networks: From the viewpoints of applications and modeling |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20190222 |