US12112514B2 - Device for generating prediction image on basis of generator including concentration layer, and control method therefor - Google Patents
Device for generating prediction image on basis of generator including concentration layer, and control method therefor
- Publication number
- US12112514B2 (application US17/361,556)
- Authority
- US
- United States
- Prior art keywords
- generator
- image
- neural network
- image frame
- data block
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/14—Picture signal circuitry for video frequency region
- H04N5/144—Movement detection
- H04N5/145—Movement estimation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0475—Generative networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/094—Adversarial learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
- G06T9/002—Image coding using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Definitions
- An electronic apparatus for generating a prediction image based on a plurality of input (past) images is disclosed. More particularly, an electronic apparatus using a generator model that itself includes an attention layer is disclosed.
- a generator model for generating prediction image frames could be trained through a generative adversarial network (GAN).
- This generator model could be used, for example, for anomaly detection of video captured through CCTV.
- an electronic apparatus comprises: a memory storing a generator previously trained to generate a prediction image based on one or more input images; and a processor configured to: acquire feature data from a plurality of image frames input through at least one layer included in the generator, extract feature data corresponding to change over time from the feature data acquired through an attention layer included in the generator, and acquire a prediction image frame by inputting the extracted feature data to at least one other layer included in the generator.
- the processor may be configured to, based on a result of comparing an image frame inputted after the plurality of image frames and the prediction image frame, train the generator including the attention layer.
- Each of the plurality of image frames may comprise a plurality of pixels, and wherein the attention layer is configured to be trained to extract feature data of pixels predicted to change over time from feature data for each of the plurality of pixels outputted from the at least one layer.
- the memory may be configured to include a discriminator trained to identify whether the inputted image is a real image frame or not, and wherein the processor is configured to train the generator based on the output acquired by inputting the prediction image frame into the discriminator.
- the plurality of image frames may be configured to correspond to a normal situation
- the processor is configured to input a plurality of image frames captured according to time into the generator to generate a prediction image frame, and based on an image frame captured after the plurality of captured image frames and the generated prediction image frame, identify whether an abnormal situation occurs.
- the generator may be configured to include a first neural network for performing encoding with respect to the plurality of inputted image frames, and a second neural network for performing decoding with respect to data encoded through the first neural network, wherein the first neural network includes a first attention layer and the second neural network includes a second attention layer, and wherein the processor is configured to perform max-pooling through the first attention layer and perform deconvolution through the second attention layer.
- the generator may be composed of a U-net in which the output of at least one layer, excluding the output layer, among the plurality of layers of the first neural network is inputted into at least one layer, excluding the input layer, among the plurality of layers of the second neural network.
- a method for controlling an electronic apparatus including a memory in which a generator previously trained to generate a prediction image based on one or more inputted images is stored, the method comprising: inputting a plurality of image frames, inputted according to time, into the generator; acquiring feature data from the plurality of image frames through at least one layer included in the generator; extracting feature data corresponding to change over time from the acquired feature data through an attention layer included in the generator; and acquiring a prediction image frame by inputting the extracted feature data to at least one other layer included in the generator.
- the method may further include based on a result of comparing an image frame inputted after the plurality of image frames and the prediction image frame, training the generator including the attention layer.
- Each of the plurality of image frames comprises a plurality of pixels
- the training the generator includes training the attention layer to extract feature data of pixels predicted to change over time from feature data for each of the plurality of pixels outputted from the at least one layer.
- the memory may be configured to include a discriminator trained to identify whether the inputted image is a real image frame or not, and wherein the training includes training the generator based on the output acquired by inputting the prediction image frame into the discriminator.
- the plurality of image frames may be configured to correspond to a normal situation, and wherein the method further includes inputting a plurality of image frames captured according to time into the generator to generate a prediction image frame, and based on an image frame captured after the plurality of captured image frames and the generated prediction image frame, identifying whether an abnormal situation occurs.
- the generator may be configured to include a first neural network including a first attention layer and a second neural network including a second attention layer, and wherein the generating the prediction image frame includes performing encoding with respect to the plurality of inputted image frames through the first neural network, and performing, through the second neural network, decoding with respect to data encoded through the first neural network, wherein the encoding includes performing max-pooling through the first attention layer, and wherein the decoding includes performing deconvolution through the second attention layer.
- a non-transitory computer-readable recording medium storing at least one instruction that, when executed by a processor of an electronic apparatus including a memory storing a generator previously trained to generate a prediction image based on one or more inputted images, causes the electronic apparatus to perform a plurality of operations comprising: inputting a plurality of image frames according to time; acquiring feature data from the plurality of image frames input through at least one layer included in the generator; extracting feature data corresponding to change over time from the feature data acquired through an attention layer included in the generator; and inputting the extracted feature data into at least one other layer included in the generator to acquire a prediction image frame.
- the electronic apparatus may generate a more accurate prediction image frame by using a generator including an attention layer.
- the electronic apparatus may generate an accurate prediction image frame, while reducing the amount of computation and data, since a motion pattern over time may be considered through the generator's own configuration.
- FIG. 1 is a block diagram illustrating a configuration of an electronic apparatus according to an embodiment
- FIG. 2 is a view functionally illustrating a configuration of a generator according to an embodiment
- FIG. 3 is a view illustrating a process of processing image data blocks acquired through each layer in a generator according to the disclosure
- FIG. 4 is a block diagram illustrating a configuration of an attention layer included in a generator according to an embodiment
- FIG. 5 is a block diagram illustrating a discriminator for determining whether an image frame generated by a generator is authentic or not according to an embodiment
- FIG. 6 is a table illustrating an effect of a generator of the disclosure in which an attention layer is self-contained.
- FIG. 7 is a block diagram illustrating a more detailed configuration of an electronic apparatus according to certain embodiments.
- FIG. 8 is a flowchart illustrating a method for controlling an electronic apparatus according to an embodiment.
- Certain embodiments of the disclosure provide an electronic apparatus for generating a prediction image frame using a generator model including an attention layer itself.
- certain embodiments provide an electronic apparatus using a generator model that enables a prediction image frame to be generated in an end-to-end manner as long as previous sequential image frames are input.
- Certain embodiments provide for an electronic apparatus that flexibly copes with characteristics of images that vary depending on situations (e.g., difference in target between a person-centered situation and a natural disaster-centered situation).
- ordinal numbers such as “first”, “second”, etc. are used in order to distinguish the same or similar elements from one another, and the use of the ordinal numbers should not be understood as limiting the meaning of the terms.
- the usage orders, arrangement orders, or the like of elements that are combined with these ordinal numbers are not limited by the numbers.
- the respective ordinal numbers are interchangeably used, if necessary.
- In the exemplary embodiments of the disclosure, the term “module,” “unit,” or “part” refers to an element that performs at least one function or operation, and may be implemented with hardware, software, or a combination of hardware and software.
- a plurality of “modules,” a plurality of “units,” a plurality of “parts” may be integrated into at least one module or chip except for a “module,” a “unit,” or a “part” which has to be implemented with specific hardware, and may be implemented with at least one processor (not shown).
- when any part is connected to another part, this includes both a direct connection and an indirect connection through another medium.
- FIG. 1 is a block diagram illustrating a configuration of an electronic apparatus according to an embodiment.
- the electronic apparatus 10 may include a memory 110 and a processor 120 .
- the term processor 120 shall be understood to include one or more processors.
- the electronic apparatus 10 may be implemented as a server device, a smartphone, a tablet, or various other PCs or terminal devices.
- the memory 110 stores an operating system (OS).
- the OS comprises a plurality of executable instructions which, when executed by the processor 120, control the overall operation of the components of the electronic apparatus 10; the memory 110 also serves as a component for storing various data related to the components of the electronic apparatus 10.
- At least one instruction related to one or more components of the electronic apparatus 10 may be stored in the memory 110 .
- the memory 110 may be implemented as a non-volatile memory (e.g., a hard disk, a solid state drive (SSD), a flash memory), a volatile memory, or the like.
- a generator 200 may be stored in the memory 110 .
- the generator 200 is a model for generating a prediction image based on one or more input images.
- the generator 200 may be previously trained based on a plurality of sequential images and a known image following the plurality of images. That is, the generator 200 may generate a prediction image following a plurality of sequential images, and be trained through comparison of the prediction image with the actual known image.
- the generator 200 may be composed of a plurality of neural network layers. Each layer has a plurality of weight values, and performs a layer operation using the operation result of the previous layer and the plurality of weight values.
- the generator 200 may be partially implemented as a generative model including a convolutional neural network (CNN), but is not limited thereto.
- the generator 200 may include an attention layer 215 .
- the attention layer 215 is a layer for extracting feature data related to motion over time from feature data output from some layers of the generator 200 .
- the processor 120 may control the overall operation of the electronic apparatus 10 .
- the processor 120 may be connected to the memory 110 to control the electronic apparatus 10 .
- the processor 120 may include a central processing unit (CPU), a graphics processing unit (GPU), a neural processing unit (NPU), or the like in hardware, and perform control-related operations and data processing in the electronic apparatus 10 .
- the processor 120 may be implemented as a micro processing unit (MPU), or may correspond to a computer in which random access memory (RAM) and read only memory (ROM) are connected to a CPU or the like through a system bus.
- the processor 120 may control not only hardware components included in the electronic apparatus 10 , but also one or more software modules included in the electronic apparatus 10 , and a result of controlling the software module by the processor 120 may be derived as an operation of hardware components.
- the processor 120 may be composed of one or a plurality of processors.
- the one or more processors may be a general-purpose processor such as a CPU or an AP, a graphics-dedicated processor such as a GPU or a VPU, or an artificial-intelligence-dedicated processor such as an NPU.
- One or the plurality of processors may control and process input data according to a predefined operation rule or an artificial intelligence model stored in the memory.
- a predefined operation rule or artificial intelligence model is characterized by being generated through learning (training).
- Being generated through learning means that a predefined operation rule or an artificial intelligence model of a desired characteristic is generated by applying a learning algorithm to a plurality of learning data. Such learning may be performed in a device on which artificial intelligence according to the disclosure is performed, or may be performed through a separate server/system.
- the learning algorithm is a method in which a predetermined target device (e.g., a robot) is trained using a plurality of learning data such that the target device can make a decision or a prediction by itself.
- learning algorithms include supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning, and the learning algorithm in this disclosure is not limited to the examples described above except for being specified.
- the processor 120 may generate a prediction image frame by inputting a plurality of image frames, input according to a time base, into the generator.
- the time base may correspond to the capture times of the frames, where the frames are video frames.
- the plurality of image frames inputted according to time may be a plurality of image frames sequentially input in the past according to a chronological order.
- the plurality of image frames may constitute a video as images sequentially captured in chronological order.
- the plurality of image frames may be sequential image frames constituting an animation or a virtual image.
- the prediction image frame means an image frame predicted to appear immediately after the plurality of image frames, i.e., one regular time increment later.
- the processor 120 may acquire feature data from a plurality of input image frames through at least one layer included in the generator 200 .
- the processor 120 may extract feature data corresponding to a change over time from the acquired feature data through the attention layer 215 included in the generator 200 . That is, as a result of inputting the previously acquired feature data to the attention layer 215 , feature data corresponding to a change over time among the acquired feature data may be extracted.
- each of the plurality of image frames is composed of a plurality of pixels.
- the acquired feature data may be feature data for each of a plurality of pixels.
- through the attention layer 215 , only feature data of some pixels among the feature data for each of the plurality of pixels may be extracted.
- Some pixels may be defined as pixels that are predicted to change over time, but are not limited thereto.
- features can be detected by performing edge detection and finding patterns of edges. A similar pattern of edges can then be searched for in subsequent frames, as sketched below.
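- As an illustrative sketch only (the disclosure does not specify an implementation), such edge-based matching between consecutive frames could be done with OpenCV as follows; the file names, Canny thresholds, and patch coordinates are placeholders:

```python
import cv2

# Hypothetical consecutive grayscale frames; file names are placeholders.
frame_t = cv2.imread("frame_t.png", cv2.IMREAD_GRAYSCALE)
frame_t1 = cv2.imread("frame_t1.png", cv2.IMREAD_GRAYSCALE)

# Edge detection (Canny thresholds are illustrative).
edges_t = cv2.Canny(frame_t, 100, 200)
edges_t1 = cv2.Canny(frame_t1, 100, 200)

# Take an edge patch around a feature in frame t and search for a
# similar edge pattern in the next frame.
patch = edges_t[60:92, 60:92]
result = cv2.matchTemplate(edges_t1, patch, cv2.TM_CCOEFF_NORMED)
_, score, _, location = cv2.minMaxLoc(result)
print("best match at", location, "with score", score)
```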
- the processor 120 may acquire the prediction image frame by inputting the extracted feature data to at least one other layer included in the generator 200 .
- the processor 120 may train the generator 200 based on a result of comparing a real image frame with the prediction image frame.
- the real image frame may be an image frame actually captured or input immediately after the plurality of image frames.
- the real image frame may constitute one video together with a plurality of image frames.
- the processor 120 may train the generator 200 to reduce a residual loss and/or a gradient loss between a real image frame and a prediction image frame. As the gradient loss is minimized, blurring in the generated prediction image frame may be reduced.
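- A minimal PyTorch sketch of these two losses, assuming frames are tensors scaled to [0, 1]; the exact formulation is not given by the disclosure, so this follows the common residual (L2) and image-gradient definitions:

```python
import torch
import torch.nn.functional as F

def residual_loss(pred, real):
    # L2 intensity difference between the prediction and the real frame.
    return F.mse_loss(pred, real)

def gradient_loss(pred, real):
    # Difference of horizontal/vertical image gradients; minimizing it
    # reduces blurring in the generated frame.
    def grads(x):
        return (x[..., :, 1:] - x[..., :, :-1]).abs(), \
               (x[..., 1:, :] - x[..., :-1, :]).abs()
    pdx, pdy = grads(pred)
    rdx, rdy = grads(real)
    return (pdx - rdx).abs().mean() + (pdy - rdy).abs().mean()

pred = torch.rand(1, 3, 256, 256)  # placeholder generator output
real = torch.rand(1, 3, 256, 256)  # placeholder ground-truth next frame
loss = residual_loss(pred, real) + gradient_loss(pred, real)
```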
- the attention layer 215 may be trained to extract feature data of pixels predicted to change over time from the feature data for each of the plurality of pixels outputted from at least one layer.
- the processor 120 may train the generator 200 based on a feedback of a discriminator, which will be described below with reference to FIG. 5 .
- FIG. 2 is a view functionally illustrating a configuration of a generator according to an embodiment.
- the generator 200 may include a first neural network (Encoder 210 ) for encoding a plurality of inputted image frames.
- the encoder 210 can include, for example, an encoder according to a Moving Picture Experts Group (MPEG) standard or Advanced Video Coding (AVC).
- the generator 200 may include a second neural network (Decoder 220 ) that is connected to the first neural network (Encoder 210 ) and decodes data encoded through the first neural network.
- the encoder 210 may include a first attention layer 215
- the decoder 220 may include a second attention layer 225 .
- a prediction image frame 21 immediately following the plurality of image frames 20 may be generated.
- the generator 200 can examine motion vectors of different blocks over a number of frames, and use curve fitting to predict motion vectors between the last received frame and the predicted frame. The predicted frame can then be generated by applying the predicted motion vectors to the last received frame.
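- As a hedged illustration of the curve-fitting idea (no formula is given in the disclosure), a polynomial could be fitted to each component of a block's motion-vector history and extrapolated one step ahead; the history values below are invented:

```python
import numpy as np

# Hypothetical motion vectors (dx, dy) of one block over the last 4 frames.
history = np.array([[1.0, 0.5], [1.2, 0.6], [1.5, 0.8], [1.9, 1.1]])
t = np.arange(len(history))

# Fit a quadratic per component and evaluate at the next time step.
predicted_mv = np.array([
    np.polyval(np.polyfit(t, history[:, k], deg=2), len(history))
    for k in range(2)
])
print("predicted motion vector:", predicted_mv)
# Applying predicted_mv to the block in the last received frame yields
# that block's position in the predicted frame.
```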
- FIG. 3 is a view illustrating a process of processing image data blocks acquired through each layer in the generator according to the disclosure.
- FIG. 3 illustrates intermediate results acquired through a plurality of layers in the generator 200 as image data blocks.
- FIG. 3 assumes that an image data block 31 with respect to image frames having a resolution of 256×256 is input to a first layer in the generator 200 .
- although 256×256 is used, it shall be noted that other block sizes can be used, such as, but not limited to, 1920×1080.
- the ‘t’ refers to the number of image frames.
- a number of convolutions and max-pooling may be performed, and a resulting image data block 32 may be input to the attention layer 215 .
- Convolution refers to acquiring a new output by applying filter values or weight values to input data. Convolution may be performed by one or more layers.
- Pooling refers to an operation of sampling or resizing input data and may be performed through one or more layers.
- Max-pooling refers to an operation of extracting a maximum value from the input data.
- the image data block 32 may include feature data for a resolution of 128×128, which is smaller than the existing number of pixels.
- the attention layer 215 may output feature data of pixels whose motion over time is relatively large among a plurality of pixels constituting the image data block 32 .
- an image data block 33 including feature data with respect to a resolution of 64×64 may be acquired.
- convolution, max-pooling, and deconvolution may be additionally performed on the image data block 33 thereafter. As a result, an image data block 34 can be acquired.
- Deconvolution refers to an operation necessary to return a size of data changed or reduced by convolution or pooling to a size at the time of input. Deconvolution may correspond to upsampling, but is not limited thereto.
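- A small PyTorch sketch of how max-pooling halves the spatial size and a transposed convolution (‘deconvolution’) restores it; the channel count is an arbitrary example:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 128, 128)                      # example feature map
pooled = nn.MaxPool2d(kernel_size=2)(x)               # -> (1, 64, 64, 64)
deconv = nn.ConvTranspose2d(64, 64, kernel_size=2, stride=2)
restored = deconv(pooled)                             # -> (1, 64, 128, 128)
print(pooled.shape, restored.shape)
```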
- the second attention layer 225 may acquire an image data block 35 including image data corresponding to a change in motion over time from the image data block 34 .
- the image data block 35 may correspond to 128 image frames having a resolution of 128×128.
- Concatenation may be performed to connect the image data block 32 to the image data block 35 described above.
- in the case of FIG. 3 , concatenation connects the image data block 32 and the image data block 35 , each corresponding to 128 image frames, to each other such that a larger image data block (corresponding to 256 image frames) is formed.
- a deblocking filter can smooth the boundaries between the blocks.
- one prediction image frame 36 may be finally output.
- the image data block 32 corresponding to the encoder block output from at least some layers of the encoder 210 may be input to at least some layers of the decoder 220 along with the image data block 35 corresponding to the decoder block.
- At least one output excluding the output layer among the plurality of layers of the encoder 210 may be input to at least one of the layers among the plurality of layers of the decoder 220 .
- the generator 200 may be configured as a U-net.
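- A minimal U-net-style generator sketch in PyTorch, assuming stacked past RGB frames as input; the depth, layer sizes, and omission of the attention layers are simplifications and do not reproduce the exact architecture of FIG. 3:

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU())

class TinyUNetGenerator(nn.Module):
    def __init__(self, in_ch, out_ch=3):
        super().__init__()
        self.enc1 = conv_block(in_ch, 64)
        self.pool = nn.MaxPool2d(2)
        self.enc2 = conv_block(64, 128)
        self.up = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec = conv_block(128, 64)        # 128 = 64 (skip) + 64 (up)
        self.out = nn.Conv2d(64, out_ch, 1)

    def forward(self, frames):                # frames: (B, t*3, H, W)
        e1 = self.enc1(frames)                # encoder output kept for skip
        e2 = self.enc2(self.pool(e1))
        d = self.up(e2)
        d = self.dec(torch.cat([e1, d], 1))   # concatenation (skip connection)
        return self.out(d)                    # predicted next frame

g = TinyUNetGenerator(in_ch=4 * 3)            # e.g., four past RGB frames
prediction = g(torch.randn(1, 12, 256, 256))  # -> (1, 3, 256, 256)
```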
- the number or order of convolution/deconvolution/max-pooling/concatenation, etc. may be modified by those skilled in the art, and the resolution and the number of image frames of each of the image data blocks input to or output from each of the layers in the generator 200 may also be variously defined.
- FIG. 4 is a block diagram illustrating a configuration of an attention layer included in a generator according to an embodiment.
- a feature map 410 may be input to the attention layer 215 .
- the feature map 410 may include feature data for each pixel acquired as a result of a plurality of image frames input to the generator 200 going through at least one layer in the generator 200 .
- the feature map 410 may include information on time and/or space within a plurality of input image frames.
- the feature map 410 may be included in the image data block 32 of FIG. 3 .
- the feature map 410 may be input to different convolution layers 215 - 1 and 215 - 2 in the attention layer 215 , respectively.
- matrix multiplication is performed on the feature maps 411 and 412 output from different convolution layers 215 - 1 and 215 - 2 , and as a result of applying softmax, an attention map 415 may be output.
- Each element of the attention map 415 represents an attention probability for a specific spatial point and/or a temporal point in the input feature map 410 .
- a feature map 413 may be acquired as a result of inputting the feature map 410 to the convolution layer 215 - 3 .
- matrix multiplication may be performed on the feature map 413 and the attention map 415 , and the result may be multiplied by γ.
- the output data 420 of the attention layer 215 may be generated.
- γ is a trainable scale parameter for allocating non-local evidence.
- the attention layer 215 configured as shown in FIG. 4 may be trained on spatial/temporal correlation of a plurality of input image frames. Based on the spatial/temporal correlation output through the trained attention layer 215 , the generator 200 of the disclosure may improve performance in generating a prediction image frame following the plurality of image frames.
- the generator 200 may include two or more attention layers, and the second attention layer 225 as well as the first attention layer 215 may have a configuration similar to that shown in FIG. 4 .
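- A sketch of the FIG. 4 attention computation in PyTorch, following the common self-attention formulation (three 1×1 convolutions, matrix multiplication, softmax, and the trainable scale γ); the channel-reduction factor of 8 and the residual addition are assumptions not taken from the disclosure:

```python
import torch
import torch.nn as nn

class AttentionLayer(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Conv2d(channels, channels // 8, 1)  # -> feature map 411
        self.g = nn.Conv2d(channels, channels // 8, 1)  # -> feature map 412
        self.h = nn.Conv2d(channels, channels, 1)       # -> feature map 413
        self.gamma = nn.Parameter(torch.zeros(1))       # trainable scale γ

    def forward(self, x):                     # x: feature map 410, (B, C, H, W)
        b, c, hgt, wid = x.shape
        q = self.f(x).flatten(2)              # (B, C/8, N), N = H*W
        k = self.g(x).flatten(2)
        v = self.h(x).flatten(2)              # (B, C, N)
        attn = torch.softmax(q.transpose(1, 2) @ k, dim=-1)  # attention map 415
        out = v @ attn.transpose(1, 2)        # attention-weighted features
        out = out.view(b, c, hgt, wid)
        return self.gamma * out + x           # residual add is an assumption

y = AttentionLayer(64)(torch.randn(1, 64, 128, 128))
```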
- the memory 110 may further include a discriminator trained to identify whether the input image frame is a real image frame or a fake image frame.
- the discriminator may also be implemented as a neural network model.
- the processor 120 may train the generator 200 based on the output acquired by inputting the prediction image frame to the discriminator.
- FIG. 5 is a block diagram illustrating a discriminator for identifying whether an image frame generated by a generator is authentic or not according to an embodiment of the disclosure.
- the discriminator 300 may operate as a classifier capable of discriminating whether the inputted image frame is a real image frame or a fake image frame.
- the fake image frame refers to an image frame generated virtually through the generator 200 or the like, and the real image frame refers to an image frame from a real video that was not generated virtually.
- the processor 120 may train the discriminator 300 based on training data composed of a plurality of image frames and an image frame following the plurality of image frames.
- the processor 120 may generate a prediction image frame 51 ′ following the plurality of image frames 50 by inputting the plurality of image frames 50 to the generator 200 .
- the processor 120 may input the prediction image frame 51 ′ to the discriminator 300 .
- the discriminator 300 may identify whether the prediction image frame 51 ′ is a real image frame or a fake image frame.
- the processor 120 may input a real image frame 51 following the plurality of image frames 50 to the discriminator 300 .
- the processor 120 may train or update the discriminator 300 such that the discriminator 300 identifies the real image frame 51 as a ‘real image frame’ and the prediction image frame 51 ′ as a ‘fake image frame’.
- the processor 120 may train or update the generator 200 such that a probability that the discriminator 300 identifies the prediction image frame generated through the generator 200 as the ‘real image frame’ increases.
- the generator 200 of FIG. 5 constitutes a GAN together with the discriminator 300 , the two being trained adversarially against each other, and as a result its performance may be improved.
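- A hedged sketch of one adversarial training step, assuming the discriminator outputs a probability in (0, 1); the combination of the adversarial and residual terms and the optimizers are illustrative:

```python
import torch
import torch.nn.functional as F

def gan_step(generator, discriminator, g_opt, d_opt, frames, real_next):
    # 1) Discriminator step: real frame -> 'real' (1), prediction -> 'fake' (0).
    fake_next = generator(frames).detach()
    d_real = discriminator(real_next)
    d_fake = discriminator(fake_next)
    d_loss = (F.binary_cross_entropy(d_real, torch.ones_like(d_real)) +
              F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # 2) Generator step: raise the probability that the discriminator scores
    #    the prediction as 'real', plus a residual term toward the real frame.
    fake_next = generator(frames)
    d_out = discriminator(fake_next)
    g_loss = (F.binary_cross_entropy(d_out, torch.ones_like(d_out)) +
              F.mse_loss(fake_next, real_next))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```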
- the processor 120 may detect an abnormal situation by using the generator 200 trained as in the embodiment of FIG. 5 .
- the abnormal situation refers to a situation, contrary to a normal situation, that requires quick discovery and response for personal or public safety, such as terrorism, natural disasters, accidents, medical emergencies, or damage to or failure of equipment/facilities, but is not limited thereto.
- the processor 120 may train the generator 200 and the discriminator 300 based on a plurality of image frames corresponding to a normal situation. Specifically, the processor 120 may train the generator 200 and the discriminator 300 using a plurality of image frames constituting one or more images identified (by humans) not to include the abnormal situation.
- the processor 120 may generate a prediction image frame by inputting a plurality of image frames captured according to time into the generator 200 , and as a result, may identify whether the abnormal situation has occurred based on the image frame captured after the plurality of captured image frames and the generated prediction image frame.
- for example, when the difference between the captured image frame and the generated prediction image frame exceeds a predetermined threshold, the processor 120 may identify that the abnormal situation has occurred, but is not limited thereto.
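- One plausible scoring scheme (the disclosure leaves the exact comparison open) is the PSNR between the predicted frame and the actually captured frame, flagging an abnormal situation when it drops below a threshold; the threshold here is hypothetical:

```python
import torch

def psnr(pred, real, max_val=1.0):
    # Peak signal-to-noise ratio between two frames scaled to [0, max_val].
    mse = torch.mean((pred - real) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)

pred_frame = torch.rand(3, 256, 256)   # generator prediction (placeholder)
captured = torch.rand(3, 256, 256)     # frame actually captured (placeholder)
abnormal = psnr(pred_frame, captured) < 28.0   # hypothetical threshold
```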
- FIG. 6 is a table illustrating an effect of the generator of the disclosure in which an attention layer is self-contained.
- FIG. 6 illustrates, for each of the CUHK dataset [19] and the UCSD dataset [20], the area under curve (AUC) of conventional deep learning models and of the generator 200 according to the disclosure.
- AUC is a value representing the area under a receiver operating characteristic (ROC) curve.
- the ROC curve is a curve plotting the ratio of ‘true positives’ (e.g., a real image frame identified as a real image frame) against the ratio of ‘false positives’ (e.g., a virtual image frame identified as a real image frame).
- the AUC of the generator 200 according to the disclosure is generally higher than that of conventional deep learning models.
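- For reference, a frame-level AUC of this kind can be computed with scikit-learn from per-frame regularity scores; the labels and scores below are invented for illustration:

```python
from sklearn.metrics import roc_auc_score

# 1 = normal frame, 0 = abnormal frame; scores: higher = more normal
# (e.g., normalized PSNR between the prediction and the real frame).
labels = [1, 1, 0, 1, 0, 0, 1]
scores = [0.91, 0.88, 0.35, 0.77, 0.42, 0.30, 0.83]
print("AUC:", roc_auc_score(labels, scores))
```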
- FIG. 7 is a block diagram illustrating a more detailed configuration of an electronic apparatus according to certain embodiments of the disclosure.
- the electronic apparatus 10 may further include a camera 130 , a communicator 140 , or the like.
- the memory 110 may further include the discriminator 300 of FIG. 5 in addition to the generator 200 .
- the camera 130 is a component for capturing at least one image.
- the processor 120 may generate a prediction image frame by inputting a plurality of image frames sequentially captured through the camera 130 into the generator 200 .
- the camera 130 may be implemented as an RGB camera, a 3D camera, or the like.
- the processor 120 may input, to the generator 200 , a plurality of image frames received from an external electronic apparatus through the communicator 140 .
- a plurality of image frames constituting the received video may be input to the generator 200 .
- the processor 120 may compare the prediction image frame output through the generator 200 with a real image frame to identify whether an abnormal situation has occurred.
- the processor 120 may notify the external server that the abnormal situation has occurred through the communicator 140 .
- the communicator 140 may be directly/indirectly connected to an external electronic apparatus through wired communication and/or wireless communication.
- the communicator 140 may be directly/indirectly connected to an external electronic apparatus based on a network implemented through wired communication and/or wireless communication.
- the wireless communication may include at least one of long-term evolution (LTE), LTE Advanced (LTE-A), 5th generation (5G) mobile communication, code division multiple access (CDMA), wideband CDMA (WCDMA), universal mobile telecommunications system (UMTS), wireless broadband (WiBro), global system for mobile communications (GSM), time division multiple access (TDMA), Wi-Fi, Wi-Fi Direct, Bluetooth, near field communication (NFC), Zigbee, or the like.
- wired communication may include at least one of communication methods such as Ethernet, optical network, universal serial bus (USB), Thunderbolt, or the like.
- the network may be a personal area network (PAN), a local area network (LAN), a wide area network (WAN), etc., depending on areas or sizes, and may be Intranet, Extranet or the Internet depending on openness of the network.
- the communicator 140 may include a network interface or a network chip according to the wired/wireless communication method described above. Meanwhile, the communication method is not limited to the example described above, and may include a communication method newly emerging according to technology development.
- FIG. 8 is a flowchart illustrating a method of controlling an electronic apparatus according to an embodiment of the disclosure.
- the electronic apparatus may include a memory in which a generator trained to generate a prediction image based on one or more input images is stored.
- a plurality of image frames input according to time may be input to the generator (S 810 ).
- the plurality of image frames may be image frames sequentially input over time, and may be included in one image.
- a prediction image frame following the plurality of image frames may be generated using the generator (S 820 ).
- feature data may be acquired from a plurality of image frames input through at least one layer included in the generator.
- feature data corresponding to a change over time may be extracted from the acquired feature data, and the extracted feature data may be input to at least one other layer included in the generator to acquire a prediction image frame.
- the generator may include a first neural network including a first attention layer and a second neural network including a second attention layer.
- encoding of a plurality of image frames input through the first neural network may be performed, and decoding of data encoded through the first neural network may be performed through the second neural network.
- max-pooling may be performed through the first attention layer
- deconvolution may be performed through the second attention layer.
- the control method may train the generator including the attention layer based on a result of comparing an image frame input after the plurality of image frames with the prediction image frame.
- the attention layer may be trained to extract feature data of pixels predicted to change over time from feature data for each of the plurality of pixels output from at least one layer.
- the generator may be trained based on the output acquired by inputting the prediction image frame to the discriminator.
- control method may generate a prediction image frame by inputting a plurality of image frames captured according to time into the generator, and identify whether an abnormal situation has occurred based on the image frame captured after the plurality of captured image frames and the generated prediction image frame.
- the control method of the disclosure may be performed through the electronic apparatus 10 illustrated and described with reference to FIGS. 1 and 7 .
- the control method of the disclosure described above may be performed through a system including the electronic apparatus 10 and one or more external electronic apparatus.
- the exemplary embodiments described above may be embodied in a recording medium that may be read by a computer or an apparatus similar to the computer, by using software, hardware, or a combination thereof.
- exemplary embodiments that are described in the disclosure may be embodied by using at least one selected from Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), processors, controllers, micro-controllers, microprocessors, electrical units for performing other functions.
- the embodiments described herein may be implemented by the processor 120 itself.
- certain embodiments described in the specification such as a procedure and a function may be embodied as separate software modules.
- the software modules may respectively perform one or more functions and operations described in the present specification.
- computer instructions for performing a processing operation in the electronic apparatus 10 may be stored in a non-transitory computer-readable medium.
- when the computer instructions stored in the non-transitory computer-readable medium are executed by a processor of a specific device, the specific device performs the processing operations in the electronic apparatus 10 according to certain embodiments described above.
- the non-transitory computer readable recording medium refers to a medium that stores data and that can be read by devices.
- the non-transitory computer-readable medium may be CD, DVD, a hard disc, Blu-ray disc, USB, a memory card, ROM, or the like.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Biomedical Technology (AREA)
- Multimedia (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Signal Processing (AREA)
- Probability & Statistics with Applications (AREA)
- Image Analysis (AREA)
Abstract
Description
-
- [1] Vincent, P, Larochelle, H, Lajoie, I, Bengio, Y, & Manzagol, P. A (2010) Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion Journal of machine learning research, 11 (December), 3371-3408
- [2] O Ronneberger, P Fischer, and T Brox U-net: Convolutional networks for biomedical image segmentation In International Conference on Medical image computing and computer-assisted intervention, pages 234-241 Springer, 2015
- [3] Goodfellow, I, Pouget-Abadie, J, Mirza, M, Xu, B, Warde-Farley, D, Ozair, S, & Bengio, Y (2014) Generative adversarial nets In Advances in neural information processing systems (pp 2672-2680)
- [4] M Ravanbakhsh, M Nabi, E Sangineto, L Marcenaro, C Regazzoni, and N Sebe Abnormal event detection in videos using generative adversarial nets In 2017 IEEE International Conference on Image Processing (ICIP), pages 1577-1581 IEEE, 2017.
- [5] Bergmann, P, Lowe, S, Fauser, M, Sattlegger, D, & Steger, C (2018) Improving Unsupervised Defect Segmentation by Applying Structural Similarity to Autoencoders arXiv preprint arXiv:1807.02011
- [6] T Schlegl, P Seeböck, S M Waldstein, U Schmidt-Erfurth, and G Langs Unsupervised anomaly detection with generative adversarial networks to guide marker discovery In International Conference on Information Processing in Medical Imaging, pages 146-157 Springer, 2017
- [7] B Chen, W Wang, and J Wang Video imagination from a single image with transformation generation In Proceedings of the on Thematic Workshops of ACM Multimedia 2017, pages 358-366 ACM, 2017.
- [8] T-W Hui, X Tang, and C Change Loy LiteFlowNet: A lightweight convolutional neural network for optical flow estimation In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8981-8989, 2018
- [9] E Ilg, N Mayer, T Saikia, M Keuper, A Dosovitskiy, and T Brox FlowNet 2.0: Evolution of optical flow estimation with deep networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2462-2470, 2017.
- [10] A Dosovitskiy, P Fischer, E Ilg, P Hausser, C Hazirbas, V Golkov, P Van Der Smagt, D Cremers, and T Brox Flownet: Learning optical flow with convolutional networks In Proceedings of the IEEE international conference on computer vision, pages 2758-2766, 2015
- [11] K Simonyan and A Zisserman Two-stream convolutional networks for action recognition in videos In Advances in neural information processing systems, pages 568-576, 2014
- [12] W Liu, W Luo, D Lian, and S Gao Future frame prediction for anomaly detection-a new baseline. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6536-6545, 2018
- [13] H Zhang, I Goodfellow, D Metaxas, and A Odena Selfattention generative adversarial networks arXiv preprint arXiv:1805.08318, 2018
- [14] X Wang, R Girshick, A Gupta, and K He Non-local neural networks In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7794-7803, 2018
- [15] Zeng, K, Yu, J, Wang, R, Li, C, & Tao, D (2015) Coupled deep autoencoder for single image super-resolution IEEE transactions on cybernetics, 47(1), 27-37
- [16] H Cai, C Bai, Y-W Tai, and C-K Tang. Deep video generation, prediction and completion of human action sequences. In Proceedings of the European Conference on Computer Vision (ECCV), pages 366-382, 2018.
- [17] Y S Chong and Y H Tay. Abnormal event detection in videos using spatiotemporal autoencoder. In International Symposium on Neural Networks, pages 189-196 Springer, 2017
- [18] W Luo, W Liu, and S Gao. Remembering history with convolutional lstm for anomaly detection. In 2017 IEEE International Conference on Multimedia and Expo (ICME), pages 439-444 IEEE, 2017
- [19] C Lu, J Shi, and J Jia Abnormal event detection at 150 fps in matlab In Proceedings of the IEEE international conference on computer vision, pages 2720-2727, 2013.
- [20] V Mahadevan, W Li, V Bhalodia, and N Vasconcelos Anomaly detection in crowded scenes In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 1975-1981 IEEE, 2010
- [21] M Mathieu, C Couprie, and Y LeCun Deep multi-scale video prediction beyond mean square error arXiv preprint arXiv:1511.05440, 2015.
- [22] A Hore and D Ziou Image quality metrics: Psnr vs ssim. In 2010 20th International Conference on Pattern Recognition, pages 2366-2369 IEEE, 2010
- [23] J Van Amersfoort, A Kalman, M Ranzato, A Szlam, D Tran, and S Chintala Transformation-based models of video sequences. arXiv preprint arXiv:1701.08435, 2017.
- [24] C Vondrick, H Pirsiavash, and A Torralba. Generating videos with scene dynamics. In Advances In Neural Information Processing Systems, pages 613-621, 2016
- [25] C Vondrick and A Torralba. Generating the future with adversarial transformers. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1020-1028, 2017
- [26] T Xue, J Wu, K Bouman, and B Freeman Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks. In Advances in Neural Information Processing Systems, pages 91-99, 2016
- [27] Y Yoo, S Yun, H Jin Chang, Y Demiris, and J Young Choi Variational autoencoded regression: high dimensional regression of visual data on complex manifold. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3674-3683, 2017
- [28] M Hasan, J Choi, J Neumann, A K Roy-Chowdhury, and L. S Davis Learning temporal regularity in video sequences. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 733-742, 2016
- [29] S Smeureanu, R T Ionescu, M Popescu, and B Alexe Deep appearance features for abnormal behavior detection in video. In International Conference on Image Analysis and Processing, pages 779-789 Springer, 2017
- [30] R Hinami, T Mei, and S Satoh Joint detection and recounting of abnormal events by learning deep generic knowledge. In Proceedings of the IEEE International Conference on Computer Vision, pages 3619-3627, 2017
- [31] R Tudor Ionescu, S Smeureanu, B Alexe, and M Popescu. Unmasking the abnormal events in video. In Proceedings of the IEEE International Conference on Computer Vision, pages 2895-2903, 2017
- [32] W Luo, W Liu, and S Gao A revisit of sparse coding based anomaly detection in stacked rnn framework In Proceedings of the IEEE International Conference on Computer Vision, pages 341-349, 2017
- [33] Christiansen, P, Nielsen, L, Steen, K, Jorgensen, R, & Karstoft, H (2016) DeepAnomaly: Combining background subtraction and deep learning for detecting obstacles and anomalies in an agricultural field Sensors, 16(11), 1904
- [34] Basharat, A, Gritai, A, & Shah, M (2008, June). Learning object motion patterns for anomaly detection and improved object detection. In 2008 IEEE Conference on Computer Vision and Pattern Recognition (pp 1-8) IEEE.
Claims (9)
Applications Claiming Priority (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| KR10-2019-0058189 | 2019-05-17 | ||
| KR20190058189 | 2019-05-17 | ||
| KR10-2020-0020271 | 2020-02-19 | ||
| KR1020200020271A KR20200132665A (en) | 2019-05-17 | 2020-02-19 | Attention layer included generator based prediction image generating apparatus and controlling method thereof |
| PCT/KR2020/006356 WO2020235861A1 (en) | 2019-05-17 | 2020-05-14 | Device for generating prediction image on basis of generator including concentration layer, and control method therefor |
Related Parent Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/KR2020/006356 Continuation WO2020235861A1 (en) | 2019-05-17 | 2020-05-14 | Device for generating prediction image on basis of generator including concentration layer, and control method therefor |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20210326650A1 US20210326650A1 (en) | 2021-10-21 |
| US12112514B2 true US12112514B2 (en) | 2024-10-08 |
Family
ID=73645649
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/361,556 Active 2041-08-28 US12112514B2 (en) | 2019-05-17 | 2021-06-29 | Device for generating prediction image on basis of generator including concentration layer, and control method therefor |
Country Status (2)
| Country | Link |
|---|---|
| US (1) | US12112514B2 (en) |
| KR (1) | KR20200132665A (en) |
Families Citing this family (20)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109615019B (en) * | 2018-12-25 | 2022-05-31 | 吉林大学 | Abnormal behavior detection method based on space-time automatic encoder |
| CN111401138B (en) * | 2020-02-24 | 2023-11-07 | 上海理工大学 | Countermeasure optimization method for generating countermeasure neural network training process |
| US20230085827A1 (en) * | 2020-03-20 | 2023-03-23 | The Regents Of The University Of California | Single-shot autofocusing of microscopy images using deep learning |
| US11810312B2 (en) | 2020-04-21 | 2023-11-07 | Daegu Gyeongbuk Institute Of Science And Technology | Multiple instance learning method |
| CN113052365B (en) * | 2021-02-26 | 2022-07-01 | 浙江工业大学 | A Life Prediction Method for Rotating Machinery Based on MSWR-LRCN |
| KR102266165B1 (en) * | 2021-03-26 | 2021-06-17 | 인하대학교 산학협력단 | Method and Apparatus for Editing of Personalized Face Age via Self-Guidance in Generative Adversarial Networks |
| US12087096B2 (en) | 2021-03-31 | 2024-09-10 | Samsung Electronics Co., Ltd. | Method and apparatus with biometric spoofing consideration |
| CN113052257B (en) * | 2021-04-13 | 2024-04-16 | 中国电子科技集团公司信息科学研究院 | Deep reinforcement learning method and device based on visual transducer |
| WO2022250253A1 (en) * | 2021-05-25 | 2022-12-01 | Samsung Electro-Mechanics Co., Ltd. | Apparatus and method with manufacturing anomaly detection |
| KR102609153B1 (en) * | 2021-05-25 | 2023-12-05 | 삼성전기주식회사 | Apparatus and method for detecting anomalies in manufacturing images based on deep learning |
| CN113341419B (en) * | 2021-05-25 | 2022-11-01 | 成都信息工程大学 | Weather extrapolation method and system based on VAN-ConvLSTM |
| CN113283393B (en) * | 2021-06-28 | 2023-07-25 | 南京信息工程大学 | Deepfake video detection method based on image group and two-stream network |
| CN113362230B (en) * | 2021-07-12 | 2024-04-05 | 昆明理工大学 | Method for realizing super-resolution of countercurrent model image based on wavelet transformation |
| CN114494780B (en) * | 2022-01-26 | 2025-07-11 | 上海交通大学 | Semi-supervised industrial defect detection method and system based on feature comparison |
| CN114511929B (en) * | 2022-02-17 | 2025-08-15 | 阳光保险集团股份有限公司 | Abnormal behavior detection method and device, electronic equipment and storage medium |
| US20230325631A1 (en) * | 2022-04-12 | 2023-10-12 | Optum, Inc. | Combined deep learning inference and compression using sensed data |
| CN115456922B (en) * | 2022-09-01 | 2025-07-15 | 东南大学 | A heterogeneous image fusion method in traffic scenes |
| CN116109902A (en) * | 2023-02-14 | 2023-05-12 | 中国科学院空天信息创新研究院 | Fuzzy image target detection model training method, fuzzy image target detection model training device and fuzzy image target detection method |
| CN116787432B (en) * | 2023-06-14 | 2025-09-02 | 安徽工程大学 | A robot vision-guided grasping method |
| CN117612168B (en) * | 2023-11-29 | 2024-11-05 | 湖南工商大学 | Recognition method and device based on feature pyramid and attention mechanism |
- 2020-02-19: Korean application KR1020200020271A published as KR20200132665A (status: withdrawn)
- 2021-06-29: US application US17/361,556 granted as US12112514B2 (status: active)
Patent Citations (24)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| GB2443739B (en) | 2006-11-13 | 2009-02-25 | Robert Bosch GmbH | Method for detecting image regions of salient motion, apparatus and computer program for executing the method |
| JP2009060481A (en) | 2007-09-03 | 2009-03-19 | Sony Corp | Image processing apparatus and method, learning apparatus and method, and program |
| KR101180887B1 (en) | 2010-09-08 | 2012-09-07 | Chung-Ang University Industry-Academic Cooperation Foundation | Apparatus and method for detecting abnormal behavior |
| US9129158B1 (en) * | 2012-03-05 | 2015-09-08 | HRL Laboratories, LLC | Method and system for embedding visual intelligence |
| KR20140076815A (en) | 2012-12-13 | 2014-06-23 | Electronics and Telecommunications Research Institute | Apparatus and method for detecting an abnormal motion based on pixel of images |
| KR20160093253A (en) | 2015-01-29 | 2016-08-08 | Kudo Communication Co., Ltd. | Video based abnormal flow detection method and system |
| KR101752387B1 (en) | 2015-09-11 | 2017-07-11 | D2Emotion Co., Ltd. | A mobile device for detecting abnormal activity and system including the same |
| US20180368721A1 (en) | 2015-12-21 | 2018-12-27 | Samsung Electronics Co., Ltd. | Medical imaging device and magnetic resonance imaging device, and control method therefor |
| KR20180065498A (en) | 2016-12-08 | 2018-06-18 | Korea Aerospace University Industry-Academic Cooperation Foundation | Method for deep learning and method for generating next prediction image using the same |
| KR20180126353A (en) | 2017-05-17 | 2018-11-27 | Samsung Electronics Co., Ltd. | Sensor transformation attention network (STAN) model |
| US20180336466A1 (en) | 2017-05-17 | 2018-11-22 | Samsung Electronics Co., Ltd. | Sensor transformation attention network (STAN) model |
| US20200388021A1 (en) * | 2017-10-13 | 2020-12-10 | Sualab Co., Ltd. | Deep learning based image comparison device, method and computer program stored in computer readable medium |
| KR101916347B1 (en) | 2017-10-13 | 2018-11-08 | Sualab Co., Ltd. | Deep learning based image comparison device, method and computer program stored in computer readable medium |
| US10937141B2 (en) | 2017-10-13 | 2021-03-02 | Sualab Co., Ltd. | Deep learning based image comparison device, method and computer program stored in computer readable medium |
| KR20190049401A (en) | 2017-10-31 | 2019-05-09 | Baidu USA LLC | Identity authentication method, terminal equipment and computer readable storage medium |
| US10635893B2 (en) | 2017-10-31 | 2020-04-28 | Baidu Usa Llc | Identity authentication method, terminal device, and computer-readable storage medium |
| US20190258925A1 (en) * | 2018-02-20 | 2019-08-22 | Adobe Inc. | Performing attribute-aware based tasks via an attention-controlled neural network |
| US20190333198A1 (en) * | 2018-04-25 | 2019-10-31 | Adobe Inc. | Training and utilizing an image exposure transformation neural network to generate a long-exposure image from a single short-exposure image |
| KR101975186B1 (en) | 2018-07-04 | 2019-05-07 | Kwangwoon University Industry-Academic Collaboration Foundation | Apparatus and method of data generation for object detection based on generative adversarial networks |
| US20200012896A1 (en) * | 2018-07-04 | 2020-01-09 | Kwangwoon University Industry-Academic Collaboration Foundation | Apparatus and method of data generation for object detection based on generative adversarial networks |
| US20200379640A1 (en) * | 2019-05-29 | 2020-12-03 | Apple Inc. | User-realistic path synthesis via multi-task generative adversarial networks for continuous path keyboard input |
| US20220051061A1 (en) * | 2019-10-30 | 2022-02-17 | Tencent Technology (Shenzhen) Company Limited | Artificial intelligence-based action recognition method and related apparatus |
| US20220222409A1 (en) * | 2021-01-12 | 2022-07-14 | Wuhan University | Method and system for predicting remaining useful life of analog circuit |
| US20240095989A1 (en) * | 2022-09-15 | 2024-03-21 | Nvidia Corporation | Video generation techniques |
Non-Patent Citations (36)
| Title |
|---|
| A Dosovitskiy, P Fischer, E Ilg, P Hausser, C Hazirbas, V Golkov, P Van Der Smagt, D Cremers, and T Brox. FlowNet: Learning optical flow with convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2758-2766, 2015. |
| A Hore and D Ziou. Image quality metrics: PSNR vs. SSIM. In 2010 20th International Conference on Pattern Recognition, pp. 2366-2369. IEEE, 2010. |
| An, J., & Cho, S. (2015). Variational autoencoder based anomaly detection using reconstruction probability. Special Lecture on IE, 2, 1-18. |
| B Chen, W Wang, and J Wang. Video imagination from a single image with transformation generation. In Proceedings of the Thematic Workshops of ACM Multimedia 2017, pp. 358-366. ACM, 2017. |
| Basharat, A., Gritai, A., & Shah, M. (Jun. 2008). Learning object motion patterns for anomaly detection and improved object detection. In 2008 IEEE Conference on Computer Vision and Pattern Recognition (pp. 1-8). IEEE. |
| Bergmann, P., Lowe, S., Fauser, M., Sattlegger, D., & Steger, C. (2018). Improving unsupervised defect segmentation by applying structural similarity to autoencoders. arXiv preprint arXiv:1807.02011. |
| C Lu, J Shi, and J Jia. Abnormal event detection at 150 FPS in MATLAB. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2720-2727, 2013. |
| C Vondrick and A Torralba. Generating the future with adversarial transformers. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1020-1028, 2017. |
| C Vondrick, H Pirsiavash, and A Torralba. Generating videos with scene dynamics. In Advances in Neural Information Processing Systems, pp. 613-621, 2016. |
| Christiansen, P., Nielsen, L., Steen, K., Jørgensen, R., & Karstoft, H. (2016). DeepAnomaly: Combining background subtraction and deep learning for detecting obstacles and anomalies in an agricultural field. Sensors, 16(11), 1904. |
| E Ilg, N Mayer, T Saikia, M Keuper, A Dosovitskiy, and T Brox. FlowNet 2.0: Evolution of optical flow estimation with deep networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2462-2470, 2017. |
| Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. In Advances in Neural Information Processing Systems (pp. 2672-2680). |
| H Cai, C Bai, Y-W Tai, and C-K Tang. Deep video generation, prediction and completion of human action sequences. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 366-382, 2018. |
| H Zhang, I Goodfellow, D Metaxas, and A Odena. Self-attention generative adversarial networks. arXiv preprint arXiv:1805.08318, 2018. |
| J Van Amersfoort, A Kannan, M Ranzato, A Szlam, D Tran, and S Chintala. Transformation-based models of video sequences. arXiv preprint arXiv:1701.08435, 2017. |
| K Simonyan and A Zisserman. Two-stream convolutional networks for action recognition in videos. In Advances in Neural Information Processing Systems, pp. 568-576, 2014. |
| Li, W., Mahadevan, V., & Vasconcelos, N. (2014). Anomaly detection and localization in crowded scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(1), 18-32. |
| M Hasan, J Choi, J Neumann, A K Roy-Chowdhury, and L S Davis. Learning temporal regularity in video sequences. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 733-742, 2016. |
| M Mathieu, C Couprie, and Y LeCun. Deep multi-scale video prediction beyond mean square error. arXiv preprint arXiv:1511.05440, 2015. |
| M Ravanbakhsh, M Nabi, E Sangineto, L Marcenaro, C Regazzoni, and N Sebe. Abnormal event detection in videos using generative adversarial nets. In 2017 IEEE International Conference on Image Processing (ICIP), pp. 1577-1581. IEEE, 2017. |
| R Hinami, T Mei, and S Satoh. Joint detection and recounting of abnormal events by learning deep generic knowledge. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3619-3627, 2017. |
| R Tudor Ionescu, S Smeureanu, B Alexe, and M Popescu. Unmasking the abnormal events in video. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2895-2903, 2017. |
| O Ronneberger, P Fischer, and T Brox. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234-241. Springer, 2015. |
| S Smeureanu, R T Ionescu, M Popescu, and B Alexe. Deep appearance features for abnormal behavior detection in video. In International Conference on Image Analysis and Processing, pp. 779-789. Springer, 2017. |
| T Schlegl, P Seeböck, S M Waldstein, U Schmidt-Erfurth, and G Langs. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In International Conference on Information Processing in Medical Imaging, pp. 146-157. Springer, 2017. |
| T Xue, J Wu, K Bouman, and B Freeman. Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks. In Advances in Neural Information Processing Systems, pp. 91-99, 2016. |
| T-W Hui, X Tang, and C Change Loy. LiteFlowNet: A lightweight convolutional neural network for optical flow estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8981-8989, 2018. |
| V Mahadevan, W Li, V Bhalodia, and N Vasconcelos. Anomaly detection in crowded scenes. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1975-1981. IEEE, 2010. |
| Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., & Manzagol, P. A. (2010). Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11(Dec), 3371-3408. |
| W Liu, W Luo, D Lian, and S Gao. Future frame prediction for anomaly detection - a new baseline. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6536-6545, 2018. |
| W Luo, W Liu, and S Gao. A revisit of sparse coding based anomaly detection in stacked RNN framework. In Proceedings of the IEEE International Conference on Computer Vision, pp. 341-349, 2017. |
| W Luo, W Liu, and S Gao. Remembering history with convolutional LSTM for anomaly detection. In 2017 IEEE International Conference on Multimedia and Expo (ICME), pp. 439-444. IEEE, 2017. |
| X Wang, R Girshick, A Gupta, and K He. Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7794-7803, 2018. |
| Y S Chong and Y H Tay. Abnormal event detection in videos using spatiotemporal autoencoder. In International Symposium on Neural Networks, pp. 189-196. Springer, 2017. |
| Y Yoo, S Yun, H Jin Chang, Y Demiris, and J Young Choi. Variational autoencoded regression: High dimensional regression of visual data on complex manifold. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3674-3683, 2017. |
| Zeng, K., Yu, J., Wang, R., Li, C., & Tao, D. (2015). Coupled deep autoencoder for single image super-resolution. IEEE Transactions on Cybernetics, 47(1), 27-37. |
Also Published As
| Publication number | Publication date |
|---|---|
| KR20200132665A (en) | 2020-11-25 |
| US20210326650A1 (en) | 2021-10-21 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US12112514B2 (en) | | Device for generating prediction image on basis of generator including concentration layer, and control method therefor |
| US12008797B2 (en) | | Image segmentation method and image processing apparatus |
| CN108805015B (en) | | A Crowd Anomaly Detection Method for Weighted Convolutional Autoencoder Long Short-Term Memory Networks |
| CN109325954B (en) | | Image segmentation method and device and electronic equipment |
| TWI759286B (en) | | System and method for training object classifier by machine learning |
| GB2554435B (en) | | Image processing |
| JP2023526207A (en) | | Maintaining a constant size of the target object in the frame |
| US9317784B2 (en) | | Image processing apparatus, image processing method, and program |
| US9189867B2 (en) | | Adaptive image processing apparatus and method based in image pyramid |
| CN112446270A (en) | | Training method of pedestrian re-identification network, and pedestrian re-identification method and device |
| US20220122729A1 (en) | | Apparatus for learning cerebrovascular disease, apparatus for detecting cerebrovascular disease, method of learning cerebrovascular disease and method of detecting cerebrovascular disease |
| KR102476022B1 (en) | | Face detection method and apparatus thereof |
| WO2020043296A1 (en) | | Device and method for separating a picture into foreground and background using deep learning |
| CN113553957B (en) | | A multi-scale prediction behavior recognition system and method |
| CN111950339A (en) | | Video processing |
| Liu et al. | | Scene background estimation based on temporal median filter with Gaussian filtering |
| Dronova et al. | | FlyNeRF: NeRF-based aerial mapping for high-quality 3D scene reconstruction |
| CN115713810B (en) | | Lightweight fueller behavior recognition method and device based on sequence diagram |
| Wang et al. | | Object counting in video surveillance using multi-scale density map regression |
| JP6963038B2 (en) | | Image processing device and image processing method |
| JP2023519527A (en) | | Generating segmentation masks based on autoencoders in alpha channel |
| KR20220003946A (en) | | Electronic device and controlling method of electronic device |
| JP6598952B2 (en) | | Image processing apparatus and method, and monitoring system |
| CN113160027A (en) | | Image processing model training method and device |
| WO2020235861A1 (en) | | Device for generating prediction image on basis of generator including concentration layer, and control method therefor |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: SAMSUNG ELECTRONICS CO., LTD., KOREA, REPUBLIC OF. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:JANG, JUNIK;JUNG, JAEIL;HONG, JONGHEE;REEL/FRAME:056700/0750. Effective date: 20210623 |
| | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| | ZAAB | Notice of allowance mailed | Free format text: ORIGINAL CODE: MN/=. |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |