Shi et al., 2024 - Google Patents
Image-text multimodal classification via cross-attention contextual transformer with modality-collaborative learningShi et al., 2024
- Document ID
- 1103603356299116839
- Author
- Shi Q
- Xu W
- Miao Z
- Publication year
- Publication venue
- Journal of Electronic Imaging
External Links
Snippet
Nowadays, we are surrounded by various types of data from different modalities, such as text, images, audio, and video. The existence of this multimodal data provides us with rich information, but it also brings new challenges: how do we effectively utilize this data for …
- 238000000034 method 0 abstract description 58
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
- G06F17/3066—Query translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30861—Retrieval from the Internet, e.g. browsers
- G06F17/30864—Retrieval from the Internet, e.g. browsers by querying, e.g. search engines or meta-search engines, crawling techniques, push systems
- G06F17/30867—Retrieval from the Internet, e.g. browsers by querying, e.g. search engines or meta-search engines, crawling techniques, push systems with filtering and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30017—Multimedia data retrieval; Retrieval of more than one type of audiovisual media
- G06F17/30023—Querying
- G06F17/30029—Querying by filtering; by personalisation, e.g. querying making use of user profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/02—Knowledge representation
- G06N5/022—Knowledge engineering, knowledge acquisition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for a specific business sector, e.g. utilities or tourism
- G06Q50/01—Social networking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/18—Digital computers in general; Data processing equipment in general in which a programme is changed according to experience gained by the computer itself during a complete run; Learning machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Song et al. | Vman: visual-modified attention network for multimodal paradigms | |
| Yusuf et al. | An analysis of graph convolutional networks and recent datasets for visual question answering | |
| Yang et al. | Large language models meet text-centric multimodal sentiment analysis: A survey | |
| Zhang et al. | Asymmetric light-aware progressive decoding network for RGB-thermal salient object detection | |
| Li et al. | MIA-Net: Multi-modal interactive attention network for multi-modal affective analysis | |
| Wang et al. | TASTA: text‐assisted spatial and temporal attention network for video question answering | |
| Zheng et al. | Visual entity linking via multi-modal learning | |
| Wang et al. | Deep tensor evidence fusion network for sentiment classification | |
| Li et al. | CCMA: CapsNet for audio–video sentiment analysis using cross-modal attention | |
| Sun et al. | Rumour detection technology based on the BiGRU_capsule network | |
| Luo et al. | Attention fusion network for multimodal sentiment analysis | |
| Liu et al. | A multimodal approach for multiple-relation extraction in videos | |
| Shi et al. | Learning modality-fused representation based on transformer for emotion analysis | |
| Jia et al. | Semantic association enhancement transformer with relative position for image captioning | |
| Yu et al. | CAGNet: a context-aware graph neural network for detecting social relationships in videos | |
| Wen et al. | Few-shot named entity recognition with joint token and sentence awareness | |
| Shi et al. | Image-text multimodal classification via cross-attention contextual transformer with modality-collaborative learning | |
| Zhang et al. | A Multimodal Semantic Fusion Network with Cross-Modal Alignment for Multimodal Sentiment Analysis | |
| Chen et al. | Multimodal encoders for food-oriented cross-modal retrieval | |
| Yu et al. | Feature interaction and two-stage cross-modal fusion for RGB-D salient object detection | |
| Wang et al. | Semantic-guided spatio-temporal attention for few-shot action recognition: J. Wang and B. Liu | |
| Yang et al. | Performance comparison of deep learning text embeddings in sentiment analysis tasks with online consumer reviews | |
| Zeng et al. | Aspect-aware semantic feature enhanced networks for multimodal aspect-based sentiment analysis | |
| Zheng | AVMFAC: acoustic visual multi-scale feature aggregation conformer for multimodal sentiment analysis | |
| Liu et al. | Knowledge enhanced and incongruity perceiving network for multimodal sarcasm detection |