Institute of Automation, Chinese Academy of Sciences
Stars
Code for ALBEF: a new vision-language pre-training method
[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Multimodal model for text and tabular data, with HuggingFace transformers as the building block for the text data
TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale.
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
LAVIS - A One-stop Library for Language-Vision Intelligence
Open-source deep learning based unsupervised image retrieval toolbox built on PyTorch 🔥
The open-source tool for building high-quality datasets and computer vision models
OpenMMLab Pre-training Toolbox and Benchmark
PyTorch implementation of SimCLR: A Simple Framework for Contrastive Learning of Visual Representations
PyTorch implementation of SimCLR: A Simple Framework for Contrastive Learning of Visual Representations by T. Chen et al.
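Both SimCLR implementations above center on the NT-Xent (normalized temperature-scaled cross-entropy) contrastive loss from the Chen et al. paper. A minimal NumPy sketch of that loss, with the function name and batch setup being illustrative rather than taken from either repo:

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss over two batches of embeddings from two augmented
    views; row i of z1 and row i of z2 are a positive pair."""
    z = np.concatenate([z1, z2], axis=0)                 # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)     # L2-normalize
    sim = z @ z.T / temperature                          # scaled cosine sims
    np.fill_diagonal(sim, -np.inf)                       # exclude self-pairs
    n = z1.shape[0]
    # the positive for row i in the first half is row i in the second half
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    loss = -(sim[np.arange(2 * n), pos] - logsumexp)
    return loss.mean()
```

Identical views give a near-minimal loss, since each positive pair then has the maximum possible similarity relative to all negatives in the batch.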
A codebase for flexible and efficient Image Text Representation Alignment
Source code of paper "Remote Sensing Cross-Modal Image-Text Retrieval Based on Global and Local Information"
Download scripts for EPIC-KITCHENS
The source code of AMFMN and the dataset RSITMD
Parameter-Efficient Transfer Learning for Remote Sensing Image-Text Retrieval, 2023
Datasets for remote sensing images (Paper: Exploring Models and Data for Remote Sensing Image Caption Generation)
RS5M: a large-scale vision language dataset for remote sensing [TGRS]
🧀 [ACMMM'23 Oral] Official Code for “A Prior Instruction Representation Framework for Remote Sensing Image-text Retrieval”
[ICLRW 2024] Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment
Open-source code for the paper "Enhancing Remote Sensing Vision-Language Models for Zero-Shot Scene Classification"
Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning
🛰️ Official repository of paper "RemoteCLIP: A Vision Language Foundation Model for Remote Sensing" (IEEE TGRS)
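The "Mind the Gap" entry above studies the modality gap: in contrastively trained models like CLIP, image and text embeddings occupy separated regions of the shared space. One common way to quantify it is the distance between the centroids of the two L2-normalized embedding clouds; a minimal sketch, with the function name and data being illustrative:

```python
import numpy as np

def modality_gap(image_emb, text_emb):
    """Euclidean distance between the centroids of L2-normalized image
    and text embedding clouds (one simple measure of the modality gap)."""
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    return float(np.linalg.norm(img.mean(axis=0) - txt.mean(axis=0)))
```

A gap of zero means the two clouds share a centroid; shifting one modality's embeddings by a constant offset produces a clearly nonzero gap.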