- Max Planck Institute for Intelligent Systems
- https://ps.is.mpg.de/person/mkocabas
Stars
MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision
text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
A batched offline inference oriented version of segment-anything
WiLoR hand 3d pose estimation! Simplifying WiLoR into a python package!
ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
faster parallel inference of mochi-1 video generation model
PyTorch native quantization and sparsity for training and inference
Source code for the SIGGRAPH 2024 paper "X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention"
Free Palestine🇵🇸🇵🇸🇵🇸 Cross-platform, super fast single-header C++ library to get image size and format without loading/decoding. Supports avif, bmp, cur, dds, gif, hdr (pic), heic (heif), icns, ico, j…
No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images
MACVO: Metrics-aware Covariance for Learning-based Stereo Visual Odometry
Efficient vision foundation models for high-resolution generation and perception.
Interactive Character Control with Auto-Regressive Motion Diffusion Models
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
[ECCV2024] IDM-VTON : Improving Diffusion Models for Authentic Virtual Try-on in the Wild
This is the project page for paper "A Simple Baseline for Efficient Hand Mesh Reconstruction, CVPR2024"
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Official Pytorch Implementation of Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
[3DV 2022] A semi-supervised method for improving generalization for 3D hand reconstruction.
Official Code Release for ECCV 2024 paper AvatarPose: Avatar-guided 3D Pose Estimation of Close Human Interaction from Sparse Multi-view Videos
DIAMOND (DIffusion As a Model Of eNvironment Dreams) is a reinforcement learning agent trained in a diffusion world model. NeurIPS 2024 Spotlight.
Code of Pyramidal Flow Matching for Efficient Video Generative Modeling
[NeurIPS 2024] VCR-GauS: View Consistent Depth-Normal Regularizer for Gaussian Surface Reconstruction
Perplexica is an AI-powered search engine. It is an Open source alternative to Perplexity AI