Multimodal Learning (14)
- [Paper Review] Geodesic Multi-Modal Mixup for Robust Fine-Tuning
- [Paper Review] What to Align in Multimodal Contrastive Learning
- [Paper Review] Train Once, Deploy Anywhere: Matryoshka Representation Learning for Multimodal Recommendation
- [Paper Review] DrVideo: Document Retrieval Based Long Video Understanding
- [Paper Review] Discovering Clone Negatives via Adaptive Contrastive Learning for Image-Text Matching
- [Paper Review] Weighted Point Cloud Embedding for Multi-modal Contrastive Learning Toward Optimal Similarity Metric
- [Paper Review] Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Models
- [Paper Review] TULIP: Token-length Upgraded CLIP
- [Paper Review] Expertized Caption Auto-Enhancement for Video-Text Retrieval
- [Paper Review] Unified Lexical Representation for Interpretable Visual-Language Alignment
- [Paper Review] CLIPLoss and Norm-Based Data Selection Methods for Multimodal Contrastive Learning
- [Paper Review] Are Diffusion Models Vision-And-Language Reasoners?
- [Paper Review] Reversed in Time: A Novel Temporal-Emphasized Benchmark for Cross-Modal Video-Text Retrieval
- [Paper Review] ATM: Action Temporality Modeling for Video Question Answering