[Paper Review] TULIP: Token-length Upgraded CLIP
TULIP: Token-length Upgraded CLIP ICLR 2025 Poster Ivona Najdenkoska, Mohammad Mahdi Derakhshani, Yuki M. Asano, Nanne van Noord, Marcel Worring, Cees G. M. Snoek University of Amsterdam [pape...
DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers CVPR 2022 Xianing Chen, Qiong Cao, Yujie Zhong, Jing Zhang, Shenghua Gao, Dacheng Tao ShanghaiTech University, JD Ex...
Expertized Caption Auto-Enhancement for Video-Text Retrieval arXiv, 5 Feb 2025 Junxiang Chen, Wenbin Yao, Baoyao Yang WeChat, Tencent, Guangdong University of Technology [paper] 1. Abstract &...
Unified Lexical Representation for Interpretable Visual-Language Alignment NeurIPS 2024 Poster Yifan Li, Yikai Wang, Yanwei Fu, Dongyu Ru, Zheng Zhang, Tong He Fudan University, Amazon Web Servi...
CLIPLoss and Norm-Based Data Selection Methods for Multimodal Contrastive Learning NeurIPS 2024 Spotlight Yiping Wang, Yifang Chen, Wendan Yan, Alex Fang, Wenjing Zhou, Kevin Jamieson, Simon Shao...
Decoupled Knowledge Distillation CVPR 2022 Borui Zhao, Quan Cui, Renjie Song, Yiyu Qiu, Jiajun Liang MEGVII Technology, Waseda University, Tsinghua University [paper] [github] Abstract Recent KD...
Are Diffusion Models Vision-And-Language Reasoners? NeurIPS 2023 Benno Krojer, Elinor Poole-Dayan, Vikram Voleti, Christopher Pal, Siva Reddy Mila University, McGill University, Polytechnique Mon...
Reversed in Time: A Novel Temporal-Emphasized Benchmark for Cross-Modal Video-Text Retrieval ACM MM 2024 Yang Du, Yuqi Liu, Qin Jin Renmin University of China [paper] [supplementary] [github] ...
ATM: Action Temporality Modeling for Video Question Answering ACM MM 2023 Junwen Chen, Jie Zhu, Yu Kong Michigan State University [paper] 1. Abstract & Introduction Problems with prior work: between frames ...