Reference DOI: https://doi.org/10.48550/arXiv.2311.10122 (Video-LLaVA: Learning United Visual Representation by Alignment Before Projection)