Long-CLIP: Unlocking the Long-Text Capability of CLIP. We propose Long-CLIP as a plug-and-play alternative to CLIP that supports long-text input, retains or even surpasses CLIP's zero-shot generalizability, and stays aligned with the CLIP latent space, allowing it to replace CLIP in downstream frameworks without any further adaptation (a minimal usage sketch follows this list).
arXiv.org e-Print archive. This paper explores pre-training models for learning state-of-the-art image representations using natural language captions paired with images.
[2411.04997] LLM2CLIP: Powerful Language Model Unlocks Richer Visual ... Motivated by the remarkable advancements in large language models (LLMs), this work explores how LLMs' superior text understanding and extensive open-world knowledge can enhance CLIP's capability, especially for processing longer and more complex image captions.
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters. Scaling up contrastive language-image pretraining (CLIP) is critical for empowering both vision and multimodal models. We present EVA-CLIP-18B, the largest and most powerful open-source CLIP model to date, with 18 billion parameters.
Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese. In this work, we construct a large-scale dataset of image-text pairs in Chinese, where most data are retrieved from publicly available datasets, and we pretrain Chinese CLIP models on the new dataset.
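All of the papers above build on CLIP's shared image-text embedding space, and Long-CLIP in particular positions itself as a drop-in replacement for the standard CLIP encoder. The sketch below shows zero-shot image-text matching with the reference OpenAI `clip` package; swapping in a Long-CLIP checkpoint is claimed to follow the same flow, but the exact Long-CLIP loader and checkpoint names are not shown here and would need to be taken from its own repository.

```python
# Minimal zero-shot image-text matching with the reference OpenAI CLIP package.
# Long-CLIP advertises itself as a plug-and-play replacement, so the same flow
# should apply with its own loader/checkpoint (not shown; see its repo).
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# "example.jpg" is a placeholder input image.
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
texts = clip.tokenize(["a photo of a cat", "a photo of a dog"]).to(device)

with torch.no_grad():
    # Encode both modalities into the shared embedding space.
    image_features = model.encode_image(image)
    text_features = model.encode_text(texts)
    # The model call returns scaled cosine-similarity logits per caption.
    logits_per_image, logits_per_text = model(image, texts)
    probs = logits_per_image.softmax(dim=-1)

print(probs.cpu().numpy())  # probability of each caption matching the image
```

The key property the listed papers rely on is that image and text features land in the same space, so retrieval, zero-shot classification, and downstream frameworks only need the two encoders and a similarity score.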