- Qwen-VL: A Versatile Vision-Language Model for Understanding ...
In this work, we introduce the Qwen-VL series, a set of large-scale vision-language models (LVLMs) designed to perceive and understand both texts and images. Starting from the Qwen-LM as a ...
- LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
Remarkably, LLaVA-MoD-2B surpasses Qwen-VL-Chat-7B with an average gain of 8.8%, using merely 0.3% of the training data and 23% trainable parameters. The results underscore LLaVA-MoD's ability to effectively distill comprehensive knowledge from its teacher model, paving the way for developing efficient MLLMs.
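The distillation claim above rests on the student matching the teacher's output distribution. Below is a minimal, generic sketch of temperature-scaled logit distillation in PyTorch; it is illustrative only, the tensor names are made up, and LLaVA-MoD's actual MoE-aware objective is more involved than this.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label distillation: KL(teacher || student) on temperature-scaled logits.

    Generic sketch, not LLaVA-MoD's exact objective; the paper additionally
    distills into a Mixture-of-Experts student.
    """
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # kl_div expects log-probabilities for the input and probabilities for the target.
    kl = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    return kl * (t ** 2)  # standard rescaling so gradients keep comparable magnitude

# Toy usage with random logits over a 32k-token vocabulary.
student = torch.randn(4, 32000)
teacher = torch.randn(4, 32000)
print(distillation_loss(student, teacher).item())
```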
- Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
In this paper, we explore a way out and present the newest members of the open-sourced Qwen families: the Qwen-VL series. Qwen-VLs are a series of highly performant and versatile vision-language foundation models based on the Qwen-7B (Qwen, 2023) language model. We empower the LLM basement with visual capacity by introducing a new visual receptor including a language-aligned visual encoder and a ...
- MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context . . .
Large Language Models (LLMs) have become more prevalent in long-context applications such as interactive chatbots, document analysis, and agent workflows, but it is challenging to serve long-context requests with low latency and high throughput. Speculative decoding (SD) is a widely used technique to reduce latency losslessly, but the conventional wisdom suggests that its efficacy is limited ...
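For context on what speculative decoding does, the sketch below implements the standard draft-then-verify acceptance rule over toy next-token distributions. It assumes hypothetical `p_draft`/`p_target` arrays rather than MagicDec's long-context serving setup, and is meant only to show the accept/resample mechanics.

```python
import numpy as np

rng = np.random.default_rng(0)

def speculative_accept(p_draft, p_target, proposed):
    """Standard accept/resample rule for speculative decoding.

    p_draft, p_target: (k, vocab) next-token distributions at each drafted position.
    proposed: the k token ids sampled from the draft model.
    Returns the accepted tokens; on the first rejection, a corrected token is
    drawn from the normalized residual (p_target - p_draft)+ and decoding stops there.
    """
    out = []
    for i, tok in enumerate(proposed):
        accept_prob = min(1.0, p_target[i, tok] / p_draft[i, tok])
        if rng.random() < accept_prob:
            out.append(tok)
        else:
            residual = np.maximum(p_target[i] - p_draft[i], 0.0)
            residual /= residual.sum()
            out.append(int(rng.choice(len(residual), p=residual)))
            break
    return out

# Toy example: 3 drafted positions over a 5-token vocabulary.
k, vocab = 3, 5
p_d = rng.dirichlet(np.ones(vocab), size=k)
p_t = rng.dirichlet(np.ones(vocab), size=k)
proposed = [int(rng.choice(vocab, p=p_d[i])) for i in range(k)]
print(speculative_accept(p_d, p_t, proposed))
```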
- Junyang Lin - OpenReview
Junyang Lin. Pronouns: he/him. Principal Researcher, Qwen Team, Alibaba Group. Joined July 2019.
- LiveVQA: Assessing Models with Live Visual Knowledge
We introduce LiveVQA, an automatically collected dataset of the latest visual knowledge from the Internet with synthesized VQA problems. LiveVQA consists of 3,602 single- and multi-hop visual questions from 6 news websites across 14 news categories, featuring high-quality image-text coherence and authentic information. Our evaluation across 15 MLLMs (e.g., GPT-4o, Gemma-3, and the Qwen-2.5-VL family) ...
- Towards Understanding Distilled Reasoning Models: A ...
To explore this, we train a crosscoder on Qwen-series models and their fine-tuned variants. Our results suggest that the crosscoder learns features corresponding to various types of reasoning, including self-reflection and computation verification.
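For readers unfamiliar with the term, a crosscoder is roughly a sparse autoencoder whose latent dictionary is shared across activations from two models (here, a base model and its fine-tuned variant), so each feature reads from and reconstructs both. The sketch below is an assumed, minimal formulation in PyTorch with made-up dimensions; the cited paper's training setup and loss details will differ.

```python
import torch
import torch.nn as nn

class TinyCrosscoder(nn.Module):
    """Minimal crosscoder sketch: one shared sparse latent, per-model encoders/decoders.

    d_model and n_latents are illustrative; real crosscoders are trained on
    residual-stream activations with an L1-style sparsity penalty.
    """
    def __init__(self, d_model=512, n_latents=4096):
        super().__init__()
        self.enc_a = nn.Linear(d_model, n_latents, bias=False)  # base-model activations
        self.enc_b = nn.Linear(d_model, n_latents, bias=False)  # fine-tuned-model activations
        self.dec_a = nn.Linear(n_latents, d_model, bias=False)
        self.dec_b = nn.Linear(n_latents, d_model, bias=False)

    def forward(self, act_a, act_b):
        # A single latent sums contributions from both models; ReLU keeps it sparse-ish.
        z = torch.relu(self.enc_a(act_a) + self.enc_b(act_b))
        return self.dec_a(z), self.dec_b(z), z

def crosscoder_loss(model, act_a, act_b, l1_coeff=1e-3):
    rec_a, rec_b, z = model(act_a, act_b)
    recon = (rec_a - act_a).pow(2).mean() + (rec_b - act_b).pow(2).mean()
    sparsity = z.abs().mean()
    return recon + l1_coeff * sparsity

# Toy usage on random "activations".
model = TinyCrosscoder()
a, b = torch.randn(8, 512), torch.randn(8, 512)
print(crosscoder_loss(model, a, b).item())
```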
- Exchange of Perspective Prompting Enhances Reasoning in Large ...
Large language models (LLMs) have made significant advancements in addressing diverse natural language processing (NLP) tasks. However, their performance is often limited by inherent comprehension ...