- Qwen-VL: A Versatile Vision-Language Model for Understanding . . .
In this work, we introduce the Qwen-VL series, a set of large-scale vision-language models (LVLMs) designed to perceive and understand both texts and images. Starting from the Qwen-LM as a
- LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
Superior Performance: LLaVA-MoD surpasses larger models like Qwen-VL-Chat-7B in various benchmarks, demonstrating the effectiveness of its knowledge distillation approach.
- MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context . . .
(a) Summary of Scientific Claims and Findings: The paper presents MagicDec, a speculative decoding technique aimed at improving throughput and reducing latency for long-context Large Language Models (LLMs). It challenges the conventional understanding by demonstrating that speculative decoding can be effective even in high-throughput scenarios with large batch sizes and extended sequences.
- Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
In this paper, we explore a way out and present the newest members of the open-sourced Qwen families: the Qwen-VL series. Qwen-VLs are a series of highly performant and versatile vision-language foundation models based on the Qwen-7B (Qwen, 2023) language model. We empower the LLM basement with visual capacity by introducing a new visual receptor including a language-aligned visual encoder and a
- Gated Attention for Large Language Models: Non-linearity, Sparsity,. . .
Gating mechanisms have been widely utilized, from early models like LSTMs and Highway Networks to recent state space models, linear attention, and also softmax attention. Yet, existing literature
- Towards Understanding Distilled Reasoning Models: A. . .
To explore this, we train a crosscoder on Qwen-series models and their fine-tuned variants. Our results suggest that the crosscoder learns features corresponding to various types of reasoning, including self-reflection and computation verification.
- Towards Interpretable Time Series Foundation Models - OpenReview
Leveraging a synthetic dataset of mean-reverting time series with systematically varied trends and noise levels, we generate natural language annotations using a large multimodal model and use these to supervise the fine-tuning of compact Qwen models.
- LiveVQA: Assessing Models with Live Visual Knowledge
We introduce LiveVQA, an automatically collected dataset of the latest visual knowledge from the Internet with synthesized VQA problems. LiveVQA consists of 3,602 single- and multi-hop visual questions from 6 news websites across 14 news categories, featuring high-quality image-text coherence and authentic information. Our evaluation across 15 MLLMs (e.g., GPT-4o, Gemma-3, and the Qwen2.5-VL family