- Qwen-VL: A Versatile Vision-Language Model for Understanding . . .
In this work, we introduce the Qwen-VL series, a set of large-scale vision-language models (LVLMs) designed to perceive and understand both texts and images. Starting from the Qwen-LM as a . . .
- LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
Remarkably, LLaVA-MoD-2B surpasses Qwen-VL-Chat-7B with an average gain of 8.8\%, using merely $0.3\%$ of the training data and 23\% of the trainable parameters. The results underscore LLaVA-MoD's ability to effectively distill comprehensive knowledge from its teacher model, paving the way for developing efficient MLLMs.
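The snippet above describes knowledge distillation from a large teacher MLLM into a small student. A minimal sketch of the standard temperature-scaled distillation loss (KL divergence between softened teacher and student distributions) — this illustrates the generic technique, not LLaVA-MoD's actual objective, and the function and parameter names are placeholders:

```python
import math

def kd_loss(student_logits, teacher_logits, T=2.0):
    """Generic knowledge-distillation loss sketch:
    KL(teacher || student) over temperature-softened
    distributions, scaled by T^2 as in standard KD."""
    def softmax(xs, T):
        m = max(xs)                       # subtract max for numerical stability
        es = [math.exp((x - m) / T) for x in xs]
        s = sum(es)
        return [e / s for e in es]
    p = softmax(teacher_logits, T)        # softened teacher distribution
    q = softmax(student_logits, T)        # softened student distribution
    return (T * T) * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

When the student matches the teacher exactly the loss is zero; any mismatch yields a positive penalty that the student minimizes during training.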
- Junyang Lin - OpenReview
Junyang Lin. Pronouns: he/him. Principal Researcher, Qwen Team, Alibaba Group. Joined July 2019.
- Evaluating the Instruction-following Abilities of Language Models. . .
The Qwen 3B model is better than the Qwen 7B and 14B variants for the print_correct_answer_append_string instruction. We consistently see 32B and 72B variants outperforming other models by a significant margin. Distractors: We report details of how different model families (Llama, Qwen, and Phi) are affected by distractors, at different scales.
- Towards Federated RLHF with Aggregated Client Preference for LLMs
For example, our experiments demonstrate that the Qwen-2-0.5B selector provides strong performance enhancements to larger base models like Gemma-2B while ensuring computational efficiency. This approach reduces the training burden for federated RLHF and broadens its applicability to resource-constrained scenarios.
- MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context . . .
(a) Summary of Scientific Claims and Findings. The paper presents MagicDec, a speculative decoding technique aimed at improving throughput and reducing latency for long-context Large Language Models (LLMs). It challenges the conventional understanding by demonstrating that speculative decoding can be effective even in high-throughput scenarios with large batch sizes and extended sequences.
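The snippet above refers to speculative decoding. A toy sketch of the generic draft-then-verify loop — not MagicDec's actual implementation — assuming deterministic greedy models represented as plain functions from a token list to the next token:

```python
def speculative_decode(target_next, draft_next, prompt, k=4, max_new=12):
    """Greedy speculative decoding sketch.
    target_next / draft_next: fn(list[int]) -> int, the greedy next token
    under the (expensive) target and (cheap) draft model respectively.
    The draft proposes k tokens; the target verifies them and accepts the
    longest agreeing prefix, then emits one corrected/bonus token, so each
    verification round yields at least one target-quality token."""
    seq = list(prompt)
    while len(seq) - len(prompt) < max_new:
        # 1. Draft model proposes k tokens autoregressively (cheap).
        proposal, ctx = [], list(seq)
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2. Target verifies the proposal position by position.
        for t in proposal:
            expected = target_next(seq)
            if t == expected:
                seq.append(t)          # draft token accepted
            else:
                seq.append(expected)   # mismatch: take target's token, stop
                break
        else:
            seq.append(target_next(seq))  # all accepted: one bonus token
    return seq[len(prompt):]
```

Because verification always falls back to the target's own greedy choice, the output is identical to plain greedy decoding with the target model; the draft only changes how many tokens each target pass can commit.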
- ADIFF: Explaining audio difference using natural language
We evaluate our model using objective metrics and human evaluation, and show that our model enhancements lead to significant improvements in performance over a naive baseline and the SoTA Audio-Language Model (ALM) Qwen-Audio.
- Instance-adaptive Zero-shot Chain-of-Thought Prompting
Experiments conducted with LLaMA-2, LLaMA-3, and Qwen on math, logic, and commonsense reasoning tasks (e.g., GSM8K, MMLU, Causal Judgement) obtain consistent improvements, demonstrating that instance-adaptive zero-shot CoT prompting performs better than other task-level methods with curated prompts or sophisticated procedures, showing . . .