OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for ...
OmniSpatial is a comprehensive and challenging benchmark for spatial reasoning, grounded in cognitive psychology. It covers four major categories: dynamic reasoning, complex spatial logic, spatial interaction, and perspective-taking, with 50 fine-grained subcategories. Through careful manual annotation, we construct over 1.5K question-answer pairs.
Mengdi Jia: We present OmniSpatial, a benchmark for evaluating spatial reasoning abilities in vision-language models. It covers four major categories and 50+ subtypes, totaling over 1.5K QA pairs.
arXiv:2506.03135v1 [cs.CV] 3 Jun 2025 • We develop the OmniSpatial dataset, which offers a diverse and challenging set of spatial tasks, serving as a comprehensive benchmark for assessing VLMs' spatial reasoning capabilities.
OmniSpatial diagnoses the limits of today's vision-language models (VLMs) on higher-order spatial cognition. It spans 50 fine-grained tasks grouped into 4 dimensions: dynamic reasoning, complex spatial logic, spatial interaction, and perspective-taking.
The paper presents OmniSpatial, a novel benchmark that challenges vision-language models by assessing spatial reasoning across 1,500 curated QA pairs in four dimensions.
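To make the dataset's structure concrete, here is a minimal sketch of what one such QA pair could look like. The field names, the subcategory label, and the example values are hypothetical illustrations, not the paper's actual schema or data.

```python
from dataclasses import dataclass

# Hypothetical record layout for one OmniSpatial-style QA pair;
# field names are illustrative, not the paper's release format.
@dataclass
class SpatialQA:
    image_path: str   # image the question is grounded in
    question: str     # natural-language spatial question
    answer: str       # annotated ground-truth answer
    category: str     # one of the four major categories
    subcategory: str  # one of the 50 fine-grained subcategories

example = SpatialQA(
    image_path="scene_0001.jpg",
    question="If the blue car keeps turning left, which lane will it enter?",
    answer="the westbound lane",
    category="dynamic reasoning",
    subcategory="motion extrapolation",  # illustrative subcategory name
)
```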
Extensive experiments show that both open- and closed-source VLMs, as well as existing reasoning and spatial understanding models, exhibit significant limitations in comprehensive spatial understanding. We further analyze failure cases and propose potential directions for future research.
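As a rough illustration of how such a benchmark can be scored, the sketch below tallies per-category accuracy over the QA pairs. Here `dataset` and `model_answer` are assumed stand-ins (an iterable of records like the one above and a function that queries a VLM), not the authors' evaluation harness.

```python
from collections import defaultdict

# Minimal per-category accuracy tally; `dataset` yields SpatialQA-style
# records and `model_answer(image_path, question)` is a hypothetical
# stand-in for prompting a vision-language model.
def evaluate(dataset, model_answer):
    correct = defaultdict(int)
    total = defaultdict(int)
    for qa in dataset:
        prediction = model_answer(qa.image_path, qa.question)
        total[qa.category] += 1
        # Naive exact-match scoring for illustration only.
        if prediction.strip().lower() == qa.answer.strip().lower():
            correct[qa.category] += 1
    return {cat: correct[cat] / total[cat] for cat in total}
```

A real harness would need more careful answer matching (e.g., option letters for multiple-choice items), but the per-category breakdown is what lets a benchmark like this localize failures to dimensions such as perspective-taking rather than reporting a single aggregate score.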
Vision-Language Models (VLMs) excel at identifying and describing objects but struggle with spatial reasoning. Inspired by the dual-pathway (ventral-dorsal) model of human vision, we investigate why VLMs fail spatial tasks despite strong object recognition capabilities.