- DepthAnything/Video-Depth-Anything - GitHub
This work presents Video Depth Anything, based on Depth Anything V2, which can be applied to arbitrarily long videos without compromising quality, consistency, or generalization ability. Compared with other diffusion-based models, it offers faster inference, fewer parameters, and more consistent, more accurate depth.
- GitHub - MME-Benchmarks/Video-MME: [CVPR 2025] Video-MME: The First . . .
We introduce Video-MME, the first-ever full-spectrum Multi-Modal Evaluation benchmark of MLLMs in video analysis. It is designed to comprehensively assess the capabilities of MLLMs in processing video data, covering a wide range of visual domains, temporal durations, and data modalities.
- Video-R1: Reinforcing Video Reasoning in MLLMs - GitHub
Our Video-R1-7B obtains strong performance on several video reasoning benchmarks. For example, Video-R1-7B attains 35.8% accuracy on the video spatial reasoning benchmark VSI-bench, surpassing the commercial proprietary model GPT-4o.
- 【EMNLP 2024】 Video-LLaVA: Learning United Visual . . . - GitHub
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection. If you like our project, please give us a star ⭐ on GitHub for the latest updates. 💡 I also have other video-language projects that may interest you. Open-Sora Plan: Open-Source Large Video Generation Model.
- GitHub - k4yt3x/video2x: A machine learning-based video super . . .
A machine learning-based video super resolution and frame interpolation framework. Est. Hack the Valley II, 2018. - k4yt3x/video2x
- Wan: Open and Advanced Large-Scale Video Generative Models
Wan: Open and Advanced Large-Scale Video Generative Models. In this repository, we present Wan2.1, a comprehensive and open suite of video foundation models that pushes the boundaries of video generation. Wan2.1 offers these key features:
- VideoLLM-online: Online Video Large Language Model for Streaming Video
Online Video Streaming: Unlike previous models that operate in offline mode (querying/responding to a full video), our model supports online interaction within a video stream. It can proactively update responses during a stream, such as recording activity changes or helping with the next steps in real time.
- Video-T1: Test-Time Scaling for Video Generation - GitHub
Video-T1: We present the generative effects and performance improvements of video generation under test-time scaling (TTS) settings. The videos generated with TTS are of higher quality and more consistent with the prompt than those generated without TTS.