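Overall performance on FutureOmni, a benchmark dedicated exclusively to assessing whether models can predict future states from audio-visual causal logic. Modality: A+V = audio and video input; V = video-only input. "-" in the LLM Size column marks a model whose size is not disclosed.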
| # | Model | LLM Size | Modality | Cartoon | Education | Emergency | Surveillance | Dailylife | Movie | Game | Documentary | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Gemini 3 Flash | - | A+V | 62.71 | 75.00 | 58.70 | 80.28 | 68.75 | 59.03 | 65.06 | 53.47 | 64.80 |
| 2 | Gemini 2.5 Pro | - | A+V | 49.15 | 75.00 | 54.35 | 69.01 | 62.50 | 51.54 | 65.06 | 46.53 | 57.93 |
| 3 | Gemini 2.5 Flash | - | A+V | 50.85 | 70.00 | 47.83 | 59.15 | 58.59 | 51.54 | 60.24 | 50.00 | 55.61 |
| 4 | Qwen 3 Omni (Alibaba) | 30B | A+V | 52.94 | 68.00 | 32.88 | 62.71 | 59.05 | 45.60 | 62.65 | 49.25 | 53.05 |
| 5 | Claude Haiku 4.5 (Anthropic) | - | A+V | 55.08 | 66.00 | 44.57 | 57.04 | 51.56 | 48.90 | 57.83 | 41.67 | 52.03 |
| 6 | GPT-4o (OpenAI) | - | V | 44.06 | 65.00 | 34.78 | 57.74 | 52.34 | 50.22 | 51.80 | 36.11 | 49.70 |
| 7 | Qwen3-VL (Alibaba) | 30B | V | 41.88 | 66.00 | 43.48 | 59.15 | 53.12 | 41.85 | 61.45 | 39.58 | 49.32 |
| 8 | MiniCPM-o 2.6 (OpenBMB) | 8B | A+V | 48.72 | 63.00 | 43.48 | 59.15 | 50.00 | 41.85 | 62.65 | 36.11 | 49.08 |
| 9 | Ola (Tsinghua & Tencent & NTU) | 7B | A+V | 44.44 | 62.00 | 42.39 | 64.08 | 47.66 | 41.41 | 59.04 | 37.50 | 48.54 |
| 10 | Qwen 2.5 Omni (Alibaba) | 7B | A+V | 47.86 | 55.00 | 35.87 | 59.86 | 48.44 | 40.09 | 61.45 | 40.28 | 47.48 |
| 11 | video-SALMONN 2+ 7B (Tsinghua & ByteDance) | 7B | A+V | 50.43 | 61.00 | 39.13 | 55.63 | 52.34 | 40.09 | 54.22 | 33.33 | 47.00 |
| 12 | VideoLLaMA3 (Alibaba) | 7B | V | 42.74 | 59.00 | 33.70 | 58.16 | 42.97 | 43.61 | 67.47 | 35.66 | 46.80 |
| 13 | video-SALMONN 2 7B (Tsinghua & ByteDance) | 7B | A+V | 43.59 | 55.00 | 39.13 | 57.04 | 48.44 | 40.97 | 57.83 | 34.72 | 46.03 |
| 14 | Qwen3-VL (Alibaba) | 7B | V | 39.32 | 64.00 | 34.78 | 58.45 | 48.44 | 38.33 | 57.83 | 36.11 | 45.84 |
| 15 | Qwen2.5-VL (Alibaba) | 7B | V | 43.59 | 58.00 | 30.43 | 52.82 | 48.44 | 37.00 | 53.01 | 34.72 | 43.71 |
| 16 | VideoLLaMA2 (Alibaba) | 7B | A+V | 43.59 | 47.00 | 29.35 | 53.52 | 40.62 | 32.60 | 57.83 | 31.94 | 40.75 |
| 17 | LLaVA-NeXT (Wisconsin-Madison & Microsoft & ByteDance) | 7B | V | 43.59 | 49.00 | 31.52 | 49.30 | 35.94 | 38.33 | 50.60 | 31.94 | 40.62 |
| 18 | Qwen 2.5 Omni (Alibaba) | 3B | A+V | 37.61 | 51.00 | 29.35 | 57.75 | 35.94 | 32.16 | 51.81 | 25.00 | 38.91 |
| 19 | Video-LLaVA (Peking & Peng Cheng & PandaVilla) | 7B | V | 39.32 | 47.00 | 33.70 | 41.55 | 42.19 | 32.16 | 44.58 | 29.86 | 37.72 |
| 20 | AVicuna (Rochester & Sony) | 7B | A+V | 31.62 | 39.00 | 26.09 | 35.21 | 32.81 | 28.19 | 33.73 | 20.83 | 30.37 |