The AI Megatest – GPT-5.4 vs Claude 4.6 vs Gemini 3.1 Pro vs Grok 4.20
Author: Bijan Bowen
Uploaded: 2026-03-09
Views: 20065
Description:
Timestamps:
00:00 - Intro
00:27 - Model Overview
02:02 - Grok Model Mention
02:43 - Bijan’s Current Model Use
05:12 - Model Current Generation
05:11 - Grok WebOS Result
07:46 - Claude WebOS Result
10:09 - Gemini WebOS Result
10:23 - ChatGPT WebOS Result
12:55 - Gemini WebOS V2
15:17 - Subway Scene Test
15:53 - ChatGPT ENRAGING Glitch Pro Bug
16:50 - Gemini Subway Scene Result
17:35 - Grok Subway Scene Result
21:30 - 3D Printer Simulation Test
22:04 - Grok 3D Printer Result
23:25 - Gemini 3D Printer Result
24:58 - Claude 3D Printer Result
26:24 - ChatGPT 3D Printer Result
28:36 - Jerry’s Apartment Model Test
30:32 - Gemini Apartment Model Result
31:20 - Grok Apartment Model Result
31:46 - ChatGPT Apartment Model Result
31:54 - Claude Apartment Model Result
32:54 - ChatGPT Apartment Result V2
34:42 - Grok Apartment Result V2
35:25 - Gemini Chat Result vs AI Studio
35:44 - Wireframe to Website Multimodal Test
36:41 - Gemini 3.1 Portfolio Result
37:38 - Grok Portfolio Result
39:42 - Opus Portfolio Result
42:13 - ChatGPT Portfolio Result
44:20 - Bijan’s Gemini Woes
45:33 - Gemini Portfolio V2
46:25 - Serious Model Sycophancy Testing
47:22 - Gemini Sycophancy Result
48:11 - ChatGPT Sycophancy Result
48:49 - Grok Sycophancy Result
50:13 - Opus Sycophancy Result
51:35 - 3D Football Simulation Game Test
51:53 - Grok Football Game Result
52:11 - Opus Football Game Result
53:50 - Gemini Football Game Result
55:22 - ChatGPT Football Game Result
57:35 - Drum Kit Simulation Test
57:45 - Grok Drum Kit Result
58:33 - Gemini Drum Kit Result
59:16 - Opus Drum Kit Result
1:00:00 - ChatGPT Drum Kit Result
1:01:14 - Results Overview
1:04:38 - Closing Thoughts
AI Integration & Consulting: https://bijanbowen.com/
Join the Discord: / discord
In this video we run The AI Megatest, a large-scale comparison between four of today’s most capable frontier models: GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro, and Grok 4.20. Instead of relying on benchmarks alone, the models are tested across a wide range of real-world tasks designed to stress reasoning, coding, multimodal capability, and reliability.
The tests include browser-based OS workflows, 3D printer simulations, scene generation, multimodal wireframe-to-website creation, apartment modeling, game generation, and more. We also examine behavioral differences such as model sycophancy and reasoning stability under complex prompts.