The AI Megatest – GPT-5.4 vs Claude 4.6 vs Gemini 3.1 Pro vs Grok 4.20

Author: Bijan Bowen

Uploaded: 2026-03-09

Views: 20065

Description: Timestamps:

00:00 - Intro
00:27 - Model Overview
02:02 - Grok Model Mention
02:43 - Bijan’s Current Model Use
05:12 - Model Current Generation
05:11 - Grok WebOS Result
07:46 - Claude WebOS Result
10:09 - Gemini WebOS Result
10:23 - ChatGPT WebOS Result
12:55 - Gemini WebOS V2
15:17 - Subway Scene Test
15:53 - ChatGPT ENRAGING Glitch Pro Bug
16:50 - Gemini Subway Scene Result
17:35 - Grok Subway Scene Result
21:30 - 3D Printer Simulation Test
22:04 - Grok 3D Printer Result
23:25 - Gemini 3D Printer Result
24:58 - Claude 3D Printer Result
26:24 - ChatGPT 3D Printer Result
28:36 - Jerry’s Apartment Model Test
30:32 - Gemini Apartment Model Result
31:20 - Grok Apartment Model Result
31:46 - ChatGPT Apartment Model Result
31:54 - Claude Apartment Model Result
32:54 - ChatGPT Apartment Result V2
34:42 - Grok Apartment Result V2
35:25 - Gemini Chat Result vs AI Studio
35:44 - Wireframe to Website Multimodal Test
36:41 - Gemini 3.1 Portfolio Result
37:38 - Grok Portfolio Result
39:42 - Opus Portfolio Result
42:13 - ChatGPT Portfolio Result
44:20 - Bijan’s Gemini Woes
45:33 - Gemini Portfolio V2
46:25 - Serious Model Sycophancy Testing
47:22 - Gemini Sycophancy Result
48:11 - ChatGPT Sycophancy Result
48:49 - Grok Sycophancy Result
50:13 - Opus Sycophancy Result
51:35 - 3D Football Simulation Game Test
51:53 - Grok Football Game Result
52:11 - Opus Football Game Result
53:50 - Gemini Football Game Result
55:22 - ChatGPT Football Game Result
57:35 - Drum Kit Simulation Test
57:45 - Grok Drum Kit Result
58:33 - Gemini Drum Kit Result
59:16 - Opus Drum Kit Result
1:00:00 - ChatGPT Drum Kit Result
1:01:14 - Results Overview
1:04:38 - Closing Thoughts

AI Integration & Consulting: https://bijanbowen.com/
Join the Discord: / discord

In this video we run The AI Megatest, a large-scale comparison between four of today’s most capable frontier models: GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro, and Grok 4.20. Instead of relying on benchmarks alone, the models are tested across a wide range of real-world tasks designed to stress reasoning, coding, multimodal capability, and reliability.

The tests include browser-based OS workflows, 3D printer simulations, scene generation, multimodal wireframe-to-website creation, apartment modeling, game generation, and more. We also examine behavioral differences such as model sycophancy and reasoning stability under complex prompts.
