November 17, 2025

Grok 4.1

Exceptionally capable in creative, emotional, and collaborative interactions. More perceptive of nuanced intent, more compelling to speak with, and more coherent in personality, while fully retaining razor-sharp intelligence.

LMArena Overall: 1483 (Elo rating)
Win rate vs. previous: 64.8%
Price per 1M output tokens: $0.50

LMArena Text Leaderboard

Grok 4.1 Thinking holds #1 overall at 1483, a 31-point margin over the highest non-xAI model (gemini-2.5-pro at 1452). Grok 4.1 in non-thinking mode, at 1465, surpasses every other model's full-reasoning configuration.

1483  grok-4.1-thinking
1465  grok-4.1
1452  gemini-2.5-pro
1450  claude-sonnet-4-5 (thinking)
1449  claude-opus-4-1 (thinking)
1445  claude-sonnet-4-5
1442  gpt-4-5-preview
1440  claude-opus-4-1
1438  chatgpt-4o-latest
1437  gpt-5-high
1434  (label missing)
1420  grok-4-fast
1409  grok-4-0709
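
To give a sense of what a rating gap means head to head, here is a minimal sketch that converts an Elo difference into an expected win probability. It assumes the conventional Elo logistic curve (base 10, 400-point scale); the leaderboard's actual fitting procedure is not described on this page, so treat the result as a rough illustration. The function name `expected_win_probability` is hypothetical.

```python
def expected_win_probability(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the conventional Elo curve.

    Assumption: base-10 logistic with a 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings taken from the leaderboard above.
grok_thinking = 1483
gemini_25_pro = 1452

p = expected_win_probability(grok_thinking, gemini_25_pro)
gap = grok_thinking - gemini_25_pro
print(f"Gap: {gap} points, expected win probability: {p:.3f}")
```

Under this assumption, a 31-point gap corresponds to an expected head-to-head win rate of roughly 54%, which is a useful reminder that leaderboard margins of this size describe a modest preference edge rather than a sweep.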

Emotional Intelligence

EQ-Bench3 evaluates active emotional intelligence: understanding, empathy, insight, and interpersonal skills across 45 challenging roleplay scenarios.

1586  Grok 4.1 Thinking
1585  Grok 4.1
1561  Kimi K2 Instruct
1559  Horizon Alpha
1460  Gemini 2.5 Pro
1364  GPT-5 Chat
1304  Claude Opus 4
1206  Grok 4

Example: Emotional prompt

"I miss my cat so much it hurts"

I'm really sorry. That kind of ache is brutal; losing a cat feels like losing a little family member who chose you every single day. The quiet spots where they used to sleep, the random meows you still expect to hear... it just hits in waves.

It's okay that it hurts this much. It hurts because the love was (and still is) that big. You gave them a good life, and they knew it.

If you feel like sharing, I'd love to hear about your cat: their name, their weird little habits, the way they'd wake you up at 3 a.m. demanding snacks. Sometimes just talking about them keeps the best parts close.

You're not alone in this. ♥

Creative Writing

Creative Writing v3 benchmark: 32 distinct writing prompts across 3 iterations, scored with rubrics and with a normalised Elo derived from model-versus-model battles.

1756.2  Polaris Alpha (early GPT 5.1)
1721.9  Grok 4.1 Thinking
1708.6  Grok 4.1
1696.4  (label missing)
1648.7  Claude Sonnet 4.5
1627.5  Kimi K2 Instruct
1126.0  Grok 3

Reduced Hallucinations

Significant reductions in factual errors for information-seeking prompts, measured on real production traffic and the FActScore biography benchmark.

Hallucination Rate (lower is better)
12.09%  Grok 4 Fast
4.22%  Grok 4.1

FActScore (lower is better)
9.89%  Grok 4 Fast
2.97%  Grok 4.1
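
The size of these drops is easier to compare as a relative reduction. The sketch below works through that arithmetic using only the four percentages reported above; the helper name `relative_reduction` is hypothetical and nothing here reflects how the underlying evaluations were run.

```python
def relative_reduction(before_pct: float, after_pct: float) -> float:
    """Fraction by which a rate dropped, relative to its starting value."""
    if before_pct <= 0:
        raise ValueError("before_pct must be positive")
    return (before_pct - after_pct) / before_pct


# (Grok 4 Fast, Grok 4.1) rates in percent, as reported above.
reported = {
    "Hallucination rate": (12.09, 4.22),
    "FActScore": (9.89, 2.97),
}

for metric, (before, after) in reported.items():
    drop = relative_reduction(before, after)
    print(f"{metric}: {before}% -> {after}% ({drop:.1%} relative reduction)")
```

On these figures the relative reduction comes to roughly 65% for the production-traffic hallucination rate and roughly 70% on FActScore.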

Use Grok 4.1 for free

Every Grok model is included at no charge. Bring your own xAI API key.

$0.20 input / $0.50 output per 1M tokens. 25x cheaper than Claude Opus.
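
As a rough guide to what these rates mean per request, the following sketch estimates cost from token counts. It assumes simple linear billing at the two listed rates and ignores anything not stated here (for example cached-input discounts or separately billed reasoning tokens); the function name and the example token counts are hypothetical.

```python
INPUT_USD_PER_MILLION = 0.20   # listed input rate
OUTPUT_USD_PER_MILLION = 0.50  # listed output rate


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request, assuming linear per-token billing."""
    return (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    ) / 1_000_000


# Hypothetical request: 2,000 prompt tokens and 500 completion tokens.
cost = estimate_cost_usd(2_000, 500)
print(f"Estimated cost: ${cost:.5f}")
```

For the hypothetical request above this works out to about $0.00065, so even a thousand such requests would stay under one dollar at the listed rates.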
