November 17, 2025

Grok 4.1

Exceptionally capable in creative, emotional, and collaborative interactions. More perceptive of nuanced intent, more compelling to speak with, and more coherent in personality, while fully retaining razor-sharp intelligence.

LMArena Overall: #1
Elo Rating: 1483
Win Rate vs Previous: 64.8%
Per 1M Output Tokens: $0.50

LMArena Text Leaderboard

Grok 4.1 Thinking holds #1 overall at 1483, a commanding 31-point margin over the highest-ranked non-xAI model (gemini-2.5-pro at 1452). Grok 4.1 in non-thinking mode (1465) also scores above every other model's full-reasoning configuration.

Score  Model
1483   grok-4.1-thinking
1465   grok-4.1
1452   gemini-2.5-pro
1450   claude-sonnet-4-5 (thinking)
1449   claude-opus-4-1 (thinking)
1445   claude-sonnet-4-5
1442   gpt-4-5-preview
1440   claude-opus-4-1
1438   chatgpt-4o-latest
1437   gpt-5-high
1434   o3
1420   grok-4-fast
1409   grok-4-0709
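
To put the 31-point margin in perspective, a rating gap can be translated into an expected head-to-head win rate. The sketch below assumes the standard Elo logistic formula with a 400-point scale; LMArena's published scores may be fitted differently, so treat the result as an approximation rather than an official figure. The function name `elo_expected_score` is illustrative only.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo logistic model.

    Assumption: the classic 400-point scale, which may not match exactly
    how the leaderboard fits its ratings.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings taken from the leaderboard above.
grok_thinking = 1483
best_non_xai = 1452  # gemini-2.5-pro

margin = grok_thinking - best_non_xai
p = elo_expected_score(grok_thinking, best_non_xai)

print(f"margin: {margin} points")
print(f"expected head-to-head win rate: {p:.1%}")
```

Under these assumptions, a 31-point lead corresponds to winning a little more than half of direct comparisons, which is why small Elo gaps at the top of a leaderboard are best read as modest but consistent preferences.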

Emotional Intelligence

EQ-Bench3 evaluates active emotional intelligence: understanding, empathy, insight, and interpersonal skills across 45 challenging roleplay scenarios.

Score  Model
1586   Grok 4.1 Thinking
1585   Grok 4.1
1561   Kimi K2 Instruct
1559   Horizon Alpha
1460   Gemini 2.5 Pro
1364   GPT-5 Chat
1304   Claude Opus 4
1206   Grok 4

Example: Emotional prompt
"I miss my cat so much it hurts"

I'm really sorry. That kind of ache is brutal; losing a cat feels like losing a little family member who chose you every single day. The quiet spots where they used to sleep, the random meows you still expect to hear... it just hits in waves.

It's okay that it hurts this much. It hurts because the love was (and still is) that big. You gave them a good life, and they knew it.

If you feel like sharing, I'd love to hear about your cat: their name, their weird little habits, the way they'd wake you up at 3 a.m. demanding snacks. Sometimes just talking about them keeps the best parts close.

You're not alone in this. ♥

Creative Writing

Creative Writing v3 benchmark: 32 distinct writing prompts across 3 iterations, scored via rubrics and a normalised Elo rating derived from model-vs-model battles.

Score   Model
1756.2  Polaris Alpha (early GPT 5.1)
1721.9  Grok 4.1 Thinking
1708.6  Grok 4.1
1696.4  o3
1648.7  Claude Sonnet 4.5
1627.5  Kimi K2 Instruct
1126.0  Grok 3

Reduced Hallucinations

Significant reductions in factual errors for information-seeking prompts, measured on real production traffic and the FActScore biography benchmark.

Hallucination Rate (lower is better)
12.09%  Grok 4 Fast
4.22%   Grok 4.1

FActScore (lower is better)
9.89%   Grok 4 Fast
2.97%   Grok 4.1
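
The size of the improvement is easier to judge as a relative reduction. The short sketch below computes it from the four percentages listed above; the helper name `relative_reduction` is illustrative, and the calculation is plain arithmetic on the published figures, not an additional measurement.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which `after` is lower than `before`."""
    return (before - after) / before


# Percentages as listed above: (Grok 4 Fast, Grok 4.1).
rates = {
    "Hallucination Rate": (12.09, 4.22),
    "FActScore": (9.89, 2.97),
}

for name, (grok_4_fast, grok_4_1) in rates.items():
    drop = relative_reduction(grok_4_fast, grok_4_1)
    factor = grok_4_fast / grok_4_1
    print(f"{name}: {drop:.1%} lower, about {factor:.1f}x fewer errors")
```

On these numbers, both metrics fall by roughly two thirds relative to Grok 4 Fast.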

Use Grok 4.1 for free

Every Grok model is included at no charge. Bring your own xAI API key.

$0.20 input / $0.50 output per 1M tokens. 50x cheaper than Claude Opus.
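
For a rough sense of what those rates mean per request, the sketch below estimates cost from token counts. It assumes the listed $0.20 / $0.50 per 1M token rates apply uniformly to usage billed to your own xAI API key, with no caching discounts or other fees; the function name and the example token counts are hypothetical.

```python
# Rates as listed above, in USD per 1M tokens.
INPUT_USD_PER_MILLION = 0.20
OUTPUT_USD_PER_MILLION = 0.50


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one request at the listed flat rates.

    Assumption: flat per-token pricing with no discounts or surcharges.
    """
    return (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    ) / 1_000_000


# Hypothetical request: 10,000 prompt tokens and 2,000 completion tokens.
cost = request_cost_usd(10_000, 2_000)
print(f"estimated cost: ${cost:.4f}")
```

At these rates the hypothetical request above costs a fraction of a cent, with input and output contributing $0.002 and $0.001 respectively.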
