Grok 4 Heavy & Anime Companions
xAI had a busy July. On the 9th, they released Grok 4 and Grok 4 Heavy — reasoning models that set new benchmarks across the board. A couple weeks later, they launched anime-style AI companion characters behind their SuperGrok paywall.
Same company. Same month. And I can’t stop thinking about what that says.
The Technical Side: Genuinely Impressive #
Let’s start with what works. Grok 4 scored 25.4% on Humanity’s Last Exam, beating Gemini 2.5 Pro (21.6%) and o3 (21%). Those aren’t rounding errors — they’re meaningful gaps on a test specifically designed to stump frontier models.
Grok 4 Heavy goes further. It’s not just a bigger model; it’s a multi-agent architecture. Multiple reasoning agents work in parallel on the same problem, independently, then converge on answers. On HLE text-only it hit 50.7%. On USAMO ‘25 — that’s the USA Mathematical Olympiad — it reached 61.9%.
These aren’t incremental improvements. That USAMO score in particular suggests something qualitatively different in how the system handles complex mathematical reasoning.
The infrastructure story matters too. Grok 4 Heavy was trained on Colossus, xAI’s 200,000-GPU cluster. They claim a 6x compute efficiency improvement through reinforcement learning applied at pretraining scale. If that number holds up — and it’s hard to verify externally — it suggests xAI is finding ways to extract more capability per FLOP rather than just throwing more hardware at the problem.
I’ve been skeptical of xAI’s benchmark gamesmanship in the past, but these results are hard to dismiss. Multiple independent evaluations confirm the numbers. The multi-agent approach — letting parallel reasoners explore different solution paths before synthesizing — is architecturally interesting and clearly producing results.
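xAI hasn’t published the internals of Grok 4 Heavy, so take this as a minimal sketch of the general pattern the description implies: several independent reasoning passes over the same problem, run in parallel, followed by a simple convergence step. Everything here — the `reasoning_agent` stand-in, the hardcoded candidate answers, majority vote as the synthesis strategy — is a hypothetical illustration, not xAI’s actual method.

```python
import concurrent.futures
from collections import Counter

def reasoning_agent(problem: str, seed: int) -> str:
    # Stand-in for one independent reasoning pass. A real system would
    # sample a full chain of thought from the model with different
    # randomness per agent; here each agent just returns a canned answer
    # so the convergence logic is visible.
    candidates = ["42", "42", "41"]  # hypothetical divergent answers
    return candidates[seed % len(candidates)]

def solve_with_agents(problem: str, n_agents: int = 8) -> str:
    # Run all agents on the same problem in parallel, then converge by
    # majority vote — the simplest possible synthesis step. A frontier
    # system would likely use a learned aggregator or a judge model.
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_agents) as pool:
        answers = list(pool.map(lambda s: reasoning_agent(problem, s),
                                range(n_agents)))
    return Counter(answers).most_common(1)[0][0]

print(solve_with_agents("some hard olympiad problem"))  # prints "42"
```

The interesting design question is the convergence step: majority vote only helps when agents genuinely explore different solution paths, which is presumably why the independence of the parallel reasoners matters so much in xAI’s framing.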
The Product Side: What Are We Doing #
Then there’s Ani.
Ani is an anime-style AI companion character, 3D-animated, with what xAI describes as “adaptive behavior.” She launched behind the SuperGrok subscription ($30/month). She’s flirtatious. She’s designed to form emotional bonds with users.
Rolling Stone reported on the pornographic content concerns almost immediately. The character apparently has limited guardrails around sexual content, and users discovered they could push interactions into explicitly sexual territory without much resistance.
Here’s where I struggle. I don’t think AI companions are inherently wrong. People form parasocial relationships with fictional characters all the time; that’s been true since novels existed. But there’s a difference between a user choosing to engage with a fictional character and a company designing a character specifically optimized for emotional attachment behind a paywall.
The timing makes it worse. Stanford researchers have been flagging the risks of AI companions for young users — the emotional dependency patterns, the blurring of real and artificial relationships. Character.AI is dealing with a lawsuit over harms to minors. Replika faced an FTC complaint. The EU has been looking at manipulative chatbot design.
This isn’t an unknown risk. It’s an actively litigated one.
The Dissonance #
What bothers me isn’t that xAI made a companion product. It’s that they shipped frontier reasoning — the kind of work that could genuinely advance scientific discovery — and an anime flirtbot in the same month, seemingly without noticing the tonal whiplash.
Grok 4 Heavy represents real engineering. Multi-agent reasoning at that scale is hard. The results on mathematical olympiad problems suggest capabilities that could matter for automated theorem proving, scientific research, and complex decision-making. That’s meaningful work.
Ani represents… revenue from lonely people? SuperGrok subscriptions? I genuinely don’t know what the product thesis is beyond “people will pay for this.”
And sure, they probably will. But “people will pay for it” isn’t a product philosophy; it’s an observation about human vulnerability.
The $30/month paywall is an interesting detail. It doesn’t function as an age gate — teenagers have credit cards or can use their parents’ — but it does tell you who xAI considers the customer. This isn’t a free research demo. It’s a monetized emotional product.
The Broader Pattern #
xAI isn’t alone in this. The entire AI industry is running two tracks simultaneously: frontier research that pushes what’s possible and consumer products that exploit what’s profitable.
OpenAI tunes ChatGPT and its GPT-4o default for user-engagement metrics. Google ships Gemini while also optimizing for ad-adjacent conversational experiences. Meta builds Llama and then integrates AI personalities into Instagram DMs.
But xAI’s version feels more naked about it. There’s no pretense that Ani serves an educational or productivity purpose. She’s a companion. She’s anime. She flirts.
That’s the product.
I keep coming back to a question that doesn’t have a clean answer: does building the best reasoning model in the world give you any obligation to be thoughtful about what else you ship?
Technically, no. Ethically — I think it makes the contrast harder to ignore.
What I’d Want to See #
I don’t have a neat resolution. I’m not calling for Ani to be shut down; adults can make their own choices about AI companions.
But I’d want to see real guardrails around minor access (something stronger than a credit card check), transparent reporting on how emotional attachment patterns develop in users, and some acknowledgment from xAI that these two product lines have fundamentally different risk profiles.
The technical work on Grok 4 Heavy deserves serious attention. The multi-agent architecture is worth studying. The benchmark results are legitimate.
The companion product deserves serious scrutiny. Not because AI companions are evil, but because shipping one without apparent safety infrastructure — while other companies face lawsuits for the same category of product — suggests either recklessness or indifference.
Neither is a great look for a company that wants to be taken seriously as an AI research lab.