
5 Mar 2026

Scaling AI: Simplicity Over Better Models

Last month I sat in a room with a dozen engineering leads from companies running AI in production. Not demos, not prototypes—actual revenue-generating workloads. I asked what their biggest bottleneck was.

Not one person said “model quality.”

Every answer was some version of: our data is a mess, our infrastructure can’t keep up, or we’re burning GPU budget on workflows that don’t need it. A CockroachDB survey of over 1,000 tech leaders backs this up: AI workloads scale faster than the systems underneath them can adapt. And yet the industry conversation stays fixated on the next frontier release. It’s maddening, honestly.

31 Dec 2025

2025's Radical Frontend Shift: Rise of Agent Runners

A year ago, the AI-in-frontend conversation went roughly like this: Copilot writes your React components, ChatGPT scaffolds your Next.js pages, Cursor auto-completes your CSS. The AI was a faster keyboard. That was the whole pitch.

That’s not where 2025 ended up.

What actually happened was stranger — and I think more consequential. AI agents started living inside frontend architectures. Not as code generators that hand off to a human for the last mile, but as autonomous actors that monitor production, diagnose breakage, and ship fixes without waiting for anyone to approve a pull request.

13 Nov 2025

Supercharging ML and AI Dev Experience at Netflix

Every ML engineer I know has the same complaint. Notebooks feel great for exploration but terrible for production. Production pipelines feel great for reliability but terrible for iteration. Pick one, then spend the rest of your week fighting whichever you didn’t pick.

Netflix just shipped something that might actually fix this. Or at least make the fight less painful.

Spin: The Missing Piece in Metaflow #

Metaflow 2.19 introduced Spin, and honestly, it’s the kind of feature that makes you wonder why nobody built it sooner. The idea is dead simple: take a single @step in your production pipeline, pull it out, run it locally with full state from the parent step. Notebook-style iteration, but inside your actual production DAG.
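The underlying idea is easy to sketch in plain Python — this is a toy illustration of the concept, not Metaflow’s actual API (the function and artifact names here are made up): a production run persists each step’s artifacts, and a “spin” rerun loads the parent step’s artifacts and executes just the one step function against that real state.

```python
import os
import pickle
import tempfile

# Toy stand-in for step artifacts. In Metaflow, each step's outputs are
# persisted; Spin loads them so a single step can rerun locally with real
# upstream state. Everything below is illustrative, not Metaflow's API.

def save_artifacts(run_dir, step, artifacts):
    with open(os.path.join(run_dir, f"{step}.pkl"), "wb") as f:
        pickle.dump(artifacts, f)

def load_artifacts(run_dir, step):
    with open(os.path.join(run_dir, f"{step}.pkl"), "rb") as f:
        return pickle.load(f)

def featurize(state):
    # The "step" we want to iterate on, in isolation.
    state["features"] = [x * 2 for x in state["rows"]]
    return state

# --- the production run persisted this upstream state ---
run_dir = tempfile.mkdtemp()
save_artifacts(run_dir, "ingest", {"rows": [1, 2, 3]})

# --- "spin": rerun just one step locally against that state ---
state = load_artifacts(run_dir, "ingest")
print(featurize(state)["features"])  # [2, 4, 6]
```

The payoff is the iteration loop: you edit `featurize` and rerun it in seconds against real parent-step state, instead of re-executing the whole DAG.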

8 Oct 2025

Right-Sizing AI for the Edge: Power and Security Focus

There’s a default assumption in the AI industry that bigger wins. More parameters, larger context windows, heavier compute. For many tasks, that holds. Complex reasoning, multi-step planning, fine-grained code generation: those benefit from frontier-scale models.

But a huge chunk of real-world inference doesn’t need any of that.

Classifying a support ticket? Detecting anomalous sensor readings? Running intent recognition on a phone? Shipping 405 billion parameters to answer “is this a cat?” isn’t engineering. It’s waste.

25 Jul 2025

Grok 4 Heavy & Anime Companions

xAI had a busy July. On the 9th, they released Grok 4 and Grok 4 Heavy — reasoning models that set new benchmarks across the board. A couple weeks later, they launched anime-style AI companion characters behind their SuperGrok paywall.

Same company. Same month. And I can’t stop thinking about what that says.

The Technical Side: Genuinely Impressive #

Let’s start with what works. Grok 4 scored 25.4% on Humanity’s Last Exam, beating Gemini 2.5 Pro (21.6%) and o3 (21%). Those aren’t rounding errors — they’re meaningful gaps on a test specifically designed to stump frontier models.

16 Jul 2025

Collaborative Retrieval for Conversational RecSys

Recommender systems have a split personality problem.

On one side, you’ve got LLMs that can hold a conversation — parse nuance, understand when someone says “something like Inception but weirder” and actually get it. On the other, you’ve got collaborative filtering: decades of behavioral data showing that users who liked X also liked Y. Both are powerful. Neither talks to the other.

CRAG fixes that.

It’s a joint effort from University of Virginia’s VAST LAB, Cornell, and Netflix (published at WWW 2025). CRAG stands for Collaborative Retrieval Augmented Generation — the first conversational recommender system that actually combines LLM context understanding with collaborative filtering retrieval. Not in a hand-wavy “we use both” sense; in a structured, two-step mechanism that pulls collaborative filtering knowledge into the LLM’s prompt at inference time.
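In spirit, the two steps look something like this — a minimal sketch with toy co-occurrence data and a stubbed prompt (the item names, scores, and helper functions are invented for illustration; CRAG’s actual retrieval and refinement are more involved):

```python
# Step 1 data: toy item-item collaborative signals ("users who liked X
# also liked Y"). Real systems derive these from behavioral logs; the
# titles and scores here are made up.
CF_NEIGHBORS = {
    "Inception": [("Primer", 0.81), ("Coherence", 0.74), ("Tenet", 0.69)],
    "Alien":     [("The Thing", 0.88), ("Annihilation", 0.72)],
}

def retrieve_cf_context(mentioned_items, k=2):
    """Step 1: pull the top-k collaborative neighbors for items the user
    mentioned in the conversation."""
    context = {}
    for item in mentioned_items:
        neighbors = sorted(CF_NEIGHBORS.get(item, []), key=lambda p: -p[1])
        context[item] = [title for title, _ in neighbors[:k]]
    return context

def build_prompt(user_turn, cf_context):
    """Step 2: inject the retrieved CF knowledge into the LLM's prompt."""
    lines = [
        f"Users who liked {item} also liked: {', '.join(neigh)}"
        for item, neigh in cf_context.items()
    ]
    return (
        "Collaborative signals:\n" + "\n".join(lines)
        + f"\n\nUser: {user_turn}\nRecommend one title and explain briefly."
    )

ctx = retrieve_cf_context(["Inception"])
prompt = build_prompt("something like Inception but weirder", ctx)
print(prompt)
```

The point of the structure: the LLM still does the conversational understanding, but its prompt now carries behavioral evidence it could never infer from text alone.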

8 Apr 2025

AI Software Delivery: Collaborative Intent Articulation

Something changed in the past six months. I don’t think we’ve fully processed it yet.

I’ve been writing code professionally for eighteen years. For all of those years, one thing stayed constant: the code was the source of truth. Requirements documents got stale. Design specs drifted. Jira tickets turned into archaeological artifacts. But the code? The code was always right — because the code was what actually ran.

That assumption is breaking down.

26 Jan 2025

SEMR Report 2025: AI Moves from Experimental to Essential

Jellyfish just published their 2025 State of Engineering Management Report, and one number jumped off the page: 90% of engineering teams now use AI coding tools. A year ago, that figure was 61%. Only 3% of respondents said they have no plans to adopt.

The adoption question is settled. The measurement question—wide open.

The Adoption Curve Collapsed #

I’ve managed engineering teams through several technology shifts—containerization at TaskRabbit, cloud migration at earlier companies—and none moved this fast. From “should we try Copilot?” to “everyone’s using something” in twelve months. GitHub Copilot leads at 42% adoption, but here’s the part that surprised me: 48% of teams report using two or more AI coding tools simultaneously.

3 Jan 2025

Ethical Standards and Trust in AI-driven CX

A friend of mine—let’s call her Clara—called her bank last week and spent twelve minutes explaining a billing issue, growing more confused by the minute. The agent’s responses were patient. Thorough. Technically accurate. Also not human.

She found out only after asking directly. The disclosure felt less like a notification and more like a caught-out confession.

She switched banks.

Extreme? Maybe on the surface. But the reaction had nothing to do with the chatbot’s quality—which was apparently fine. It was about the deception: having an emotional conversation with something that wore a human mask, without warning.

8 Dec 2024

Vishing Attacks Increase by 442% via AI Cloning

Three seconds. That’s all a modern voice cloning model needs—just three seconds of your voice—to produce a replica convincing enough to fool your CFO, your IT helpdesk, or your mom.

I’ve been tracking this space since my DreamFlare days, when we were building entertainment products with generative AI. The speed at which offensive tooling has matured? Genuinely unsettling.

The numbers back it up. CrowdStrike’s threat intelligence team documented a 442% increase in voice phishing (vishing) attacks between the first and second halves of 2024. Not a typo. Four hundred and forty-two percent.

7 Nov 2024

OSI Releases Version 1.0 of Open Source AI Definition

Back in September I wrote about the headache of defining “open source” for AI models. The Open Source Initiative has now published their answer—OSAID v1.0, released October 28 at the All Things Open conference in Raleigh. I’ve spent the last ten days reading the definition, the endorsements, the criticism, and the reaction from companies whose models don’t qualify.

My verdict? It’s a necessary compromise that will make some people furious and make everyone’s procurement conversations slightly less painful.

24 Sep 2024

Strategic Restructuring for AI-Centric Operations

Here’s a stat that should keep every CTO up at night: 92% of enterprises invest in AI, but only 1% have achieved scaled impact across their operations. Gartner estimates that 40-85% of AI projects never make it from proof of concept to production.

Those numbers aren’t a technology failure. They’re an organizational failure. And I’ve seen it firsthand.

The PoC Graveyard #

Every company I’ve worked with in the last year has an AI proof of concept. Most have several. They demo well. Leadership gets excited. And then nothing happens.

12 Sep 2024

Defining Open Source AI: Solving a Million Headaches

Last month I burned two days evaluating “open source” models for a production use case at DreamFlare. By the end I was more confused than when I started — not about the models themselves, but about what “open source” even means anymore.

Traditional open source is straightforward: you get the source code, you can modify it, you can redistribute it. That definition has been settled for decades. But AI models aren’t source code. They’re trained artifacts; the “source” is really the training data, the training code, the hyperparameters, and the weights. Calling a model “open source” because you released the weights is like calling a compiled binary “open source” because you published the .exe.

19 Jun 2024

AI-Driven Attrition in Data-Heavy Support Roles

The AI job displacement everyone warned about? It’s not happening the way people imagined. No dramatic announcements. No factory floors going dark. No headlines about a hundred thousand people being replaced by a single model.

Instead, someone on the customer support team quits and the position doesn’t get backfilled. A market research analyst retires and the team absorbs the work using GPT-4. A data entry contractor’s engagement ends and nobody renews it. The org chart shrinks by one, then two, then five — and nobody calls it a layoff because technically, nobody was fired.

31 Jul 2023

Llama 2: Why Local Inference in C Matters for Node Devs

Two weeks ago Meta released Llama 2 with a commercial license. That alone was significant — the first truly open large language model you could legally ship in a product. But the thing that got me out of my chair was what Andrej Karpathy did with it eight days later.

He wrote Llama 2 inference in ~500 lines of pure C. No libraries. No frameworks. No PyTorch, no CUDA, no nothing. Just C and math. The repo is called llama2.c, and it runs the 7B parameter model at about 18 tokens per second on an M1 MacBook Air.

27 May 2023

Generative AI: Cognitive Industrial Revolution

The Industrial Revolution mechanized physical labor. Steam engines replaced muscle. Factories replaced workshops. The economic transformation took decades, displaced millions, and ultimately created more wealth and more jobs than the systems it replaced.

I think we’re at the start of something equivalent for cognitive labor. And unlike the original, this one is moving on a timeline measured in years — not generations.

The Numbers #

McKinsey’s latest analysis projects that generative AI could add $2.6 to $4.4 trillion in annual value to the global economy. To put that in perspective, the UK’s entire GDP is roughly $3.1 trillion. We’re talking about a technology whose economic impact — by McKinsey’s estimate — is comparable to adding another G7 economy to the world.

11 Mar 2023

Conversational AI Market Projections through 2030

I’ve spent the past two weeks drowning in market research reports on conversational AI. Here’s my takeaway before we even look at the numbers: they’re guessing. Sure, it’s sophisticated guessing—proprietary survey data, impressive methodologies, models with Greek letters. But still guessing. ChatGPT dropped three and a half months ago and rearranged the entire landscape. Any projection built before November 2022? It’s working from assumptions that no longer hold.

That caveat out of the way, let me show you what the guesses look like.

1 Jan 2023

LLMs as Forensic Architects for Architecture Discovery

It’s New Year’s Day and I’m thinking about legacy code. Specifically, a conversation I had last week with a friend who just inherited a monolith. Two million lines of Java. The original architects left years ago. The documentation—such as it was—describes a system that no longer exists. The actual architecture is embedded in the code itself, visible only to someone willing to spend weeks reading it.

His question was simple: “Can I just paste chunks of this into ChatGPT and ask it what the architecture is?”

7 Dec 2022

ChatGPT: The Natural-Language Rupture of 2022

One million users in five days.

I keep coming back to that number. Instagram took two and a half months to hit a million. Facebook needed ten. Netflix took three and a half years. ChatGPT did it in five days. And everyone I’ve talked to this week — engineers, product managers, even my dentist — has an opinion about it.

I’ve been working in and around conversational AI at Google for over a year now. I’ve seen impressive demos. I’ve built prototypes with language models. None of that prepared me for the visceral reaction people are having to ChatGPT. This isn’t excitement about a new product. It’s something closer to a rupture.

9 Nov 2022

GitHub Copilot Metrics: Coding 55% Faster

A 55% improvement in task completion speed. That’s the headline from GitHub’s recent study on Copilot, conducted with Microsoft’s Office of the Chief Economist. And honestly? My first reaction was skepticism.

Not because I doubt AI-assisted coding works — I’ve been using Copilot since it went GA in June. But productivity numbers that clean always make me want to read the fine print.

So I read the fine print.

The experiment #

GitHub recruited 95 professional developers and split them into two groups. One group used Copilot; the other didn’t. The task: implement an HTTP server in JavaScript. Simple enough to complete in a few hours, complex enough to require real engineering decisions.

11 Oct 2022

AI in Commerce: Order Intelligence and Payment Security

When I worked at TaskRabbit, I sat next to the payments team during a fraud spike that cost us six figures in a single week. The pattern was clever—synthetic identities (fabricated personal details cobbled together from real data fragments) booking high-value tasks, paying with stolen cards, then disputing the charges. Our rule-based detection caught maybe 30% of the fraudulent transactions. The other 70% sailed through because the patterns didn’t match any rule we’d written.

6 Sep 2022

DALL-E 2: Moving Image Synthesis into the API

Last week, OpenAI launched outpainting for DALL-E 2. You take an existing image, and the model extends it beyond its original borders — generating new content that matches the style, lighting, and subject matter of what’s already there. The demos look magical. A Vermeer painting expanded to reveal the rest of the room. A photograph extended to show what was just outside the frame.

It’s impressive. But I keep thinking about something else entirely.

8 Aug 2022

Stable Diffusion: The Open Source AI Explosion

Something big is happening in generative AI, and it has nothing to do with who makes the best images.

Stable Diffusion—a text-to-image model built by Stability AI, RunwayML, CompVis at LMU Munich, EleutherAI, and LAION—is preparing to release its model weights to the public. Not behind an API. Not through a Discord bot. The actual model, downloadable, runnable on your own hardware. About 10,000 testers are in the beta, with a broader research release to roughly 1,000 researchers expected any day now.

12 Jul 2022

Midjourney and the Rise of Generative Media

I woke up this morning and generated a Renaissance painting of a cat in a spacesuit. It took about sixty seconds. The cat looked contemplative.

This is where we are now.

Midjourney opened its beta to the public today, and if you haven’t tried it yet, the onboarding experience alone tells you something about where generative AI is headed. You don’t download an app. You don’t sign up for a waitlist (well, not anymore). You join a Discord server; you type a text prompt in a chat channel; you wait. And then an image appears that didn’t exist thirty seconds ago.

9 May 2022

Talk: Bringing Conversational AI to Search and Maps

In two days, I’ll be at Shoreline Amphitheatre in Mountain View for Google I/O 2022. I’m speaking, which still feels slightly surreal to type. The event runs May 11-12, hybrid format with virtual sessions, and the announcements this year are… significant.

I can’t share everything yet. But I want to give a preview of what I’ll be covering — specifically how conversational AI is reshaping the way people interact with Search and Maps. Not as some distant future, but as stuff that’s rolling out now.

30 Dec 2021

PadCare Labs: AI for Social Impact Case Study

I spend most of my time thinking about developer tools, APIs, and cloud infrastructure. Interesting problems, sure. Life-changing? Not exactly. Every once in a while, though, I come across a use of technology that reminds me why I got into this field in the first place.

PadCare Labs is an Indian startup that processes menstrual hygiene waste. Not a glamorous pitch. Not the kind of thing that gets covered at major tech conferences. But the engineering behind it is solid — and the social impact is real in ways that most “AI for good” marketing copy only pretends to be.

31 Jul 2020

Moving Infrastructure Inference to Hardware Accelerators

Last quarter we moved a couple of our ML inference workloads off general-purpose CPUs and onto NVIDIA T4 GPUs. The performance gains were immediate and dramatic. The operational complexity that came with them was… also immediate.

At TaskRabbit, we use ML models for ranking and recommendation—matching Taskers to jobs, surfacing relevant categories, scoring urgency. These aren’t massive models by research standards, but they run on every request. Latency matters. Cost matters. And for a while, our CPU-based inference was both too slow and too expensive.