Talk: Bringing Conversational AI to Search and Maps
In two days, I’ll be at Shoreline Amphitheatre in Mountain View for Google I/O 2022. I’m speaking, which still feels slightly surreal to type. The event runs May 11–12 in a hybrid format with virtual sessions, and the announcements this year are… significant.
I can’t share everything yet. But I want to give a preview of what I’ll be covering — specifically how conversational AI is reshaping the way people interact with Search and Maps. Not as some distant future, but as stuff that’s rolling out now.
LaMDA 2: Conversations That Actually Go Somewhere #
If you followed I/O last year, you heard about LaMDA — our Language Model for Dialogue Applications. LaMDA 2 is the next iteration, and the capabilities are a meaningful step forward.
What makes LaMDA different from other large language models is its focus on dialogue. Not just generating text, but maintaining coherent multi-turn conversations across an enormous range of topics. Ask it about astronomy, then pivot to cooking, then ask a follow-up about the astronomy topic from three turns ago — it tracks. The conversational memory and topic-switching feel genuinely natural in ways that earlier models couldn’t manage.
We’re launching AI Test Kitchen alongside LaMDA 2 at I/O. It’s an app that lets people outside Google experiment with LaMDA’s capabilities in a controlled setting. This is deliberate: rather than shipping LaMDA directly into products and hoping for the best, we’re opening it up for feedback first. The responsible AI angle matters here, and AI Test Kitchen is how we’re operationalizing that.
I’ve been playing with LaMDA 2 internally for a few months now. What strikes me most? The quality of its “I don’t know” responses. Earlier conversational AI had this pathological confidence problem — it would answer everything, even when clearly making things up. LaMDA 2 is better (not perfect, but better) at expressing uncertainty. Honestly, that’s more important than being right all the time.
PaLM: 540 Billion Parameters and Chain-of-Thought #
PaLM is the other major AI announcement, and the scale is staggering. 540 billion parameters. For context, GPT-3 has 175 billion. PaLM isn’t just bigger; the architecture choices and training methodology produced genuine capability jumps in reasoning tasks.
The headline technique is chain-of-thought prompting. Instead of asking the model for an answer directly, you prompt it to show its reasoning steps. The improvement in accuracy on math, logic, and commonsense reasoning tasks is substantial — in some benchmarks, PaLM with chain-of-thought prompting outperforms fine-tuned models specifically trained for those tasks.
Why does this matter for Search and Maps? Because information retrieval is fundamentally a reasoning task. When someone asks “restaurants near me that are good for a first date and have outdoor seating,” that query requires understanding intent (date context), filtering (outdoor seating), and subjective judgment (good for a date). A model that can reason through those layers produces better results than one that pattern-matches keywords.
Multisearch: Photos Plus Text #
This is one of the features I’m most excited to demo. Multisearch lets you combine a photo with a text query in Google Search. Take a picture of a plant, add “care instructions” — and get results about that specific plant. Snap a photo of a dress, type “in blue” — and find that dress in different colors.
The “near me” variant connects this to local results. Photograph a dish at a restaurant, search “near me,” and find other restaurants in your area that serve something similar. Combining visual and text input this way feels like a genuinely new interaction paradigm rather than an incremental improvement.
Google Lens is powering a lot of this, and the usage numbers are remarkable: over 8 billion visual queries per month, nearly 3x year-over-year growth. People are using their cameras as search inputs at a scale that would have seemed implausible even two years ago.
The new scene exploration capability takes this further. Point your camera at a shelf of products — say, a grocery aisle — and Google Lens can identify and surface information about multiple items simultaneously. Pan across the shelf and it tracks, filters, and presents information about each product as it enters the frame. The engineering behind real-time multi-object identification, overlaid with search results, is genuinely impressive.
Immersive View in Maps: ML-Powered 3D Worlds #
Immersive View is the Maps announcement I think will have the longest-lasting impact. It fuses aerial imagery and street-level photography using machine learning to create 3D representations of neighborhoods you can explore.
But the clever part is the simulation layer on top. Immersive View can show you what a location looks like at different times of day, in different weather conditions, and with estimated traffic patterns. Planning a dinner reservation? Check what the neighborhood looks like at 8 PM. Thinking about a morning run? See the estimated foot traffic and weather.
The ML pipeline behind this — stitching together multiple imagery sources, generating consistent 3D geometry, then simulating dynamic conditions — is a multi-year engineering effort. What you see at I/O is the first public demonstration, with broader rollout coming later.
For developers building on Google Maps Platform, the implications are interesting. Immersive View creates a richer spatial context that could eventually surface in APIs. Imagine a real estate app that shows not just listing photos but an immersive view of the neighborhood at different times of day. Or a travel planning app that lets you virtually walk the streets of a destination before booking.
The Conversational Thread #
What ties all of these announcements together is a shift in how people interact with information. Search used to be a keyword box. Maps used to be a navigation tool. Both are becoming conversational, multimodal interfaces where you can ask questions — in text, voice, or visual input — and get answers that demonstrate understanding rather than just retrieval.
LaMDA and PaLM provide the language understanding and reasoning capabilities. Multisearch and scene exploration provide the multimodal input layer. Immersive View provides the spatial context layer. Together, they’re moving Google’s products from “tools that find information” to “systems that understand what you need.”
I’m biased, obviously. I work here. But having seen these technologies develop over the past several months, I think the gap between “search engine” and “knowledge assistant” is closing faster than most people realize.
What I’m Watching For at I/O #
Beyond my own talk, I’m paying attention to the developer tooling announcements. The AI capabilities I described above are impressive, but their real impact depends on how accessible they are to developers building on Google’s platforms. APIs, SDKs, documentation, pricing — that’s where vision meets reality.
I’m also curious about the developer community’s reaction to PaLM and chain-of-thought prompting. The technique has implications well beyond Google’s products; any developer working with large language models should pay attention to how structured prompting changes what’s possible.
If you’re attending virtually or in person, come find my session. I’ll be going deeper on the conversational commerce angle — how these AI capabilities translate into business value for enterprises building on Google’s messaging and search platforms. That’s where my day job intersects with the I/O announcements, and it’s the part I’m most looking forward to sharing.
See you at Shoreline.