Every CX Platform Went All-In on AI Agents — Here’s What They Still Can’t Do

Enterprise Connect 2026 just wrapped, and the theme was impossible to miss: agentic AI is the future of customer experience.

Salesforce launched Agentforce Contact Center. Dialpad introduced a full AI agent lifecycle, from skill mining to governance. 8x8 revealed JourneyIQ. Zoom unveiled Virtual Agent 3.0. UJET announced Agentic Experience Orchestration. Even Amazon Connect is redefining what “resolution” means.

Everybody’s building AI that talks to customers. And they’re getting incredibly good at it.

But every single one of these announcements shares the same blind spot.

None of them can see.

The Most Expensive Words in Customer Support

Here’s a scenario every support team knows:

A customer calls in about a broken product, a confusing setup screen, or an error they can’t describe. The AI agent, armed with perfect NLP, enterprise knowledge bases, and conversational memory, asks the customer to describe what they see.

And the customer says: “There’s a thing… it’s like a red light? Or maybe orange. And it’s blinking. I think.”

That’s the moment AI agents fall apart. Not because the technology is bad, but because words are the wrong tool for visual problems.

McKinsey estimates that 70% of customer service interactions involve some element that would benefit from visual context. Hardware issues. Installation problems. Physical damage. Configuration screens that look different on every device.

Every one of Enterprise Connect’s keynote-worthy AI agents handles these the same way: it asks the customer to describe what they see, in words, to a machine that has never seen anything.

Why Text-and-Voice AI Hits a Ceiling

The CX industry’s AI investments are massive. Gartner predicts that by 2027, AI agents will handle 40% of customer service interactions end-to-end. The platforms announced at Enterprise Connect are clearly building toward that future.

But there’s a ceiling, and it’s not about intelligence. It’s about input.

Today’s AI agents process text and voice. That’s it. They can parse sentiment, route intelligently, summarize conversations, even take autonomous actions. What they can’t do is look at a blinking router, a cracked screen, a confusing settings menu, or a damaged package and understand what’s actually happening.

And this is where the metrics get interesting.

The problems that require visual context are almost always the expensive ones. They’re the ones that generate the longest call times, the most transfers, the most repeat contacts. They’re the ones where first-contact resolution rates crater and customer satisfaction plummets.

AI agents are solving the easy problems faster. That’s genuinely valuable. But the hard, expensive, loyalty-destroying problems? They’re still stuck in the “can you describe what you see?” loop.

The Zoom Virtual Agent Clue

To its credit, Zoom seems to sense this gap. Its Virtual Agent 3.0, announced in early March, includes what the company calls “multimodal LLM intelligence”: the ability for the virtual agent to interpret customer-submitted documents, images, and structured identifiers.

That’s a step in the right direction. But there’s a critical difference between a customer uploading a photo after the fact and an agent seeing the problem in real time.

Upload-based approaches still put the burden on the customer. They have to figure out what to photograph, take the picture, upload it, wait for processing. For physical products, installations, or anything that requires movement and context, a static image often isn’t enough.

Real-time visual context (actually seeing what the customer sees, as they see it) is a fundamentally different capability. It’s the difference between reading a police report and being at the scene.

The $13 Billion Blind Spot

The contact center AI market is projected to reach $13 billion by 2028. That’s a lot of money flowing into technology that can listen, speak, and reason, but not see.

Consider what visual context unlocks:

  • Faster diagnosis. Instead of five minutes of “which light is blinking?”, an agent sees the problem in three seconds.
  • Higher first-contact resolution. Visual confirmation eliminates the guesswork that causes repeat contacts.
  • Reduced escalations. Tier 1 agents (or AI agents) can resolve visual issues that previously required on-site visits or senior technical support.
  • Better training data. Visual interactions create richer datasets that actually teach AI what real-world problems look like.

None of these outcomes are achievable through text and voice alone, no matter how sophisticated the NLP.

Why This Matters Now

Enterprise Connect 2026’s big message was that CX leaders need to stop chasing AI capabilities and start demanding outcomes. Amazon Connect literally told attendees: “Deflection is the wrong goal. Relationships are the goal.”

If relationships are the goal, then understanding the customer’s actual situation matters more than efficiently routing their ticket.

And for a huge category of support interactions (hardware, physical products, installations, visual troubleshooting, damage assessment), understanding the situation requires seeing it.

The CX industry is building incredibly sophisticated AI agents that are, for all practical purposes, blind. They can carry on a brilliant conversation about a problem they’ve never seen.

That’s an expensive limitation to ignore.

The Visual Layer Is Missing

Every major CX platform now has an AI strategy. Every one includes conversational AI, workflow automation, and agent assist capabilities. The stacks are deep and getting deeper.

But look at the architecture and you’ll notice what’s absent: real-time visual communication as a native capability. Not video conferencing bolted onto a help desk. Not “upload a screenshot” thrown into a chat window. Real-time, no-app, frictionless visual context that feeds directly into the AI resolution engine.

This isn’t a feature gap. It’s an architectural blind spot.

The platforms that figure this out first, the ones that add eyes to their AI agents, will have a decisive advantage in first-contact resolution rates, customer satisfaction, and the operational metrics that actually matter.

What Comes Next

Enterprise Connect 2026 proved that the CX industry is serious about AI. The investments are real. The capabilities are impressive. The shift from experimentation to accountability is genuine.

But accountability means measuring outcomes. And when your AI agent can’t see the broken product, the confusing interface, or the physical damage that prompted the call, your outcomes will always have a ceiling.

The conversation about visual customer support is just getting started. And for teams dealing with physical products, hardware, field service, or anything that requires seeing to understand, it might be the most important CX conversation of 2026.

What support problems does your team solve that words alone can’t describe? The answer might reveal your biggest opportunity for resolution improvement.

If you’re looking for a simpler way to handle visual support, Viewabo lets your team see exactly what your customers see, without any app installs.