Voice-first data: the missing input layer for enterprise AI LLMs & agents

The National Retail Foundation

Original article posted on the National Retail Foundation Europe website.

For every CTO and innovation officer building an AI strategy, the bottleneck isn't computing power or models. It's data. Specifically, the kind of data that language models actually need.

Forward-thinking retail enterprises are betting on AI transforming how we operate. The investments are in place. Data clouds are running. Large Language Model (LLM) strategies are being funded, staffed and piloted. And yet, something isn't working as well as the pitch decks promised. AI is only as good as its data. It's a fundamental truth of computing: garbage in, garbage out.

The problem isn't the models. It's what we're feeding them.

The data enterprise LLMs need

As language models, LLMs thrive on natural, contextual and varied text. They're built to parse “I wanted to love this jacket but the colour looked nothing like the photo and now I have to deal with returning it”, and extract multiple signals simultaneously: product issue, misleading photo, returns friction, emotional disappointment. That single sentence contains more actionable insight than a thousand rows of transaction data.

However, most enterprise AI initiatives continue to rely on structured, tabular data such as transaction logs, clickstream coordinates and spreadsheets. These mathematical data points work well for dashboards and BI tools, the infrastructure built over the past decade. Language models can ingest this data, but it's not what makes them powerful.

The enterprises pulling ahead won't be the ones with the most data. They'll be the ones with the right kind of data. Rich, contextual, unstructured text at scale.

So where does that come from?

The usual sources fall short

Support tickets are reactive. They capture feedback only after something has gone badly enough for customers to wait on hold or write an email. The result is a skewed sample, dominated by the angriest fraction of the customer base.

Reviews are sparse and publicly performative. People write them to warn others or to vent. The happy customer who had a small suggestion? They don't bother. And the data belongs to the platform, not to the actual business.

Surveys have abysmal completion rates and yield untrustworthy responses with no context. Gen Z and Gen Alpha are never going to fill them out, because they avoid typing. They send voice messages to friends and family, and there's no point fighting it. The people who do respond are either very happy or very unhappy. The messy middle, where most customers actually live, stays silent. The busy professionals that brands most want to reach never have the time to respond to a five-page questionnaire or join a focus group.

Focus groups are expensive, artificial, and slow. By the time insights reach the product team, the moment has passed. They have a place in targeted market research, a necessary step in expensive product design, but not in day-to-day operations.

Brands spend billions trying to understand their customers. And still, there's no easy way for a customer to just tell a brand something. Social media is public. Phone lines are expensive. Email is slow. Chat is bot-driven. There's a wall. Most tools just add more layers.

Voice breaks it down

What if customers could simply speak?

Not call a support line. Not fill out a form. Not download an app. Just tap a surface and talk for 30 seconds, the way they'd tell a friend about an experience.

Voicebox has pioneered this capability by working with a growing list of enterprise retail customers. A customer taps an NFC tag, scans a QR code, or clicks a short URL and speaks naturally. No friction. No account creation. No forms. They’re done in under a minute, with a direct voice line between themselves and the brand.

What emerges is unlike any other data source: real-time, authentic and unfiltered voice. In just 30 seconds, customers share 100+ words on a product, packaging, pricing and service, creating high-volume, richly contextual first-party data that has never before been captured.

And because it's natural language, it's exactly what LLMs need.

From unstructured to structured, in real time

This isn't a capability that can be built in-house. Capturing voice is one thing. Turning it into structured, LLM-ready data with metadata, intent classification, and sentiment signals in real time, with cutting-edge accuracy, is another.

Voicebox handles the full pipeline, from frictionless capture through speech-to-text, structuring, and automated insights, to integration with the systems that teams already use. It is hyper-location-aware: the platform distinguishes a scan on “Aisle 7” from one on “Aisle 5 North-end”, tying every voice note to exactly where it was captured. Raw voice becomes tagged, queryable data, ready to feed enterprise models alongside transaction history, clickstream data and the rest of the existing data warehouse.
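To make "tagged, queryable data" concrete, here is a minimal Python sketch of what a structured voice-note record could look like once a transcript exists. Everything in it is an assumption made for illustration: the VoiceNote fields, the structure_note helper and the keyword rules are hypothetical stand-ins, not Voicebox's actual schema or classifier, which the article does not describe.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

# Hypothetical keyword rules standing in for a real intent classifier.
INTENT_KEYWORDS = {
    "pricing": ("price", "expensive", "cheap"),
    "product_issue": ("colour", "broken", "faulty"),
    "returns_friction": ("return", "refund"),
}


@dataclass
class VoiceNote:
    """Illustrative record; field names are assumptions, not a real schema."""
    transcript: str
    location_tag: str   # label of the scanned tag, such as "Aisle 7"
    captured_at: str    # ISO 8601 timestamp
    word_count: int = 0
    intents: List[str] = field(default_factory=list)


def structure_note(transcript: str, location_tag: str,
                   captured_at: Optional[datetime] = None) -> VoiceNote:
    """Turn a raw transcript into a tagged record using simple keyword matching."""
    when = captured_at or datetime.now(timezone.utc)
    text = transcript.lower()
    intents = sorted(
        name for name, keywords in INTENT_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    )
    return VoiceNote(
        transcript=transcript,
        location_tag=location_tag,
        captured_at=when.isoformat(),
        word_count=len(transcript.split()),
        intents=intents,
    )


note = structure_note(
    "I wanted to love this jacket but the colour looked nothing like the "
    "photo and now I have to deal with returning it",
    location_tag="Aisle 7",
)
print(json.dumps(asdict(note), indent=2))
```

A record shaped like this can sit in a warehouse table next to transaction rows and be filtered by location or intent. In practice the keyword matching would be replaced by a proper classification model; it is used here only to keep the sketch self-contained.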

Through a partnership with Snowflake, voice data flows directly into the AI Data Cloud. There are no new pipelines to build. No data movement headaches. Teams can run Cortex workloads on customer voice alongside every other dataset, using the infrastructure in which companies have already invested.

"Voice is the missing data layer in retail," says Rosemary DeAragon, Global Head of Retail & Consumer, AI Data Cloud at Snowflake. "Transactions show what customers buy, behaviours show how they shop, but voice reveals why they make decisions. Voicebox fills a crucial intelligence gap."

One input, many consumers

The interesting thing about voice data is who uses it. Marketing hears sentiment and campaign reactions. Product teams hear feature requests and friction points. CX hears issues before they escalate to support tickets. Operations hears what's breaking in real time.

For data leaders and innovation officers, this is the leverage point. Rather than adding another siloed tool, a single, rich input layer serves every department, allowing existing AI investments to operate across the organisation and deliver greater impact.

Voice data transforms into structured insights, flows into existing systems such as Snowflake, Salesforce, HubSpot, MS Teams, and Slack, and triggers workflows. It's not just measurement. It's an operational signal.
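One way to picture a single input serving several departments is a small routing table from classified intents to teams. The Python sketch below is purely illustrative: the intent labels, team names and the "triage" fallback are assumptions, not part of any Voicebox, Slack or Salesforce API.

```python
from typing import Iterable, List

# Hypothetical mapping from a classified intent to the team that should see it.
ROUTES = {
    "campaign_reaction": "marketing",
    "feature_request": "product",
    "service_issue": "cx",
    "store_fault": "operations",
}


def route(intents: Iterable[str]) -> List[str]:
    """Return the teams to notify, in first-seen order and without duplicates."""
    teams: List[str] = []
    for intent in intents:
        # Unknown intents fall back to a catch-all queue (an assumption).
        team = ROUTES.get(intent, "triage")
        if team not in teams:
            teams.append(team)
    return teams


print(route(["feature_request", "store_fault", "feature_request"]))
```

In a real deployment each team name would map to a channel, queue or CRM object, and the notification itself would go through that system's own integration.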

Enterprise-grade from day one

Voicebox was architected from first principles for enterprise deployment, with SOC 2, HIPAA and GDPR compliance built in. The platform uses zero third-party tracking, with no Google Analytics or Meta pixels, and runs multiple AI models on internal infrastructure supported by in-house audit logging and data pipelines. For industries with strict data residency requirements, BYOC and on-premise deployment options are available, including regionally compliant OEM servers through a partnership with Lenovo.

The technical barriers that would have made this impossible five years ago are gone.

A simple pilot for any roadmap

For CTOs, CDOs and innovation officers assessing transformation initiatives, voice-first data collection offers a low-friction pilot, with clear signals emerging from a single location within two months.

In weeks, it becomes clear whether customers will speak, with participation rates and data quality already demonstrably high across current deployments. The data quickly proves richer than existing sources, providing concrete proof points for the business case.

The enterprises winning with AI won't be the ones with the most sophisticated models. They'll be the ones who solve the input problem and figure out how to capture what customers actually think, in their own words, at scale.

The voice line is open. The question is who builds on it.

Voicebox is a voice intelligence platform backed by Mark Cuban, built by retail technology executives who scaled two retail startups through IPO, and recently selected for the Snowflake Startup Accelerator. The platform has been featured in NRF's Innovators Showcase in Singapore, Paris, and New York. Learn more at https://voicebox.ai or email the CEO at karan.gupta@vbx.ai.
