Agents can now generate completely new datasets. How does that change things?
To explore this question, I used Parallel.ai’s FindAll tool to assemble a dataset of all parks within 100 miles of Seattle. Park-finding is a classic data federation problem: there are park finders at various state and local sites, but no single park finder that covers them all.
Here’s a walkthrough of that flow. The full process is at the end of the post.
Even when agents generate the data, the fundamentals of how humans evaluate data stay the same.
You still need to ask:
The first question, “What’s in there?”, is even more relevant when agents gather the data. If you’re pricing bananas and you accidentally prompt for data about bandanas, your whole strategy may (should?) fail. It seems like a trivial example, but I’ve seen the equivalent happen without any agents involved.
Likewise, “Do I trust it?” is most relevant when working with a new dataset, and most agent-generated datasets are, by definition, new.
What does change is the tempo. Agent-generated data will create pressure for flexible (and likely agentic) tools that let people work with data as soon as it’s generated. Research has shown time and again that people can understand data much faster when it’s organized and aggregated visually. In the parks data, charting parks by size let me see immediately that there were several duplicates. As I explored, I could ask questions using the Data Agent to learn more (“What parks are in ‘Other’?”).
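The duplicate check above was done visually, by charting. As a rough non-visual equivalent, here is a minimal Python sketch that groups records by size and flags any size shared by more than one row. The record layout (`name`, `acres`), the helper names, and the sample rows are all invented for illustration; they are not taken from the actual parks dataset or from any tool mentioned here.

```python
from collections import defaultdict

# Hypothetical records standing in for agent-gathered rows. Names and
# acreages are invented for illustration, not taken from the real dataset.
parks = [
    {"name": "Cedar Hollow Park", "acres": 120.0},
    {"name": "cedar hollow park ", "acres": 120.0},
    {"name": "Heron Point", "acres": 45.5},
    {"name": "Heron Point State Park", "acres": 45.5},
    {"name": "Maple Flats", "acres": 12.0},
]


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


def find_duplicate_candidates(rows):
    """Group rows by acreage and keep only sizes shared by more than one row."""
    by_size = defaultdict(list)
    for row in rows:
        by_size[row["acres"]].append(normalize(row["name"]))
    return {
        size: sorted(names)
        for size, names in by_size.items()
        if len(names) > 1
    }


candidates = find_duplicate_candidates(parks)
for size, names in sorted(candidates.items()):
    print(f"{size} acres: {names}")
```

Matching on size alone only surfaces candidates: two different parks can share an acreage, so a person (or a follow-up question to an agent) still has to decide which groups are true duplicates.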
This is where AI comes in. We’ve already seen how the ability to ask natural-language questions of data means more flexibility and less need for bespoke dashboards. AI is going to have to shoulder the load of bringing data into a visual, queryable format so that humans can do the sense-making we need to do.
I mean, we’ve known for decades that it takes too long to build dashboards (the primary way to make data visual), and that inhibits people’s ability to think with and use data. Soon, anything longer than a few minutes will be “too long.”
“Yes,” you may say, “but I’m not searching the web for my data. It still comes from traditional pipelines.”
True. But agentic workflows are not only for the web: agents are emerging to make it easier to use all that enterprise data. For example, Bobsled helps product companies “build agentic experiences on complex data.” Spice AI provides fast, federated access to enterprise data so that agents can use it.
Agents that can gather data on the fly hold the promise of unlocking value in that data. And that will drive the rest of the data stack to be more interoperable and agentic as well.
What I did: