The Next Automation Frontier: Models and Data for the Physical World
Robots, Baby!
THESIS: The next automation frontier isn’t another software workflow—it’s the physical world. Agents that can perceive, reason, and act across messy, high-variance tasks will demand specialized models trained on instrumented interaction data, not just bigger LLMs. The hardware is already on factory floors; the intelligence—and the data infrastructure to support it—are what’s missing.
I keep wanting to write a post about what an AI agent is. Part of that is for my own education (because I'm not sure I really understand something until I publish it on the internet), and part of it is because the term gets thrown around for MANY different things. But agent technology is changing so rapidly that it hasn't felt within my strike zone – my own process & workflow don't match up with agents quite yet because they're such a moving target.
Interestingly, that's what agents are all about: process & workflow. That might actually be the litmus test for an AI Agent: is it creating, executing, or carrying out a process or workflow? If not, it's not really an Agent.
But on top of that, most processes & workflows are moving targets. Which is what makes agents such a compelling new piece of technology. We have been able to automate rote processes for quite some time now, to varying degrees of efficacy. Many of these processes were based on complex (or sometimes less complex) nested if statements. “If this email contains this word, then it goes in this inbox”, and so forth. But it’s an entirely different thing to be able to actually READ that email and respond to it based on the specific context of the email. Suddenly, the universe of things that can be automated grows significantly.
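The contrast above can be sketched in a few lines. This is a hypothetical illustration, not any real system: `route_rule_based` is the old nested-if style of automation, while `classify` stands in for an LLM call that reads the whole email in context.

```python
# Hypothetical sketch: rule-based email routing vs. agent-style routing.
# Neither function is from a real product; they only illustrate the contrast.

def route_rule_based(email: str) -> str:
    """Nested if-statement automation: brittle keyword matching."""
    text = email.lower()
    if "invoice" in text:
        return "billing"
    elif "refund" in text:
        return "support"
    else:
        return "general"

def route_contextual(email: str, classify) -> str:
    """Agent-style automation.

    `classify` stands in for an LLM call that returns an inbox name
    based on the full context of the message, not keywords alone.
    """
    return classify(email)

# A keyword rule misfires on context a model reading the email would catch:
email = "Please don't send another invoice; I already paid."
print(route_rule_based(email))  # → "billing", despite the sender's intent
```

The keyword version can only grow by adding more branches; the contextual version grows with the model's understanding, which is exactly why the universe of automatable work expands.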
These automations don't occur out of thin air, though. In order to build out these workflows, some data set is required. You can't just tell an LLM to do anything and expect it to figure it out – at least not yet. There are some things it's great at, and there are plenty of things it lacks a good data set for. See below:
Roon's benchmark is actually a pretty good one because it requires automating both digital and physical workflows. Most of what AI is doing today is automating digital-first workflows. There are examples of physical workflows being automated, but most of the workflows being automated happen on a mobile device or a computer – in the browser or in the cloud. And before ChatGPT launched and agents were all the rage, there were already a bunch of companies trying to automate these processes.
This was back when we called it Machine Learning. And if you ever sat in on a pitch from an AI / ML company, the first thing they would tell you is how much training data they had. Data was the key thing: really good data was the only way to build a competent model. There were startups (and still are) that would pitch you on the idea that the data didn't have to be that good because they had built a better mousetrap. This is, of course, what transformer models allowed to happen: garbage-in didn't NECESSARILY mean garbage-out. Scale and self-supervision greatly reduced, but did not eliminate, the functional output costs of noisy data (TO BE CLEAR, NOT THE ACTUAL COSTS OF INFERENCE).
So the question was always – how can you get your hands on really interesting data?
It was hard to extract data – and still is – but the LLMs we are using today make it significantly easier to do because they have done a lot of the heavy lifting. So now, anybody with a digital workflow or tool can “easily” (it’s still hard to do) start automating that workflow. And they can start building a training corpus to make that possible.
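One way to picture that corpus-building step is a rough sketch like the following: instrument each workflow action and write it out as a record a model could later train on. All field names here are illustrative assumptions, not any real product's schema.

```python
# Minimal sketch of capturing a digital workflow as training data.
# Each step becomes a (context, action, outcome) record; the schema
# below is an assumption for illustration only.
import json
from datetime import datetime, timezone

def log_workflow_step(tool: str, action: str, inputs: dict, outcome: str) -> str:
    """Serialize one workflow step as a JSON line for a training corpus."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool": tool,        # which app the step happened in
        "action": action,    # what the user did
        "inputs": inputs,    # the context the action was taken with
        "outcome": outcome,  # what resulted (a label to learn from)
    }
    return json.dumps(record)

line = log_workflow_step(
    tool="email",
    action="reply",
    inputs={"subject": "Q3 invoice", "body_excerpt": "past due"},
    outcome="sent_payment_reminder",
)
```

The point is less the format than the fact that digital tools can emit records like this almost for free, which is exactly why digital workflows are the easy case.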
As a result, it feels like only a matter of time before digital workflows get automated by agents. As I previously said, current LLMs can’t automate EVERYTHING, but with the right data, I think they can automate MOST digital workflows. The data exists, the digital experience can generate this data pretty easily, and the models can train on top of those, even if they don’t have the correct context yet to do so. Send an email? No problem, tons of them get sent every second. Write a memo? LOL, already done. Schedule calls? Organize a project? Buy something online? Come up with creative? Etc. etc.
Now, I am not saying that we SHOULD automate all of those things, or that the automated output is going to be better than a human's. But chances are that stuff either gets automated, gets a heck of a lot easier for a human to do, OR a premium is paid to have a human execute that task.
But those are DIGITAL workflows. Again, that should all be relatively easy to automate because we have the data – readily accessible – and we have the models. Or will in short order.
But what about physical workflows? Where does that data exist? And can we really train models to do something with it?
We live in an age where software is eating the world. But it’s not like software was invented last year. Software has had a while to eat the world. Some parts of the world were gobbled up pretty quickly (CRMs, data tools, etc.), but it has taken a while to slowly break into more physically-related industries. There is a reason VCs always get scared when end markets like EdTech or Manufacturing Technology get brought up – because those spaces are hard to digitize and have been on the right tail side of the adoption curve for technology over the last 50+ years. These are physical industries that can’t be put into a spreadsheet and managed via email.
But that doesn’t mean they are without automation and automation opportunities. It is estimated that FANUC robots (just FANUC) have an installed base of 200K – 300K robots in the Americas. So it’s not like Manufacturing and Industrial sites have gone completely devoid of automation and automation attempts.
The trick is that robots are expensive. And hardware is hard. And even if you get them purchased, installed, and working, they are not smart (in most cases!). You need someone with unique, niche technical competencies to make them work effectively and keep them updated over time. Where software has been getting easier and no-codified over time, robots are not on the same trajectory, or at least not keeping the same pace.
The reality is that the automation being built today via LLMs doesn't have a strong dataset to build automations for robots. Some of that is occurring, but extracting that data is a lot harder than it might seem. And even if LLMs have eaten the entire internet, a lot of the workflows that robots are involved in are not online. Bits and pieces have been added to the cloud, but not at an appropriate rate.
There are automations occurring in industrial settings, but not with the same quality as those occurring in the digital world.
So better training data is required. I wrote once that the robot apocalypse was going to start somewhere in the Midwest. We have the tools and the robots in place, and the data exists in our current workflows. So then it's just a matter of digitizing it and feeding it into the right models. Once we do that, real automation can get started – right?
Is it really about data, though? Before transformer models exploded onto the scene, the question was about having better data and building better models. Then transformers rolled around and allowed language models to use "brute force". They are not called Exact Language Models – they are called Large Language Models. They are large because, again, the old "Garbage In – Garbage Out" adage became less true. Of course, there are still problems with LLMs that more data might not necessarily fix (hallucinations, context windows, etc.). But the breakthrough that allowed our current era of AI to achieve such an explosion was not BETTER DATA. And I would argue it wasn't even MORE DATA – more data would imply that we found some special trove that unlocked something for us. No, it was a better model (or way of building models) that allowed the current language foundational models to flourish. These LLMs do continue to add MORE AND MORE data and are getting much better (and less expensive) with time, but it was the improved model(s) that made that possible, not the other way around.
So the mistake is to assume that there are no more models to be built – that the foundational models we have today are the ones we will use for all time – or rather, that the runners in that race have already claimed their victories. It's probably true that the foundational LANGUAGE models are reaching a plateau and that those winners are already well into the race.
But, I don’t think it’s safe to assume that being the foundational model for language necessarily means you will be the foundational model for Math, or Chemistry, or Biology, or Physics. Or even Finance & Accounting. Yes, theoretically language is the base layer through which we, as humans, understand all of these things. But language is a lossy and incomplete method of communication, and those other disciplines require more than just words to understand. There is a reason we have a completely different character set to describe math and don’t just use natural language.
In theory, we could jam all of the written word on a subject into current LLMs and train them adequately, but that seems like an inefficient way of creating output. The data requirement might also be too gargantuan because, again, that data might not exist in a digital format, ready to be consumed. And it might not exist in a form that allows language models to truly understand it.
So the question becomes – could you build a better model that is more efficient to intake that type of information? Could you create a model that adequately understands physics in a way that language cannot appropriately describe? And can you do it using less data? What about using better data?
So to bring our point back to robotics, you need more than just natural language to speak to a robot. You need physics and context that current LLMs don’t seem equipped to manage or understand. Current big LLM hyperscalers may get there someday, of course, but it also doesn’t seem like something they are actively focused on. Mainstream LLM roadmaps remain language-centric; physical-world modeling is emerging but not yet mature. Now, could you use current LLMs to build the model for physics? Absolutely. But that might become a totally different model altogether.
Physical automation needs specialized models trained on instrumented interaction data (force/torque, vision, proprioception). Scale helps, but without reliable capture and evaluation harnesses, progress bottlenecks.
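As a rough sketch of what one such instrumented sample might look like, per the modalities named above. All field names, shapes, and the joint count are assumptions for illustration, not any real dataset's schema:

```python
# Illustrative sketch of an "instrumented interaction" sample for a
# physical-world model: force/torque, proprioception, vision, and the
# commanded action. Schema is hypothetical.
from dataclasses import dataclass
from typing import List

@dataclass
class InteractionSample:
    """One timestep of robot interaction data."""
    force_torque: List[float]     # 6-axis wrist sensor: Fx, Fy, Fz, Tx, Ty, Tz
    joint_positions: List[float]  # proprioception: one angle per joint
    camera_frame_id: str          # pointer to the synchronized vision frame
    action: List[float]           # commanded joint deltas (the training target)

def validate(sample: InteractionSample, num_joints: int = 7) -> bool:
    """Cheap capture-harness check: reject malformed samples before
    they poison the training set."""
    return (len(sample.force_torque) == 6
            and len(sample.joint_positions) == num_joints
            and len(sample.action) == num_joints)

sample = InteractionSample(
    force_torque=[0.1, -0.2, 9.8, 0.0, 0.01, 0.0],
    joint_positions=[0.0] * 7,
    camera_frame_id="frame_000042",
    action=[0.01] * 7,
)
```

Even a trivial check like `validate` is the kind of evaluation harness the paragraph above is pointing at: without it, scale just accumulates unusable samples faster.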
There are several startup companies that are heavily focused on robotic automation today. Many of them are focused on building a new robot, and then layering in a better model to help that robot. There are some that are robot agnostic and trying to build a better system with just better software (and maybe some sensors). I don’t know which one of these will win the race. My VC-brain thinks that the robot-agnostic strategy will win because of improved margins, better distribution positioning, etc. But there is also a case to be made that we don’t really have a Robotics PLATFORM yet that can allow for software to spread rapidly. Before we had the mobile moment, we needed to have the smartphone moment.
Regardless of the strategy that works (and maybe it's a third option not accounted for here), they need to build better models. It's not enough to just get better data.
Industrial settings, by nature, seek efficiency. Since the start of the Industrial Revolution, there has been a slow march towards increased automation. There have been failed attempts, of course. But also radical successes, and I doubt that march is going to stop any time soon. Building the foundational model that understands the nuances between those failures and those successes is the place where great venture value lies.