Data Network Effects vs. the Innovator’s Dilemma
A Paradox in the Current Consensus
Thesis: Data abundance without a business-model shift is just an expensive asset. The paradox is that incumbents have scale data but misaligned incentives, while upstarts with smaller but purpose-built datasets—and usage-based, outcome-based pricing—can compound faster.
Data Network Effects
Network effects occur when the value generated by a product or service increases as more users utilize it. This, in turn, creates a self-reinforcing value mechanism: more users generate a better product, which generates more users, and so on. If you want the long version, I highly recommend reading Helmer’s Seven Powers or The Cold Start Problem by Andrew Chen.
There are plenty of examples of strong network effects at play in the real world, especially in the tech universe. Google might be the most prominent example: as more users spend time searching, Google generates better search results, which means people are more likely to use its product over others (a gross oversimplification, but still valid). This is a big reason Alphabet, Google’s parent company, is one of the most profitable businesses in America today. But beyond Google, there are tons of other examples: Uber, most social media platforms, most peer-to-peer products in general, etc.
The status quo for network effects is frequently summarized as: more people on a platform results in more (and better) opportunities for engagement on that platform. While this isn’t always the case, the general reality is that as more users use a product, the opportunities to take advantage of the network get denser. More Uber passengers = more drivers = more passengers. More users on Facebook = more content = more reason to be on Facebook = more likely to invite more members to be on Facebook.
However, the current viewpoint is that this is evolving such that AI is opening the door for a NEW era of network effects: data network effects. The consensus belief is that, as AI tools rely heavily on data (both more and better data being positive inputs thanks to transformer models), the more data they are able to ingest, utilize, and/or generate, the better the product will become. So, as more users utilize an AI product, the better the AI product will become, thus driving more users onto the product.
While this might sound novel, this is effectively what Google is doing. And in many ways, this is what most algorithmic media platforms are doing today as well. The difference is that the outputs for AI today still feel so new, whereas it’s hard to readily recognize when Search or Media is getting better. Those are not novel platforms.
But it’s not really as simple as the more users = better product formula would lead you to believe. The formula can’t simply be: more users using the product results in a better model, which results in more users using the product. Rather, more users utilize the AI product, which in turn helps further refine the model, which in turn helps generate better outputs or results for the user, which in turn creates MORE VALUE for the user, which drives them to use the product MORE. But it’s not just the single user whose product experience gets better – the improvement has to ripple through the product experience for all users (or a decent chunk of them, at least). Sure, the AI tool you have deeply customized might be getting really good as you spend more time with it, but is it making other users’ experiences better as well?
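To make that distinction concrete, here is a toy sketch contrasting the two loops. The numbers and the log-shaped improvement curve are illustrative assumptions, not measurements of any real product – the point is only that per-user personalization resets for every new user, while a shared model lets a new user benefit from everyone else’s usage on day one.

```python
# Toy comparison: per-user learning vs. shared-model learning.
# All numbers and the log1p improvement curve are illustrative assumptions.
import math

def per_user_value(hours_spent: float) -> float:
    """Value grows only with an individual's own usage (personalization)."""
    return math.log1p(hours_spent)

def shared_model_value(total_user_hours: float) -> float:
    """Value grows with aggregate usage across ALL users (data network effect)."""
    return math.log1p(total_user_hours)

users, hours_each = 10_000, 5.0

# Personalization: a brand-new user starts from zero, no matter how big the base is.
new_user_personalized = per_user_value(0.0)

# Shared model: a brand-new user immediately benefits from everyone else's usage.
new_user_shared = shared_model_value(users * hours_each)

print(f"personalized value for a brand-new user: {new_user_personalized:.2f}")
print(f"shared-model value for a brand-new user: {new_user_shared:.2f}")
```

Only the second curve compounds with the size of the user base, which is the whole ballgame for a data network effect.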
On top of this, the model can’t just magically get better – it has to actually drive BETTER RESULTS for a wide swath of users. That might seem like a minor nuance, but it’s deeply embedded in the reality of AI. Value generation is a big question for all AI models. In consumer / lightweight AI use cases, we are seeing relatively broad adoption (for a few select use cases); the “mainstream” is already engaging with AI in ways people recognize (chatbots, image generation, writing assistants). As of September 2025, 53% of surveyed U.S. consumers are now experimenting with or regularly using GenAI (up from 38% in 2024). In enterprise settings, most firms are still in the early adopter / exploration / pilot stage for deeper, mission-critical use cases. The path to scaled deployment is still blocked in many cases.
As a result, the bar to clear to generate real-world solutions is still fairly high and somewhat uncertain. So you can’t just make the model “better” in the abstract – in theory, ANY DATA makes a model better. The actual result needs to drive improved outcomes for the end user and buyer of the product.
The AI technology with the highest enterprise adoption so far is AI-assisted coding, built around integrated development environments – code AIs / IDEs. These products are amalgamations of models that help developers write better, faster code (at least, in theory). The big examples are Cursor, Claude Code (Anthropic), and GitHub Copilot, among others.
As each piece of code gets accepted or rejected by the user, that action acts as a signal which helps the model refine what is usable code / code generation for the next prompt instance. That happens not just at the user level – keeping memory of what a user likes and doesn’t like over time – but also at the total user base level – assessing what all users approve and throw away over time in aggregate and updating the models accordingly.
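A minimal sketch of how such accept/reject signals might be aggregated at both levels – a hypothetical structure of my own, not any vendor’s actual telemetry pipeline:

```python
# Hypothetical accept/reject signal aggregator for an AI coding assistant.
# Tracks acceptance rates per user (personal memory) and in aggregate
# (the cross-customer signal that can feed back into model refinement).
from collections import defaultdict

class SuggestionFeedback:
    def __init__(self):
        self.per_user = defaultdict(lambda: [0, 0])  # user_id -> [accepted, shown]
        self.global_counts = [0, 0]                  # [accepted, shown]

    def record(self, user_id: str, accepted: bool) -> None:
        stats = self.per_user[user_id]
        stats[1] += 1
        self.global_counts[1] += 1
        if accepted:
            stats[0] += 1
            self.global_counts[0] += 1

    def user_rate(self, user_id: str) -> float:
        accepted, shown = self.per_user[user_id]
        return accepted / shown if shown else 0.0

    def global_rate(self) -> float:
        accepted, shown = self.global_counts
        return accepted / shown if shown else 0.0

fb = SuggestionFeedback()
fb.record("alice", accepted=True)
fb.record("alice", accepted=False)
fb.record("bob", accepted=True)
print(fb.user_rate("alice"))  # 0.5 -- Alice's personal signal
print(fb.global_rate())       # ~0.67 -- the aggregate, cross-user signal
```

The two rates diverge by design: the personal one powers customization, while the aggregate one is what can actually move the shared model.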
Aside from the ludicrous growth of these IDEs, there is data to suggest that these AI tools are getting better AND they are driving better business outcomes as more users utilize these tools. GitHub has reported that Copilot’s “accepted suggestion rate” improves over time as the model learns from aggregate usage. As a result, the more developers use Copilot, the better it gets and the higher the switching cost to competitors with smaller usage bases.
There are other examples of this AI network effect actually at play too: design platforms utilize metadata to see what designs get accepted, rejected, or heavily edited, to make better automated design decisions moving forward.
Additionally, we are seeing early signs that models embedded in knowledge tools generate network effects as they aggregate and generalize data across an organization. This doesn’t necessarily mean the model is improved for an ENTIRE user base, but it might improve for the entirety of just ONE ENTERPRISE as more folks use it internally. However, I am not sure this is technically a data network effect, as the total product doesn’t necessarily get better for people outside of the enterprise using that instance of the model. Alternatively, this is the effect that just makes enterprise software so sticky, AI or not – if a model learns only from one company’s data, you may get strong organizational lock-in (classic enterprise network dynamics), but that’s not a cross-customer data network effect.
How this relates to data network effects, however, is less clear. For the enterprise companies claiming internal tools get better with usage, a skeptical view would be that this can only really be true for internal users – people on the same tool, but within a different company, don’t necessarily benefit if more folks are using that product. The internal data that these AI tools take advantage of is heavily siloed, and it would be difficult to extract that value generation and port it onto users within another enterprise.
This, of course, wouldn’t be impossible – OpenAI does a decent job of siloing its consumer data (at least, it appears to do so). Its models definitely get better as more users train them, relying heavily on metadata. But it’s one thing to do this for a consumer base of 800 million weekly active users. Usage that high results in a lot of BIG common use cases that allow OpenAI to abstract down to reinforcing improvements. When you reduce that number significantly, to the number of users that an enterprise software tool has, it becomes a lot trickier.
So to summarize, what makes a true data network effect:
Improves global model performance
Benefits most users
Drives adoption → more data → better performance
Cross-customer (not just per-tenant)
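The four criteria above can be made explicit as a simple checklist – a sketch where the field names are my own shorthand, not an established taxonomy:

```python
from dataclasses import dataclass

# Sketch of the four criteria as an explicit checklist.
# Field names are informal shorthand for the list above.
@dataclass
class DataNetworkEffect:
    improves_global_model: bool  # usage improves the shared model, not one tenant's copy
    benefits_most_users: bool    # gains ripple beyond the contributing user
    drives_adoption_loop: bool   # better performance pulls in more users and data
    cross_customer: bool         # learning crosses tenant boundaries

    def is_true_data_network_effect(self) -> bool:
        return all([self.improves_global_model, self.benefits_most_users,
                    self.drives_adoption_loop, self.cross_customer])

# A model that learns only from one company's data can give strong
# lock-in but fails the cross-customer test.
siloed_tool = DataNetworkEffect(True, True, True, cross_customer=False)
print(siloed_tool.is_true_data_network_effect())  # False
```

The point of treating it as a conjunction: per-tenant stickiness satisfies three of the four boxes and still isn’t a data network effect.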
The Paradox
There is a general consensus (or at least fairly popular) opinion that the large enterprise software companies are in a good position vis-à-vis AI because they “have all of the data”. They can, in turn, use this data to create better models and more robustly take advantage of existing AI products. And the data isn’t just static, like customer data baked into a CRM – it is also metadata and digital workflow data. This, in theory, will allow these companies to build agents and more nuanced AI because they know what a user is working on as well as HOW a user does their work.
But the reality is that this is a dubious proposition. For one, building AI products that actually work is hard. There is a reason that the talent war right now is for Machine Learning / Artificial Intelligence PhDs. And despite the insane amounts of money being thrown around by big hyperscalers, several of them, like Apple, are still lagging behind.
But for the sake of argument, let’s say that building an AI product is not a deeply complex process – that it’s something you can throw money at and solve, requiring only good data (and a ton of cash). Well, then bigcos should be well positioned to take advantage of this opportunity, as those are two things they have in spades.
But the problem is that even if you build a great AI product, it can take a lot of institutional will to shift course and start selling that product. An enterprise-grade company has a lot of employees, who are all likely incentivized to sell the legacy version of the product that doesn’t have AI deeply embedded. And even if you change those incentives, there is a general sense of inertia in any person when you change what motivates them. They have built up relationships that are predicated on the old way. Clinging to the prior value chain is a classic sunk-cost fallacy, and while it is misguided, there’s a reason that’s a very well-known mental trap – it’s easy to fall into!
The result is that it takes a lot of work and a lot of time to convince the existing organization to change direction and drive towards a new outcome. And this is assuming that the new product isn’t cannibalizing the old business in some way. Which is a pretty big assumption – disruptive technologies do just that, they disrupt. So an enterprise is telling its employees to change behavior, change incentives and to jump off a cliff before knowing what’s on the other side. It’s definitely possible, but very difficult and there aren’t a lot of great historical examples of this at play.
More on this from Bret Taylor at Sierra:
“Closing a technology gap in your product is hard, but not impossible. Changing your business model is really hard. … There’s a graveyard of CEOs who’ve been fired because they couldn’t make that transition.”
This is Innovator’s Dilemma 101. The Acquired guys dug into this well in their most recent episode on Google. Google is facing one of the most tangible Innovator’s Dilemmas in recent memory – trying to maintain its cash cow of search while also recognizing that AI has already impacted search in a meaningful way. Google is an AI company and has all of the tools to succeed here, but there is no guarantee of that.
But that’s the thing about Innovator’s Dilemmas – they are rarely as linear and straightforward as what is happening to Google’s market landscape. Google is facing this in a head-on, highly publicized way. The 1:1 nature of Search vs. AI is pretty clear today and, I would argue, has been for a while. But for most enterprise companies, I am not sure AI disruption is going to look so straightforward while it’s occurring. We are talking about new technologies AND new behaviors, not just new technologies. So I expect the shift within enterprise software companies to be less predictable.
Ultimately, be wary of enterprise companies spiking the football on AI. I am deeply skeptical of the Salesforce advertisements with Matthew McConaughey telling me about their Salesforce Agent Cloud, or whatever the hell it is. And of course, this opens the door for upstarts to take on this new technological shift. Will there be a new CRM winner that comes out of all of these automated outbound tools? Will there be a new customer success platform that arises from all of these AI CS agents? It’s too early to tell, but that’s why we do venture capital.