The whining will only get louder from here on out
OpenAI whining about "unfair" competition from China is both delicious, delicious karma and the new runner-up for the definition of hypocrisy in the Oxford dictionary.
Without governments somehow banning Open Source and Chinese Competition, OpenAI's business model is broken and will never deliver a return to investors, because transformer models are just lossy, compressed information storage, not mystical Shoggoths or quasi-human entities.
And when you're selling access to information you've stored via an API, there's naught you can do to protect that information from flowing elsewhere, including into someone else's lossy, compressed storage.
The hand-wringing [1] along the lines of “We scraped the internet without permission, but people need permission to scrape our model” is delicious, and we should run these guys out of town, because they themselves are scraping their competition. As with US Republicans, every accusation is an admission.
Model extraction is an unsolvable problem when you sell API services. Just 50,000 prompt pairs are enough to extract any area where a model has an advantage and use it to augment your own models, including SLMs. The extraction traffic is indistinguishable from regular use patterns.
The process is trivial; any engineer or Claude agent can do it. Or, as an anonymous Google engineer put it, “We have no moat, and neither does OpenAI”.
And that is good. Because wide diffusion of AI via Open Weights is the last best hope for fair redistribution of the value sucked from the commons, the authors and the internet, and locked behind the pay-per-inference walls of Big Tech. Open Weights guarantees a future more akin to the calculator: a cheap, powerful invention available to everyone in society equally to build value on top of, rather than concentrated power in the hands of a handful of techbros building end-times bunkers.
Generative AI isn't what you think it is.
AI, at its core, is an impressive but incremental technical capability, semantic retrieval and shifting along latents, sitting on top of a much more powerful legal invention: the concentration of all of humanity’s knowledge in one place in spite of prevailing laws, and its subsequent monetisation.
More than a decade ago Google tried to digitise all books, but the courts limited its ability to sell access to the results. Had Google succeeded, much of the step change attributed to AI today would have been attributed to Google: low-friction knowledge retrieval was possible before the transformer. It’s all of Google’s business model (part 1, that is; part 2 involves reintroducing friction of your own in the form of ads to monetise it, and part 3 usually follows by graciously allowing you to pay to remove that friction again).
With the transformer, the ability to launder the same data out of its protective copyright has enabled monetisation. That was fundamentally made possible by maintaining a narrative smokescreen of “intelligence” and progress long enough to collect enough investment to present governments with a growth narrative so compelling that they stop enforcing copyright and IP provisions.
The blast radius is the entire economy
The question people in charge of companies need to ask themselves, now, is "what does our business look like when knowledge retrieval is frictionless and either free (Open) or monopolised by a small number of vendors". Now, because that's the world we already live in today, with vast consequences:
Education, often reduced to the act of preloading people's brains with knowledge and applying it, has sustained a critical hit. Much like you can't train to outrun an internal combustion engine in a race on foot, you can't out-brain a transformer for knowledge storage.
A possibly critical mass of knowledge economy jobs is highly vulnerable to the transformer, despite its reliability and security failings, because these jobs lack the kind of regulatory accountability and responsibility obligations, and with them the protection, that doctors, lawyers and professional (aka not software) engineers enjoy.
By leaning on the "training is transformational" framing in order to exempt companies from copyright enforcement, we've broken copyright, creating the ultimate laundering mechanism for the fruits of creative work and innovation:
- A book, semantically shifted along latents through ChatGPT to contain the same entropy with changed expression, can be sold in competition with its author.
- An app, the product of painstaking user research, design work and testing, can be re-synthesized in seconds, for cents, into a competitor, which is then supercharged by spending the money saved on creation on attention platforms for user acquisition.
And with that, Copyright, historically meant to ensure an incentive for creators to continue creating (for the benefit of the publishers who lobbied for it), is now broken, as original creation continues to be a costly activity while reproduction and plagiarism are now cheap. Copyright infringement is erased when the bits of copyrighted information are broken up and stored along latents using a transformation algorithm of ~1,000 lines of code.
There's some cold comfort in the fact that techbros are suffering from the consequences of their own actions here, but that's not how this particular play usually ends.
The Next Act
The real question we should ask now is: are the combined capital and hype forces that broke copyright enough to put it back together in favour of Tech Companies?
The next act is already written, following the standard tech playbook:
Pull up the ladder by convincing politicians to protect you from the very actions that allowed you to succeed, by arguing for legal protection of the ill-gotten gains.
This worked well for social media but is hitting a (Great) Wall with AI: You can’t beat the Chinese with capital spend because they've done their homework. China has the advantage on power, on STEM and on data, and will invariably make the better and cheaper models, neutralising the US tech colonial infrastructure play. (See [2] on this topic.)
Prepare for much more screaming, an eventual decoupling of the US from Open Source, and attempts to create hardware lock-in via the NVIDIA stack: signed models that give the hardware platform controller the ability to decide what runs and what doesn't, akin to game cartridges for game consoles (a technology NVIDIA is intimately familiar with).
This, however, only works if other economies let themselves be swept up in the plot rather than doing the sensible thing and just running with the Open Source recipes to build sovereign AI. It seems entirely obvious that the right thing for societies is to reject it in their own best interest and let the tech broligarchs and investors fail. It's their mistake to put too much money into a technology play that offers no moat. Thoughts and prayers; investment is risky.