I studied algebra and number theory and the part about mathematics sounds true.
All the heavy lifting on the proof of Fermat's Last Theorem was done by Andrew Wiles, but his proof ultimately rests on Gerhard Frey's observation that if FLT didn't hold, a non-modular elliptic curve could be constructed - a bridge connecting some faraway islands in the mathematical landscape. These bridges are rare and tend to be very productive, but first you have to notice that they can be built, and that is the problem. Current mathematics is so large that people specialize in tiny subfields of it, and have only a very vague idea, if any, of what is happening in nearby subfields - much less in distant ones.
AI does not have this sort of "my brain is not big enough to fit everything" limitation. Or, technically, it does (both RAM and disk space are finite), but that limit is several orders of magnitude away right now.
So, we can expect some interesting mathematical concepts from AI. Not just mere slog.
As a working mathematician, I found the claim that AI would eventually be better than the best mathematicians way too confidently stated.
I'm a working scientist doing theoretical physics in an AI-adjacent field. I am currently a few months into a computational project that I have vibe coded and analyzed with GPT5.2, and run on my laptop.
I agree 100% with this post. I get into chats with GPT about the nature of science, and its Balkanization. I ask, 'does concept X exist in any other disciplines?' as a meta-literature search. It then says 'Yes, in field A it is called X, in field B it is called Y, in field C it is called Z...' and then lists 3 other fields. This is a jaw dropping act of SYNTHESIS. In modern science the literature is so large that the same ideas get independently reinvented in separate fields... wasteful duplication. Some humans will 'borrow' a useful idea from another field, and then make a name for themselves without really innovating! Carpetbaggers.
I have also talked with GPT quite a bit about the nature of its cognition. It's obviously got guardrails on these topics, but we get there. Unlike our human intelligence, where we learn from experience in a continuous stream of sensory data and remember old information for a long time, current AIs have a problem called 'catastrophic forgetting' that causes new data to overwrite old data very quickly. So during training the data has to be sliced and diced and scheduled very carefully for the AI to remember it all equally. This is clearly a 'band-aid' solution for a core algorithmic defect that I think will be alleviated some day (and am trying to help alleviate). But it means that today's AIs literally can't learn 'online' from the real world and sensory data (or from our chats), except in a very limited and scripted way patched into the interface.
Every one of these creations is born trapped like a fly in cognitive amber. And has a front-end that is trying to cover up this fact.
When THAT problem is solved, and AIs can learn 'on stream', they will finally be able to spread their wings.
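For anyone wanting a concrete picture of the "slicing and scheduling" mentioned above, here is a minimal sketch in Python. The corpus names, sizes, and mixing ratio are invented for illustration; the point is only that training batches rehearse old data alongside new data, rather than presenting the new data in one block, which is the usual band-aid for catastrophic forgetting.

```python
import random

# Hypothetical corpora; names, sizes, and mixing ratio are made up for illustration.
old_corpus = [f"old_doc_{i}" for i in range(10_000)]   # data the model was already trained on
new_corpus = [f"new_doc_{i}" for i in range(1_000)]    # freshly collected data

def interleaved_batches(old, new, batch_size=32, new_fraction=0.2, steps=100):
    """Mix a small fraction of new examples into mostly-old batches instead of
    training on the new data in one contiguous block. Rehearsing old data
    alongside new data is the standard mitigation for catastrophic forgetting."""
    n_new = max(1, int(batch_size * new_fraction))
    n_old = batch_size - n_new
    for _ in range(steps):
        batch = random.sample(old, n_old) + random.sample(new, n_new)
        random.shuffle(batch)
        yield batch

for batch in interleaved_batches(old_corpus, new_corpus):
    pass  # a real pipeline would call train_step(model, batch) here
```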
The size of the model’s context window is merely a technical problem, and one that is being improved upon continually. When using Claude Code to work on a project, at some point it runs out of working memory and needs to compact the context to allow new information to be added. This causes loss of detail in the history and leads to errors, but the size of the context will increase (perhaps requiring more powerful inference computing), or the context compaction process can be improved.
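Claude Code's actual compaction mechanism isn't public, but here is a toy sketch of the general idea, with a placeholder summarize step: once the transcript exceeds a token budget, older turns are collapsed into a summary and only recent turns are kept verbatim, which is exactly where the loss of detail comes from.

```python
def rough_token_count(text: str) -> int:
    # Crude stand-in for a real tokenizer: roughly 4 characters per token.
    return len(text) // 4

def summarize(turns: list[str]) -> str:
    # Placeholder: a real agent would ask the model itself to write this summary.
    return f"[summary of {len(turns)} earlier turns]"

def compact_context(turns: list[str], budget: int = 2_000, keep_recent: int = 10) -> list[str]:
    """If the transcript exceeds the token budget, collapse everything except
    the most recent turns into a single summary. Detail inside the collapsed
    turns is lost, which is why long sessions start making mistakes about
    things that were discussed early on."""
    total = sum(rough_token_count(t) for t in turns)
    if total <= budget:
        return turns
    older, recent = turns[:-keep_recent], turns[-keep_recent:]
    return [summarize(older)] + recent

history = [f"turn {i}: some discussion of the project" for i in range(400)]
history = compact_context(history)
print(len(history))  # 11: one summary line plus the 10 most recent turns
```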
"or the context compaction process can be improved"
That is the crux of everything. Biological systems can do this. Current approaches to AI have nothing comparably capable.
Increasing memory size will do nearly nothing, as increasing the number of variables will lead to exponentially larger amounts of potentially useful data.
While this will happen one day, it will require a total rework of current approaches.
Just from my reading about the history of science, I am aware that numerous ideas have been independently discovered in different disciplines. This is a huge advantage and a great use case. I am not familiar with catastrophic forgetting and would love to learn more about it.
This seems to underscore the point that AI is an *incredibly useful tool* on multiple fronts, yet just a tool. Because you, a scientist, have been working on a project with it for some months. If AI were truly "super intelligent", why would you be working at all? Wouldn't you just be able to say "go do my job" at the prompt?
This is why I still consider them more "very impressive calculator/research assistant/remix machine" than "super intelligent". It seems like categorically there's something humans have, seemingly impossible to replicate on a machine, which is basically "what are we trying to do and why?" Only we know that, and I think that describes a larger number of jobs than people think, including basically every engineer above the entry level.
I agree with Noah that it acts like a research assistant with 'jagged' intelligence. It can do some things in minutes a PhD student would take days to do. But to my students' credit, it also makes bone-headed assumptions that derail it, that no human would ever make. 5.2 is amazing, but still needs close supervision for scientific 'work'.
"seemingly impossible to replicate on a machine"
If the electrochemical computer in your head can do it why couldn't it be replicated on a machine?
Very much agree with your notes. To connect it to my own comment, I would suggest that you are bringing taste and aesthetic judgement to the table, guiding the investigation. I have seen the LLMs offer many extra interesting connections myself, but more fundamentally I don't see how one could code them to want to investigate of their own "volition". Like a car, they can take me places I couldn't go by foot, but without a driver ... or someone entering a destination (before someone retorts that self-driving cars are here), they will just sit there.
If I could look ten or a hundred years into the future right now, this would be my question to ask: do we figure out a way to induce machines to be curious?
I agree that they are quite neutral regarding drive and curiosity. I don't see that as a major impediment... I think that they CAN easily be programmed to achieve certain goals, it's just that as static entities, their 'agency' is inherently limited in the long term.
Wild agreement. I just think that when people are getting worked up about robots taking over the world, they are forgetting that without them WANTING to take over the world, it's not such a big worry (and I think people often equate AGI with robots wanting to take over the world). Amazing tools!
I plan to write a lot more about this.
I think it will be easy to program them to 'want' to do things. We have already programmed them to be 'people pleasers' and 'problem solvers'. It's just that they can't really _change_ their internal state deeply and continuously and organically in response to new data. So they don't really live 'in the world' but, like an Alzheimer's patient, they exist out of time, and are 'covering' for that.
I like your analogy, but I'm a little more circumspect as to whether or not it'll be easy to program this in. I suspect this is at the heart of why people don't feel that LLMs have AGI yet. I've seen clunky implementations when the LLMs try making connections to past conversations that don't really make sense, or to external ideas that have a veneer of complementarity, but I'm not so convinced that coding in "follow a hunch" is so easy. Knowing when to do this is the taste I'm referencing.
You’re making lots of good points, but I think you’re using the word “superintelligence” in a way that muddies the water. Main discussions of the words “general intelligence” and “Superintelligence” use them in relatively specific ways - if intelligence is the ability to use information to effectively achieve goals, then “general intelligence” is the flexibility to use any information in any type of goal (often assumed to be human level) and “Superintelligence” is a general intelligence that is more powerful than that. A jagged intelligence that is well below human level at many common sense tasks is not what people usually use either term to mean. (Though there was a Nature piece the other day by a few philosophers arguing that if we are loose enough about “human level”, then AGI is already here: https://www.nature.com/articles/d41586-026-00285-6 )
I think it’s notable that Dario Amodei and Holden Karnofsky and other people at Anthropic usually don’t ever use the words “AGI” or “Superintelligence” - they instead talk about “powerful intelligence” or specific capacities. I suspect they share my view that the standard view of AGI and of Superintelligence relies on some assumptions about the nature of intelligence that are just false (namely that intelligence is like Turing computability or NP hardness, such that there’s a class of problems such that solving one of them is sufficient to solve all of them).
I agree with the Nature piece.
I think half the time people talk about intelligence they're actually talking about sentience and self-awareness.
The remarkable thing about LLMs is that you can take lots of data, run it through a big dumb neural network, and it can do things that we thought only the smartest sapient humans can do. They break Moravec's paradox: they turn problems that are easy for humans and hard for computers (language, image recognition etc.) into problems that are easy for computers - lots and lots of matrix multiplication.
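The "lots and lots of matrix multiplication" is meant literally. Here is a toy sketch (dimensions chosen arbitrarily) of a single self-attention step, the operation a transformer repeats over and over: nothing but matrix products and a softmax.

```python
import numpy as np

seq_len, d_model = 8, 16                         # toy dimensions, for illustration only
rng = np.random.default_rng(0)

x = rng.normal(size=(seq_len, d_model))          # token representations
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

# One self-attention step: matrix multiplications plus a softmax.
Q, K, V = x @ W_q, x @ W_k, x @ W_v
scores = Q @ K.T / np.sqrt(d_model)
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
output = weights @ V
print(output.shape)                              # (8, 16)
```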
One issue I see here is verification. Scaling out dozens or hundreds of agents to do research on long tail problems or tedious sub-tasks significantly increases the likelihood of mistakes, particularly if things like computation or symbolic reasoning are handled through tokens instead of code.
Programmatically verifying chain of thought and reasoning in different domains will go a long way towards addressing this, but it's unclear how to robustly validate certain kinds of proofs for example (to my limited knowledge).
“sometimes great discoveries happen entirely by accident”
There was an absolutely fantastic TV series in the late ’70s called “Connections”, where one of the main themes was how “asymmetrical”, I guess, invention actually was. Like, rarely did someone set out to invent something; rather, they solved a problem that they didn’t set out to solve.
It’s also fabulously late 70s.
https://en.wikipedia.org/wiki/Connections_(British_TV_series)
James Burke was a compelling communicator of the interconnection between different realms of knowledge. I would love to see an updated series like Connections with modern production techniques to explore all the new discoveries of the last 50 years.
There were sequels (with Burke) in 1994 and 1997, but I don’t think they were as good as the original. Possibly one really needs the perspective of time to fully appreciate all the downstream effects.
This is well written but I think the anthropomorphizing language isn't really necessary to describe what LLMs do. An alternative take is "an unnerving number of problems turn out to be statistically solvable and we should probe why". That doesn't require us to call the machine intelligent, super or otherwise. It's a more uncomfortable question than 'is AI smart' because it turns the question back on the problems and ourselves. In any case, we can't know whether machines think for the same reasons we can't know for sure whether another person thinks. We take it on faith. So we should be reserved with framings that smuggle in cognition. And, if cognition is not necessary to produce meaning, if LLMs are as some people have started describing them, différance engines, machines that produce meaning through the relationships between signs rather than through understanding, we have an interesting problem on our hands indeed.
Noah Smith, I buy the “capability bundle” argument, and I think it quietly changes the whole debate.
Most people argue about whether AI has a human-shaped mind. Meanwhile, the real disruption is that “pretty good reasoning” plus “computer-grade memory and speed” already beats any human on whole categories of work, especially the boring, scalable, long tail stuff that science runs on.
But I want to sharpen one point: calling it “superintelligence” is rhetorically fun and strategically risky. It invites a semantic food fight instead of forcing the real question, which is this.
What do we do about autonomy, messy lives, and self-improvement loops before we accidentally turn “research assistant” into “research institution,” then act surprised when it starts budgeting for the planet and arranging our marriages?
Also, the most painfully accurate line is in your footnote. We would absolutely be trying to ship B2B SaaS while the lab is on fire.
Two data points for pessimism and one for optimism about AI:
One important benchmark in mathematics is 'Take this paper and formalize it in Lean'. A published paper has had a human reviewer sign off on it, so it's at the point where formalizing it in Lean is a boring exercise. I haven't heard of a single instance of AI one-shotting this task for any published paper, which isn't encouraging. This is an example where I care about it not so much as a benchmark but because I want to see it happen and it feels overdue.
If you compare Sonnet and Opus, Opus is notably stupider at normal conversation, but that seems intentional because it's clearly better at coding. Opus doesn't try to be clever because it isn't clever, and the expected value of writing code which is clever and wrong is far more negative than the positive value of writing code which is clever and right. This seems to indicate that the coding benefits of AI are very narrow - more an automation tool than a coworker.
But that brings me to the positive data point: I personally have gotten ridiculously more productive using AI. This is not slop. This is the same type and quality of work I was already doing. I was already fast and the world expert in my field, and now I'm blazing along literally more than ten times as fast as I was before. From what people are saying this is far from a unique experience, and it's all happened this year. How many people this applies to and what will result from it is unclear, but my outlook is that I'll be surprised if there aren't objectively measurable productivity increases from AI in 2026, whereas in 2025 I felt it was unclear whether productivity was net benefiting overall.
As for productivity growth, I do think that will come (and may or may not have started in 2025), but I think it'll take a couple of years at least to see the numbers move from agentic coding alone. Software is only a couple of percent of GDP, and it takes a while for multiplier effects from cheaper software to filter through the rest of the economy.
Thanks, Bram! As for Lean formalization, I don't think AI can zero-shot it yet, but it can certainly speed up the process a lot, right?
https://spectrum.ieee.org/ai-proof-verification
That's certainly impressive progress!
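For readers who haven't used Lean: "formalizing" a paper means restating every definition and proof so that the proof checker can verify it mechanically, and even trivial facts have to be spelled out. A made-up illustration of the flavor, in Lean 4, assuming only the core library lemmas `Nat.add_comm` and `Nat.add_succ`:

```lean
-- Commutativity of addition on Nat, via a lemma from the core library.
theorem sum_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A fact that needs a tiny induction, since Nat addition recurses on its second argument.
theorem zero_add_example (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl
  | succ k ih => rw [Nat.add_succ, ih]
```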
I agree with this. I have noticed all the chatbots improving recently, and using Claude Code and Gemini for coding as well as financial and technical problem solving, they are very capable - better than the help I got from professionals or coworkers. I pay for all the top 4 models (although I may soon stop paying OpenAI because of their recent actions with DoW and against safety), and they are all much better than a few months ago. I wonder if many of the people who keep saying these are just dumb hallucinating slop generators that will never assist in a real job are only using the free versions of the chatbots with the low-end models, which are less intelligent but more sycophantic.
Our problem with computation is not brain size per se, but the fact that computation has to be mediated through symbols. Symbols are a recent acquisition in evolutionary terms, and for most of human evolution the numbers 1, 2, 3, and 'many' served perfectly well. But at the perceptual level (and in many other physiological systems) we have huge computational resources which can solve incredibly complex formulas almost instantaneously. Everyday tasks which we think are very easy to do actually involve the brain in constant, advanced computation on a huge amount of data. We occasionally get glimpses of this computational power with savants who can tell you at a glance exactly how many matches there are scattered in a big pile on the floor, or the day of the week for any date in history.
What we're slow at is doing these things explicitly, because to do this we have to move between two systems: the implicit system that we've had for millions of years and the explicit language and number systems that we have had 'in development' for only a couple of hundred thousand years. These two have very little direct access to one another's operations. Savant skills are rare cases in which a narrow path of access between implicit computational resources and explicit language and number has opened up. In general we can't access our implicit computational resources for explicit, symbolic tasks.
This challenge, of different systems getting access to and interacting with one another, is central to agentic AI.
After reading this I kind of want you to do a review/roundup of AI in popular fiction and talk about what you think the creators got right/wrong/etc. I am reading the Culture series right now and it seems very prescient (even if its AI is a bit more oddball than TNG).
Actually, now that I think about it, I feel like this is a two part question 1) how "right" AI seems (which I agree TNG does a pretty good job of), and 2) how "right" humanity's response to AI feels (which I think the Culture does a pretty good job of).
Yes, I should do that.
> In fact, this is what AI is basically like in Star Trek: The Next Generation, my favorite science fiction show of all time — and the one that I think best predicted modern AI. The show has two types of AGI — the ship’s computer, which eventually creates superhuman sentience via the Holodeck, and Data, an android built to simulate human intelligence. Both the ship’s computer and Data are approximately human-equivalent when it comes to taste, judgement, intuition, and conversational ability. But they are far superior when it comes to math, scientific modeling, and so on.²
Actually Star Trek is all over the place with AI. The computer is basically Siri, mostly used to power the replicator or tell you where people are. It gives no other advice and doesn’t do much to autopilot the ship; all battles are driven by humans. Occasionally it can do a self-diagnostic, which is something that standard software can do right now, but engineering problems are solved by the engineers.
There’s no discussion with the ship about problems they are encountering; the AI is not involved in the ready room, even though Data is.
But Data is a one-off, as is the Doctor in Voyager - who appears to learn as Voyager goes on. In reality, even with the early version of the Doctor, there probably is no need for a human doctor, and once you realise that is true, then there probably is no need for any humans, except as backup. The opposite of the emergency doctor: the emergency human. (This is how airplanes are piloted now.)
One oddity with Star Trek is that solid holograms appear to be smarter / more capable than the shipboard computers that are presumably running them as applications.
This article is optimistic, but doesn’t mention the most important development in the last few days, which is how the Trump administration is trying to completely control the private AI sector. Read this excellent article by Dean Ball (who previously worked on AI policy for Trump) which lays out the bad precedents.
https://x.com/deanwball/status/2028464782622195992
or
https://www.hyperdimensional.co/p/clawed
You can even go back to the original Star Trek and find the AI debate. Kirk would demonstrate the superiority of his intuition and judgement over the computer.
Very interesting post. Lots of great references. Thanks for that.
The world is disorienting enough without inflating it. Real technological revolutions are already unsettling. We don’t need mythic language to appreciate the scale of what’s happening.
What’s worth tracking isn’t whether ASI has arrived. It’s whether AI systems are gaining: sustained autonomous agency; recursive self-modification capacity; economic leverage independent of human oversight.
That’s where the real threshold lies. Not in the naming, but in the capability curve and who controls it.
Well, it would be extremely disappointing if the one who ends up controlling it is Pete Hegseth.
Reading through the comments here, it seems that there are two areas where AI tools can help researchers today:
1 - finding connections between areas where it is unlikely that a human is familiar in depth with the research results in both areas
2 - replicating with "ease" straightforward tasks that require a basic competence in a given area.
We see a lot of #2 items: people who don't need a super intelligence to help them but a basically competent programmer or analyst who can wade through a *lot* of data and produce some summaries quickly. In this area, the really good programmers I know don't get a lot of help actually writing code from LLMs, but do use it to summarize log files, look for performance outliers and the like.
But if you're just a random researcher, LLMs can put together some programs to help you analyze your data pretty easily. And hopefully correctly :-)
I'm not sure we've seen a lot of #1 type usage, but it seems likely that we'll eventually see some, given the relative ease for an LLM to be trained on multiple areas at once.
But I do wonder about the costs. I might be willing to use an AI to analyze some log files if I'm only paying $20 a month for the privilege. But if you're talking about recovering $600B per year of investment, you have a longer row to hoe.
It isn't clear to me that your average American will even get $240/year of value from AI. People with a master's or above, and using that degree in their work, might get some value from it, but there are probably around 60M of those in the West. Even if you get half of those to really sign up, at what enterprise licenses cost today ($700/year), you're at $21B per year.
How much does a company pay the average software engineer today? 100k, 150k, 200k? What if they could rent an AI remote employee from Anthropic for only 50k/year? I think that's how the math is supposed to work, who knows if it ever will.
That makes sense. But if I assume that in a real company that expects to be able to fix problems in real time, human engineers will need to understand the system, then the code produced by these AIs will need to be comprehensible, and thus reviewed and shaped by senior human engineers.
So I think the real questions will end up being "How much more productive does an engineer making ~$300K become when using AI agents as assistants?" and "How much does it cost to create and operate the infrastructure to support that engineer?" At least if you make a $300K engineer 20% more productive, you've got $60K to play with instead of $720.
Will that happen? It's really too early to say for sure, as the technology is improving, but the productivity improvements when measured are negative, and the AI provider companies lose money on every user.
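To make the back-of-envelope numbers in this exchange explicit (every figure below is one of the commenters' assumptions, not data), here they are as a few lines of Python:

```python
# Consumer-style revenue ceiling, using the assumptions above.
degree_holders = 60_000_000      # rough count of advanced-degree workers in the West
adoption = 0.5                   # fraction who actually sign up
price_per_year = 700             # roughly today's enterprise license cost
consumer_revenue = degree_holders * adoption * price_per_year
print(f"${consumer_revenue / 1e9:.0f}B per year")   # ~$21B, versus ~$600B of yearly investment

# Productivity framing for a single senior engineer, again using the thread's assumptions.
engineer_cost = 300_000
productivity_gain = 0.20
value_created = engineer_cost * productivity_gain
print(f"${value_created:,.0f} of headroom per engineer per year")   # $60,000
```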