The third magic

A meditation on history, science, and AI

Jan 01, 2023

A stoned robot wizard making magic with technology...pretty much encapsulates both the subject matter and the tone of this post! — Art by u/jcockle2 using Midjourney

This post is essentially a rewrite of a big and half-formed idea that I wrote on my old blog eight years ago. I was deeply dissatisfied with that post, but I thought it contained a few interesting seeds. So now I’m trying again, and will undoubtedly fail again. But hopefully something else interesting will come out of the attempt.

Humanity’s living standards are vastly greater than those of the other animals. Many people attribute this difference to our greater intelligence or our greater linguistic communication ability. But without minimizing the importance of those underlying advantages, I’d like to offer the idea that our material success is due, in large part, to two great innovations. Usually we think of innovations as specific technologies — agriculture, writing, the wheel, the steam engine, the computer. The most important of these are the things we call “general purpose technologies”. But I think that at a deeper level, there are more profound and fundamental meta-innovations that underlie even those things, and these are ways of learning about the world.

The first magic

Humans’ first big meta-innovation, roughly speaking — the first thing that lifted us above an animal existence — was history. By this, I don’t just mean the chronicling of political events and social trends that we now call “history”, but basically any knowledge that’s recorded in language — instructions on how to farm, family genealogies, techniques for building a house or making bronze, etc. Originally these were recorded in oral traditions, but these are a very lossy medium; eventually, we started writing knowledge down, and then we got agricultural manuals, almanacs, math books, and so on. That’s when we really got going.

Animals make tools, but they don’t collectively remember how to make those tools. History, especially written history, is what allows tinkering to stick — it means that when one human finds an ingenious new way of doing something, there’s a good chance that many other humans, and eventually all humans, will know how to do it. And of course those techniques can then build on each other over time. In the modern day we think of history as primarily a social science, but fundamentally it’s the foundation of technology as well; it’s the thing that lifted us from an almost animal existence into the agricultural age.

The second magic

Then — I won’t say exactly when, because it wasn’t a discrete process and the argument about exactly when it occurred is kind of boring — humanity discovered our second magic trick, our second great meta-innovation for gaining control over our world. This was science.

“History”, as I think of it, is about chronicling the past, passing on and accumulating information. “Science", by contrast, is about figuring out generally applicable principles about how the world works. Chronicling the motions of the planets is one thing; being able to predict the motion of planets you’ve never discovered is quite another. Tinkering with steam engines and writing down your findings can lead to progress; understanding the principles of thermodynamics and being able to use those to design a better engine, without having to tinker for decades or centuries, is far more effective.

Science is often done in a lab, but it doesn’t have to be. Ecologists can derive laws for predator-prey relationships simply by observing nature. Kepler didn’t need to do experiments with gravity in order to write down laws of planetary motion that would hold generally true. Nor do you need math to do science; many of the principles that govern the world can be expressed purely in words and pictures. But just as writing supercharged the process of recording events, controlled experimentation and mathematics supercharged the process of discovering the laws of the Universe.

Controlled experiments are powerful because they let you go from the small to the large — you can roll balls down a ramp in your house, and use that to figure out the laws of motion that control the motions of the stars. You can grow peas in your yard and use these to figure out laws of heredity. And mathematics is powerful because it lets you express those laws in ways that are easy to use across a dizzying breadth of applications. If you’re an artilleryman, field manuals embodying decades or centuries of cumulated experience can tell you how to calibrate the range and accuracy of your cannon; physics, derived from experiments with little wooden balls and expressed in mathematics, can tell you how to hit a target with far less trial and error.

If you think about it, it’s pretty incredible that the world actually works that way. If you went up to someone in the year 1500 and told them that one kooky hobbyist rolling little balls down ramps could be right about how the physical world works, when the cumulated experience of millions of human beings around him was wrong, and that his insights could be written down by an even kookier kook in an arcane language that few could speak, and that this language would allow its speakers to apply the laws to do wondrous feats that others couldn’t possibly do, they would have thought you were crazy. They did think that was crazy. And yet, it worked. It worked more spectacularly than anything had ever worked before, and the economic result looked like this:

In 1960, the physicist Eugene Wigner wrote an essay called “The Unreasonable Effectiveness of Mathematics in the Natural Sciences”. It’s really not about math so much as it’s about the fact that such a thing as simple, generalizable “laws of the universe” exist in the first place. Why should principles like physical laws or the basics of genetics be so simple, and yet so generalizable and consistent across time and space? There’s no obvious reason this should be true, and yet it is.

Well, for some things it is. Charles Marcus, my freshman physics teacher, told me that “physics is great, but it will never tell you how to make a tree.” That wasn’t entirely true. But it is true that a whole lot of complex phenomena have so far defied the approach that gave us the laws of physics and genetics. Language, cognition, society, economics, complex ecologies — these things so far don’t have any equivalent of Newton’s Laws, and it’s not clear they ever will.

This problem has been recognized for a very long time, and thinkers tried several approaches to get around it. Some hoped that all complex phenomena would be governed by emergent properties — that simplicity would emerge at higher levels of complexity, allowing us to discover simple laws for things like psychology and economics even without connecting those laws to the underlying physics. Indeed, this idea is implicit (or, occasionally, explicit) in the way economists try to write down simple mathematical laws of collective human behavior. People make fun of this approach as “physics envy”, but sometimes it really works; auction theory isn’t derived from physics, but it has been able to make very effective predictions about how much people will pay for Google ads or spectrum rights. Ditto for “gravity models” of trade, migration, retail shopping, etc. Sometimes emergence works.

Sometimes, though, it doesn’t — or at least, it doesn’t yet. But in psychology, in macroeconomics, in natural language processing, and many other domains, the search for laws of nature has been mostly stymied so far, and it’s not clear when real progress might ever be made. Wigner goes so far as to postulate that some domains of human knowledge might never be described by such simple, generalizable principles.

Other approaches for getting around the problem of complexity — chaos theory, complexity theory — yielded interesting insights, but ultimately didn’t succeed in giving us substantially more mastery of the phenomena they dealt with. In the late 20th century, the problem of complexity was like a looming wall up ahead — as scientists found more and more of the laws that could be found, a larger and larger percentage of the remaining problems were things where laws seemed very hard or potentially even impossible to find.

Our second great magic, powerful though it had proven to be, was still not omnipotent.

Control without understanding, power without knowledge

In 2001, the statistician Leo Breiman wrote an essay called “Statistical Modeling: The Two Cultures”, in which he described an emerging split between statisticians who were interested in making parsimonious models of the phenomena they modeled, and others who were more interested in predictive accuracy. He demonstrated that in a number of domains, what he calls “algorithmic” models (early machine learning techniques) were yielding consistently better predictions than what he calls “data models”, even though the former were far less easy, or even impossible, to interpret.

This raises an important question: What is the goal of human knowledge? As I see it — and as Breiman sees it — the fundamental objective is not understanding but control. By recording which crops grow in which season, we can feed our families. By understanding that germs cause disease, we can know to wash our hands or get a vaccine, and lower our risk of death. In these situations, knowledge and understanding might be intrinsically satisfying to our curiosity, but that satisfaction ultimately pales in importance to our ability to reshape our world to our benefit. And the “algorithmic” learning models that Breiman talks about were better able to deliver their users the power to reshape the world, even if they offered less promise of understanding what they were predicting.

Why should we care about understanding the things we predict? To most of us, raised and inculcated in the age of science, that might seem like a laughable question, but there actually is a good reason. “Understanding”, in the scientific sense, means deriving a simple, generalizable principle that you can apply in other domains. You can write down Kepler’s laws of planetary motion, but Newton’s laws of motion and gravitation let you generalize from planetary orbits to artillery shells. Collapsing observed phenomena to simple, generalizable laws and then expanding these laws again in some other domain to allow you to control other phenomena is fundamental to the awesome power of science. So because you and I sit at the end of 400 years of science being the most powerful tool in the world, we have naturally been taught that it is very, very important to understand things.

But what if, sometimes, there are ways to generalize from one phenomenon to another without finding any simple “law” to intermediate between the two? Breiman sadly never lived to see his vision come to fruition, but that is exactly what the people who work in machine learning and artificial intelligence are increasingly doing. In 2009 — just before the deep learning revolution really kicked off — the Google researchers Alon Halevy, Peter Norvig, and Fernando Pereira wrote an essay called “The Unreasonable Effectiveness of Data” that picked up the argument where Breiman left off. They argued that in the cases of natural language processing and machine translation, applying large amounts of data was effective even in the absence of simple generalizable laws. A few excerpts:

[S]ciences that involve human beings rather than elementary particles have proven more resistant to elegant mathematics…An informal, incomplete [list of the grammatical rules that define] the English language runs over 1,700 pages. Perhaps when it comes to natural language processing and related fields, we’re doomed to complex theories that will never have the elegance of physics equations. But if that’s so, we should stop acting as if our goal is to author extremely elegant theories, and instead embrace complexity and make use of the best ally we have: the unreasonable effectiveness of data…
So, follow the data…Represent all the data with a nonparametric model rather than trying to summarize it with a parametric model, because with very large data sources, the data holds a lot of detail. For natural language applications, trust that human language has already evolved words for the important concepts. See how far you can go by tying together the words that are already there, rather than by inventing new concepts with clusters of words. Now go out and gather some data, and see what it can do.

(Note: ChatGPT, the chatbot that has recently wowed the world, is a parametric model, but I wouldn’t get too hung up on that little detail. In terms of interpretability, a model with 175 billion parameters is not really different from one with no parameters at all.)

Anyway, the basic idea here is that many complex phenomena like language have underlying regularities that are difficult to summarize but which are still possible to generalize. If you have enough data, you can create a model (or, if you prefer, an “AI”) that can encode many (all?) of the fantastically complex rules of human language, and apply them to conversations that have never existed before. As a demonstration of how powerful this can be, let’s ask ChatGPT about the subject of this post:

N.S.: Do we need to understand things in order to predict and control them?
ChatGPT: It is generally easier to predict and control things that we understand. When we have a good understanding of something, we can identify patterns and regularities, and use this knowledge to make informed predictions about how the thing will behave in the future. We can also use our understanding of a thing to identify key variables that we can manipulate in order to control it.
However, it is possible to predict and control things without a full understanding of them. For example, we can use machine learning algorithms to make predictions about things that we don't fully understand, by finding patterns in data that we have collected about them. We can also use feedback loops and other control mechanisms to control things without necessarily understanding how they work in detail.
Overall, understanding can be helpful for predicting and controlling things, but it is not strictly necessary.

The ability to write down farming techniques is power. The ability to calculate the path of artillery shells is power. And the ability to have a machine reliably and consistently write paragraphs as clear and helpful as the one above is power, even if we don’t really understand the principles of how it’s doing what it does.

This power is hardly limited to natural language processing and chatbots. In recent years, Google’s AlphaFold algorithm has outpaced traditional scientific methods in predicting the shapes of folded proteins. Biologist Mohammed AlQuraishi wrote that:

There was, in many ways, a broad sense of existential angst felt by most academic researchers [in the field]…[those] who have bet their careers on trying to obsolete crystallographers are now worried about getting obsoleted ourselves.

We are almost certainly going to call this new type of prediction technique “science”, at least for a while, because it deals with fields of inquiry that we have traditionally called “science”, like protein folding. But I think this will obscure more than it clarifies. I hope we eventually come up with a new term for this sort of black-box prediction method, not because it’s better or worse than science, but because it’s different.

A big knock on AI is that because it doesn’t really let you understand the things you’re predicting, it’s unscientific. And in a formal sense, I think this is true. But instead of spending our effort on a neverending (and probably fruitless) quest to make AI fully interpretable, I think we should recognize that science is only one possible tool for predicting and controlling the world. Compared to science, black-box prediction has both strengths and weaknesses.

One weakness — the downside of being “unscientific” — is that without simple laws, it’s harder to anticipate when the power of AI will fail us. Our lack of knowledge about AI’s internal workings means that we’re always in danger of overfitting and edge cases. In other words, the “third magic” may be more like actual magic than the previous two — AI may always be powerful yet ineffable, performing frequent wonders, but prone to failure at fundamentally unpredictable times.

But even wild, occasionally-uncontrollable power is real power.

It’s impossible to know, just yet, how powerful this new technique will be. Perhaps AI will be a niche application, or perhaps it will revolutionize all the fields of endeavor where traditional science has run into diminishing returns. Just as none of the scientists in the 1600s knew how many wonders their theories would eventually produce, we have no idea how far the third magic will take us. People may look back on this post in half a century and laugh at how I dared to frame AI as an epistemological successor to history and science. Or perhaps AI will lead to a leap in human power and flourishing comparable to those in the two graphs above.

As always, we won’t know until we try.

The third magic and economics

Let’s briefly circle back to my old 2014 post. One reason I’m dissatisfied with that post is that I focused on ways of “understanding” the world, but as ChatGPT notes above, understanding isn’t the only thing we care about when we’re trying to make predictions and control our world. A second thing I’m dissatisfied with is that I presented empirics — statistical analysis of uncontrolled observational data — as a third tool, separate from science and history. With some time to reflect, I see less of a distinction to be made there. Observing correlations might involve some fancy math, but conceptually it isn’t that different from marking down the patterns of the seasons or the orbits of the planets. And using observational data to uncover the laws of nature — like predator-prey models — is really just another way of doing traditional science.

I do think I was right, though, to see natural experiments as something a bit different. In the past few decades, as economics has moved away from theory and toward empirics, the most important innovation has been the use of natural experiments — situations where some policy change or seemingly random difference allows you to tell yourself that you’re looking at causation, rather than just correlation. This is different than what I call “history”, because you’re doing more than just documenting facts; you’re verifying causal links. But it’s also different from science, because a lot of the time you don’t exactly know why the causal links are there. In a way, a natural experiment is its own sort of black-box prediction algorithm.

A number of subfields of econ, however, are so complex, with so many feedback systems, that they’ve largely resisted the natural experiment approach. These include not just the study of business cycles (what most people call “macro”), but also the study of economic growth, international finance, and a number of others. In these fields, theory (including “structural estimation”) still rules, but predictive power is very low.

Might we apply AI tools to these hard problems, in order to predict vast economic forces without needing to understand them? A recent paper by Khachiyan et al. argues that the answer is “yes”. The authors use deep neural nets (i.e., AI) to look at daytime satellite imagery, in order to predict future economic growth at the hyper-local level. The results they achieve are nothing short of astonishing:

For grid cells with lateral dimensions of 1.2km and 2.4km (where the average US county has dimension of 55.6km), our model predictions achieve R2 values of 0.85 to 0.91 in levels, which far exceed the accuracy of existing models, and 0.32 to 0.46 in decadal changes, which have no counterpart in the literature and are 3-4 times larger than for commonly used nighttime lights.

This isn’t yet AlphaFold, but being able to predict the economic growth of a few city blocks 10 years into the future with even 30% or 40% accuracy is leaps and bounds ahead of anything I’ve ever seen. It suggests that rather than being utter incomprehensible chaos, some economic systems have patterns and regularities that are too complex to be summarized with simple mathematical theories, but which nevertheless can be captured and generalized by AI.

And this is just a first-pass attempt, by a team with relatively limited resources. What if the people at DeepMind were to tackle the questions of economic growth and business cycles, with even bigger data sets and more advanced models? Khachiyan et al.’s paper raises the possibility that in a decade or two, macroeconomics might go from being something we simply theorize about to something we can anticipate — and therefore, something we can control. The authors suggest place-based policies, transportation infrastructure construction, and disaster relief as three possible applications of their work.

This is just one example that happens to be near and dear to my heart (and which inspired me to write this post). But it’s an example of how fields of inquiry that seemed like impossible cliffs just a few years ago may seem like easily scalable boulders in a year or a decade or a century, thanks to a grand new meta-innovation in how we predict and control the world around us. That’s what happened with the scientific revolution, and there’s no obvious reason why it might not happen again. Just the possibility should make us dizzy with excitement.

Update: Here’s another possible example, this time in physics.

David Hugh-Jones

Jan 1, 2023

A couple of comments.

1. I think you’re too quick to dismiss what you call “oral tradition”. I’d suggest that *stories* are a kind of zeroth technology. Before written history, humans had already (for millennia!) found ways to embody their knowledge in ways that they could pass down. The basic function was narrative: folk tales, myths, and so on. By 0 AD there’s already 100-300m humans on the planet - many orders of magnitude more than the chimpanzees our nearest relatives. That suggest our cumulative knowledge was already doing a lot, before writing was widespread enough to have much effect.

2. AI may really be discovering deep principles. It just doesn’t know how to explain them to us yet. If that’s true, then work on interpretable AI may help these models tell us the new laws they have discovered. After all, there must be some regularity there, otherwise how could they make their predictions?

5 replies by Noah Smith and others

DxS

"Trade" was as critical as "history". Consider how many separate expert products go into a semiconductor factory. Or how the potter's wheel disappeared in Britain, after Rome's fall. Why? Because there was no longer enough population density and social trust to support full-time potters.

Without specialization, humans have a hard limit on their tools and productivity.

Without trade, there's not much room for specialization.

1 reply

106 more comments...

Noahpinion

Discussion about this post

Ready for more?