Model Diversity and Model Hierarchies

"A" hierarchy, or "the" hierarchy?

Apr 23, 2026

I wanted to follow up my last post with an addendum on model diversity. Increases in compute have led to increases in diversity of both state-of-the-art models and simpler/more idealized models.

For “production”-level models — models used for weather forecasting and climate projections — increased diversity seems like an unambiguous positive. For these tasks we are trying to optimize over so many sources of uncertainty that the more models, and the more modeling approaches, the better. One could worry about resource allocation (how much diversity is too much?), but the current situation seems reasonable.

For research models the situation is more complicated. In 2005, Isaac Held wrote an influential essay on “The Gap between Simulation and Understanding in Climate Modeling”. This essay is often read as being about the concept of model hierarchies in general, but Isaac makes a sharper point about hierarchies of “lasting value”: establishing reference hierarchies for certain classes of problem that can be widely used and built on. The benefit comes from using “the” hierarchy to study the Hadley circulation rather than “a” hierarchy. (Pengcheng Zhang recently derived some new analytical solutions for the Hadley circulation in a model that could be at the base of this hierarchy.)

Much (most?) scientific progress is made by simplifying a system down to a level at which we can understand it, then systematically adding complexity back in. So of course model hierarchies (collections of models of different complexity) are a good idea for studying the climate. The problem is that it’s hard to compare the results of different studies if everyone uses a different model hierarchy. This leads to a conceptual question of which results are robust, and a practical question of which results are reproducible.

Isaac pointed out that in biology, where there is a nearly limitless number of organisms to study, a small set of “model organisms” is widely studied, such as the bacterium E. coli. Progress in biology would probably be a lot slower if every lab used its own bacterium, even if that organism were better suited to the lab’s research interests.

The easy availability of compute has led climate science to drift toward the every-lab-its-own-organism situation, as it’s easy to spin up different models or different model configurations depending on your interest. One area where this shows up is tropical atmospheric dynamics, where many different models (SAM, WRF, DAM, …) and configurations are used. The huge uncertainties in things like cloud microphysics make it hard to compare results from different models, or even from the same model with different parameter settings.

In midlatitude dynamics, which is one of the most mature parts of the field with a very well established model hierarchy, you still find bespoke set-ups. As just one example, Polvani and Kushner added a more realistic stratosphere to the standard Held-Suarez model, which led to variants with their own choices of transition height and topography, each drifting a bit further from the original Held-Suarez formulation.

The rise of MIPs (Model Intercomparison Projects) has been one response to this diversity. But MIPs ask the opposite question: what behavior is robust across all model variants, rather than what can be learned by probing a single model or set of models deeply. Projects like RCEMIP have had huge impacts, but they represent a different approach.

One way of thinking about this is as a coordination problem: the field has to balance the benefits of model diversity (exploring parameter space, establishing robust results) against the benefits of reference models (reproducibility, depth of investigation). The availability of compute has made diversity easier, and has naturally pushed us toward it. And of course many of the problems we are interested in do have huge uncertainties associated with them. Each new model formulation is a bet on which physical processes matter most, and progress is often made when a clever new model is introduced.

But that doesn’t mean the current balance is right. In fact, developing reference hierarchies for studying small-scale RCE, or warming trends in the eastern Pacific, could be viewed as opportunities to sharpen our thinking about these problems. The process of agreeing on a hierarchy (deciding which processes to include and which approximations/abstractions matter) would be a way of clarifying what we know and which questions we want to answer. Just creating these hierarchies could represent progress. MIPs will remain essential, but it might be worth spending a little less time on them and a little more time building shared reference hierarchies for particular classes of problem.

Notes on Climate
