Uncharted Territory

May 1, 2012

The Wettest Drought in History

One of my responsibilities as a teenager was to keep the lawn under control. Flymos had presumably not yet been invented, and petrol-driven mowers were perhaps too much hassle, so ours was manual. If the grass got too long it was hard work and it could even become necessary to resort to shears, which was back-breaking work. But mowing was also difficult if the grass was damp. There was therefore a trade-off each spring. The first mow had to be done when it was mild enough for the grass to be reasonably dry, but couldn’t be put off until it was too long. And as the grass grew it dried out more slowly each day. So it was essential to make use of any opportunity to mow in case the weather turned wet again. It probably only happened once or twice, but it seems I was always caught out. I’d wait for one more dry day to make the job easier, but the skies would open and a week later the job would be twice as difficult.

Nowadays the internet and improved forecasting allow me to monitor the weather far more effectively. Thus it was I’d already been out with the mower in March, and, seeing the long-range forecast, made sure I got a mow in just before it started raining early in April.

The point is that the 5-10 day forecast is now fairly reliable.

Why, then, was the UK drought – declared in a few regions in March, with hosepipe bans from 5th April – officially extended in mid April?

Yes, that’d be in the middle of the wettest April on record!

We’re now in the farcical situation of the “wettest drought in history”, with a succession of “experts” (and junior ministers) popping up on TV claiming the rain in April somehow doesn’t count. Apparently it’ll run off compacted ground. Yes, maybe for the first day or two, but not after a month. With the wettest April on record followed by significant rain already in May, and more forecast in a day or two, the drought risk is simply receding. We’re in one of those surreal situations where reasons are being invented not to contradict previous claims, in this case that the drought would last into next year.

What baffles me is why the drought was extended when wet weather was forecast. Surely – since most of the time it’s dry – the drought risk is receding as long as there’s significant rain in the forecast. And, as the 5-10 day forecast is fairly reliable and everything after that isn’t, you simply run the risk of looking stupid if you don’t wait until the forecast is for dry weather.

I wonder whether there’s a tendency to believe long-term forecasts more than short-term ones. But long-term forecasts only indicate a small bias one way or another, as Met Office modelling indicates:

“New three-month forecasts by the Met office suggest little respite with April, May and June expected to be drier than average. ‘With this forecast, the water resources situation in southern, eastern and central England is likely to deteriorate further during the period. The probability that UK precipitation for April-May-June will fall into the driest of our five categories is 20-25% while the probability that it will fall into the wettest of our five categories is 10-15%, it says.’ ” [my emphasis]

So 20-25% dry plays 10-15% wet plays (presumably) 60-70% around average. Not sure I’d have put a lot of money on the “expectation” of a dry spring this year (certainly wouldn’t now!). Even less after I’d looked at the Met Office report (scroll down to find PDFs) because the model runs are all over the place.
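The arithmetic behind that "60-70%" is simple enough to check. A minimal sketch, using only the two quoted ranges; the function name is made up for illustration:

```python
# Quoted Met Office ranges (in percent) for the outer two of the five categories.
DRIEST = (20, 25)
WETTEST = (10, 15)

def middle_range(driest, wettest):
    """Return the (low, high) percentage left over for the three middle categories."""
    low = 100 - driest[1] - wettest[1]   # both outer categories at their upper bound
    high = 100 - driest[0] - wettest[0]  # both outer categories at their lower bound
    return low, high

print(middle_range(DRIEST, WETTEST))  # (60, 70)
```

In other words, even on the forecast's own numbers, something other than the driest category comes up at least three times in four.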

And are these “probabilities”, anyway? Isn’t the modelling signal swamped by the noise of uncertainty? It seems to me likelihoods based on model-runs are not the same as probabilities in the real world.

I’d say the Met Office and the media (the quote marks indicate the introductory sentence was written by the Guardian’s John Vidal) need to mind their language. How about “slightly more likely than not to be” rather than “expected to be”? And perhaps “indication” rather than “forecast”? And “x% of model runs gave…” rather than “the probability that…”? And definitely “might” rather than “is likely to”!

September 23, 2011

Drill Ice, Baby, Drill Ice – Reflections on Clive Oppenheimer’s Eruptions that Shook the World

Filed under: Global climate trends, Global warming, Science, UK climate trends, Volcanoes — Tim Joslin @ 4:03 pm

Clive Oppenheimer notes in his Acknowledgements that he “planned to finish writing this book in 1999!”. Whilst I found Eruptions that Shook the World very informative and readable, it would have benefited from just a bit more effort. For example, the date of the El Chichon eruption is referred to in several places as 1985, though in others, correctly, as 1982 (as I’m sure I read in some other review, though even Google can’t help me out here). More substantively, there is some repetition and an immense amount of cross-referencing. I would also have preferred the inclusion of a comprehensive list of eruptions rather than (or as well as) the superficial details that are included between the Preface and the Introduction and as Appendix A, which excludes the large category of Unknowns and many other events discussed in the book (some of which are in the earlier table). As well as being an incomplete reference source, the book has the feel of being a final draft rather than the finished article.

Most annoyingly, of recent eruptions of which we obviously have the best data, Pinatubo (1991) is discussed in detail (p.54-69), but El Chichon (1982) is referenced only in passing and Agung (1963) hardly at all. In particular, there seem to be important differences between the climatic effects of the El Chichon and Pinatubo eruptions, which would have been worthy of discussion.

Nevertheless, Eruptions fills a gap between school-level and academic material and anyone interested in the subject will find it a stimulating read. Some other reviews are listed here, though how carefully Kate Ravilious read it for New Scientist is in some doubt as she seems to think Oppenheimer discusses “thick layers of ash in Greenland ice cores” rather than the varying sulphuric acid fallout in the cores.

I should say that whilst I read Eruptions to better understand the effects of volcanoes on the climate, the book does discuss the other nasty things volcanoes can do to you, and a great deal more besides.

Minor gripes aside, I presume Oppenheimer’s account reflects the current state of academic thinking about the effects of eruptions on climate. It is this about which I have concerns, that is, the science itself, rather than Oppenheimer’s account of it.

Let me outline what appear to be the central tenets of the current paradigm, and comment as I go along:

1. The climatic effects of eruptions are entirely due to sulphuric acid aerosols.
Volcanoes eject varying amounts of sulphur in the form of sulphur dioxide and hydrogen sulphide into the atmosphere at varying heights and in varying proportions to the total amount of ash, lava and other material. The sulphur reacts to form sulphuric acid aerosols which can remain in the stratosphere for months to years, where they reflect light (and absorb heat, which helps keep them aloft). There is therefore a “recipe for a climate-forcing eruption” (Eruptions, p.69ff).

Eventually the sulphuric acid aerosols descend, and a historic record of sulphuric acid loading can be derived from ice cores, principally from Greenland and Antarctica. Oppenheimer draws on work at Rutgers University by Chaochao Gao, Alan Robock and Caspar Ammann (presumably et al – this must have been a lot of work) to produce an ice-core volcanic index (IVI). He reproduces Gao et al’s graph (as Fig. 4.6, p.98), which I kept referring back to. Here’s my copy-paste from Rutgers’ site (for some reason there are spurious double lines on my version – check back at Rutgers if confused):

The IVI replaces H.H. Lamb’s famous Dust Veil Index (DVI). The idea that particles of dust as opposed to sulphuric acid could reflect light away is rejected entirely, or at least the effect of dust is considered insignificant. I find this assumption dubious. For example, the eruption of Huanyaputina in 1600 apparently had catastrophic effects on the climate – causing the Great Russian Famine – yet was, according to the IVI, only about twice as severe as Pinatubo, which really didn’t have a huge effect. Its sulphur emissions are dwarfed by those of Tambora in 1815 and Kuwae in 1452, yet it seems to have had at least as much of a cooling effect. Unfortunately, instrumental temperature records don’t go back to 1600, so we have to rely on anecdotal evidence. Here’s what Brian Fagan says in The Little Ice Age (p.104):

“The volcano discharged at least 19.2 cubic kilometres of fine sediment into the upper atmosphere. The discharge darkened the sun and moon for months and fell to earth as far away as Greenland and the South Pole. Fortunately for climatologists, the fine volcanic glass-powder from Huanyaputina is highly distinctive and easily identified in ice cores.

Huanyaputina played havoc with global climate. The summer of 1601 was the coldest since 1400 throughout the northern hemisphere… Summer sunlight was so dim in Iceland that there were no shadows.”

It seems to me at least plausible that the effect of eruptions on climate is due to dust particles as well as sulphuric acid aerosols. Indeed, my main problem with the IVI (see the paper Gao et al, 2008, which is available to download as a PDF from the Rutgers site) is that not enough has been done to establish how closely ice core sulphate levels correlate with climate impacts of volcanoes. As well as the possibility that there are significant effects due to other kinds of particle, there are other potential complicating factors:

  • varying proportions of stratospheric sulphuric acid aerosol may end up in the ice, so that the IVI only gives an indication of the severity of the climate impacts of the eruption;
  • the amount of sulphate in the ice gives no indication of how long it remained as sulphuric acid aerosol in the atmosphere – obviously the amount of sunlight reflected away is a function of time as well as aerosol density;
  • some of the sulphate in the ice may not have reached as high as the stratosphere to cause significant climate effects (this must surely distort the figures for Icelandic, such as Laki, 1783, and Alaskan eruptions).

We have a lot of data on recent eruptions, which would seem to provide a means of establishing the usefulness of the IVI, which might be a good idea before translating it into a dataset to be plugged into climate models, as Gao et al have done. I can see the appeal of such a mechanistic approach, but it seems to me that the effects of different eruptions vary more than a single variable (OK in most recent cases we also have a date, or at least a season) would seem to suggest.

One problem with the IVI is that although it includes Pinatubo (1991), it does not include El Chichon (1982) because not enough of the Arctic ice cores were old enough, and there is no signal for El Chichon in the Antarctic (I’m unclear why a full signal for Pinatubo is apparently included). This misses a golden opportunity to validate the data. Clearly we need to get out to Greenland and drill more ice cores before the whole lot melts.

A further problem is that the IVI does not explain all of the data. For example, the cold period in the 1690s, including the exceptionally cold summer in 1695, as well as the record cold summer of 1725 (see my recent post on the cold summer of 2011) are completely unexplained. Note that the 1690s has long been a problem. Lamb interrupts his DVI list to discuss it (note the criticism of subjectivity which may affect the whole DVI – the eruption data may be deduced from the weather data rather than independent of it). Here’s a screen grab of part of the DVI which is accessible on Google Books:

Excerpt from Lamb, Climate: Past, Present and Future

More recently, I’d understood* that the dip in temperatures at the start of the 20th century was due to the Santa Maria eruption of 1902 (not to be confused with the famous Mount Pelee eruption of the same year, which is notable for causing a large number of fatalities). But there is only a small signal in the Gao et al data for 1902 (3.77 compared to 30.09 for 1991, the year of Pinatubo, limiting Gao et al’s spurious accuracy perhaps less than I should!).

* e.g. in the IPCC figure produced in a previous post.

2. Eruptions may affect only one hemisphere.
Tropical eruptions can have effects on both hemispheres, depending on (apart from the characteristics of the eruption and the weather at the time) latitude and time of year (and hence the position of the inter-tropical convergence zone, ITCZ). In their paper, Gao et al in fact separate out the hemispheric sulphate records:

Pinatubo affected both hemispheres, but El Chichon only the northern hemisphere (NH). El Chichon, though, seems to have reflected away at least as much heat, having produced a volcanic cloud “extending from the Equator to 30 deg N for more than 6 months, and then gradually spreading more widely” (Alan Robock, 2002, PDF). We can see this first in the atmospheric transmission of solar radiation record from Hawaii:

and, more to the point, in the record of oceanic heat content, where the dip in the early 1980s seems to have been greater than that in the early 1990s (though perhaps already underway by the time of the eruption):

So, if El Chichon removed more heat from the oceans than Pinatubo, and removed the bulk of it from the NH, you might expect some kind of effect on the Arctic ice. Here’s the annual ice extent for August 1979-2011, from the (US) National Snow and Ice Data Centre (NSIDC):

1983 and 1991 both seem to be above the annoying blue trend line (I always feel you need a better reason for drawing lines through data than that you feel like it!), but one might expect the effect to take longer than one year to play out. Indeed, if you imagine replacing the annoying blue line with one from around the turn of the millennium when one might suppose the effects of the two eruptions to have played out, the trend would seem to be a lot steeper. Maybe this tells us nothing more than that the eruptions cause a bit of an ice melt backlog, but I just thought I’d throw that point in.

Perhaps resolving the puzzle a tad, Realclimate have helpfully drawn my attention to ice volume data from PIOMAS, which I copy here purely for convenience:

This perhaps shows more clearly the greater effect of El Chichon (1982) than Pinatubo (1991) on the Arctic ice, though, again, we have trend-lines that confuse the issue, and, again, the eruption occurred somewhat after the temporary ice volume minimum at the start of 1982, and could not have influenced ice volume until at least mid-1982. Notwithstanding, if, here, one ignores the blue line and confidence-interval shading, one might postulate that the combined effect of the two eruptions was to negate any ice-melt that would have otherwise occurred – due to global warming and the fact that if the ice builds after eruptions, logic suggests that it must melt in their absence – for almost two decades, from 1982 to the turn of the millennium, and tentatively conclude that we’re now playing catch-up.

3. Tropical eruptions are climatologically more important.
The theory (Eruptions, p.72-3) seems to be that high latitude eruptions have less effect on climate, though time of year is obviously critical. Although Laki (1783) had dramatic effects on the climate, at least for a year or two, it was a very large eruption.

Oppenheimer briefly mentions the case of Kasatochi (August 2008), a moderate sized sulphur-releasing eruption in Alaska, and the most significant climatologically since Pinatubo (1991). Sure enough, you can see the signal in the Mauna Loa, Hawaii record, above (now I realise I should have numbered the figures). And here’s the possibility of an effect in another ice extent representation from NSIDC:

Not very conclusive**, but maybe the ice did start to re-form a bit quicker than usual in 2008.

** See also the Postscript to this post.

4. The climatic effects of eruptions last only for a few years.
There seems to be an emphasis in the literature on the short-term effects of eruptions. Presumably this is because an event, such as the eruption of Pinatubo, attracts a burst of interest – and generates a flurry of publications – for a few years, before everyone moves on to other projects. Oppenheimer (p.76), suggests forcing lasts around 3 years, after which aerosols disperse, temperature is affected for around 7 years, and sea-ice “perhaps for a decade”. But, he says, oceanic circulation “can be perturbed for up to a century”. Surely this in turn would affect climate? The emphasis on transient effects seems to conflict with the reconstructions of historic temperature records, when, I understood, the main explanation for century-scale variability (the Little Ice Age and all that) is the pattern of natural forcings, principally volcanic eruptions. The story doesn’t appear to be entirely straight, and perhaps this is due to an emphasis on debunking the idea that supervolcanoes (such as Toba 73kya) could have plunged the Earth “back into the ice age” (Oppenheimer, p.190ff).

5. The climatic effect of eruptions scales less than linearly – larger eruptions do not have a proportionately greater effect.
The theory (Oppenheimer, p.191-2) seems to be that larger eruptions produce so much sulphur that larger sulphuric acid particles form, which descend through the atmosphere quicker, so that larger eruptions (as indicated by the sulphuric acid loading in ice cores) do not have proportionately greater effects on the climate.

This all seems a bit speculative. I would have thought a sufficient explanation was that, assuming larger eruptions don’t affect the atmosphere for longer than less extreme events (you’d expect similar sized particles to descend at a similar rate however many of them there are), it seems impossible for effects to scale, given the amount of sunlight reflected away by even relatively small eruptions like Pinatubo and El Chichon (see the Mauna Loa diagram, above, again!). After all, there’s only so much sunlight to reflect away, so (as for greenhouse gases) the energy gain (negative in the case of volcanic aerosols) will be a log function of concentration.
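To put a rough number on the "log function" idea, here is a toy sketch. It assumes the effect goes as log(1 + loading/reference), which is purely an illustrative choice and not a physical model from Oppenheimer or Gao et al; the function name and the reference loading are hypothetical:

```python
import math

def relative_effect(loading, reference=1.0):
    """Toy log response: effect grows as log(1 + loading / reference)."""
    return math.log1p(loading / reference)

small = relative_effect(1.0)    # an eruption at the (assumed) reference loading
large = relative_effect(10.0)   # an eruption with ten times the sulphate loading
ratio = large / small
print(f"10x the loading gives {ratio:.2f}x the effect")  # about 3.46x
```

Under this assumption a tenfold increase in ice-core sulphate corresponds to only about three and a half times the effect, which is the sort of less-than-linear scaling at issue.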

6. The effect of eruptions is to produce cool summers and mild winters.
Except when they don’t.

This is a very confusing aspect, perhaps complicated by the small sample size of recent eruptions. There’s also a need to clarify what is meant.

It’s certainly true that it’s rare for the year of an eruption to experience a cold NH winter. This is what I naively expected when I first started looking at the Central England Temperature (CET) record – eruptions cool the planet, so winter should be colder, right? But in fact cold winters do not immediately follow eruptions, with one notable exception – 1784 after Laki, which also produced a hot summer (Oppenheimer devotes his chapter 12, The haze famine, p.269ff to this event, a repetition of which would, even, or maybe especially, in the 21st century, present serious challenges to health, transport – especially air – and agricultural services in Europe and maybe the entire Northern Hemisphere).

The general story seems to be that eruptions produce more zonal weather at least in the short-term, by heating the stratosphere and disrupting poleward heat transport by the large-scale atmospheric circulation. This leads to mild winters in western Europe (i.e. the zonal pattern of westerly airstreams dominates).

It seems to me there must also be an immediate effect on patterns of oceanic temperature and heat content. I’ve noted before that it appears volcanoes can trigger or exacerbate El Nino events, although this seems to be an area of controversy. Among other effects this may tend to produce mild NH winters.

But perhaps there are also persistent effects on patterns of oceanic heat content, thought to determine NH winter weather in particular. For example, there were generally mild winters in the UK at least for more than a decade after Pinatubo. Yet cold winters – and often runs of colder than usual winters – followed a few years after Huanyaputina (1600 – 1607 was extremely cold); the unknown 1809 eruption (General Winter defeated Napoleon in 1812 and 1814 was the last Thames Frost Fair); Katmai (1912 – 1917 was particularly cold); an eruption in 1925 which has a similar ice-core sulphur signature to Katmai (1929 was cold); Agung (1963); and El Chichon (1982). It’s a confusing picture, and it’s possible that these eruptions simply occurred during series of cold winters (e.g. the famously cold winter of 1962-3 was over by the time of the Agung eruption). Nevertheless, a hypothesis might be framed to relate the location (and season) of eruptions and hence their differential effect on ocean heat content in different regions (or just latitudes) to their effect on climate over a decade or more, through intensifying or weakening (or, in the case of the largest eruptions, completely overriding) the underlying multi-decadal cycles, such as the Atlantic Multi-decadal Oscillation (AMO).

Scientists often give the impression that they’ve answered all the questions. It’s often seemed to me that this puts off those most inclined to produce radical new ideas from specialising in the disciplines that seem to be “solved”. That is certainly not the case with the effect of volcanic eruptions on climate. There are more questions than answers. And, if the historic record is not enough, new events to investigate occur every few years. I’ll certainly be keeping an eye out for new developments in the field.

Postscript (2/10/11): Amended post to tidy up section on cold winters following eruptions, adding a reference to the 1809 event (location unknown) and to scale down some of the diagrams so they’re less in your face. Also, the figure below (from JAXA via Realclimate) perhaps shows the more than usually rapid ice build in 2008 more clearly than the NSIDC figure above, though you have to look closely at the spaghetti to see that the 2008 dark green line shows one of the lowest September ice extents in the period covered turning into one of the highest extents by November:

September 3, 2011

How unusual was the cool UK summer of 2011?

Filed under: Global climate trends, Global warming, Science, UK climate trends — Tim Joslin @ 7:05 pm

Why do so many in the media feel they have to get their story in before the final whistle? It’s always a risk. Towards the end of August, a number of articles, typified by this one in the Guardian, trumpeted 2011 as “the coldest summer since 1993”. Political correctness is the order of the day – judging from the figures quoted, the record refers to the whole of the UK. I prefer to use the Central England Temperature (CET) record, which goes back further, to 1659. And I waited until the final data was in (there’s always a delay at the end of the month before the Met Office provide the final figure) and updated my spreadsheet. Here’s my latest summer temperature graph:

CET for Summers 1660-2011 (smoothing shown at central point of date range)

Note that my running means (smoothing) are shown centred, i.e. for the central of the 5, 11 or 21 years averaged. I tried the possible alternatives (i.e. trailing and forward – the latter to try to see the effect of events, such as eruptions), but this representation seems clearest to me. This way, you can most easily see the effect of, for example, the mystery eruption of 1809 and the Tambora eruption of 1815, with all curves dipping at about the same time.
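For anyone wanting to reproduce the smoothing, a centred running mean can be sketched as follows. This is a minimal illustration in Python rather than the spreadsheet actually used, and the sample temperatures are made up:

```python
def centred_running_mean(values, window):
    """Return a list aligned with `values`; years without a full window are None."""
    if window % 2 == 0:
        raise ValueError("window must be odd so that there is a central year")
    half = window // 2
    smoothed = [None] * len(values)
    for i in range(half, len(values) - half):
        # Average the central year together with `half` years on either side.
        smoothed[i] = sum(values[i - half:i + half + 1]) / window
    return smoothed

# Hypothetical summer means (C) for five consecutive years.
sample = [15.0, 16.0, 17.0, 16.0, 15.0]
print(centred_running_mean(sample, 3))
```

The `None` entries at each end are the price of centring: a 21-year mean has no value for the first or last 10 years of the record.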

I was expecting to be writing that a comparison with 1993 is not a level playing field, since the eruption of Pinatubo in 1991 cooled the whole planet for a few years (see the graphs from James Hansen that I posted in 2010), making 2011 more freakish, since there hasn’t been a recent eruption. But, in the CET record at least, summer 2011 was in fact colder than 1993.

As the graph shows, there were some colder summers in the mid 1980s, but, again, the whole planet was cooled a tad at that time by the eruption of El Chichon in 1982.

So you have to go back to the 1970s to find a summer cooler than 2011 that wasn’t induced by a volcanic eruption.

Still, you can expect the coldest summer in 40 years every 40 years, so on this reckoning 2011 was not that exceptional – compared to, say, December 2010.

But let’s go a little bit further and take global warming into account. Because of global warming we’d expect warmer summers. Indeed, as the graph shows, prior to 1933, the CET summer mean had only exceeded 17C twice (in 1826 and 1846). The mean CET touched 17C in 1933 and edged past it in 1947. But in the last 40 years it has passed that mark on 5 occasions: 1976, 1983, 1995, 2003 and 2006. The 5, 11 and 21 year running means have all broken new ground.

We should really judge the freakishness of 2011 against the prevailing summer temperature. The trouble is, we don’t know whether temperatures will continue to increase, level off for a decade or two, or even dip – that’s why the 11 and 21 year running mean curves stop before they get to the present day. If summers over the next few years are as warm as from 2003-6, then 2011 will look very unusual – perhaps the most atypical summer since 1860, which was more than 1.5C cooler than might have been expected.

On the other hand, if it turns out that the atypical summers were 2003-6, and temperatures level off for a while, then summer 2011 will just represent the sort of anomaly that might be expected every few decades, rather than a once a century or two event.

Regardless, 2011 is a long way from matching 1725 as the most disappointing summer in the CET record. 1725 was even cooler than 1816, the “year without a summer” following the Tambora eruption!

So, 2011 was surprisingly cool, but not unprecedented.

———
Incidentally, anyone who followed the link to my previous post which looked at global temperature data might have noticed that the graph of the mean summer UK CET record is uncannily similar in shape to that of (annual, not just summer) Northern Hemisphere (NH) temperatures as a whole.

For more convenient comparison here’s a more recent graph (i.e. including 2010) from the GISS graph site (we’re primarily interested in the solid red line representing the 5 year running mean NH temperature):

The hemispheric temperature record from GISS

Note how, in both graphs, the temperature peaks around 1900, then dips (usually attributed to the 1902 Santa Maria eruption), rises from around 1920 to a peak around 1940, dips again to 1970 or so, then rises into the new millennium. Overall, the magnitude of UK summer temperature changes is about the same as that for the NH as a whole, though the 1930s to 1940s peak is a little more pronounced. So it’s not just UK summer temperatures that vary – as I said, in comparing summer temperatures for freakishness (rather than trends), we need to take account of global warming.

Note also the effects of the eruptions of Pinatubo (1991) and El Chichon (1982) on the NH temperature record (or at least the dips in NH temperature following the dates of the eruptions!). This justifies my decision to exclude 1993 and the mid 1980s summers from the comparison with 2011.

I recently visited my storage unit and discovered that some boxes had fallen and damaged a fan I bought in response to the heat in, I think, 2005. The fan had been gathering dust for a few years – I haven’t needed it. The fact that I’ve paid to store the thing surely shows, though, that I certainly didn’t expect such a change from 2003-6, when all summers exceeded a mean CET of 16C, to 2007-11 when none have (the sudden dip in summer temperatures is clearly shown by the green 5 year running mean in the first figure, above). This weather/climate business is sure full of surprises!

July 31, 2011

Uncertain about Risk and Uncertainty

Filed under: Complex decisions, Global warming, Philosophy of science, Reflections, Science — Tim Joslin @ 6:57 pm

Here’s an interesting – nay, potentially iconic – figure from a paper, Greenhouse-gas emission targets for limiting global warming to 2C, Meinshausen et al, Nature vol.458, p.1158, 30th April 2009:

It represents the results of a heroic assemblage of climate modelling data. The horizontal axis gives the emissions from 2000-49 (belying the paper’s title, incidentally) in GtCO2 for each set of data plotted and the vertical axis – careful wording alert – the probability of the given model output set predicting a global mean temperature increase of 2C or greater over pre-industrial levels before 2100. The dots and swathe of colour are the outputs of all this modelling; the solid black line some kind of best fit (I know not how determined) or “illustrative default”; the dotted line incidentally is the outcome based on a set of models including carbon cycle feedbacks, which implies less likelihood (carefully chosen word) of hitting 2C for given fossil fuel and land-use change emissions.

The good news is that if this modelling exercise represented a set of real world possibilities, we could, for example, emit around 1500GtCO2 and still have a 50% chance of avoiding “dangerous climate change” defined as the 2C temperature increase. The bad news is that the grey area bottom left of the figure represents the 234GtCO2 of actual emissions from 2000-06 (7 rather than 6 years, I believe, though the paper is scandalously ambiguous on this point) – basically we have to slow way down.

You could probably write a dissertation about the diagram – for example, why are we discussing scenarios for the future such as the IPCC’s A1FI (top right) which apparently emit more carbon before 2050 than is stored in the world’s fossil fuel reserves (and even more in the second half of the century)? In fact, why are we discussing more than one of the IPCC scenarios, since even the best of them, B1, is unlikely to lead to less than 2C warming? There is surely no longer any need to nuance the basic point that unmitigated emissions will lead to dangerous climate change.

But all I want to cover just now is essentially one word. The word “probability” on the vertical axis.

What we have here are not probabilities in any real-world sense. There will only be one outcome. We will be at one position on the horizontal axis, depending on (in this case) our emissions before 2050 (though different pathways may give different outcomes for the same 2000-49 emissions). The distribution about the illustrative default (a vertical slice through the diagram) is an indication of our state of knowledge (as encapsulated by the models used) as to what will happen to the global mean temperature as a result. It is not, in a strict sense, a probability.
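A crude way to see what such a "probability" amounts to is to count model runs. The sketch below is not the method used by Meinshausen et al, which is considerably more sophisticated; the function name and the warming figures are hypothetical:

```python
def fraction_exceeding(warming_by_run, threshold=2.0):
    """Share of model runs projecting warming at or above the threshold (in C)."""
    if not warming_by_run:
        raise ValueError("need at least one model run")
    hits = sum(1 for warming in warming_by_run if warming >= threshold)
    return hits / len(warming_by_run)

# Made-up projected warming (C) from five runs at one emissions level.
runs = [1.6, 1.9, 2.0, 2.3, 2.8]
print(fraction_exceeding(runs))  # 0.6
```

The 0.6 here describes the set of runs, and it changes if a run is added or dropped. It says nothing directly about the one outcome the real world will deliver.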

Using the term probability in this context is unfortunate. In fact, I might even go so far as to say it is symptomatic of a pathology: the same pathology evident when the idea of carrying out climate “experiments” is discussed. Guys, we are using models, not instances of the real world. If parallel universes do exist, we do not have access to them.

My personal issue with all this is that when I read the word “probability” I assume we’re talking about risk, when in fact the topic is uncertainty. I simply can’t help it.

It is extremely unfortunate that the climate modelling community (and perhaps a wider group) has chosen to use the word “probability” to refer to uncertainty as well as risk when they could simply have used “likelihood”. This has been extended to probability distribution functions (PDFs) which are often nothing of the sort (I say “often”, because, confusingly, the horizontal axis may be a probability function and the vertical axis what I would call a likelihood function). These figures should be renamed likelihood distribution functions, with the added advantage of a less overused TLA, LDF rather than PDF.

Apart from confusing models with the real world, the inappropriate use of the term “probability” has caused another problem: too much weight is being given to quantifiable knowledge (included in models) and too little to that which has not been quantified. The term “likelihood” would instead force people to focus on the right questions. What’s happened is that the climate science community is pretending it can somehow answer epistemological questions – those about the state of knowledge – in a scientific, quantifiable way. Presenting the information in precise terms – “my belief in the actual increase in the global mean temperature we can expect for 1500GtCO2 emissions is represented by this graph” – doesn’t alter the fact that all we are discussing is the state of our knowledge. And when we let machines produce the graph we automatically lose anything not input to the calculation.

I thought I’d try to clarify the problem with a simple analogy.

If I toss a coin and call “heads” the probability – risk – of losing is approximately 50%. I can make decisions on this basis.

But I said “approximately”, because there is some uncertainty – the coin may have a bias.

Now, any estimate of uncertainty in this case depends entirely on my state of knowledge. For example, I might have tested the coin – suspecting it might be weighted – before the crucial coin toss and found that of hundreds of attempts 55% came up heads. This (and my knowledge of statistical confidence testing) may then lead me to believe the risk of losing to be not 50% but 45%. But this would be exceptional. For a randomly chosen coin I would normally be prepared to say it’s likely – an expression of certainty – that the risk of losing when calling heads is, for all practical purposes, 50%.
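To make the "statistical confidence testing" step concrete, here is a minimal sketch. The numbers of tosses are hypothetical (the text only says "hundreds of attempts"), and `z_score` is an illustrative helper using the usual normal approximation to the binomial, not anything taken from a particular library.

```python
import math

def z_score(heads, tosses, p_fair=0.5):
    """Normal-approximation z-score for the observed share of heads."""
    observed = heads / tosses
    standard_error = math.sqrt(p_fair * (1 - p_fair) / tosses)
    return (observed - p_fair) / standard_error

# Hypothetical test runs: the same 55% heads, with different numbers of tosses
for tosses in (100, 400, 1600):
    heads = round(0.55 * tosses)
    print(tosses, round(z_score(heads, tosses), 2))
```

The same 55% is weak evidence of bias after 100 tosses (z of about 1), borderline after 400 (about 2) and strong after 1600 (about 4). How far I trust the 45% figure therefore depends on how much testing I did, which is a statement about my knowledge rather than about the coin.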

If asked to quantify how sure I am that the risk of losing is 50%, I might say 99.99%, because I believe the vast majority of the coins in circulation are equally likely to come down heads as tails. But this 99.99% is what I would term a likelihood – a judgement of the uncertainty – not a probability.

On the other hand I may be asked to pick a coin from a bag I’m told contains 50 normal coins and 50 weighted so as to come down heads only 25% of the time. Now it becomes a probability as to how likely it is that the coin is true. I can even calculate the overall probability of tossing a head (37.5%). There may still be uncertainty, though – whoever told me 50% of the coins are weighted might have been lying.
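The 37.5% figure is just the law of total probability applied to the bag. A minimal sketch, in which the function name and data layout are illustrative and only the coin counts and the 25% weighting come from the example above:

```python
from fractions import Fraction

def prob_heads(bag):
    """Overall probability of heads for a coin drawn at random from the bag.

    `bag` is a list of (number_of_coins, probability_of_heads) pairs.
    """
    total = sum(count for count, _ in bag)
    return sum(Fraction(count, total) * p for count, p in bag)

# 50 true coins and 50 coins weighted to come down heads 25% of the time
bag = [(50, Fraction(1, 2)), (50, Fraction(1, 4))]
p = prob_heads(bag)
print(p, float(p))  # 3/8 0.375
```

What the calculation cannot capture is whether the description of the bag was true in the first place. That remains a matter of uncertainty, not probability.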

What I hope this analogy conveys is the importance of being clear about what we know and what we don’t know. We should only talk about risk and probabilities within a defined theoretical framework. When making judgements as to the state of our knowledge we should be discussing uncertainty and (I suggest) likelihood.

Let’s turn back to the iconic figure from Meinshausen et al and that vertical axis labelled “probability”.

Are we really talking about probability or likelihood?

Remember where the numbers come from – a series of computer model runs. Here’s a philosophical point: no scientific model is the same as the real world. A simple scientific law such as F=ma or E=mc2 may make very good predictions, but has different characteristics to the real world. And these simple models give the same answer every time (or rather, are not sensitive to small variations in the initial conditions). When we use complex computer-based models such as of climate, we find that quite different answers result from multiple runs with minimally perturbed initial conditions.

What we don’t know is how much of the variation in model runs depends on a lack of determinism of the real world and how much is a result of the characteristics of the models. Remember the real world is air, water, sunlight, clouds and so on. The model is numbers representing big chunks of real stuff in a computer. Maybe there is a “butterfly effect” and the magnitude of global warming in a century’s time will depend on minor unpredictable events that happen today. Somehow I doubt it – weather may be affected by small events, but not climate – though the issue could be discussed long into the night.

What is less disputable, though, is that we don’t know how much of the variation in model predictions of global warming is due to different real world possibilities and how much is due to the characteristics of the models.

By failing to distinguish probability from what I term likelihood, and labelling their vertical axis as “probability”, papers such as Meinshausen et al are perhaps implicitly asserting that what is represented is entirely due to real world variability.

It seems to me far wiser to do the opposite. We should assume that the variability of model outputs represents uncertainty. We are best off considering that if the models were perfect they’d give the same answer every time.

But Meinshausen et al might argue that all they’ve done is use the word “probability” loosely. They might agree with my argument. Now, though, we have another problem. We are giving much more weight to model variability than to other forms of uncertainty. The models only capture some of the known unknowns. There are known real-world phenomena that are not included in the models – poorly understood phenomena, for example, carbon-cycle feedbacks, such as methane release from tundra. And then there are unknown unknowns.

Here’s my biggest problem – if we are to present some forms of uncertainty in numeric form then surely we are obliged to present all forms of uncertainty in the same way. It is misleading to simply list the things we haven’t taken account of. We have to make judgements about the known unknowns and unknown unknowns and adjust the uncertainty distribution accordingly. It’s not all bad news: we might argue that the models overstate the uncertainty in outcome and narrow the distribution in figures such as that by Meinshausen et al.

Policymakers need a considered view of the state of climate knowledge, not diagrams that present dubious “probabilities” and a set of provisos.

February 24, 2011

Extreme Madness: A Critique of Pall et al (Part 3: Juicy Bits and Summary)

Filed under: Effects, Global warming, Science — Tim Joslin @ 6:22 pm

I continue to be bothered by Pall et al, the paper which attempts to determine how much more likely the autumn 2000 floods in England and Wales were because of the anthropogenic global warming (AGW) since 1900.

To recap, Part 1 of this extended critique described the method adopted by Pall et al and made a few criticisms, one of which I’ll elaborate on in the first part of this post. Part 1 ended by asking why Pall et al didn’t eliminate more statistical uncertainty, given the large number of data points they produced (they ran over 10,000 simulations of the climate in 2000 when floods occurred).

Part 2 looked more closely at how Pall et al had defined risk and uncertainty and handled it statistically. Part 3 will further question the approach adopted, in particular by considering the uncertainty introduced by the process of modelling the climate itself.

Oops, it’s a log scale, or “about this 0.41mm threshold” revisited

In Part 1, I noted the arbitrariness of the threshold for severe flooding adopted by Pall et al. They considered their model had predicted flooding when it estimated 0.41mm/day or more of runoff, but their Fig 3 clearly shows that this level actually gives rather more than the 5-7 floods in the ~2000 model runs of each of the 4 A2000N scenarios (those without AGW, the AGW runs being referred to as the A2000 series, of which around 2000 were also run) that would be expected for the once in 3-400 year event the 2000 floods are said to be.

Pall et al includes no evidence as to the skill of their model in predicting flooding or calibration between the models’ estimation of runoff in the 2000 floods and what actually happened in the real world. As I noted in Part 1, they could have run the model for years other than 2000 in order to show what is termed its “skill”, in this case in predicting flooding.

Why, then, did Pall et al not calibrate their model? Because they didn’t think it mattered, that’s why. They write:

“Crucially, however, most runoff occurrence frequency curves in Fig 3 remain approximately linear over a range of extreme values, so our FAR estimate would be consistent over a range of bias-corrected flood values.”

It’s about time we had a picture, and I can now include Pall et al’s Fig 3 itself. Ignore the sub-divisions on the bottom of the 2 scales in each diagram – these are in error as pointed out in Part 1. The question for any youngsters reading is: are the scales on these diagrams linear or logarithmic?:

Answer: logarithmic, of course.

So is it the case that the “FAR estimate would be consistent over a range of bias-corrected flood thresholds”? The FAR, remember, is the ratio of the AGW risk of flooding to the non-AGW risk of flooding. This ratio would indeed not depend on the level chosen in the model set-up to indicate flooding of the extent seen in the real world in 2000 were the runoff occurrence frequency curves linear. But they’re not. They’re logarithmic. The increased risk therefore does depend on the flood level, as was seen simply from reading figures off the diagrams in Part 1. One wonders if we’re all clear exactly what the graphs in Pall et al’s Fig 3 actually represent.

Does Pall et al actually tell us anything useful at all?

The Pall et al study assumes it has some skill in forecasting flooding in England in autumn from the state of the climate system in April. Unfortunately we have no idea what this level of skill actually is. The model has not been calibrated against the real world by running it for years other than 2000 (or if it has, this information is not included in Pall et al). Note that analysing the results of such an exercise would not be a trivial exercise, since there are two unknowns: the skill of the model and its bias. As far as we know, 0.41mm runoff in the model could be anything in the real world – 0.35mm or 0.5mm, we have no idea. Similarly we don’t know if the model would forecast floods such as those in 2000 with a probability of 1 in 10, 1 in 100 or whatever.

To be fair, Pall et al do devote one of their 4 pages in Nature to showing their modelling does bear some relation to reality. Their Fig 1 shows similar correlation between Northern Hemisphere (NH) air pressure patterns in the model and rainfall in England and Wales as exists in the real world. And their Fig 2 shows that the rainfall patterns in the model bear some resemblance to those in the real world.

But one (more) big problem nags away at me. The basic premise is that a particular pattern of SSTs and sea ice causes the pressure system patterns that lead to rainfall in the UK. Pall et al therefore used the observed April 2000 pattern as input to the A2000 (AGW) series of model runs. But the patterns used for the non-AGW (A2000N) runs were different. Here’s what they say:

“…four spatial patterns of attributable [i.e. to AGW] warming were obtained from simulations with four coupled atmosphere-ocean climate models (HadCM3, GFDLR30, NCARPCM1 and MIROC3.2)… Hence the full A2000N scenario actually comprises four scenarios with a range of SST patterns and sea ice…” [my stress]

So if the A2000 model runs can predict flooding in a particular year from the SST and sea ice pattern in April, we wouldn’t expect the A2000N runs to do so, not just because everything is warmer, but also because the SST and sea ice patterns are different! So we don’t know whether the increased flood risk in the A2000 series is because of global warming or because the SST patterns are different.

It also seems to me that were it the case that Pall et al’s model could predict autumn flooding in April around 15-20x as often as it actually occurs (around 1 in 20 times for 2000 compared to the actual risk of 1 in 3-400) as is implied by their Fig 3, then we’d be reading about a breakthrough in seasonal forecasting and more money would be being invested to improve the modelling further (and increase the speed of forecasting of course, so that it’s not autumn already by the time we know it’s going to be wet!). This isn’t just the forecast for the next season we’re talking about, which the Met Office has given up on, but the forecast for the season after that.

So I’m not convinced. I’m going to assume that Pall et al’s modelling can’t tell one year from another, and that all they’ve done is model the increased risk of flooding in a warmer world in general. (One way to test this would be to compare the flood risks of the 4 A2000N models against each other for the same extent of AGW – it could be that the models give different results simply because they suggest different amounts of warming, not different patterns).

Under this not very radical assumption, we can actually calibrate Pall et al’s modelling. We know that the floods in 2000 were a once in 3-400 year event. That implies that in each of the diagrams in Fig 3 there should be around 5-7 floods (there are – or should be – approx. 2000 dots representing non AGW model runs on each diagram). We can therefore estimate by inspecting the figures how much flooding in the model represents a 3-400 year flood – it’s the level with only 5-7 dots above. We can then read across to the line of blue dots (the AGW case) and then, by reading up to the return time scale (the one with correct subdivisions), work out how often the modelling suggests the flooding should then occur. Here’s what I get:
- Fig 3a: 3-400 year flood threshold ~0.49mm; risk after AGW once every 40 years.
- Fig 3b: ~0.47mm, and risk now once every 30 years.
- Fig 3c and d: ~0.5mm, and risk now once every 50 years.

So the Pall et al study implies, assuming it’s no better at forecasting flooding when it knows the SST and sea ice conditions in April than it is if it doesn’t, that the risk of a 3-400 year flood in England and Wales, similar to or more severe than that which occurred in 2000, is now, as a result of AGW up to 2000 only, between once in 30 and once in 50 years. That is, under this assumption, the risk of flooding in England and Wales of what was previously once in 3-400 year severity has increased by a factor of between 6 and 13, according to Pall et al’s modelling.
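The arithmetic behind this calibration is simple enough to sketch. The helper names below are illustrative, the run count is the approximate figure quoted above, and the return periods are the ones read off Fig 3 by eye.

```python
RUNS = 2000  # approximate number of non-AGW runs per diagram (from the text)

def expected_floods(return_period_years, runs=RUNS):
    """Expected number of runs exceeding a once-in-N-years threshold."""
    return runs / return_period_years

def risk_factor(old_return_period, new_return_period):
    """How many times more frequent the event has become."""
    return old_return_period / new_return_period

# Dots expected above the threshold for a once in 3-400 year event
print(expected_floods(400), round(expected_floods(300), 1))  # 5.0 6.7

# Smallest and largest increase implied by the figures read off Fig 3
print(risk_factor(300, 50))            # 6.0
print(round(risk_factor(400, 30), 1))  # 13.3
```

The factor of 6 pairs the shortest historical return period with the longest post-AGW one, and the factor of 13 pairs the longest with the shortest, so the range is the widest these eyeballed figures allow.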

Trouble is, the Pall et al model may have a bit of skill in forecasting flooding from April SST and sea ice conditions (the A2000 case) and this skill may have been reduced by an unknown factor when processing the data to remove the effects of 20th century warming. If Pall et al’s results are to have any meaning whatsoever they need to do further work to establish the skill of the model and calibrate it to measures of flooding in the real world.

More uncertainty about uncertainty

In Part 2 I discussed how Pall et al’s treatment of uncertainty has resulted in them actually saying very little. Essentially, they’ve estimated that the risk of autumn flooding as great as or exceeding that in 2000 has increased as a result of AGW by between 20% and around 700% – and there’s 20% probability it could be outside that range! I argued that the sources of this uncertainty are:
(i) the 4 different models used to derive conditions as if AGW hadn’t happened – fair enough, we can’t distinguish between these, (though in Part 1 I estimated how certain we’d be of the increased risk of flooding if we did assume they were all equally probable), and
(ii) statistical uncertainty which could have been eliminated.

But these are not the only sources of uncertainty. We are also uncertain of all the parameters used to drive the HadAM3-N144 model which attempts to reproduce the development of the autumn weather from the April conditions that were fed into it; we’re uncertain of the accuracy of the April SST and sea-ice conditions input into the model; we’re uncertain as to whether atmosphere-ocean feedbacks may have affected the autumn 2000 weather (Pall et al are explicit that such feedbacks were insignificant, so used “an atmosphere-only model, with SSTs and sea ice as bottom boundary conditions”); we’re uncertain of the precise magnitude of the forcings in 2000 which affected the development of the autumn weather; we’re uncertain as to whether there are errors in the implementation of the models; and we’re uncertain as to whether there are processes below the resolution of the model which are important in the development of weather patterns. There are probably more.

Consider that the reason we are uncertain as to which of the 4 models used to derive the A2000N initial conditions is most correct (or how correct any of them are) is because we don’t know how well each of them performs on more or less the same criteria as the higher resolution model used to simulate the 2000 weather. If they didn’t have different parameters, all had the same resolution and so on, then – tautologically – they’d all be the same! If we’re uncertain which of those is most accurate then we must also be uncertain about the HadAM3-N144 model. Just because only one model was used for that stage of the exercise doesn’t mean we’re not still uncertain (and for that matter the fact that we’ve used 4 in the first stage doesn’t mean we’re certain of any of them, they could all be wildly wrong, a possibility not apparently taken account of in Pall et al).

It seems to me the real causes of uncertainty in the findings of Pall et al derive from the general characteristics of the models, not (as discussed in Part 2) the statistical uncertainty as to the amplitude of 20th century warming (the 10 sub-scenarios for each of the 4 cases) which has been used.

Judith Curry has recently written at length about uncertainty and her piece is well worth a look (though I disagree where statistical uncertainty belongs in Rumsfeld’s classification – I think it’s a known unknown, maybe in a “knowable” category, since it can be reduced simply by collecting more of the same type of data as one already has). In particular, though, she provides a link to IPCC guidelines on dealing with uncertainty (pdf). A quick skim of this document suggests to me that Probability Distribution Functions (PDFs) such as Pall et al’s Fig 4 should be accompanied by a discussion of the factors creating uncertainty in the estimate, including some consideration as to how complete the analysis is deemed to be. I say deemed to be, since by its very nature uncertainty is uncertain!

That seems a good note to end the discussion on.

Here’s Pall et al’s Fig 4 (apologies if it looks a bit smudged):

Summary

In Part 1 of this critique I identified the two main problems with Pall et al:
- the model results are not calibrated with real world data. The paper therefore chooses an arbitrary threshold of flooding.
- statistical uncertainty has not been eliminated, rather it seems to have been introduced unnecessarily.

Part 2 drilled down into the issue of statistical uncertainty and suggested how Pall et al could have used the vast computing resources at their disposal to eliminate much of the uncertainty of their headline findings.

Part 3 picks up on some of the issues raised in Parts 1 and 2, in particular noting that the paper seems to include an erroneous assumption which led them to conclude that calibration of their model for skill and bias was not important. If my reasoning is correct, this was a mistake. Part 3 also continues the discussion about uncertainty, suggesting that the real reasons for uncertainty as to the increased risk of flooding have not been included in the analysis (whereas statistical uncertainty should have been eliminated).

There are so many open questions that it is not clear what Pall et al does tell us, if anything. I suspect, though, that the models used have little skill in modelling autumn floods on the basis of April SST and sea ice conditions. If this is correct then the study confirms that extreme flooding in general is likely to become more frequent in a warmer world, with events that have historically been experienced only every few centuries occurring every few decades in the future.

Note: Towards the end of writing Part 3 I came across another critique by Willis Eschenbach.  So there may well be a Part 4 when I’ve digested what Willis has to say!

February 22, 2011

Extreme Madness: A Critique of Pall et al (Part 2: On Risk and Uncertainty)

Filed under: Effects, Global warming, Science — Tim Joslin @ 2:42 pm

Keeping my promises? Whatever next! I said on Sunday that I had more to say on Pall et al, and, for once, I haven’t lost interest. Good job, really – after all, Pall et al does relate directly to the E3 project on Rapid Decarbonisation.

My difficulties centre around the way Pall et al handle the concepts of risk and uncertainty. I’m going to have to start at the beginning, since I doubt Pall et al is fundamentally different in many respects from other pieces of research. They’re no doubt at least trying to follow standard practice, so I need to start by considering the thinking underlying that. I feel like the Prime alien in Peter Hamilton’s Commonwealth Saga (highly recommended) trying to work out how humans think from snippets of information!

Though I should add that Pall et al does have the added spice of trying to determine the risk of an event that has already occurred. That’s one aspect that really does my head in.

Let’s first recap the purpose of the exercise. The idea is to try to determine the fraction of the risk of the 2000 floods in the UK attributable (the FAR) to anthropogenic global warming (AGW). This is principally of use in court cases and for propaganda purposes, though it may also be useful to policy-makers as it implies the risk of flooding going forward, relative to past experience.

Now, call me naive, but it seems to me that, in order to determine the damages to award against Exxon or the UK, those crazy, hippy judges are going to want a single number:
- What, Mr Pall et al, is your best estimate of the increased risk of the 2000 autumn floods due to this AGW business?
- Um, we’re 90% certain that the risk was at least 20% greater and 66% certain that the risk was 90% greater…
- I’m sorry, Mr Pall et al, may we have a yes or no answer please.
- Um…
- I mean a single number.
- Sorry, your honour, um… {shuffles papers} here it is! Our best estimate is that the 2000 floods were 150% more likely because of global warming, that is, 2 and a half times as likely, that is, the AGW FAR was 60%.
- Thank you.
- OBJECTION!
- Yes?
- How certain is Mr um {consults notes} Pall et al of that estimate.
- Mr Pall et al?
- Let’s see… here it is… yes, we spent £120 million running our climate model more than 10,000 times, so our best estimate is tightly constrained. We have calculated that 95% of such suites of simulations would give the result that the floods were between 2.2 and 2.8 times more likely because of global warming [see previous post for this calculation].

But Pall et al don’t provide this number at all! This is what Nature’s own news report says:

“The [Pall et al] study links climate change to a specific event: damaging floods in 2000 in England and Wales. By running thousands of high-resolution seasonal forecast simulations with or without the effect of greenhouse gases, Myles Allen of the University of Oxford, UK, and his colleagues found that anthropogenic climate change may have almost doubled the risk of the extremely wet weather that caused the floods… The rise in extreme precipitation in some Northern Hemisphere areas has been recognized for more than a decade, but this is the first time that the anthropogenic contribution has been nailed down… The findings mean that Northern Hemisphere countries need to prepare for more of these events in the future. ‘What has been considered a 1-in-100-years event in a stationary climate may actually occur twice as often in the future,’ says Allen.” [my stress]

When Nature writes that “anthropogenic climate change may have almost doubled the risk of the extremely wet weather that caused the floods” [my stress] what they are actually referring to is the “66% certain that the risk was 90% greater”, mentioned by Pall et al in court (and as “two out of three cases” in the Abstract of Pall et al even though the legend of Fig 4 in the text clearly states that we’re talking about the 66th percentile, i.e. 66, not 66.66666… but I’m beginning to think we’ll be here all day if we play spot the inaccuracy – the legend in their Fig 2 should read mm per day not mm^2, that would get you docked a mark in your GCSE exam).

We could have a long discussion now about the semantics and usage in science of the words “may” and “almost” as in the translation of “66% certain that the risk was 90% greater” into “may have almost doubled”, but let’s move on. The point is that in the best scientific traditions a monster has been created, in this case a chimera of risk and uncertainty that the rest of the human race is bound to attack impulsively with pitch-forks.

So how did we get to this point?

Risk vs uncertainty

It’s critical to understand what is meant by these two terms in early 21st century scientific literature.

Risk is something quantifiable. For example, the risk that an opponent may have been dealt a pair of aces in a game of poker is perfectly quantifiable.

First, why, then do poker players of equal competence sometimes win and sometimes not? Surely the best players should win all the time, because after all, all they’re doing is placing bets on the probability of their opponent holding certain cards. One reason is statistical uncertainty. There’s always a chance in a poker session that one player will be dealt better cards than another. Such uncertainty can be quantified statistically.

But there’s more to poker than this. Calculating probabilities is the easy part. The best poker players can all do this. So the second question is why, then, are some strong poker players better than others? And why do the strongest human players still beat the best computer programs – which can calculate the odds perfectly – in multi-player games? The answer is that there’s even more uncertainty, because you don’t know what the opponent is going to do when he has or does not have two aces. Some deduction of the opponent’s actions is possible, but this requires understanding the opponent’s reasoning. Sometimes he may simply be bluffing. Either way, to be a really good poker player you have to get inside your opponent’s head. The best poker players are able to assess this kind of uncertainty, the uncertainty as to how much the statistical rules apply in any particular case, uncertainties as to basic assumptions.

Expressing risk and uncertainty as PDFs

PDF in this case doesn’t stand for Portable Document Format, but Probability Density (or Distribution) Function.

The PDF represents the probability (y-axis) of the risk (x-axis) of an event, that is, the y-axis is a measure of uncertainty. Pall et al’s Fig 4 is an example of a PDF. It’s where their statement in court that they were 90% sure that the risk of flooding was greater than 20% higher because of AGW (and so on) came from.

The immediate issue is that risk is a probability function. Our best estimate of the increase in risk (the FAR) because of AGW is 150%, so we’re already uncertain whether the 2000 floods were caused by global warming (the probability is 60% or 3/5). So we have a probability function of a probability function. The only difference between these probability functions is that the one is deemed to be calculable, the other not. Though it has in fact been calculated! Furthermore, as we’ll see, some aspects of the uncertainty in the risk can be reduced, and other aspects cannot – the PDF includes both statistical uncertainty and genuine “we don’t know what we know” uncertainty (and I’m not even discussing “unknown unknowns” here, both types of uncertainty are unknown knowns).
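The conversions between "150% more likely", "2 and a half times as likely" and a FAR of 60% follow from the usual definition of the fraction of attributable risk, FAR = 1 - P(without AGW) / P(with AGW). A minimal sketch, with illustrative function names of my own:

```python
def ratio_from_increase(percent_more_likely):
    """Turn 'X% more likely' into a risk ratio (times as likely)."""
    return 1 + percent_more_likely / 100

def far_from_ratio(risk_ratio):
    """Fraction of attributable risk, FAR = 1 - P_without / P_with."""
    return 1 - 1 / risk_ratio

ratio = ratio_from_increase(150)
print(ratio)                            # 2.5
print(round(far_from_ratio(ratio), 3))  # 0.6

# The same conversion for the 2.2 to 2.8 range quoted from the previous post
for r in (2.2, 2.8):
    print(r, round(far_from_ratio(r), 3))
```

On these numbers the 2.2 to 2.8 range corresponds to a FAR of roughly 55% to 64%, and a risk that has "almost doubled" (a ratio of 1.9) to a FAR of a little under 50%.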

Risk and uncertainty in Pall et al

What Pall et al have done is assume their model is able to assess risks correctly. Everything else, it seems, is treated as uncertainty.

Their A2000 series is straightforward enough. They set sea surface temperatures (SSTs) and the sea-ice state to those observed in April 2000 and roll the model (with minor perturbations to ensure the runs aren’t all identical).

But for the A2000N series they use the same conditions, but set GHG concentrations to 1900 levels, subtract observed 20th century warming from SSTs and project sea-ice conditions accordingly. There’s one hint of trouble, though, they note that the SSTs are set “accounting for uncertainty”. I’m not clear what this means, but it doesn’t seem to be separated out in the results in the same way as will be seen is done for other sources of uncertainty.

They then add on the warming over the 20th century that would have occurred without AGW, i.e. with natural forcings only, according to 4 different models, giving 4 different patterns of warming in terms of SSTs etc. As will be seen, for each of these 4 different patterns they used 10 different “equiprobable” temperature increase amplitudes.

First cause of uncertainty: 4 different models of natural 20th century warming

As Pall et al derive the possible 20th century natural warming using 4 different models giving 4 different patterns of natural warming, there are 4 different sets of results, giving 4 separate PDFs of the AGW FAR of flooding in 2000. Now, listen carefully. They don’t know which of these models gives the correct result, so – quite reasonably – they are uncertain. Their professional judgement is to weight them all equally, so that means that so far, they’ll only be able to say at best something like: we’re 25% certain the FAR is only x; 25% certain it’s y; 25% certain it’s z; and, crikey, there’s a 25% possibility it could be as much as w!

Trouble is, they can only run 2,000 or so of each of 4 non AGW simulations. So for each of the 4 there’ll be a sampling error. They treat this statistical uncertainty in exactly the same way as what we might call their professional judgement uncertainty, which certainly gives me pause for thought. So what happens is they smear the 4 estimates x, y, z and w and combine them into one “aggregate histogram” (see their Fig 4). That’s how they’re able to say we’re 90% certain the FAR is >20% and so on.

Nevertheless, their Fig 4 also includes the 4 separate histograms for our estimates x, y, z and w. It’s therefore possible for another expert to come along and say, “well, x has been discredited so I’m just going to ignore the pink histogram and look at the risk of y, z and w” or “z is far and away the most thorough piece of work, I’ll take my risk assessment from that”, or even to weight them other than evenly.

One of the 4 models may be considered an outlier, as in fact the pink (NCARPCM1) one is in this case. It’s the only one with a most likely (and median) FAR below the overall median value (or the overall most likely value which happens to be higher than the overall median). Further investigation might suggest it should be discarded.

Another critical point: x, y, z and w can be determined as accurately as we want by running more simulations, because the statistical uncertainty reduces as the square root of the number of data items (see Part 1).
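A minimal sketch of that square-root behaviour, treating each simulation as an independent flood/no-flood trial. The flood frequency of 1 in 20 is purely illustrative, as are the run counts, and the formula is the textbook standard error of a proportion rather than anything taken from Pall et al.

```python
import math

def standard_error(p, n):
    """Standard error of a proportion p estimated from n independent runs."""
    return math.sqrt(p * (1 - p) / n)

p = 0.05  # illustrative flood frequency, roughly 1 in 20 runs
for n in (500, 2000, 8000):
    print(n, round(standard_error(p, n), 4))
```

Each quadrupling of the number of runs halves the standard error, which is why this kind of uncertainty can always be bought down with more computing time, unlike the uncertainty as to which model is right.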

I’m not going to argue any more as to whether the 4 models introduce uncertainty. Clearly they do. I have no way of determining which of the 4 models most correctly estimates natural warming between 1900 and 2000. It’s a question of professional judgement.

However, I will point out that if uncertainty between the models is not going to be combined statistically (as in the previous post) I am uneasy about combining them at all:

Criticism 6: The headline findings against each of the 4 models of natural warming over the 20th century should have been presented separately in a similar way to the IPCC scenarios (for example as in the figure in my recent post, On Misplaced Certainty and Misunderstood Uncertainty).

Second cause of uncertainty: 10 different amounts of warming from each of the 4 models of natural 20th century warming

But Pall et al didn’t stop at 4 models of natural 20th century warming. They realised that each of the 4 models has statistical uncertainty in its modelling of the amount of natural warming to 2000. The models in particular each noted a risk of greater than the mean warming. This has to be accounted for in the initial data to our flood modelling. Never mind, you’d have thought, let’s see how often floods occur overall, because what we’re interested in is the overall risk of flooding.

But Pall et al didn’t simply initialise their model with a range of initial values for the amplitude of warming for each of their 4 scenarios. They appear to have created 10 different warming amplitudes for each of the 4 scenarios and treated each of these as different cases. This leaves me bemused, as the 4 scenarios must also have had different patterns of warming, so why not create different cases from these? Similarly, they seem to have varied initial SST conditions in their AGW model since they “accounted for uncertainty” in that data. Why, then, were these not different cases?

I must admit that even after spending last Sunday morning slobbing about pondering Pall et al, rather than just slobbing about as usual, I am still uncertain(!) whether Pall et al did treat each of the 10 sub-scenarios as separate cases. If not, they did something else to reduce the effective sample size and therefore increase the statistical uncertainty surrounding their FAR estimates. Their Methods Summary section talks about “Monte Carlo” sampling, which makes no sense to me in this case as we can simply use Statistics 101 methods (as shown in Part 1).

The creation of 10 sub-scenarios of each scenario (or the Monte Carlo sampling) effectively means that, instead of 4 tightly constrained estimates of the risk, we have 4 wide distributions. Remember (see previous post) the formula for the statistical uncertainty (Standard Deviation (SD)) involved in using the percentage observed in a sample as an estimate of the percentage in the overall population:

SQRT((sample %)*(100-sample%)/sample size) %

so varies inversely with the square root of the sample size. In this case the sample size for each of the 4 scenarios was 2000+, so that of each of the 10 subsets was only around 200. The square root of 10, obviously, is 3 and a bit, so the error associated with a sample of 200 is about 3 times as large as if the sample size were 2000.

For example, one of the yellow runs is an outlier: it predicts floods about 15% of the time. How confident can we be in this figure?

SQRT((15*85)/200) = ~2.5

So it’s likely (within 1 SD either way) that the true risk is between 12.5 and 17.5%, but only very likely (2 SD either way) that it is between 10 and 20%.
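The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the function name `step` is an illustrative label (borrowed from the STEP abbreviation used in Part 1), and the 15% figure and the sample sizes of roughly 200 and 2000 are the approximate values quoted in the text, not numbers taken from Pall et al's data.

```python
import math

def step(percent, sample_size):
    """Standard error (in percentage points) of a percentage seen in a sample."""
    return math.sqrt(percent * (100 - percent) / sample_size)

sd_200 = step(15, 200)    # one sub-scenario of about 200 runs
sd_2000 = step(15, 2000)  # a whole scenario of about 2000 runs

print(f"SD with 200 runs:  {sd_200:.2f} percentage points")
print(f"SD with 2000 runs: {sd_2000:.2f} percentage points")
print(f"ratio of the two:  {sd_200 / sd_2000:.2f}")  # the square root of 10
```

Cutting the sample by a factor of 10 widens the error bar by the square root of 10, whatever percentage is being estimated.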

So if we ran enough models we might find that that particular yellow sub-scenario only implied a flood risk of somewhere around 10%. Or maybe it was even more. The trouble is, in salami-slicing our data into small chunks and saying we’re uncertain which represents the true state of affairs, we’ve introduced statistical uncertainty. And this affects our ability to be certain, since it is bound to increase the number of extreme results in our suite of 40 scenarios, disproportionately affecting our ability to make statements as to what we are certain or very certain of.
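To see the salami-slicing effect in action, here is a rough simulation sketch. Everything in it is assumed for illustration: a "true" flood risk of 10%, scenarios of 2000 runs, slices of 200 runs, and a fixed random seed. It says nothing about Pall et al's actual model output; it only shows how much more the estimates scatter when each one is based on a tenth of the data.

```python
import random
import statistics

TRUE_RISK = 0.10   # assumed "true" flood probability, for illustration only
FULL_SIZE = 2000   # roughly one scenario's worth of runs
SLICE_SIZE = 200   # roughly one sub-scenario's worth of runs
TRIALS = 300       # number of times the whole experiment is repeated

rng = random.Random(2000)  # fixed seed so the sketch is repeatable

full_estimates = []
slice_estimates = []
for _ in range(TRIALS):
    # Each run either floods (True) or does not (False).
    runs = [rng.random() < TRUE_RISK for _ in range(FULL_SIZE)]
    full_estimates.append(100 * sum(runs) / FULL_SIZE)
    # Estimate the same risk from just the first slice of the same runs.
    slice_estimates.append(100 * sum(runs[:SLICE_SIZE]) / SLICE_SIZE)

spread_full = statistics.stdev(full_estimates)
spread_slice = statistics.stdev(slice_estimates)

print(f"spread of estimates from {FULL_SIZE} runs: {spread_full:.2f} points")
print(f"spread of estimates from {SLICE_SIZE} runs:  {spread_slice:.2f} points")
print(f"most extreme estimate from a slice: {max(slice_estimates):.1f}%")
print(f"most extreme estimate from a full set: {max(full_estimates):.1f}%")
```

The underlying risk is identical in every trial, so any extra scatter among the slice estimates is purely statistical, which is exactly the kind of uncertainty that more runs would remove.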

Criticism 7: The design of the Pall et al modelling experiment ensures poor determination of the extremes of likely true values of the FAR – yet it is the extreme value that was presumably required, since that was presented to the world in the form of the statement in the Abstract that AGW has increased the risk of floods “in 9 out of 10 cases” by “more than 20%”. The confidence in the 20% figure is in fact very low!

Note that if the April 2000 temperature change amplitude variability had been treated as a risk, instead of as uncertainty, the risks in each case would have been tightly constrained and the team would have been able to say it was very likely (>90%) that the increased flood risk due to AGW exceeds 60% (since all the 4 scenarios would yield an increased risk of more than that) and likely it is greater than 150% (since 3 of the 4 scenarios suggest more than that).

The problem of risks within risks

Consider how the modelling could have been done differently, at least in principle. Instead of constructing April 2000 temperatures based on previous modelling exercises and running the model from there, they could have modelled the whole thing (or at least the natural forcing representations) from 1900 to autumn 2000 and output rainfall data for England. Without the intermediate step of exporting April 2000 temperatures from one model to another there’d be no need to treat the variable as “uncertainty” rather than “risk”.

Similarly, say we were interested in flooding in one particular location. Say it’s April 2011 and we’re concerned about this autumn since the SSTs look rather like those in 2000. Maybe we’re concerned about waterlogging of Reading FC’s pitch on the day of the unmissable local derby with Southampton in early November. Should we take advantage of a £10 advance offer for train tickets for a weekend away in case the match is postponed or wait until the day and pay £150 then if the match is off?

In this case we’d want to feed the aggregate rainfall data from Pall et al’s model into a local rainfall model. By Pall et al’s logic everything prior to our model would count as “uncertainty”. We’d input a number of rainfall scenarios into our local rainfall model and come up with a wide range of risks of postponement of the match, none of which we had a great deal of confidence in. I might want to be 90% certain there was a 20% chance of the match being postponed before I spent my tenner. I’d have to do a lot more modelling to eliminate statistical uncertainty if I use 10 separate cases than if I treat them all the same.

How Pall et al could focus on improving what we know

If we inspect Pall et al’s Fig 3, it looks first of all as if very few – perhaps just 1 yellow and 1 pink – of the 40 non-AGW cases result in floods 10% of the time (this includes the yellow run that predicts 15%). About 12% of the AGW runs result in floods. Yet we’re only able to say we are 90% certain that the flood risk is 20% greater because of AGW. This would imply at most 4 non AGW runs within 20% of the AGW flood risk (i.e. predicting a greater than 10% flood risk).

If we look at Pall et al’s Fig 4, we see that, first:
- the “long tail” where the risk of floods is supposedly somewhat (FAR <-0.25!) greater "without AGW" is almost entirely due to the yellow outlier case. If just 10 runs in this case had not predicted flooding instead of predicting it then the long tail of the entire suite of 10,000 runs would have practically vanished.
- the majority of the risk of the FAR being below its 10th percentile (giving rise to the statement of 90% probability of a FAR of greater than (only) 20%) is attributable to pink cases.

It would have been possible to investigate these cases further, simply by running more simulations of the critical cases to eliminate the statistical uncertainty. I can hear people screaming “cheat!”. But this simply isn’t cheating. Obviously if 10x as many runs of the critical cases as non-critical ones are done, they’d have to be scaled down when the statistical data is combined (but this must have been done anyway as the sample sizes for the different scenarios were not the same). It’s not cheating. In fact, it’s good scientific investigation of the critical cases. If we want to be able to quote the increased risk of flooding because of AGW at the 10 percentile level (i.e. that we’re 90% sure of) with more certainty then that’s what our research should be aimed at.
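The scaling-down step can be sketched as follows. The counts are entirely hypothetical (scenario "A" stands in for a critical case given 10x as many runs as the others) and the equal weighting of the four scenarios is an assumption made for the sake of the example, not a description of how Pall et al combined theirs.

```python
# Hypothetical counts per scenario: (runs exceeding the flood threshold, total runs).
# Scenario "A" is the critical case and has been run 10x as often as the rest.
scenarios = {
    "A": (1000, 20000),
    "B": (80, 2000),
    "C": (60, 2000),
    "D": (40, 2000),
}

# Naive pooling of raw counts lets the oversampled scenario dominate.
total_floods = sum(floods for floods, runs in scenarios.values())
total_runs = sum(runs for floods, runs in scenarios.values())
pooled = total_floods / total_runs

# Equal weighting: each scenario contributes its own flood fraction once,
# however many runs were used to pin that fraction down.
fractions = [floods / runs for floods, runs in scenarios.values()]
weighted = sum(fractions) / len(fractions)

print(f"pooled raw counts: {100 * pooled:.2f}%")
print(f"equally weighted:  {100 * weighted:.2f}%")
```

The extra runs tighten the estimate for scenario A without giving it any more say in the combined figure.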

Of course, if we find that the yellow sub-scenario really does suggest a risk of flooding of 15%, somewhat more than with AGW on top, and we don’t see regression to the mean, that might also tell us something interesting. Maybe the natural variability is more than we thought and that April 2000 meteorological conditions (principally SSTs) were possible that would have left the UK prone to even more flooding than actually occurred with more warming.

Criticism 8: Having introduced unnecessary uncertainty in the design of their modelling experiment, Pall et al did not make use of the opportunities available to eliminate such uncertainty by running a final targeted batch of simulations.

Preliminary conclusion

It looks like there’s going to have to be a Part 3 as I have a couple more points to make about Pall et al and will need a proper summary.

Nevertheless, I understand a lot better than I did at the outset why they are only able to say we’re 90% certain the FAR is at least 20% etc.

But I still don’t agree that’s what they should be doing.

We want to use the outputs of expensive studies like this to make decisions. Part of Pall et al’s job should be to eliminate statistical uncertainty, not introduce it.

They should have provided one headline figure of the increased risk due to global warming, about 2.5 times as much, taking into account all their uncertainties.

And the only real uncertainties in the study should have been between the 4 different patterns of natural warming. These are the only qualitative differences between their modelling runs. Everything else was statistical and should have been minimised by virtue of the large sample sizes.

If we just label everything as uncertainty and not as risk, we’re not really saying anything.

After all, it might be quite useful for policy-makers to know that flood risks are already 2.5 times what they were in 1900. This might allow the derivation of some kind of metric as to how much should be spent on flood defences in the future, or even on relocation of population and/or infrastructure away from vulnerable areas. Knowing that the scientists are 90% certain the increased risk is greater than 20% really isn’t quite as useful.

The aim of much research in many domains, including the study of climate, and in particular that of Pall et al, should be to quantify risks and eliminate uncertainties. It rather seems they’ve done neither satisfactorily.

(to be continued)

———-
23/2/11, 16:22: Fixed typo, clarified remarks about the value of Pall et al’s findings to policy-makers.

February 20, 2011

Extreme Madness: A critique of Pall et al (Part 1: General comments on the paper and discussion of use of statistics)

Filed under: Effects, Global warming, Science — Tim Joslin @ 3:59 pm

Do what I say, not what I do. Refrain from seeking out papers in scientific journals, because they inevitably create more questions than answers. Jobs for the boys, I suppose.

I first read about Pall et al last Thursday when a headline on guardian.co.uk caught my eye: Climate change doubled likelihood of devastating UK floods of 2000. What could that possibly mean? The point is we know the floods occurred.

Are we saying there would have been a 50% chance of them happening if global warming hadn’t occurred? That would at least make sense, but seems to me extremely unlikely, since autumn 2000 was apparently the wettest since records began in 1766. The chances of an entirely different set of weather events in a parallel universe coming together to produce something as extreme is clearly much less than one in two.

I was just mulling over this when a Realclimate post notification popped into my Inbox. Nature, which this week splashed on rain (ho, ho), had of course caught the eye of Gavin Schmidt, who reported on Pall et al and another paper in the same issue. I immediately dived in where professional scientists with people to upset fear to tread and voiced some of my concerns. Gavin responded (I stand by the points I made which he disagrees with, btw) and the debate went on: a Mathieu chipped in, violently agreeing with me, as I pointed out, and I similarly responded to some remarks by a Thomas.

At this point I started to get serious about the issue. The rest of this post is a more systematic critique of Pall et al.

What a way to conduct a debate

It is absurd that we are attempting to formulate policy on the basis of information that is not in the public domain. Particularly since a weekly scientific news cycle has developed as the main journals try to grab headlines. As well as the main Guardian article, George Monbiot also commented soberly on Pall et al, remarking that:

“[Pall et al] gives us a clear warning that more global heating is likely to cause more floods here.”

though when he says:

“They found that, in nine out of 10 cases, man-made greenhouse gases increased the risks of flooding…”

he (or the dreaded sub-editor) has in fact lost the sense of Pall et al’s Abstract, which went on to say:

“…by more than 20%”.

so George’s “nine out of 10” is in fact an understatement.

The science news cycle process does rather allow a bit of spin. I hate to say it, but the main Guardian piece does have the feel of having been planned in advance – hey, journo, here’s three quotes for the price of one. As well as Myles Allen (the leader of the Pall et al team, and one of the paper’s authors), a Richard Lord QC is also quoted. It’s not immediately obvious, but Lord appears to be a long-time collaborator of Allen in what has to be described as a political project to use the legal system to tackle the global warming problem. I’m not at all sure about the “blame game” in general. It seems if anything to put obstacles in the way of reaching international agreement on emissions cuts.

It wasn’t until Friday afternoon that I was able to read the whole of Pall et al, rather than just the Abstract (thanks Ealing Central Library). Nature is a good journal, but I don’t think they paid for the work that went into Pall et al. In fact the climate modelling was actually executed by volunteers at climateprediction.net. This is an exciting initiative, but, as someone who once participated (I was pleased my model showed an extreme result of something like 11C 21st century warming!), it would be much better – and I’d be much more likely to take the trouble to participate again – if the results were presented in an open manner, rather than held back (it seems) for scientific papers that appear a year after they’re submitted, so well after the experiment. Much more could be done to at least explain the findings of all the experiments to date on the site.

Anyway, here I finally am with a cup of tea, a hot-cross bun and my dissecting kit, so let’s proceed…

The Pall et al method

It turns out that what Pall et al did was initialise the state of climate models to April 2000. They ran one set of 2268 simulations (their A2000) with the actual conditions and other sets (of 2158, 2159, 2170 and 2070 simulations) each with one of 4 counterfactuals (each with 10 “equally probable” variants so 40 scenarios in all), with global warming stripped out.

They fed the climate model inputs into a flood model to determine run-off and considered the floods had been predicted if the average daily run-off was equal or greater than the 0.41mm recorded in autumn 2000.

The result was a set of graphs showing the results with and without global warming. Basically these consist of a bunch of results from the global warming case and each of the 4 models. They show these as cumulative frequency distributions, such that 100% of the global warming case (a line of dots on the log scale they use) result in run-off above 0.3mm/day, 13% (1 in about 7.5) above 0.4mm, maybe (the graphs are quite small) 12% (1 in about 8.5) above the actual flood level of 0.41mm/day and so on, with around 1.2% (1 in about 80) as high as 0.55mm a day (which presumably is a Biblical level). Actually I’ve just realised that in fact the graphs (Pall et al’s Fig 3) are printed with the horizontal axis logarithmic scale marked with the same subdivisions for occurrence frequency (as I carelessly read before my final Realclimate post) and its inverse, return time (which actually is a log scale) – you’d think a peer reviewer or someone at Nature would have spotted that in the 10 and a half months between submission and publication.

The other cases (the A2000Ns in Pall et al’s terminology) are each 10 similar lines of dots, so appropriately enough they appear as a spray, running below the A2000 line, except in 2 cases which manage to nip above the A2000 line.

Call me naive, but I think this shows that in >95% of cases (that is, except for two out of 40, part of the time) the 2000 floods were worse than they would have been without global warming. That is, according to the modelling, the exercise has shown, statistically significantly, that the flooding was worse as a result of global warming. All we need to do is assume the same model errors affected all the scenarios approximately equally. This seems an intelligent conclusion.

But that’s not what the authors do. They randomly select from the A2000s and each of the 4 sets of A2000Ns to produce graphs of the probability distribution of the run-off being more likely to exceed the threshold of 0.41mm/day (the actual level). They also produce a combined graph, and this is where the aforementioned increased risk of greater than 20% in 9 out of 10 cases comes from, as well as an increased risk of 90% in 2 out of 3 cases and the Guardian headline of approximately double the risk at the median.

The point is that Pall et al don’t want to just say “flooding will be more severe”, they want to be able to calculate the fraction of attributable risk (FAR) for anthropogenic global warming (AGW) for the particular event. Why? So they can take people to court, that’s why.

As I noted in my final Realclimate post on the topic, it seems to me that Pall et al are trying to push things just a little too far.

About this 0.41mm threshold

This wasn’t where I intended to start, but it seems logical. Why define the flood event in this way? Why not say anything over say 0.4mm/day would count as a flood? Floods aren’t threshold types of things anyway.

Further, why are we including runs with very high runoffs? These types of models are known to sometimes “go wild”. Surely we’re interested in forecasting the actual flood event, not some other extreme.

One effect of choosing the 0.41mm threshold is it makes the flood reasonably rare. But as I argued repeatedly on Realclimate, the flood definitely happened; one reason it’s rare in the modelling experiment is because the model (and/or the initial data it was supplied with) is not good enough to forecast it more than about 1 in 8.5 cases or about 12% of the time. We’ll have to come back to this.

Now here’s another pet hate. The fact that the flood is rare in both the A2000 and A2000N model runs means that the result can (and is) expressed as a % increase in risk, even if George Monbiot (or his sub-editor) managed to miss this off. If the occurrence in both sets of data had been higher then these percentages would have been considerably lower.

For example, Fig 3b (using GFDLR30 data in purple, for those with access to the paper) is the easiest to read as the A2000 series is much better than the purple set of A2000Ns at predicting the flood. For the “best” (probably warmest) of the purple A2000N series, I can therefore read off intersection data together with that for the A2000 series. For 0.41mm/day A2000 predicts the flood about 12% of the time (1 run in every 8.5) whilst the A2000N predicts it 5% of the time (one year in 20). We’d conclude on the basis of this data that the increased risk of the flood because of AGW is around 140% (i.e. 12/5 = 2.4 times what it was before).

But for 0.35mm I get 50% (1 in 2) and 33% (1 in 3) respectively, so the flood risk is only about 50% greater!

As a check, if I go even higher to 0.46mm I get 5% (1 in 20) and about 1.5% (around 1 in 70), so the flood risk is 233% greater.
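The three readings can be put side by side in a short Python sketch. The percentages are just the approximate values read off Fig 3b as quoted above (with "1 in 3" entered as 100/3), and the helper name `increased_risk` is made up for this illustration.

```python
# threshold in mm/day: (A2000 % of runs exceeding it, A2000N % of runs exceeding it)
# These are the rough graph readings quoted in the text, not published data.
readings = {
    0.35: (50.0, 100 / 3),
    0.41: (12.0, 5.0),
    0.46: (5.0, 1.5),
}

def increased_risk(with_agw, without_agw):
    """Return (risk ratio, percentage increase in risk) for two exceedance rates."""
    ratio = with_agw / without_agw
    return ratio, 100 * (ratio - 1)

for threshold, (a2000, a2000n) in readings.items():
    ratio, percent_more = increased_risk(a2000, a2000n)
    print(f"{threshold:.2f}mm/day: {ratio:.1f}x the risk, "
          f"i.e. about {percent_more:.0f}% greater")
```

The "increased risk" more than quadruples between the lowest and highest threshold, which is why the choice of 0.41mm matters so much to the headline number.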

It’s well known, as discussed in the other paper in this week’s Nature, Min et al, that climate models tend to underestimate extreme precipitation events, so choosing a lower runoff threshold for the flood might have made some sense. On the other hand, exceptionally extreme events become much more likely with AGW.

I can’t find any calibration between the models used by Pall et al and actual rainfall (e.g. by trying to simulate other years) – maybe they’re just not very good at forecasting rainfall in flood years or maybe they forecast the same rainfall every year, regardless of the initial condition in April.

Criticism 1: The paper should have included the real-world distribution of run-offs which the modelling is supposedly correlated with.

Criticism 2: The paper should have included validation of the model against actual run-offs over a number of years. Some model runs should have been initialised to the conditions in April 1999, 2001 etc.

If I’d been editor of Nature (and I never will be if this upsets the wrong people – the sacrifices I make for truth), I might have asked for such a calibration or at least a sensitivity analysis between the “increased risks” and the flood threshold value chosen.

Criticism 3: The results should have been presented as a graph of increased risk of floods of different severity (and therefore different return times).

About this computing time

As I mentioned earlier, Pall et al ran over 10,000 simulations of the autumn 2000 weather. Yet whilst their mean case is that the floods in the AGW case were about 2.5 times as likely as without AGW, they are only 90% confident that the floods were at least 20% more likely to occur.

Huh?

If I do an opinion poll – as I happen to have – I can tell you within a small % how the nation will vote.

So I stared at Pall et al’s method and the more I thought about it the more bizarre it seemed. They’ve only gone and sampled the samples! In their Fig. 4 they’ve presented a Monte Carlo distribution of samples of pairs from each set of simulations, plotting the probability in each case of the floods being worse because of AGW. They don’t give the sample size – 43 say – of each of these Monte Carlo samples, but unless I’ve gone completely mad, these plots are sensitive to the sample size. That is, if they’d taken a sample size of say 87 random pairs of simulations the certainty (that the floods are 2.5 times as likely to occur in the AGW case) would have been greater (probably by the square root of 2, but that’s just an educated guess). This is basically an example of how what we used to call “technowank” in the IT trade can go badly wrong.
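The educated guess about the square root of 2 can at least be tested on a toy version of the problem. The sketch below is emphatically not Pall et al's procedure, which the paper does not spell out in enough detail to reproduce. It assumes each Monte Carlo sample draws a fixed number of AGW and non-AGW runs, with flood probabilities of 12% and 4.8% (the rough figures used elsewhere in this post), and it tracks the difference in flood frequency rather than the FAR itself to avoid dividing by zero in small samples.

```python
import random
import statistics

P_AGW = 0.12      # approximate share of AGW runs exceeding the threshold
P_NO_AGW = 0.048  # approximate share of non-AGW runs exceeding it

rng = random.Random(43)  # fixed seed so the sketch is repeatable

def spread_of_difference(pairs_per_sample, n_samples=3000):
    """Spread (SD) of the AGW minus non-AGW flood frequency across samples."""
    differences = []
    for _ in range(n_samples):
        agw = sum(rng.random() < P_AGW for _ in range(pairs_per_sample))
        no_agw = sum(rng.random() < P_NO_AGW for _ in range(pairs_per_sample))
        differences.append((agw - no_agw) / pairs_per_sample)
    return statistics.stdev(differences)

small = spread_of_difference(43)
large = spread_of_difference(87)

print(f"spread with 43 pairs per sample: {small:.3f}")
print(f"spread with 87 pairs per sample: {large:.3f}")
print(f"ratio: {small / large:.2f}")
```

In this toy setup the width of the resulting histogram depends on the arbitrary choice of pairs per sample, not just on the underlying model results.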

If I’m right and I think I am, Pall et al have not only presented the wrong headline finding (the world should have been informed that the floods, according to their modelling exercise, were 2.5x as likely because of AGW +/- not very much), they’ve also thrown away the advantage of using so much computer time – I read somewhere that those 10,000+ simulations would have cost £120m if run commercially rather than as volunteers’ screensavers!

They say it’s better to understand how to do something simple, than misunderstand something complex. Well, they don’t actually, I just made that up. Anyway, here’s some schoolboy stats Pall et al could have employed:

From their graphs, about 12% of the AGW simulations were greater than their 0.41mm threshold for the flood. With a sample size of 2268, what my textbook calls the STandard Error of Percentages (STEP), the standard deviation of this estimate of the whole (infinite) population of simulations is given by:

SQRT((12*(100-12))/2268) = 0.68%

That is, it’s likely (within 1 SD) that the actual risk of flooding in the AGW case (according to our model) is 12+/-0.68%.

Similarly for the counterfactual ensemble (all 40 sets combined): based on inspection of their Fig. 4, the number of AGW simulations exceeding the 0.41mm threshold is about 2.5x the number of non-AGW ones doing so, so it’s likely that the flood risk without AGW is within 4.8%+/-:

SQRT((4.8*(100-4.8))/8557) = 0.23%

There’s probably some clever stato way of combining these estimates, but all I’m going to do is crudely compare the top estimate of each with the bottom estimate of the other – that gives us roughly 2 standard deviations. On this basis, according to our modelling, the actual likelihood of the floods occurring because of AGW has increased by a factor of very likely between 12.68/4.57 = 2.8 and 11.32/5.03 = 2.2, with a best estimate of 2.5 times.
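For anyone who wants to rerun the schoolboy stats, here is a minimal Python sketch of the two STEP calculations and the crude top-against-bottom comparison. The 12% and 4.8% inputs are the approximate graph readings used above, the sample sizes are those quoted from the paper, and the function name `step` is just an illustrative label.

```python
import math

def step(percent, sample_size):
    """STandard Error of Percentages: SD of a percentage estimated from a sample."""
    return math.sqrt(percent * (100 - percent) / sample_size)

agw, agw_runs = 12.0, 2268       # AGW case: % exceeding 0.41mm/day, number of runs
no_agw, no_agw_runs = 4.8, 8557  # all 40 non-AGW sets combined

agw_sd = step(agw, agw_runs)
no_agw_sd = step(no_agw, no_agw_runs)

# Crude range: top estimate of one against bottom estimate of the other.
upper = (agw + agw_sd) / (no_agw - no_agw_sd)
lower = (agw - agw_sd) / (no_agw + no_agw_sd)
best = agw / no_agw

print(f"AGW risk:     {agw}% +/- {agw_sd:.2f}")
print(f"non-AGW risk: {no_agw}% +/- {no_agw_sd:.2f}")
print(f"risk ratio: best {best:.2f}, range {lower:.2f} to {upper:.2f}")
```

A proper treatment would propagate the two errors into the ratio rather than pairing extremes, but the crude version is enough to show how narrow the purely statistical range is.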

This is an important conclusion because the problem with global warming is not just or even mainly the increase in averages, in this case of precipitation. That may not be noticeable.

I think I’ll stop here and consider in another post my more philosophical arguments as to how the methodology of the Pall et al study is dubious.

In the meantime:

Criticism 4: Pall et al’s statistical approach understated the certainty of their modelling result. In fact the study provides some evidence that:

Even the limited warming over the 20th century is very likely, according to a comparative modelling exercise, to have made flooding of the severity of that in 2000 between 2.2 and 2.8 times as likely as in 1900. Historical records suggest the 2000 floods were around a once in 400 year event before global warming, but as a result of the warming up to 2000 they are, according to this modelling exercise, a once in 140 to 180 years event.
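The conversion from risk multiplier to return period is a simple division, sketched here in Python. The 400-year baseline is the historical figure quoted above and the multipliers are the rounded range from the STEP comparison; the function name is invented for the example.

```python
BASELINE_RETURN_YEARS = 400  # "once in 400 year event" before global warming

def return_period(baseline_years, risk_multiplier):
    """New return period if the annual risk is multiplied by risk_multiplier."""
    return baseline_years / risk_multiplier

for multiplier in (2.2, 2.5, 2.8):
    years = return_period(BASELINE_RETURN_YEARS, multiplier)
    print(f"{multiplier}x the risk -> once in about {years:.0f} years")
```

Rounded to the nearest ten, the two ends of the range give the 140 to 180 years quoted above.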

Criticism 5: The study should have run ensembles with the increase in temperatures expected by (say) 2030 and 2050.

(to be continued)

22/2/11, 11:13: Correction of typo and minor mods for clarity.
22/2/11, 16:07: Corrected another typo and clarified the meaning of the STEP calculations.

January 21, 2011

On Misplaced Certainty and Misunderstood Uncertainty

Filed under: Complex decisions, Global warming, Reflections, Science, Science and the media — Tim Joslin @ 9:42 pm

I know the climate scientists know they’re right, but a little care is called for. It’s important not to play fast and loose with the figures – especially when criticising someone else for playing fast and loose with the figures!

In a post entitled Getting things right, Realclimate yesterday addressed a piece of rogue science conducted apparently in-house by an NGO. Gavin Schmidt wrote:

The erroneous claim in the study was that the temperature anomaly in 2020 would be 2.4ºC above pre-industrial. This is obviously very different from the IPCC projections… which show trends of about 0.2ºC/decade, and temperatures at 2020 of around 1-1.4ºC above pre-industrial.

But the chance of “temperatures at 2020” being 1.4ºC above pre-industrial seems to me pretty remote – certainly less than 2.5%, if Gavin is quoting within 2 sigma confidence limits, as is customary.

You’d think in a blog post titled “Getting things right” that it was pretty important to get things right…

So I posted a comment and was pleased to see not one, but two replies:

Now I’m confused. I understand we are currently about 0.8ºC above pre-industrial. A mean global surface temperature 1.4ºC above by 2020 implies a 0.6ºC rise over the next decade.

[Response: The range is just eyeballing the IPCC figure for the year 2020 - so there is some component of internal variability in there as well. - gavin]

[Response: GISS temperature of 2010 (which happens to be right on the long-term trend) is 0.9 ºC above the mean 1880-1920 (and the latter is probably a bit higher than "preindustrial"). -stefan]

OK, let’s take 0.9ºC, though that’s not a figure you often hear.

The IPCC graphic Gavin is referring to when he says “projections” is one I’ve never really liked:

It’s all a bit too imprecise and pretty for my liking. For example, the yellow line (constant GHG levels from 2000) diverges from the other scenarios almost immediately, even though natural variation would initially overwhelm differences between emission trajectories.

It does rather look, though, as if at least one of the scenarios could, according to the models, lead to warming of 1.4ºC above the pre-industrial level. Could this be because emissions in the scenario are much higher than we’re actually experiencing? No, Gavin notes that:

* Current CO2 is 390 ppm
* Growth in CO2 is around 2 ppm/yr, and so by 2020 there will be ~410 ppm
So far so good. The different IPCC scenarios give a range of 412-420 ppm.

The difference between 420ppm and 410ppm would only lead to a 0.1ºC extra rise in temperature over the very long term and even then the climate sensitivity (the eventual temperature increase for every doubling of the atmospheric CO2 level) would have to be on the high side – around 4ºC.
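The 0.1ºC figure can be checked with a back-of-envelope sketch, assuming the usual approximation that eventual warming scales with the logarithm of the CO2 concentration (so many degrees per doubling). The function name and the choice of sensitivities to try are illustrative assumptions.

```python
import math

def equilibrium_warming(ppm_from, ppm_to, sensitivity_per_doubling):
    """Eventual warming (ºC) from a CO2 change, assuming a logarithmic response."""
    return sensitivity_per_doubling * math.log2(ppm_to / ppm_from)

for sensitivity in (3.0, 4.0):
    extra = equilibrium_warming(410, 420, sensitivity)
    print(f"sensitivity {sensitivity}ºC per doubling: "
          f"420ppm vs 410ppm adds about {extra:.2f}ºC in the long run")
```

Even at the high-side sensitivity the extra 10ppm is worth only a little over a tenth of a degree, and that is the eventual response, not what would show up by 2020.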

No, the problem is that the temperature hasn’t risen fast enough to 2010 for the more extreme modelling predictions in the IPCC figure for 2020 to be sufficiently likely any more. The IPCC graphic is out of date, plain and simple.

It’s a bit puzzling to be honest why Gavin used the IPCC graphic, because another Realclimate post today has trend-lines suggesting a much more accurate estimate of the likely global mean surface temperature at 2020 – around 0.2ºC higher than at present or around 1.1ºC above the pre-industrial level (as Stefan noted, 2010 is roughly on trend).

But how confident are we in this estimate? What is the range Gavin should have quoted?

Well, here’s the point: you can’t just express uncertainty by running a few models with slightly different starting conditions (the “Monte Carlo” approach) and discarding 2.5% at each extreme of the resulting distribution.

No, we have to actually think about what we’re doing.

It rather seems to me there are different kinds of uncertainty that we might want to consider when trying to predict the temperature “at 2020”.

What are the types of uncertainty we might need to take into account?

Parameter uncertainty
These are our “known unknowns”. In this case, we don’t actually know that the trend is 0.18 or 0.19ºC per decade as discussed at Realclimate. It looks like it is, but this could change when we get a bit more data – maybe we’ll find over a longer timescale that the real figure is 0.16 or 0.21ºC per decade. This makes us less certain about temperatures further out – at 2030 or 2050, say – than at 2020.

But a relatively short time into the future, parameter uncertainty is dominated by:

Calculable statistical uncertainty
Measurements of mean surface temperature show some variability about the underlying trend, as can be seen from the graphs in the Realclimate post discussing the data for 2010.

But the most any year has varied above the trend-line is about 0.2ºC in the case of 1998, which remains one of the 3 warmest years on record (with 2005 and 2010) due to the super El Nino that year. Maybe Gavin is implicitly including the possibility that there will be another strong El Nino in 2020. But that would only get us to a 1.3ºC total temperature increase (1.1ºC for the trend plus 0.2ºC for the El Nino), not 1.4ºC.

Statistical distribution uncertainty
It’s just conceivable Gavin calculated the Standard Deviation (SD) of annual temperature deviations from the trend and found it to be 0.15ºC or more so that 2 SDs includes 1.4ºC, so even if the long-term increase in temperature around 2020 is our 1.1ºC, there may still be a greater than 2.5% chance that the temperature in that one particular year is 1.4ºC or above. The only trouble is, with a mean of 1.1ºC and SD of 0.15ºC there would be an equal probability of 2020 being much colder than usual, so Gavin would have had to give a range of 0.8-1.4ºC.
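The symmetry point can be made concrete with Python's standard library. This is a sketch only: it assumes annual temperatures are normally distributed about the trend, which is precisely the assumption questioned in this section, and it uses the 1.1ºC central estimate and the hypothetical 0.15ºC SD from the paragraph above.

```python
from statistics import NormalDist

# Assumed distribution of the 2020 temperature (ºC above pre-industrial).
year_2020 = NormalDist(mu=1.1, sigma=0.15)

p_hot = 1 - year_2020.cdf(1.4)  # chance of 1.4ºC or above
p_cold = year_2020.cdf(0.8)     # chance of 0.8ºC or below

print(f"chance of 1.4ºC or above: {100 * p_hot:.1f}%")
print(f"chance of 0.8ºC or below: {100 * p_cold:.1f}%")
```

Under a normal distribution the two tails are equal, so a range that reaches up to 1.4ºC has to reach down to 0.8ºC as well.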

Ah, but maybe Gavin expects the distribution to be skewed, so that freakishly hot years are more likely than freakishly cold ones…

The point is we don’t actually know a priori what the distribution of probabilities (often called the Probability Density – or sometimes Distribution – Function, or PDF, if that isn’t too confusing!) for the annual mean temperature of a given year actually looks like. We need a theory to tell us that – and the PDF could be complex, not a nice normal, lognormal or power curve at all.

Damn, we already have three sources of uncertainty compounding our estimate of the 2020 temperature!

It can’t get trickier than this can it?

Execution uncertainty
Yes it can.

Global temperatures are depressed following volcanic eruptions. It’s almost as if these are being ignored and that global warming projections include the implicit qualifier: “unless there’s a major volcanic eruption”. These are frequent enough for them to be included in our “2 sigma” (central 95%) range: volcanoes in 1963 (Mount Agung), 1982 (El Chichon) and 1991 (Pinatubo) depressed global temperatures by up to 0.3ºC. Despite a long-term warming trend, the temperature “at 2020” could easily be knocked back to 2010 levels, that is, 0.9ºC above pre-industrial, or below.

I don’t want anyone coming back and saying I predicted 2020 to be warmer than 2010 and it wasn’t. Sure, I could say “the theory was right, there was just that damn eruption”. But really we need to include the possibility of volcanic activity if we’re going to make a serious forecast.

I’m beginning to think 1-1.4ºC above pre-industrial might not be that good a prediction for 2020. It seems a volcanic eruption could push us further below our central forecast of 1.1ºC than a strong El Nino could lift us above. I suspect 2 sigma confidence limits are more like 0.8-1.3ºC, with the proviso that a really serious volcanic event could leave us even cooler, without the possibility of a corresponding extreme warming event.
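A quick simulation shows how a one-sided risk like this skews the range. Everything numerical here apart from the 1.1ºC central forecast and the 0.3ºC maximum cooling is an assumption of mine for illustration: the size of the year-to-year scatter, the chance of 2020 falling in the shadow of an eruption, and the shapes of both distributions.

```python
import random
import statistics

random.seed(2020)  # fixed seed so the sketch gives the same answer every run

TREND_2020 = 1.1   # ºC above pre-industrial: the central forecast
NOISE_SD = 0.1     # assumed symmetric year-to-year scatter (El Nino, La Nina and so on)
P_VOLCANO = 0.1    # assumed chance that 2020 is cooled by a recent major eruption
MAX_COOLING = 0.3  # ºC: the upper figure quoted for Agung, El Chichon and Pinatubo


def simulate_year():
    """One possible 2020: trend plus symmetric noise, minus any volcanic cooling."""
    temp = random.gauss(TREND_2020, NOISE_SD)
    if random.random() < P_VOLCANO:
        temp -= random.uniform(0.0, MAX_COOLING)  # eruptions only ever cool
    return temp


samples = sorted(simulate_year() for _ in range(100_000))
low = samples[int(0.025 * len(samples))]    # bottom of the central 95%
high = samples[int(0.975 * len(samples))]   # top of the central 95%
mean = statistics.fmean(samples)

print(f"central 95%: {low:.2f} to {high:.2f}, mean {mean:.2f}")
```

The lower limit ends up further from 1.1ºC than the upper one, and the mean is dragged below the central forecast. Change the assumed numbers and the limits move, but the lopsidedness remains as long as eruptions can only cool.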

The point, of course, is that uncertainty in a complex system, such as the climate or the economy, isn’t likely to be a simple mathematical relationship. We need to explore the theory itself. We need qualitative as well as quantitative understanding.

Unknown unknowns
So far our 2020 temperature predictions have assumed we’re certain about our theory.

But maybe we’re not as smart as we think we are.

This is where it gets really difficult. Nevertheless, we should really have a look at any developments that are bubbling up. For example, Realclimate itself has discussed modelling that suggests there could be natural cycles that affect the temperature over timescales of decades. Personally, I think there could be something in this.

Again, the risks, according to the researchers, are to the temperature downside over the next decade. How sure are we that the groups looking at these patterns of variability are wrong? Not more than 95%, surely?

Let’s make one final allowance. Let’s take account of this unknown unknown and predict that the mean global temperature at 2020 will in fact be in the range 0.7-1.3ºC above the pre-industrial level, with a central prediction of a 1.1ºC rise. That is, it will be from 0.2ºC cooler than 2010 to 0.4ºC warmer, with a median expectation in the PDF of a 0.2ºC rise, so a skewed distribution. Think of the 0.2ºC drop as maybe some cyclical cooling cancelling out some of the warming trend plus a bit of volcanic action; the 0.4ºC warming would perhaps arise with a continuation of the current trend plus a big El Nino.

This is the point I want to make: the PDF is in large part a judgement, based on understanding (so there’s plenty of people who could make a better stab at it than me). Number-crunching on its own will never do the job.

I agree with the guys at Realclimate, though: it’s important to get things right!

January 18, 2011

On Hulme on Science

This post is an addendum to my previous musings on Mike Hulme’s Why We Disagree About Climate Change. In particular I want to respond to Paul Hayne’s comment that:

“Mike Hulme’s argument is not relativist. He is arguing that there really is no argument that can leverage action, which seems pretty true.”

OK, I suppose – after re-reading Chapter 3 of Why We Disagree – my claim did go a bit far. However, I’m not sure I want to concede the point fully.

First, I’m not the only one who’s confused. What was uppermost in my mind I think were the comments about Why We Disagree made by Peter Kircher in Science (pdf), to which Hulme refers on his website, as detailed in my original post.

I concur with Kircher’s view that “Hulme’s book invites misreading” and his disquiet over Hulme’s infamous passage (p.80-2) discussing how science “must concede some ground to other ways of knowing.” There is, though, a way in which this makes sense, which Hulme doesn’t identify and which doesn’t in any way undermine science.

Second, any critique of science must always address the fundamental precept that science is about testing theories against reality. It either describes the world or it doesn’t. There’s no room for compromise with “other ways of knowing”.

There’s one little fly in the ointment, though, which is very apparent in the social sciences. Concepts are not always easy to define. What is “poverty”, for example? Before you can study “poverty” you have to get out there and translate what people mean by “poverty” into something or things that you can actually measure.

Hulme refers to “local tacit knowledge”, which he patronisingly suggests is “not conventionally classified as scientific knowledge”. He muddles strategies for coping with climate conditions with describing “environmental change” and weather-forecasting, but certainly some of what he’s driving at very much is scientific knowledge – climate science relies on interpretations of subjective historic anecdotal evidence in diaries, ships’ logs and so on.

The issue is merely about communication between scientists and those affected. In the case of climate change, science may need to translate its scientific predictions – expressed in terms of directly measurable parameters – into language that relates to people’s day to day experiences. But those experiences are not “other ways of knowing”.

Let’s take the example of “severe winter weather” in the UK, since “here’s one I prepared earlier”! As I explored recently, there is no direct correlation between measurable parameters and the common perception of, in this case, what constitutes a “cold winter”. No-one writes books about, say, February 1986, which was exceptionally cold, whereas (slightly) milder conditions with more snow, such as the Winter of Discontent (1978-9) and perhaps December 2010, linger much longer in the collective memory.

Science could, in principle, develop a “severe winter” index which included temperature extremes, averages, snowfall, lying snow days and so on. Trouble is, different people would want to constitute the index differently. Hence we all have to refer to the same variables if we want to make comparisons. This is what science is. It doesn’t stop us all making our subjective judgements, though.
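To make the problem concrete, here's a toy version of such an index. The two winters and all their figures are made up for illustration (they are not real observations), and the weights and function names are equally hypothetical.

```python
# Made-up figures for two imaginary winters; not real observations.
winters = {
    "Winter A": {"mean_temp_c": -1.0, "snow_days": 3},   # very cold, little snow
    "Winter B": {"mean_temp_c": 1.0, "snow_days": 15},   # milder, much snowier
}


def severity(winter, temp_weight, snow_weight):
    """Higher score means more severe: colder means and more snow days both add."""
    return temp_weight * -winter["mean_temp_c"] + snow_weight * winter["snow_days"]


def ranking(temp_weight, snow_weight):
    """Winters ordered from most to least severe under the given weights."""
    return sorted(
        winters,
        key=lambda name: severity(winters[name], temp_weight, snow_weight),
        reverse=True,
    )


print(ranking(temp_weight=10.0, snow_weight=0.5))  # an index that stresses temperature
print(ranking(temp_weight=1.0, snow_weight=1.0))   # an index that stresses snow
```

Stress temperature and Winter A comes out as the more severe; stress snow and Winter B does. Neither index is wrong. They just answer different questions, which is why comparisons have to be made on the underlying variables.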

So, there’s an inescapable conclusion: we have to agree on a framework, on what we can measure in order to make objective comparisons.

And this is the real weakness of Hulme’s work. In terms of both the science and making decisions on emission trajectories, we need a quantitative framework. Or we simply can’t reach any sort of agreement. It’s all very well to note that people have different values, but we can’t conceivably ever agree what is an acceptable level of climate change based on religious and political views. It is irrelevant on one level that the media distort the debate as Hulme goes on to discuss in Chapter 7, The Communication of Risk. This doesn’t alter the consequences of different courses of action and therefore the optimum path by one iota.

It might also be worth noting, en passant, that it is in fact historically somewhat unusual for public opinion to greatly matter in decision-making. The media’s influence is a result of our current political system. At most other times in history a ruler or elite would simply have made the decision. The long-term interest of society as a whole was the responsibility of a small group and not something actively contested between different interests. Maybe, as a civilisation, we need ways of making a clearer distinction between the general interest and the individual and sectional interests that drive our political processes. Tricky stuff!

Nevertheless, just as we can only meaningfully discuss and quantify the physical phenomena of climate change within the agreed framework that we call science, we can only decide on a course of action in response to global warming by agreeing a framework that permits quantification.

And that framework is called economics.

December 27, 2010

Call this a Cold Winter? Maybe…

Filed under: AMO, Global warming, Science, UK climate trends — Tim Joslin @ 6:29 pm

If you want publicity for a scientific paper, global warming is definitely the topic to go for. Especially if you manage to feed our collective snow madness at the same time!  The Independent’s baby brother newspaper the 20p “i” even used the recent findings of Petoukhov and Semenov as the basis for its Christmas Eve front-page lead.  Basically, as far as I can glean without actually seeing their paper – it’s shameful that we’re expected to make policy on the basis of data that’s not open access – P&S have done a bit of very specific computer modelling showing that less sea-ice in the Russian Arctic can change weather patterns in such a way as to bring cold weather to Western Europe.  Pretty much what I’ve been wittering on about for quite some time, as have others, with more specific academic credentials, such as a Dr Overland.

Essentially, the lack of ice allows heat to escape, lowering the air pressure over the relevant part of the Arctic and therefore strengthening the continental highs, those over Greenland and Scandinavia being most relevant to the phenomenon of interest, namely those cold European winters as manifested in the UK in particular.  Strangely, the Independent writes that:

“Their [P&S's] models found that, as the ice cap over the ocean disappeared, this allowed the heat of the relatively warm seawater to escape into the much colder atmosphere above, creating an area of high pressure surrounded by clockwise-moving winds that sweep down from the polar region over Europe and the British Isles.” [my stress]

which is a bit confused to say the least, and doesn’t appear to have come from P&S themselves, at least judging by their press release.  The heat would create low pressure in the first instance.

A more reflective (obscure pun intended) source is a piece by George Monbiot who explained the effects on atmospheric pressure thus:

“Sea ice in the Arctic has two main effects on the weather. Because it’s white, it bounces back heat from the sun, preventing it from entering the sea. It also creates a barrier between the water and the atmosphere, reducing the amount of heat that escapes from the sea into the air. In the autumns of 2009 and 2010 the coverage of Arctic sea ice was much lower than the long-term average: the second smallest, last month, of any recorded November. The open sea, being darker, absorbed more heat from the sun in the warmer, light months. As it remained clear for longer than usual it also bled more heat into the Arctic atmosphere. This caused higher air pressures, reducing the gradient between the Iceland low and the Azores high.” [my stress again]

Maybe the Indy cribbed from George.  As every schoolboy knows, it’s always a giveaway when you copy your classmate’s errors.

What was George’s source?  Well it may have been Realclimate, where Rasmus wrote:

“One interesting question is how the Barents-Kara sea-ice affects the winter temperatures over the northern continents. By removing the sea-ice, the atmosphere above feels a stronger heating from the ocean, resulting in anomalous warm conditions over the Barent-Kara seas. The local warming gives rise to altered temperature profiles (temperature gradients) along the vertical and horizontal dimensions.

Changes in the temperature profiles, in turn, affect the circulation, triggering a development of a local blocking structure when the sea-ice extent is reduced from 80% to 40%. But Petoukhov and Semenov also found that it brings a different response when the sea-ice is reduced from 100% to 80% or from 40% to 1%, and hence a non-linear response. The most intriguing side to this study was the changing character of the atmospheric response to the sea-ice reduction: from a local cyclonic to anti-cyclonic, and back to cyclonic pattern again. These cyclonic and anti-cyclonic patterns bear some resemblance to the positive and negative NAO phases.”

which doesn’t actually say that high pressure is caused by warmer air.  What Rasmus means by “local cyclonic” and “anti-cyclonic” patterns is anyone’s guess – I venture that he may not have been referring specifically to the air pressure over the Barents and Kara Seas.  Rather, he seems to be referring to the well-known positive (“cyclonic”) and negative (“anticyclonic”) NAO “patterns”. I can see a trip ahead to the British Library to access P&S’s original paper…

All I actually want to establish in this post – it’s Chrimbo after all, not a time to do anything resembling work – is that there is indeed a phenomenon to explain.

I’m prompted by a comment Rasmus made in his piece:

“I admit, last winter felt quite cold, but still it wasn’t so cold when put into longer historical perspective. This is because I remember the most recent winters more vividly than those of my childhood – which would be considered to be really frosty by today’s standards. But such recollections can be very subjective, and more objective measurements show that the winters in Europe have in general become warmer in the long run…”

I’m tempted to start with my own contrary anecdotal evidence, but let’s consider the data first.

The Beeb were reporting on all media (lead on News 24 and radio bulletins) on Christmas morning that this December is set to be the coldest since records began in 1890, sorry 1910 (from mid-morning – the online article presumably reflects this correction).  Totally confused and can’t be trusted.  In fact, earlier in the week they had me wondering what happened in 1910 – there’s a big difference between “since 1910” and “since records began in 1910”.

Why there’s a Year Zero in 1910 is beyond me.  I’ll let you know when I find out.  Presumably someone has decided that records are unreliable before that point, despite the tens of thousands of hours of effort that have gone into constructing the Central England Temperature (CET) record which goes back to 1659.  I can believe that the monthly averages are off by 0.1 or 0.2C, but they’re going to be good enough for the purposes of comparison. Regular readers will be aware that I have imported the CET data into Excel.

The facts are as follows:

Mean December temperature

1. No “records began” in 1890 either.  December that year is the coldest in the entire CET at -0.8C.  There are only 5 other Decembers with mean temperatures below zero: 1676 at -0.5C; 1788, 1796 and 1878 at -0.3C  and 1874 with a pathetic -0.2C.

2. Only one December since 1890 has averaged below 1C – 1981 at 0.3C.

3. The CET for December 2010 up to and including 26th is -1.0C!  OK there are 5 days to go when the weather is expected to be a little milder.  Each of these could raise the monthly average by 0.1C or so.  Even so, it’s odds on that December 2010 will be only the 7th in the entire CET since 1659 averaging a temperature below 0C.

4. It might be worth pointing out that the first cold snap began in November, so the 30 or 31 days up to Boxing Day may be even more exceptional – although this may have happened in previous years as well.

5. December isn’t usually the coldest month.  In fact the last month averaging below 0C in the CET was nearly a quarter of a century ago (though it seems like yesterday, sigh!) – February 1986 at -1.1C.  Before that, not surprisingly, was January 1979, in the Winter of Discontent, at -0.4C.   Before that, we have to look to January and February 1963 at -2.1C and -0.7C respectively.  Postwar that only leaves February 1956 at -0.2C and February 1947 at -1.9C.

6. 2010 as a whole will average no more than 8.9C in the CET, so will be the coldest since 1986 at 8.74C (though there’s no chance of it being the coldest since 1963 as suggested at Real Science).
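A bit of arithmetic shows how safe the "below zero" call in point 3 is. This is a rough sketch, assuming the monthly figure is a simple average of daily values and that the -1.0C covers the first 26 of December's 31 days; the function name is mine.

```python
def required_mean_for_rest(mean_so_far, days_so_far, days_in_month, target_monthly_mean):
    """Mean the remaining days must average for the month to finish exactly on target."""
    days_left = days_in_month - days_so_far
    total_needed = target_monthly_mean * days_in_month
    total_so_far = mean_so_far * days_so_far
    return (total_needed - total_so_far) / days_left


# Figures from the list above: -1.0C to the 26th; 1890 holds the December record at -0.8C.
to_reach_zero = required_mean_for_rest(-1.0, 26, 31, 0.0)
to_match_1890 = required_mean_for_rest(-1.0, 26, 31, -0.8)

print(f"Rest of month must average {to_reach_zero:.1f}C to lift December to 0C")
print(f"Rest of month must average {to_match_1890:.2f}C to tie with 1890")
```

So the last few days would have to average over 5C to drag the month up to zero, whereas anything much above freezing is enough to leave 1890 holding the record.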

Record daily minima

Another way of assessing a spell of severe weather is by the number of exceptional days, in this case exceptionally cold days.  Ideally we’d ask how many days this year have been in (say) the 10 coldest on record, but I only have data as to the very coldest days, courtesy of The Wrong Kind of Snow, by Antony Woodward and Robert Penn (“W&P”).  This limitation introduces a little more randomness into the exercise than I’d ideally like.  You could have an exceptionally cold day corresponding just by chance to another one on the same date in the past.  In fact, this has happened several times this year:

- 2nd December 2010 was -20.9C at Altnaharra, but failed to beat the -21.1C at Kelso during the Great Frost of 1879.

- similarly the -20.4C recorded at Braemar on 3rd December 2010 is trounced by the -26.7C at Kelso in 1879.

- and there’s a bit of a pattern here as the same thing also happened on 6th and 7th December, when the cold didn’t quite match 1879.

- later in the month, the exceptionally cold Christmas and Boxing Days this year didn’t quite match those in 1878 and 1981 respectively.

The records in W&P go back well over a century, so on average you’d expect no more than 3 over December to be broken per decade.  Let’s make it tough for ourselves and set a benchmark of 5 over November and December per decade.
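Here's a rough check on those benchmarks. It rests on assumptions of mine: that the records run to about 130 years, that the climate isn't changing, and that each calendar date behaves independently, so that in year n of the record each date has a 1-in-n chance of setting a new low.

```python
def expected_new_records(days, first_year_number, years=10):
    """Expected new record lows over `years` seasons covering `days` calendar dates,
    assuming a 1-in-n chance per date in year n of the record."""
    return sum(days / n for n in range(first_year_number, first_year_number + years))


YEARS_ON_RECORD = 130  # assumption: "well over a century" taken as about 130 years

december = expected_new_records(31, YEARS_ON_RECORD + 1)
nov_dec = expected_new_records(61, YEARS_ON_RECORD + 1)

print(f"December: about {december:.1f} per decade")
print(f"November and December: about {nov_dec:.1f} per decade")
```

That gives a little over 2 per decade for December and about 4.5 for November and December together, so 3 and 5 are reasonable round-number benchmarks. In reality cold days come in spells, so records will tend to arrive in clusters rather than evenly, and a warming trend ought to make new cold records rarer still.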

Now that I’ve built up the suspense how many record low daily minima have occurred so far this winter?

The following list isn’t necessarily complete, I could have missed some (I’ve jotted them in the margins of my copy of :

- 28th November: -18.0C in Llysdinam (Powys).

- 1st December: -21.1C at Altnaharra.

- 8th December: -18.3C at Tyndrum (finally beating one of those 1879 records).

- 19th December: -19.6C at Shawbury (removing one of those 1981 records).

- 20th December: -18.7C at Pershore.

- 21st December: -17.8C at Katesbridge.

- 22nd December: -20.2C at Altnaharra.

- 23rd December: -18.6C at Castlederg.

- and 24th December: -17.4C, also at Castlederg.

I make that 9 daily records.  On this basis, not just Decembers, but early winters (November and December) in the 2010s are after just one year notably cold!

There are a few comparable cold years, of course.  1919 has the coldest days from 13th-16th November (4 daily records), including -23.3C at Braemar on 14th.

1879 has lost 8th December to 2010, but still holds the records for the six days 2nd-7th December inclusive.  It must have been more intensely cold back then than in 2010 as trees were reportedly killed.  This happens somewhere below -20C when the sap can freeze and the tree splits with a loud crack.  W&P’s entry for 4th December reports the same phenomenon in 1855.  That hasn’t happened this year.  Yet.

More recently, 1981 has lost 19th to 2010 but still holds the daily records for 7 days: 11th-14th, 17th-18th and 26th December.  And the four 1995 daily records for 27th-30th December, including -27.2C at Altnaharra on 30th, don’t look under threat this time round.

So putting our global warming expectations to one side for a minute, on the basis of daily minima extremes, 2010 is up there with the 4 or 5 other most notable early winter cold snaps in the last century and a third.

Anecdotal Evidence

I mentioned earlier the very few postwar months averaging below 0C.  These occurred in 5 winters, three of which – 1947, 1962-3 and 1978-9 – feature in Frozen in Time by Ian McCaskill and Paul Hudson (“M&H”).  Of these, I only remember 1978-9.  And that is mostly for a single snow event between Christmas and New Year.  The dry powder snow was more severe than anything this year, but I’d say the 2010 winter weather has already been more sustained in Southern England, at least.

The exceptionally cold February 1986 isn’t covered in M&H.  I suppose it wasn’t photogenic and provides little to write about because there was very little snow.  It was a thoroughly miserable month.  I remember day after day of an unceasing easterly wind bringing grey, bitterly cold, but dry weather.  I was working onsite in a poorly heated office.  If my memory isn’t playing tricks, we eventually used a thermometer to back up our complaints about the conditions.   I also remember the cold 1985 well as the Year of Crossing Frozen Car-parks.  These winters do seem to occur in runs (though I haven’t yet been able to demonstrate any persuasive statistics).

1956, with its cold February, is occasionally mentioned as a severe winter, but largely forgotten.

Will the winter of 2010(-11) be one of those that is remembered decades hence?   Much depends on the social significance.  The industrial action during 1978-9 has become the stuff of legend.  It hardly deserves its place in the Big Three on meteorological grounds alone.  The impact of 1947 was exacerbated by the continuance of wartime rationing – 1940 was also severe, but not reported to the same degree apparently because of government restrictions enacted for reasons of morale and propaganda.   1962-3 was simply exceptionally severe and prolonged.

Conclusions

Can we draw any conclusions?  Whilst I certainly can’t remember a December as persistently cold, and the records suggest there hasn’t been one since the 19th century – I haven’t even discussed the 10 or so days of lying snow we’ve had in Southern England this December, compared to an average here of only one or two – objectivity and a look at the history books is called for.  Taking my various measures in the round, in terms of the early winter, I’d judge 2010 to be around a once in 30 years event, perhaps 50 if we’re feeling generous, and only 100 if we weigh duration a lot more than severity.  But, and it’s a big “but”, given global warming (and the absence of recent cooling volcanic events), and the perhaps unwise predictions of less frequent cold winters that have frequently been made, there is indeed a phenomenon requiring scientific explanation.

My feeling, though, is that we haven’t yet seen enough for 2010-11 to be ranked amongst the overall Great Winters.  The worst Januaries and Februaries are significantly colder than the worst Decembers.  And although there’s been disruption, it’s not really been unprecedented.  Both the snow events and the nights have been severe, but have not in themselves exceeded others in living memory.  There’s been nothing, for example, that you’d really describe as a blizzard and we’ve been a little way off recording the very coldest nights.  The most notable feature has been how long the cold and snow has gone on for, as evidenced by the number of record daily minima and the low mean temperature for December.

We can’t yet expect people to say in one breath, “1947, 1962-3 and 2010-11”.  But the show goes on – we’ll just have to see what happens over the next couple of months!

 
