Uncharted Territory

February 24, 2011

Extreme Madness: A Critique of Pall et al (Part 3: Juicy Bits and Summary)

Filed under: Effects, Global warming, Science, UK climate trends — Tim Joslin @ 6:22 pm

I continue to be bothered by Pall et al, the paper which attempts to determine how much more likely the autumn 2000 floods in England and Wales were because of the anthropogenic global warming (AGW) since 1900.

To recap, Part 1 of this extended critique described the method adopted by Pall et al and made a few criticisms, one of which I’ll elaborate on in the first part of this post. Part 1 ended by asking why Pall et al didn’t eliminate more statistical uncertainty, given the large number of data points they produced (they ran over 10,000 simulations of the climate in 2000 when floods occurred).

Part 2 looked more closely at how Pall et al had defined risk and uncertainty and handled it statistically. Part 3 will further question the approach adopted, in particular by considering the uncertainty introduced by the process of modelling the climate itself.

Oops, it’s a log scale, or “about this 0.41mm threshold” revisited

In Part 1, I noted the arbitrariness of the threshold for severe flooding adopted by Pall et al. They considered their model had predicted flooding when it estimated 0.41mm/day or more of runoff. But the 2000 floods are said to be a once in 3-400 year event, so we’d expect only 5-7 floods in the ~2000 model runs of each of the 4 A2000N scenarios (those without AGW; the AGW runs are referred to as the A2000 series, of which around 2000 were also run). Their Fig 3 clearly shows that the 0.41mm/day level actually gives rather more floods than that.

Pall et al includes no evidence as to the skill of their model in predicting flooding or calibration between the models’ estimation of runoff in the 2000 floods and what actually happened in the real world. As I noted in Part 1, they could have run the model for years other than 2000 in order to show what is termed its “skill”, in this case in predicting flooding.

Why, then, did Pall et al not calibrate their model? Because they didn’t think it mattered, that’s why. They write:

“Crucially, however, most runoff occurrence frequency curves in Fig 3 remain approximately linear over a range of extreme values, so our FAR estimate would be consistent over a range of bias-corrected flood values.”

It’s about time we had a picture, and I can now include Pall et al’s Fig 3 itself. Ignore the sub-divisions on the bottom of the 2 scales in each diagram – these are in error as pointed out in Part 1. The question for any youngsters reading is: are the scales on these diagrams linear or logarithmic?:

Answer: logarithmic, of course.

So is it the case that the “FAR estimate would be consistent over a range of bias-corrected flood thresholds”? The FAR, remember, is determined by the ratio of the AGW risk of flooding to the non-AGW risk of flooding. This ratio would indeed not depend on the level chosen in the model set-up to indicate flooding of the extent seen in the real world in 2000 were the runoff occurrence frequency curves linear. But they’re not. They’re logarithmic. The increased risk therefore does depend on the flood level, as was seen simply from reading figures off the diagrams in Part 1. One wonders if we’re all clear exactly what the graphs in Pall et al’s Fig 3 actually represent.

Does Pall et al actually tell us anything useful at all?

The Pall et al study assumes it has some skill in forecasting flooding in England in autumn from the state of the climate system in April. Unfortunately we have no idea what this level of skill actually is. The model has not been calibrated against the real world by running it for years other than 2000 (or if it has, this information is not included in Pall et al). Note that analysing the results of such an exercise would not be trivial, since there are two unknowns: the skill of the model and its bias. As far as we know, 0.41mm runoff in the model could be anything in the real world – 0.35mm or 0.5mm, we have no idea. Similarly we don’t know if the model would forecast floods such as those in 2000 with a probability of 1 in 10, 1 in 100 or whatever.

To be fair, Pall et al do devote one of their 4 pages in Nature to showing their modelling does bear some relation to reality. Their Fig 1 shows similar correlation between Northern Hemisphere (NH) air pressure patterns in the model and rainfall in England and Wales as exists in the real world. And their Fig 2 shows that the rainfall patterns in the model bear some resemblance to those in the real world.

But one (more) big problem nags away at me. The basic premise is that a particular pattern of SSTs and sea ice causes the pressure system patterns that lead to rainfall in the UK. Pall et al therefore used the observed April 2000 pattern as input to the A2000 (AGW) series of model runs. But the patterns used for the non-AGW (A2000N) runs were different. Here’s what they say:

“…four spatial patterns of attributable [i.e. to AGW] warming were obtained from simulations with four coupled atmosphere-ocean climate models (HadCM3, GFDLR30, NCARPCM1 and MIROC3.2)… Hence the full A2000N scenario actually comprises four scenarios with a range of SST patterns and sea ice…” [my stress]

So if the A2000 model runs can predict flooding in a particular year from the SST and sea ice pattern in April, we wouldn’t expect the A2000N runs to do so, not just because everything is warmer, but also because the SST and sea ice patterns are different! So we don’t know whether the increased flood risk in the A2000 series is because of global warming or because the SST patterns are different.

It also seems to me that were it the case that Pall et al’s model could predict autumn flooding in April around 15-20x as often as it actually occurs (around 1 in 20 times for 2000 compared to the actual risk of 1 in 3-400) as is implied by their Fig 3, then we’d be reading about a breakthrough in seasonal forecasting and more money would be being invested to improve the modelling further (and increase the speed of forecasting of course, so that it’s not autumn already by the time we know it’s going to be wet!). This isn’t just the forecast for the next season we’re talking about, which the Met Office has given up on, but the forecast for the season after that.

So I’m not convinced. I’m going to assume that Pall et al’s modelling can’t tell one year from another, and that all they’ve done is model the increased risk of flooding in a warmer world in general. (One way to test this would be to compare the flood risks of the 4 A2000N models against each other for the same extent of AGW – it could be that the models give different results simply because they suggest different amounts of warming, not different patterns).

Under this not very radical assumption, we can actually calibrate Pall et al’s modelling. We know that the floods in 2000 were a once in 3-400 year event. That implies that in each of the diagrams in Fig 3 there should be around 5-7 floods (there are – or should be – approx. 2000 dots representing non AGW model runs on each diagram). We can therefore estimate by inspecting the figures how much flooding in the model represents a 3-400 year flood – it’s the level with only 5-7 dots above. We can then read across to the line of blue dots (the AGW case) and then, by reading up to the return time scale (the one with correct subdivisions), work out how often the modelling suggests the flooding should then occur. Here’s what I get:
– Fig 3a: 3-400 year flood threshold ~0.49mm; risk after AGW once every 40 years.
– Fig 3b: ~0.47mm, and risk now once every 30 years.
– Fig 3c and d: ~0.5mm, and risk now once every 50 years.
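
The "5-7 dots" figure is just the number of runs divided by the return period. Here's a minimal sketch of that arithmetic, assuming roughly 2000 non-AGW runs per diagram as stated above (the function name is my own, purely for illustration):

```python
def expected_exceedances(n_runs: int, return_period_years: float) -> float:
    """Expected number of runs above a threshold with the given return period.

    Each run simulates one autumn, so the chance of exceedance per run
    is taken to be 1 / return period.
    """
    return n_runs / return_period_years


n_runs = 2000  # approximate number of non-AGW runs per Fig 3 diagram (from the text)
for period in (300, 400):
    count = expected_exceedances(n_runs, period)
    print(f"1-in-{period} year event: about {count:.1f} floods in {n_runs} runs")
```

Since the exact number of runs per scenario varies a little, the 5-7 range should be read as approximate.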

So the Pall et al study implies, assuming it’s no better at forecasting flooding when it knows the SST and sea ice conditions in April than it is if it doesn’t, that the risk of a 3-400 year flood in England and Wales, similar to or more severe than that which occurred in 2000, is now, as a result of AGW up to 2000 only, between once in 30 and once in 50 years. That is, under this assumption, the risk of flooding in England and Wales of what was previously once in 3-400 year severity has increased by a factor of between 6 and 13, according to Pall et al’s modelling.
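
For anyone wanting to check the factor of 6 to 13, here's a quick sketch. The return periods are my rough by-eye readings from Fig 3 (listed above), not output from the model, and the variable names are mine:

```python
# Return periods read off Fig 3 by eye; rough readings, not model output.
pre_agw_return_years = (300, 400)   # "once in 3-400 years" without AGW
post_agw_return_years = (30, 50)    # "between once in 30 and once in 50 years" with AGW

# Smallest increase: shortest original period against the longest new period.
min_factor = min(pre_agw_return_years) / max(post_agw_return_years)
# Largest increase: longest original period against the shortest new period.
max_factor = max(pre_agw_return_years) / min(post_agw_return_years)

print(f"risk increased by a factor of {min_factor:.0f} to {max_factor:.0f}")
```

Pairing the extremes this way gives the widest plausible range; pairing each diagram's own threshold with its own AGW return time would give something narrower.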

Trouble is, the Pall et al model may have a bit of skill in forecasting flooding from April SST and sea ice conditions (the A2000 case) and this skill may have been reduced by an unknown factor when processing the data to remove the effects of 20th century warming. If Pall et al’s results are to have any meaning whatsoever they need to do further work to establish the skill of the model and calibrate it to measures of flooding in the real world.

More uncertainty about uncertainty

In Part 2 I discussed how Pall et al’s treatment of uncertainty has resulted in them actually saying very little. Essentially, they’ve estimated that the risk of autumn flooding as great as or exceeding that in 2000 has increased as a result of AGW by between 20% and around 700% – and there’s 20% probability it could be outside that range! I argued that the sources of this uncertainty are:
(i) the 4 different models used to derive conditions as if AGW hadn’t happened – fair enough, we can’t distinguish between these, (though in Part 1 I estimated how certain we’d be of the increased risk of flooding if we did assume they were all equally probable), and
(ii) statistical uncertainty which could have been eliminated.

But these are not the only sources of uncertainty. We are also uncertain of all the parameters used to drive the HadAM3-N144 model which attempts to reproduce the development of the autumn weather from the April conditions that were fed into it; we’re uncertain of the accuracy of the April SST and sea-ice conditions input into the model; we’re uncertain as to whether atmosphere-ocean feedbacks may have affected the autumn 2000 weather (Pall et al are explicit that such feedbacks were insignificant, so used “an atmosphere-only model, with SSTs and sea ice as bottom boundary conditions”); we’re uncertain of the precise magnitude of the forcings in 2000 which affected the development of the autumn weather; we’re uncertain as to whether there are errors in the implementation of the models; and we’re uncertain as to whether there are processes below the resolution of the model which are important in the development of weather patterns. There are probably more.

Consider that the reason we are uncertain as to which of the 4 models used to derive the A2000N initial conditions is most correct (or how correct any of them are) is because we don’t know how well each of them performs on more or less the same criteria as the higher resolution model used to simulate the 2000 weather. If they didn’t have different parameters, all had the same resolution and so on, then – tautologically – they’d all be the same! If we’re uncertain which of those is most accurate then we must also be uncertain about the HadAM3-N144 model. Just because only one model was used for that stage of the exercise doesn’t mean we’re not still uncertain (and for that matter the fact that we’ve used 4 in the first stage doesn’t mean we’re certain of any of them; they could all be wildly wrong, a possibility not apparently taken account of in Pall et al).

It seems to me the real causes of uncertainty in the findings of Pall et al derive from the general characteristics of the models, not (as discussed in Part 2) the statistical uncertainty as to the amplitude of 20th century warming (the 10 sub-scenarios for each of the 4 cases) which has been used.

Judith Curry has recently written at length about uncertainty and her piece is well worth a look (though I disagree where statistical uncertainty belongs in Rumsfeld’s classification – I think it’s a known unknown, maybe in a “knowable” category, since it can be reduced simply by collecting more of the same type of data as one already has). In particular, though, she provides a link to IPCC guidelines on dealing with uncertainty (pdf). A quick skim of this document suggests to me that Probability Distribution Functions (PDFs) such as Pall et al’s Fig 4 should be accompanied by a discussion of the factors creating uncertainty in the estimate, including some consideration as to how complete the analysis is deemed to be. I say deemed to be, since by its very nature uncertainty is uncertain!

That seems a good note to end the discussion on.

Here’s Pall et al’s Fig 4 (apologies if it looks a bit smudged):


In Part 1 of this critique I identified the two main problems with Pall et al:
– the model results are not calibrated with real world data. The paper therefore chooses an arbitrary threshold of flooding.
– statistical uncertainty has not been eliminated, rather it seems to have been introduced unnecessarily.

Part 2 drilled down into the issue of statistical uncertainty and suggested how Pall et al could have used the vast computing resources at their disposal to eliminate much of the uncertainty of their headline findings.

Part 3 picks up on some of the issues raised in Parts 1 and 2, in particular noting that the paper seems to include an erroneous assumption which led them to conclude that calibration of their model for skill and bias was not important. If my reasoning is correct, this was a mistake. Part 3 also continues the discussion about uncertainty, suggesting that the real reasons for uncertainty as to the increased risk of flooding have not been included in the analysis (whereas statistical uncertainty should have been eliminated).

There are so many open questions that it is not clear what Pall et al does tell us, if anything. I suspect, though, that the models used have little skill in modelling autumn floods on the basis of April SST and sea ice conditions. If this is correct then the study confirms that extreme flooding in general is likely to become more frequent in a warmer world, with events that have historically been experienced only every few centuries occurring every few decades in the future.

Note: Towards the end of writing Part 3 I came across another critique by Willis Eschenbach. So there may well be a Part 4 when I’ve digested what Willis has to say!

February 22, 2011

Extreme Madness: A Critique of Pall et al (Part 2: On Risk and Uncertainty)

Filed under: Effects, Global warming, Science, UK climate trends — Tim Joslin @ 2:42 pm

Keeping my promises? Whatever next! I said on Sunday that I had more to say on Pall et al, and, for once, I haven’t lost interest. Good job, really – after all, Pall et al does relate directly to the E3 project on Rapid Decarbonisation.

My difficulties centre around the way Pall et al handle the concepts of risk and uncertainty. I’m going to have to start at the beginning, since I doubt Pall et al is fundamentally different in many respects from other pieces of research. They’re no doubt at least trying to follow standard practice, so I need to start by considering the thinking underlying that. I feel like the Prime alien in Peter Hamilton’s Commonwealth Saga (highly recommended) trying to work out how humans think from snippets of information!

Though I should add that Pall et al does have the added spice of trying to determine the risk of an event that has already occurred. That’s one aspect that really does my head in.

Let’s first recap the purpose of the exercise. The idea is to try to determine the fraction of the risk of the 2000 floods in the UK attributable (the FAR) to anthropogenic global warming (AGW). This is principally of use in court cases and for propaganda purposes, though it may also be useful to policy-makers as it implies the risk of flooding going forward, relative to past experience.

Now, call me naive, but it seems to me that, in order to determine the damages to award against Exxon or the UK, those crazy, hippy judges are going to want a single number:
– What, Mr Pall et al, is your best estimate of the increased risk of the 2000 autumn floods due to this AGW business?
– Um, we’re 90% certain that the risk was at least 20% greater and 66% certain that the risk was 90% greater…
– I’m sorry, Mr Pall et al, may we have a yes or no answer please.
– Um…
– I mean a single number.
– Sorry, your honour, um… {shuffles papers} here it is! Our best estimate is that the 2000 floods were 150% more likely because of global warming, that is, 2 and a half times as likely, that is, the AGW FAR was 60%.
– Thank you.
– Yes?
– How certain is Mr um {consults notes} Pall et al of that estimate.
– Mr Pall et al?
– Let’s see… here it is… yes, we spent £120 million running our climate model more than 10,000 times, so our best estimate is tightly constrained. We have calculated that 95% of such suites of simulations would give the result that the floods were between 2.2 and 2.8 times more likely because of global warming [see previous post for this calculation].

But Pall et al don’t provide this number at all! This is what Nature’s own news report says:

“The [Pall et al] study links climate change to a specific event: damaging floods in 2000 in England and Wales. By running thousands of high-resolution seasonal forecast simulations with or without the effect of greenhouse gases, Myles Allen of the University of Oxford, UK, and his colleagues found that anthropogenic climate change may have almost doubled the risk of the extremely wet weather that caused the floods… The rise in extreme precipitation in some Northern Hemisphere areas has been recognized for more than a decade, but this is the first time that the anthropogenic contribution has been nailed down… The findings mean that Northern Hemisphere countries need to prepare for more of these events in the future. ‘What has been considered a 1-in-100-years event in a stationary climate may actually occur twice as often in the future,’ says Allen.” [my stress]

When Nature writes that “anthropogenic climate change may have almost doubled the risk of the extremely wet weather that caused the floods” [my stress] what they are actually referring to is the “66% certain that the risk was 90% greater”, mentioned by Pall et al in court (and as “two out of three cases” in the Abstract of Pall et al even though the legend of Fig 4 in the text clearly states that we’re talking about the 66th percentile, i.e. 66, not 66.66666… but I’m beginning to think we’ll be here all day if we play spot the inaccuracy – the legend in their Fig 2 should read mm per day not mm^2, that would get you docked a mark in your GCSE exam).

We could have a long discussion now about the semantics and usage in science of the words “may” and “almost” as in the translation of “66% certain that the risk was 90% greater” into “may have almost doubled”, but let’s move on. The point is that in the best scientific traditions a monster has been created, in this case a chimera of risk and uncertainty that the rest of the human race is bound to attack impulsively with pitch-forks.

So how did we get to this point?

Risk vs uncertainty

It’s critical to understand what is meant by these two terms in early 21st century scientific literature.

Risk is something quantifiable. For example, the risk that an opponent may have been dealt a pair of aces in a game of poker is perfectly quantifiable.

First, why, then, do poker players of equal competence sometimes win and sometimes not? Surely the best players should win all the time, because after all, all they’re doing is placing bets on the probability of their opponent holding certain cards. One reason is statistical uncertainty. There’s always a chance in a poker session that one player will be dealt better cards than another. Such uncertainty can be quantified statistically.

But there’s more to poker than this. Calculating probabilities is the easy part. The best poker players can all do this. So the second question is why, then, are some strong poker players better than others? And why do the strongest human players still beat the best computer programs – which can calculate the odds perfectly – in multi-player games? The answer is that there’s even more uncertainty, because you don’t know what the opponent is going to do when he has or does not have two aces. Some deduction of the opponent’s actions is possible, but this requires understanding the opponent’s reasoning. Sometimes he may simply be bluffing. Either way, to be a really good poker player you have to get inside your opponent’s head. The best poker players are able to assess this kind of uncertainty: the uncertainty as to how far the statistical rules apply in any particular case, and uncertainties as to basic assumptions.

Expressing risk and uncertainty as PDFs

PDF in this case doesn’t stand for Portable Document Format, but Probability Density (or Distribution) Function.

The PDF represents the probability (y-axis) of the risk (x-axis) of an event, that is, the y-axis is a measure of uncertainty. Pall et al’s Fig 4 is an example of a PDF. It’s where their statement in court that they were 90% sure that the risk of flooding was greater than 20% higher because of AGW (and so on) came from.

The immediate issue is that risk is a probability function. Our best estimate of the increase in risk because of AGW is 150% (a FAR of 60%), so we’re already uncertain whether the 2000 floods were caused by global warming (the probability is 60% or 3/5). So we have a probability function of a probability function. The only difference between these probability functions is that the one is deemed to be calculable, the other not. Though it has in fact been calculated! Furthermore, as we’ll see, some aspects of the uncertainty in the risk can be reduced, and other aspects cannot – the PDF includes both statistical uncertainty and genuine “we don’t know what we know” uncertainty (and I’m not even discussing “unknown unknowns” here, both types of uncertainty are known unknowns).
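
To keep the three ways of quoting the same number straight (150% more likely, 2 and a half times as likely, FAR of 60%), here's a small sketch. It assumes the usual definition FAR = 1 - P0/P1, where P1 is the risk with AGW and P0 the risk without, which is the definition that reproduces the figures quoted in this post; the function names are my own:

```python
def risk_ratio_from_percent_increase(percent_increase: float) -> float:
    """A risk that is X% greater is (1 + X/100) times as large."""
    return 1.0 + percent_increase / 100.0


def far_from_risk_ratio(risk_ratio: float) -> float:
    """FAR = 1 - P0/P1, where risk_ratio = P1/P0 (assumed definition)."""
    return 1.0 - 1.0 / risk_ratio


# The three headline figures discussed in this post.
for percent in (20, 90, 150):
    ratio = risk_ratio_from_percent_increase(percent)
    print(f"{percent}% more likely = {ratio:.1f}x as likely = FAR of {far_from_risk_ratio(ratio):.2f}")
```

Note that on this definition a negative FAR simply means the modelled flood risk was lower with AGW than without.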

Risk and uncertainty in Pall et al

What Pall et al have done is assume their model is able to assess risks correctly. Everything else, it seems, is treated as uncertainty.

Their A2000 series is straightforward enough. They set sea surface temperatures (SSTs) and the sea-ice state to those observed in April 2000 and roll the model (with minor perturbations to ensure the runs aren’t all identical).

But for the A2000N series they use the same conditions, but set GHG concentrations to 1900 levels, subtract observed 20th century warming from SSTs and project sea-ice conditions accordingly. There’s one hint of trouble, though: they note that the SSTs are set “accounting for uncertainty”. I’m not clear what this means, but it doesn’t seem to be separated out in the results in the same way as will be seen is done for other sources of uncertainty.

They then add on the warming over the 20th century that would have occurred without AGW, i.e. with natural forcings only, according to 4 different models, giving 4 different patterns of warming in terms of SSTs etc. As will be seen, for each of these 4 different patterns they used 10 different “equiprobable” temperature increase amplitudes.

First cause of uncertainty: 4 different models of natural 20th century warming

As Pall et al derive the possible 20th century natural warming using 4 different models giving 4 different patterns of natural warming, there are 4 different sets of results, giving 4 separate PDFs of the AGW FAR of flooding in 2000. Now, listen carefully. They don’t know which of these models gives the correct result, so – quite reasonably – they are uncertain. Their professional judgement is to weight them all equally, so that means that so far, they’ll only be able to say at best something like: we’re 25% certain the FAR is only x; 25% certain it’s y; 25% certain it’s z; and, crikey, there’s a 25% possibility it could be as much as w!

Trouble is, they can only run 2,000 or so of each of 4 non AGW simulations. So for each of the 4 there’ll be a sampling error. They treat this statistical uncertainty in exactly the same way as what we might call their professional judgement uncertainty, which certainly gives me pause for thought. So what happens is they smear the 4 estimates x, y, z and w and combine them into one “aggregate histogram” (see their Fig 4). That’s how they’re able to say we’re 90% certain the FAR is >20% and so on.

Nevertheless, their Fig 4 also includes the 4 separate histograms for our estimates x, y, z and w. It’s therefore possible for another expert to come along and say, “well, x has been discredited so I’m just going to ignore the pink histogram and look at the risk of y, z and w” or “z is far and away the most thorough piece of work, I’ll take my risk assessment from that”, or even to weight them other than evenly.

One of the 4 models may be considered an outlier, as in fact the pink (NCARPCM1) one is in this case. It’s the only one with a most likely (and median) FAR below the overall median value (or the overall most likely value which happens to be higher than the overall median). Further investigation might suggest it should be discarded.

Another critical point: x, y, z and w can be determined as accurately as we want by running more simulations, because the statistical uncertainty reduces as the square root of the number of data items (see Part 1).
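
To put a number on "as accurately as we want", here's a sketch that rearranges the standard deviation formula used in this post, SD = SQRT(p*(100-p)/n), to give the number of runs needed for a target SD. The 10% flood frequency is an illustrative value of my own choosing, not a figure from Pall et al:

```python
import math


def runs_needed(flood_percent: float, target_sd_percent: float) -> int:
    """Runs needed for the SD of the estimated flood percentage to reach the target.

    Rearranges SD = sqrt(p * (100 - p) / n) into n = p * (100 - p) / SD**2.
    """
    return math.ceil(flood_percent * (100.0 - flood_percent) / target_sd_percent ** 2)


# Illustrative only: a scenario in which about 10% of runs flood.
for target_sd in (2.0, 1.0, 0.5):
    print(f"SD of {target_sd} percentage points needs about {runs_needed(10.0, target_sd)} runs")
```

The catch is the square: halving the uncertainty costs four times as many runs.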

I’m not going to argue any more as to whether the 4 models introduce uncertainty. Clearly they do. I have no way of determining which of the 4 models most correctly estimates natural warming between 1900 and 2000. It’s a question of professional judgement.

However, I will point out that if uncertainty between the models is not going to be combined statistically (as in the previous post) I am uneasy about combining them at all:

Criticism 6: The headline findings against each of the 4 models of natural warming over the 20th century should have been presented separately in a similar way to the IPCC scenarios (for example as in the figure in my recent post, On Misplaced Certainty and Misunderstood Uncertainty).

Second cause of uncertainty: 10 different amounts of warming from each of the 4 models of natural 20th century warming

But Pall et al didn’t stop at 4 models of natural 20th century warming. They realised that each of the 4 models has statistical uncertainty in its modelling of the amount of natural warming to 2000. The models in particular each noted a risk of greater than the mean warming. This has to be accounted for in the initial data to our flood modelling. Never mind, you’d have thought, let’s see how often floods occur overall, because what we’re interested in is the overall risk of flooding.

But Pall et al didn’t simply initialise their model with a range of initial values for the amplitude of warming for each of their 4 scenarios. They appear to have created 10 different warming amplitudes for each of the 4 scenarios and treated each of these as different cases. This leaves me bemused, as the 4 scenarios must also have had different patterns of warming, so why not create different cases from these? Similarly, they seem to have varied initial SST conditions in their AGW model since they “accounted for uncertainty” in that data. Why, then, were these not different cases?

I must admit that even after spending last Sunday morning slobbing about pondering Pall et al, rather than just slobbing about as usual, I am still uncertain(!) whether Pall et al did treat each of the 10 sub-scenarios as separate cases. If not, they did something else to reduce the effective sample size and therefore increase the statistical uncertainty surrounding their FAR estimates. Their Methods Summary section talks about “Monte Carlo” sampling, which makes no sense to me in this case as we can simply use Statistics 101 methods (as shown in Part 1).

The creation of 10 sub-scenarios of each scenario (or the Monte Carlo sampling) effectively means that, instead of 4 tightly constrained estimates of the risk, we have 4 wide distributions. Remember (see previous post) the formula for the statistical uncertainty (Standard Deviation (SD)) with which a percentage measured in a sample represents the percentage in the overall population is:

SQRT((sample %)*(100-sample%)/sample size) %

so it varies inversely with the square root of the sample size. In this case the sample size for each of the 4 scenarios was 2000+, so that of each of the 10 subsets was only around 200. The square root of 10, obviously, is 3 and a bit, so the error associated with a sample of 200 is about 3 times as large as if the sample size were 2000.

For example, one of the yellow runs is an outlier: it predicts floods about 15% of the time. How confident can we be in this figure?:

SQRT((15*85)/200) = ~2.5

So it’s likely (within 1 SD either way) that the true risk is between 12.5 and 17.5%, but only very likely (2 SD either way) that it is between 10 and 20%.
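
Here's the same calculation as a short sketch, so the sample sizes of 200 and 2000 can be compared side by side. It uses the formula quoted above; the function name is mine and the sample sizes are the approximate ones mentioned in the text:

```python
import math


def sd_percent(sample_percent: float, sample_size: int) -> float:
    """SD of a sample percentage: SQRT(p * (100 - p) / n), in percentage points."""
    return math.sqrt(sample_percent * (100.0 - sample_percent) / sample_size)


p = 15.0                       # the yellow outlier floods in about 15% of its runs
sd_200 = sd_percent(p, 200)    # one sub-scenario of roughly 200 runs
sd_2000 = sd_percent(p, 2000)  # a whole scenario of roughly 2000 runs

print(f"n=200:  SD = {sd_200:.2f}; "
      f"1 SD range {p - sd_200:.1f} to {p + sd_200:.1f}%; "
      f"2 SD range {p - 2 * sd_200:.1f} to {p + 2 * sd_200:.1f}%")
print(f"n=2000: SD = {sd_2000:.2f}")
print(f"ratio of SDs = {sd_200 / sd_2000:.2f}")
```

The ratio of the two SDs is the square root of 10 whatever flood percentage you plug in, which is why slicing each scenario into 10 roughly triples the error.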

So if we ran enough models we might find that that particular yellow sub-scenario only implied a flood risk of somewhere around 10%. Or maybe it was even more. The trouble is, in salami-slicing our data into small chunks and saying we’re uncertain which represents the true state of affairs, we’ve introduced statistical uncertainty. And this affects our ability to be certain, since it is bound to increase the number of extreme results in our suite of 40 scenarios, disproportionately affecting our ability to make statements as to what we are certain or very certain of.

Criticism 7: The design of the Pall et al modelling experiment ensures poor determination of the extremes of likely true values of the FAR – yet it is the extreme value that was presumably required, since that was presented to the world in the form of the statement in the Abstract that AGW has increased the risk of floods “in 9 out of 10 cases” by “more than 20%”. The confidence in the 20% figure is in fact very low!

Note that if the April 2000 temperature change amplitude variability had been treated as a risk, instead of as uncertainty, the risks in each case would have been tightly constrained and the team would have been able to say it was very likely (>90%) that the increased flood risk due to AGW exceeds 60% (since all the 4 scenarios would yield an increased risk of more than that) and likely it is greater than 150% (since 3 of the 4 scenarios suggest more than that).

The problem of risks within risks

Consider how the modelling could have been done differently, at least in principle. Instead of constructing April 2000 temperatures based on previous modelling exercises and running the model from there, they could have modelled the whole thing (or at least the natural forcing representations) from 1900 to autumn 2000 and output rainfall data for England. Without the intermediate step of exporting April 2000 temperatures from one model to another there’d be no need to treat the variable as “uncertainty” rather than “risk”.

Similarly, say we were interested in flooding in one particular location. Say it’s April 2011 and we’re concerned about this autumn since the SSTs look rather like those in 2000. Maybe we’re concerned about waterlogging of Reading FC’s pitch on the day of the unmissable local derby with Southampton in early November. Should we take advantage of a £10 advance offer for train tickets for a weekend away in case the match is postponed or wait until the day and pay £150 then if the match is off?

In this case we’d want to feed the aggregate rainfall data from Pall et al’s model into a local rainfall model. By Pall et al’s logic everything prior to our model would count as “uncertainty”. We’d input a number of rainfall scenarios into our local rainfall model and come up with a wide range of risks of postponement of the match, none of which we had a great deal of confidence in. I might want to be 90% certain there was a 20% chance of the match being postponed before I spent my tenner. I’d have to do a lot more modelling to eliminate statistical uncertainty if I use 10 separate cases than if I treat them all the same.

How Pall et al could focus on improving what we know

If we inspect Pall et al’s Fig 3, it looks first of all as though very few – perhaps just 1 yellow and 1 pink – of the 40 non-AGW cases result in floods at least 10% of the time (this includes the yellow run that predicts 15%). About 12% of the AGW runs result in floods. Yet we’re only able to say we are 90% certain that the flood risk is 20% greater because of AGW. This would imply at most 4 non-AGW runs within 20% of the AGW flood risk (i.e. predicting a greater than 10% flood risk).

If we look at Pall et al’s Fig 4, we see that, first:
– the “long tail” where the risk of floods is supposedly somewhat (FAR <-0.25!) greater “without AGW” is almost entirely due to the yellow outlier case. If just 10 runs in this case had not predicted flooding instead of predicting it then the long tail of the entire suite of 10,000 runs would have practically vanished.
– the majority of the risk of the FAR being below its 10th percentile (giving rise to the statement of 90% probability of a FAR of greater than (only) 20%) is attributable to pink cases.

It would have been possible to investigate these cases further, simply by running more simulations of the critical cases to eliminate the statistical uncertainty. I can hear people screaming “cheat!”. But this simply isn’t cheating. Obviously if 10x as many runs of the critical cases as non-critical ones are done, they’d have to be scaled down when the statistical data is combined (but this must have been done anyway as the sample sizes for the different scenarios were not the same). In fact, it’s good scientific investigation of the critical cases. If we want to be able to quote the increased risk of flooding because of AGW at the 10th percentile level (i.e. that we’re 90% sure of) with more certainty then that’s what our research should be aimed at.
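To show what I mean by scaling down, here is a minimal sketch. The flood counts and run counts are entirely hypothetical (they are not taken from the paper), the function names are mine, and I don't know how Pall et al actually combined their scenarios. The third scenario stands in for a critical case that has been given 10x as many runs as the others.

```python
# Hypothetical (floods, runs) counts for four "equally probable" sub-scenarios.
# The third has been oversampled 10x to pin down its flood frequency.
scenario_counts = [(10, 200), (8, 200), (300, 2000), (12, 200)]

def pooled_frequency(counts):
    """Naive estimate: lump all runs together, so the oversampled case dominates."""
    total_floods = sum(floods for floods, _ in counts)
    total_runs = sum(runs for _, runs in counts)
    return total_floods / total_runs

def equal_weight_frequency(counts):
    """Average the per-scenario frequencies so each scenario counts exactly once."""
    return sum(floods / runs for floods, runs in counts) / len(counts)

pooled = pooled_frequency(scenario_counts)
weighted = equal_weight_frequency(scenario_counts)
print(f"pooled: {pooled:.3f}, equal weight per scenario: {weighted:.3f}")
```

The equal-weight version is the "scaled down" one: the extra runs sharpen the estimate for the critical case without giving that case any more say in the combined result.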

Of course, if we find that the yellow sub-scenario really does suggest a risk of flooding of 15%, somewhat more than with AGW on top, and we don’t see regression to the mean, that might also tell us something interesting. Maybe natural variability is greater than we thought, and April 2000 meteorological conditions (principally SSTs) were possible that would have left the UK prone to even more flooding than actually occurred with more warming.

Criticism 8: Having introduced unnecessary uncertainty in the design of their modelling experiment, Pall et al did not make use of the opportunities available to eliminate such uncertainty by running a final targeted batch of simulations.

Preliminary conclusion

It looks like there’s going to have to be a Part 3 as I have a couple more points to make about Pall et al and will need a proper summary.

Nevertheless, I understand a lot better than I did at the outset why they are only able to say we’re 90% certain the FAR is at least 20% etc.

But I still don’t agree that’s what they should be doing.

We want to use the outputs of expensive studies like this to make decisions. Part of Pall et al’s job should be to eliminate statistical uncertainty, not introduce it.

They should have provided one headline figure of the increased risk due to global warming, about 2.5 times as much, taking into account all their uncertainties.

And the only real uncertainties in the study should have been between the 4 different patterns of natural warming. These are the only qualitative differences between their modelling runs. Everything else was statistical and should have been minimised by virtue of the large sample sizes.

If we just label everything as uncertainty and not as risk, we’re not really saying anything.

After all, it might be quite useful for policy-makers to know that flood risks are already 2.5 times what they were in 1900. This might allow the derivation of some kind of metric as to how much should be spent on flood defences in the future, or even on relocation of population and/or infrastructure away from vulnerable areas. Knowing that the scientists are 90% certain the increased risk is greater than 20% really isn’t quite as useful.

The aim of much research in many domains, including the study of climate, and in particular that of Pall et al, should be to quantify risks and eliminate uncertainties. It rather seems they’ve done neither satisfactorily.

(to be continued)

23/2/11, 16:22: Fixed typo, clarified remarks about the value of Pall et al’s findings to policy-makers.

February 20, 2011

Extreme Madness: A critique of Pall et al (Part 1: General comments on the paper and discussion of use of statistics)

Filed under: Effects, Global warming, Science, UK climate trends — Tim Joslin @ 3:59 pm

Do what I say, not what I do. Refrain from seeking out papers in scientific journals, because they inevitably create more questions than answers. Jobs for the boys, I suppose.

I first read about Pall et al last Thursday when a headline on guardian.co.uk caught my eye: Climate change doubled likelihood of devastating UK floods of 2000. What could that possibly mean? The point is we know the floods occurred.

Are we saying there would have been a 50% chance of them happening if global warming hadn’t occurred? That would at least make sense, but seems to me extremely unlikely, since autumn 2000 was apparently the wettest since records began in 1766. The chances of an entirely different set of weather events in a parallel universe coming together to produce something as extreme is clearly much less than one in two.

I was just mulling over this when a Realclimate post notification popped into my Inbox. Nature, which this week splashed on rain (ho, ho), had of course caught the eye of Gavin Schmidt, who reported on Pall et al and another paper in the same issue. I immediately dived in where professional scientists with people to upset fear to tread and voiced some of my concerns. Gavin responded (I stand by the points I made which he disagrees with, btw) and the debate went on: a Mathieu chipped in, violently agreeing with me, as I pointed out, and I similarly responded to some remarks by a Thomas.

At this point I started to get serious about the issue. The rest of this post is a more systematic critique of Pall et al.

What a way to conduct a debate

It is absurd that we are attempting to formulate policy on the basis of information that is not in the public domain. Particularly since a weekly scientific news cycle has developed as the main journals try to grab headlines. As well as the main Guardian article, George Monbiot also commented soberly on Pall et al, remarking that:

“[Pall et al] gives us a clear warning that more global heating is likely to cause more floods here.”

though when he says:

“They found that, in nine out of 10 cases, man-made greenhouse gases increased the risks of flooding…”

he (or the dreaded sub-editor) has in fact lost the sense of Pall et al’s Abstract, which went on to say:

“…by more than 20%”.

so George’s “nine out of 10” is in fact an understatement.

The science news cycle process does rather allow a bit of spin. I hate to say it, but the main Guardian piece does have the feel of having been planned in advance – hey, journo, here’s three quotes for the price of one. As well as Myles Allen (the leader of the Pall et al team, and one of the paper’s authors), a Richard Lord QC is also quoted. It’s not immediately obvious, but Lord appears to be a long-time collaborator of Allen in what has to be described as a political project to use the legal system to tackle the global warming problem. I’m not at all sure about the “blame game” in general. It seems if anything to put obstacles in the way of reaching international agreement on emissions cuts.

It wasn’t until Friday afternoon that I was able to read the whole of Pall et al, rather than just the Abstract (thanks Ealing Central Library). Nature is a good journal, but I don’t think they paid for the work that went into Pall et al. In fact the climate modelling was actually executed by volunteers at climateprediction.net. This is an exciting initiative, but, as someone who once participated (I was pleased my model showed an extreme result of something like 11C 21st century warming!), it would be much better – and I’d be much more likely to take the trouble to participate again – if the results were presented in an open manner, rather than held back (it seems) for scientific papers that appear a year after they’re submitted, so well after the experiment. Much more could be done to at least explain the findings of all the experiments to date on the site.

Anyway, here I finally am with a cup of tea, a hot-cross bun and my dissecting kit, so let’s proceed…

The Pall et al method

It turns out that what Pall et al did was initialise the state of climate models to April 2000. They ran one set of 2268 simulations (their A2000) with the actual conditions and other sets (of 2158, 2159, 2170 and 2070 simulations) each with one of 4 counterfactuals (each with 10 “equally probable” variants so 40 scenarios in all), with global warming stripped out.

They fed the climate model inputs into a flood model to determine run-off and considered the floods had been predicted if the average daily run-off was equal to or greater than the 0.41mm recorded in autumn 2000.

The result was a set of graphs showing the results with and without global warming. Basically these consist of a bunch of results from the global warming case and each of the 4 models. They show these as cumulative frequency distributions, such that 100% of the global warming case (a line of dots on the log scale they use) result in run-off above 0.3mm/day, 13% (1 in about 7.5) above 0.4mm, maybe (the graphs are quite small) 12% (1 in about 8.5) above the actual flood level of 0.41mm/day and so on, with around 1.2% (1 in about 80) as high as 0.55mm a day (which presumably is a Biblical level). Actually I’ve just realised that in fact the graphs (Pall et al’s Fig 3) are printed with the horizontal axis logarithmic scale marked with the same subdivisions for occurrence frequency (as I carelessly read before my final Realclimate post) and its inverse, return time (which actually is a log scale) – you’d think a peer reviewer or someone at Nature would have spotted that in the 10 and a half months between submission and publication.

The other cases (the A2000Ns in Pall et al’s terminology) are each 10 similar lines of dots, so appropriately enough they appear as a spray, running below the A2000 line, except in 2 cases which manage to nip above the A2000 line.

Call me naive, but I think this shows that in >95% of cases (that is, except for two out of 40, part of the time) the 2000 floods were worse than they would have been without global warming. That is, according to the modelling, the exercise has shown, statistically significantly, that the flooding was worse as a result of global warming. All we need to do is assume the same model errors affected all the scenarios approximately equally. This seems an intelligent conclusion.

But that’s not what the authors do. They randomly select from the A2000s and each of the 4 sets of A2000Ns to produce graphs of the probability distribution of the run-off being more likely to exceed the threshold of 0.41mm/day (the actual level). They also produce a combined graph, and this is where the aforementioned increased risk of greater than 20% in 9 out of 10 cases comes from, as well as an increased risk of 90% in 2 out of 3 cases and the Guardian headline of approximately double the risk at the median.

The point is that Pall et al don’t want to just say “flooding will be more severe”, they want to be able to calculate the fraction of attributable risk (FAR) for anthropogenic global warming (AGW) for the particular event. Why? So they can take people to court, that’s why.

As I noted in my final Realclimate post on the topic, it seems to me that Pall et al are trying to push things just a little too far.

About this 0.41mm threshold

This wasn’t where I intended to start, but it seems logical. Why define the flood event in this way? Why not say anything over say 0.4mm/day would count as a flood? Floods aren’t threshold types of things anyway.

Further, why are we including runs with very high runoffs? These types of models are known to sometimes “go wild”. Surely we’re interested in forecasting the actual flood event, not some other extreme.

One effect of choosing the 0.41mm threshold is it makes the flood reasonably rare. But as I argued repeatedly on Realclimate, the flood definitely happened; one reason it’s rare in the modelling experiment is because the model (and/or the initial data it was supplied with) is not good enough to forecast it more than about 1 in 8.5 cases or about 12% of the time. We’ll have to come back to this.

Now here’s another pet hate. The fact that the flood is rare in both the A2000 and A2000N model runs means that the result can be (and is) expressed as a % increase in risk, even if George Monbiot (or his sub-editor) managed to miss this off. If the occurrence in both sets of data had been higher then these percentages would have been considerably lower.

For example, Fig 3b (using GFDLR30 data in purple, for those with access to the paper) is the easiest to read as the A2000 series is much better than the purple set of A2000Ns at predicting the flood. For the “best” (probably warmest) of the purple A2000N series, I can therefore read off intersection data together with that for the A2000 series. For 0.41mm/day A2000 predicts the flood about 12% of the time (1 run in every 8.5) whilst the A2000N predicts it 5% of the time (one run in 20). We’d conclude on the basis of this data that the increased risk of the flood because of AGW is around 140% (i.e. 12/5 = 2.4 times what it was before).

But for 0.35mm I get 50% (1 in 2) and 33% (1 in 3) respectively, so the flood risk is only about 50% greater!

As a check, if I go even higher to 0.46mm I get 5% (1 in 20) and about 1.5% (around 1 in 70), so the flood risk is 233% greater.
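For anyone who wants to check my arithmetic, here is a minimal sketch of the three read-offs. The percentages are only my by-eye readings of Fig 3b, not figures quoted in the paper, and the function name is mine.

```python
# Exceedance frequencies (% of runs) read by eye off Pall et al's Fig 3b.
# Keys are runoff thresholds in mm/day; values are (A2000, best purple A2000N).
readings = {
    0.35: (50.0, 100.0 / 3),  # 1 in 2 versus 1 in 3
    0.41: (12.0, 5.0),        # 1 in 8.5 versus 1 in 20
    0.46: (5.0, 1.5),         # 1 in 20 versus about 1 in 70
}

def increased_risk_percent(with_agw, without_agw):
    """Percentage by which the exceedance frequency is greater with AGW."""
    return (with_agw / without_agw - 1.0) * 100.0

results = {t: increased_risk_percent(a, n) for t, (a, n) in readings.items()}
for threshold, increase in sorted(results.items()):
    print(f"{threshold:.2f} mm/day: flood risk about {increase:.0f}% greater")
```

The point is simply that the headline percentage moves a long way as the threshold moves, which is why the choice of 0.41mm matters.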

It’s well known, as discussed in the other paper in this week’s Nature, Min et al, that climate models tend to underestimate extreme precipitation events, so choosing a lower runoff threshold for the flood might have made some sense. On the other hand, exceptionally extreme events become much more likely with AGW.

I can’t find any calibration between the models used by Pall et al and actual rainfall (e.g. by trying to simulate other years) – maybe they’re just not very good at forecasting rainfall in flood years or maybe they forecast the same rainfall every year, regardless of the initial condition in April.

Criticism 1: The paper should have included the real-world distribution of run-offs which the modelling is supposedly correlated with.

Criticism 2: The paper should have included validation of the model against actual run-offs over a number of years. Some model runs should have been initialised to the conditions in April 1999, 2001 etc.

If I’d been editor of Nature (and I never will be if this upsets the wrong people – the sacrifices I make for truth), I might have asked for such a calibration or at least a sensitivity analysis between the “increased risks” and the flood threshold value chosen.

Criticism 3: The results should have been presented as a graph of increased risk of floods of different severity (and therefore different return times).

About this computing time

As I mentioned earlier, Pall et al ran over 10,000 simulations of the autumn 2000 weather. Yet whilst their mean case is that the floods in the AGW case were about 2.5 times as likely as without AGW, they are only 90% confident that the floods were 20% more likely to occur.


If I do an opinion poll – as I happen to have – I can tell you within a small % how the nation will vote.

So I stared at Pall et al’s method, and the more I thought about it the more bizarre it seemed. They’ve only gone and sampled the samples! In their Fig. 4 they’ve presented a Monte Carlo distribution of samples of pairs from each set of simulations, plotting the probability in each case of the floods being worse because of AGW. They don’t give the sample size – 43 say – of each of these Monte Carlo samples, but unless I’ve gone completely mad, these plots are sensitive to the sample size, i.e. if they’d taken a sample size of say 87 random pairs of simulations the certainty (that the floods are 2.5 times as likely to occur in the AGW case) would have been greater (probably by the square root of 2, but that’s just an educated guess). This is basically an example of how what we used to call “technowank” in the IT trade can go badly wrong.
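To illustrate the sensitivity I’m worried about, here is a rough sketch. It is emphatically not Pall et al’s actual procedure, which I can’t reconstruct from the paper: it just draws pretend ensembles with flood frequencies of 12% (my reading of the A2000 curve) and 4.8% (assumed, being 12%/2.5) and looks at how the spread of the FAR estimate changes with the number of runs in each sample. The sample sizes of 200 and 800 are hypothetical, chosen so that the FAR is almost never undefined.

```python
import random
import statistics

P_AGW = 0.12      # flood frequency read off the A2000 curve
P_NO_AGW = 0.048  # assumed: 12% / 2.5 for the pooled A2000N runs

def far_spread(sample_size, trials, rng):
    """Standard deviation of FAR = 1 - P_noAGW / P_AGW across resampled ensembles."""
    fars = []
    for _ in range(trials):
        floods_agw = sum(rng.random() < P_AGW for _ in range(sample_size))
        floods_no_agw = sum(rng.random() < P_NO_AGW for _ in range(sample_size))
        if floods_agw == 0:
            continue  # FAR is undefined if no AGW run floods; skip this draw
        fars.append(1.0 - floods_no_agw / floods_agw)
    return statistics.stdev(fars)

rng = random.Random(2000)  # fixed seed so the sketch is repeatable
spread_small = far_spread(200, 1000, rng)
spread_large = far_spread(800, 1000, rng)
print(f"spread of FAR with 200 runs per sample: {spread_small:.3f}")
print(f"spread of FAR with 800 runs per sample: {spread_large:.3f}")
```

If my educated guess about square roots is right, quadrupling the sample size should roughly halve the spread, so the width of a plot like Fig 4 would say as much about the sample size chosen as about the climate.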

If I’m right and I think I am, Pall et al have not only presented the wrong headline finding (the world should have been informed that the floods, according to their modelling exercise, were 2.5x as likely because of AGW +/- not very much), they’ve also thrown away the advantage of using so much computer time – I read somewhere that those 10,000+ simulations would have cost £120m if run commercially rather than as volunteers’ screensavers!

They say it’s better to understand how to do something simple, than misunderstand something complex. Well, they don’t actually, I just made that up. Anyway, here’s some schoolboy stats Pall et al could have employed:

From their graphs, about 12% of the AGW simulations were greater than their 0.41mm threshold for the flood. With a sample size of 2268, what my textbook calls the STandard Error of Percentages (STEP), that is, the standard deviation of this estimate of the whole (infinite) population of simulations, is given by:

SQRT((12*(100-12))/2268) = 0.68%

That is, it’s likely (within 1 SD) that the actual risk of flooding in the AGW case (according to our model) is 12+/-0.68%.

Similarly for the counterfactual ensemble (all 40 sets combined), it’s likely (based on inspection of their Fig.4 that the number of AGW simulations exceeding the 0.41mm threshold is 2.5x the number of non-AGW ones doing so) that the flood risk without AGW is within 4.8%+/-:

SQRT((4.8*(100-4.8))/8557) = 0.23%

There’s probably some clever stato way of combining these estimates, but all I’m going to do is crudely compare the top estimate of each with the bottom estimate of the other – that gives us roughly 2 standard deviations. On this basis, according to our modelling, the likelihood of the floods occurring has, because of AGW, very likely increased by a factor of between 11.32/5.03 = 2.2 and 12.68/4.57 = 2.8, with a best estimate of 2.5 times.
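The schoolboy stats above can be reproduced in a few lines. This is just a sketch of my own crude comparison, using the 12% and 4.8% figures I read off the graphs and the run counts given in the paper; the function and variable names are mine.

```python
import math

def step(percent, n):
    """STandard Error of Percentages: sqrt(p * (100 - p) / n), in percentage points."""
    return math.sqrt(percent * (100.0 - percent) / n)

agw_pct, agw_n = 12.0, 2268
no_agw_pct = 4.8
no_agw_n = 2158 + 2159 + 2170 + 2070  # the four counterfactual sets combined

agw_se = step(agw_pct, agw_n)
no_agw_se = step(no_agw_pct, no_agw_n)

# Crude bounds: top estimate of one against bottom estimate of the other.
upper = (agw_pct + agw_se) / (no_agw_pct - no_agw_se)
lower = (agw_pct - agw_se) / (no_agw_pct + no_agw_se)
best = agw_pct / no_agw_pct

print(f"AGW: {agw_pct}% +/- {agw_se:.2f}; no AGW: {no_agw_pct}% +/- {no_agw_se:.2f}")
print(f"risk ratio between {lower:.2f} and {upper:.2f}, best estimate {best:.2f}")
```

A proper treatment would combine the two standard errors rather than pit the extremes against each other, but the crude version is enough to show how narrow the range is.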

This is an important conclusion because the problem with global warming is not just or even mainly the increase in averages, in this case of precipitation. That may not be noticeable.

I think I’ll stop here and consider in another post my more philosophical arguments as to how the methodology of the Pall et al study is dubious.

In the meantime:

Criticism 4: Pall et al’s statistical approach understated the certainty of their modelling result. In fact the study provides some evidence that:

Even the limited warming over the 20th century is very likely, according to a comparative modelling exercise, to have made flooding of the severity of that in 2000 between 2.2 and 2.8 times as likely as in 1900. Historical records suggest the 2000 floods were around a once in 400 year event before global warming, but as a result of the warming up to 2000 they are, according to this modelling exercise, a once in 140 to 180 years event.
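The return times quoted here follow from dividing the pre-warming return time by the risk factor. A minimal sketch, taking the once in 400 years figure and my 2.2 to 2.8 range as given:

```python
baseline_return_years = 400.0  # roughly once in 400 years before warming

# Return time shrinks in proportion to the increase in likelihood.
return_years = {factor: baseline_return_years / factor for factor in (2.2, 2.5, 2.8)}

for factor, years in sorted(return_years.items()):
    print(f"{factor}x as likely: about once in {years:.0f} years")
```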

Criticism 5: The study should have run ensembles with the increase in temperatures expected by (say) 2030 and 2050.

(to be continued)

22/2/11, 11:13: Correction of typo and minor mods for clarity.
22/2/11, 16:07: Corrected another typo and clarified the meaning of the STEP calculations.

February 16, 2011

Quick, FIT Farmers!

I once asked a careers adviser about the possibilities of becoming a journalist. I was told it was a difficult profession to get into. Clearly the reasons for that have nothing to do with competence to actually do the job.

Following my post back in October pointing out that the feed-in tariff (FIT) subsidy for large installations is so generous that there’s no longer an incentive to use sunlight to grow food, or, as the Guardian put it on Monday 7th Feb, “[a]fter a Guardian report on Sunday” – that would be 6th Feb – DECC have decided to bring forward their review of the scheme.

So anyone planning to take advantage of the current tariffs had better move fast. But make sure you understand the scheme, because the papers seem to labour under one or two misconceptions.

For example, yesterday the Independent wrote that:

“…including projects of more than 50 megawatts (MW) in the review will catch out community solar schemes from schools, hospitals and housing associations, as well as truly large-scale farm installations.”

That should have read 50kW, and soon did after the error was pointed out. The point is that the schemes being subsidised by FITs will generate relatively piffling amounts of energy.

As the predictable farce continues, it’s becoming less and less clear to me what the rationale for the FIT scheme actually is, at least for solar PV. The fundamental problem is that government made the a priori assumption that microgeneration is economically efficient. Wrong, wrong, wrong. FIT farms are much more efficient than sticking solar panels on people’s roofs. As ever, scale economies are critical.

So we keep hearing statements accusing farmers of taking up a subsidy which was “intended for” even smaller-scale producers (I say “even smaller-scale” because what’s really needed is industrial-scale production of solar electricity in the Sahara). It’s a no-brainer what DECC will actually do: they’ll reduce the FIT rates for larger installations and/or reduce the size limit for which FITs apply and/or allocate different pots of subsidy for different size schemes – fortunately Osborne has capped the amount that can be committed (from our future electricity bills). Basically they’ll defend the micro micro-generators. But why?

If the future isn’t microgeneration, why would we want to subsidise it? Why not do the reverse of what the government is about to do and allow relatively large-scale solar PV installations to use the subsidy? Surely that would achieve the objective of building up scale economies (that term again – what mental contortions to recognise one form of scale economy and not another in the same initiative!) for the supply of solar panels in the UK?

There’s misconception about another aspect of the scheme, too, extending even to a picture caption serving as the subtitle to a Guardian article supposedly answering all your solar PV FIT questions. They write that:

“Homeowners can make money from their solar panels by selling the energy produced to electricity companies”

More wrongness, journos!

You make most of the money – 41.3p/kWh – by generating the electricity. That’s what you’ll get a meter for on day one.

In fact, the last thing you want to do is sell it to your electricity company! For that you only get an additional 3p/kWh. Last time I looked I was paying around 12p/kWh for electricity and 5p/kWh for gas. So what you want to do is use the solar PV generated electricity yourself rather than buying electricity or even gas. Arrange to use the electricity during the day (perhaps by using storage heaters) or even store it in a bank of batteries to cook in the evening.

There’s a wrinkle that favours the home microgenerator even more. Until smart meters roll out it will be assumed that you export half the electricity you produce and use the rest. So anything over half you use is totally free!
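To make the sums concrete, here is a minimal sketch of what a home microgenerator gets per year under the deemed-export rule as I understand it. The tariffs are the ones quoted above, the 12p/kWh is just what I was last paying, and the 1,000 kWh annual output and the function name are purely illustrative assumptions.

```python
GENERATION_TARIFF = 41.3   # p/kWh paid on everything generated
EXPORT_TARIFF = 3.0        # p/kWh paid on electricity deemed exported
RETAIL_PRICE = 12.0        # p/kWh, roughly what I pay for bought-in electricity
DEEMED_EXPORT_SHARE = 0.5  # without a smart meter, half is assumed exported

def annual_benefit_pence(generated_kwh, self_used_share):
    """Generation payment + deemed export payment + saving on bought electricity."""
    generation = GENERATION_TARIFF * generated_kwh
    export = EXPORT_TARIFF * DEEMED_EXPORT_SHARE * generated_kwh
    saving = RETAIL_PRICE * self_used_share * generated_kwh
    return generation + export + saving

half_used = annual_benefit_pence(1000, 0.5)
all_used = annual_benefit_pence(1000, 1.0)
print(f"using half yourself: £{half_used / 100:.2f}")
print(f"using all of it yourself: £{all_used / 100:.2f}")
```

The export payment is the same in both cases, because it is deemed rather than metered, so every extra kWh you use yourself is a pure saving on your bill.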

As I expected was inevitable all along, we are now well into the realm of perverse incentives. If you’re a home microgenerator the opportunity cost of your own electricity is only at most 3p/kWh. So you might be able to afford to use it up when you wouldn’t have previously spent the money buying electricity. Air-conditioning springs to mind.

It seems the 3p/kWh export tariff has been set at the price electricity distributors normally pay suppliers. But that seems a bit daft, since they (or we) are subsidising generation of the same electricity. Clearly, the export tariff should be approximately the same as the consumer price for electricity and the generation tariff somewhat lower than it is now to compensate.

It might be worth pointing out that with the scheme as it is, electricity consumers should favour larger-scale solar PV installations – FIT farms – since they have no choice but to export their electricity at 3p/kWh (on top of a lower generation tariff of as low as 29.3p/kWh compared to the domestic tariff of 41.3p/kWh).

It’s obvious why home microgenerators would support FITs. It’s not so obvious why electricity consumers would be so enthusiastic. From the detached point of view of decarbonising the UK’s electricity supply, it seems to me there’s a problem looming a decade or two down the line. Current policies should deliver the 15% renewables by 2020 the UK is committed to, though not much will be solar PV, by the way – offshore wind will dominate. But sometime after 2020 we’ll need to start getting domestic consumers to switch from gas central heating and cooking to electricity. At present, the gas price is a fraction of that for electricity. The gap can only widen, especially as we add expensive renewables to the supply. Better start thinking now, I suggest, how we’re going to manage – politically – to tax domestic gas at around the level we do petrol.

And best to think too about how to keep the domestic electricity price down. Generous FITs are probably not the way. And a much larger proportion of onshore wind at about half the cost of offshore might be a good idea as well.
