Uncharted Territory

July 22, 2018

July 2018 UK Weather: CET Records Set

Filed under: Effects, Global warming, Science, UK climate trends — Tim Joslin @ 2:51 pm

Last month I jumped the gun to report the hottest UK June since 1976 in the Central England Temperature (CET) record.  I was undone by a slight downward revision, so that in the event June 2018 was only equal with June 2003 as the warmest since 1976.  Despite that, the forecast for another week of temperatures reaching the 30Cs, together with a CET for July to date of significantly over 19C, prompts me to call July 2018 even earlier as one of the three hottest on record in the CET.  Here’s a graph (the first of many, so be prepared!):
180722 July CET graph to 2018
Only 2006 (19.7C), 1983 (19.5C) and now 2018 (the CET so far this month was 19.3C when I prepared this graph) have exceeded 19C in the CET (thanks, as ever, to the Met Office for the data). In fact, since the next hottest July was in 1783 at 18.8C – which should possibly even be discounted on the grounds that the heat was in part the effect of volcanic smog from the Icelandic volcano Laki – some wintry weather indeed would be necessary for July 2018 to now not be one of the three warmest, justifying my early call (though there’s a huge getting round to it factor in that!).

What is also striking about the July temperature graph is that the three hottest Julys – 2006, 1983 and 2018 – are all in the global warming era. Of course.

I’ve also labelled some notable years in this and subsequent graphs. In particular, I read articles drawing 1955 and 1911 to my attention. Ian Jack wrote nostalgically about 1955, though I do wonder if its impact was magnified by his age at the time. I’d personally rank 1983 – one of the few summers when I played tennis regularly – as up there with 1976. And I’m backed up by the CET data!

A brilliant Weatherwatch column in the Guardian (better even than the one of 2011 on the same topic) reports on the summer of 1911. It’s worth quoting:

“The long hot summer of 1911 is credited with changing fashions, with women shedding whalebone corsets and brassieres becoming the rage. Edwardian [sic, though Edward VII died in 1910] aristocrats are said to have taken up nude tennis at their country estates…

There was record heat in August and the sunshine continued until September, by which time the countryside was also in severe distress and riots had broken out in the cities.”

Time will tell if we’re in for a repeat!

So onto the graph-fest.

I was going to follow up last month’s post with one of the April to June CET, having noticed that the hot June had followed a distinctly mild mid to late spring (despite cold snaps continuing). Anyway, here’s that one, a little belatedly:
180722 Apr to Jun CET graph to 2018
Yep, that’s right, April to June this year has been one of the three warmest such periods in the CET record, exceeded only by 1762 and 1798. Crikey!

Then, of course, a hot June followed by an exceptionally hot July must make the early to mid summer graph (June and July) quite interesting:
180722 June to July CET graph to 2018
It is, but 2018 is still only the third hottest year, after 1976 and 2006 this time (though 2018 could still also fall behind 1826, I suppose).

Surely there must be some measure on which 2018 is (provisionally) the warmest ever?

Yes, you’ve guessed it. A mild late spring and hot early to mid summer makes 2018 a record-breaker for May to July mean CET:
180722 May to July CET graph to 2018

And that’s not it. If we add in April as well, sort of mid-spring to mid-summer, it’s not even close:
180722 April to July CET graph to 2018


June 27, 2018

Hottest UK June Since 1976 (and Weather Reporting Hype)

Filed under: Effects, Global warming, Media, Science, Science and the media, UK climate trends — Tim Joslin @ 2:27 pm

It always baffles me that the Met Office reports notable weather months 2 or 3 days before their end – you’d have thought they’d wait to finalise the “official” data – so this time I’m facetiously reporting before they do (assuming I type fast enough)!

I know it’s only the 27th and CET (that’s the Central England Temperature for any newbies) data has been published only up to the 25th (thanks again to the Met Office for this resource):

180626 Heatwave CET data

but it’s already a nailed-on slam-dunk that the CET mean for June 2018 will exceed the 16.1C recorded in the exceptionally hot summer of 2003, making this June the warmest since the legendary summer of 1976 (17C).

I say this simply because the forecast for the rest of the month is for fairly hot conditions to persist (thanks this time to Weathercast):

180626 Heatwave Weathercast

Simple arithmetic suggests that daily mean temperatures of around 20C (London’s are not atypical of England as a whole, slightly cooler if anything) will drag up the average for the month from 15.9C for the first 25 days to over the 16.1C recorded in 2003.  Here’s a graph showing June CET since 1659, assuming (conservatively) a mean of 16.2C this year:

180626 Heatwave June CET
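A minimal sketch of the "simple arithmetic" behind that call. The function names are illustrative only, and the 20C figure for the last 5 days is the rough forecast-based assumption from the text, not published data.

```python
def projected_monthly_mean(mean_so_far, days_so_far, assumed_daily_mean, days_in_month):
    """Monthly mean if every remaining day averages `assumed_daily_mean`."""
    days_left = days_in_month - days_so_far
    total = mean_so_far * days_so_far + assumed_daily_mean * days_left
    return total / days_in_month


def required_mean_for_rest(target, mean_so_far, days_so_far, days_in_month):
    """Average needed over the remaining days for the month to reach `target`."""
    days_left = days_in_month - days_so_far
    return (target * days_in_month - mean_so_far * days_so_far) / days_left


# Figures from the post: 15.9C over the first 25 days of a 30-day June.
# The 20C for the remaining 5 days is an assumption based on the forecast.
projected = projected_monthly_mean(15.9, 25, 20.0, 30)
needed = required_mean_for_rest(16.1, 15.9, 25, 30)
print(round(projected, 2), round(needed, 2))
```

On these assumptions the month would finish at about 16.6C, and anything above roughly 17.1C for the last 5 days would be enough to pass 2003's 16.1C.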

Having said all that, this June and May (which I’ll come to) have not been notable for exceptional temperatures.  For example, the current “heatwave”, though fairly unpleasant, has come nowhere near breaking daily records for the CET area (though some local records may be broken, in Wales, for example).  Temperatures have so far only edged above 30C in one or two places, with 30.1C at Hampton Water Works on 25th (Monday) not a patch on the 33.5C at East Bergholt on the same date in 1976.  Even the 30.7C at Rostherne No 2 yesterday, 26th, is well below 1976’s 35.4C at North Heath.

I might even go so far as to say it’s a little bit of an exaggeration to call the current conditions a “heatwave” (at least in southern England).  The term is being devalued by tabloid reporting.  It’s an outrage!  (To use another overused word).  It’s just “hot weather”.

Given that we had several days in succession over 35C in June 1976, and we’ve had 42 years of global warming since then, and warming affects extreme events disproportionately, I wonder what temperatures we’d hit if we had similar conditions to 1976?  Presumably then (as in 2003), high pressure didn’t just sit over the UK, but drew in air from the warmest direction in summer, that is from the south-east (or even just from further east).

What has been notable this year has been the persistence of dry, sunny, windless, anticyclonic conditions, with only a small interlude of westerlies in June. That persistent high pressure conditions are fairly unusual in June is presumably the reason why, on average, June CET temperatures have risen less than other months in the global warming era (the black line in the graph above shows that, averaged over 21 years, the recent period has not been exceptional, though global warming will inevitably drag the mean temperature up over the coming decades).  Because the oceans warm only slowly, periods of weather dominated by westerlies are likely to be only a little warmer than before global warming set in.  The 5 year periods in the mid 2000s and most recently (the green line in the graph) show the potential for generally hotter Junes.

And the historical record (check out 1676 and 1846!), suggests that a truly freakish June these days (with global warming) would average well over 18C, possibly even touching 19C.  Much worse than the low 16Cs this year.

At least this June has been reasonably hot.  May was widely reported as the hottest and sunniest on record.  It was exceptionally sunny (as may also be the case for June), but nowhere near the hottest.  Here’s my latest graph of May CET:

180626 Heatwave May CET

In fact, at 13.2C in the CET, May 2018 was only equally as warm as May 2017 and less warm than in 2004 (13.4C), 1992 (13.6C) and quite a few others!

So how can May 2018 be reported as the “hottest on record”?

Well, obviously it might be because statistics are being used for a different region e.g. the UK as a whole, but I don’t think that’s the main reason.  The CET is fairly representative.

No, if you read the small print you’ll find that the “hottest May” claim is based on daily maximum temperatures only.  When you take night-time temperatures into account, as is almost universal practice, May 2018 was not exceptionally warm.  The reason for the difference lies in all that sunny weather, which tends to lead to warm days and cool nights, so that the daily maxima are more exceptional than the overall daily means.

If that weren’t enough, weather record reporting is also afflicted by “Year Zero Syndrome”.  The CET record back to 1659 is not used, or even referred to.  Instead records are based on the period since 1910, when more comparable records begin.  It’s a bit like the way football records in England are now based on the period since the start of the Premier League in 1992, so that we no longer realise that goal-scoring feats comparable (rather than equal, because there are now 2 fewer top-flight teams) to Dixie Dean’s 60 goals in 1927-28 are still possible.  Clearly, from the chart above, no recent May mean temperature approaches that of 1833, or even 1848.

That leads me to my usual warning.  May 1833 was about 3.5C hotter at 15.1C than the mean for the period (given by the black 21-year running mean on the graph).  Because, in line with global warming, an average May (unlike an average June) is now warmer than at any time since 1659 (the black line again), a similarly freakish May would be somewhere in the mid-15Cs.

Unless the last few years are exceptional, it’s curious that June shows the global warming signal so weakly.  I’ll have to look more closely at the data to see if that for any other months exhibits a similar feature.

June 23, 2017

How Not to Report a Weather Record: 21st June 2017

Filed under: Effects, Global warming, Science, UK climate trends — Tim Joslin @ 5:36 pm

Well, well, well.  Less than a year on from an exceptionally hot mid-September day (at least exceptionally hot for the UK, if not, perhaps, for Kuwait), and it’s only gone and happened again.

Yep, the presumably less poisonous than mercury red liquid in my re-purposed fridge thermometer has only gone and reached 34.5C this week, on what was widely reported as “the hottest June day for 41 years”, that is, since the summer of 1976.  And curiously I was close to the epicentre of the heatwave back in ’76, in FA Cup-winning Southampton, then the hottest place in the country, just as where I am now, a few miles from Heathrow, has been this time.

And once again the record has been somewhat understated.   I explained in my post on the topic last September that the true significance of the 13th September 2016 was that it was the hottest day that had been recorded in the UK so late in the year.

You’ve guessed it.  The 34.5C recorded at Heathrow this summer solstice was the hottest daily maximum so early in the year.  Back in 1976 the temperatures over 35C (peaking at 35.6C in Southampton on 28th) were later in the month.  In other words, 21st June 2017 saw a new “date record”.

Admittedly, it was not a particularly notable date record, since 34.4C was recorded at Waddington as early as 3rd June during the glorious post-war summer of the baby-boom year of 1947.  And 35.4C at North Heath on 26th June 1976 also seems somewhat more significant than nearly a whole degree less on 21st June.  Furthermore, unlike in 1947, 1976, and, for that matter, 1893, only one “daily record” (the hottest maximum for a particular date) was set in the 2017 June heatwave.

Nevertheless, 21st June 2017 set a new date record for 5 days (21st to 25th June, inclusive) and that is of statistical significance.  The point is that without global warming you would expect there to be approximately the same number of date records each year, or, more practically, decade.  The same is true of daily records, of course – providing a recognised statistical demonstration of global warming – but my innovation of date records provides for a more efficient analysis, since it takes account of the significance of daily records compared to those on neighbouring dates.  It makes use of more information in the data.

Supporting the “hypothesis” of warming temperatures, the 5 day date record set on 21st June 2017 exceeds what you would expect in an average year, given that daily temperature records go back over 150 years.  On average you’d expect less than 3 days of date records in any given year.  But we can’t read too much into one weather event, so how does it look for recent decades?
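The "less than 3 days" figure follows because every calendar date falls under exactly one date record, so in a stable climate the 365 days would be shared roughly evenly among the years on record. A small sketch of that reasoning, assuming exactly 150 years of records (the post says only "over 150 years"):

```python
DAYS_IN_YEAR = 365
YEARS_OF_RECORDS = 150  # assumption: the post says "over 150 years"


def expected_date_record_days(years_in_period, years_of_records=YEARS_OF_RECORDS):
    """Days of date records expected in a period if the climate were stable."""
    return DAYS_IN_YEAR * years_in_period / years_of_records


per_year = expected_date_record_days(1)
print(round(per_year, 2))      # expected days of date records per year
print(round(5 / per_year, 2))  # the 5-day record of 21st June 2017, relative to that
```

With more than 150 years of records the expected figure per year would be a little lower still.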

Last September, I provided a list of UK date records from the hottest day, 10th August, when 38.1C was recorded in Gravesend in 2003 through to October 18th, promising to do some more work next time there was a heatwave.  So, keeping my word, we have the following date records:

34.4C – 3rd June 1947 – 18(!) days

34.5C – 21st June 2017 – 5 days

35.4C – 26th June 1976 – 1 day

35.5C – 27th June 1976 – 1 day

35.6C – 28th June 1976 – 3 days

36.7C – 1st July 2015 – 33(!!) days

37.1C – 3rd August 1990 – 7 days (through 9th August)

Obviously, weighting for how exceptionally hot they were, the 2010s have had way, way over their share of exceptionally hot days for the time of year during the summer months.  I’m timed out for today, but I will definitely have to get round to an analysis of the whole year!  Watch this space.


September 20, 2016

How Not to Report a Weather Record: 13th September 2016

Filed under: Effects, Global warming, Science, UK climate trends — Tim Joslin @ 11:21 am

Last Sunday, the Guardian website suggested Tuesday 13th September would be jolly warm:

“If the mercury rises above 31.6C, the temperature was [sic] reached at Gatwick on 2 September 1961, it will be the hottest September day for 55 years.”

“No, no, no!!”, I was obliged to point out, adding, by way of explanation that:

“If the temperature rises above 31.6C it will be the hottest September day for more than 55 years, since 1961 was 55 years ago.

For it to be the hottest September day for 55 years it will only have to be hotter on Tuesday than the hottest September day since 1961.”

Good grief.

After that I was hardly surprised – since your average journo seems not even to be an average Joe, but, to be blunt, an innumerate plagiarist – to read in the Evening Standard on the 13th itself:

“If the heat rises above 31.6C, which was reached at Gatwick on September 2, 1961, then it will be the hottest [September] day for 55 years.”

See what they’ve done there?  With a bit of help from Mr Google, of course.

In the event, it reached 34.4C on 13th, making it the hottest September day for 105 years.

Much was also made of the fact that last week we had 3 days in a row above 30C in September for the first time in 87 years.

But the significance of the 34.4C last Tuesday was understated.

The important record was that the temperature last Tuesday was the highest ever recorded so late in the year, since the only higher temperatures – 34.6C on 8th September 1911 (the year of the “Perfect Summer”, with the word “Perfect” used as in “Perfect Storm”) and 35.0C on 1st rising to 35.6C on 2nd during the Great Heatwave of 1906 – all occurred earlier in the month.  By the way, in 1906 it also reached 34.2C on 3rd September.  That’s 3 days in a row over 34C.  Take that 2016.  They recorded 34.9C on 31st August 1906 to boot, as they might well have put it back then.

No, what’s really significant this year is that we now know it’s possible for the temperature to reach 34.4C as late as 13th September, which we didn’t know before.

I’m going to call this a “date record”, for want of a better term.  Any date record suggests either a once in 140 years freak event (since daily temperature records go back that far, according to my trusty copy of The Wrong Kind of Snow) or that it’s getting warmer.

One way to demonstrate global warming statistically is to analyse the distribution of record daily temperatures, i.e. the hottest 1st Jan, 2nd Jan and so on.  Now, if the climate has remained stable, you’d expect these daily records to be evenly distributed over time, a similar number each decade, for example, since 1875 when the records were first properly kept.  But if the climate is warming you’d expect more such records in recent decades.  I haven’t carried out the exercise, but I’d be surprised if we haven’t had more daily records per decade since 1990, say, than in the previous 115 years.

It occurs to me that another, perhaps quicker, way to carry out a similar exercise would be to look at the date records.  You’d score these based on how many days they apply for.  For example, the 34.4C on 13th September 2016 is also higher than the record daily temperatures for 12th, 11th, 10th and 9th September, back to that 34.6C on 8th September 1911.  So 13th September 2016 “scores” 5 days.
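The scoring rule can be sketched as follows. This is one possible reading of the method, not a tested analysis: the function name is made up, and the daily records for 9th to 12th September in the sample are invented placeholders (only the 34.6C and 34.4C values come from the text).

```python
def late_season_date_records(daily_records):
    """Score "hottest so late in the year" date records.

    daily_records: list of (label, record_temp) in calendar order,
    starting from the date taken as the hottest of the year.
    Returns (label, record_temp, score) for each date record.
    """
    n = len(daily_records)
    is_record = [False] * n
    hottest_later = float("-inf")
    # Work backwards: a date record is at least as hot as every later day's record.
    for i in range(n - 1, -1, -1):
        temp = daily_records[i][1]
        if temp >= hottest_later:
            is_record[i] = True
            hottest_later = temp
    # Work forwards: the score is the number of days since the previous date record.
    results = []
    previous = -1
    for i, (label, temp) in enumerate(daily_records):
        if is_record[i]:
            results.append((label, temp, i - previous))
            previous = i
    return results


# 8th and 13th September values are from the post; the days between are placeholders.
sample = [("8 Sep", 34.6), ("9 Sep", 32.0), ("10 Sep", 31.0),
          ("11 Sep", 32.5), ("12 Sep", 33.0), ("13 Sep", 34.4)]
scored = late_season_date_records(sample)
print(scored)
```

In this cut-down sample 13th September scores 5 days, as in the text. The 8th scores only 1 because the sample starts there; with the full series it would count back to the previous date record.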

Here’s a list of date records starting with the highest temperature ever recorded in the UK:

38.1C – 10th August 2003 – counts for 1 day, since, in the absence of any evidence to the contrary, we have to assume 10th August is the day when it “should” be hottest

36.1C – 19th August 1932 – 9 days

35.6C – 2nd September 1906 – 14 days

34.6C – 8th September 1911 – 6 days

34.4C – 13th September 2016 – 5 days

31.9C – 17th September 1898 – 4 days

31.7C – 19th September 1926 – 2 days

30.6C – 25th September 1895 – 6 days

30.6C – 27th September 1895 – 2 days

29.9C – 1st October 2011 – 4 days

29.3C – 2nd October 2011 – 1 day

28.9C – 5th October 1921 – 3 days

28.9C – 6th October 1921 – 1 day

27.8C – 9th October 1921 – 3 days

25.9C – 18th October 1997 – 9 days

And you could also compile a list of date records going back from 10th August, i.e. the earliest in the year given temperatures have been reached.

The list above covers a late summer/early autumn sample of just 70 days, but you can see already that the current decade accounts for 10 of those days, that is, around 14%, during 5% of the years.  The 2000s equal and the 1990s exceed expectations in this very unscientific exercise.
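The tally behind those percentages can be reproduced from the list above. A short sketch, taking the years and scores straight from the list and assuming 140 years of records, with 2010 to 2016 counted as the 7 years of the current decade:

```python
from collections import defaultdict

# (year, days scored) for each date record in the list above.
DATE_RECORDS = [
    (2003, 1), (1932, 9), (1906, 14), (1911, 6), (2016, 5),
    (1898, 4), (1926, 2), (1895, 6), (1895, 2), (2011, 4),
    (2011, 1), (1921, 3), (1921, 1), (1921, 3), (1997, 9),
]


def days_by_decade(records):
    """Sum the days scored by date records in each decade."""
    totals = defaultdict(int)
    for year, days in records:
        totals[year // 10 * 10] += days
    return dict(totals)


totals = days_by_decade(DATE_RECORDS)
sample_days = sum(totals.values())
share_2010s = totals[2010] / sample_days
share_of_years = 7 / 140  # assumption: 2010-2016 out of 140 years of records

print(sample_days, totals[2010])
print(f"{share_2010s:.0%} of days in {share_of_years:.0%} of the years")
```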

Obviously I need to analyse the whole year to draw firmer conclusions.  Maybe I’ll do that and report back, next time a heatwave grabs my attention.

It’s also interesting to note that the “freakiest” day in the series was 2nd September 1906, with a daily record temperature hotter than for any of the previous 13 days.  2nd freakiest was 19th August 1932 – suggesting (together with 2nd September 1906) that perhaps the real story is an absence of late August heatwaves in the global warming era – joint with 18th October 1997, a hot day perhaps made more extreme by climate change.

Am I just playing with numbers?  Or is there a serious reason for this exercise?

You bet there is.

I strongly suspect that there’s now the potential for a sustained UK summer heatwave with many days in the high 30Cs.  A “Perfect Summer” turbocharged by global warming could be seriously problematic.  I breathe a sigh of relief every year we dodge the bullet.




January 19, 2016

Two More Extreme UK Months: March 2013 and April 2011

Filed under: Effects, Global warming, Science, Sea ice, Snow cover, UK climate trends — Tim Joslin @ 7:17 pm

My previous post showed how December 2015 was not only the mildest on record in the Central England Temperature (CET) record, but also the mildest compared to recent and succeeding years, that is, compared to the 21 year running mean December temperature (though I had to extrapolate the 21-year running mean forward).

December 2010, though not quite the coldest UK December in the CET data, was the coldest compared to the running 21 year mean.

I speculated that global warming might lead to a greater range of temperatures, at least until the planet reaches thermal equilibrium, which could be some time – thousands of years, maybe.  The atmosphere over land responds rapidly to greenhouse gases. But there is a lag before the oceans warm because of the thermal inertia of all that water. One might even speculate that the seas will never warm as much as the land, but we’ll discuss that another time. So in UK summers we might expect the hottest months – when a continental influence dominates – to be much hotter than before, whereas the more usual changeable months – when maritime influences come into play – to be not much hotter than before.

The story in winter is somewhat different.  Even in a warmer world, frozen water (and land) will radiate away heat in winter until it reaches nearly as cold a temperature as before, because what eventually stops it radiating heat away is the insulation provided by ice, not the atmosphere.  So the coldest winter months – when UK weather is influenced by the Arctic and the Continent – will be nearly as cold as before global warming.   This will also slow the increase in monthly mean temperatures.  Months dominated by tropical influences on the UK will therefore be warmer, compared to the mean, than before global warming.

If this hypothesis is correct, then it would obviously affect other months as well as December.  So I looked for other recent extreme months in the CET record.  It turns out that the other recent extreme months have been in late winter or early spring.

Regular readers will recall that I wrote about March 2013, the coldest in more than a century, at the time, and noted that the month was colder than any previous March compared to the running mean.  I don’t know why I didn’t produce a graph back then, but here it is:

160118 Extreme months in CET slide 1b

Just as December 2010 was not quite the coldest December on record, March 2013 was not the coldest March, just the coldest since 1892, as I reported at the time.  It was, though, the coldest in the CET record compared to the 21-year running mean, 3.89C below, compared to 3.85C in 1785.  And because I’ve had to extrapolate, the difference will increase if the average for Marches 2016-2023 (the ones I’ve had to assume) is greater than the current 21-year mean (for 1995-2015), which is more than half likely, since the planet is warming, on average.

We’re talking about freak years, so it’s surprising to find yet another one in the 2010s.  April 2011 was, by some margin, the warmest April on record, and the warmest compared to the 21-year running mean:

160119 Extreme months in CET slide 2

The mean temperature in April 2011 was 11.8C.  The next highest was only 4 years earlier, 11.2 in 2007.  The record for the previous 348 years of CET data was 142 years earlier, in 1865, at 10.6C.

On our measure of freakishness – deviation from the 21-year running mean – April 2011, at 2.82C, was comfortably more freakish than 1893 (2.58C), which was in a period of cooler Aprils than the warmest April before the global warming era, 1865.  The difference between 2.82C and 2.58C is unlikely to be eroded entirely when the data for 2016-2021 is included in place of my extrapolation.  It’s possible, but for that to happen April temperatures for the next 6 years would need to average around 10C to sufficiently affect the running mean – the warmth in the Aprils in the period including 2007 and 2011 would need to be repeated.
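For anyone wanting to reproduce the "freakishness" measure, here is a rough sketch. It assumes the running mean is centred on the year in question and that years not yet observed are filled in with the mean of the latest 21 observed years. That is only one reading of the extrapolation described in these posts, and the function names are illustrative.

```python
from statistics import mean


def padded_series(values, window=21):
    """Extend the series by half a window, using the mean of the last
    `window` observed values as the assumed future value."""
    half = window // 2
    tail_mean = mean(values[-window:])
    return list(values) + [tail_mean] * half


def deviation_from_running_mean(values, index, window=21):
    """Value at `index` minus the centred running mean around it."""
    half = window // 2
    if index < half:
        raise ValueError("not enough earlier years for a centred window")
    extended = padded_series(values, window)
    centred = extended[index - half:index + half + 1]
    return values[index] - mean(centred)


# Made-up series purely to exercise the code: 30 years at 10.0C with one warm spike.
demo = [10.0] * 30
demo[25] = 12.1
print(round(deviation_from_running_mean(demo, 25), 2))
```

Because the assumed future years feed into the centred mean, the deviation for a recent year such as 2011 or 2013 shifts as real data replaces the padding, which is the sensitivity discussed above.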

So, of the 12 months of the year, the most freakishly cold for two of them, December and March, have occurred in the last 6 years, and so have the most freakishly warm for two of them, December and April. The CET record is over 350 years long, so we’d expect a most freakishly warm or cold month to have occurred approximately once every 15 years (360 divided by 24 records).  In 6 years we’d have expected a less than 50% chance of a single freakishly extreme monthly temperature.

According to the CET record, we’ve had more than 8 times the number of freakishly extreme cold or warm months in the last 6 years than would have been expected had they occurred randomly since 1659.
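The expectation can be checked with a few lines. The sketch below uses the round figure of 360 years from the previous paragraph. The Poisson step for the chance of at least one record is my own added assumption, not something taken from the CET data.

```python
import math

RECORDS = 24          # 12 months x (most freakishly warm, most freakishly cold)
YEARS_OF_DATA = 360   # round figure used in the text
WINDOW_YEARS = 6
OBSERVED = 4          # Dec 2010, Mar 2013 (cold); Apr 2011, Dec 2015 (warm)

expected = RECORDS * WINDOW_YEARS / YEARS_OF_DATA
ratio = OBSERVED / expected
# Assumption: records arrive independently at a constant rate (Poisson).
chance_of_at_least_one = 1 - math.exp(-expected)

print(round(expected, 2), round(ratio, 1), round(chance_of_at_least_one, 2))
```

On these round numbers the observed count comes out at around ten times the expectation, comfortably "more than 8 times".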

And I bet we get more freakishly extreme cold or warm months over the next 6 years, too.


May 1, 2012

The Wettest Drought in History

One of my responsibilities as a teenager was to keep the lawn under control. Flymos had presumably not yet been invented, and petrol-driven mowers were perhaps too much hassle, so ours was manual. If the grass got too long it was hard work and it could even become necessary to resort to shears, which was back-breaking work. But mowing was also difficult if the grass was damp. There was therefore a trade-off each spring. The first mow had to be done when it was mild enough for the grass to be reasonably dry, but couldn’t be put off until it was too long. And as the grass grew it dried out more slowly each day. So it was essential to make use of any opportunity to mow in case the weather turned wet again. It probably only happened once or twice, but it seems I was always caught out. I’d wait for one more dry day to make the job easier, but the skies would open and a week later the job would be twice as difficult.

Nowadays the internet and improved forecasting allow me to monitor the weather far more effectively. Thus it was I’d already been out with the mower in March, and, seeing the long-range forecast, made sure I got a mow in just before it started raining early in April.

The point is that the 5-10 day forecast is now fairly reliable.

Why, then, was the UK drought – declared in a few regions in March, with hosepipe bans from 5th April – officially extended in mid April?

Yes, that’d be in the middle of the wettest April on record!

We’re now in the farcical situation of the “wettest drought in history”, with a succession of “experts” (and junior ministers) popping up on TV claiming the rain in April somehow doesn’t count. Apparently it’ll run off compacted ground. Yes, maybe for the first day or two, but not after a month. With the wettest April on record followed by significant rain already in May, and more forecast in a day or two, the drought risk is simply receding. We’re in one of those surreal situations where reasons are being invented not to contradict previous claims, in this case that the drought would last into next year.

What baffles me is why the drought was extended when wet weather was forecast. Surely – since most of the time it’s dry – the drought risk is receding as long as there’s significant rain in the forecast. And, as the 5-10 day forecast is fairly reliable and everything after that isn’t, you simply run the risk of looking stupid if you don’t wait until the forecast is for dry weather.

I wonder whether there’s a tendency to believe long-term forecasts more than short-term ones. But long-term forecasts only indicate a small bias one way or another, as Met Office modelling indicates:

“New three-month forecasts by the Met office suggest little respite with April, May and June expected to be drier than average. ‘With this forecast, the water resources situation in southern, eastern and central England is likely to deteriorate further during the period. The probability that UK precipitation for April-May-June will fall into the driest of our five categories is 20-25% while the probability that it will fall into the wettest of our five categories is 10-15%, it says.’ ” [my emphasis]

So 20-25% dry plays 10-15% wet plays (presumably) 60-70% around average. Not sure I’d have put a lot of money on the “expectation” of a dry spring this year (certainly wouldn’t now!). Even less after I’d looked at the Met Office report (scroll down to find PDFs) because the model runs are all over the place.
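To put numbers on how small the bias is, here is a quick sketch. It assumes the five categories would each carry 20% if the forecast had no signal at all. The quote does not say that explicitly, so treat the baseline as my assumption.

```python
def middle_share(dry_pct, wet_pct):
    """Percentage left over for the three middle categories."""
    return 100 - dry_pct - wet_pct


# Quoted ranges: driest category 20-25%, wettest category 10-15%.
low = middle_share(25, 15)
high = middle_share(20, 10)
print(low, high)

# Assumption: five equally likely categories, i.e. 20% each with no signal.
BASELINE = 100 / 5
print(20 - BASELINE, 25 - BASELINE)  # shift in the "driest" category, percentage points
print(10 - BASELINE, 15 - BASELINE)  # shift in the "wettest" category, percentage points
```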

And are these “probabilities”, anyway? Isn’t the modelling signal swamped by the noise of uncertainty? It seems to me likelihoods based on model-runs are not the same as probabilities in the real world.

I’d say the Met Office and the media (the quote marks indicate the introductory sentence was written by the Guardian’s John Vidal) need to mind their language. How about “slightly more likely than not to be” rather than “expected to be”? And perhaps “indication” rather than “forecast”? And “x% of model runs gave…” rather than “the probability that…”? And definitely “might” rather than “is likely to”!

February 24, 2011

Extreme Madness: A Critique of Pall et al (Part 3: Juicy Bits and Summary)

Filed under: Effects, Global warming, Science, UK climate trends — Tim Joslin @ 6:22 pm

I continue to be bothered by Pall et al, the paper which attempts to determine how much more likely the autumn 2000 floods in England and Wales were because of anthropogenic global warming (AGW) since 1900.

To recap, Part 1 of this extended critique described the method adopted by Pall et al and made a few criticisms, one of which I’ll elaborate on in the first part of this post. Part 1 ended by asking why Pall et al didn’t eliminate more statistical uncertainty, given the large number of data points they produced (they ran over 10,000 simulations of the climate in 2000 when floods occurred).

Part 2 looked more closely at how Pall et al had defined risk and uncertainty and handled it statistically. Part 3 will further question the approach adopted, in particular by considering the uncertainty introduced by the process of modelling the climate itself.

Oops, it’s a log scale, or “about this 0.41mm threshold” revisited

In Part 1, I noted the arbitrariness of the threshold for severe flooding adopted by Pall et al. They considered their model had predicted flooding when it estimated 0.41mm/day or more of runoff. But their Fig 3 clearly shows that this level actually gives rather more than the 5-7 floods that would be expected, for the once in 3-400 year event the 2000 floods are said to be, in the ~2000 model runs of each of the 4 A2000N scenarios (those without AGW; the AGW runs are referred to as the A2000 series, of which around 2000 were also run).
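The 5-7 figure is just the number of runs divided by the return period. A minimal sketch, taking "~2000 model runs" as exactly 2000 for illustration:

```python
def expected_events(runs, return_period_years):
    """Events expected in `runs` independent one-year simulations."""
    return runs / return_period_years


RUNS = 2000  # "~2000 model runs" per scenario, taken as exact here

fewest = expected_events(RUNS, 400)  # once in 400 years
most = expected_events(RUNS, 300)    # once in 300 years
print(fewest, round(most, 1))
```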

Pall et al includes no evidence as to the skill of their model in predicting flooding or calibration between the models’ estimation of runoff in the 2000 floods and what actually happened in the real world. As I noted in Part 1, they could have run the model for years other than 2000 in order to show what is termed its “skill”, in this case in predicting flooding.

Why, then, did Pall et al not calibrate their model? Because they didn’t think it mattered, that’s why. They write:

“Crucially, however, most runoff occurrence frequency curves in Fig 3 remain approximately linear over a range of extreme values, so our FAR estimate would be consistent over a range of bias-corrected flood values.”

It’s about time we had a picture, and I can now include Pall et al’s Fig 3 itself. Ignore the sub-divisions on the bottom of the 2 scales in each diagram – these are in error as pointed out in Part 1. The question for any youngsters reading is: are the scales on these diagrams linear or logarithmic?:

Answer: logarithmic, of course.

So is it the case that the “FAR estimate would be consistent over a range of bias-corrected flood thresholds”? The FAR, remember, is the ratio of the AGW risk of flooding to the non-AGW risk of flooding. This ratio would indeed not depend on the level chosen in the model set-up to indicate flooding of the extent seen in the real world in 2000 were the runoff occurrence frequency curves linear. But they’re not. They’re logarithmic. The increased risk therefore does depend on the flood level, as was seen simply from reading figures off the diagrams in Part 1. One wonders if we’re all clear exactly what the graphs in Pall et al’s Fig 3 actually represent.
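A toy illustration of why the threshold can matter. It assumes each curve is a straight line when return time is plotted on a log axis, i.e. runoff = intercept + slope x ln(return time). Every parameter below is invented for the purpose and none is read off Pall et al's Fig 3.

```python
import math


def return_time(runoff, intercept, slope):
    """Invert runoff = intercept + slope * ln(T) to get the return time T."""
    return math.exp((runoff - intercept) / slope)


def risk_ratio(runoff, agw, non_agw):
    """AGW risk divided by non-AGW risk at a threshold (risk = 1 / return time)."""
    return return_time(runoff, *non_agw) / return_time(runoff, *agw)


# Hypothetical (intercept, slope) pairs, for illustration only.
AGW = (0.10, 0.10)
NON_AGW_DIFFERENT_SLOPE = (0.05, 0.09)
NON_AGW_PARALLEL = (0.05, 0.10)

for threshold in (0.41, 0.50):
    print(threshold,
          round(risk_ratio(threshold, AGW, NON_AGW_DIFFERENT_SLOPE), 2),
          round(risk_ratio(threshold, AGW, NON_AGW_PARALLEL), 2))
```

In this toy set-up the ratio is the same at every threshold only when the two lines are parallel on the log axis. With different slopes it changes as the threshold moves.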

Does Pall et al actually tell us anything useful at all?

The Pall et al study assumes it has some skill in forecasting flooding in England in autumn from the state of the climate system in April. Unfortunately we have no idea what this level of skill actually is. The model has not been calibrated against the real world by running it for years other than 2000 (or if it has, this information is not included in Pall et al). Note that analysing the results of such an exercise would not be trivial, since there are two unknowns: the skill of the model and its bias. As far as we know, 0.41mm runoff in the model could be anything in the real world – 0.35mm or 0.5mm, we have no idea. Similarly we don’t know if the model would forecast floods such as those in 2000 with a probability of 1 in 10, 1 in 100 or whatever.

To be fair, Pall et al do devote one of their 4 pages in Nature to showing their modelling does bear some relation to reality. Their Fig 1 shows similar correlation between Northern Hemisphere (NH) air pressure patterns in the model and rainfall in England and Wales as exists in the real world. And their Fig 2 shows that the rainfall patterns in the model bear some resemblance to those in the real world.

But one (more) big problem nags away at me. The basic premise is that a particular pattern of SSTs and sea ice causes the pressure system patterns that lead to rainfall in the UK. Pall et al therefore used the observed April 2000 pattern as input to the A2000 (AGW) series of model runs. But the patterns used for the non-AGW (A2000N) runs were different. Here’s what they say:

“…four spatial patterns of attributable [i.e. to AGW] warming were obtained from simulations with four coupled atmosphere-ocean climate models (HadCM3, GFDLR30, NCARPCM1 and MIROC3.2)… Hence the full A2000N scenario actually comprises four scenarios with a range of SST patterns and sea ice…” [my stress]

So if the A2000 model runs can predict flooding in a particular year from the SST and sea ice pattern in April, we wouldn’t expect the A2000N runs to do so, not just because everything is warmer, but also because the SST and sea ice patterns are different! So we don’t know whether the increased flood risk in the A2000 series is because of global warming or because the SST patterns are different.

It also seems to me that were it the case that Pall et al’s model could predict autumn flooding in April around 15-20x as often as it actually occurs (around 1 in 20 times for 2000 compared to the actual risk of 1 in 3-400) as is implied by their Fig 3, then we’d be reading about a breakthrough in seasonal forecasting and more money would be being invested to improve the modelling further (and increase the speed of forecasting of course, so that it’s not autumn already by the time we know it’s going to be wet!). This isn’t just the forecast for the next season we’re talking about, which the Met Office has given up on, but the forecast for the season after that.

So I’m not convinced. I’m going to assume that Pall et al’s modelling can’t tell one year from another, and that all they’ve done is model the increased risk of flooding in a warmer world in general. (One way to test this would be to compare the flood risks of the 4 A2000N models against each other for the same extent of AGW – it could be that the models give different results simply because they suggest different amounts of warming, not different patterns).

Under this not very radical assumption, we can actually calibrate Pall et al’s modelling. We know that the floods in 2000 were a once in 3-400 year event. That implies that in each of the diagrams in Fig 3 there should be around 5-7 floods (there are – or should be – approx. 2000 dots representing non AGW model runs on each diagram). We can therefore estimate by inspecting the figures how much flooding in the model represents a 3-400 year flood – it’s the level with only 5-7 dots above. We can then read across to the line of blue dots (the AGW case) and then, by reading up to the return time scale (the one with correct subdivisions), work out how often the modelling suggests the flooding should then occur. Here’s what I get:
– Fig 3a: 3-400 year flood threshold ~0.49mm; risk after AGW once every 40 years.
– Fig 3b: ~0.47mm, and risk now once every 30 years.
– Fig 3c and d: ~0.5mm, and risk now once every 50 years.

So the Pall et al study implies, assuming it’s no better at forecasting flooding when it knows the SST and sea ice conditions in April than it is if it doesn’t, that the risk of a 3-400 year flood in England and Wales, similar to or more severe than that which occurred in 2000, is now, as a result of AGW up to 2000 only, between once in 30 and once in 50 years. That is, under this assumption, the risk of flooding in England and Wales of what was previously once in 3-400 year severity has increased by a factor of between 6 and 13, according to Pall et al’s modelling.
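The arithmetic behind this calibration is simple enough to write down. Here is a minimal sketch in Python, using only the figures quoted above (about 2000 non-AGW runs per diagram, a once in 3-400 year flood, and my by-eye readings of the post-AGW return times); the variable and function names are my own, purely for illustration:

```python
# Figures from the text: ~2000 non-AGW runs per diagram, a 1-in-300 to
# 1-in-400 year flood, and post-AGW return times read off Fig 3 by eye.
RUNS_PER_DIAGRAM = 2000
BASELINE_RETURN_YEARS = (300, 400)
AGW_RETURN_YEARS = {"3a": 40, "3b": 30, "3c and d": 50}


def expected_exceedances(runs, return_years):
    """Number of runs expected to exceed a 1-in-`return_years` threshold."""
    return runs / return_years


def risk_factor(baseline_years, agw_years):
    """How many times more often the flood occurs after AGW."""
    return baseline_years / agw_years


# How many dots should sit above the 3-400 year flood level?
dots = [expected_exceedances(RUNS_PER_DIAGRAM, y) for y in BASELINE_RETURN_YEARS]

# Every combination of baseline return time and post-AGW return time.
factors = [
    risk_factor(baseline, agw)
    for baseline in BASELINE_RETURN_YEARS
    for agw in AGW_RETURN_YEARS.values()
]

print(f"dots above threshold: {min(dots):.1f} to {max(dots):.1f}")
print(f"risk increased by a factor of {min(factors):.1f} to {max(factors):.1f}")
```

The extremes come from pairing the shortest baseline return time with the longest post-AGW one (300/50) and vice versa (400/30), which is where the "between 6 and 13" comes from.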

Trouble is, the Pall et al model may have a bit of skill in forecasting flooding from April SST and sea ice conditions (the A2000 case) and this skill may have been reduced by an unknown factor when processing the data to remove the effects of 20th century warming. If Pall et al’s results are to have any meaning whatsoever they need to do further work to establish the skill of the model and calibrate it to measures of flooding in the real world.

More uncertainty about uncertainty

In Part 2 I discussed how Pall et al’s treatment of uncertainty has resulted in them actually saying very little. Essentially, they’ve estimated that the risk of autumn flooding as great as or exceeding that in 2000 has increased as a result of AGW by between 20% and around 700% – and there’s 20% probability it could be outside that range! I argued that the sources of this uncertainty are:
(i) the 4 different models used to derive conditions as if AGW hadn’t happened – fair enough, we can’t distinguish between these (though in Part 1 I estimated how certain we’d be of the increased risk of flooding if we did assume they were all equally probable), and
(ii) statistical uncertainty which could have been eliminated.

But these are not the only sources of uncertainty. We are also uncertain of all the parameters used to drive the HadAM3-N144 model which attempts to reproduce the development of the autumn weather from the April conditions that were fed into it; we’re uncertain of the accuracy of the April SST and sea-ice conditions input into the model; we’re uncertain as to whether atmosphere-ocean feedbacks may have affected the autumn 2000 weather (Pall et al are explicit that such feedbacks were insignificant, so used “an atmosphere-only model, with SSTs and sea ice as bottom boundary conditions”); we’re uncertain of the precise magnitude of the forcings in 2000 which affected the development of the autumn weather; we’re uncertain as to whether there are errors in the implementation of the models; and we’re uncertain as to whether there are processes below the resolution of the model which are important in the development of weather patterns. There are probably more.

Consider that the reason we are uncertain as to which of the 4 models used to derive the A2000N initial conditions is most correct (or how correct any of them are) is because we don’t know how well each of them performs on more or less the same criteria as the higher resolution model used to simulate the 2000 weather. If they didn’t have different parameters, all had the same resolution and so on, then – tautologically – they’d all be the same! If we’re uncertain which of those is most accurate then we must also be uncertain about the HadAM3-N144 model. Just because only one model was used for that stage of the exercise doesn’t mean we’re not still uncertain (and for that matter the fact that we’ve used 4 in the first stage doesn’t mean we’re certain of any of them: they could all be wildly wrong, a possibility not apparently taken account of in Pall et al).

It seems to me the real causes of uncertainty in the findings of Pall et al derive from the general characteristics of the models, not (as discussed in Part 2) the statistical uncertainty as to the amplitude of 20th century warming (the 10 sub-scenarios for each of the 4 cases) which has been used.

Judith Curry has recently written at length about uncertainty and her piece is well worth a look (though I disagree where statistical uncertainty belongs in Rumsfeld’s classification – I think it’s a known unknown, maybe in a “knowable” category, since it can be reduced simply by collecting more of the same type of data as one already has). In particular, though, she provides a link to IPCC guidelines on dealing with uncertainty (pdf). A quick skim of this document suggests to me that Probability Distribution Functions (PDFs) such as Pall et al’s Fig 4 should be accompanied by a discussion of the factors creating uncertainty in the estimate, including some consideration as to how complete the analysis is deemed to be. I say deemed to be, since by its very nature uncertainty is uncertain!

That seems a good note to end the discussion on.

Here’s Pall et al’s Fig 4 (apologies if it looks a bit smudged):


In Part 1 of this critique I identified the two main problems with Pall et al:
– the model results are not calibrated with real world data. The paper therefore chooses an arbitrary threshold of flooding.
– statistical uncertainty has not been eliminated, rather it seems to have been introduced unnecessarily.

Part 2 drilled down into the issue of statistical uncertainty and suggested how Pall et al could have used the vast computing resources at their disposal to eliminate much of the uncertainty of their headline findings.

Part 3 picks up on some of the issues raised in Parts 1 and 2, in particular noting that the paper seems to include an erroneous assumption which led them to conclude that calibration of their model for skill and bias was not important. If my reasoning is correct, this was a mistake. Part 3 also continues the discussion about uncertainty, suggesting that the real reasons for uncertainty as to the increased risk of flooding have not been included in the analysis (whereas statistical uncertainty should have been eliminated).

There are so many open questions that it is not clear what Pall et al does tell us, if anything. I suspect, though, that the models used have little skill in modelling autumn floods on the basis of April SST and sea ice conditions. If this is correct then the study confirms that extreme flooding in general is likely to become more frequent in a warmer world, with events that have historically been experienced only every few centuries occurring every few decades in the future.

Note: Towards the end of writing Part 3 I came across another critique by Willis Eschenbach.  So there may well be a Part 4 when I’ve digested what Willis has to say!

February 22, 2011

Extreme Madness: A Critique of Pall et al (Part 2: On Risk and Uncertainty)

Filed under: Effects, Global warming, Science, UK climate trends — Tim Joslin @ 2:42 pm

Keeping my promises? Whatever next! I said on Sunday that I had more to say on Pall et al, and, for once, I haven’t lost interest. Good job, really – after all, Pall et al does relate directly to the E3 project on Rapid Decarbonisation.

My difficulties centre around the way Pall et al handle the concepts of risk and uncertainty. I’m going to have to start at the beginning, since I doubt Pall et al is fundamentally different in many respects from other pieces of research. They’re no doubt at least trying to follow standard practice, so I need to start by considering the thinking underlying that. I feel like the Prime alien in Peter Hamilton’s Commonwealth Saga (highly recommended) trying to work out how humans think from snippets of information!

Though I should add that Pall et al does have the added spice of trying to determine the risk of an event that has already occurred. That’s one aspect that really does my head in.

Let’s first recap the purpose of the exercise. The idea is to try to determine the fraction of the risk of the 2000 floods in the UK attributable (the FAR) to anthropogenic global warming (AGW). This is principally of use in court cases and for propaganda purposes, though it may also be useful to policy-makers as it implies the risk of flooding going forward, relative to past experience.

Now, call me naive, but it seems to me that, in order to determine the damages to award against Exxon or the UK, those crazy, hippy judges are going to want a single number:
– What, Mr Pall et al, is your best estimate of the increased risk of the 2000 autumn floods due to this AGW business?
– Um, we’re 90% certain that the risk was at least 20% greater and 66% certain that the risk was 90% greater…
– I’m sorry, Mr Pall et al, may we have a yes or no answer please.
– Um…
– I mean a single number.
– Sorry, your honour, um… {shuffles papers} here it is! Our best estimate is that the 2000 floods were 150% more likely because of global warming, that is, 2 and a half times as likely, that is, the AGW FAR was 60%.
– Thank you.
– Yes?
– How certain is Mr um {consults notes} Pall et al of that estimate.
– Mr Pall et al?
– Let’s see… here it is… yes, we spent £120 million running our climate model more than 10,000 times, so our best estimate is tightly constrained. We have calculated that 95% of such suites of simulations would give the result that the floods were between 2.2 and 2.8 times more likely because of global warming [see previous post for this calculation].

But Pall et al don’t provide this number at all! This is what Nature’s own news report says:

“The [Pall et al] study links climate change to a specific event: damaging floods in 2000 in England and Wales. By running thousands of high-resolution seasonal forecast simulations with or without the effect of greenhouse gases, Myles Allen of the University of Oxford, UK, and his colleagues found that anthropogenic climate change may have almost doubled the risk of the extremely wet weather that caused the floods… The rise in extreme precipitation in some Northern Hemisphere areas has been recognized for more than a decade, but this is the first time that the anthropogenic contribution has been nailed down… The findings mean that Northern Hemisphere countries need to prepare for more of these events in the future. ‘What has been considered a 1-in-100-years event in a stationary climate may actually occur twice as often in the future,’ says Allen.” [my stress]

When Nature writes that “anthropogenic climate change may have almost doubled the risk of the extremely wet weather that caused the floods” [my stress] what they are actually referring to is the “66% certain that the risk was 90% greater”, mentioned by Pall et al in court (and as “two out of three cases” in the Abstract of Pall et al even though the legend of Fig 4 in the text clearly states that we’re talking about the 66th percentile, i.e. 66, not 66.66666… but I’m beginning to think we’ll be here all day if we play spot the inaccuracy – the legend in their Fig 2 should read mm per day not mm^2, that would get you docked a mark in your GCSE exam).

We could have a long discussion now about the semantics and usage in science of the words “may” and “almost” as in the translation of “66% certain that the risk was 90% greater” into “may have almost doubled”, but let’s move on. The point is that in the best scientific traditions a monster has been created, in this case a chimera of risk and uncertainty that the rest of the human race is bound to attack impulsively with pitch-forks.

So how did we get to this point?

Risk vs uncertainty

It’s critical to understand what is meant by these two terms in early 21st century scientific literature.

Risk is something quantifiable. For example, the risk that an opponent may have been dealt a pair of aces in a game of poker is perfectly quantifiable.

First, why, then, do poker players of equal competence sometimes win and sometimes not? Surely the best players should win all the time, because after all, all they’re doing is placing bets on the probability of their opponent holding certain cards. One reason is statistical uncertainty. There’s always a chance in a poker session that one player will be dealt better cards than another. Such uncertainty can be quantified statistically.

But there’s more to poker than this. Calculating probabilities is the easy part. The best poker players can all do this. So the second question is why, then, are some strong poker players better than others? And why do the strongest human players still beat the best computer programs – which can calculate the odds perfectly – in multi-player games? The answer is that there’s even more uncertainty, because you don’t know what the opponent is going to do when he has or does not have two aces. Some deduction of the opponent’s actions is possible, but this requires understanding the opponent’s reasoning. Sometimes he may simply be bluffing. Either way, to be a really good poker player you have to get inside your opponent’s head. The best poker players are able to assess this kind of uncertainty: the uncertainty as to how far the statistical rules apply in any particular case, and uncertainty as to basic assumptions.

Expressing risk and uncertainty as PDFs

PDF in this case doesn’t stand for Portable Document Format, but Probability Density (or Distribution) Function.

The PDF represents the probability (y-axis) of the risk (x-axis) of an event, that is, the y-axis is a measure of uncertainty. Pall et al’s Fig 4 is an example of a PDF. It’s where their statement in court that they were 90% sure that the risk of flooding was greater than 20% higher because of AGW (and so on) came from.

The immediate issue is that risk is a probability function. Our best estimate of the increase in risk because of AGW is 150%, so we’re already uncertain whether the 2000 floods were caused by global warming (the probability, that is, the FAR, is 60% or 3/5). So we have a probability function of a probability function. The only difference between these probability functions is that the one is deemed to be calculable, the other not. Though it has in fact been calculated! Furthermore, as we’ll see, some aspects of the uncertainty in the risk can be reduced, and other aspects cannot – the PDF includes both statistical uncertainty and genuine “we don’t know what we know” uncertainty (and I’m not even discussing “unknown unknowns” here, both types of uncertainty are unknown knowns).
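Since the same quantity keeps turning up as "150% more likely", "2 and a half times as likely" and "a FAR of 60%", it may help to spell out the conversion. This is a small sketch assuming the usual definition of the fraction of attributable risk, FAR = 1 – (non-AGW risk)/(AGW risk), which is consistent with the numbers used in this post; the function names are mine:

```python
def risk_ratio_from_increase(percent_increase):
    """Convert 'X% more likely' into 'so many times as likely'."""
    return 1.0 + percent_increase / 100.0


def far_from_risk_ratio(risk_ratio):
    """Fraction of attributable risk, assuming FAR = 1 - P_nonAGW / P_AGW."""
    return 1.0 - 1.0 / risk_ratio


# The three headline levels mentioned in the text: 20%, 90% and 150% more likely.
for increase in (20, 90, 150):
    ratio = risk_ratio_from_increase(increase)
    far = far_from_risk_ratio(ratio)
    print(f"{increase}% more likely = {ratio:.1f}x as likely = FAR of {far:.2f}")
```

Note that the FAR can never exceed 1 however large the increase in risk, and goes negative if floods turn out to be less likely with AGW than without.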

Risk and uncertainty in Pall et al

What Pall et al have done is assume their model is able to assess risks correctly. Everything else, it seems, is treated as uncertainty.

Their A2000 series is straightforward enough. They set sea surface temperatures (SSTs) and the sea-ice state to those observed in April 2000 and roll the model (with minor perturbations to ensure the runs aren’t all identical).

But for the A2000N series they use the same conditions, but set GHG concentrations to 1900 levels, subtract observed 20th century warming from SSTs and project sea-ice conditions accordingly. There’s one hint of trouble, though: they note that the SSTs are set “accounting for uncertainty”. I’m not clear what this means, but it doesn’t seem to be separated out in the results in the same way as will be seen is done for other sources of uncertainty.

They then add on the warming over the 20th century that would have occurred without AGW, i.e. with natural forcings only, according to 4 different models, giving 4 different patterns of warming in terms of SSTs etc. As will be seen, for each of these 4 different patterns they used 10 different “equiprobable” temperature increase amplitudes.

First cause of uncertainty: 4 different models of natural 20th century warming

As Pall et al derive the possible 20th century natural warming using 4 different models giving 4 different patterns of natural warming, there are 4 different sets of results, giving 4 separate PDFs of the AGW FAR of flooding in 2000. Now, listen carefully. They don’t know which of these models gives the correct result, so – quite reasonably – they are uncertain. Their professional judgement is to weight them all equally, so that means that so far, they’ll only be able to say at best something like: we’re 25% certain the FAR is only x; 25% certain it’s y; 25% certain it’s z; and, crikey, there’s a 25% possibility it could be as much as w!

Trouble is, they can only run 2,000 or so of each of 4 non AGW simulations. So for each of the 4 there’ll be a sampling error. They treat this statistical uncertainty in exactly the same way as what we might call their professional judgement uncertainty, which certainly gives me pause for thought. So what happens is they smear the 4 estimates x, y, z and w and combine them into one “aggregate histogram” (see their Fig 4). That’s how they’re able to say we’re 90% certain the FAR is >20% and so on.

Nevertheless, their Fig 4 also includes the 4 separate histograms for our estimates x, y, z and w. It’s therefore possible for another expert to come along and say, “well, x has been discredited so I’m just going to ignore the pink histogram and look at the risk of y, z and w” or “z is far and away the most thorough piece of work, I’ll take my risk assessment from that”, or even to weight them other than evenly.

One of the 4 models may be considered an outlier, as in fact the pink (NCARPCM1) one is in this case. It’s the only one with a most likely (and median) FAR below the overall median value (or the overall most likely value which happens to be higher than the overall median). Further investigation might suggest it should be discarded.

Another critical point: x, y, z and w can be determined as accurately as we want by running more simulations, because the statistical uncertainty reduces as the square root of the number of data items (see Part 1).

I’m not going to argue any more as to whether the 4 models introduce uncertainty. Clearly they do. I have no way of determining which of the 4 models most correctly estimates natural warming between 1900 and 2000. It’s a question of professional judgement.

However, I will point out that if uncertainty between the models is not going to be combined statistically (as in the previous post) I am uneasy about combining them at all:

Criticism 6: The headline findings against each of the 4 models of natural warming over the 20th century should have been presented separately in a similar way to the IPCC scenarios (for example as in the figure in my recent post, On Misplaced Certainty and Misunderstood Uncertainty).

Second cause of uncertainty: 10 different amounts of warming from each of the 4 models of natural 20th century warming

But Pall et al didn’t stop at 4 models of natural 20th century warming. They realised that each of the 4 models has statistical uncertainty in its modelling of the amount of natural warming to 2000. The models in particular each noted a risk of greater than the mean warming. This has to be accounted for in the initial data to our flood modelling. Never mind, you’d have thought, let’s see how often floods occur overall, because what we’re interested in is the overall risk of flooding.

But Pall et al didn’t simply initialise their model with a range of initial values for the amplitude of warming for each of their 4 scenarios. They appear to have created 10 different warming amplitudes for each of the 4 scenarios and treated each of these as different cases. This leaves me bemused, as the 4 scenarios must also have had different patterns of warming, so why not create different cases from these? Similarly, they seem to have varied initial SST conditions in their AGW model since they “accounted for uncertainty” in that data. Why, then, were these not different cases?

I must admit that even after spending last Sunday morning slobbing about pondering Pall et al, rather than just slobbing about as usual, I am still uncertain(!) whether Pall et al did treat each of the 10 sub-scenarios as separate cases. If not, they did something else to reduce the effective sample size and therefore increase the statistical uncertainty surrounding their FAR estimates. Their Methods Summary section talks about “Monte Carlo” sampling, which makes no sense to me in this case as we can simply use Statistics 101 methods (as shown in Part 1).

The creation of 10 sub-scenarios of each scenario (or the Monte Carlo sampling) effectively means that, instead of 4 tightly constrained estimates of the risk, we have 4 wide distributions. Remember (see previous post) the formula for calculating the statistical uncertainty (Standard Deviation (SD)) that the mean of a sample represents the mean of the overall population is:

SQRT((sample %)*(100-sample%)/sample size) %

so it varies inversely with the square root of the sample size. In this case the sample size for each of the 4 scenarios was 2000+, so that of each of the 10 subsets was only around 200. The square root of 10, obviously, is 3 and a bit, so a sample of 200 gives an error about 3 times as large as a sample of 2000.

For example, one of the yellow runs is an outlier: it predicts floods about 15% of the time. How confident can we be in this figure?:

SQRT((15*85)/200) = ~2.5

So it’s likely (within 1 SD either way) that the true risk is between 12.5 and 17.5%, but only very likely (within 2 SD either way) that it is between 10 and 20%.
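For anyone who wants to check the sums, here is the same Statistics 101 calculation as a short Python sketch. It uses the formula quoted above, with the 15% flood rate and the sample sizes of roughly 200 and 2000 taken from the text; the function names are my own:

```python
import math


def proportion_sd(percent, sample_size):
    """SD (in percentage points) of a sample percentage: SQRT(p*(100-p)/n)."""
    return math.sqrt(percent * (100 - percent) / sample_size)


def interval(percent, sample_size, n_sd):
    """Range covered by n_sd standard deviations either side of the estimate."""
    sd = proportion_sd(percent, sample_size)
    return percent - n_sd * sd, percent + n_sd * sd


sd_200 = proportion_sd(15, 200)    # one sub-scenario of ~200 runs
sd_2000 = proportion_sd(15, 2000)  # a whole scenario of ~2000 runs

print(f"SD with 200 runs:  {sd_200:.2f}")
print(f"SD with 2000 runs: {sd_2000:.2f}")
print(f"ratio: {sd_200 / sd_2000:.2f} (the square root of 10)")

low1, high1 = interval(15, 200, 1)
low2, high2 = interval(15, 200, 2)
print(f"1 SD: {low1:.1f}% to {high1:.1f}%")
print(f"2 SD: {low2:.1f}% to {high2:.1f}%")
```

With all 2000 runs treated as one sample the 2 SD range would shrink to roughly 13.4% to 16.6%.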

So if we ran enough models we might find that that particular yellow sub-scenario only implied a flood risk of somewhere around 10%. Or maybe it was even more. The trouble is, in salami-slicing our data into small chunks and saying we’re uncertain which represents the true state of affairs, we’ve introduced statistical uncertainty. And this affects our ability to be certain, since it is bound to increase the number of extreme results in our suite of 40 scenarios, disproportionately affecting our ability to make statements as to what we are certain or very certain of.
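The effect of salami-slicing can be illustrated with a toy simulation. This is only a sketch under assumptions of my own: every run floods independently with the same true probability (10%, a hypothetical figure), and we compare how widely the estimated flood rate scatters when it is based on 200 runs rather than 2000. It is not a reconstruction of anything Pall et al actually did.

```python
import random
import statistics

TRUE_PERCENT = 10  # hypothetical true flood risk, for illustration only
N_REPEATS = 500    # how many times we repeat each "experiment"


def simulate_flood_percent(true_percent, n_runs, rng):
    """Percentage of n_runs simulated model runs that 'flood'."""
    floods = sum(rng.random() < true_percent / 100 for _ in range(n_runs))
    return 100 * floods / n_runs


def spread_of_estimates(true_percent, n_runs, n_repeats, rng):
    """SD and maximum of the estimated flood rate over repeated experiments."""
    estimates = [
        simulate_flood_percent(true_percent, n_runs, rng) for _ in range(n_repeats)
    ]
    return statistics.pstdev(estimates), max(estimates)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
sd_small, max_small = spread_of_estimates(TRUE_PERCENT, 200, N_REPEATS, rng)
sd_large, max_large = spread_of_estimates(TRUE_PERCENT, 2000, N_REPEATS, rng)

print(f"200 runs per case:  SD {sd_small:.2f}, largest estimate {max_small:.1f}%")
print(f"2000 runs per case: SD {sd_large:.2f}, largest estimate {max_large:.1f}%")
```

Even though the true risk is identical in every case, the small samples throw up far more extreme-looking estimates, which is exactly what feeds the tails of an aggregate histogram.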

Criticism 7: The design of the Pall et al modelling experiment ensures poor determination of the extremes of likely true values of the FAR – yet it is the extreme value that was presumably required, since that was presented to the world in the form of the statement in the Abstract that AGW has increased the risk of floods “in 9 out of 10 cases” by “more than 20%”. The confidence in the 20% figure is in fact very low!

Note that if the April 2000 temperature change amplitude variability had been treated as a risk, instead of as uncertainty, the risks in each case would have been tightly constrained and the team would have been able to say it was very likely (>90%) that the increased flood risk due to AGW exceeds 60% (since all the 4 scenarios would yield an increased risk of more than that) and likely it is greater than 150% (since 3 of the 4 scenarios suggest more than that).

The problem of risks within risks

Consider how the modelling could have been done differently, at least in principle. Instead of constructing April 2000 temperatures based on previous modelling exercises and running the model from there, they could have modelled the whole thing (or at least the natural forcing representations) from 1900 to autumn 2000 and output rainfall data for England. Without the intermediate step of exporting April 2000 temperatures from one model to another there’d be no need to treat the variable as “uncertainty” rather than “risk”.

Similarly, say we were interested in flooding in one particular location. Say it’s April 2011 and we’re concerned about this autumn since the SSTs look rather like those in 2000. Maybe we’re concerned about waterlogging of Reading FC’s pitch on the day of the unmissable local derby with Southampton in early November. Should we take advantage of a £10 advance offer for train tickets for a weekend away in case the match is postponed or wait until the day and pay £150 then if the match is off?

In this case we’d want to feed the aggregate rainfall data from Pall et al’s model into a local rainfall model. By Pall et al’s logic everything prior to our model would count as “uncertainty”. We’d input a number of rainfall scenarios into our local rainfall model and come up with a wide range of risks of postponement of the match, none of which we had a great deal of confidence in. I might want to be 90% certain there was a 20% chance of the match being postponed before I spent my tenner. I’d have to do a lot more modelling to eliminate statistical uncertainty if I use 10 separate cases than if I treat them all the same.

How Pall et al could focus on improving what we know

If we inspect Pall et al’s Fig 3, it looks, first of all, as if very few – perhaps just 1 yellow and 1 pink – of the 40 non-AGW cases result in floods 10% of the time (this includes the yellow run that predicts 15%). About 12% of the AGW runs result in floods. Yet we’re only able to say we are 90% certain that the flood risk is 20% greater because of AGW. This would imply at most 4 non AGW runs within 20% of the AGW flood risk (i.e. predicting a greater than 10% flood risk).

If we look at Pall et al’s Fig 4, we see that, first:
– the “long tail” where the risk of floods is supposedly somewhat (FAR <-0.25!) greater “without AGW” is almost entirely due to the yellow outlier case. If just 10 runs in this case had not predicted flooding instead of predicting it then the long tail of the entire suite of 10,000 runs would have practically vanished.
– the majority of the risk of the FAR being below its 10th percentile (giving rise to the statement of 90% probability of a FAR of greater than (only) 20%) is attributable to pink cases.

It would have been possible to investigate these cases further, simply by running more simulations of the critical cases to eliminate the statistical uncertainty. I can hear people screaming “cheat!”. But this simply isn’t cheating. Obviously if 10x as many runs of the critical cases as non-critical ones are done, they’d have to be scaled down when the statistical data is combined (but this must have been done anyway as the sample sizes for the different scenarios were not the same). It’s not cheating. In fact, it’s good scientific investigation of the critical cases. If we want to be able to quote the increased risk of flooding because of AGW at the 10th percentile level (i.e. that we’re 90% sure of) with more certainty then that’s what our research should be aimed at.
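To make the "scaling down" concrete, here is a sketch of how cases with unequal numbers of runs might be combined so that each case still counts equally. The flood counts and run counts are hypothetical numbers of my own, chosen only to show the difference between naive pooling and equal weighting; I am not claiming this is how Pall et al combined their results.

```python
def pooled_percent(cases):
    """Naive pooling: every run counts equally, so oversampled cases dominate."""
    floods = sum(f for f, _ in cases)
    runs = sum(n for _, n in cases)
    return 100 * floods / runs


def equally_weighted_percent(cases):
    """Each case counts equally, however many runs were done for it."""
    return sum(100 * f / n for f, n in cases) / len(cases)


# Hypothetical (floods, runs) pairs: three ordinary cases of 200 runs each,
# plus one critical case that has been rerun 10x as often (2000 runs).
cases = [(10, 200), (12, 200), (8, 200), (300, 2000)]

print(f"naive pooling:    {pooled_percent(cases):.1f}% of runs flood")
print(f"equal weighting:  {equally_weighted_percent(cases):.1f}% of runs flood")
```

The extra runs sharpen the estimate for the critical case (its sampling error falls by a factor of about 3) without giving that case any more say in the combined figure.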

Of course, if we find that the yellow sub-scenario really does suggest a risk of flooding of 15%, somewhat more than with AGW on top, and we don’t see regression to the mean, that might also tell us something interesting. Maybe the natural variability is more than we thought and that April 2000 meteorological conditions (principally SSTs) were possible that would have left the UK prone to even more flooding than actually occurred with more warming.

Criticism 8: Having introduced unnecessary uncertainty in the design of their modelling experiment, Pall et al did not make use of the opportunities available to eliminate such uncertainty by running a final targeted batch of simulations.

Preliminary conclusion

It looks like there’s going to have to be a Part 3 as I have a couple more points to make about Pall et al and will need a proper summary.

Nevertheless, I understand a lot better than I did at the outset why they are only able to say we’re 90% certain the FAR is at least 20% etc.

But I still don’t agree that’s what they should be doing.

We want to use the outputs of expensive studies like this to make decisions. Part of Pall et al’s job should be to eliminate statistical uncertainty, not introduce it.

They should have provided one headline figure of the increased risk due to global warming, about 2.5 times as much, taking into account all their uncertainties.

And the only real uncertainties in the study should have been between the 4 different patterns of natural warming. These are the only qualitative differences between their modelling runs. Everything else was statistical and should have been minimised by virtue of the large sample sizes.

If we just label everything as uncertainty and not as risk, we’re not really saying anything.

After all, it might be quite useful for policy-makers to know that flood risks are already 2.5 times what they were in 1900. This might allow the derivation of some kind of metric as to how much should be spent on flood defences in the future, or even on relocation of population and/or infrastructure away from vulnerable areas. Knowing that the scientists are 90% certain the increased risk is greater than 20% really isn’t quite as useful.

The aim of much research in many domains, including the study of climate, and in particular that of Pall et al, should be to quantify risks and eliminate uncertainties. It rather seems they’ve done neither satisfactorily.

(to be continued)

23/2/11, 16:22: Fixed typo, clarified remarks about the value of Pall et al’s findings to policy-makers.

February 20, 2011

Extreme Madness: A critique of Pall et al (Part 1: General comments on the paper and discussion of use of statistics)

Filed under: Effects, Global warming, Science, UK climate trends — Tim Joslin @ 3:59 pm

Do what I say, not what I do. Refrain from seeking out papers in scientific journals, because they inevitably create more questions than answers. Jobs for the boys, I suppose.

I first read about Pall et al last Thursday when a headline on guardian.co.uk caught my eye: Climate change doubled likelihood of devastating UK floods of 2000. What could that possibly mean? The point is we know the floods occurred.

Are we saying there would have been a 50% chance of them happening if global warming hadn’t occurred? That would at least make sense, but seems to me extremely unlikely, since autumn 2000 was apparently the wettest since records began in 1766. The chances of an entirely different set of weather events in a parallel universe coming together to produce something as extreme is clearly much less than one in two.

I was just mulling over this when a Realclimate post notification popped into my Inbox. Nature, which this week splashed on rain (ho, ho), had of course caught the eye of Gavin Schmidt, who reported on Pall et al and another paper in the same issue. I immediately dived in where professional scientists with people to upset fear to tread and voiced some of my concerns. Gavin responded (I stand by the points I made which he disagrees with, btw) and the debate went on. A Mathieu chipped in, violently agreeing with me, as I pointed out, and I similarly responded to some remarks by a Thomas.

At this point I started to get serious about the issue. The rest of this post is a more systematic critique of Pall et al.

What a way to conduct a debate

It is absurd that we are attempting to formulate policy on the basis of information that is not in the public domain. Particularly since a weekly scientific news cycle has developed as the main journals try to grab headlines. As well as the main Guardian article, George Monbiot also commented soberly on Pall et al, remarking that:

“[Pall et al] gives us a clear warning that more global heating is likely to cause more floods here.”

though when he says:

“They found that, in nine out of 10 cases, man-made greenhouse gases increased the risks of flooding…”

he (or the dreaded sub-editor) has in fact lost the sense of Pall et al’s Abstract, which went on to say:

“…by more than 20%”.

so George’s “nine out of 10” is in fact an understatement.

The science news cycle process does rather allow a bit of spin. I hate to say it, but the main Guardian piece does have the feel of having been planned in advance – hey, journo, here’s three quotes for the price of one. As well as Myles Allen (the leader of the Pall et al team, and one of the paper’s authors), a Richard Lord QC is also quoted. It’s not immediately obvious, but Lord appears to be a long-time collaborator of Allen in what has to be described as a political project to use the legal system to tackle the global warming problem. I’m not at all sure about the “blame game” in general. It seems if anything to put obstacles in the way of reaching international agreement on emissions cuts.

It wasn’t until Friday afternoon that I was able to read the whole of Pall et al, rather than just the Abstract (thanks Ealing Central Library). Nature is a good journal, but I don’t think they paid for the work that went into Pall et al. In fact the climate modelling was actually executed by volunteers at climateprediction.net. This is an exciting initiative, but, as someone who once participated (I was pleased my model showed an extreme result of something like 11C 21st century warming!), it would be much better – and I’d be much more likely to take the trouble to participate again – if the results were presented in an open manner, rather than held back (it seems) for scientific papers that appear a year after they’re submitted, so well after the experiment. Much more could be done to at least explain the findings of all the experiments to date on the site.

Anyway, here I finally am with a cup of tea, a hot-cross bun and my dissecting kit, so let’s proceed…

The Pall et al method

It turns out that what Pall et al did was initialise the state of climate models to April 2000. They ran one set of 2268 simulations (their A2000) with the actual conditions and other sets (of 2158, 2159, 2170 and 2070 simulations) each with one of 4 counterfactuals (each with 10 “equally probable” variants so 40 scenarios in all), with global warming stripped out.

They fed the climate model outputs into a flood model to determine run-off and considered the floods had been predicted if the average daily run-off was equal to or greater than the 0.41mm recorded in autumn 2000.

The result was a set of graphs showing the results with and without global warming. Basically these consist of a bunch of results from the global warming case and each of the 4 models. They show these as cumulative frequency distributions, such that 100% of the global warming case (a line of dots on the log scale they use) result in run-off above 0.3mm/day, 13% (1 in about 7.5) above 0.4mm, maybe (the graphs are quite small) 12% (1 in about 8.5) above the actual flood level of 0.41mm/day and so on, with around 1.2% (1 in about 80) as high as 0.55mm a day (which presumably is a Biblical level). Actually I’ve just realised that in fact the graphs (Pall et al’s Fig 3) are printed with the horizontal axis logarithmic scale marked with the same subdivisions for occurrence frequency (as I carelessly read before my final Realclimate post) and its inverse, return time (which actually is a log scale) – you’d think a peer reviewer or someone at Nature would have spotted that in the 10 and a half months between submission and publication.

The other cases (the A2000Ns in Pall et al’s terminology) are each 10 similar lines of dots, so appropriately enough they appear as a spray, running below the A2000 line, except in 2 cases which manage to nip above the A2000 line.

Call me naive, but I think this shows that in >95% of cases (that is, except for two out of 40, part of the time) the 2000 floods were worse than they would have been without global warming. That is, according to the modelling, the exercise has shown, statistically significantly, that the flooding was worse as a result of global warming. All we need to do is assume the same model errors affected all the scenarios approximately equally. This seems an intelligent conclusion.

But that’s not what the authors do. They randomly select from the A2000s and each of the 4 sets of A2000Ns to produce graphs of the probability distribution of the run-off being more likely to exceed the threshold of 0.41mm/day (the actual level). They also produce a combined graph, and this is where the aforementioned increased risk of greater than 20% in 9 out of 10 cases comes from, as well as an increased risk of 90% in 2 out of 3 cases and the Guardian headline of approximately double the risk at the median.

The point is that Pall et al don’t want to just say “flooding will be more severe”, they want to be able to calculate the fraction of attributable risk (FAR) for anthropogenic global warming (AGW) for the particular event. Why? So they can take people to court, that’s why.

As I noted in my final Realclimate post on the topic, it seems to me that Pall et al are trying to push things just a little too far.

About this 0.41mm threshold

This wasn’t where I intended to start, but it seems logical. Why define the flood event in this way? Why not say anything over say 0.4mm/day would count as a flood? Floods aren’t threshold types of things anyway.

Further, why are we including runs with very high runoffs? These types of models are known to sometimes “go wild”. Surely we’re interested in forecasting the actual flood event, not some other extreme.

One effect of choosing the 0.41mm threshold is it makes the flood reasonably rare. But as I argued repeatedly on Realclimate, the flood definitely happened; one reason it’s rare in the modelling experiment is that the model (and/or the initial data it was supplied with) is not good enough to forecast it in more than about 1 in 8.5 cases or about 12% of the time. We’ll have to come back to this.

Now here’s another pet hate. The fact that the flood is rare in both the A2000 and A2000N model runs means that the result can be (and is) expressed as a % increase in risk, even if George Monbiot (or his sub-editor) managed to miss this off. If the occurrence in both sets of data had been higher then these percentages would have been considerably lower.

For example, Fig 3b (using GFDLR30 data in purple, for those with access to the paper) is the easiest to read as the A2000 series is much better than the purple set of A2000Ns at predicting the flood. For the “best” (probably warmest) of the purple A2000N series, I can therefore read off intersection data together with that for the A2000 series. For 0.41mm/day A2000 predicts the flood about 12% of the time (1 run in every 8.5) whilst the A2000N predicts it 5% of the time (one year in 20). We’d conclude on the basis of this data that the increased risk of the flood because of AGW is around 140% (i.e. 12/5 = 2.4 times what it was before).

But for 0.35mm I get 50% (1 in 2) and 33% (1 in 3) respectively, so the flood risk is only about 50% greater!

As a check, if I go even higher to 0.46mm I get 5% (1 in 20) and about 1.5% (around 1 in 70), so the flood risk is 233% greater.
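The three read-offs above can be put side by side in a few lines. This is only a sketch of the arithmetic: the percentages are the ones I eyeballed from Fig 3b as quoted above, not values published by Pall et al, and the function name is my own invention.

```python
def increased_risk_percent(p_with_agw, p_without_agw):
    """Percentage by which the exceedance frequency is higher with AGW."""
    return (p_with_agw / p_without_agw - 1.0) * 100.0

# Run-off threshold (mm/day) -> (A2000 %, best purple A2000N %).
# These are approximate readings from a small graph, as quoted in the text.
readings = {
    0.35: (50.0, 33.0),
    0.41: (12.0, 5.0),
    0.46: (5.0, 1.5),
}

results = {t: increased_risk_percent(a, n) for t, (a, n) in readings.items()}

for threshold, pct in sorted(results.items()):
    print(f"{threshold:.2f} mm/day: flood risk about {pct:.0f}% greater with AGW")
```

The point is simply that the "increased risk" figure climbs steeply as the threshold is raised, so the headline number depends heavily on where the flood is defined to start.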

It’s well known, as discussed in the other paper in this week’s Nature, Min et al, that climate models tend to underestimate extreme precipitation events, so choosing a lower runoff threshold for the flood might have made some sense. On the other hand, exceptionally extreme events become much more likely with AGW.

I can’t find any calibration between the models used by Pall et al and actual rainfall (e.g. by trying to simulate other years) – maybe they’re just not very good at forecasting rainfall in flood years or maybe they forecast the same rainfall every year, regardless of the initial condition in April.

Criticism 1: The paper should have included the real-world distribution of run-offs which the modelling is supposedly correlated with.

Criticism 2: The paper should have included validation of the model against actual run-offs over a number of years. Some model runs should have been initialised to the conditions in April 1999, 2001 etc.

If I’d been editor of Nature (and I never will be if this upsets the wrong people – the sacrifices I make for truth), I might have asked for such a calibration or at least a sensitivity analysis between the “increased risks” and the flood threshold value chosen.

Criticism 3: The results should have been presented as a graph of increased risk of floods of different severity (and therefore different return times).

About this computing time

As I mentioned earlier, Pall et al ran over 10,000 simulations of the autumn 2000 weather. Yet whilst their mean case is that the floods in the AGW case were about 2.5 times as likely as without AGW, they are only 90% confident that the floods were 20% more likely to occur.


If I do an opinion poll – as I happen to have – I can tell you within a small % how the nation will vote.

So I stared at Pall et al’s method and the more I think about it the more bizarre it seems. They’ve only gone and sampled the samples! In their Fig. 4 they’ve presented a Monte Carlo distribution of samples of pairs from each set of simulations, plotting the probability in each case of the floods being worse because of AGW. They don’t give the sample size – 43 say – of each of these Monte Carlo samples, but unless I’ve gone completely mad, these plots are sensitive to the sample size. i.e. if they’d taken a sample size of say 87 random pairs of simulations the certainty (that the floods are 2.5 times as likely to occur in the AGW case) would have been greater (probably by the square root of 2, but that’s just an educated guess). This is basically an example of how what we used to call “technowank” in the IT trade can go badly wrong.
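To see why the sample size matters, here is a toy simulation. It is emphatically not Pall et al’s procedure (they don’t give enough detail to reproduce it): it just draws repeated random samples from a pretend population in which 12% of runs exceed the threshold, using my illustrative sizes of 43 and double that, and measures how much the estimated rate wobbles. The 12% rate, the seed and the number of trials are all assumptions for illustration.

```python
import random
import statistics

def spread_of_estimates(true_rate, sample_size, trials, rng):
    """Standard deviation of the exceedance rate estimated from repeated random samples."""
    estimates = []
    for _ in range(trials):
        hits = sum(1 for _ in range(sample_size) if rng.random() < true_rate)
        estimates.append(hits / sample_size)
    return statistics.pstdev(estimates)

rng = random.Random(2000)  # fixed seed so the run is repeatable
true_rate = 0.12           # assumed: roughly the A2000 exceedance rate read off the graphs

# Same population, two different (hypothetical) sample sizes.
spread = {n: spread_of_estimates(true_rate, n, 4000, rng) for n in (43, 86)}
ratio = spread[43] / spread[86]

print(f"spread with 43 draws: {spread[43]:.3f}")
print(f"spread with 86 draws: {spread[86]:.3f}")
print(f"ratio of spreads:     {ratio:.2f}")
```

If the textbook rule for independent samples holds, the ratio should come out close to the square root of 2, which is the basis for my educated guess.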

If I’m right and I think I am, Pall et al have not only presented the wrong headline finding (the world should have been informed that the floods, according to their modelling exercise, were 2.5x as likely because of AGW +/- not very much), they’ve also thrown away the advantage of using so much computer time – I read somewhere that those 10,000+ simulations would have cost £120m if run commercially rather than as volunteers’ screensavers!

They say it’s better to understand how to do something simple, than misunderstand something complex. Well, they don’t actually, I just made that up. Anyway, here’s some schoolboy stats Pall et al could have employed:

From their graphs, about 12% of the AGW simulations were greater than their 0.41mm threshold for the flood. With a sample size of 2268, what my textbook calls the STandard Error of Percentages (STEP), the standard deviation of this estimate of the whole (infinite) population of simulations is given by:

SQRT((12*(100-12))/2268) = 0.68%

That is, it’s likely (within 1 SD) that the actual risk of flooding in the AGW case (according to our model) is 12+/-0.68%.

Similarly for the counterfactual ensemble (all 40 sets combined): based on inspection of their Fig. 4, the number of AGW simulations exceeding the 0.41mm threshold is 2.5x the number of non-AGW ones doing so, so the best estimate of the flood risk without AGW is 4.8% (12/2.5), and it’s likely that the actual risk is within 4.8%+/-:

SQRT((4.8*(100-4.8))/8557) = 0.23%

There’s probably some clever stato way of combining these estimates, but all I’m going to do is crudely compare the top estimate of each with the bottom estimate of the other – that gives us roughly 2 standard deviations. On this basis, according to our modelling, the actual likelihood of the floods occurring because of AGW has increased by a factor of very likely between 12.68/4.57 = 2.8 and 11.32/5.03 = 2.2, with a best estimate of 2.5 times.
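For anyone who wants to check the schoolboy stats, the calculation can be sketched as follows. The 12% and 4.8% rates are my own readings from the graphs, the function name `step` is just my label for the textbook formula, and the crude top-versus-bottom comparison is my shortcut rather than a proper way of combining the two errors.

```python
import math

def step(p_percent, n):
    """STandard Error of Percentages: sqrt(p * (100 - p) / n), in percentage points."""
    return math.sqrt(p_percent * (100.0 - p_percent) / n)

agw_p, agw_n = 12.0, 2268                # A2000 runs: exceedance rate (%) and sample size
non_p = 4.8                              # counterfactual rate (%), i.e. 12 / 2.5
non_n = 2158 + 2159 + 2170 + 2070        # all four A2000N sets combined

agw_se = step(agw_p, agw_n)
non_se = step(non_p, non_n)

# Crude range: top estimate of one against bottom estimate of the other.
high = (agw_p + agw_se) / (non_p - non_se)
low = (agw_p - agw_se) / (non_p + non_se)
best = agw_p / non_p

print(f"AGW risk:     {agw_p:.1f}% +/- {agw_se:.2f}")
print(f"non-AGW risk: {non_p:.1f}% +/- {non_se:.2f}")
print(f"risk factor:  {low:.2f} to {high:.2f}, best estimate {best:.1f}")
```

The lower bound comes out a shade under 2.25, which I have quoted as 2.2.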

This is an important conclusion because the problem with global warming is not just or even mainly the increase in averages, in this case of precipitation. That may not be noticeable.

I think I’ll stop here and consider in another post my more philosophical arguments as to how the methodology of the Pall et al study is dubious.

In the meantime:

Criticism 4: Pall et al’s statistical approach understated the certainty of their modelling result. In fact the study provides some evidence that:

Even the limited warming over the 20th century is very likely, according to a comparative modelling exercise, to have made flooding of the severity of that in 2000 between 2.2 and 2.8 times as likely as in 1900. Historical records suggest the 2000 floods were around a once in 400 year event before global warming, but as a result of the warming up to 2000 they are, according to this modelling exercise, a once in 140 to 180 years event.
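The return times quoted here follow directly from dividing the pre-warming return time by the risk factor. A minimal sketch, taking the 400-year figure and the 2.2 to 2.8 range as given above (the function name is hypothetical):

```python
def return_period_with_agw(baseline_years, risk_factor):
    """If an event becomes risk_factor times as likely, its return time shrinks by that factor."""
    return baseline_years / risk_factor

baseline = 400.0  # once-in-400-year event before global warming, per the historical records
periods = {f: return_period_with_agw(baseline, f) for f in (2.8, 2.5, 2.2)}

for factor, years in periods.items():
    print(f"{factor}x as likely: once in about {years:.0f} years")
```

Rounded to the nearest ten years, the two ends of the range give the 140 to 180 years quoted.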

Criticism 5: The study should have run ensembles with the expected increase in temperatures expected by (say) 2030 and 2050.

(to be continued)

22/2/11, 11:13: Correction of typo and minor mods for clarity.
22/2/11, 16:07: Corrected another typo and clarified the meaning of the STEP calculations.

November 23, 2009

A Message from Cockermouth to Copenhagen

Dear, oh dear. The issue of the CRU hack is simply not going to go away.

BBC Radio 4 gave some of its Today programme airtime to Nigel Lawson this morning. It’s not very clear to the casual listener exactly what Lawson’s position is, since he seemed to claim he wasn’t denying the science, just the policy, perhaps in a similar fashion to Bjorn Lomborg. But (it seemed to me whilst having breakfast) Lawson then went on to question the science.

Annoyingly, Lawson, an experienced politician (though disastrous economic policy-maker – stoking back in the 1980s the sort of boom-bust his party, the Tories, are ironically criticising Labour for – so his track record doesn’t suggest a lot of faith should be put in his judgement of complex issues) and therefore used to media appearances, came across rather better than the scientist (whose name I didn’t catch) up against him.

I picked up a couple of points:

1. 1998 still warmest year

Lawson kept insisting it hasn’t warmed since 1998.

The problem here is that the scientists have picked the wrong weapon for the duel. The average surface temperature is highly variable. It varies by much more than the average annual temperature increase, so is bound to vary erratically over relatively short time periods.

The point is that the ocean will take many centuries – possibly millennia – to completely warm up. It only takes a larger than average amount (or strictly area) of cold water coming to the surface one year to reduce the average surface temperature of the planet compared to the previous year.

But the ocean gains heat (and ice melts) every year that the planet is out of thermal equilibrium (radiating less heat away than it receives from the Sun, because GHGs capture the energy). Perhaps the scientists should develop tools for measuring the total heat gain of the planet – or at least the oceans – rather than the average surface temperature. They could then tell us how many PWh (maybe the next up EWh, exawatt hours) we’ve gained each year.

But what really gets me is how much the scientists downplay another major prediction of their theory – that there will be more extreme weather. The rhetoric they use is bizarre. Normally you hear (and Hilary Benn the UK Government Minister said this sort of thing on Sunday) something along the lines of “you can’t attribute a single event to climate change” and “this is the sort of thing we can expect more of in future”.

I’d like to make a philosophical point here. I’d like to dispense with this ridiculous “can’t attribute a single event to climate change” business. Because you can’t not attribute it to climate change either! We do not have the luxury of what scientists would call a “control”. We have no other planet where we haven’t put GHGs into the atmosphere. When someone says “you can’t attribute a single event to climate change” people hear “it might have happened anyway”. No, it might not have happened anyway, because there is no “anyway”.

What I really don’t understand is why the scientists don’t make more of events that confirm their theory. Because that’s how science progresses. The prediction is “there will be more extreme weather events, such as flooding”. In the UK this week we’ve had such an extreme event. We’ve had the heaviest rainfall ever experienced in 24 hours.

Let’s just consider how big a record this is. The UK has been recording weather for a long time – centuries. And in all that time there’s never been as much rain in a 24 hour period. In fact, I’ve heard the previous record – the Martinstown Deluge of 1955 – doubted because of its implausibility!

If it was me I’d be crowing. The theory predicted this sort of thing. The Cockermouth event is strong support for global warming.

Consider other complex systems, the financial markets, say. You might hear predictions along the lines of “continued loose monetary policy will lead to further rises in the price of gold”. When the gold price rises do you think those who predicted it um and ah about how “it might have happened anyway”? You bet you don’t.

2. Datasets not in the public domain

Another point came out of the discussion on Today this morning. If I gathered correctly what was being said, the point was that the hacked emails included cases of data being wiped. And apparently it turns out this is because some of those who supplied the data considered it to be a valuable asset (in fact, it presumably is a valuable asset in that it can be sold). This is unacceptable.

It’s a fundamental tenet that scientific findings must be reproducible. And if the finding is an analysis of certain data, and that data is not available, then others are unable to reproduce the findings. Perhaps the Copenhagen participants should spend a little of the $bns they’re throwing at the problem on paying data owners (meteorological offices) to put their data in the public domain. Scientific conclusions shouldn’t have to rely on the integrity of those with privileged access to measurements!

Part of the problem is the science seems so arcane to the general public. It needn’t be. We can all look at weather records and perhaps should be encouraged to do so.

You can, for example, download various historical records from the UK Met Office website and do your own analysis. Basically anyone can put together this sort of thing, from Joe Romm’s site.

My irritation should not, I suppose, be with “the scientists”. I know that’s what I’ve written. But it’s an oversimplification. The problem is partly the way science works. Detailed work is rewarded much more highly, in general, than high level explanation. We need more generalists who can bridge the gap between the nitty-gritty science and the public.

Here, again from Joe Romm’s blog, is how the issue should be presented. K.I.S.S. Keep it simple, stupid.
