Does data coverage impact the HADCRUT4 and NASA GISS Temperature Anomalies?

Introduction

This post started with the title “HADCRUT4 and NASA GISS Temperature Anomalies – a Comparison by Latitude“.  After deriving a global temperature anomaly from the HADCRUT4 gridded data, I was intending to compare the results with GISS’s anomalies by 8 latitude zones. However, this opened up an intriguing issue. Are global temperature anomalies impacted by a relative lack of data in earlier periods? This leads to a further issue of whether infilling of the data can be meaningful, and hence be considered to “improve” the global anomaly calculation.

A Global Temperature Anomaly from HADCRUT4 Gridded Data

In a previous post, I looked at the relative magnitudes of early twentieth century and post-1975 warming episodes. In the Hadley datasets, there is a clear divergence between the land and sea temperature data trends post-1980, a feature that is not present in the early warming episode. This is reproduced below as Figure 1.

Figure 1 : Graph of Hadley Centre 7 year moving average temperature anomalies for Land (CRUTEM4), Sea (HADSST3) and Combined (HADCRUT4)

The question that needs to be answered is whether the anomalous post-1975 warming on the land is due to real divergence, or due to issues in the estimation of global average temperature anomaly.

In another post – The magnitude of Early Twentieth Century Warming relative to Post-1975 Warming – I looked at the NASA Gistemp data, which is usefully broken down into 8 Latitude Zones. A summary graph is shown in Figure 2.

Figure 2 : NASA Gistemp zonal anomalies and the global anomaly

This is more detailed than the HADCRUT4 data, which is presented as just three zones: the Tropics, along with the Northern and Southern Hemispheres. However, the Hadley Centre, on their HADCRUT4 Data: download page, have, under HadCRUT4 Gridded data: additional fields, a file HadCRUT.4.6.0.0.median_ascii.zip. This contains monthly anomalies for 5° by 5° grid cells from 1850 to 2017. There are 36 bands of latitude and 72 bands of longitude. Over the 2016 months there are over 5.22 million grid cells, but only 2.51 million (48%) have data. From this data, I have constructed a global temperature anomaly. The major issue in the calculation is that the grid cells are of different areas. A grid cell nearest the equator, at 0° to 5°, has about 23 times the area of a grid cell adjacent to the poles, at 85° to 90°. I used the appropriate weighting for each band of latitude.
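For concreteness, the area weighting can be sketched as follows. This is a minimal illustration rather than the code actually used for the post: it assumes one month's anomalies have already been read from the ASCII file into a 36 × 72 NumPy array (latitude bands from 90S to 90N), with missing cells set to NaN.

```python
import numpy as np

def band_weights():
    """Fractional area of each of the 36 five-degree latitude bands (90S to 90N).
    The area between two latitudes is proportional to the difference of their sines."""
    edges = np.radians(np.arange(-90, 91, 5))
    return (np.sin(edges[1:]) - np.sin(edges[:-1])) / 2.0   # sums to 1

def global_mean_anomaly(month_grid):
    """Area-weighted global mean of a 36 x 72 grid of monthly anomalies (NaN = no data).
    Each band's average uses only the cells that have data, which is exactly where
    the coverage problem discussed in this post enters the calculation."""
    w = band_weights()
    band_means = np.nanmean(month_grid, axis=1)      # average of available cells per band
    have_data = ~np.isnan(band_means)                # bands with at least one cell of data
    return np.sum(band_means[have_data] * w[have_data]) / np.sum(w[have_data])

# The roughly 23:1 area ratio between an equatorial and a polar grid cell:
w = band_weights()
print(round(w[18] / w[35], 1))   # band 0-5N versus band 85-90N, about 22.9
```

Weighting a band by the difference in sines of its edge latitudes is equivalent, to a close approximation, to weighting by the cosine of its mid-latitude.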

The question is whether I have calculated a global anomaly similar to the Hadley Centre's. Figure 3 is a reconciliation between the published global anomaly mean (available from here) and my own.

Figure 3 : Reconciliation between HADCRUT4 published mean and calculated weighted average mean from the Gridded Data

Prior to 1910, my calculations are slightly below the published HADCRUT4 data. The biggest differences are in 1956 and 1915. Overall the differences are insignificant and do not impact the analysis.

I split the HADCRUT4 temperature data into eight zones of latitude on a similar basis to NASA Gistemp. Figure 4 presents the results on the same basis as Figure 2.

Figure 4 : Zonal surface temperature anomalies and the global anomaly calculated using the HADCRUT4 gridded data.

Visually, there are a number of differences between the Gistemp and HADCRUT4-derived zonal trends.

A potential problem with the global average calculation

The major reason for differences between HADCRUT4 & Gistemp is that the latter has infilled estimated data into areas where there is no data. Could this be a problem?

In Figure 5, I have shown the build-up in global coverage, that is, the percentage of 5° by 5° grid cells with an anomaly in the monthly data.

Figure 5 : Change in the percentage coverage of each zone in the HADCRUT4 gridded data.

Figure 5 shows a build-up in data coverage during the late nineteenth and early twentieth centuries. The World Wars (1914-1918 & 1939-1945) had the biggest impact on Southern Hemisphere data collection. This is unsurprising when one considers that they were mostly fought in the Northern Hemisphere, and that the European powers withdrew resources from their far-flung empires to protect the mother countries. The only zones with significantly less than 90% grid coverage in the post-1975 warming period are the Arctic and the region below 45S, around 19% of the global area.
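The coverage figures behind a chart like Figure 5 can be derived from the same gridded file. The sketch below is illustrative only: it assumes the same 36 × 72 monthly array as above, and approximates the eight Gistemp-style zone boundaries with the nearest 5° bands (the 24°, 44° and 64° boundaries do not fall exactly on band edges).

```python
import numpy as np

# Gistemp-style zone boundaries in degrees latitude (South negative)
ZONES = [(-90, -64), (-64, -44), (-44, -24), (-24, 0),
         (0, 24), (24, 44), (44, 64), (64, 90)]

def zone_coverage(month_grid):
    """Percentage of 5x5 degree cells containing data in each zone, for one month."""
    lat_mid = np.arange(-87.5, 90, 5)                 # mid-latitude of each of the 36 bands
    coverage = []
    for south, north in ZONES:
        rows = (lat_mid > south) & (lat_mid < north)  # bands whose centre lies in the zone
        cells = month_grid[rows, :]
        coverage.append(100.0 * np.count_nonzero(~np.isnan(cells)) / cells.size)
    return coverage
```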

Finally, comparing comparable zones in the Northern and Southern Hemispheres, the tropics seem to have comparable coverage, whilst for the polar, temperate and mid-latitude areas the Northern Hemisphere seems to have the better coverage after 1910.

This variation in coverage can potentially lead to wide discrepancies between any calculated temperature anomalies and a theoretical anomaly based upon data in all the 5° by 5° grid cells. As an extreme example, in my own calculation, if just one of the 72 grid cells in a band of latitude had a figure, then an “average” would have been calculated for that month for a band running right around the world and 555km (345 miles) from north to south. In the annual figures by zone, it only requires one of the 72 grid cells, in one of the months, in one of the bands of latitude to have data to produce an annual anomaly. For the tropics or the polar areas, that is just one in 4320 data points. This issue will impact the early twentieth-century warming episode far more than the post-1975 one. Although I would expect the Hadley Centre to have done some data clean-up of the more egregious examples in their calculation, the lack of data in grid cells could have quite random impacts, biasing the global temperature anomaly trends to an unknown, but significant, extent. How this could matter can be appreciated from an example using the NASA GISS Global Maps.
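The effect of sparse sampling on a band average can be illustrated with entirely synthetic numbers (a toy demonstration, not HADCRUT4 data): the fewer cells that report, the further the band “average” can stray from the full-band figure.

```python
import numpy as np

rng = np.random.default_rng(42)

# A made-up band of 72 cell anomalies with a large east-west spread, loosely
# mimicking the >1C warming to >1C cooling seen along 40N in the GISS maps.
band = rng.normal(loc=0.2, scale=0.8, size=72)
full_mean = band.mean()

for n_cells in (1, 4, 16, 72):
    sample = rng.choice(band, size=n_cells, replace=False)
    print(f"{n_cells:2d} cells reporting: band 'average' = {sample.mean():+.2f} "
          f"(full band = {full_mean:+.2f})")
```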

NASA GISS Global Maps Temperature Trends Example

NASA GISS Global Maps from GHCN v3 Data provide maps with the calculated change in average temperatures. I have run the maps to compare annual data for 1940 with a baseline of 1881-1910, capturing much of the early twentieth-century warming. The maps are at both the 1200km and 250km smoothing.

Figure 6 : NASA GISS Global anomaly Map and average anomaly by Latitude comparing 1940 with a baseline of 1881 to 1910 and a 1200km smoothing radius

Figure 7 : NASA GISS Global anomaly Map and average anomaly by Latitude comparing 1940 with a baseline of 1881 to 1910 and a 250km smoothing radius. 

With respect to the maps in Figures 6 & 7:

  • There is no apparent difference in the sea data between the 1200km and 250km smoothing radius, except in the polar regions, which have more coverage in the former. The differences lie in the land areas.
  • The grey areas with insufficient data all apply to the land or ocean areas in polar regions.
  • Figure 6, with 1200km smoothing, has most of the land infilled, whilst the 250km smoothing shows the lack of data coverage for much of South America, Africa, the Middle East, South-East Asia and Greenland.

Even with these land-based differences in coverage, it is clear from either map that at any latitude there are huge variations in the calculated average temperature change. For instance, take 40N. This line of latitude passes north of San Francisco on the West Coast of the USA and clips Philadelphia on the East Coast. On the other side of the Atlantic, Madrid, Ankara and Beijing are at about 40N. There are significant points on the line of latitude with estimated warming greater than 1°C (e.g. California), whilst at the same time in Eastern Europe cooling may have exceeded 1°C over the period. More extreme still, at 60N (Southern Alaska, Stockholm, St Petersburg) the difference in temperature change along the line of latitude is over 3°C. This compares to a calculated global rise of 0.40°C.

This lack of data may have contributed (along with a faulty algorithm) to the differences in the zonal mean charts by latitude. The 1200km smoothing radius chart bears little relation to the 250km smoothing radius chart. For instance:-

  • 1200km shows 1.5°C warming at 45S, 250km about zero. 45S cuts through the South Island of New Zealand.
  • From the equator to 45N, 1200km shows a rise from 0.5°C to over 2.0°C; 250km shows a drop from less than 0.5°C to near zero, then a rise to 0.2°C. At around 45N lie Ottawa, Maine, Bordeaux, Belgrade, Crimea and the most northern point of Japan.

The differences in the NASA GISS maps, in a period when available data covered only around half of the 2592 5° by 5° grid cells, indicate very large differences in trends between different areas. As a consequence, interpolating warming trends from one area to adjacent areas appears to give quite different results in terms of trends by latitude.

Conclusions and Further Questions

The issue I originally focussed upon was the relative size of the early twentieth-century warming compared with the post-1975 warming. The greater amount of warming in the later period seemed to be due to the greater warming on land, which covers just 30% of the total global area. The sea surface warming phases appear to be pretty much the same.

The issue that I ended up focussing upon was a data issue. The early twentieth century had much less data coverage than the period after 1975. Further, the Southern Hemisphere had worse data coverage than the Northern Hemisphere, except in the tropics. This means that in my calculation of a global temperature anomaly from the HADCRUT4 gridded data (which in aggregate was very similar to the published HADCRUT4 anomaly) the averages by latitude are not comparing like with like in the two warming periods. In particular, in the early twentieth century a calculation by latitude does not average right the way around the globe, but only over a limited selection of bands of longitude. On average this was about half, but there are massive variations. This would not matter if the changes in anomalies by latitude were roughly the same over time. But an examination of NASA GISS global maps for a period covering the early twentieth-century warming phase reveals that trends in anomalies at the same latitude are quite different. This implies that there could be large, but unknown, biases in the data.

I do not believe the analysis ends here. There are a number of areas that I (or others) can try to explore.

  1. Does the NASA GISS infilling of the data get us closer to, or further away from, what a global temperature anomaly would look like with full data coverage? My guess, based on the extreme example of the Antarctica trends (discussed here), is that the infilling will move away from the more perfect trend. The data could show otherwise.
  2. Are the changes in data coverage on land more significant than the global average or less? Looking at CRUTEM4 data could resolve this question.
  3. Would anomalies based upon similar grid coverage after 1900 give different relative trend patterns to the published ones based on dissimilar grid coverage?

Whether I get the time to analyze these is another issue.

Finally, the problem of trends varying considerably and quite randomly across the globe is the same issue that I found with land data homogenisation, discussed here and here. To derive a temperature anomaly for a grid cell, it is necessary to make the data homogeneous. In standard homogenisation techniques, it is assumed that the underlying trends in an area are pretty much the same, so any differences in trend between adjacent temperature stations are treated as the result of data imperfections. I found numerous examples where there were likely real differences in trend between adjacent temperature stations. Homogenisation will, therefore, eliminate real but local climatic trends. Averaging incomplete global data, where the missing data could contain unknown regional trends, may cause biases at a global scale.

Kevin Marshall

HADCRUT4, CRUTEM4 and HADSST3 Compared

In the previous post, I compared early twentieth-century warming with the post-1975 warming in the Berkeley Earth Global temperature anomaly. From a visual inspection of the graphs, I determined that the greater warming in the later period is due to more land-based warming, as the warming in the oceans (70% of the global area) was very much the same. The Berkeley Earth data ends in 2013, so does not include the impact of the strong El Niño event in the last three years.

The Global average temperature series page of the Met Office Hadley Centre Observation Datasets has the average annual temperature anomalies for CRUTEM4 (land-surface air temperature), HADSST3 (sea-surface temperature) and HADCRUT4 (combined). From these datasets, I have derived the graph in Figure 1.

Figure 1 : Graph of Hadley Centre annual temperature anomalies for Land (CRUTEM4), Sea (HADSST3) and Combined (HADCRUT4)

Comparing the early twentieth century with 1975-2010:

  • Land warming is considerably greater in the later period.
  • Combined land and sea warming is slightly more in the later period.
  • Sea surface warming is slightly less in the later period.
  • In the early period, the surface anomalies for land and sea have very similar trends, whilst in the later period, the warming of the land is considerably greater than the sea surface warming.

The impact is more clearly shown with 7 year centred moving average figures in Figure 2.

Figure 2 : Graph of Hadley Centre 7 year moving average temperature anomalies for Land (CRUTEM4), Sea (HADSST3) and Combined (HADCRUT4)
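For reference, the 7 year centred moving averages used in Figures 1 and 2 can be produced along these lines. This is a minimal sketch, assuming the annual anomalies are already in a one-dimensional array.

```python
import numpy as np

def centred_moving_average(series, window=7):
    """Centred moving average; the first and last window//2 values are left as NaN,
    so a 7 year window loses three years at each end of the series."""
    series = np.asarray(series, dtype=float)
    out = np.full(series.shape, np.nan)
    half = window // 2
    for i in range(half, len(series) - half):
        out[i] = series[i - half:i + half + 1].mean()
    return out
```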

This is not just a feature of the HADCRUT dataset. NOAA Global Surface Temperature Anomalies for land, ocean and combined show similar patterns. Figure 3 is on the same basis as Figure 2.

Figure 3 : Graph of NOAA 7 year moving average temperature anomalies for Land, Ocean and Combined.

The major common feature is that the estimated land temperature anomalies have shown a much greater warming trend than the sea surface anomalies since 1980, but no such divergence existed in the early twentieth century warming period. Given that the temperature data sets are far from complete in terms of coverage, and the data is of variable quality, is this divergence a reflection of the true average temperature anomalies, as would be shown by far more complete and accurate data? There are a number of alternative possibilities that need to be investigated to help determine (using beancounter terminology) whether the estimates are a true and fair reflection of the perspective that more perfect data and techniques would provide. My list might be far from exhaustive.

  1. The sea-surface temperature set understates the post-1975 warming trend due to biases within the data set.
  2. The spatial distribution of data changed considerably over time. For instance, in recent decades more data has become available from the Arctic, a region with the largest temperature increases in both the early twentieth century and post-1975.
  3. Land data homogenization techniques may have suppressed differences in climate trends where data is sparser. Alternatively, if relative differences in climatic trends between nearby locations accumulate over time, then the further back in time homogenization reaches, the more accentuated these differences become and the greater the suppression of genuine climatic differences. These aspects I discussed here and here.
  4. There is deliberate manipulation of the data to exaggerate recent warming. Having looked at numerous examples three years ago, this is a perspective that I do not believe to have had any significant impact. However, simply believing something not to be the case, even with examples, does not mean that it is not there.
  5. Strong beliefs about how the data should look have, over time and through multiple data adjustments, created biases within the land temperature anomalies.

What I do believe is that an expert opinion as to whether this divergence between the land and sea surface anomalies is a “true and fair view” of the actual state of affairs can only be reached by a detailed examination of the data. Jumping to conclusions – which is evident from many people across the broad spectrum of opinions in the catastrophic anthropogenic global warming debate – will fall short of the most rounded opinion that can be gleaned from the data.

Kevin Marshall

 

Ocean Impact on Temperature Data and Temperature Homogenization

Pierre Gosselin’s notrickszone looks at a new paper.

Temperature trends with reduced impact of ocean air temperature – Frank Lansner and Jens Olaf Pepke Pedersen.

The paper’s abstract.

Temperature data 1900–2010 from meteorological stations across the world have been analyzed and it has been found that all land areas generally have two different valid temperature trends. Coastal stations and hill stations facing ocean winds are normally more warm-trended than the valley stations that are sheltered from dominant oceans winds.

Thus, we found that in any area with variation in the topography, we can divide the stations into the more warm trended ocean air-affected stations, and the more cold-trended ocean air-sheltered stations. We find that the distinction between ocean air-affected and ocean air-sheltered stations can be used to identify the influence of the oceans on land surface. We can then use this knowledge as a tool to better study climate variability on the land surface without the moderating effects of the ocean.

We find a lack of warming in the ocean air sheltered temperature data – with less impact of ocean temperature trends – after 1950. The lack of warming in the ocean air sheltered temperature trends after 1950 should be considered when evaluating the climatic effects of changes in the Earth’s atmospheric trace amounts of greenhouse gasses as well as variations in solar conditions.

More generally, the paper’s authors are saying that over fairly short distances temperature stations will show different climatic trends. This has a profound implication for temperature homogenization. From Venema et al 2012.

The most commonly used method to detect and remove the effects of artificial changes is the relative homogenization approach, which assumes that nearby stations are exposed to almost the same climate signal and that thus the differences between nearby stations can be utilized to detect inhomogeneities (Conrad and Pollak, 1950). In relative homogeneity testing, a candidate time series is compared to multiple surrounding stations either in a pairwise fashion or to a single composite reference time series computed for multiple nearby stations. 

Lansner and Pedersen are, by implication, demonstrating that the principal assumption on which homogenization is based (that nearby temperature stations are exposed to almost the same climatic signal) is not valid. As a result, data homogenization will not only eliminate biases in the temperature data (such as measurement biases, the impacts of station moves, and the urban heat island effect where it impacts a minority of stations) but will also adjust out actual climatic trends. Where the climatic trends are localized and not replicated in surrounding areas, they will be eliminated by homogenization. What I found in early 2015 (following the examples of Paul Homewood, Euan Mearns and others) is that there are examples from all over the world where the data suggests that nearby temperature stations are exposed to different climatic signals. Data homogenization will, therefore, produce quite weird and unstable results. A number of posts were summarized in my post Defining “Temperature Homogenisation”. Paul Matthews at Cliscep corroborated this in his post of February 2017 “Instability of GHCN Adjustment Algorithm“.

During my attempts to understand the data, I also found that those who support AGW theory not only do not question their assumptions but also have strong shared beliefs about what the data ought to look like. One of the most significant in this context is a Climategate email sent on Mon, 12 Oct 2009 by Kevin Trenberth to Michael Mann of Hockey Stick fame, and copied to Phil Jones of the Hadley Centre, Thomas Karl of NOAA, Gavin Schmidt of NASA GISS, plus others.

The fact is that we can’t account for the lack of warming at the moment and it is a travesty that we can’t. The CERES data published in the August BAMS 09 supplement on 2008 shows there should be even more warming: but the data are surely wrong. Our observing system is inadequate. (emphasis mine)

Homogenizing data a number of times, and evaluating the unstable results in the context of strongly-held beliefs, will bring the trends ever more into line with those beliefs. There is no requirement for some sort of conspiracy of deliberate data manipulation behind this emerging pattern of adjustments. Indeed, a conspiracy in the sense of a group knowing the truth and deliberately perverting that evidence does not really apply. Another reason the conspiracy idea does not apply is the underlying purpose of homogenization: to allow a temperature station to be representative of the surrounding area. Without that, it would not be possible to compile an average for the surrounding area, from which the global average is constructed. It is this requirement, in the context of real climatic differences over relatively small areas, that I would suggest leads to the deletion of “erroneous” data and the infilling of estimated data elsewhere.

The gradual bringing of the temperature data sets into line with beliefs is most clearly shown in the NASA GISS temperature data adjustments. Climate4you produces regular updates of the adjustments since May 2008. Below is the March 2018 version.

The reduction of the 1910 to 1940 warming period (which is at odds with theory) and the increase in the post-1975 warming phase (which correlates with the rise in CO2) supports my contention of the influence of beliefs.

Kevin Marshall

 

A note on Bias in Australian Temperature Homogenisations

Jo Nova has an interesting and detailed guest post by Bob Fernley-Jones on heavily homogenised rural sites in Australia by the Australian BOM.

I did a quick comment that was somewhat lacking in clarity. This post is to clarify my points.

In the post Bob Fernley-Jones stated

The focus of this study has been on rural stations having long records, mainly because the BoM homogenisation process has greatest relevance the older the data is.

Venema et al. 2012 stated (Italics mine)

The most commonly used method to detect and remove the effects of artificial changes is the relative homogenization approach, which assumes that nearby stations are exposed to almost the same climate signal and that thus the differences between nearby stations can be utilized to detect inhomogeneities (Conrad and Pollak, 1950). In relative homogeneity testing, a candidate time series is compared to multiple surrounding stations either in a pairwise fashion or to a single composite reference time series computed for multiple nearby stations.

This assumption of nearby temperature stations being exposed to the same climate signal is standard practice. Victor Venema (who has his own blog) is a leading academic expert on temperature homogenisation. However, there are extreme examples where this assumption does not hold. One example is at the end of the 1960s in much of Paraguay, where average temperatures fell by one degree. As this was not replicated in the surrounding area, both the GISTEMP and Berkeley Earth homogenisations eliminated this anomaly, despite using very different homogenisation techniques. My analysis is here.

On a wider scale take a look at the GISTEMP land surface temperature anomaly map for 2014 against 1976-2010. (obtained from here)


Despite the data being homogenised and smoothed, it is clear that trends are different. Over much of North America there was cooling, bucking the global trend. What this suggests to me is that the greater the distance between weather stations, the greater the likelihood that the climate signals will be different. Most importantly for temperature anomaly calculations, over the twentieth century the number of weather stations increased dramatically. So it is more likely that homogenisation will smooth out local and sub-regional variations in temperature trends in the early twentieth century than in the later period. This is testable.

Why should this problem occur with expert scientists? Are they super beings who know the real temperature data, but have manufactured some falsehood? I think it is something much more prosaic. Those who work at the Australian BOM believe that the recent warming is human caused; in fact they believe that more than 100% of warming is human caused. When looking at outlier data records, or records that show inconsistencies, there is a very human bias. Each time the data is reprocessed they find new inconsistencies, having previously corrected the data.

Kevin Marshall

Climatic Temperature Variations

In the previous post I identified that the standard definition of temperature homogenisation assumes that there is little or no variation in climatic trends within the homogenisation area. I also highlighted specific instances where this assumption has failed. However, the examples may be just isolated and extreme instances, or there might be other, offsetting instances, so the failures could cancel each other out without a systematic bias globally. Here I explore why this assumption should not be expected to hold anywhere, and how it may have biased the picture of recent warming. After a couple of proposals to test for this bias, I look at alternative scenarios that could bias the global average temperature anomalies. I concentrate on the land surface temperatures, though my comments may also have application to the sea surface temperature data sets.

 

Comparing Two Recent Warming Phases

An area that I am particularly interested in is the relative size of the early twentieth century warming compared to the more recent warming phase. This relative size, along with the explanations for those warming periods, gives a route into determining how much of the recent warming was human caused. Dana Nuccitelli tried such an explanation at the skepticalscience blog in 2011¹. Figure 1 shows the NASA Gistemp global anomaly in black, along with a split by eight bands of latitude. Of note are the polar extremes, each covering 5% of the surface area. For the Arctic, the trough to peak of 1885-1940 is pretty much the same as the trough to peak from 1965 to the present. But in the earlier period it is effectively cancelled out by the cooling in the Antarctic. This cooling, I found, was likely caused by the use of inappropriate proxy data from a single weather station3.

Figure 1. Gistemp global temperature anomalies by band of latitude2.
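The 5% figure for each polar band can be checked from the fraction of a sphere's surface between two latitudes, which is proportional to the difference of the sines of those latitudes. A quick arithmetic check (my own, not from the post):

```python
import numpy as np

def zone_fraction(lat_south, lat_north):
    """Fraction of the Earth's surface between two latitudes (degrees, South negative)."""
    return (np.sin(np.radians(lat_north)) - np.sin(np.radians(lat_south))) / 2.0

print(round(zone_fraction(64, 90), 3))    # Arctic band 64N-90N: about 0.051
print(round(zone_fraction(-90, -64), 3))  # Antarctic band 90S-64S: about 0.051
```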

For the current issue, of particular note is the huge variation in trends by latitude from the global average derived from the homogenised land and sea surface data. Delving further, GISS provide some very useful maps of their homogenised and extrapolated data4. I compare two identical time lengths – 1944 against 1906-1940 and 2014 against 1976-2010. The selection criteria for the maps are in figure 2.

Figure 2. Selection criteria for the Gistemp maps.

Figure 3. Gistemp map representing the early twentieth surface warming phase for land data only.


Figure 4. Gistemp map representing the recent surface warming phase for land data only.

The later warming phase is almost twice the magnitude of, and has much better coverage than, the earlier warming: 0.43°C against 0.24°C. In both cases the range of warming in the 250km grid cells is between -2°C and +4°C, but the variations are not the same. For instance, the most extreme warming in both periods is at the higher latitudes. But, with respect to North America, in the earlier period the most extreme warming is over the Northwest Territories of Canada, whilst in the later period the most extreme warming is over Western Alaska, with the Northwest Territories showing near average warming. In the United States, in the earlier period there is cooling over the Western USA, whilst in the later period there is cooling over much of the Central USA, and strong warming in California. In the USA, the coverage of temperature stations is quite good, at least compared with much of the Southern Hemisphere. Euan Mearns has looked at a number of areas in the Southern Hemisphere4, which he summarised on the map in Figure 5.

Figure 5. Euan Mearns says of the above “S Hemisphere map showing the distribution of areas sampled. These have in general been chosen to avoid large centres of human population and prosperity.”

For the current analysis Figure 6 is most relevant.

Figure 6. Euan Mearns says of the above “The distribution of operational stations from the group of 174 selected stations.”

The temperature data for the earlier period is much sparser than for the later period. Even where data is available in the earlier period, it could be based on a fifth of the number of temperature stations used in the later period. This may slightly exaggerate the issue, as the coasts of South America and Eastern Australia are avoided.

An Hypothesis on the Homogenisation Impact

Now consider again the description of homogenisation in Venema et al 2012⁵, quoted in the previous post.

 

The most commonly used method to detect and remove the effects of artificial changes is the relative homogenization approach, which assumes that nearby stations are exposed to almost the same climate signal and that thus the differences between nearby stations can be utilized to detect inhomogeneities. In relative homogeneity testing, a candidate time series is compared to multiple surrounding stations either in a pairwise fashion or to a single composite reference time series computed for multiple nearby stations. (Italics mine)

 

The assumption of the same climate signal over the homogenisation area will not apply where the temperature stations are thin on the ground. The degree to which homogenisation eliminates real-world variations in trend could be, to some extent, inversely related to station density. Given that the density of temperature data points diminishes rapidly in most areas of the world when one goes back in time beyond 1960, homogenisation in the early warming period is far more likely to be between climatically different temperature stations than in the later period. My hypothesis is that, relatively, homogenisation will reduce the early twentieth century warming phase compared with the recent warming phase, as in the earlier period homogenisation will be over much larger areas, with larger real climate variations within the homogenisation area.

Testing the Hypothesis

There are at least two ways that my hypothesis can be evaluated. Direct testing of information deficits is not possible.

First is to conduct temperature homogenisations on similar levels of actual data for the entire twentieth century. If done for a region, the actual data used in the global temperature anomalies should be run for that region as well. This should show whether the post-homogenisation recent warming phase is reduced when less data is used.

Second is to examine the relative size of adjustments against the availability of comparative data. This can be done in various ways. For instance, I quite like the examination of the Manaus grid block record that Roger Andrews did in his post The Worst of BEST6.

Counter Hypotheses

There are two counter hypotheses on temperature bias. These may undermine my own hypothesis.

First is the urbanisation bias. Euan Mearns, in looking at temperature data for the Southern Hemisphere, tried to avoid centres of population due to the data being biased. It is easy to surmise that the lack of warming Mearns found in central Australia7 reflects the absence of the urbanisation bias present in the large cities on the coast. However, the GISS maps do not support this. Ronan and Michael Connolly8 of Global Warming Solved claim that the urbanisation bias in the global temperature data is roughly equivalent to the entire warming of the recent epoch. I am not sure that the urbanisation bias is so large, but even if it were, it could be complementary to my hypothesis based on trends.

Second is that homogenisation adjustments could be greater the more distant in the past they occur. It has been noted (by Steve Goddard in particular) that each new set of GISS adjustments alters past data. The same data set used to test my hypothesis above could also be used to test this one, by conducting homogenisation runs on the data to date, then only to 2000, then to 1990, and so on. It could be that the earlier warming trend is suppressed by homogenizing the most recent data first, then working backwards through a number of iterations, each one using the results of the previous pass. Trends that differ over shorter periods but converge over longer periods could then have their divergence magnified, making differences in trend decades in the past appear more anomalous to the algorithm than they actually are.

Kevin Marshall

Notes

  1. Dana Nuccitelli – What caused early 20th Century warming? 24.03.2011
  2. Source http://data.giss.nasa.gov/gistemp/graphs_v3/
  3. See my post Base Orcadas as a Proxy for early Twentieth Century Antarctic Temperature Trends 24.05.2015
  4. Euan Mearns – The Hunt For Global Warming: Southern Hemisphere Summary 14.03.2015. Area studies are referenced on this post.
  5. Venema et al 2012 – Venema, V. K. C., Mestre, O., Aguilar, E., Auer, I., Guijarro, J. A., Domonkos, P., Vertacnik, G., Szentimrey, T., Stepanek, P., Zahradnicek, P., Viarre, J., Müller-Westermeier, G., Lakatos, M., Williams, C. N., Menne, M. J., Lindau, R., Rasol, D., Rustemeier, E., Kolokythas, K., Marinova, T., Andresen, L., Acquaotta, F., Fratianni, S., Cheval, S., Klancar, M., Brunetti, M., Gruber, C., Prohom Duran, M., Likso, T., Esteban, P., and Brandsma, T.: Benchmarking homogenization algorithms for monthly data, Clim. Past, 8, 89-115, doi:10.5194/cp-8-89-2012, 2012.
  6. Roger Andrews – The Worst of BEST 23.03.2015
  7. Euan Mearns – Temperature Adjustments in Australia 22.02.2015
  8. Ronan and Michael Connolly – Summary: “Urbanization bias” – Papers 1-3 05.12.2013


Defining “Temperature Homogenisation”

Summary

The standard definition of temperature homogenisation is of a process that cleanses the temperature data of measurement biases, leaving only variations caused by real climatic or weather variations. This is at odds with the GHCN & GISS adjustments, which delete some data and add in other data as part of the homogenisation process. A more general definition is to make the data more homogeneous, for the purposes of creating regional and global average temperatures. This is only compatible with the standard definition if one assumes that there are no real differences in data trends within the homogenisation area. From various studies it is clear that there are cases where this assumption does not hold good. The likely impacts include:-

  • Homogenised data for a particular temperature station will not be the cleansed data for that location. Instead it becomes a grid reference point, encompassing data from the surrounding area.
  • Different densities of temperature data may lead to different degrees to which homogenisation results in smoothing of real climatic fluctuations.

Whether or not this failure of understanding is limited to a number of isolated instances with a near zero impact on global temperature anomalies is an empirical matter that will be the subject of my next post.

Introduction

A common feature of many concepts involved with climatology, the associated policies and the sociological analyses of non-believers is a failure to clearly understand the terms used. In the past few months it has become evident to me that this failure of understanding extends to the term temperature homogenisation. In this post I look at the ambiguity of the standard definition against the actual practice of homogenising temperature data.

The Ambiguity of the Homogenisation Definition

The World Meteorological Organisation, in its 2004 Guidelines on Climate Metadata and Homogenization1, wrote this explanation.

Climate data can provide a great deal of information about the atmospheric environment that impacts almost all aspects of human endeavour. For example, these data have been used to determine where to build homes by calculating the return periods of large floods, whether the length of the frost-free growing season in a region is increasing or decreasing, and the potential variability in demand for heating fuels. However, for these and other long-term climate analyses –particularly climate change analyses– to be accurate, the climate data used must be as homogeneous as possible. A homogeneous climate time series is defined as one where variations are caused only by variations in climate.

Unfortunately, most long-term climatological time series have been affected by a number of nonclimatic factors that make these data unrepresentative of the actual climate variation occurring over time. These factors include changes in: instruments, observing practices, station locations, formulae used to calculate means, and station environment. Some changes cause sharp discontinuities while other changes, particularly change in the environment around the station, can cause gradual biases in the data. All of these inhomogeneities can bias a time series and lead to misinterpretations of the studied climate. It is important, therefore, to remove the inhomogeneities or at least determine the possible error they may cause.

That is, temperature homogenisation is necessary to isolate and remove what Steven Mosher has termed measurement biases2 from the real climate signal. But how does this isolation occur?

Venema et al 2012³ state the issue more succinctly.

The most commonly used method to detect and remove the effects of artificial changes is the relative homogenization approach, which assumes that nearby stations are exposed to almost the same climate signal and that thus the differences between nearby stations can be utilized to detect inhomogeneities (Conrad and Pollak, 1950). In relative homogeneity testing, a candidate time series is compared to multiple surrounding stations either in a pairwise fashion or to a single composite reference time series computed for multiple nearby stations. (Italics mine)
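As a rough illustration of the relative homogenization idea described above (a simplified sketch, not the actual GHCN pairwise algorithm): the candidate series is differenced against a composite of its neighbours, and the largest step change in that difference series is treated as a candidate inhomogeneity. Note the built-in assumption: any step found is presumed non-climatic, so if the neighbours genuinely follow a different climate signal, a real local change will be flagged and adjusted away.

```python
import numpy as np

def difference_series(candidate, neighbours):
    """Candidate station minus the mean of its neighbours (a composite reference)."""
    reference = np.nanmean(np.vstack(neighbours), axis=0)
    return np.asarray(candidate, dtype=float) - reference

def largest_step(diff, min_segment=5):
    """Index at which splitting the difference series gives the biggest mean shift.
    Real algorithms apply statistical tests (e.g. SNHT); this only locates the split."""
    best_index, best_shift = None, 0.0
    for k in range(min_segment, len(diff) - min_segment):
        shift = abs(np.nanmean(diff[k:]) - np.nanmean(diff[:k]))
        if shift > best_shift:
            best_index, best_shift = k, shift
    return best_index, best_shift
```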

Blogger …and Then There’s Physics (ATTP) partly recognizes that these issues may exist in his stab at explaining temperature homogenisation4.

So, it all sounds easy. The problem is, we didn’t do this and – since we don’t have a time machine – we can’t go back and do it again properly. What we have is data from different countries and regions, of different qualities, covering different time periods, and with different amounts of accompanying information. It’s all we have, and we can’t do anything about this. What one has to do is look at the data for each site and see if there’s anything that doesn’t look right. We don’t expect the typical/average temperature at a given location at a given time of day to suddenly change. There’s no climatic reason why this should happen. Therefore, we’d expect the temperature data for a particular site to be continuous. If there is some discontinuity, you need to consider what to do. Ideally you look through the records to see if something happened. Maybe the sensor was moved. Maybe it was changed. Maybe the time of observation changed. If so, you can be confident that this explains the discontinuity, and so you adjust the data to make it continuous.

What if there isn’t a full record, or you can’t find any reason why the data may have been influenced by something non-climatic? Do you just leave it as is? Well, no, that would be silly. We don’t know of any climatic influence that can suddenly cause typical temperatures at a given location to suddenly increase or decrease. It’s much more likely that something non-climatic has influenced the data and, hence, the sensible thing to do is to adjust it to make the data continuous. (Italics mine)

The assumption that nearby temperature stations have the same (or a very similar) climatic signal, if true, would mean that homogenisation would cleanse the data of the impurities of measurement biases. But only a cursory glance is given to the data. For instance, when Kevin Cowtan gave an explanation of the fall in average temperatures at Puerto Casado, neither he nor anyone else checked whether the explanation stacked up, beyond checking whether there had been a documented station move at roughly that time. Yet the station move is at the end of the drop in temperatures, and a few minutes’ checking would have confirmed that other nearby stations exhibit very similar temperature falls5. If you have a preconceived view of how the data should be, then a superficial explanation that conforms to that preconception will be sufficient. If you accept the authority of experts over personally checking for yourself, then the claim by experts that there is not a problem is sufficient. Those with no experience of checking the outputs following processing of complex data will not appreciate the issues involved.

However, this definition of homogenisation appears to be different from that used by GHCN and NASA GISS. When Euan Mearns looked at temperature adjustments in the Southern Hemisphere and in the Arctic6, he found numerous examples in the GHCN and GISS homogenisations of the infilling of some missing data and, to a greater extent, the deletion of huge chunks of temperature data. For example, this graphic is Mearns’ spreadsheet of adjustments between GHCNv2 (raw data + adjustments) and GHCNv3 (homogenised data) for 25 stations in Southern South America. The yellow cells are where V2 data exist but V3 data do not; the green cells are where V3 data exist but V2 data do not.

Definition of temperature homogenisation

A more general definition that encompasses the GHCN / GISS adjustments is of broadly making the data homogeneous. It is not done by simply blending the data together and smoothing it out. Homogenisation also adjusts anomalous data as a result of pairwise comparisons between local temperature stations or, in the case of extreme differences, GHCN / GISS deletes the most anomalous data. This is a much looser and broader process than the homogenisation of milk, or putting some food through a blender.

I cover the definition in more depth in the appendix.

The Consequences of Making Data Homogeneous

Cleansing the data in order to make it more homogeneous gives rise to a distinction that is missed by many. This is due to the strong assumption that there are no climatic differences between the temperature stations in the homogenisation area.

Homogenisation is aimed at adjusting for measurement biases, to give a climatic reading for the location of the temperature station that is a closer approximation to what that reading would be without those biases. Under the strong assumption, making the data homogeneous is identical to removing the non-climatic inhomogeneities. Cleansed of these measurement biases, the temperature data is then both the set of average temperature readings that would have been generated if the temperature station had been free of biases and a representative reading for the area. This latter aspect is necessary to build up a global temperature anomaly, which is constructed by dividing the surface into a grid. Homogenisation, in the sense of making the data more homogeneous by blending, is an inappropriate term. All that is happening is adjusting for anomalies within the data, through comparisons with local temperature stations (the GHCN / GISS method) or comparisons with an expected regional average (the Berkeley Earth method).

But if the strong assumption does not hold, homogenisation will adjust away these climatic differences, and will to some extent fail to eliminate the measurement biases. Homogenisation is in fact made more necessary if movements in average temperatures are not the same everywhere and the spread of temperature data is spatially uneven. Then homogenisation needs not only to remove the anomalous data, but also to make specific locations more representative of the surrounding area. This enables any imposed grid structure to create an estimated average for that area by averaging the homogenized temperature data sets within the grid area. As a consequence, the homogenised data for a temperature station will cease to be a close approximation to what the thermometers would have read free of any measurement biases. As homogenisation is calculated by comparisons with temperature stations beyond those immediately adjacent, there will be, to some extent, influences of climatic changes beyond the local temperature stations. The consequences of climatic differences within the homogenisation area include the following.

  • The homogenised temperature data for a location could appear largely unrelated to the original data or to the data adjusted for known biases. This could explain the homogenised Reykjavik temperature, where Trausti Jonsson of the Icelandic Met Office, who had been working with the data for decades, could not understand the GHCN/GISS adjustments7.
  • The greater the density of temperature stations in relation to the climatic variations, the less that climatic variations will impact on the homogenisations, and the greater will be the removal of actual measurement biases. Climate variations are unlikely to be much of an issue with the Western European and United States data. But on the vast majority of the earth’s surface, whether land or sea, coverage is much sparser.
  • If the climatic variation at a location is of a different magnitude to that of other locations in the homogenisation area, but over the same time periods and in the same direction, then the data trends will be largely retained. For instance, in Svalbard the warming trends of the early twentieth century and from the late 1970s were much greater than elsewhere, so were adjusted downwards8.
  • If there are differences in the rate of temperature change, or in the time periods of similar changes, then any “anomalous” data due to climatic differences at the location will be eliminated or severely adjusted, on the same basis as “anomalous” data due to measurement biases. For instance, in a large part of Paraguay at the end of the 1960s, average temperatures fell by around 1°C. Because this did not occur in the surrounding areas, both the GHCN and Berkeley Earth homogenisation processes adjusted out this trend. As a consequence of this adjustment, a mid-twentieth century cooling in the area was effectively adjusted out of the data9.
  • If a large proportion of temperature stations in a particular area have consistent measurement biases, then homogenisation will retain those biases, as it will not appear anomalous within the data. For instance, much of the extreme warming post 1950 in South Korea is likely to have been as a result of urbanization10.

Other Comments

Homogenisation is just part of the process of adjusting data for the twin purposes of attempting to correct for biases and of building regional and global temperature anomalies. It cannot, for instance, correct for time of observation bias (TOBS). This needs to be done prior to homogenisation. Neither will homogenisation build a global temperature anomaly. Extrapolating from the limited data coverage is a further process, whether for fixed temperature stations on land or the ship measurements used to calculate the ocean surface temperature anomalies. This extrapolation has further difficulties. For instance, in a previous post11 I covered a potential issue with the Gistemp proxy data for Antarctica prior to permanent bases being established on the continent in the 1950s. Making the data homogeneous is but the middle part of a wider process.

Homogenisation is a complex process. The Venema et al 2012³ paper on the benchmarking of homogenisation algorithms demonstrates that different algorithms produce significantly different results. What is clear from the original posts on the subject by Paul Homewood, and the more detailed studies by Euan Mearns and Roger Andrews at Energy Matters, is that the whole process of going from the raw monthly temperature readings to the final global land surface average trends has thrown up some peculiarities. In order to determine whether they are isolated instances that have near zero impact on the overall picture, or point to more systematic biases resulting from the points made above, it is necessary to understand the data available in relation to the overall global picture. That will be the subject of my next post.

Kevin Marshall

Notes

  1. GUIDELINES ON CLIMATE METADATA AND HOMOGENIZATION by Enric Aguilar, Inge Auer, Manola Brunet, Thomas C. Peterson and Jon Wieringa
  2. Steven Mosher – Guest post : Skeptics demand adjustments 09.02.2015
  3. Venema et al 2012 – Venema, V. K. C., Mestre, O., Aguilar, E., Auer, I., Guijarro, J. A., Domonkos, P., Vertacnik, G., Szentimrey, T., Stepanek, P., Zahradnicek, P., Viarre, J., Müller-Westermeier, G., Lakatos, M., Williams, C. N., Menne, M. J., Lindau, R., Rasol, D., Rustemeier, E., Kolokythas, K., Marinova, T., Andresen, L., Acquaotta, F., Fratianni, S., Cheval, S., Klancar, M., Brunetti, M., Gruber, C., Prohom Duran, M., Likso, T., Esteban, P., and Brandsma, T.: Benchmarking homogenization algorithms for monthly data, Clim. Past, 8, 89-115, doi:10.5194/cp-8-89-2012, 2012.
  4. …and Then There’s Physics – Temperature homogenisation 01.02.2015
  5. See my post Temperature Homogenization at Puerto Casado 03.05.2015
  6. For example

    The Hunt For Global Warming: Southern Hemisphere Summary

    Record Arctic Warmth – in 1937

  7. See my post Reykjavik Temperature Adjustments – a comparison 23.02.2015
  8. See my post RealClimate’s Mis-directions on Arctic Temperatures 03.03.2015
  9. See my post Is there a Homogenisation Bias in Paraguay’s Temperature Data? 02.08.2015
  10. NOT A LOT OF PEOPLE KNOW THAT (Paul Homewood) – UHI In South Korea Ignored By GISS 14.02.2015

Appendix – Definition of Temperature Homogenisation

When discussing temperature homogenisations, nobody asks what the term actually means. In my house we consume homogenised milk. This is the same as the pasteurized milk I drank as a child except for one aspect. As a child I used to compete with my siblings to be the first to open a new pint bottle, as it had the cream on top. The milk now does not have this cream, as it is blended in, or homogenized, with the rest of the milk. Temperature homogenizations are different, involving changes to figures, along with (at least with the GHCN/GISS data) filling the gaps in some places and removing data in others1.

But rather than note the differences, it is better to consult an authoritative source. From Dictionary.com, the definitions of homogenize are:-

verb (used with object), homogenized, homogenizing.

  1. to form by blending unlike elements; make homogeneous.
  2. to prepare an emulsion, as by reducing the size of the fat globules in (milk or cream) in order to distribute them equally throughout.
  3. to make uniform or similar, as in composition or function:

    to homogenize school systems.

  4. Metallurgy. to subject (metal) to high temperature to ensure uniform diffusion of components.

Applying the dictionary definitions, homogenization in science is about blending elements together to make them uniform; it is not about additions to or subtractions from a data set, nor about adjusting the data. This is particularly true in chemistry.

For GHCN and NASA GISS, temperature data homogenization involves removing or adjusting elements in the data that are markedly dissimilar from the rest. It can also mean infilling data that was never measured. The verb homogenize does not fit the processes at work here. This has led some, like Paul Homewood, to refer to the process as data tampering or worse. A better idea is to look further at the dictionary.

Again from Dictionary.com, the first two definitions of the adjective homogeneous are:-

  1. composed of parts or elements that are all of the same kind; not heterogeneous:

a homogeneous population.

  2. of the same kind or nature; essentially alike.

I would suggest that temperature homogenization is a loose term for describing the process of making the data more homogeneous, that is, for smoothing out the data in some way. A false analogy is when I make a vegetable soup. After cooking I end up with a stock containing lumps of potato, carrot, leeks etc. I put it through the blender to get an even consistency. I end up with the same weight of soup before and after. A similar process of getting the same out of homogenization as went in is clearly not what is happening to temperatures. The aim of making the data homogeneous is both to remove anomalous data and to blend the data together.

Base Orcadas as a Proxy for early Twentieth Century Antarctic Temperature Trends

Temperature trends vary greatly across different parts of the globe, an aspect that is not recognized when homogenizing temperatures. At a top level NASA GISS usefully split their global temperature anomaly into eight bands of latitude. I have graphed the five year moving averages for each band, along with the Gistemp global anomaly in Figure 1.

Figure 1. Gistemp global temperature anomalies by band of latitude.

The biggest oddity is the 64S-90S band. This bottom slice of the globe roughly equates to Antarctica, which is south of 66°34′S. Not only was there massive cooling until 1930 – in contradiction to the global trend – but prior to 1970 there was very large volatility in temperatures, despite my using five year moving averages. Looking at the GHCN database of weather stations, there are none listed in Antarctica until Rothera Point started collecting data in 1946, as shown in Figure 2¹.

Figure 2. A selection of temperature anomalies in Antarctica. The most numerous are either on the Antarctic Peninsula, or on the islands just to the north.

The only long record is at Base Orcadas located at (60.8 S 44.7 W). I have graphed the GISS homogenised temperature anomaly data for station 701889680000 with the Gistemp 64S-90S band in Figure 3.

Figure 3. Gistemp 64S-90S annual temperature anomaly compared to Base Orcadas GISS homogenised data.

There is a remarkable similarity between the data sets until 1950, after which they appear unrelated. This suggests that, in the absence of other data, Base Orcadas was the principal element in creating a proxy for the missing Antarctic data, despite it being located outside the area, and not being related to the actual data for well over half a century. The outcome is to bias the overall global temperature anomaly by suppressing the early twentieth century warming, making the late twentieth century warming appear relatively greater than the underlying reality2. The error is due to assuming that temperature trends at different latitudes are the same, an assumption that the homogenised data shows to be false.

Kevin Marshall

 

Notes

  1. Also in Antarctica (but not listed) there has been data collected at the Amundsen-Scott base at the South Pole (90.0 S 0.0 E) since 1957, and at Vostok base (78.5 S 106.9 E) since 1958.
  2. Removing the Antarctic data would increase both the early twentieth century and post 1975 warming periods. But, given that 64S-90S is 5% of the global surface area, I estimate it would increase the earlier warming trends by 5-10% as against 1-3% for the later trend.


Temperature Homogenization at Puerto Casado

Summary

The temperature homogenizations for the Paraguay data within both the BEST and UHCN/Gistemp surface temperature data sets point to a potential flaw within the temperature homogenization process. It removes real, but localized, temperature variations, creating incorrect temperature trends. In the case of Paraguay from 1955 to 1980, a cooling trend is turned into a warming trend. Whether this biases the overall temperature anomalies, or our understanding of climate variation, remains to be explored.

 

A small place in mid-Paraguay, on the Brazil/Paraguay border, has become the centre of focus of the argument on temperature homogenizations.

For instance here is Dr Kevin Cowtan, of the Department of Chemistry at the University of York, explaining the BEST adjustments at Puerto Casado.

Cowtan explains at 6.40

In a previous video we looked at a station in Paraguay, Puerto Casado. Here is the Berkeley Earth data for that station. Again the difference between the station record and the regional average shows very clear jumps. In this case there are documented station moves corresponding to the two jumps. There may be another small change here that wasn’t picked up. The picture for this station is actually fairly clear.

The first of these “jumps” was a fall in the late 1960s of about 1°C. Figure 1 expands the section of the Berkeley Earth graph from the video, to emphasise this change.

Figure 1 – Berkeley Earth temperature anomaly graph for Puerto Casado, with an expanded section showing the fall in temperature, set against the estimated mean station bias.

The station move is after the fall in temperature.

Shub Niggurath looked at the metadata on the actual station move, concluding:

IT MOVED BECAUSE THERE IS CHANGE AND THERE IS A CHANGE BECAUSE IT MOVED

That is, the evidence for the station move was vague. The major evidence was the fall in temperatures itself. Alternative evidence is that a number of other stations in the area exhibited similar patterns.

But maybe there was some unknown measurement bias (to use Steven Mosher’s term) that would make this data stand out from the rest? I have previously looked at eight temperature stations in Paraguay with respect to the NASA Gistemp and UHCN adjustments. The BEST adjustments for these stations, along with another from Paul Homewood’s original post, are summarized in Figure 2 for the late 1960s and early 1970s. All eight have similar downward adjustments that I estimate at between 0.8 and 1.2°C. The first six have a single adjustment; Asuncion Airport and San Juan Bautista have multiple adjustments in the period. Pedro Juan CA was of very poor data quality due to many gaps (see the GHCNv2 graph of the raw data), hence its exclusion.

GHCN Name           GHCN Location     BEST Ref   Break Type     Break Year
Concepcion          23.4 S, 57.3 W    157453     Empirical      1969
Encarnacion         27.3 S, 55.8 W    157439     Empirical      1968
Mariscal            22.0 S, 60.6 W    157456     Empirical      1970
Pilar               26.9 S, 58.3 W    157441     Empirical      1967
Puerto Casado       22.3 S, 57.9 W    157455     Station Move   1971
San Juan Baut       26.7 S, 57.1 W    157442     Empirical      1970
Asuncion Aero       25.3 S, 57.6 W    157448     Empirical      1969
                                                 Station Move   1972
                                                 Station Move   1973
San Juan Bautista   25.8 S, 56.3 W    157444     Empirical      1965
                                                 Empirical      1967
                                                 Station Move   1971
Pedro Juan CA       22.6 S, 55.6 W    19469      Empirical      1968
                                                 Empirical      3 in the 1970s

Figure 2 – Temperature stations used in previous post on Paraguayan Temperature Homogenisations

 

Why would both BEST and UHCN remove a consistent pattern covering an area of around 200,000 km²? The first reason, as Roger Andrews has found, is that the temperature fall was confined to Paraguay. The second reason is suggested by the UHCNv2 raw data (note 1) shown in Figure 3.

Figure 3 – UHCNv2 “raw data” mean annual temperature anomalies for eight Paraguayan temperature stations, with mean of 1970-1979=0.

There was an average temperature fall across these eight temperature stations of about half a degree from 1967 to 1970, and of over one degree by the mid-1970s. But the falls did not happen at the same time at each station. The consistency only shows in the periods before and after, where the data sets do not diverge. A homogenisation program comparing each station with its neighbours would therefore see, around each station’s particular break year, readings that were out of line with all the other data sets. Maybe it was simply data noise, or maybe there was some unknown change, but it is clearly present in the data. Temperature homogenisation should just smooth this out; instead it cools the past. Figure 4 shows the average change resulting from the UHCN and NASA GISS homogenisations.

Figure 4 – UHCNv2 “raw data” and NASA GISS Homogenized average temperature anomalies, with the net adjustment.

A cooling trend for the period 1955-1980 has been turned into a warming trend due to the flaw in homogenization procedures.
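The point about each station looking out of line with its neighbours can be illustrated with a toy example. The sketch below uses invented station series rather than the Paraguayan data, and a deliberately crude detection rule; real homogenisation algorithms, such as pairwise comparison methods, are far more elaborate:

```python
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(1950, 1991)

# Invented data: eight stations share a mild regional warming plus noise,
# and each drops by about 1 C at a slightly different year around 1967-1970.
regional = 0.01 * (years - 1950) + rng.normal(0, 0.1, years.size)
stations = []
for _ in range(8):
    series = regional + rng.normal(0, 0.2, years.size)
    series[years >= 1967 + rng.integers(0, 4)] -= 1.0
    stations.append(series)
stations = np.array(stations)

# Neighbour comparison: difference of each station from the mean of the others,
# flagging the year with the largest single-year jump in that difference.
for i, series in enumerate(stations):
    neighbours = np.delete(stations, i, axis=0).mean(axis=0)
    diff = series - neighbours
    jump_year = years[1:][np.argmax(np.abs(np.diff(diff)))]
    print(f"station {i}: largest jump relative to neighbours in {jump_year}")
```

Because each invented station drops in a different year, every one of them looks anomalous against the average of its neighbours at its own break year, so each would be “corrected” individually even though the fall is real and shared across the region.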

The Paraguayan data on its own has little impact on the global land surface temperature, as it covers a tiny area. Further, it might be an isolated incident, or offset by instances of understating the warming trend. But what if there are smaller microclimates that are only picked up by one or two temperature stations? Consider Figure 5, which looks at the BEST adjustments for Encarnacion, one of the eight Paraguayan stations.

Figure 5 – BEST adjustment for Encarnacion.

There is the empirical break in 1968 from the table above, but also empirical breaks in 1981 and 1991 that look to be in exactly the opposite direction. What Berkeley Earth calls the “estimated station mean bias” is the result of actual deviations in the real data. Homogenisation eliminates much of the richness and diversity in the real-world data. The question is whether this happens consistently. First, we need to understand the term “temperature homogenization”.

Kevin Marshall

Notes

  1. The UHCNv2 “raw” data is more accurately described as pre-homogenized data – that is, the raw data with some adjustments already applied.

RealClimate’s Mis-directions on Arctic Temperatures

Summary

Real Climate attempted to rebut the claims that the GISS temperature data is corrupted with unjustified adjustments by

  • Attacking the commentary of Christopher Booker, not the primary source of the allegations.
  • Referring readers instead to a dogmatic source who claims that only 3 stations are affected, something clearly contradicted by Booker and the primary source.
  • Alleging that the complaints are solely about cooling the past, then using a single counter-example – Svalbard – of a GISS adjustment excessively warming the past compared to the author’s own adjustments.
  • However, compared to the raw data, the author’s adjustments, based on local knowledge, were smaller than those of GISS, suggesting the GISS adjustments are unjustified. The GISS adjustments do, though, bring the massive warming trend into line with the (still large) Reykjavik trend.
  • Examination of the site reveals that the Stevenson screen at Svalbard airport is right beside the tarmac of the runway, with heat from planes and from snow-clearing equipment likely affecting measurements. With increasing use of the airport over the last twenty years, the raw data trend should probably be adjusted downwards, with the adjustment increasing over time, not decreasing.
  • Further, data from a nearby temperature station at Isfjord Radio suggests that the early twentieth century warming on Spitsbergen may have been more rapid and of greater magnitude than the later warming. GISS adjustments reduce that trend by up to 4 degrees, compared with just 1.7 degrees for the late twentieth century warming.
  • Questions arise as to how raw data for Isfjord Radio could be available for 22 years before the station was established, and how the station managed to keep recording “raw data” between being destroyed and abandoned in 1941 and being re-opened in 1946.

Introduction

In climate debates I am used to mis-directions, but in this post I may have found the largest temperature adjustments to date.

In early February, RealClimate – the blog of the climate science consensus – had an article attacking Christopher Booker in the Telegraph. It had strong similarities to the methods used by the anonymous blogger …andthentheresphysics. In a previous post I provided a diagram to illustrate ATTP’s methods.


One would expect that a blog supported by the core of the climate science consensus would provide a better defence than an anonymous blogger who censors views that challenge his beliefs. However, RealClimate may have dug an even deeper hole. Paul Homewood covered the article on February 12th, but I feel he only scratched the surface. Using the procedures outlined above, I note the similarities include:-

  • Attacking the secondary commentary, and not mentioning the primary sources.
  • Misleading statements that understate the extent of the problem.
  • Avoiding comparison of the raw and adjusted data.
  • Single counter examples that do not stand up.

Attacking the secondary commentary

Like ATTP, RealClimate attacked the same secondary source – Christopher Booker – but a different article. True academics would have referred to Paul Homewood, the source of the allegations.

Misleading statement about number of weather stations

The article referred to was by Victor Venema of Variable Variability. The revised title is “Climatologists have manipulated data to REDUCE global warming“, but the original title can be found from the link address – http://variable-variability.blogspot.de/2015/02/evil-nazi-communist-world-government.html

It was published on 10th February and refers only to Christopher Booker’s original Telegraph article of 24th January, without mentioning the author or linking to it. After quoting from the article, Venema states:-

Three, I repeat: 3 stations. For comparison, global temperature collections contain thousands of stations. ……

Booker’s follow-up article of 7th February states:-

Following my last article, Homewood checked a swathe of other South American weather stations around the original three. ……

Homewood has now turned his attention to the weather stations across much of the Arctic, between Canada (51 degrees W) and the heart of Siberia (87 degrees E). Again, in nearly every case, the same one-way adjustments have been made, to show warming up to 1 degree C or more higher than was indicated by the data that was actually recorded.

My diagram above was published on the 8th February, and counted 29 stations. Paul Homewood’s original article on the Arctic of 4th February lists 19 adjusted sites. If RealClimate had actually read the cited article, they would have known that the quotation was false in connection with the Arctic. Any undergraduate who made this mistake in an essay would be failed.

Misleading Counter-arguments

Øyvind Nordli – the RealClimate author – provides a counter-example from his own research. He compares his adjustments of the Svalbard data (made as part of a temperature reconstruction for Spitsbergen last year) with those of NASA GISS.

Clearly, he is right in pointing out that his adjustments created a lower warming trend than those of GISS.

I checked the “raw data” against the “GISS Homogenised” data for Svalbard, and compared both with the Reykjavik data I looked at last week, since the raw data is not part of that comparison. To make the series comparable, I created anomalies based on the raw data average for 2000-2009. I have also used a 5-year centred moving average.
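For anyone wanting to reproduce this, here is a minimal pandas sketch of the two steps just described: re-baselining both series to the 2000-2009 mean of the raw data (as I read the description above), then taking a 5-year centred moving average. The file name and column names are placeholders, not the actual GISS download format:

```python
import pandas as pd

# Placeholder file with one row per year and columns: year, raw, adjusted
df = pd.read_csv("svalbard_annual.csv")

# Anomalies relative to the 2000-2009 average of the raw data, so that the
# raw and adjusted series share the same baseline
baseline = df.loc[df["year"].between(2000, 2009), "raw"].mean()
df["raw_anom"] = df["raw"] - baseline
df["adj_anom"] = df["adjusted"] - baseline

# 5-year centred moving averages
df["raw_anom_5yr"] = df["raw_anom"].rolling(window=5, center=True).mean()
df["adj_anom_5yr"] = df["adj_anom"].rolling(window=5, center=True).mean()

print(df.tail())
```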

The raw data is in dark, the adjusted data in light. For Reykjavik prior to 1970 the peaks in the data have been clearly constrained, making the warming since 1980 appear far more significant. For the much shorter Svalbard series the total adjustments from GHCN and GISS reduce the warming trend by a full 1.7°C, bringing it into line with the largely unadjusted Reykjavik. The GHCN and GISS data seem to have been adjusted to a pre-conceived view of what the data should look like. What Nordli et al. have effectively done is to restore the trend present in the raw data. So Nordli et al., using data on the ground, have effectively reached a similar conclusion to Trausti Jonsson of the Iceland Met Office: the adjustments made thousands of miles away in the United States by homogenization algorithms are massive and unjustified. It just so happens that in this case they are in the opposite direction to cooling the past. I find it somewhat odd that Øyvind Nordli, an expert on local conditions, should not challenge these adjustments but chooses to give the opposite impression.

What is even worse is that there might be a legitimate reason to adjust the recent warming downwards. In 2010, Anthony Watts looked at the siting of the weather station at Svalbard Airport. Photographs show it to be right beside the runway. With frequent snow, steam de-icers regularly pass by, along with planes with hot exhausts. The case is there for a downward adjustment over the whole of the series, with an increasing trend to reflect the increasing aircraft movements. Tourism quintupled between 1991 and 2008. In addition, the University Centre in Svalbard, founded in 1993, now has 500 students.

Older data for Spitsbergen

Maybe the phenomenal warming in the raw data for Svalbard is unprecedented, despite some doubts about the adjustments. Nordli et al 2014, titled Long-term temperature trends and variability on Spitsbergen: the extended Svalbard Airport temperature series, 1898-2012, is a study that gathers together all the available data from Spitsbergen, aiming to create a composite temperature record from fragmentary records from a number of places around the islands. From NASA GISS, I can only find Isfjord Radio for the earlier period. It is about 50 km west of Svalbard Airport, so should give a similar shape of temperature anomaly. According to Nordli et al:

Isfjord Radio. The station was established on 1 September 1934 and situated on Kapp Linné at the mouth of Isfjorden (Fig. 1). It was destroyed by actions of war in September 1941 but re-established at the same place in July 1946. From 30 June 1976 onwards, the station was no longer used for climatological purposes.

But NASA GISS has data from 1912, twenty-two years prior to the station being established, as does Berkeley Earth. I calculated an anomaly relative to Reykjavik based on 1930-1939 averages, and added the Isfjord Radio figures to the graph.

The portion of the raw data for Isfjord Radio which seems to have been recorded before any thermometer was available shows a full 5°C rise in the 5-year moving average temperature. The anomaly for 1917 was -7.8°C, compared with 0.6°C in 1934 and 1.0°C in 1938. For Svalbard Airport the lowest anomalies are -4.5°C in 1976 and -4.7°C in 1988. The peak year is 2006 at 2.4°C, followed by 2007 at 1.5°C. The total GHCNv3 and GISS adjustments are also of a different order. At the start of the Svalbard series every month was adjusted up by 1.7°C. The Isfjord Radio 1917 data was adjusted up by 4.0°C on average, and 1918 by 3.5°C. The Februaries of 1916 and 1918 were adjusted upwards by 5.4°C.

So the trough-to-peak Spitsbergen warming of 1917 to 1934 may have been more rapid, and greater in magnitude, than the similar warming from 1976 to 2006. But from the adjusted data one gets the opposite conclusion.

Also, we find from Nordli et al:

During the Second World War, and also during five winters in the period 1898-1911, no observations were made in Svalbard, so the only possibility for filling data gaps is by interpolation.

The latest any data could have been recorded was mid-1941, and the island was not reoccupied for peaceful purposes until 1946. The “raw” GHCN data for the war years is therefore actually infill. If it followed the pattern of Reykjavik – likely the nearest recording station – temperatures would have peaked during the Second World War, not fallen.

Conclusion

Real Climate should review their articles more carefully. You cannot rebut a growing problem by referring to out-of-date and dogmatic sources. You cannot pretend that unjustified temperature adjustments in one direction are somehow made right by unjustified temperature adjustments in the other direction. Spitsbergen is not only cold; it clearly experiences vast and rapid fluctuations in average temperatures. Any trend is tiny compared with these fluctuations.

Is there a Homogenisation Bias in Paraguay’s Temperature Data?

Last month Paul Homewood at Notalotofpeopleknowthat looked at the temperature data for Paraguay. His original aim was to explain the GISS claims of 2014 being the hottest year.

One of the regions that has contributed to GISS’ “hottest ever year” is South America, particularly Brazil, Paraguay and the northern part of Argentina. In reality, much of this is fabricated, as they have no stations anywhere near much of this area…

….there does appear to be a warm patch covering Paraguay and its close environs. However, when we look more closely, we find things are not quite as they seem.

In “Massive Tampering With Temperatures In South America“, Homewood looked at the “three genuinely rural stations in Paraguay that are currently operating – Puerto Casado, Mariscal and San Juan.” A few days later, in “All Of Paraguay’s Temperature Record Has Been Tampered With“, he looked at the remaining six stations.

After identifying that all of the three rural stations currently operational in Paraguay had had huge warming adjustments made to their data since the 1950’s, I tended to assume that they had been homogenised against some of the nearby urban stations. Ones like Asuncion Airport, which shows steady warming since the mid 20thC. When I went back to check the raw data, it turns out all of the urban sites had been tampered with in just the same way as the rural ones.

What Homewood does not do is to check the data behind the graphs, to quantify the extent of the adjustment. This is the aim of the current post.

Warning – This post includes a lot of graphs to explain how I obtained my results.

Homewood uses comparisons of two graphs, to which he helpfully provides links. The raw GHCN data + USHCN corrections is available here, up until 2011 only. The current data, after the GISS homogeneity adjustment, is available here.

For all nine data sets I downloaded both the raw and the homogenised data. By simple subtraction I found the differences. In any one year, they are mostly the same for each month, but for clarity I selected a single month – October, the month of my wife’s birthday.
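A minimal sketch of that subtraction, assuming both downloads have been saved as CSV files with a year column and one column per month; the file and column names are placeholders, not the actual GISS file layout:

```python
import pandas as pd

# Placeholder files: one row per year, columns "year", "Jan", ..., "Dec"
raw = pd.read_csv("puerto_casado_raw.csv").set_index("year")
adj = pd.read_csv("puerto_casado_homogenised.csv").set_index("year")

# Adjustment applied by homogenisation = homogenised minus raw, month by month
adjustment = adj - raw

# Within a year the adjustment is mostly the same for every month,
# so a single month (October here) is enough to show the pattern
print(adjustment["Oct"].dropna())
```

Keeping the full month-by-month table also makes it easy to check the claim that the adjustment barely varies by month within a year.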

For the Encarnacion (27.3 S,55.8 W) data sets the adjustments are as follows.

In 1967 the adjustment was -1.3°C, in 1968 +0.1°C. There is cooling of the past.

The average adjustments for all nine data sets are as follows.

This pattern is broadly consistent across all data sets. These are the maximum and minimum adjustments.

However, this issue is clouded by the special handling required for the Pedro Juan CA data set. The raw data set has been patched together from four separate files.

Removing it does not affect the average picture.

But it does affect the maximum and minimum adjustments. This shows the consistency in the adjustment pattern.
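For completeness, patching fragmentary raw files together can be mimicked along the following lines. This is a sketch under my own assumptions – the file names are placeholders, and averaging any overlapping years is my choice rather than necessarily how the official records were merged:

```python
import pandas as pd

# Placeholder fragments for Pedro Juan CA, each with columns "year" and "Oct"
fragments = [pd.read_csv(f"pedro_juan_ca_part{i}.csv") for i in range(1, 5)]

# Stack the fragments and average any years that appear in more than one file
combined = (pd.concat(fragments)
              .groupby("year")["Oct"]
              .mean()
              .sort_index())
print(combined)
```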

The data sets are incomplete. Before 1941 there is only one data set – Asuncion Aero. The count for October each year is as follows.

In recent years there are huge gaps in the data, but for the late 1960s when the massive switch in adjustments took place, there are six or seven pairs of raw and adjusted data.

Paul Homewood’s allegation that the past has been cooled is confirmed. However, this does not give a full understanding of the impact on the reported data. To assist, for the full-year mean data, I have created temperature anomalies based on the average anomaly in each year.

The raw data shows a significant cooling of up to 1°C in the late 1960s. If anything, there has been over-compensation in the adjustments. Since 1970, any warming in the adjusted data has come through further adjustments.

Is this evidence of a conspiracy to “hide a decline” in Paraguayan temperatures? I think not. My alternative hypothesis is that this decline, consistent over a number of thermometers, is unexpected. Anybody looking at just one of these data sets recently would assume that a step change in 40-year-old data from a distant third world country is bound to be incorrect. (Shub has a valid point.) That change goes against the known warming trend of over a century in the global temperature data sets, and against the near-stationary temperatures of 1950-1975. More importantly, cooling goes against the “known” major driver of recent temperature change – rises in greenhouse gas levels. Do you trust some likely ropey instrument data, or your accumulated knowledge of the world? The clear answer is that the instruments are wrong. Homogenisation is then not to local instruments in the surrounding area, but to the established expert wisdom about the world. The consequent adjustment cools past temperatures by one degree, and the twentieth century warming is enhanced as a consequence of not believing what the instruments are saying. The problem is that this step change is replicated over a number of stations. Paul Homewood has shown that it probably extends into Bolivia as well.

But what happens in the converse case? What if there is a step rise in some ropey data set from the 1970s or 1980s? This might be large, but not inconsistent with what is known about the world, so it is unlikely to be adjusted downwards. If there have been local or regional step changes in average temperature over time, both up and down, the impact will be to increase the apparent rate of warming if the data analysts believe that the world is warming and that human beings are the cause.

Further analysis is required to determine the extent of the problem – but not from this unpaid blogger giving up my weekends and evenings.

Kevin Marshall
