Defining “Temperature Homogenisation”

Summary

The standard definition of temperature homogenisation is of a process that cleanses the temperature data of measurement biases, leaving only the variations caused by real climatic or weather changes. This is at odds with the GHCN and GISS adjustments, which delete some data and add in other data as part of the homogenisation process. A more general definition is to make the data more homogeneous, for the purposes of creating regional and global average temperatures. This is only compatible with the standard definition if we assume that there are no real differences in data trends within the homogenisation area. From various studies it is clear that there are cases where this assumption does not hold. The likely impacts include:-

  • Homogenised data for a particular temperature station will not be the cleansed data for that location. Instead it becomes a grid reference point, encompassing data from the surrounding area.
  • Different densities of temperature data may lead to different degrees to which homogenisation results in smoothing of real climatic fluctuations.

Whether or not this failure of understanding is limited to a number of isolated instances with a near zero impact on global temperature anomalies is an empirical matter that will be the subject of my next post.

Introduction

A common feature of many concepts involved with climatology, the associated policies and the sociological analyses of non-believers, is a failure to clearly understand the terms used. In the past few months it has become evident to me that this failure of understanding extends to the term temperature homogenisation. In this post I look at the ambiguity of the standard definition against the actual practice of homogenising temperature data.

The Ambiguity of the Homogenisation Definition

The World Meteorological Organisation in its 2004 Guidelines on Climate Metadata and Homogenization1 wrote this explanation.

Climate data can provide a great deal of information about the atmospheric environment that impacts almost all aspects of human endeavour. For example, these data have been used to determine where to build homes by calculating the return periods of large floods, whether the length of the frost-free growing season in a region is increasing or decreasing, and the potential variability in demand for heating fuels. However, for these and other long-term climate analyses –particularly climate change analyses– to be accurate, the climate data used must be as homogeneous as possible. A homogeneous climate time series is defined as one where variations are caused only by variations in climate.

Unfortunately, most long-term climatological time series have been affected by a number of nonclimatic factors that make these data unrepresentative of the actual climate variation occurring over time. These factors include changes in: instruments, observing practices, station locations, formulae used to calculate means, and station environment. Some changes cause sharp discontinuities while other changes, particularly change in the environment around the station, can cause gradual biases in the data. All of these inhomogeneities can bias a time series and lead to misinterpretations of the studied climate. It is important, therefore, to remove the inhomogeneities or at least determine the possible error they may cause.

That is, temperature homogenisation is necessary to isolate and remove what Steven Mosher has termed measurement biases2 from the real climate signal. But how does this isolation occur?

Venema et al 20123 states the issue more succinctly.

The most commonly used method to detect and remove the effects of artificial changes is the relative homogenization approach, which assumes that nearby stations are exposed to almost the same climate signal and that thus the differences between nearby stations can be utilized to detect inhomogeneities (Conrad and Pollak, 1950). In relative homogeneity testing, a candidate time series is compared to multiple surrounding stations either in a pairwise fashion or to a single composite reference time series computed for multiple nearby stations. (Italics mine)
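
To make the relative homogenization idea concrete, here is a minimal sketch of building a composite reference from nearby stations and scanning the candidate-minus-reference difference series for a break. This is not the GHCN pairwise algorithm or any other operational method; the threshold and margin are arbitrary values chosen purely for illustration.

```python
import numpy as np

def difference_series(candidate, neighbours):
    """Candidate station minus a composite reference (the mean of nearby stations).

    If the neighbours share the candidate's climate signal, that signal cancels
    here, leaving mainly non-climatic artefacts and noise."""
    reference = np.nanmean(np.vstack(neighbours), axis=0)
    return candidate - reference

def flag_break(diff, threshold=0.5, margin=12):
    """Crude break detection: find the split point where the mean of the
    difference series changes the most, and flag it if the jump exceeds
    `threshold` (same units as the data, e.g. degrees C)."""
    best_split, best_jump = None, 0.0
    for i in range(margin, len(diff) - margin):
        jump = abs(np.nanmean(diff[i:]) - np.nanmean(diff[:i]))
        if jump > best_jump:
            best_split, best_jump = i, jump
    return (best_split, best_jump) if best_jump > threshold else (None, best_jump)
```

Everything then hinges on whether a flagged jump really is non-climatic, which is the question the following explanation addresses.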

Blogger …and Then There’s Physics (ATTP) partly recognizes these issues may exist in his stab at explaining temperature homogenisation4.

So, it all sounds easy. The problem is, we didn’t do this and – since we don’t have a time machine – we can’t go back and do it again properly. What we have is data from different countries and regions, of different qualities, covering different time periods, and with different amounts of accompanying information. It’s all we have, and we can’t do anything about this. What one has to do is look at the data for each site and see if there’s anything that doesn’t look right. We don’t expect the typical/average temperature at a given location at a given time of day to suddenly change. There’s no climatic reason why this should happen. Therefore, we’d expect the temperature data for a particular site to be continuous. If there is some discontinuity, you need to consider what to do. Ideally you look through the records to see if something happened. Maybe the sensor was moved. Maybe it was changed. Maybe the time of observation changed. If so, you can be confident that this explains the discontinuity, and so you adjust the data to make it continuous.

What if there isn’t a full record, or you can’t find any reason why the data may have been influenced by something non-climatic? Do you just leave it as is? Well, no, that would be silly. We don’t know of any climatic influence that can suddenly cause typical temperatures at a given location to suddenly increase or decrease. It’s much more likely that something non-climatic has influenced the data and, hence, the sensible thing to do is to adjust it to make the data continuous. (Italics mine)

The assumption that nearby temperature stations have the same (or very similar) climatic signal, if true, would mean that homogenisation would cleanse the data of the impurities of measurement biases. But often only a cursory glance is given to the data. For instance, when Kevin Cowtan gave an explanation of the fall in average temperatures at Puerto Casado, neither he nor anyone else checked whether the explanation stacked up, beyond checking that there had been a documented station move at roughly that time. Yet the station move is at the end of the drop in temperatures, and a few minutes of checking would have confirmed that other nearby stations exhibit very similar temperature falls5. If you have a preconceived view of how the data should be, then a superficial explanation that conforms to that preconception will be sufficient. If you accept the authority of experts over personally checking for yourself, then the claim by experts that there is not a problem is sufficient. Those with no experience of checking the outputs following processing of complex data will not appreciate the issues involved.

However, this definition of homogenisation appears to be different from that used by GHCN and NASA GISS. When Euan Mearns looked at temperature adjustments in the Southern Hemisphere and in the Arctic6, he found numerous examples in the GHCN and GISS homogenisations of infilling of some missing data and, to a greater extent, deletion of huge chunks of temperature data. For example, one graphic shows Mearns' spreadsheet of adjustments between GHCNv2 (raw data plus adjustments) and GHCNv3 (homogenised data) for 25 stations in Southern South America. The yellow cells are where V2 data exist but V3 data do not; the green cells are where V3 data exist but V2 data do not.

Definition of temperature homogenisation

A more general definition that encompasses the GHCN/GISS adjustments is that of broadly making the data homogeneous. It is not done simply by blending the data together and smoothing it out. Homogenisation also adjusts anomalous data as a result of pairwise comparisons between local temperature stations, or, in the case of extreme differences, the GHCN/GISS process deletes the most anomalous data. This is a much looser and broader process than the homogenisation of milk, or putting some food through a blender.

I cover the definition in more depth in the appendix.

The Consequences of Making Data Homogeneous

Cleansing the data in order to make it more homogeneous has a consequence that is missed by many. It rests on the strong assumption that there are no climatic differences between the temperature stations in the homogenisation area.

Homogenisation is aimed at adjusting for measurement biases to give a climatic reading for the location of the temperature station that is a closer approximation to what that reading would have been without those biases. Under the strong assumption, making the data homogeneous is identical to removing the non-climatic inhomogeneities. Cleansed of these measurement biases, the temperature data is then both the average temperature readings that would have been generated if the temperature station had been free of biases and a representative reading for the area. This latter aspect is necessary to build up a global temperature anomaly, which is constructed by dividing the surface into a grid. Homogenisation, in the sense of making the data more homogeneous by blending, is an inappropriate term. All that is happening is adjusting for anomalies within the data through comparisons with local temperature stations (the GHCN/GISS method) or comparisons with an expected regional average (the Berkeley Earth method).
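
As a rough illustration of the gridding step (a sketch under simplified assumptions, not the GISS or HadCRUT procedure), homogenised station anomalies can be binned into latitude-longitude cells, averaged within each cell, and the cell means combined with cosine-of-latitude area weights:

```python
import numpy as np
from collections import defaultdict

def global_anomaly(stations, cell_size=5.0):
    """stations: iterable of (lat, lon, anomaly) tuples for a single month.

    Stations are averaged within each lat/lon cell, then the cell means are
    combined with weights proportional to cell area (approximated by the
    cosine of the cell-centre latitude)."""
    cells = defaultdict(list)
    for lat, lon, anomaly in stations:
        cells[(int(lat // cell_size), int(lon // cell_size))].append(anomaly)

    weighted_sum, total_weight = 0.0, 0.0
    for (lat_index, _), anomalies in cells.items():
        centre_lat = (lat_index + 0.5) * cell_size
        weight = np.cos(np.radians(centre_lat))
        weighted_sum += weight * np.mean(anomalies)
        total_weight += weight
    return weighted_sum / total_weight
```

A cell containing a single station is represented entirely by that station, which is why it matters whether the homogenised series is a cleansed local record or a blended regional one.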

But if the strong assumption does not hold, homogenisation will adjust these climate differences, and will to some extent fail to eliminate the measurement biases. Homogenisation is in fact made more necessary if movements in average temperatures are not the same and the spread of temperature data is spatially uneven. Then homogenisation needs to not only remove the anomalous data, but also make specific locations more representative of the surrounding area. This enables any imposed grid structure to create an estimated average for that area through averaging the homogenized temperature data sets within the grid area. As a consequence, the homogenised data for a temperature station will cease to be a closer approximation to what the thermometers would have read free of any measurement biases. As homogenisation is calculated by comparisons of temperature stations beyond those immediately adjacent, there will be, to some extent, influences of climatic changes beyond the local temperature stations. The consequences of climatic differences within the homogenisation area include the following.

  • The homogenised temperature data for a location could appear largely unrelated to the original data or to the data adjusted for known biases. This could explain the homogenised Reykjavik temperature, where Trausti Jonsson of the Icelandic Met Office, who had been working with the data for decades, could not understand the GHCN/GISS adjustments7.
  • The greater the density of temperature stations in relation to the climatic variations, the less that climatic variations will impact on the homogenisations, and the greater will be the removal of actual measurement biases. Climate variations are unlikely to be much of an issue with the Western European and United States data. But on the vast majority of the earth’s surface, whether land or sea, coverage is much sparser.
  • If the climatic variation at a location is of different magnitude to that of other locations in the homogenisation area, but over the same time periods and in the same direction, then the data trends will be largely retained. For instance, in Svalbard the warming temperature trends of the early twentieth century and from the late 1970s were much greater than elsewhere, so were adjusted downwards8.
  • If there are differences in the rate of temperature change, or in the time periods of similar changes, then any "anomalous" data due to climatic differences at the location will be eliminated or severely adjusted, on the same basis as "anomalous" data due to measurement biases. For instance, in a large part of Paraguay at the end of the 1960s average temperatures fell by around 1°C. Because this did not occur in the surrounding areas, both the GHCN and Berkeley Earth homogenisation processes adjusted out this change. As a consequence, a mid-twentieth century cooling in the area was effectively adjusted out of the data9 (a toy sketch of this effect follows this list).
  • If a large proportion of temperature stations in a particular area have consistent measurement biases, then homogenisation will retain those biases, as they will not appear anomalous within the data. For instance, much of the extreme warming post 1950 in South Korea is likely to have been a result of urbanization10.
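
The Paraguay-style case above can be demonstrated with a toy simulation (all numbers invented for illustration): give a candidate station a genuine 1°C drop that its four neighbours do not share, apply the simple relative homogenisation from the earlier sketches, and the real local change is adjusted away exactly as a measurement bias would be.

```python
import numpy as np

rng = np.random.default_rng(0)
months = 480                                    # 40 years of monthly anomalies
noise = lambda: rng.normal(0.0, 0.3, months)    # weather noise only

# Four neighbours with no step; the candidate has a genuine 1.0 deg C drop
# half-way through that the neighbours do not share.
neighbours = np.vstack([noise() for _ in range(4)])
candidate = noise()
candidate[months // 2:] -= 1.0

# Relative homogenisation: the candidate-minus-reference series shows a
# roughly 1 deg C break, indistinguishable from a station move or sensor
# change, so the later segment is shifted to make the series continuous.
diff = candidate - neighbours.mean(axis=0)
step = diff[months // 2:].mean() - diff[:months // 2].mean()
homogenised = candidate.copy()
homogenised[months // 2:] -= step               # the genuine local drop is removed

real_change = candidate[months // 2:].mean() - candidate[:months // 2].mean()
kept_change = homogenised[months // 2:].mean() - homogenised[:months // 2].mean()
print(f"real local change: {real_change:+.2f} C; after homogenisation: {kept_change:+.2f} C")
```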

Other Comments

Homogenisation is just part of the process of adjusting data for the twin purposes of attempting to correct for biases and building regional and global temperature anomalies. It cannot, for instance, correct for time of observation biases (TOBS). This needs to be done prior to homogenisation. Neither will homogenisation build a global temperature anomaly. Extrapolating from the limited data coverage is a further process, whether for fixed temperature stations on land or the ship measurements used to calculate the ocean surface temperature anomalies. This extrapolation has further difficulties. For instance, in a previous post11 I covered a potential issue with the Gistemp proxy data for Antarctica prior to permanent bases being established on the continent in the 1950s. Making the data homogeneous is but the middle part of a wider process.

Homogenisation is a complex process. The Venema et al 20123 paper on the benchmarking of homogenisation algorithms demonstrates that different algorithms produce significantly different results. What is clear from the original posts on the subject by Paul Homewood and the more detailed studies by Euan Mearns and Roger Andrews at Energy Matters, is that the whole process of going from the raw monthly temperature readings to the final global land surface average trends has thrown up some peculiarities. In order to determine whether they are isolated instances that have near zero impact on the overall picture, or point to more systematic biases that result from the points made above, it is necessary to understand the data available in relation to the overall global picture. That will be the subject of my next post.

Kevin Marshall

Notes

  1. GUIDELINES ON CLIMATE METADATA AND HOMOGENIZATION by Enric Aguilar, Inge Auer, Manola Brunet, Thomas C. Peterson and Jon Wieringa
  2. Steven Mosher – Guest post : Skeptics demand adjustments 09.02.2015
  3. Venema et al 2012 – Venema, V. K. C., Mestre, O., Aguilar, E., Auer, I., Guijarro, J. A., Domonkos, P., Vertacnik, G., Szentimrey, T., Stepanek, P., Zahradnicek, P., Viarre, J., Müller-Westermeier, G., Lakatos, M., Williams, C. N., Menne, M. J., Lindau, R., Rasol, D., Rustemeier, E., Kolokythas, K., Marinova, T., Andresen, L., Acquaotta, F., Fratianni, S., Cheval, S., Klancar, M., Brunetti, M., Gruber, C., Prohom Duran, M., Likso, T., Esteban, P., and Brandsma, T.: Benchmarking homogenization algorithms for monthly data, Clim. Past, 8, 89-115, doi:10.5194/cp-8-89-2012, 2012.
  4. …and Then There’s Physics – Temperature homogenisation 01.02.2015
  5. See my post Temperature Homogenization at Puerto Casado 03.05.2015
  6. For example

    The Hunt For Global Warming: Southern Hemisphere Summary

    Record Arctic Warmth – in 1937

  7. See my post Reykjavik Temperature Adjustments – a comparison 23.02.2015
  8. See my post RealClimate’s Mis-directions on Arctic Temperatures 03.03.2015
  9. See my post Is there a Homogenisation Bias in Paraguay’s Temperature Data? 02.08.2015
  10. NOT A LOT OF PEOPLE KNOW THAT (Paul Homewood) – UHI In South Korea Ignored By GISS 14.02.2015

Appendix – Definition of Temperature Homogenisation

When discussing temperature homogenisations, nobody asks what the term actually means. In my house we consume homogenised milk. This is the same as the pasteurized milk I drank as a child except for one aspect. As a child I used to compete with my siblings to be the first to open a new pint bottle, as it had the cream on top. The milk now does not have this cream, as it is blended in, or homogenized, with the rest of the milk. Temperature homogenizations are different, involving changes to figures, along with (at least with the GHCN/GISS data) filling the gaps in some places and removing data in others1.

But rather than note the differences, it is better to consult an authoritative source. From Dictionary.com, the definitions of homogenize are:-

verb (used with object), homogenized, homogenizing.

  1. to form by blending unlike elements; make homogeneous.
  2. to prepare an emulsion, as by reducing the size of the fat globules in (milk or cream) in order to distribute them equally throughout.
  3. to make uniform or similar, as in composition or function:

    to homogenize school systems.

  4. Metallurgy. to subject (metal) to high temperature to ensure uniform diffusion of components.

Applying the dictionary definitions, to homogenize data would mean blending the elements together to make them uniform, as it does in, for example, chemistry. It would not mean making additions to or subtractions from the data set, or adjusting the data.

For GHCN and NASA GISS temperature data, homogenization involves removing or adjusting elements in the data that are markedly dissimilar from the rest. It can also mean infilling data that was never measured. The verb homogenize does not fit the processes at work here. This has led some, like Paul Homewood, to refer to the process as data tampering, or worse. A better idea is to look further at the dictionary.

Again from Dictionary.com, the first two definitions of the adjective homogeneous are:-

  1. composed of parts or elements that are all of the same kind; not heterogeneous:

a homogeneous population.

  2. of the same kind or nature; essentially alike.

I would suggest that temperature homogenization is a loose term for describing the process of making the data more homogeneous, that is, smoothing out the data in some way. A false analogy is when I make a vegetable soup. After cooking I end up with a stock containing lumps of potato, carrot, leeks etc. I put it through the blender to get an even consistency. I end up with the same weight of soup before and after. A similar process, of getting the same out after homogenization as went in, is clearly not what is happening to temperatures. The aim of making the data homogeneous is both to remove anomalous data and to blend the data together.

18 Comments

  1. Why do you say that the post 1950 warming in South Korea is extreme?

    • manicbeancounter  /  27/06/2015

      Follow the link and you will find out.

      • You mean the link to Paul Homewood’s post? If so, I doubt it.

        I’ll make a semi-philosophical comment. The problem with what you, Paul Homewood, Euan Mearns, Roger Andrews, etc are doing is that it’s essentially reverse science. As you clearly realise, the goal of these temperature analyses is to try and understand how we have warmed over the instrumental temperature record (both globally and regionally). If you think there is an issue with these analyses, there are two things you can do: you can do your own (as Berkeley Earth have done), or you can show that it isn’t possible to do, given the data. All that you (and the others) are doing is highlighting some things that you regard as strange, or maybe not quite right. What does that tell us? Well, nothing really. Who would expect these type of analyses to be perfect? That’s why we encourage multiple groups to develop their own methods.

        Now, of course, if your goal is to actually make some kind of positive contribution, then what you were doing might be perfectly fine. Given that it appears to simply be an attempt to pick holes in some kind of analysis, with the goal of suggesting that it might be wrong, it's largely meaningless. Of course, if you could actually show that we haven't warmed as much as these analyses suggest, that would be quite interesting, but you haven't. All you really seem to be doing is picking holes in an analysis. You can probably do that with any type of analysis that uses an imperfect method to analyse data that isn't perfect. Essentially you're promoting the idea that we can't trust some analysis unless it is perfect. Given that that is probably impossible, you essentially appear to be suggesting that we ignore science.

        • manicbeancounter  /  27/06/2015

          UHI In South Korea Ignored By GISS

          Homewood compares, station by station, the GISS homogenisation adjustments with a peer-reviewed study estimating the warming effects of urbanization. The GISS homogenisation does not pick up these effects, so exaggerates warming in the country.
          You link to a BEST estimated AVERAGE post-homogenisation graph. This is meant to contradict it. It would be nice if you could compare and contrast the BEST and GISS homogenisations. If they are different, it is another example that undermines the assumption that homogenisation clears out measurement biases, and another example of a theme of this blog: look at the data in different ways. But you, following the norm in climatology, do not check your assumptions, nor do you apply any form of objectivity to analysis of data output. You might then start to understand the limitations of the data and the homogenisation processes. Instead you make a waffly statement based on your prejudices to distract from your own lack of understanding, as you did a few months ago on your own blog. Homewood and I both have an accountancy background. In finance, if one does complex calculations, one checks the results, or better still gets someone else to check. If the data does not stack up, you keep on checking until you understand why it does not stack up. That involves reconciliations. One does that until a true and fair view is established. That is what I have been trying to establish with the temperature data over a number of posts this year. So I suggest the following comment is based on your self-imposed ignorance.
          Essentially you’re promoting the idea that we can’t trust some analysis unless it is perfect.

          • But you, following the norm in climatology, do not check your assumptions, nor do you apply any form of objectivity to analysis of data output.

            So I suggest the following comment is based on your self-imposed ignorance.

            Okay, you’re one of those people. I had maybe thought you weren’t. My apologies for butting in. Carry on.

          • Kevin Marshall  /  21/07/2015

            Update 21/07/15
            ATTP – The above comment is totally out of order. You claim to be a scientist, but you are offering junk opinion. You totally ignore the argument that leads to the conclusion. In particular, I avoid picking holes, but state where homogenisation can fail, with examples. I conclude the summary with the following sentence.

            Whether or not this failure of understanding is limited to a number of isolated instances with a near zero impact on global temperature anomalies is an empirical matter that will be the subject of my next post.

            That post is

            Climatic Temperature Variations.

  2. Fascinating post. There is definitely more to it than I could ever have expected.
    Looking forward to your next post!

  3. tom0mason  /  27/06/2015

    Nice analysis.
    And this is why I insist that homogenization, if it must be done, can only be done at the local level; homogenization across large areas is foolhardy, and leads to numbers that are at best meaningless, at worst wildly out of order.
    There is of course the problem that in homogenizing, infilling and averaging the global temperature you dilute the signals of real climate variation. The trend gets hidden, or at least significantly reduced.
    IMO real climate changes are seen first in the few localities away from places where the climate is relatively invariant and predictable — out at the areas where the local climate is usually more unstable, more variable. Careful and thorough temperature analysis, taking into account the geography and topology of the terrain, at these more variable sites should show the short term climate change, and over a few centuries of data, show whether the trend is heating or cooling.
    My point is that when looking back at previous global climate events the overall average global temperature variation as a percentage of average global temperature was very small, whereas certain localities witnessed huge (percentage-wise) temperature variations.

    • manicbeancounter  /  27/06/2015

      Thanks Tom for the comment. You say

      There is of course the problem that in homogenizing, infilling and averaging the global temperature you dilute the signals of real climate variation. The trend gets hidden, or at least significantly reduced.

      I agree. See my follow-up post.
      You also say

      IMO real climate changes are seen first in the few localities away from places where the climate is relatively invariant and predictable — out at the areas where the local climate is usually more unstable, more variable.

      An example of this might be in the Venezuelan Andes. You may recall Paul Homewood’s recent post The Little Ice Age In South America. The ice core data showed a much larger temperature variation in the high altitudes (3000m) than possibly elsewhere. So I looked for modern temperature data, and discovered Merida in the State of Merida. It is located at 1600 metres, and is the only temperature station in the area in the mountains. Nearby data is from near sea level, where there is much less variability. As a result, homogenisation messes up the data. The GHCN V2 record (raw data and minor adjustments) shows data going back to the 1920s. It also shows a 2°C rise in temperatures between 1955 and 1960. As this is incompatible with any data in the locality, homogenisation deletes the data.
      Compare the GHCNv2 data with the GISS homogenised data.

      Berkeley Earth uses a different homogenisation technique. Differences with the expected regional average are adjusted through an "Estimated Mean Station Bias". The adjustments are quite large. The 1955-1960 rise in temperatures is largely eradicated.

      (Berkeley Earth chart: 165773-TAVG-Alignment.pdf)

      Your thought about careful analysis I believe is important. The temperature data is extremely limited, especially prior to 1950. However, I believe that with careful checking of the data, and analysis of results, it is possible to see trends at much shorter time periods than centuries. But checking data or assumptions is not something the climate community are willing to do. For example see ATTP’s comments above.

      • tom0mason  /  28/06/2015

        Thank-you Kevin, it was such reports as Paul Homewood’s that got me thinking this way.
        Also there was the weak defense of homogenization of GISS data (IIRC) at the stevengoddard blog site some time ago.

        • But checking data or assumptions is not something the climate community are willing to do. For example see ATTP’s comments above.

          What a bizarre and insulting thing to suggest and, also, has nothing to do with what I said in my comment above. I really have misjudged you extremely badly. Oh well; live and learn.

          • manicbeancounter  /  28/06/2015

            I will give the same substantiation that you give to your claims.
            0

            I will remind you that we are both fallible human beings. Also, when dealing in the soft sciences, or the analysis of numbers based on incomplete and distorted information, you have less training and experience than I do. Climatology is also a complex subject, reliant on this information to keep a grip on the real world. If not, all you deal with is a bunch of equations that amount to pseudo-scientific mumbo-jumbo. I charted some of your initial efforts on temperature homogenisation some time ago. You continue to throw the yellow stars to misdirect. I do not think you are capable of anything else. But this is a falsifiable hypothesis.

          • manicbeancounter  /  28/06/2015

            ATTP

            I have not deleted your latest two comments. I have put them into spam, and will release them at a later date. I have sent you a personal email at the wotts address to explain why.

            Best regards
            Kevin Marshall

          • I will remind you that we are both fallible human beings.

            There is a slight difference. I haven’t personally attacked you. I had thought that you were more reasonable and more decent than a typical Bishop-Hillian. I was wrong.

            Also when dealing in the soft sciences, or the analysis of numbers based on incomplete and distorted information, you have less training and experience than I do.

            So what? This isn’t a soft science. I would point out areas where I have (I would imagine) considerably more experience than you. However, appeals to authority (especially one’s own) are particularly irritating and – in my experience – rarely correct.

            Okay, this is truly bizarre and I should have stuck (more than once) with my first impressions. I really do hope that a small group of maverick accountants do overthrow a scientific paradigm. I’m not particularly confident that they will. Good luck, though.

  4. Reblogged this on CraigM350.

  1. Temperature Data Ocean Impact and Temperature Homgenization | ManicBeancounter
  2. HADCRUT4, CRUTEM4 and HADSST3 Compared | ManicBeancounter
  3. Does data coverage impact the HADCRUT4 and NASA GISS Temperature Anomalies? | ManicBeancounter