
Alaskan North Slope climate change just outran one of our tools to measure it

It was bound to happen. In fact, my colleagues have planned for this. More on that later.

On December 4th, the folks in the Climate Monitoring group at the National Centers for Environmental Information (NCEI) did what we do on pretty much the 4th of every month: we processed the previous month's data to prepare our initial US climate report. The data from Utqiaġvik, Alaska, were missing, which was odd. They were also missing for all of 2017 and the last few months of 2016. This was even weirder, because we'd kinda marveled at how insanely warm the station had been for several weeks and months during 2017.

What happened?

The short version: in an ironic exclamation point to swift regional climate change in and near the Arctic, the average temperature observed at the weather station at Utqiaġvik has changed so rapidly that it triggered an algorithm designed to detect artificial changes in a station's instrumentation or environment. In effect, the station disqualified itself from the NCEI Alaskan temperature analysis, leaving northern Alaska analyzed as a little cooler than it really was.

How did that happen? Why is it important? What are the impacts? 

We’ll hit some of these questions in this edition of Beyond the Data.

First, where is Utqiaġvik?

Utqiaġvik (it's pronounced something like OOT-ki-aag'-vik) sits within eyesight of Point Barrow, the northernmost point in America, on the Arctic Coast of northern Alaska. Now recognized by its Iñupiat place name, it is still commonly known as “Barrow.” My fellow Oklahomans will recognize Barrow as the place where Will Rogers and Wiley Post perished in a plane crash in 1935. (Full disclosure: Will’s my personal hero and perpetual “person living or dead I’d want to have dinner with”).

In the context of a changing climate, the Arctic is changing more rapidly than the rest of the planet. This is true for temperature, and it also shows up in ways unfamiliar to most of us down here in the Lower 48: permafrost thawing, changes in sea ice cover, the “shrubification” of tundra.

Utqiaġvik, as one of a precious few fairly long-term observing sites in the American Arctic, is often referenced as an embodiment of rapid Arctic change. The many facets of its changing climate have been detailed here on climate.gov. Beyond the Data touched on it a few months ago. Heck, you can search this site for “Barrow” and you’ll have enough reading material for hours (go on, search for it).

Some of the many changes happening in and around Utqiaġvik involve sea ice: when it arrives near the Arctic coast, how long it stays, and when it recedes. October and November are months during which Arctic sea ice, having contracted to its smallest footprint each September, advances towards Alaska’s Arctic (northern) coast. In recent years, that arrival has been slower and less complete.

Here’s a look at the relationship between that and the temperature at Utqiaġvik. The graphic just below shows two sets of data for each November since 1979. While you peruse it, please let me thank my NOAA colleagues Jake Crouch, who sits ten feet from me, and Rick Thoman, who sits about 3,500 miles from me in Fairbanks, for being the brains/inspirations for these next few graphics.

Sea ice area for the combined Chukchi and Beaufort basins of the Arctic Ocean (vertical axis) plotted against average temperature at Utqiaġvik (Barrow), Alaska for each November from 1979-2017. Sea ice area courtesy National Snow & Ice Data Center. Utqiaġvik temperature from NWS/NCEI.

On the vertical axis, the combined November sea ice area for the Chukchi and Beaufort Seas, the two sub-basins of the Arctic Ocean that touch Alaska’s Arctic (northern) coast. Higher on the scale means more sea ice in the combined basin.

On the horizontal, the November temperature at Utqiaġvik. Farther right means warmer.

The pattern is striking, showing a clear relationship. For Novembers with low sea ice in the basins (the recent years), the temperature, fairly reliably, is high (R=-0.77 if you're scoring at home). November 2017 is the rightmost (warmest) and bottommost (smallest sea ice area) of the group.

The basics are this: when sea ice in the region is small, more (relatively warm) Arctic water is exposed to the atmosphere, which means much warmer air temperatures in the region, all else being equal.
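For the data-curious, here's a minimal sketch of the calculation behind that scatterplot. It's my illustration, not an official NCEI product: the file names and column names below are made up, though the real inputs would be the NSIDC sea ice areas and the NWS/NCEI station temperatures.

```python
# Minimal sketch (not NCEI code): correlate November sea ice area in the
# combined Chukchi+Beaufort basins with Utqiagvik's November mean temperature.
# File names and column names are hypothetical.
import numpy as np
import pandas as pd

ice = pd.read_csv("chukchi_beaufort_nov_area.csv")   # columns: year, area_km2
tavg = pd.read_csv("utqiagvik_nov_tavg.csv")         # columns: year, tavg_f

merged = ice.merge(tavg, on="year")                  # Novembers 1979-2017
r = np.corrcoef(merged["area_km2"], merged["tavg_f"])[0, 1]
print(f"Pearson r = {r:.2f}")   # the article reports about -0.77:
                                # low-ice Novembers tend to be warm ones
```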

It’s not just November that’s changing at Utqiaġvik. This next graphic shows each month’s average temperature for two mini-eras, 1979-99 and 2000-17. I chose 1979 as a starting point because that’s when satellite-based sea ice area data begins in earnest. I broke the 39-year satellite era roughly in half, into its 20th century and 21st century segments.

The first nine months of the year have warmed across those segments by 1.9°F. That’s almost twice the contiguous U.S. increase for the same nine months and the same two mini-eras. But look at the increases in October, November and December. 21st century October at Utqiaġvik is a whopping 7.8°F warmer than late 20th century October.

Add it all up, and thanks largely to these three months, even the annual temperature at Utqiaġvik has changed dramatically in the last two decades.
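If you want to reproduce that mini-era comparison yourself, it boils down to averaging and differencing. Here's a rough sketch, assuming a hypothetical file of Utqiaġvik monthly mean temperatures; again, this is my illustration, not NCEI's processing code.

```python
# Rough sketch: average each calendar month over 1979-1999 and 2000-2017,
# then difference the two mini-eras. Input file and columns are hypothetical.
import pandas as pd

obs = pd.read_csv("utqiagvik_monthly_tavg.csv")   # columns: year, month, tavg_f

early = obs[obs["year"].between(1979, 1999)].groupby("month")["tavg_f"].mean()
late = obs[obs["year"].between(2000, 2017)].groupby("month")["tavg_f"].mean()

change = (late - early).round(1)
print(change)                                             # per-month change, deg F
print(f"Jan-Sep average change: {change.loc[1:9].mean():.1f} F")
print(f"Annual average change: {change.mean():.1f} F")
```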

So what does this have to do with dropping the data?

Many things can affect a long-term temperature record at a place. In a perfect world, all of those things are related to the actual temperature. Thankfully, for the most part, they are. But things can happen to artificially affect a long-term temperature record. Sensors can change, the types of shelters that house those instruments can change, the time of day that people (many of them volunteers) make observations can change, stations get moved to a warmer or cooler place nearby. You get the idea.

Networks of weather stations are designed to measure stuff for weather forecasters. They run non-stop, and spit out a lot of data. It's awesome. Unfortunately for climate trackers, networks built for informing weather pros don't necessarily take care of all the things needed to track climate. 

While a convenient move to the other end of the runway, to a spot that's half a degree cooler, shouldn't really impact a weather forecast model from day to day, it can look a lot like climate change in the long term. So those kinds of details need to be tracked in a best-case scenario, and detected when they slip through the cracks.

It would be great if we just had 4000+ weather station gnomes whose families are bound to tend a given station’s climate record for generations, who knew every detail of a station’s quirks, and who always write down all the station’s data including margin notes like "moved down the runway, built a shed nearby, and swapped out the sensor" - then hand carry their notebooks to NCEI each month. But instead, data ingest is an automated process, which means the handwritten margin notes sometimes get overlooked. So we need an automated process that flags problems and tells scientists, “hey, check out Barrow, there’s something odd there.”

Some of my colleagues at NCEI - namely Matt Menne and Claude Williams of Menne et al. and Williams et al. fame - have been at the vanguard of developing algorithms and approaches to detect and flag these types of changes, so that the true climate signal can shine through some of these challenges. One suite of those tests is called the “pairwise homogeneity algorithm,” but let's just call it the “PHA test” from here on.

Stations behaving badly

The PHA test is not mathematically simple, but the concept is straightforward. When climate changes—naturally or from global warming—most of the stations in the same region should change in a similar way.

The PHA test's basic job is to detect when a station's relationship to its neighbors changes over the long term. It does this by pairing different combinations of stations in a region and checking whether the temperatures in each pair co-evolve consistently over time. In other words, it checks whether a station's long-term temperature record is changing in ways that differ from nearby stations and are inconsistent with its own past behavior.

This change in behavior typically indicates some kind of "artificial" change like those mentioned above. For example, if a station is moved from a higher (cooler) elevation down to a lower (warmer) one, over the long term that will show up as the station tracking warmer relative to its neighbors. The computer programs running the PHA test flag stations that jump to a new (relative) state like this.
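To make the idea concrete, here's a toy sketch of the pairwise concept. The real PHA (Menne and Williams) does much more - it tests many pairings per station, requires agreement among them, and handles missing data - so treat this purely as an illustration with made-up function and variable names.

```python
# Toy illustration only - NOT the actual PHA. It differences a target station
# against one neighbor and scans for the split point where the mean of the
# difference series shifts the most, a crude stand-in for breakpoint detection.
import numpy as np

def largest_mean_shift(target, neighbor, min_seg=24):
    """Return (index, statistic) for the strongest mean shift in target - neighbor.

    target, neighbor: equal-length arrays of monthly temperatures (or anomalies).
    min_seg: minimum number of months required on each side of a candidate break.
    """
    diff = np.asarray(target, dtype=float) - np.asarray(neighbor, dtype=float)
    best_idx, best_stat = None, 0.0
    for i in range(min_seg, len(diff) - min_seg):
        before, after = diff[:i], diff[i:]
        # two-sample t-like statistic for a shift in the mean of the differences
        spread = np.sqrt(before.var(ddof=1) / len(before) + after.var(ddof=1) / len(after))
        stat = abs(after.mean() - before.mean()) / spread
        if stat > best_stat:
            best_idx, best_stat = i, stat
    return best_idx, best_stat

# The real algorithm only "makes the call" when many pairings involving the same
# station point to roughly the same date; that agreement is part of what keeps
# the test conservative.
```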

What happens to stations behaving badly

Here are some important things to note:

  • the test is quite conservative: it takes considerable evidence for it to “make the call”;
  • the algorithm running the test, like many things in data science, is hungry for more data points; generally speaking, the more neighboring stations, the better and more confident the algorithm is;
  • when there is enough information from neighboring stations to build high confidence, the station record is adjusted; if not, an estimate is made, and the data are flagged as an estimate;
  • NCEI does not use estimated data in its US temperature analysis; data flagged as such are considered not to exist;
  • the raw observations are preserved in perpetuity. Each station carries two histories: an adjusted set that corrects for the issues detailed above, and an unadjusted set that is never altered from its original observation.

You can see where I’m going here, right?

As a relatively isolated station, experiencing profound and unique change, Utqiaġvik was destined to get flagged. And it happened this month. Having built confidence that a disruption to the station was afoot, the PHA test retroactively flagged the last 16 months and removed them from the monthly analysis. But in this case, instead of a station move, or urban sprawl, or an equipment change, it was actually very real climate change that changed the environment, by erasing a lot of the sea ice that used to hang out nearby.

The silver lining

Climate change is challenging. So is measuring it. But it's important to measure it, and better data sets mean better and more confident measurements. We will be able to restore Utqiaġvik in coming months, but we tread slowly and conservatively, so it won't pop back in our US and Global analyses right away.

On a longer time horizon, the same folks who designed the PHA test are constantly working to improve it. The next version of the Global Historical Climatology Network (GHCN-Monthly), upon which the monthly temperature analyses are based, is coming in early 2018*. You know what’s already built into that version? A latitude threshold (currently 65°N), poleward of which the PHA test becomes even more forgiving, given the rapid Arctic changes and the scarcity of stations in the region.

That’s a posture of continuous improvement, and it’s just one way that even old stalwart datasets like GHCN-Monthly get better over time, thanks to a few of my colleagues that work Beyond the Data.

*[CORRECTION: This date was originally mistyped as 2019.] 

Comments

And that's why we can't have nice pairwise homogeneity things. Could they do a linear regression to remove the trend and then perform the pairwise homogeneity test (e.g., fit a trend to the October temperatures since 1980 and then look at the residuals)? Or does that weaken the power of the test?

Great question, Jim. As you probably suspect, the answer is complicated. One challenge with that is that some "threats to validity" in the observing environment (encroachment of urban areas being a great example) will manifest as something that looks a lot like linear trends! I'll ask the pros to see if they've considered that and get back to you.

In reply to Jim Angel

This was very informative -- it illustrates how scientists care about the details and want to get this climate change stuff right. I wish more of the public realized how much goes on behind the scenes to ensure accuracy and precision. Thanks for a great write-up!

Thank you Jack for taking the time to write this note. I really appreciate it.

Deke

 

In reply to Jack

Ah yes, identifying possible outliers, examining them, and finding the causes. Very good article. Some background questions, if I may: How often do the algorithms identify possible station outliers - daily, weekly, just monthly? What are the "usual" problems - as in a Pareto chart? How do you correct issues at the station? What about the data itself? Many thanks for educating us!

Hi there, thanks for the kind notes.

In operations, the algorithms do not attempt to distinguish between any of these causes or definitively name a problem. This prevents the creep of (well-intentioned) overly prescriptive fixes. Instead, they point out when the data indicate an issue, which direction things appear to be going off course, and a suggested fix.

With that said, the "usual suspects" are:

  • changes in observation practice, particularly changes in time of observation; the US has transitioned from a nation of primarily evening observers to a nation of primarily morning observers since about WW2;
  • station moves, the effect of which can be exacerbated when changes in elevation are involved;
  • changes in station environment: land use changes (removal or encroachment of trees), urbanization near a station;
  • changes in instrumentation or shelter: for example, there was a large swap-out of temperature equipment in the 1980s for much of the cooperative observer network, which makes up a plurality/majority of observing stations used in our national analysis;
  • the intersection of two or more of the above: for example, that swap-out in the 1980s influenced temperature measurement in a couple of ways, and not in the same direction. The change to electronic equipment meant wired stations, which brought them closer to buildings, generally a warming influence [relative to history], especially in the morning. But the change in shelter type from what is effectively a wooden box to a more effective, passively aspirated "beehive" type shelter had a cooling influence [relative to history], especially in the afternoon.

Each of these types of scenarios was used in a "blind test" for the algorithms - blind to the algorithms and even blind to the creators of the algorithms! - to see if the algorithms were capturing the types of changes seen, and correcting in the right direction, while minimizing harm done to "innocent" observations. The results were quite positive. While no test is perfect, the algorithms were generally correcting the signal in the proper direction.

If you're a QC geek (and it sounds like you might be!) the tests and the results are here: http://onlinelibrary.wiley.com/doi/10.1029/2011JD016761/abstract 

Deke

 

 

Like your algorithms I too wish for more data. Not only are we looking at a tiny slice of the earth, we are looking at a tiny piece of time. Do you have any data from before 1980? Thanks, Hinheckle Jones

Hello and thanks. That's kind of the point of this article: that a localized (relative to the observing network, anyway) phenomenon over a short time (a couple of decades) caused such a change that it appeared to be artificial. It wasn't.

Deke

 

In reply to Hinheckle Jones

The first comment says the data is from the Barrow Observatory, but IIRC they do upper-air balloons and CO2 measurements and aren't the long-term climate data station. Is this data from the Barrow/Wiley Post airport? Or someplace else?

Thanks for the reply. Q: is the data for that station available someplace? The link you provided was for some history of the station.

In reply to Derek.Arndt

I assume data are also collected at Prudhoe, Kaktovik, Wainwright, etc. Did the PHA test also reject data from those locations?

Hi Derek, Thanks for this post. Very interesting stuff. I was wondering if you guys had at all considered the effect of UHI in this? It looks like most of the warming is in the winter months where one would expect UHI to be most pronounced in the Arctic. Pretty sure past research has established Barrow as a UHI. Just wondering if you guys had considered this. Please let me know. Thanks!

Hi there,

The local impact of urbanization or similar land use change is one of several factors considered in the design of the algorithm, in that it is one of the local changes that can cause the algorithm to bite. Operationally, the algorithm doesn't try to distinguish between factors.

Deke

 

In reply to Mike Bastasch

So, if the PHA test catches a temperature increase from UHI, what happens then? Is the data retained, like in the case of Barrow, or is it discarded?

In reply to Derek.Arndt

Hi there, as with all PHA "hits," the original data are retained and saved in perpetuity. The unadjusted data are the ultimate foundation for all of our analysis products. Each month, we start with the unadjusted ("raw") data. The values that are flagged are either held out of the assessment (as in the case for Barrow) or adjusted to correct for the perceived issue (the algorithm needs much longer segments to build enough confidence to make the adjustment).

Again, to re-emphasize: the unadjusted data are preserved and we use this as the foundational set for subsequent analyses.

Not only are they preserved, but they are publicly available at https://www1.ncdc.noaa.gov/pub/data/ghcn/v3/. In that directory, the latest average temperature at each station ("tavg") is available in both raw, unadjusted form and adjusted form.

Thanks!
Deke

 

In reply to Mike Bastasch

OK, but if data gets flagged by the PHA test as “questionable” and is found to be erroneous (like some equipment failure), does it still get included in the database used for climate analysis, or does it get flagged or separated in some way?

In reply to Derek.Arndt

For any given month's report:

  • If the data are flagged as suspect by the PHA, and the record can confidently be segmented into clear pieces, each corrected appropriately, then those segments will be used in a report
  • If the data are flagged as suspect by the PHA, but the record cannot be confidently segmented (a segment is too short to confidently constitute an independent segment, etc.), then the data are withheld from the analysis.

In any event, the raw data are preserved from month to month, and are the foundation for analyses in future months. 

Deke

 

In reply to Mike Bastasch

No. Many of the instruments are "passive" - meaning they don't emit any energy Earthward; they only "listen" or "take pictures". "Active" sensors will emit a signal and wait to record the characteristics of its "echo." The amounts of energy involved in active sensors are minuscule compared to the factors that determine temperature on the Earth - namely the Sun's energy, and the ways that the atmosphere and ocean systems transport it around.

Deke

 

I am comfortable asserting that the retreat of sea ice is the dominant cause of the autumn differences in the mean (not necessarily for every given year). It is both a very real local change affecting the locality and one of several Arctic feedbacks that are generally accelerating warming in the region. There is rich detail in NOAA's 2017 Arctic Report Card on the interplay between sea ice and air temperature.

The CRN is a wonderful network and was designed and deployed to help provide a baseline for situations just like these. However, the nearby CRN station did not exist in the "before" years in this comparison; it would not provide the before-and-after context needed to compare.

Thanks for your questions. I'm going to close out this thread now and devote some attention to other NCEI customers. Thanks!

Deke

 

Thank you for the excellent account of what happened here. A quick clarification question: When you say the disruption resulted from "very real climate change that changed the environment, by erasing a lot of the sea ice that used to hang out nearby," does that mean the melting sea ice caused an actual temperature spike in the area (like from a melting sea ice/warming water feedback loop) so big that the algorithm flagged the temp increase as "artificial"? Or, did the rapid melting of the sea ice affect the station's climate monitoring instrumentation in a way that made it "think" temperatures had risen faster than they really had? I believe it's the former, but would be grateful for confirmation. Thank you, Chris

Thanks, I very much appreciate the comment. Yes, it's the former scenario, as you are leaning toward: the retreat of sea ice warmed the area and the instrumentation correctly observed that warming. 

Deke

 

Thanks for your quick reply, Deke. That is really helpful. I don't want to take advantage of your time and generosity, but if you're up for fielding one additional question from me, I would be super appreciative, as your response has helped me clarify now what I'm really after on this. I’m writing about what happened in Utqiaġvik for an academic journal in the humanities and am trying my best to understand something very basic about how the PHA works.

In essence, I’m trying to understand whether the limitations of a PHA are, ultimately, purely technical and practical (i.e., with a sufficient number of reliably working monitoring stations in a given area, the algorithm would be foolproof), or if there is also always the possibility (however slight) for an error caused, instead, by abnormally rapid yet very real climate driven temperature change.

Above, you explain that a PHA works by comparing the rate of temperature change across multiple monitoring stations in a given area. And you say that the monitoring station in Utqiaġvik was "relatively isolated," thus making it more vulnerable to error than most. But does this mean that if there are enough stations in a given area that the algorithm would be foolproof, i.e., would then *only* flag a temperature change as “artificial” when it really is? Or, is it possible for a rate of temperature increase, itself, to be so unusual or improbable that it could be mistakenly classified by the algorithm as "artificial," even if it were registered by lots of monitoring stations in a given region (perhaps because the algorithm needs to differentiate between “artificial” changes to the surroundings like logging)?

In other words, from my understanding so far, it seems to me to come down to an ultimately not-fully-eliminable distance/temperature trade-off, that is, a trade-off between the number of monitoring stations in a given area vs. the rate of temperature change that an algorithm currently deems “acceptable in reality” in that area. But if, as global warming unfolds, its effects can always bring about “new normal” standards for the distance/temperature trade-off, then it seems there must always be the possibility (however small) for a current algorithm to give a false alarm. No matter how many monitoring stations there are in a given area. Because what an algorithm deems “acceptable in reality” can never fully anticipate the “new normal” standards of acceptability that global warming can bring about.

I hope what I’m asking about makes some kind of sense. In essence, I’m just asking if the PHA is ultimately prone *only* to errors due to practical or technical limitations, or if there is also always the possibility (however small) of error caused by sufficiently unusual or unprecedented yet nevertheless very real changes due to climate change itself. And I would be hugely grateful for a response on this from you. Thank you so much for your time and expertise! Chris

Hi Chris,

There is always the possibility that the PHA algorithm can get "fooled." Having denser data (in space and time) would help, and probably help a lot, but it would not eliminate that possibility or make this aspect of the PHA infallible.

At the risk of over-generalizing before really thinking it through, I'd say that all quality assurance / quality control algorithms fit in this category. It's certainly been my experience that no QA routine is bulletproof.

Deke

 

 

A quick follow-up question if you're still fielding them here, Deke. A 2012 article talks about how climate divisions in Alaska were updated recently through a cluster analysis (dividing Alaska into 13 climate divisions): https://journals.ametsoc.org/doi/pdf/10.1175/JAMC-D-11-0168.1 Are those climate divisions (along with the other climate divisions in the lower 48 states) taken into account by the PHA somehow (or the GHCN), like as a starting point assumption? Or, does the PHA establish its own version of such divisions, independently, when it performs its homogeneity comparisons? Thank you, Chris

Hi Chris,

No, the PHA does not consider these "climate divisions" as part of its operation. It is strictly a station-by-station method. We compute the divisional averages after the PHA has done its job of correcting the station records.

Deke

 

In reply to Chris

Many thanks for your responses, Deke. I hadn't seen your response to my lengthier, previous question until now, but thank you for that as well, which helps me confirm my understanding. Looking back, I think I posted that one sooner than I should have, as I was still wading through the technical lingo on the PHA to try to understand the basics of it, so my apologies for the long-windedness there! The 2009 Menne & Williams article introducing the PHA ended up helping me get a handle on things, too. The PHA is a fascinating and powerful tool.

To make sure I understand something from your response today, when you say, "We compute the divisional averages after the PHA has done its job of correcting the station records," the "divisional averages" you're referring to there are the ones the PHA generates, right? And the climate divisions listed in the 2012 article are derived through a fully separate (cluster analysis) method.

One final quick question if I could, which I'm having trouble finding the official answer to, is how many monitoring stations there are in Alaska that are part of the GHCN. I've seen that there are 1,221 stations in the U.S. HCN but that that covers only the lower 48 states. If you happen to know of an official source noting how many Alaska stations are included in the current GHCN that you could direct me to, I'd be super grateful. Thank you again for your work and for fielding these questions. Chris

Hi Chris,

The climate divisions are effectively unrelated to the PHA. They are geographic regions selected to identify areas within Alaska that share similar climate regimes, to the extent that is possible. They show up on maps like this: https://www.ncdc.noaa.gov/monitoring-content/sotc/national/divisionalta… ... The methods used to define those (Bieniek et al.) were independent of the PHA.

PHA is useful to get the best long-term values on a station-by-station basis. Then its job ends. The resulting values are then used in subsequent analysis, like the interpolation we do to come up with the gridded data set that is then used to compute these divisional values.

There are 982 Alaskan stations on the GHCN-Daily roster. Some of those are defunct/retired. A fraction of those remaining report in a timely enough manner to be used in the latest month's climate analysis. I'd say "a few hundred," which includes a handful in bordering parts of Canada that help inform near-border analysis. I'll have to get back with you with exactly how many went into, say, the January 2018 analysis.

The USHCN has not been used operationally as a network since 2014. The stations that were part of USHCN are incorporated into GHCN. USHCN is no longer relevant to discussions of US temperature.

Deke

 

 

Thank you, Deke. That's super helpful. And I'm with you on the now defunct USHCN being fed into the GHCN, which the PHA monitors and adjusts. On the Alaska station count, I'd assumed I wasn't finding an obvious source, but please don't spend any time digging into that on my behalf. And I'm all clear at this point on the essentials of what I needed to understand. Thank you again for your help and clarity. Take care, Chris

Just wanted to say how impressed I am by the depth of thought presented by these questions and answers. Serious, informed people are concerned about climate change data and it shows. Not so much so for me, but I know enough to observe and appreciate the process. Defending any thesis is a core value of the scientific process and is well represented here. Thanks for all your good work, it increases my confidence in the presentations. Regards
