The direction of travel for the industry should be away from tightly controlled cooling to a more easygoing approach.
It’s well established that the data center industry is based on controlling risk. One of those risks is obviously that an unexpected spike in temperature and/or humidity will cause IT equipment to fail.
For that reason, even data center operators in largely temperate climates such as Western Europe have continued to invest in at least some form of mechanical cooling – as well as a range of software and control systems to closely manage that cooling capacity. This is usually presented as engineering-based decision making built around hard data, efficiency and years of best practice. That’s true up to a point but is not the whole story.
The reality is that some of the decisions around cooling, as well as other aspects of data center design, can be put down to a cascade of exaggerated safety margins and service level agreements more akin to Chinese whispers than engineering-based analysis.
That has contributed to a historic practice of overcooling facilities well below the required levels for IT equipment. That practice persists to some degree despite the fact that recommended and allowable ranges for temperature and humidity continue to be reassessed and relaxed.
For example, in 2011 the main organisation charged with advising on temperature range standards – ASHRAE – extended its recommended and allowable limits, with the latter potentially pushed up to 113°F (45°C). But even though server makers such as Dell actually warrantied equipment for ‘excursions’ up to 113°F, few, if any, operators have opted to go anywhere near these limits.
That’s not to say, however, that there hasn’t been some upward movement of cooling set points over the recent past. As with other aspects of the data center, hyperscale operators have been trailblazers in this area. Google, for example, runs its facilities at 80°F (about 27°C) or higher.
Tolerance of higher temperatures is also closely linked to use of economisation or free cooling. The use of outside-air cooling to replace or supplement mechanical cooling is established in hyperscales but is also becoming more widely used in colocation and enterprise sites. Around 30 percent of organisations now use indirect or direct air-cooling (with evaporative support) according to Uptime Institute’s 2017 industry survey.
Warmer is cheaper
Replacing the majority of the conventional mechanical cooling in a data center – which can account for up to 40 percent of capital costs – with forms of free cooling technology can result in significant operational and capital cost savings.
According to some supplier estimates, indirect free cooling systems cost 15% less to install than an equivalent chilled water system and 90% less to operate – adding up to a 75% reduction in cost over the system's lifetime. This has benefits in terms of other capital costs, as diesel generator size can also be reduced by 60% and transformer size by 70%. The other big benefit is that the power and space freed up by eliminating mechanical cooling can be used to support more IT.
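The supplier figures above only add up if operating cost dominates the lifetime bill. As a rough sketch, assuming the lifetime saving is a simple weighted blend of the capital and operating savings (an assumption, since the suppliers' actual method is not given), the capital share that the quoted numbers imply can be backed out. The function names here are hypothetical.

```python
def lifetime_saving(capex_saving, opex_saving, capex_share):
    """Blend capital and operating savings into one lifetime figure.

    capex_share is the fraction of the baseline system's lifetime cost
    that is capital spend; the remainder is treated as operating cost.
    """
    if not 0.0 <= capex_share <= 1.0:
        raise ValueError("capex_share must be between 0 and 1")
    return capex_saving * capex_share + opex_saving * (1.0 - capex_share)


def implied_capex_share(capex_saving, opex_saving, total_saving):
    """Solve the blend above for the capex share that yields total_saving."""
    if capex_saving == opex_saving:
        raise ValueError("savings are identical; share is undetermined")
    return (opex_saving - total_saving) / (opex_saving - capex_saving)


# Figures quoted in the text: 15% capital, 90% operating, 75% lifetime.
share = implied_capex_share(0.15, 0.90, 0.75)
print(f"Implied capex share of lifetime cost: {share:.0%}")
print(f"Check: blended saving = {lifetime_saving(0.15, 0.90, share):.0%}")
```

Under that assumption, the quoted savings are consistent with capital spend making up about a fifth of the chilled water system's lifetime cost, with the rest going on operation.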
These combined benefits are one of the reasons why colocation suppliers and hyperscale operators have been locating sites in the Nordics where mostly stable and low year-round temperatures – Sweden can get a little warm in the summer – are ideal for free-cooling.
Switching on the x-Factor
But while there is growing acceptance of the use of different flavours of outside air cooling – as well as direct liquid cooling, as we have covered in previous blogs – there is some evidence that the whole notion of tightly controlling data center temperatures should be reconsidered.
ASHRAE continues to update its thinking around what it describes as the ‘X-factor’. Luckily, this has nothing to do with insipid TV singing competitions and everything to do with server failure rates. Specifically, X-factor is an attempt to model how server annual failure rates (AFR) change in relation to a baseline at 20°C (68°F) inlet temperature.
The specifics are complex but the upshot of all of this analysis is rather surprising: if you let servers follow natural shifts in temperature over time, then effectively the lows counteract the highs. In terms of failure rates, time spent at temperatures below 20°C largely offsets time spent above it.
Obviously this is a very simplified summary of ASHRAE’s X-factor data but the essential points hold true:
- Increases in temperature (up to a certain point) have a marginal impact on server failure rates
- Mechanical cooling is largely unnecessary (outside of tropical or sub-tropical climates)
- A looser approach to cooling is generally preferable to predefined tight limits
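The offsetting effect behind these points can be sketched as a time-weighted average: multiply the relative failure rate at each inlet temperature by the hours spent there, then divide by total hours. The X-factor values and the hourly profile below are made-up placeholders for illustration, not ASHRAE's published figures, and the function name is hypothetical.

```python
# Hypothetical relative failure rates by inlet-temperature band (°C).
# 1.0 is the 20°C baseline; the other values are placeholders for
# illustration and are NOT taken from ASHRAE's published tables.
ILLUSTRATIVE_X_FACTOR = {15: 0.8, 20: 1.0, 25: 1.2, 30: 1.4}


def time_weighted_x_factor(hours_by_temp, x_factor_by_temp):
    """Average the relative failure rate, weighted by hours at each temperature."""
    total_hours = sum(hours_by_temp.values())
    if total_hours <= 0:
        raise ValueError("need a positive number of hours")
    weighted = sum(
        hours * x_factor_by_temp[temp] for temp, hours in hours_by_temp.items()
    )
    return weighted / total_hours


# Hypothetical year (8,760 hours) for a temperate site running on outside air.
hours = {15: 4380, 20: 2190, 25: 1752, 30: 438}
net = time_weighted_x_factor(hours, ILLUSTRATIVE_X_FACTOR)
print(f"Net X-factor relative to a constant 20°C: {net:.2f}")
```

With these placeholder numbers, the hours spent below 20°C outweigh the warmer hours, so the net figure comes out slightly under the 1.0 baseline. A real assessment would substitute ASHRAE's own X-factor table and the site's measured temperature profile.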
There are obviously ramifications for data center operators, customers and technology suppliers from this X-factor data. The findings around X-factor haven’t turned the industry on its head yet but should feed into the long-term shift towards chiller-free cooling.
But while the direction of travel for the industry seems to be away from mechanical overcooling, climate change is creating more uncertainty about the future. On a recent webcast, a panel of experts from the Uptime Institute argued that operators should be trying to design climate-change proof facilities.
“We see that operators are not planning for increased heat and humidity, and they are often putting in cooling systems that may not be up for the job,” said Andy Lawrence, an Uptime research VP. “It does worry me that the level of foresight and planning isn’t resilient enough. It is certainly a question I would be worrying about if I was building out new facilities, ‘Should we be even more conservative?’”
Uptime is also worried about increasing use of evaporative free cooling systems that use large amounts of water – a resource that may become scarcer in the future.
But despite these concerns, a movement back towards widespread overcooling seems unlikely. Quite apart from the cost and efficiency implications, it would be a perverse move: carbon- and power-profligate data centers would further exacerbate the damaging symptoms of climate change that these toughened sites were supposed to be protected against.
The best future for data center design is to build facilities and solutions that are efficient, agile and, crucially, highly interconnected. As hyperscale operators are already showing, data replication across multiple sites – so-called distributed resiliency – is the best way to safeguard availability. In that future, the importance of any one server, rack or facility is lessened and the need to overcool should eventually evaporate.