Facebook uses heat maps to find problems in its IT infrastructure

We are all used to thermal scanners being used to find hot spots. The term "heat map" is also used for color-coded displays that highlight problem areas in a set of data.

This Computerworld article describes how Facebook uses the heat map technique to find problems in its IT infrastructure.

Facebook heat maps pinpoint data center trouble spots

A Facebook engineer developed heat-map technology to quickly identify server, rack or cluster failures

By Joab Jackson
September 19, 2012 03:37 PM ET

IDG News Service - Faced with the challenge of overseeing the health of large caching systems, a Facebook engineer developed heat-map software to quickly pinpoint problems in the social network's data centers.

The Facebook blog post has more details and some images.

When I first deployed Claspin, the view above had a lot more red in it. By making it easier for more people to spot server issues quickly, Claspin has allowed us to catch more "yellows" and prevent more "reds." I suppose there's no better validation of one's choice of statistics and thresholds than to have things start out red and then turn green as the service improves.
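The green/yellow/red idea in the quote can be illustrated with a minimal sketch. It assumes a single per-server metric (here a hypothetical error rate in percent) and two made-up thresholds. Each server is colored by comparing its value against those thresholds, and servers stay grouped by rack so that a rack-wide failure stands out as a whole row of red. The names `classify`, `heat_map` and `render`, the threshold values and the sample data are all hypothetical. The actual statistics and thresholds Claspin uses are not described here.

```python
from typing import Dict, List

# Hypothetical thresholds for one per-server metric (e.g. error rate in percent).
# These values are illustrative assumptions, not Claspin's actual settings.
YELLOW_THRESHOLD = 1.0
RED_THRESHOLD = 5.0


def classify(value: float) -> str:
    """Map one metric value to a heat-map color."""
    if value >= RED_THRESHOLD:
        return "red"
    if value >= YELLOW_THRESHOLD:
        return "yellow"
    return "green"


def heat_map(racks: Dict[str, List[float]]) -> Dict[str, List[str]]:
    """Color every server, keeping servers grouped by rack so that a
    failing rack shows up as a row of red cells."""
    return {rack: [classify(v) for v in values] for rack, values in racks.items()}


def render(grid: Dict[str, List[str]]) -> str:
    """Render the grid as text, one row per rack, using G/Y/R for the colors."""
    symbol = {"green": "G", "yellow": "Y", "red": "R"}
    lines = []
    for rack in sorted(grid):
        cells = "".join(symbol[color] for color in grid[rack])
        lines.append(f"{rack}: {cells}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample data: error-rate readings for four servers in each of three racks.
    sample = {
        "rack-a": [0.2, 0.1, 0.3, 0.0],
        "rack-b": [0.4, 2.5, 0.2, 0.1],
        "rack-c": [6.0, 7.5, 8.1, 9.0],
    }
    print(render(heat_map(sample)))
```

In this sketch, rack-c turning entirely red points to a rack-level problem. The single yellow cell in rack-b flags one server to watch before it turns red, which is the "catch more yellows, prevent more reds" effect the quote describes.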