Here is a talk from Oct 2013 on “How Google backs up tho Internet"
If you don’t have the time to watch the video here is a post based on the video.
MONDAY, FEBRUARY 3, 2014 AT 8:56AM
Raymond Blum leads a team of Site Reliability Engineers charged with keeping Google's data secret and keeping it safe. Of course Google would never say how much data this actually is, but from comments it seems that it is not yet a yottabyte, but is many exabytes in size. GMail alone is approaching low exabytes of data.
Mr. Blum, in the video How Google Backs Up the Internet, explained common backup strategies don’t work for Google for a very googly sounding reason: typically they scale effort with capacity. If backing up twice as much data requires twice as much stuff to do it, where stuff is time, energy, space, etc., it won’t work, it doesn’t scale. You have to find efficiencies so that capacity can scale faster than the effort needed to support that capacity. A different plan is needed when making the jump from backing up one exabyte to backing up two exabytes. And the talk is largely about how Google makes that happen.