Infrastructure of IoT, Beyond availability and scalability

I wrote this post for Gigaom on what the infrastructure of IoT is.  My view is that it goes beyond the typical capabilities - scalability, availability, etc.  I have included part of what I wrote below; for the full text, go to the Gigaom post.

I am moderating a panel discussion on the Infrastructure of IoT at Gigaom Structure on June 19.  Please join me there or watch the live stream.  This event should be one of the best; here are the headline speakers.

[Image: headline speakers]

Here is my post on IoT.

Infrastructure of IoT, beyond availability and scalability

by Dave Ohara

May 24, 2014 - 12:00 PM PDT


[Image: Internet of things, globe, fiber optics. Photo: asharkyu]
SUMMARY:

To handle the addition of billions more devices — including sensors that talk to each other, not necessarily to us — how must our infrastructure evolve? That’s a big topic on tap for Structure 2014.

Infrastructure is something that people are used to not thinking about. It is normally associated with roads, water, electricity, and telecommunications — the things it takes for a society to function. People just want infrastructure to work when they need it. When roads are being repaired, when the waterline breaks, when the power is out, and the internet is down — that’s when people pay attention to infrastructure. Most would assume that the infrastructure for IoT should be no different from the rest of information technology (IT).

In IT, search, email, finance, and social networks are the infrastructure for being connected. When people talk about infrastructure for IT, they think of security, availability, scalability, and reliability as the key capabilities to focus on. Whenever there is a security breach or services go down, teams scramble to remedy the situation. The internet of things is being driven by many of the same technology companies that users are familiar with: running a Google search for “IoT”, the top three paid advertisers are Microsoft, Cisco, and Intel.

Building IoT Infrastructure the same as other IT Infrastructure

If you take a traditional approach, the infrastructure for IoT is the same as the infrastructure for IT, but scaled to work with billions and billions of IoT devices. Servers, network, and storage are now at a scale that allows billions of devices to be connected to cloud services. Along with this scale come millions of failure events, which could be a degradation of device performance or outright failure. One view is that users will get another device, run setup on the new device, connect the replacement IoT device, and the old one disappears. Another view is: we have the history of your IoT device, and we can help you repair it, replace it, or upgrade it. The damaged IoT device is part of a bigger experience, and a device failure is an opportunity to build a new and better experience.

Tamar Budec, VP of portfolio operations at Digital Realty


Some of you may still think I just want to build highly available, secure, and scalable infrastructure for IoT, and that users will expect it to be no different than their existing IT services. But I would argue that we need more than that: we need IoT infrastructure that does more than compete on availability, security, and scalability. We need infrastructure that provides a sort of institutional memory of what you’ve done with your devices. Where do you think the money is in the infrastructure of IoT? A low-cost infrastructure that quickly gets commoditized, or a value-added service for the infrastructure of IoT that users will stick with?

 

Peek Inside the Code Conference, a place not interesting to data center executives

Here is a post on what it is like to attend the Code Conference.  It’s interesting, but I have no interest in attending.

Secrets of the Code Conference


Entrepreneur’s dream: Animoto’s Brad Jefferson with Walt Mossberg

One of the perennial marketing conundrums facing start-up CEOs is picking the best industry conferences to attend to boost their companies’ brands and expand their networks. This week, some lucky entrepreneurs dropped in on what may be the gold standard for tech-industry confabs: The Code Conference.

This is a schmooze event for C-level executives.

It was refreshing to hear the big-time CEOs and company founders discuss their companies’ problems and learn that many of them relate to the same issues vexing smaller companies, Shank said—recruiting, dealing with product snafus, etc. Uber Co-Founder and CEO Travis Kalanick talked about not taking a salary for four years and living at home with his mother, which he said didn’t do wonders for his dating life.

See you at 7x24 Exchange Boca Raton

By this time tomorrow I will be at 7x24 Exchange Boca Raton.  Many of my friends have made going to 7x24 Exchange a regular part of their conference plans, some going once a year.  I go twice a year, and after six consecutive trips over the past three years, I always learn something, make new friends, and enjoy discussing data centers with good friends and smart people.


The 7x24 Exchange hears the same from their attendees.

[Image: 7x24 Exchange attendee feedback]

Human Perception behind Joyent and Adobe's Outages: Typing is Not a Cause

Joyent and Adobe had the notable outages for the month of May.

Joyent posted their post-mortem.

Adobe posted an apology.

Both of these outages occurred during maintenance, where someone typed a command that did what it was supposed to do, impacting a service.  The human-perception problem is that the person who typed the command could not see or perceive the system-wide impact of their command.

Adobe doesn’t provide any details on what they plan to do.  Joyent does.

We will be taking several steps to prevent this failure mode from happening again, and ensuring that other business disaster scenarios are able to recover more quickly.

First, we will be dramatically improving the tooling that humans (and systems) interact with such that input validation is much more strict and will not allow for all servers, and control plane servers to be rebooted simultaneously. We have already begun putting in place a number of immediate fixes to tools that operators use to mitigate this, and we will be rethinking what tools are necessary over the coming days and weeks so that "full power" tools are not the only means by which to accomplish routine tasks.

You can imagine the efforts people will go through to create safeguards to eliminate the possibility of this type of outage.  Unfortunately, this will also most likely put a burden on day-to-day operations.

Another way to solve the problem is to give people the ability to see the impact of their actions.  No one in their right mind would execute a command to reboot all the servers at Joyent.  And no one in their right mind would delete a directory with all user records.
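One way to sketch that perception-oriented safeguard: have the tooling compute and display the blast radius of a command before executing it, and refuse fleet-wide actions outright. This is a hypothetical illustration, not Joyent's actual tooling; the function names and the 10% threshold are assumptions.

```python
# Hypothetical safeguard sketch (not Joyent's actual tooling): compute and
# display the blast radius of a reboot request before executing it, and
# refuse fleet-wide actions outright.

def plan_reboot(requested, fleet, max_fraction=0.10):
    """Return the servers to reboot, or raise if the impact is too broad."""
    targets = [s for s in fleet if s in requested]
    fraction = len(targets) / len(fleet)
    if fraction > max_fraction:
        raise ValueError(
            f"Refusing to reboot {len(targets)}/{len(fleet)} servers "
            f"({fraction:.0%}); limit is {max_fraction:.0%}. "
            "Use a staged rollout instead."
        )
    # Surface the impact so the operator can perceive it before confirming.
    print(f"Will reboot {len(targets)} of {len(fleet)} servers: {targets}")
    return targets

fleet = [f"server{i}" for i in range(100)]
plan_reboot(["server1", "server2"], fleet)   # small blast radius: allowed
```

The point is not the threshold itself but that a "full power" action has to get past a check that makes its scope visible, instead of silently doing exactly what was typed.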

Watching Google's Data Center Machine Learning News spread

I was curious how Google’s Data Center Machine Learning news would spread.

At 1 a.m. on May 28, 2014, Google posted on its main company blog, with this kind of traffic over the past two days.

[Image: Google blog post traffic chart]

The following are three posts that went live at 1 a.m. PT on May 28, 2014, along with the Google post; their authors were able to interview Joe Kava, VP of Data Centers.

http://gigaom.com/2014/05/28/google-is-harnessing-machine-learning-to-cut-data-center-energy/

Google’s head of data center operations, Joe Kava, says that the company is now rolling out the machine learning model for use on all of its data centers. Gao has spent about a year building it, testing it out and letting it learn and become more accurate. Kava says the model is using unsupervised learning, so Gao didn’t have to specify the interactions in the data — the model will learn those interactions over time.

http://www.datacenterknowledge.com/archives/2014/05/28/google-using-machine-learning-boost-data-center-efficiency/

http://www.wired.com/2014/05/google-data-center-ai/
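To make the technique in these posts concrete: Gao's model learns to predict a data center's PUE from operational signals. The toy sketch below captures the idea with a tiny linear model fit by gradient descent on synthetic data; it is an illustration only, not Google's model, which is a neural network trained on far more inputs.

```python
# Toy illustration of the idea behind Gao's model: learn to predict a data
# center's PUE from operational signals. A tiny linear model fit by
# gradient descent on synthetic data stands in for Google's neural network.

def fit_pue_model(samples, lr=0.01, epochs=2000):
    """samples: list of ((it_load, outside_temp), pue) pairs."""
    w_load, w_temp, b = 0.0, 0.0, 1.0   # start near an ideal PUE of 1.0
    for _ in range(epochs):
        for (load, temp), pue in samples:
            pred = w_load * load + w_temp * temp + b
            err = pred - pue
            w_load -= lr * err * load
            w_temp -= lr * err * temp
            b -= lr * err
    return w_load, w_temp, b

# Synthetic training data: PUE worsens with outside temperature and
# improves at higher IT utilization.
data = [((0.5, 0.2), 1.20), ((0.8, 0.2), 1.12),
        ((0.5, 0.9), 1.35), ((0.8, 0.9), 1.25)]
w_load, w_temp, b = fit_pue_model(data)
pred = w_load * 0.7 + w_temp * 0.5 + b
print(f"Predicted PUE at load 0.7, temp 0.5: {pred:.2f}")
```

Once such a model tracks reality closely, it can be used the way the coverage describes: flag anomalies when measured PUE deviates from the prediction, and test configuration changes against the model before trying them on a live facility.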

The Wired article spun the machine learning as an artificial brain, which gave it more traffic than the others.

[Image: article traffic chart]

But as I wrote, Google’s machine learning is not really AI the way people would think.

BTW, in looking at the other articles, I realized my mistake.  In my post at 1 a.m. on May 28, I was a total nerd and got focused on the technology, and didn’t mention Joe Kava’s name in my post even though I had interviewed him.  Damn.

Throughout the day the rest of the tech media were able to add their own posts.  I don’t know about you, but I am pretty impressed that Google was able to execute a media strategy that got this range of tech media to post on its “Going Beyond PUE with Machine Learning” news.  PUE is not something widely discussed beyond the data center crowd.
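For readers outside the data center crowd: PUE (power usage effectiveness) is total facility energy divided by the energy delivered to the IT equipment, so a perfect score is 1.0 and everything above that is overhead. A quick worked example:

```python
# PUE (power usage effectiveness) = total facility energy / IT equipment
# energy. A perfect facility scores 1.0; everything above that is overhead
# such as cooling, power distribution, and lighting.

def pue(total_kwh, it_kwh):
    return total_kwh / it_kwh

# A facility drawing 1,120 kWh to deliver 1,000 kWh to the IT equipment:
print(pue(1120, 1000))   # 1.12, i.e. 12% overhead
```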

Note the ComputerWeekly post came from the event where Joe Kava keynoted, and its author got 10 minutes of Joe’s time.

My 10 minutes with Google's datacentre VP

ComputerWeekly.com (blog) - ‎May 28, 2014‎
Google's Joe Kava speaking at the Google EU Data Center Summit (Photo credit: Tom Raftery) ... Google's network division, which is the size of a medium enterprise, had a technology refresh and by spending between $25,000 and $50,000 per site, we could improve their high availability features and improve their PUEs from 2.2 to 1.5. The savings ... As more volumes of data are created and as mass adoption of the cloud takes place, naturally it will require IT to think about datacentres and its efficiency differently.
 

Google Blog: Better Data Centers Through Machine Learning

PCBDesign007 - ‎May 28, 2014‎
It's no secret that we're obsessed with saving energy. For over a decade we've been designing and building data centers that use half the energy of a typical data center, and we're always looking for ways to reduce our energy use even further. In our pursuit ...
 

Google is improving its data centers with the power of machine learning

GeekWire - ‎May 28, 2014‎
In its continuing quest to improve the efficiency of its data centers, Google has found a new solution: machine learning. Jim Gao, an engineer on the company's data center team, has been hard at work on building a model of how ...

Google crafts neural network to watch over its data centers

Register - ‎May 28, 2014‎
The project began as one of Google's vaunted "20 per cent projects" by engineer Jim Gao, who decided to apply machine learning to the problem of predicting how the power usage effectiveness of Google's data centers would change in response to tweaking ...
 

Google's Machine Learning: It's About More Than Spotting Cats

Wall Street Journal (blog) - ‎May 28, 2014‎
Google said in a blog post Wednesday that it is using so-called neural networks to reduce energy usage in its data centers. These computer brains are able to recognize patterns in the huge amounts of data they are fed and “learn” how things like air ...
 

Google data centers get smarter all on their own -- no humans required

VentureBeat - ‎May 28, 2014‎
While most of us were thinking that research would turn out speech recognition consumer products, it actually turns out that Google has applied its neural networks to the challenge of making its vast data centers run as efficiently as possible, preventing the ...
 

Google AI improves datacentre energy efficiency

ComputerWeekly.com - ‎May 28, 2014‎
“Realising that we could be doing more with the data coming out of datacentres, Jim studied machine learning and started building models to predict – and improve – datacentre performance.” The team's machine learning model behaves like other machine ...
 

Google taps machine learning technology to zap data center electricity costs

Network World (blog) - ‎May 28, 2014‎
Google is using machine learning technology to forecast - with an astounding 99.6% accuracy -- the energy usage in its data centers and automatically shift power to certain sites when needed. Using a machine learning system developed by its self ...
 

Google's machine-learning data centers make themselves more efficient

Ars Technica - ‎May 28, 2014‎
Google's data centers are famous for their efficient use of power, and now they're (literally) getting even smarter about how they consume electricity. Google today explained how it uses neural networks, a form of machine learning, to drive energy usage in its ...
 

Google is harnessing machine learning to cut data center energy

Bayoubuzz - ‎May 28, 2014‎
Leave it to Google to have an engineer so brainy he hacks out machine learning models in his 20 percent time. Google says that recently it's been using machine learning — developed by data center engineer Jim Gao (his Googler nickname is “Boy Wonder”) ...
 

Google turns to machine learning to build a better datacentre

ZDNet - ‎May 28, 2014‎
“The application of machine learning algorithms to existing monitoring data provides an opportunity to significantly improve DC operating efficiency," Google's Jim Gao, a mechanical engineer and data analyst, wrote in a paper online. "A typical large-scale ... These models can accurately predict datacentre PUE and be used to automatically flag problems if a centre deviates too far from the model's forecast, identify energy saving opportunities and test new configurations to improve the centre's efficiency. "This type of ...