Containers: Beyond Virtualization
By Andy Randall, GM Networking Business Unit & SVP Corp Development, Metaswitch Networks
In terms of its technology life cycle, virtualization is rapidly reaching maturity. Gartner predicts that by 2016, 86 percent of workloads will be virtualized, as enterprises large and small grow more comfortable migrating their data centers to virtual machines and the cloud.
CIOs seeking the next step change in resource-utilization efficiency are now looking to the next wave of virtualization: lightweight Linux containers, a technology being brought to market by several companies, including the white-hot Silicon Valley startup Docker.
These containers provide many of the benefits of virtual machines – enabling you to run multiple, isolated workloads on a single physical machine – but share a single instance of the operating system kernel, thereby massively increasing efficiency. A typical server might run hundreds of containers, compared with a few dozen virtual machine instances. As an added benefit, containers can be launched and terminated extremely quickly (think milliseconds rather than tens of seconds), enabling applications to be built in an entirely new way – for example firing up a new container instance to handle a single web query, in real time.
In addition to these efficiency benefits, environments such as Docker standardize how applications are packaged, enabling developers to specify a well-defined set of prerequisites (such as a particular version of Python or Cassandra). This ensures that applications are easy to install and run consistently wherever they are installed, which Docker calls “build once, run anywhere.”
Google is already at the bleeding edge of this trend. According to a blog post by Eric Brewer, Google’s VP of Infrastructure, the company launches “more than 2 billion container instances” across its global data centers every week. He adds that “the power of containers has enabled both more reliable services and higher, more efficient, scalability.”
That’s all very well for Google, but what are the implications for the rest of us, with more typical data center requirements?
If you are already running workloads in the public cloud – such as Amazon Web Services, Google Compute Engine or Microsoft Azure – then you may already have access to containers. All of those major cloud providers support Docker today, and others are rushing to add it to their offerings.
When it comes to the private cloud, there is a vibrant ecosystem of technology providers and open source projects, which is of course another way of saying that the choices are confusing! As Ross Jimenez, Engineering Director at CenturyLink Labs, puts it in his recent blog post: “there is still much work to be done in creating the management tools and processes within enterprises to truly leverage the technology.”
We see three key areas (security, orchestration, and the networking model) that need real work before containers are ready for prime time in the enterprise.
The need for better security was highlighted in Gartner’s January 2015 report, “Security Properties of Containers Managed by Docker.”
As the report’s author, Joerg Fritsch, puts it, Docker containers “disappoint when it comes to secure administration and management, and to support for common controls for confidentiality, integrity and availability.”
To work around the security issues (guaranteeing that the contents of one container are not accessible from another), some users are combining virtual machines and containers. If each tenant’s containers are placed in a dedicated VM, those containers may not be perfectly isolated from one another, but they are protected from other tenants. However, as Fritsch points out, “except for a further fortification of resource isolation, there is little to be gained from the underlying hypervisor.” Hence, this is a short-term strategy until containers can be strengthened to deliver the same level of resource isolation that virtual machines do today.
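The tenant-per-VM workaround amounts to a placement rule: containers from different tenants never share a VM (and therefore never share a kernel), while containers from the same tenant may. The Python sketch below illustrates only that rule; the tenant names, container names, and the `vm-<tenant>` naming scheme are hypothetical, and a real platform would also have to size and schedule the VMs themselves.

```python
from collections import defaultdict


def place_by_tenant(containers):
    """Assign every container to a VM dedicated to its tenant.

    containers: iterable of (tenant, container_name) pairs.
    Returns {vm_name: [container_name, ...]}, so no VM ever hosts
    containers belonging to two different tenants.
    """
    vm_of_tenant = {}
    placement = defaultdict(list)
    for tenant, name in containers:
        # The first container seen for a tenant creates that tenant's VM.
        vm = vm_of_tenant.setdefault(tenant, f"vm-{tenant}")
        placement[vm].append(name)
    return dict(placement)


# Hypothetical tenants and containers, for illustration only.
workloads = [("acme", "web-1"), ("globex", "db-1"), ("acme", "web-2")]
placement = place_by_tenant(workloads)
print(placement)  # {'vm-acme': ['web-1', 'web-2'], 'vm-globex': ['db-1']}
```

Note that `web-1` and `web-2` still share a kernel inside `vm-acme`; only the boundary between tenants is enforced by the hypervisor, which is exactly the limitation Fritsch describes.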
For managing a network of virtual machines, there are several well-established platforms for orchestrating workloads, such as VMware’s vCenter or, in the open source world, OpenStack. These platforms provide an administration console that makes it easy to create, destroy, and even move virtual machines between physical hosts, as well as to monitor and troubleshoot them.
The equivalent platforms for containers are in their infancy today. The established virtual machine orchestrators are making a play to expand into this space, and many new players (both commercial vendors and open source efforts) are emerging, each with its own view of how containers should be orchestrated. For example, Cloudsoft’s “Clocker” project extends the open source Brooklyn provisioning tool to manage Docker containers, while Mesosphere organizes the machines in a cluster to act as a single virtual computer, with slave nodes offering their available resources, which are then matched to workload requirements. One of the higher-profile projects is Google’s own Kubernetes orchestrator, based on the same platform Google uses internally to schedule those 2+ billion workloads every week.
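To make the resource-offer idea more concrete, here is a minimal Python sketch of matching offers to workload requirements. It is not the Mesos or Mesosphere API: the `Offer` and `Task` classes, the node names, and the greedy first-fit policy are all assumptions chosen for illustration. Each node advertises its spare CPU and memory, and each task is placed on the first offer with enough room left.

```python
from dataclasses import dataclass, replace


@dataclass
class Offer:
    """Spare resources a node advertises to the scheduler (hypothetical model)."""
    node: str
    cpus: float
    mem_mb: int


@dataclass
class Task:
    """Resources a containerized workload asks for (hypothetical model)."""
    name: str
    cpus: float
    mem_mb: int


def match_offers(offers, tasks):
    """Greedy first-fit: give each task to the first offer with enough room.

    Returns {task_name: node or None}; None means no current offer fits
    and the task would wait for a later round of offers.
    """
    remaining = [replace(o) for o in offers]  # copies, so inputs stay untouched
    assignments = {}
    for task in tasks:
        assignments[task.name] = None
        for offer in remaining:
            if offer.cpus >= task.cpus and offer.mem_mb >= task.mem_mb:
                # Consume part of the offer; the leftover can serve later tasks.
                offer.cpus -= task.cpus
                offer.mem_mb -= task.mem_mb
                assignments[task.name] = offer.node
                break
    return assignments


offers = [Offer("node-a", 2.0, 4096), Offer("node-b", 4.0, 8192)]
tasks = [Task("web", 1.0, 2048), Task("cache", 2.0, 6144), Task("batch", 4.0, 1024)]
result = match_offers(offers, tasks)
print(result)  # {'web': 'node-a', 'cache': 'node-b', 'batch': None}
```

First-fit is only one possible policy; a production scheduler would typically weigh fairness between frameworks, data locality, and packing efficiency when deciding which offers to accept.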
The Networking Model
Today, most container implementations, such as Docker, support a very simple means of getting IP traffic in and out of containers: all containers on a machine share the host’s IP address, and port mapping is used to identify which container a particular packet is intended for. There is wide recognition that this is not an adequate long-term solution, and existing software-defined networking vendors are all trying to show how their VM-based solutions can also apply to containers.
Most existing networking solutions, however, implement an “overlay / underlay” model. While this functionally enables communication between containers, it imposes encapsulation overhead and operational complexity that become unworkable at the scale at which we expect containers to be deployed (hundreds of thousands, if not millions, of containers per data center).
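The shared-IP, port-mapping model can be captured in a few lines. In Docker, `docker run -p 8080:80` publishes a container’s port 80 on the host’s port 8080; the Python sketch below models only the resulting lookup table, not how Docker actually implements forwarding. The `PortMapper` class, the host address `192.0.2.10` (a documentation address), and the container names are hypothetical.

```python
class PortMapper:
    """Toy model of port mapping: one shared host IP, many containers."""

    def __init__(self, host_ip):
        self.host_ip = host_ip
        self.table = {}  # host_port -> (container, container_port)

    def publish(self, host_port, container, container_port):
        """Expose container_port of a container on a host port."""
        if host_port in self.table:
            owner = self.table[host_port][0]
            raise ValueError(f"host port {host_port} already mapped to {owner}")
        self.table[host_port] = (container, container_port)

    def deliver(self, dst_ip, dst_port):
        """Return the (container, port) an inbound packet is forwarded to, or None."""
        if dst_ip != self.host_ip:
            return None
        return self.table.get(dst_port)


mapper = PortMapper("192.0.2.10")
mapper.publish(8080, "web-1", 80)
mapper.publish(8081, "web-2", 80)  # both listen on 80 inside, so the host ports must differ
print(mapper.deliver("192.0.2.10", 8081))  # ('web-2', 80)

try:
    mapper.publish(8080, "web-3", 80)
except ValueError as err:
    print(err)  # host port 8080 already mapped to web-1
```

The conflict at the end shows why this model scales poorly: every service that wants a well-known port has to be remapped to some other host port, and clients must be told which one.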
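Part of the overlay burden is easy to quantify. The article does not name a particular overlay, so the sketch below assumes VXLAN over IPv4 as a common example: each container packet is wrapped in an outer Ethernet header (14 bytes), an outer IPv4 header (20), a UDP header (8), and a VXLAN header (8), which adds 50 bytes per packet and shrinks the usable MTU. The Python computes that overhead for a full-size packet and for a small one.

```python
# Header sizes in bytes for VXLAN over IPv4 (no VLAN tags, no IP options).
OUTER_ETHERNET = 14
OUTER_IPV4 = 20
OUTER_UDP = 8
VXLAN_HEADER = 8
INNER_ETHERNET = 14  # the container's own frame header, carried as payload

# Extra bytes each packet carries on the wire because of the overlay.
ENCAP_OVERHEAD = OUTER_ETHERNET + OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER  # 50


def overhead_fraction(inner_ip_bytes):
    """Share of the on-the-wire frame taken up by overlay headers."""
    inner_frame = INNER_ETHERNET + inner_ip_bytes
    return ENCAP_OVERHEAD / (inner_frame + ENCAP_OVERHEAD)


def max_inner_ip_packet(underlay_mtu=1500):
    """Largest container IP packet whose outer packet still fits the underlay MTU."""
    return underlay_mtu - (OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET)


print(max_inner_ip_packet())              # 1450
print(f"{overhead_fraction(1450):.1%}")   # 3.3%
print(f"{overhead_fraction(100):.1%}")    # 30.5%
```

This counts bytes only. The CPU cost of encapsulating and decapsulating every packet on every host, and the extra state needed to track which container lives behind which tunnel endpoint, are further burdens the sketch does not model.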
One of the most promising approaches in this area is Project Calico, an open source project sponsored by Metaswitch Networks. Calico treats a set of workloads just like a network of hosts on the Internet, using the same IP networking techniques that we know scale to many millions of endpoints.
As should be clear, we are still at the early stages of containerization, with a lot of wrinkles in the technology still to be ironed out. What should a CIO do in this situation? I agree with the advice given by CenturyLink’s Jimenez: “Keep an eye on Docker and Linux containers, have a small team do pilot projects, continue to gain and capture knowledge and figure out what trade-offs have to be met for you to make the jump and heavily invest in it.”
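Treating workloads like hosts on the Internet means each container gets its own routable IP address, and forwarding a packet to it is an ordinary routing-table lookup rather than a tunnel. Calico distributes such routes between hosts using BGP. The Python sketch below is not Calico code; it only illustrates the longest-prefix-match lookup that IP routing relies on. The prefixes, host names, and the single /32 route for a workload that has moved to another host are all hypothetical.

```python
import ipaddress

# Hypothetical routing table: destination prefix -> next hop.
ROUTES = {
    ipaddress.ip_network("10.65.0.0/26"): "host-a",    # workloads on host A
    ipaddress.ip_network("10.65.0.64/26"): "host-b",   # workloads on host B
    ipaddress.ip_network("10.65.0.70/32"): "host-c",   # one workload now on host C
    ipaddress.ip_network("0.0.0.0/0"): "gateway",      # everything else
}


def next_hop(dst):
    """Longest-prefix match: the most specific route containing dst wins."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in ROUTES if addr in net]  # default route always matches
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]


for dst in ["10.65.0.5", "10.65.0.71", "10.65.0.70", "203.0.113.7"]:
    print(dst, "->", next_hop(dst))
```

Running this sends `10.65.0.70` to `host-c` even though it falls inside host B’s block, because the /32 route is more specific. Aggregating workload addresses into per-host blocks keeps routing tables compact, which is part of why the same technique scales to very large numbers of endpoints.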