Why Do DIY Vanilla OpenStack Deployments Fail in Production?

With the rising popularity of OpenStack as an alternative platform for building clouds, a lot of organizations have been tempted to deploy OpenStack in production in DIY mode. While some large organizations have been successful (by hiring an army of engineers), most DIY OpenStack deployments end up either failing completely or falling well short of expectations on performance, SLAs, and the overall cost of cloud operations.

Here are a few reasons that we have seen to be the fundamental problems with trying to create your own production OpenStack cloud. Please note that all of these examples come from our experience in-house and with customers; your experience with DIY OpenStack may be very different.

Steep learning curve

Anyone who wants to deploy a cloud on a single physical node with virtualization, networking, storage and simple VM orchestration needs to deploy, and gain expertise in, a long list of core OpenStack projects and other ancillary projects. The challenge grows dramatically when you try to do the same on multiple physical nodes to form a single cloud cluster. And that does not even cover other basic production-level expectations like service high availability, integration with existing identity management systems, backup mechanisms and the new-age hyper-converged storage.

Non-Availability of Reference Architectures

There is a real dearth of reference architectures for the hardware and software components required to create an OpenStack cloud. A search on the internet will bring up some simple diagrams of how some OpenStack services can be deployed, but we are yet to find a single guide with even a reasonable amount of detail on the full cloud architecture. For every cloud deployment scenario that OpenStack is usually deployed in, having a reference architecture is “step 1” to even start thinking of a production deployment.

Lack of Infrastructure Orchestration

An OpenStack cloud deployment is effectively impossible to do manually. The hundreds of software and physical infrastructure components involved can only be configured correctly and maintained using fully automated infrastructure orchestration tools. For a long time, there was a huge gap in the OpenStack ecosystem because of the lack of tools for physical infrastructure orchestration and setup. While the TripleO (OpenStack-on-OpenStack, or OOO) project is making great strides in this regard, it is still not production ready.

Missing production features in Vanilla OpenStack

There are a few features that are routinely expected when deploying a cloud in production to meet the typical 99.99% uptime SLA [about 52 minutes of downtime per year]. Sadly, OpenStack lacks the following features, without which it becomes almost impossible to guarantee anything more than a 99% SLA [3.65 days of downtime per year].
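The downtime figures quoted for each SLA level follow from simple arithmetic. Here is a minimal sketch of that calculation, assuming a 365-day year; the function name `allowed_downtime_minutes` is ours, purely for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years are ignored


def allowed_downtime_minutes(sla_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an uptime SLA."""
    return (1 - sla_percent / 100) * MINUTES_PER_YEAR


# Print the downtime budget for a few common SLA levels.
for sla in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(sla)
    print(f"{sla}% uptime: {minutes:.1f} minutes/year, {minutes / 1440:.2f} days/year")
```

A 99.99% SLA leaves a budget of roughly 52.6 minutes per year, while a 99% SLA leaves 5,256 minutes, or 3.65 days.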

Service/Infrastructure HA:

Currently, no out-of-the-box mechanism exists in OpenStack that can guarantee cloud availability when whole controller/network nodes fail. While full node failure might be relatively rare, in a production setup with more than 10-15 nodes, one node failure per year should be routinely expected. Failures of network paths and disks are more frequent, and with vanilla OpenStack they are handled through manual intervention today.
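To see why node failures stop being rare as a cluster grows, here is a rough back-of-the-envelope sketch. It assumes that nodes fail independently and that each node has the same annual failure probability. The 10% rate used below is a hypothetical value chosen only to illustrate the point, not a measured figure.

```python
def expected_failures(nodes: int, annual_failure_rate: float) -> float:
    """Expected number of node failures per year across the cluster."""
    return nodes * annual_failure_rate


def prob_at_least_one_failure(nodes: int, annual_failure_rate: float) -> float:
    """Probability that at least one node fails within a year.

    Assumes node failures are independent of each other.
    """
    return 1 - (1 - annual_failure_rate) ** nodes


# Hypothetical per-node annual failure rate, for illustration only.
ASSUMED_RATE = 0.10

for nodes in (1, 10, 15):
    print(
        f"{nodes} nodes: expected failures/year = "
        f"{expected_failures(nodes, ASSUMED_RATE):.2f}, "
        f"P(at least one) = {prob_at_least_one_failure(nodes, ASSUMED_RATE):.2f}"
    )
```

Under that assumed rate, a 10-node cluster should expect about one node failure per year, even though any single node is unlikely to fail.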

Virtual Machine HA:

There is no mechanism to enable automatic virtual machine high availability in OpenStack today!

Migration Tools:

There are no simple tools available today to seamlessly migrate virtual machines and data into an OpenStack cloud from other clouds, hypervisors and physical servers. While some third-party paid tools do provide this functionality, customers usually expect migration tools to be shipped along with the cloud provider/distribution.

Infrastructure Complexity/Cost

For someone who wants to try out OpenStack in production with a small footprint, vanilla OpenStack can become quite expensive in terms of the hardware required to run it.

– A multi-node OpenStack deployment with SDN networking requires about 6 physical nodes.
– A multi-node OpenStack deployment with hyper-converged storage and SDN networking requires 9 physical servers!

In conclusion

If you are an enterprise without your own army of experienced developers to build and manage your OpenStack cloud, you will most likely fail and/or overspend on the endeavor. While OpenStack provides a great set of cutting-edge features for next-gen clouds, organizations should be aware of the complexities and costs involved in deploying and managing an OpenStack cloud on their own.