Thoughts on Infrastructure Lifecycle Management

Over the years I have had the good fortune to design, implement, enhance or operate a multitude of IT infrastructures, ranging from “bare-metal” physical data centers to the highly abstracted “pseudo” infrastructures offered by public cloud providers.

What I began to realize is that, while the inner workings of physical and cloud infrastructures are vastly different, the methodologies used for managing infrastructure lifecycle are absolutely the same.

Let me explain.

My first observation was that the infrastructure goes through the same seven phases during its lifecycle:

  • Discovery & Inventory – discover components of the infrastructure (compute nodes, storage services, network subnets, etc.) and store their characteristics in an inventory system.
  • Allocation & Provisioning – “grab” one or more components from inventory and deploy an application.
  • Runtime – run the application and monitor its operational parameters.
  • Maintenance – put the application, or the infrastructure that underpins it, into maintenance mode in order to perform operational maintenance procedures (e.g. applying a security patch).
  • De-allocation – end-of-life (EOL) the application and release the infrastructure components that underpin it.
  • Reclamation – reset the components freed up in the prior phase back to their defaults and return them to inventory.
  • Retirement – retire the components, e.g. remove the server from the rack.
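To make the list above a bit more concrete, here is a minimal Python sketch that treats the seven phases as a small state machine. The phase names come straight from the list; the allowed transitions, the Component class and the "node-01" name are my own assumptions for illustration (for example, that Maintenance returns to Runtime and that Reclamation feeds back into inventory), not a description of any particular tool.

```python
from enum import Enum


class Phase(Enum):
    DISCOVERY_INVENTORY = "Discovery & Inventory"
    ALLOCATION_PROVISIONING = "Allocation & Provisioning"
    RUNTIME = "Runtime"
    MAINTENANCE = "Maintenance"
    DEALLOCATION = "De-allocation"
    RECLAMATION = "Reclamation"
    RETIREMENT = "Retirement"


# Assumed transitions: one plausible reading of the seven phases,
# not an authoritative model.
TRANSITIONS = {
    Phase.DISCOVERY_INVENTORY: {Phase.ALLOCATION_PROVISIONING, Phase.RETIREMENT},
    Phase.ALLOCATION_PROVISIONING: {Phase.RUNTIME},
    Phase.RUNTIME: {Phase.MAINTENANCE, Phase.DEALLOCATION},
    Phase.MAINTENANCE: {Phase.RUNTIME},
    Phase.DEALLOCATION: {Phase.RECLAMATION},
    Phase.RECLAMATION: {Phase.DISCOVERY_INVENTORY, Phase.RETIREMENT},
    Phase.RETIREMENT: set(),  # terminal phase
}


class Component:
    """Hypothetical infrastructure component (server, subnet, EC2 instance...)."""

    def __init__(self, name):
        self.name = name
        self.phase = Phase.DISCOVERY_INVENTORY

    def advance(self, next_phase):
        # Reject any move that the transition table does not allow.
        if next_phase not in TRANSITIONS[self.phase]:
            raise ValueError(
                f"{self.name}: cannot go from {self.phase.value} to {next_phase.value}"
            )
        self.phase = next_phase


if __name__ == "__main__":
    node = Component("node-01")
    for step in (Phase.ALLOCATION_PROVISIONING, Phase.RUNTIME, Phase.MAINTENANCE):
        node.advance(step)
        print(node.name, "->", node.phase.value)
```

The same table applies whether "node-01" is a rack server or a cloud instance; only the work done inside each phase differs.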

Whether the infrastructure is a physical data center or a collection of Amazon EC2 instances, security groups and AMIs, it evolves through the same phases described above.

My second observation was that not only are the tooling and processes employed in each phase vastly different, but the teams that perform them also have quite different skill sets.

For example, Discovery & Inventory is performed using tools and practices that are very vendor-specific. Dell offers OpenManage, HP offers Insight Manager, Cisco has UCSM. And this is only for servers. Add Cisco switches and EMC storage and you’ve got yourself at least a dozen technologies that you need to master. And we are not even talking yet about the inventory system that needs to hold all this information.
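To illustrate what that inventory system has to cope with, here is a small Python sketch that normalizes two differently shaped discovery payloads into one vendor-neutral record. Everything in it is hypothetical: "vendor_a" and "vendor_b", the payload field names, and the InventoryRecord and Inventory classes are made up for illustration and do not reflect the real output or API of any vendor tool.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class InventoryRecord:
    """Vendor-neutral description of one discovered component (assumed shape)."""
    component_id: str
    kind: str      # e.g. "server", "switch", "storage"
    source: str    # which discovery tool reported it
    attributes: dict = field(default_factory=dict)


def from_vendor_a(payload):
    # Hypothetical payload: total cores, memory reported in megabytes.
    return InventoryRecord(
        component_id=payload["serial"],
        kind="server",
        source="vendor_a",
        attributes={
            "cpu_cores": payload["cores"],
            "memory_gb": payload["mem_mb"] // 1024,
        },
    )


def from_vendor_b(payload):
    # Hypothetical payload: sockets and cores per socket, memory in gigabytes.
    return InventoryRecord(
        component_id=payload["SerialNumber"],
        kind="server",
        source="vendor_b",
        attributes={
            "cpu_cores": payload["CpuCount"] * payload["CoresPerCpu"],
            "memory_gb": payload["MemoryGB"],
        },
    )


class Inventory:
    """In-memory stand-in for a real inventory system."""

    def __init__(self):
        self._records = {}

    def add(self, record):
        self._records[record.component_id] = record

    def by_kind(self, kind):
        return [r for r in self._records.values() if r.kind == kind]


if __name__ == "__main__":
    inventory = Inventory()
    inventory.add(from_vendor_a({"serial": "A1", "cores": 16, "mem_mb": 65536}))
    inventory.add(from_vendor_b(
        {"SerialNumber": "B1", "CpuCount": 2, "CoresPerCpu": 8, "MemoryGB": 64}
    ))
    for record in inventory.by_kind("server"):
        print(record.component_id, record.source, record.attributes)
```

Each new vendor means one more adapter function, which is exactly where the "dozen technologies" pile up.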

The next phase, Allocation & Provisioning, is when the operating system is deployed on servers, IP addresses are assigned, storage volumes are stood up and so on. Once again, we have plenty of technologies to choose from. A few that come to mind are PXE, Chef, Ansible, Kubernetes and various vendor-specific tools.

Anyway, you get it!

The reality is that, instead of a lot of leverage and synergy, every single phase is very much siloed in its own domain, with its own owner, skill sets and priorities.

My third and final observation is that forcing the enterprise onto PaaS (platform-as-a-service) or SaaS (software-as-a-service) does not make it much easier for development and operations teams to manage the software and/or system lifecycle.

So … what should we do?

My plan is to answer that in a few upcoming blog posts.

Thanks for your attention!