Zero Outage Map
The main goal of the Zero Outage Industry Standard is to strive for Zero Business Outages. Companies following this standard will benefit from the combined industry experience of providers as well as consumers of highly available and reliable IT solutions. It contains recommendations to find the right balance between reactive and proactive activities.
The Zero Outage collection of best practices offers specific guidance to enable IT professionals to plan, build, deliver and run end-to-end IT solutions suited for the most critical business functions and processes. Our proposed Industry Standard offers a holistic view beyond technology and includes additional dimensions like people, processes and security.
Therefore our objective is twofold:
- Describe how to provide Zero Outage enabled services based on operating procedures for customers and concrete use cases for suppliers and
- Prescribe how to implement Zero Outage compliant services and management capabilities with concrete design criteria that suppliers shall use as a requirements document for their product and service development. One benefit for the IT consumption side of services, for example, is standardized, precise and measurable input to Requests for Proposals (RFPs).
In order to articulate descriptive best practices and prescriptive design principles for the delivery of Zero Outage compliant services, the specifications must be holistic and consistent across the complete operating model of IT, covering people, processes and technology:
- Guidelines towards organizational structure and specific people competencies, including the required architectural governance to steer the execution of service delivery, both within the provider and within the consuming organization.
- The process framework in which services are being delivered, where ITIL is a great start with a very high adoption in the market. However, there are additional and complementary areas of prescription required, towards achieving the Zero Outage quality as well as integrating recent paradigm shifts in IT, like complex sourcing and more agile ways of working.
- The technology platform on which the services are delivered, requiring specific architecture policies to guarantee the required level of resilience, security, availability and performance to support the Zero Outage quality goals.
Consequently, the Zero Outage association maintains workstreams in all these categories to harvest the market knowledge and innovative thinking of the participating members and distill those into consumable value. The editorial board was chartered with the goal of structuring and guiding the work across all workstreams to achieve the required integration and consistency.
In order to achieve consistency, not only between the different workstreams, but also between the different complexities of prescriptive design principles vs. descriptive best practices, an architectural framework is necessary in which the individual workstreams can operate fairly independently, yet still deliver a consistent outcome. This framework entails three major components:
- Zero Outage Map: structures and describes the taxonomy of the modern IT landscape as a holistic, end-to-end IT Value Chain, which helps guide the development of the standard as well as the navigation and consumption of its outcome. The remainder of this document focuses on this element.
- Functional and Information Architecture (part of Reference Architecture): translates the integrated IT capabilities described in the ZO Map towards the architectural specificity of a functional model and information model. In other words, it specifies the functions required to achieve the capabilities, the information they hold, and how the functions need to interact in order to preserve the quality and integrity of the data. The reference architecture is a frame in which many specific implementation architectures can be defined in the context of a specific provider-consumer relationship.
- Layered Model: The functions of the reference architecture implement the end-to-end delivery of services on top of IT technologies. These technologies are not independent, but organized as an interconnected stack, e.g. infrastructure typically consists of network, storage and computing layers that depend on each other. However, these connections can be very diverse and dynamically changing. Therefore it is essential to provide design principles across the stack, for which the layered model provides a fundamental and generic structure.
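The dependency idea behind the layered model can be illustrated with a minimal sketch. The layer names and the direction of each dependency below are assumptions chosen purely for illustration; they are not defined by the Zero Outage Industry Standard. The sketch derives an order in which layers can be brought up and shows which layers are affected when a lower layer fails.

```python
from graphlib import TopologicalSorter

# Hypothetical layers and dependencies, for illustration only.
# Each key depends on every layer listed in its value set.
LAYER_DEPENDENCIES = {
    "network": set(),
    "storage": {"network"},
    "compute": {"network", "storage"},
    "service": {"compute"},
}


def bring_up_order(dependencies):
    """Return the layers ordered so that each one follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def impacted_by(failed_layer, dependencies):
    """Return every layer that depends, directly or indirectly, on failed_layer."""
    impacted = set()
    frontier = [failed_layer]
    while frontier:
        current = frontier.pop()
        for layer, needs in dependencies.items():
            if current in needs and layer not in impacted:
                impacted.add(layer)
                frontier.append(layer)
    return impacted


if __name__ == "__main__":
    print(bring_up_order(LAYER_DEPENDENCIES))
    print(sorted(impacted_by("storage", LAYER_DEPENDENCIES)))
```

In real environments the connections between layers are diverse and change dynamically, so a static table like this one would have to be replaced by a model that is discovered and updated continuously.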
Graphic 1 illustrates the different elements of the Zero Outage architectural framework, how they build on each other, and the related components.
As a consequence of the described motivation, we derived the following driving objectives for the Zero Outage Map component:
- Describe the landscape/taxonomy of end-to-end Zero Outage compliant service delivery in modern IT. In doing so, take an outside-in, customer value focused perspective, and leverage and complement existing standards and frameworks as much as possible (IT4IT, ITIL, COBIT, ISO, etc.).
- Position and outline the required capabilities at a sufficient level of specificity to structure and guide the Zero Outage Industry Standard development work across People, Process, Platform, and Security.
- Serve as a structure to organize the publication of the best practices and design principles, as well as a navigation and drill-down pane to consume the value in an intuitive and easy manner.
The obvious question is why choose an IT value chain concept, and what does it entail? Let’s start with what it is and what it entails. The value chain is a well-known business concept, described by Michael Porter in 1985(1) (see graphic 2). The principles are actually pretty simple:
- A sequence of related steps (primary activities) that successively create value for related stakeholders, e.g. customers, shareholders. The key to this definition is that the value outcome of the chain is greater than the sum of the parts.
- The additional value is created by the synergy of working together, based on an integrating and automating commonality (supporting activities), which can be common business functions (e.g. procurement), technology (e.g. master data of a supply chain) or infrastructure (e.g. a conveyor belt setup in a production line). The key is that this common connecting tissue makes the value chain more efficient, repeatable and predictable.
IT cannot really claim predictability: the typical end-to-end maturity is fairly low, as is the level of collaboration across organizational and technology silos. IT has grown fast along technology disruptions that created new silos, with barely any time to mature and integrate between them.
When thinking about this, it becomes glaringly obvious that the concept is highly applicable to IT, and specifically to Zero Outage. In order to deliver services end-to-end, you can quickly determine the set of required steps that are intimately related to each other. And to deliver at the Zero Outage quality level, it has to be efficient and most importantly predictable.
Leverage of IT4IT™
In order to design the Zero Outage Map as an end-to-end value chain, it seems obvious to leverage the IT4IT™ Value Chain standard(2), published by The Open Group, which provides a description of the IT landscape and of how to run IT as a business (see graphic 3). Effectively, The Open Group applied and translated the Porter Value Chain to the IT problem. The huge advantage is that this is defined as an open industry standard, continuously reviewed and evolved by a representative group of consumers and providers.
The IT4IT™ value chain is structured into four value streams, describing the well-known Plan, Build, Run phases, but adding the Deliver phase, which separates the creation of service release packages from their actual instantiation in production via service order and fulfillment catalogs. This is a direct answer to IT trends, namely DevOps, Cloud and Service Broker becoming pervasive.
Furthermore, based on the value chain concept, The Open Group developed and published the IT4IT Reference Architecture standard, which provides a functional and an information model prescribing how to deliver services in a business fashion. This makes it a very good starting point for the Zero Outage Industry Standard association, which can expand on it by adding Zero Outage specific architectural policies and data model aspects.
(1) See “value chain” definition in Wikipedia
(2) IT4IT™ is a trademark of The Open Group
When looking at the IT4IT™ Value Chain, its commonality with and applicability to the Zero Outage problem is obvious. All phases are significant in evolving towards a Zero Outage interpretation of the value chain, as articulated by the Zero Outage Map:
- Properly rationalize the business demand and cost/benefit of Zero Outage quality and Plan for the appropriate boundary conditions delivering the selected services accordingly. This entails strategic elements, such as an operating model tuned towards Zero Outage, as well as architectural policies, such as properly architecting the infrastructure stack to ensure the required level of resilience and security.
- Based on the given boundary conditions, the services need to be Built to meet the Zero Outage requirements, in particular the non-functional requirements. These will be a major outcome of the Zero Outage Industry Standard and can be leveraged directly in Requests for Proposals to prescribe the sourcing of required products/services from vendors, and in Service Level Agreements to hold service providers accountable. When developing internally, the design principles guide the proper service design, development and specifically the setup of automated testing.
- Traditionally, plan and build were followed by run, but the revolution of virtualization technology, sourcing models and development methodologies (agile, DevOps) required the innovation of a fourth, intermediate step called Deliver. These revolutions all have one major consequence: complexity, making it much more difficult to see and track what is going on. That is in direct contradiction to the keys of the value chain concept, namely collaboration and predictability. The traditionalists would say “keep cloud away from Zero Outage” and the New Agers would counter “cloud solves your resilience problem by definition”. Since both positions are insufficient, we need to make services in hybrid environments (the dynamic mix of traditional and cloud elements) work and remain manageable. One crucial element is the ability to construct services on request from various catalogs from various providers, and to activate such services in heterogeneous, hybrid infrastructure environments while keeping them controlled. This includes control of usage and respective charging, as well as manageability. Even though there is more to it, this is the essence of Deliver, which IT4IT calls Request to Fulfill.
- Finally, and probably best known in terms of processes and management maturity, is the Run phase, which assures that the services in use are delivered at the required Zero Outage quality level and stay that way through the reality of inevitable, continuous and dynamic change. However, Zero Outage changes the game significantly. Organizations used to be fairly nonchalant about the notion of “proactive operations”: it seemed “good enough” to automate the known and fatalistically accept the disruption of new surprises. Zero Business Outage means by definition that there can’t be surprises. Therefore Zero Outage requires innovation of the traditional Run towards anticipating and preventing issues before they disrupt the business.
In addition to the four main phases of Plan, Build, Deliver and Run, the Zero Outage Map also articulates the fact that services become obsolete, and hence explicitly adds a phase to Retire services. Zero Outage services typically require cost- and/or labor-intensive components and attention, which should be released explicitly and made available to other use cases.
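As a minimal sketch, the five phases can be expressed as a small lifecycle model. The phase names come from the text above; the permitted transitions between them are our assumption for illustration and are not prescribed by the Zero Outage Map.

```python
from enum import Enum


class Phase(Enum):
    PLAN = "Plan"
    BUILD = "Build"
    DELIVER = "Deliver"
    RUN = "Run"
    RETIRE = "Retire"


# Assumed transitions, for illustration only: the forward path, plus
# iteration back to earlier phases, and Retire as the terminal phase.
ALLOWED_TRANSITIONS = {
    Phase.PLAN: {Phase.BUILD},
    Phase.BUILD: {Phase.DELIVER, Phase.PLAN},
    Phase.DELIVER: {Phase.RUN, Phase.BUILD},
    Phase.RUN: {Phase.PLAN, Phase.BUILD, Phase.RETIRE},
    Phase.RETIRE: set(),
}


def can_move(current, target):
    """Return True if a service may move from current to target."""
    return target in ALLOWED_TRANSITIONS[current]


if __name__ == "__main__":
    print(can_move(Phase.RUN, Phase.RETIRE))
    print(can_move(Phase.RETIRE, Phase.RUN))
```

Making Retire an explicit, terminal phase gives a natural place to record the release of the cost- and labor-intensive components mentioned above.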
Like the IT4IT™ Value Chain, we place the Service Model in the center as the connecting tissue. It is at the heart of the Zero Outage Industry Standard, as described earlier, and of specific relevance to the work of the platform and security workstreams.
We have chosen to depict the value chain as a circular rather than a linear model, knowing well that modern IT requires continuous iteration between various capabilities within and between phases. We also chose to depict supporting functions as a surrounding frame and focused on selected ones that are specifically important for Zero Outage:
- Governance: Zero Outage has a lot to do with guaranteeing a certain quality, which in turn requires governance and control to ensure it actually happens. Moreover, it requires governance continuously across the entire value chain. At any given point one needs to be able to determine the current state and the required course of action.
- Analytics & Reporting: this is a key enabler as it provides crucial insights to continuously improve service delivery. You could well argue that this is a sub-function of the functions listed above or below. But it has evolved into a science, and big data technologies have evolved to become a source of innovation in and of themselves. For example, the determination of anomalies and anticipation of failure is only feasible through the level of analysis that can be done today.
- Risk Management: At the end of the day, decisions always need to find the right balance between conflicting priorities and boundary conditions. This is particularly critical for Zero Outage: meeting the goal of no business degradation requires clearly understanding the risk and impact of actions and changes, and being ready for the unexpected.
- Supplier Management: multi-supplier service delivery is mainstream, even though the maturity of it is often more on the low end. One of the objectives of the Zero Outage Industry Standard is to structure and streamline the cooperation of suppliers jointly delivering Zero Outage compliant services. One of the key elements is to make the touchpoints between suppliers transparent and measurable.
This document provides the fundamental concept and approach of the Zero Outage Map. It articulates the high-level picture and semantics behind it, as well as the principal relevance for the Zero Outage objectives. But it is not yet sufficient to provide the required architectural guidance to develop a related reference architecture. However, we found it important to publish this first step in a timely manner preparing for the next step.
The next version will drill down into the individual phases of Plan, Build, Deliver, and Run, specifying the required capabilities along with their iterative interactions. We plan to develop this next step for the next publishing window.
Looking further out, we also need to evolve the definition and articulation of the Zero Outage Map towards an operating model, expanding on related roles & responsibilities, metrics and KPIs.