To build a Standard that encompasses the best of all the companies contributing
What exactly is your role in the Zero Outage Industry Standard Association?
I have been a part of the Zero Outage Industry Standard Association since the beginning. Currently I am leading the Process workstream, which so far covers three main areas: change management, incident management and problem management. For the next release, we are working on the larger topic of event management, including how it links with monitoring and alerting.
My incentive to work for the Association comes from my strong belief in the idea and concept of Zero Outage. One hundred percent uptime for IT environments is a fascinating concept, and the Zero Outage Industry Standard aims to help make it a reality. Drawing on the experience and knowledge of the well-known companies that are part of the Association increases the quality and depth of the material produced. The Association has created an environment in which professionals from these companies can work together in a mutually respectful way, exchanging information and learning from each other, in order to build a Standard that encompasses the best of all the companies contributing.
Describe what you do in the workstream.
Each of the four workstreams is made up of representatives of the member companies. The aim is to bring together the knowledge and experience of these professionals to create a “best-of-breed” standard for achieving 100% uptime, i.e. Zero Outage. I believe that to achieve the ideal of the Zero Outage Industry Standard Association, the four elements (People, Processes, Platform and Security) need to be dealt with together. At the start, I was also involved in the People workstream, and it is clear that collaborating with other workstreams allows us to gain a more complete understanding of the whole IT environment. So, while in the Process workstream we are focused on developing best practice guidelines and recommendations for various operational processes, we recognise the importance of linking up with the other workstreams to ensure that all of our work is aligned.
How does NetApp benefit from the Zero Outage Industry Standard Association?
NetApp has always strived to provide its customers with highly available, highly reliable solutions, but it is clearly not enough to focus only on the products or their security. To achieve zero outage, one must also focus on the processes that govern how a business runs its operations, and on the people who create and implement those processes. To give you an example of what I mean…
One of our customers wanted to move a storage device from one data centre to another. This was a high availability storage device, which means it consisted of two “nodes”. One of the nodes would have all of its data transferred to the other node and then be moved to the other data centre. Physical hardware moves such as these are common but can be problematic if the change is not properly managed. In this case the change process was not followed properly and the move was done during production hours. The data had been transferred to the storage node that was not being moved, and the additional load made this node slower and slower until it became unusable, which inevitably led to an outage. This outage then created an incident, which badly affected the customer, and inevitably the incident became a problem.
The root cause of the problem lay at the very beginning of the planning phase for the hardware move. There had been no consultation with NetApp regarding the change, so the customer did not realise that while, in theory, the data transfer and move could be done “online” during production hours, the additional data load on one node could severely impact performance. This is an excellent example of why process guidelines are crucial in the IT industry: the Zero Outage Standard recommends consulting vendors on important changes such as these in the production environment. The Zero Outage Industry Standard Association enables competitors, vendors and clients to come together and exchange ideas and knowledge. The resulting guidelines and best practice recommendations will, if implemented, help to avoid outages like the one in this example.
(Interview by Erik Djamgarian)