IBM Change Risk Expert — Never touch a running system...
Unfortunately, running IT systems are not static. Applications need to be adapted, preventive changes carried out, bugs fixed, faulty configurations corrected, and updates applied. Every change brings with it the risk of failure. Seemingly innocent modifications can, in some cases, trigger cascades of service disruptions that, in the worst case, bring entire companies to a standstill. According to Gartner [1], changes are responsible for 80% of all incidents that result in client outages. The more complex an IT system is, the more difficult it becomes to estimate the effect of a change.
An essential aspect of an effective Change Management process is Risk Management, which aims to assess and mitigate the impact of changes in order to reduce the chance of failure. Today, IT service providers typically assess the risk of a change through a risk categorization performed either manually or through a questionnaire. Assessing risk in this manner not only relies heavily on one person's opinion, but also assumes that the change context is the same regardless of the type of change being raised. In practice, no two changes are truly identical. These deficiencies of the standard practice inevitably result in inaccurate assessments, which in turn can lead to unmitigated risks that ultimately materialize as failures. To help reduce change failure rates, and with them their negative impact on IT systems and services, we need more effective and more accurate change risk management techniques that detect risks early in the process and provide opportunities for risk mitigation before changes are implemented.
Going hand in glove with risk mitigation is pre-emptive knowledge sharing: the larger an IT organization is, the more difficult it becomes for individual change requesters to stay abreast of the reasons for success and failure encountered by their colleagues. Traditional knowledge sharing tools require extra effort on the user's part, which dramatically reduces the likelihood that those tools are used and precludes effective risk mitigation, effectively turning "lessons learned" into "lessons forgotten".
The service management research group at the IBM Research–Zurich lab has for the last two years been working on the IBM Change Risk Expert tool, which aims to provide standardized risk assessment along with a built-in knowledge sharing component. By semi-automatically classifying change requests and correlating them on a global level with past failures and their associated root cause analyses, we raise awareness of high-risk changes early in the ITIL change management process and allow the change management team to focus on the identified high-risk factors.
The talk will provide a short overview of the research activities at the IBM Research–Zurich lab, then focus on the Change Risk Expert project and the challenges we faced, and report on first results from IBM-internal pilots of the Change Risk Expert (CRE) tool.
IBM Research–Zurich
Dirk Husemann