IBM Change Risk Expert — Never touch a running system...

Unfortunately, running IT systems are not static. Applications need to be adapted, preventive changes carried out, bugs fixed, faulty configurations corrected, and updates applied. Every change brings with it the risk of failure. Seemingly innocent modifications can, in some cases, trigger cascades of service disruptions, in the worst case bringing entire companies to a standstill. According to Gartner [1], changes are responsible for 80% of all incidents that result in client outages. The more complex an IT system is, the more difficult it becomes to estimate the effect of a change.

An essential aspect of an effective Change Management process is Risk Management, which aims to assess and mitigate the impact of changes in order to reduce the chance of failure. Today, IT service providers typically assess the risk of a change through a risk categorization approach performed either manually or through a questionnaire. Assessing the risk of a change in this manner not only relies heavily on one person's opinion, but also assumes that the change context is the same regardless of the type of change being raised. In practice, no two changes are truly identical. These deficiencies of the standard practice for assessing the risk of a change inevitably result in inaccurate assessments, which in turn can lead to unmitigated risks that ultimately materialize as failures. To help reduce change failure rates, along with their negative impact on IT systems and services, we need to develop more effective and more accurate change risk management techniques that detect risks early in the process and provide opportunities for risk mitigation before changes are implemented.

Going hand in glove with risk mitigation is the aspect of pre-emptive knowledge sharing: the larger an IT organization is, the more difficult it becomes for individual change requesters to stay abreast of the reasons for success and failure encountered by their colleagues. Traditional knowledge sharing tools require extra effort on the user's part, which dramatically reduces the likelihood that those tools are used and precludes effective risk mitigation, effectively turning "lessons learned" into "lessons forgotten".

The service management research group at the IBM Research–Zurich lab has for the last two years been working on IBM Change Risk Expert (CRE), a tool that aims to provide standardized risk assessment along with a built-in knowledge sharing component. By semi-automatically classifying change requests and correlating them on a global level with past failures and their associated root cause analyses, we raise awareness of high-risk changes early in the ITIL change management process and allow the change management team to focus on the identified high-risk factors.

The talk will provide a short overview of the research activities at the IBM Research–Zurich lab, then focus on the Change Risk Expert project and the challenges we faced, and finally report on first results from IBM-internal pilots of the CRE tool.

IBM Research–Zurich

Dirk Husemann