Incident management – managing tickets and getting the impacted asset and the user back to work – is the bread-and-butter practice of the service desk.
Problem management – which aims to identify, investigate, and resolve the root cause of incidents – is arguably more important from an ITSM perspective. Fixing one root cause can prevent many future incidents, so it tends to drive more impact per hour spent.
We covered the applications of AI in incident management in the previous chapter. Let’s talk about problem management and the potential ways in which AI can transform it.
Reactive problem management goes one step beyond the firefighting nature of incident management. It attacks the source of the fire to prevent further damage and avoid similar problems in the future.
Proactive problem management goes even further. It focuses on anticipating and preventing potential problems before they lead to IT incidents. Think of it as the measures you’d take to make sure that fires don’t erupt in the first place.
AI can proactively identify potential problems by analyzing historical data and spotting patterns that might lead to incidents. For instance, AI could predict server outages based on historical performance data, allowing IT teams to address issues before they cause an org-wide disruption.
This requires a system that continuously analyzes IT systems, processes, and trends to identify weaknesses or risks that could cause issues in the future. The goal should be to resolve these potential problems before they result in incidents, thereby reducing disruptions and downtime.
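To make the idea of continuous analysis a little more concrete, here is a minimal sketch of flagging unusual readings in a performance metric. It uses a plain rolling z-score as a stand-in for whatever model an AI-powered tool would actually use. The function name, the window size, the threshold, and the latency readings are all assumptions made up for illustration.

```python
from statistics import mean, stdev


def flag_anomalies(samples, window=5, threshold=3.0):
    """Return indices whose value sits more than `threshold` standard
    deviations away from the mean of the preceding `window` samples."""
    flagged = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as a deviation.
            if samples[i] != mu:
                flagged.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical hourly disk-latency readings (ms) for one server.
latency_ms = [12, 13, 12, 14, 13, 12, 13, 41, 13, 12]
print(flag_anomalies(latency_ms))  # -> [7]
```

A flagged reading is only a prompt to investigate. Deciding whether it points to a real weakness still takes more context, such as recent changes or related events.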
Even with the most vigilant combination of AI and human agents, not all problems can be avoided. Some problems will still slip through the cracks, and sometimes it just isn’t practical to take preventative steps for every problem that might occur.
In such cases, AI can help speed up problem resolution in a few different ways.
AI is good at pattern recognition, which makes it a strong fit for root cause analysis (RCA). An AI-powered system can analyze vast amounts of data from various sources and quickly pinpoint the most likely root causes of an issue. Humans will be much more efficient working with this narrow pool of potential root causes than going on a wild goose chase through every possible thread.
Problems that are not urgent generally go on the back burner and stay there forever. An AI assistant can help sort problems based on non-linear factors like which assets/CIs are impacted, how many users are impacted, and who those users are. If you’d like to bump up a problem that’s blocking mission-critical work, plain automation might not cut it.
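As a rough illustration of the factors mentioned above, the sketch below ranks a small backlog by impacted CI, user reach, and who the users are. The `Problem` fields, the 1-to-5 criticality scale, and every weight are hypothetical, hand-picked placeholders. An AI assistant would presumably infer this kind of weighting from context rather than rely on fixed numbers.

```python
from dataclasses import dataclass
from math import log10


@dataclass
class Problem:
    problem_id: str
    ci_criticality: int          # 1 (low) to 5 (mission-critical); assumed scale
    users_impacted: int
    vip_users_impacted: int
    blocks_critical_work: bool


def priority_score(p: Problem) -> float:
    # Log scaling makes reach non-linear: going from 10 to 100 users
    # matters more than going from 1,010 to 1,100.
    reach = log10(1 + p.users_impacted)
    score = 2.0 * p.ci_criticality + 3.0 * reach + 1.5 * p.vip_users_impacted
    if p.blocks_critical_work:
        # Anything blocking mission-critical work jumps the queue.
        score *= 2
    return score


# Hypothetical backlog entries.
backlog = [
    Problem("PRB-101", 2, 400, 0, False),
    Problem("PRB-102", 5, 12, 1, True),
    Problem("PRB-103", 3, 40, 0, False),
]
ranked = sorted(backlog, key=priority_score, reverse=True)
print([p.problem_id for p in ranked])  # -> ['PRB-102', 'PRB-101', 'PRB-103']
```

Note how PRB-102 touches the fewest users yet lands on top, because it sits on a critical CI and blocks critical work.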
Some problems are straightforward. If a company-wide OS update is immediately followed by a spike in incidents about system slowdown, the update is highly likely to be the cause. Other problems can require digging through past incident and problem records, event logs, changes, etc. AI can be leveraged to follow hunches with minimal opportunity cost.
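The OS update scenario can be sketched as a simple before-and-after comparison of incident counts around a change. This is a deliberately naive check, not how any particular ITSM tool works. The function name, the 24-hour window, and the timestamps are assumptions for illustration.

```python
from datetime import datetime, timedelta


def spike_ratio(incident_times, change_time, window=timedelta(hours=24)):
    """Compare incident volume in the window after a change
    with the same-sized window before it."""
    before = sum(
        1 for t in incident_times if change_time - window <= t < change_time
    )
    after = sum(
        1 for t in incident_times if change_time <= t < change_time + window
    )
    # Add 1 to the denominator so a quiet "before" window
    # does not cause a division by zero.
    return after / (before + 1)


# Hypothetical rollout time and incident timestamps.
os_update = datetime(2024, 3, 1, 9, 0)
incidents = [os_update - timedelta(hours=20), os_update - timedelta(hours=3)]
incidents += [os_update + timedelta(hours=h) for h in range(1, 10)]

print(spike_ratio(incidents, os_update))  # -> 3.0
```

A high ratio is a hunch worth following, not proof. Confirming the cause still means looking at what the incidents have in common.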
Your IT org might be great at documentation, but some problems are caused by things beyond your control. You can have the AI assistant learn from publicly available KBs like Microsoft support and Apple support. AI can catch a recent security update that’s known to cause issues much faster than a team of human agents.
We’re just scratching the surface of what AI can potentially do to help with problem management. These models will only get smarter and more proactive over time. Getting to zero incidents might be a distant dream, but a combination of AI-powered proactive problem management and service continuity management will get you pretty close to a disruption-free IT org.