Artificial intelligence for IT operations (AIOps) is the application of machine learning and data science to IT operations. AIOps systems monitor the huge volumes of log and performance data that a large enterprise typically generates, in order to gain visibility into dependencies and solve problems.
An AIOps platform should include these three capabilities, suggests a recent report in TechTarget:
Automate routine practices. These include user requests and non-critical IT system alerts. For example, a help desk system can automatically process and fulfill a user request to provision a resource. The system can also evaluate alerts and determine which ones require action and which can be set aside because the metrics and supporting data are within normal parameters.
Recognize serious issues faster and with greater accuracy than humans. The system should be able to detect behavior out of the norm, especially on critical servers, by processing volumes of data not possible for humans to monitor on their own.
Streamline the interactions between data center groups and teams. AIOps provides each functional IT group with relevant data and perspectives. The AIOps system learns which analysis and monitoring data, drawn from the large pool of resource metrics, to show each group or team.
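The alert evaluation described under the first capability can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Alert class, the NORMAL_RANGES table and its threshold values are hypothetical and do not come from any particular AIOps product. A real platform would typically learn these ranges from historical data rather than hard-code them.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    source: str   # host or service that raised the alert
    metric: str   # metric name, e.g. "cpu_percent"
    value: float  # observed value


# Hypothetical normal ranges per metric as (low, high); illustrative values only.
NORMAL_RANGES = {
    "cpu_percent": (0.0, 85.0),
    "disk_used_percent": (0.0, 90.0),
}


def needs_action(alert: Alert) -> bool:
    """Return True when the value falls outside the metric's normal range."""
    bounds = NORMAL_RANGES.get(alert.metric)
    if bounds is None:
        # Assumption: unknown metrics are escalated rather than silently dropped.
        return True
    low, high = bounds
    return not (low <= alert.value <= high)


def triage(alerts):
    """Split alerts into (actionable, suppressed) lists, preserving order."""
    actionable = [a for a in alerts if needs_action(a)]
    suppressed = [a for a in alerts if not needs_action(a)]
    return actionable, suppressed
```

Calling triage on a batch of alerts returns the ones that need a human or automated response separately from those whose metrics are within normal parameters.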
AIOps is suited to the complex IT operations typical of large enterprises, for example those involving hybrid cloud platforms. Data comes from multiple sources, including log files, metrics, monitoring tools and help desk ticketing systems. Big data technology is used to aggregate and organize the output into a useful form. Analytics techniques are used to interpret the raw data and spot trends and patterns that can identify and isolate problems, including capacity issues.
Algorithms in the system codify the organization’s IT expertise, business policies and goals, so the platform can deliver the most desirable outcomes or actions. The algorithms are used to prioritize security-related events and teach the platform what application performance decisions are appropriate. These algorithms form the foundation for machine learning; they establish a baseline of normal behaviors and activity, and they can learn and evolve as the environment changes over time.
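One simple way to picture a baseline that evolves over time is a rolling window of recent measurements, with new values compared against the window's mean and standard deviation. The sketch below assumes a z-score test. The RollingBaseline class and its parameters (window, threshold, min_samples) are hypothetical names chosen for illustration, and production AIOps tools generally use more sophisticated models.

```python
from collections import deque
from statistics import mean, stdev


class RollingBaseline:
    """Tracks recent values of one metric and flags outliers (z-score test)."""

    def __init__(self, window=50, threshold=3.0, min_samples=10):
        self.samples = deque(maxlen=window)  # oldest values drop off automatically
        self.threshold = threshold           # z-score above which a value is flagged
        self.min_samples = min_samples       # do not judge until enough data is seen

    def observe(self, value: float) -> bool:
        """Return True if value is anomalous against the current baseline.

        The value is added to the window afterwards, flagged or not, so the
        baseline drifts as the environment changes.
        """
        anomalous = False
        if len(self.samples) >= self.min_samples:
            mu = mean(self.samples)
            sigma = stdev(self.samples)
            if sigma > 0:
                anomalous = abs(value - mu) / sigma > self.threshold
            else:
                # A perfectly flat baseline: any different value is unusual.
                anomalous = value != mu
        self.samples.append(value)
        return anomalous
```

Because old samples fall out of the window, a sustained shift in load eventually becomes the new normal and stops triggering alerts. This is a deliberately simple stand-in for the learning behavior described above.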
Automation enables the AIOps tools to take action, triggered by the outcomes of the analytics and machine learning. A tool's predictive analytics and ML may determine that an application needs more storage, for example. An automated process can then be initiated to add storage in increments consistent with the rules embedded in the algorithms.
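The storage example can be made concrete with a small rule-driven planner. This is a hedged sketch: StoragePolicy, plan_expansion and every number in the policy (increment size, ceiling, headroom) are assumptions invented for illustration, and the predicted usage would in practice come from the platform's predictive analytics.

```python
import math
from dataclasses import dataclass


@dataclass
class StoragePolicy:
    increment_gb: int = 50    # size of each expansion step
    max_total_gb: int = 500   # hard ceiling set by business policy
    headroom: float = 0.20    # fraction of capacity to keep free


def plan_expansion(capacity_gb: float, predicted_usage_gb: float,
                   policy: StoragePolicy) -> int:
    """Return how many GB to add, always a whole number of increments.

    The capacity needed is the predicted usage plus the policy's headroom;
    the result never pushes total capacity past policy.max_total_gb.
    """
    required = predicted_usage_gb / (1.0 - policy.headroom)
    if required <= capacity_gb:
        return 0  # current capacity already leaves enough headroom
    steps = math.ceil((required - capacity_gb) / policy.increment_gb)
    # Cap the number of steps so the ceiling is respected.
    max_steps = int((policy.max_total_gb - capacity_gb) // policy.increment_gb)
    return max(min(steps, max_steps), 0) * policy.increment_gb
```

Keeping the limits in a separate policy object mirrors the idea that the rules are set by the organization, while the automation only applies them.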
Visualization tools deliver dashboards, reports, graphics and other output so that human operators and managers can see the changes and events in the environment. These typically allow human operators to take actions that require decision-making capabilities beyond those of the AIOps software.
The technology underlying AIOps is fairly mature, and the field is poised to enter the next phase of maturity in combining the technologies for practical use. The amount of time and effort needed to implement, maintain and manage an AIOps platform can be substantial. Results can vary.