Operational Intelligence (OI) is a cross-experiment project that aims to reduce the cost of computing operations for the WLCG experiments.
Why
Experiments with a complex distributed computing infrastructure, such as ATLAS and CMS, report that around 100 people are involved in computing operations, both centrally and at the sites. Most operational activities involve gathering and sorting monitoring information from various subsystems, spotting problems, and escalating them to the relevant experts. Anomaly detection, time-series analysis and classification techniques can be exploited to assist operators in their daily routines and to improve overall system efficiency and resource utilisation.
How
- by increasing the level of automation in operation tasks
- by leveraging common tools and infrastructure
- by collaborating and sharing expertise, approaches and solutions to common problems among experiments