DIU: Data Center Predictive Maintenance

DIU, Suspense 17 February 2021. The Department of Defense (DoD) seeks a solution to continually monitor the health and function of critical information technology (IT) infrastructure systems. Today, when mission-impacting events occur, the Information Technology Operations Center (ITOC) acts as the IT crisis response team by restoring failed systems to operational status.

 

DoD Communities of Interest

Artificial Intelligence

Subject

Data Center Predictive Maintenance

Due Date

2021-02-17 23:59:59 US/Eastern Time

Government Organization

DIU

Description
Executive Summary
(*Note: Solution development and deployment must be in a classified environment, either on-premise or on DoD’s Commercial Cloud Services (C2S), and requires an existing level-6 TS/SCI clearance to work on-site in a classified facility; vendors that do not already possess a suitable clearance can make use of a pre-existing integration partner. All work must be completed by U.S.-based resources within classified facilities (DC, VA, HI, …).)

The Department of Defense (DoD) seeks a solution to continually monitor the health and function of critical information technology (IT) infrastructure systems. Today, when mission-impacting events occur, the Information Technology Operations Center (ITOC) acts as the IT crisis response team by restoring failed systems to operational status. To help identify systems that have failed or may soon fail, alerts point to a specific set of machine logs for human operators to investigate, evaluate, and address. This process is labor-intensive, time-consuming, and often imprecise.

The intent of this effort is to identify in advance those network components (routers, switches, network cards, chassis, etc.) and infrastructure components (servers running Exchange, SharePoint, Active Directory, Jabber chats, VDI) that are likely to fail, by implementing and automating processes that enable cost savings and reduce the need for human intervention in maintaining uptime for various systems. It is expected that this prototype will demonstrate the feasibility and effectiveness of using predictive artificial intelligence (AI) tools for computational infrastructure maintenance across multiple secure information processing environments in a geographically dispersed enterprise. Addressing this challenge will enable the DoD to cost-effectively maintain mission-critical infrastructure at the highest levels of government.

DoD Requirements

Ingest petabytes of information from existing resources like incident ticketing records, logs, metrics, instrument sensors, and other data sources of machine-created data.

Automate the identification and analysis of pattern-of-life-behavior for both networking components (routers, switches, processor cards, network line cards, chassis, power supplies) and infrastructure components (Microsoft Exchange, Cisco Jabber chat, Virtual Desktop infrastructure, storage, hosted server computing, cross domain appliance)

Autonomous & dynamic threshold setting, omitting the need for human definition of alerting thresholds

Provide criticality of each alert based on published, agreed, or learned thresholds

Anomaly detection to predict future component failures, providing ITOC personnel the ability to remedy the issue prior to incident and optimize existing maintenance strategies.

Case management functionality for alerts based on criticality

Successful solutions will demonstrate consistency with the DoD AI Ethical Principles

Root Cause Analysis / Auto-remediation of issues found (preferred but not required)

Prior experience deploying similar solutions within the private sector (financial, IT)
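
As an illustration only, the requirements for dynamic threshold setting and alert criticality could be sketched as follows. This is a minimal example, not a method specified by this solicitation: the function name `dynamic_alerts`, the rolling z-score approach, the window size, and the `warn_z` / `crit_z` cutoffs are all assumptions chosen for the sketch.

```python
from statistics import mean, stdev


def dynamic_alerts(values, window=5, warn_z=3.0, crit_z=6.0):
    """Flag readings that deviate from a baseline learned from recent data.

    The threshold is not set by a human operator: it is recomputed for every
    reading from the mean and standard deviation of the previous `window`
    readings. All names and default values here are hypothetical.

    Returns a list of (index, value, criticality) tuples.
    """
    alerts = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        baseline = mean(history)
        # Floor the spread so a perfectly flat history does not divide by zero.
        spread = max(stdev(history), 1e-9)
        z_score = abs(values[i] - baseline) / spread

        if z_score >= crit_z:
            alerts.append((i, values[i], "critical"))
        elif z_score >= warn_z:
            alerts.append((i, values[i], "warning"))
    return alerts


# Hypothetical sensor readings (e.g. chassis temperature) with one spike.
readings = [40, 41, 39, 40, 41, 40, 90, 41]
print(dynamic_alerts(readings))
```

In this sketch the spike at index 6 is flagged as critical because it lies far outside the spread of the five preceding readings. A production system would likely need seasonality handling, robust statistics, and learned per-component baselines, none of which are shown here.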

Eligibility Requirements
Special Considerations for this AOI

When evaluating proposals, predictive analytics on network components will be given greater weight than on infrastructure components

Solution development and deployment must be in a classified environment, either on-premise or on DoD’s Commercial Cloud Services (C2S)

Existing level-6 TS/SCI clearance to work on-site in a classified facility; vendors that do not already possess a suitable clearance can make use of a pre-existing integration partner
All work must be completed by U.S.-based resources within classified facilities (DC, VA, HI, …)

Website

https://diu.mil/work-with-us/submit-solution/PROJ01414