HUIT Service Status Dashboard

Open OnDemand Service Interruption

Incident Report for HUIT

Resolved

The outage affecting HUIT Open OnDemand has been resolved, and users are able to log in and access resources. We'll continue to monitor the service closely to ensure stability.
Posted Apr 17, 2025 - 09:46 EDT

Monitoring

Restarting the Slurm controller node and altering its configuration has restored the ability to launch interactive apps in HUIT Open OnDemand. HUIT will continue to monitor the service to ensure stability before resolving the Major Incident.
Posted Apr 16, 2025 - 19:44 EDT

Identified

The service team believes it has identified a path forward and continues to investigate and remediate the root cause.
Posted Apr 16, 2025 - 18:45 EDT

Investigating

When logging in to https://ood.huit.harvard.edu, users are able to load the Open OnDemand dashboard, but interactive apps will not start properly. The terminal app may load, but the Slurm scheduler is unstable, and compute jobs may or may not run.

User data is unaffected and can still be downloaded through the Open OnDemand dashboard.

FAS Academic Technology is troubleshooting this issue and working to restore this service.
Posted Apr 16, 2025 - 17:12 EDT
This incident affected: Other Services.