Technical Staff Blog

Archives July 2015


We had a GPFS filesystem failure on Wednesday, July 15, from 12am to 9:30am. The GPFS cluster was in an unhappy state starting at 12am, when one of its nodes went into an unknown state. Then around 3:15am the CIFS cluster went down. During the GPFS filesystem outage, as we were working to ...


Since the unplanned network outage on July 2nd, our compute grid has been unstable. We've been scouring grid logs, examining network switches, and generally pulling our hair out. The issue could be replicated by simply restarting the grid master service. We may have finally located a corruption in the underlying configuration files that was ...


Due to a network outage resulting from a switch upgrade not performed by Tstaff, a handful of services were down this morning. Specifically, the Grid is currently up but not running any jobs, and the List Server was down between 8:30am and 10am. We're working as quickly as we can to find any ...


The issues with the filesystem have been resolved. We are now crawling the department looking for front-end machines, such as VMs and websites, that may still be having issues.

If you have any trouble using the department's services, please email


The filesystem is back to normal, but some services that depend on it still need attention. Windows users and remotely connected hosts mounting the filesystem over VPN may still experience connection or slowness issues while we work on bringing all of our CIFS servers back online.
