We had a GPFS filesystem failure on Wednesday, July 15, from 12am to 9:30am. The GPFS cluster was in an unhappy state starting at 12am, when one of its nodes went into an unknown state. Then, around 3:15am, the CIFS cluster went down. During the GPFS filesystem outage as we …
Since the unplanned network outage on July 2nd, our compute grid has been unstable. We've been scouring grid logs, examining network switches, and generally pulling our hair out. The issue could be replicated by simply restarting the grid master service. We may have finally located corruption in the underlying configuration …
Due to a network outage resulting from a switch upgrade not performed by Tstaff, a handful of services were down this morning. Specifically, the Grid is currently up but not running any jobs and the List Server was down between 8:30am and 10am. We're working as quickly as we can …
The issues with the filesystem have been resolved. We are now crawling the department looking for front-end machines, such as VMs and websites, that may still be having issues. If you have any trouble using the department's services, please email problem@cs.brown.edu.
The filesystem is back to normal, but some services that depend on it still need attention. Windows users and remotely connected hosts mounting the filesystem over VPN may still experience connection or slowness issues while we work on bringing all of our CIFS servers back online.
We are currently experiencing network and filesystem slowness as a result of heavy grid usage. We are aware of the problem and have contacted the job owner to develop a solution.