Technical Staff Blog

Category archives: Service Issues

Tstaff announcements related to general IT service issues



CIS replaced the 5th floor UPS in the switch closet this morning. Unfortunately, two of the switches failed to boot up again. CIS is working on replacing the failed switches, and we will post an ETA as soon as CIS gives us one.

UPDATES:

08:02: CIS estimates it will be another hour before the ...


We are currently experiencing issues with our GPFS file system, which are causing logins to the departmental systems to hang. We will add updates to this blog post as we debug the issue.

08:34: A file system fsck process appears to be hung. A support call to IBM has been initiated as we ...


At about 1am this morning, disk hardware providing a backing store for our VMware machines went offline. This took nearly every one of our production servers offline, along with all the hosting class machines provisioned for users and research groups. CIS is investigating the issue and we will post updates as we ...


We experienced a power blip on Saturday, likely the result of the storms that rolled through. This took out a number of grid machines. The majority of the machines are back and operational again, but any jobs that were running on the affected machines have been killed. There are still about two dozen machines we ...


This was supposed to be a trouble-free morning: just power up the switch we moved downstairs yesterday and add our redundancy back in. It seems to be a trend with this project: things not going quite as we planned...

07:14: The second distribution switch was powered on at 7am. Unfortunately, something happened that ...


09:53: We are currently experiencing a widespread network outage affecting connections to any site outside the CS department, including over wireless. Tstaff is trying to confirm it's not an issue on our end, and we have reached out to the CIS networking team for help debugging this.

10:06: We lost the network link ...


As part of our ongoing virtualization efforts, we rolled out an upgraded version of our mail relay server yesterday at noon. Unbeknownst to us, the default value for a configuration variable changed in a subtle way that affected delivery to some subdomains. This definitely affected delivery to some of our email lists. We put a ...


08:54: We are experiencing some database issues with our ownCloud service that could affect syncing back to the servers. The database cluster is managed by CIS, so we are working with their DBAs to diagnose and fix the problems.

09:22: We have shut down one of the backend servers to aid in debugging ...