Technical Staff Blog

Author archives: Mark Dieterich


Last update on .

CIS replaced the 5th floor UPS in the switch closet this morning. Unfortunately, two of the switches failed to boot back up. CIS is working on replacing the failed switches, and we will post an ETA once CIS provides us with an update.

UPDATES:

08:02: CIS estimates it will be another hour before the ...

Last update on .

We are currently experiencing issues with our GPFS file system that are causing logins to the departmental systems to hang. We will add updates to this blog post as we debug the issue.

08:34: A file system fsck process appears to be hung. A support call to IBM has been initiated as we ...

Last update on .

At about 1am this morning, the disk hardware providing the backing store for our VMware machines went offline. This took nearly every one of our production servers offline, along with all of the hosting class machines provisioned for users and research groups. CIS is investigating the issue and we will post updates as we ...

Last update on .

We experienced a power blip on Saturday, likely the result of the storms that rolled through. It took out a number of grid machines. Most of those machines are back up and operational, but any jobs that were running on them were killed. There are still about two dozen machines we ...

Last update on .

On Saturday, June 4th, CIS will be physically moving its EMC Isilon, the hardware that provides file services for the campus. The move is scheduled to start at 6am and finish around 4pm. We currently have three production services relying on the Isilon: email lists, ownCloud, and Time Machine backups. In order to keep our ...

Last update on .

This was supposed to be a trouble-free morning: just power up the switch we moved downstairs yesterday and add our redundancy back in. As seems to be the trend with this project, things are not going quite as we planned...

07:14: The second distribution switch was powered on at 7am. Unfortunately, something happened that ...

Last update on .

09:53: We are currently experiencing a widespread network outage affecting connections to any site outside the CS department, including wireless. Tstaff is trying to confirm it's not an issue on our end, and we have reached out to the CIS networking team for help debugging this.

10:06: We lost the network link ...

Last update on .

On May 31st at 7am, CIS will be shutting down one of our two distribution switches and, because it is directly attached to that switch, we will be shutting down our backup firewall. Because of redundancy, we do not expect this work to cause any network outages. CIS will spend the remainder of May 31st moving the switch to ...

Last update on .

Overall, the migration of GPFS to the basement went remarkably well! We lost a total of two power supplies, both failures we could absorb thanks to redundant hardware, and ran into a few bumps along the road, but it could have been a catastrophic failure and it sure wasn't. We do have some additional cleanup that ...