Technical Staff Blog

Last update on .

The power outage went quite smoothly; in fact, the work was completed about an hour ahead of schedule. We did have a few unexpected machines go down when the "A" side of the building power was taken offline, including:

  • dblade05 - dblade10
  • dblade25 - dblade30
  • dblade55 - dblade60
  • cslab0a, babbage, tesla, maple, robby (on the stage of the ...

Last update on .

The start time of the Saturday power outage has been moved up by an hour. Everyone must vacate the building by 5am, and no one will be permitted access from 5am until the work is complete. In accordance with this, we will now be shutting down supported desktops and affected angstrom compute nodes at ...

Last update on .

The main CS server room has approximately 140 servers. The vast majority of these servers are partnered up, so a loss of any one server automatically fails associated services over to its partner and the service continues to be available, albeit after a momentary blip as the services fail over. Unfortunately, we don't ...

Last update on .

Power to the entire CIT will be shut off at 7am on Saturday, March 28th to accommodate some work being done in connection with the chiller replacement. They plan to lock the building at 6am and reopen it at 10am. If all goes as planned, our main server room will stay online, as will the Sunlab. This ...

Last update on .

Around 8:15am this morning, we experienced some sort of power blip. According to Facilities, many buildings around campus were affected and the issue was with power supplied by National Grid. We have reports of some offices being affected, machines suddenly shutting down, and about 70 grid nodes going down. Tstaff is working on bringing ...

Last update on .

Contractors are making some power changes to the circuits that feed our compute nodes in the data center tomorrow morning. These changes should prevent our hardware from being affected by the annual data center emergency power test performed by CIS.

If all goes as planned, the majority of the compute nodes should remain operational. We ...

Last update on .

We were experiencing an unusually high load on one of the GPFS NFS nodes (runts), which was causing a slowdown of all NFS traffic through this node. We've seen this behavior before and, as yet, have been unable to identify the root cause. Unfortunately, it doesn't get better with time, so we ...

Last update on .

We added a new disk tray of ten 2TB SATA disks to the Main GPFS file system, providing an additional 15.6TB of usable space. The Main file system is running a disk restripe to rebalance the data. This process will likely take a week to complete, but normal file system operations should be available through ...

Last update on .

By now, you've no doubt noticed a large truck bed with two big cooling towers parked next to the CIT. Are you curious about what's going on? The chiller system, which provides cool water to the various machine rooms in the building, is original to the building, rather cantankerous, and in the process of ...
