Technical Staff Blog

Category archives: Service Outages

Tstaff announcement about complete service outages


Last update on .

We are aware of ongoing issues with the CS department website. The problem began shortly after midnight when both our main web servers got into an irregular state and had to be rebooted. Now parts of the website are back, but the home page itself (https://cs.brown.edu/) refuses to load properly. Possibly other ...

Last update on .

Two GPFS NFS servers failed simultaneously at around 1pm today, causing file services to be unavailable for about 40 minutes.  The servers, crows and runts, serve different cluster groups; crows serves the grid and runts serves the internal department network.  Runts' failure was a kernel lock-up which prevented the normal automatic failover behavior.  The technical ...

Last update on .

There was a broken water pipe on the 3rd floor near the network closet.  One of the network switches, cit-cs-as.net.brown, was affected and is offline.  The following CIT rooms are affected by this switch outage: 115, 121, 132, and 134.

We are investigating further and will update this post as we learn more.

Update ...

Last update on .

There was a short network outage at 5pm today, affecting some of our user-managed machines in Room 310 and the department's Print Host.  While disabling a set of ports on the switch, a port we were not supposed to have access to was also disabled, which severed that switch's communications with the ...

Last update on .

At around 11:30am today, our local DNS database became corrupted, causing domain name resolution to fail for local hosts.  This caused a series of cascading failures which rendered most local services unusable.  We restored the database and restarted our name service shortly after noon, and all services should be back to normal.  We are ...

Last update on .

The CS Department mailing list server was down for much of today after an update caused a local misconfiguration.  An update to our Sympa software, which was automatically installed overnight, broke a local configuration script and left the service unresponsive from around 12:30am this morning until about 7pm this evening.  No messages ...

Last update on .

The system database was down this morning between 12:25am and 1:40am.  The pgpool proxy server failed on host mallo, which is the backup server but was acting at the time as the primary.  Failing over to the usual primary server, carvel, restored services.  During this period services which rely upon the database ...

Last update on .

We are about to perform an emergency failover of the department FastX gateway, which should be complete by 12:00pm. This will mean that anyone who is presently connected to FastX will lose their session(s), along with any unsaved work.

Several users reported over the weekend that the FastX gateway was refusing new connections ...