CSLab System Updates
These updates can also be obtained via an Atom or RSS feed. Alternatively, to be emailed any new updates as they appear, or to cease being emailed such alerts, send email to systemupdates-request@cs.

Thu, May 17, 2012

apps0.cs downtime this morning

This morning apps0.cs suffered a drive controller failure, effectively 
crippling the machine to the extent that anything requiring disk access 
was failing.

We were unable to resuscitate the drive controller, but fortunately we 
did have a spare server of similar specifications available.  We swapped 
apps0's disks to the spare machine, and after some massaging apps0 is 
once again up and running.

/updates/2012    permanent link

Fri, May 11, 2012

Replacement comps4, updated webmail, bookable compute server status

A number of system improvements have been made in the past term that are worth mentioning here:

/updates/2012    permanent link

Thu, Mar 22, 2012

matlab on comps servers

MATLAB 7.13 (R2011b) is now the default version of MATLAB 
(/opt/matlab/bin/matlab) on CSLab's comps servers.

/updates/2012    permanent link

Power outage in SF

Sandford Fleming, which houses our primary machine room, suffered a 
building-wide power outage today.  This took down virtually every piece 
of infrastructure that we have.

To make a long story short, as of now (4:10 pm) almost everything should 
be back and working properly again, although some services such as email 
may be slow while everything catches up.

We're still on track for our 6:00 pm downtime tonight; we did not want to 
further delay recovering the systems by attempting to piggyback the 
downtime tasks on top of the power outage (and we were not ready for 
some of them in any event).

/updates/2012    permanent link

Mon, Mar 19, 2012

Another failure on the same fileserver

In keeping with the theme of 'It never rains but it pours', we have just 
had a second, unrelated disk failure on a different back-end machine 
serving the same fileserver that suffered yesterday's failure.  At least 
we are physically present this time, so we are currently going through 
the replacement procedure for the second failed disk.

We will have to reboot the fileserver, and all NFS clients are expected 
to be slow and unresponsive until we can bring the fileserver back to an 
even keel once again.

Outage is expected to be about 15 minutes.

/updates/2012    permanent link

Fileserver problems resolved

We've replaced the failed component in the fileserver back-end, and all 
NFS clients appear to be restored to normal functionality.  Should you 
notice any lingering issues, please communicate them to your PoC or 
software@cs directly depending on the severity.

Thank you, and we regret the inconvenience this event may have caused.

CSLab Staff

/updates/2012    permanent link

Sun, Mar 18, 2012

Cascade problems from failed fileserver

It seems that we have experienced a fileserver-related hardware failure 
which has caused NFS filesystems exported from that fileserver to be 
non-responsive.  This in turn has cascaded to problems with NFS clients 
of that fileserver, as in some cases an NFS client will tie itself in 
knots attempting to access the now unresponsive filesystems.
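
For those who want to check whether a given NFS path is currently hanging without getting stuck themselves, here is a minimal Python sketch. It is an illustration only, not a CSLab tool: the function name and the default timeout are assumptions. It runs the filesystem call in a background thread and gives up waiting after a timeout, since a call against an unresponsive NFS mount may block indefinitely.

```python
import os
import threading


def path_responds(path, timeout=5.0):
    """Return True if os.stat(path) completes successfully within timeout seconds.

    The stat runs in a daemon thread, because a call against a hung NFS
    mount can block for a very long time and cannot be cancelled from Python.
    """
    result = {}

    def worker():
        try:
            os.stat(path)
            result["ok"] = True
        except OSError:
            # The path is missing or otherwise inaccessible.
            result["ok"] = False

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    # If the worker is still blocked, "ok" was never set, so report False.
    return result.get("ok", False)
```

Calling `path_responds` on a directory under your home directory returns False both when the path does not exist and when the call times out, so it cannot tell those two cases apart. A worker thread blocked on a hung mount stays blocked in the background until the fileserver recovers.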

Most of these problems we will not be able to address until we are 
physically on premises Monday morning.  Even then, it may take some time 
to work through the chain of problems and get everything back to normal.

Should you find a particular server responding poorly or not at all, 
please try a similar machine; for example, if apps0 is non-responsive, you may 
find better results with apps2 or apps3.  Some clients will have 
weathered the storm better than others.
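
The "try a similar machine" advice above can be scripted. The following Python sketch is a rough illustration under stated assumptions: only the host names come from this update, while the port (22, for SSH), the timeout, and the function names are hypothetical choices. It returns the first host in a list that accepts a TCP connection.

```python
import socket


def tcp_probe(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout.

    Port 22 (SSH) is an assumption; use whichever service you actually need.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False


def first_responsive(hosts, probe=tcp_probe):
    """Return the first host for which probe(host) is true, else None."""
    for host in hosts:
        if probe(host):
            return host
    return None
```

For example, `first_responsive(["apps0", "apps2", "apps3"])` tries each machine in order. An open port only shows that the machine is accepting connections; a client that is still tangled up in unresponsive NFS filesystems may accept the connection and then stall at login.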

/updates/2012    permanent link

Fri, Jan 20, 2012

Wireless network outage

Over the last little while (two weeks or so), our wireless router has
been experiencing intermittent lockups.

Today at approximately 12:55 pm the router failed again and we decided
to replace it with a hot spare.

The work took about 20 minutes and unfortunately during this time the
CSLab wireless network would have been very unstable. 

Our testing indicates that the new hardware is running as expected and
we do not expect to see additional interruptions, although we shall
monitor it closely over the coming days. 

We are sorry for the inconvenience that this will have caused. 

Thanks 

CSLab Staff

/updates/2012    permanent link

