Wed, Dec 15, 2010
We are recovering from losing a shelf of fileserver disks
CSLab's fileserver environment uses an iSCSI-based SAN with all fileserver storage mirrored between disks on two SAN backends. This evening at 5:51pm, the disk shelf for one of the backends had its power supply fail (for an as-yet undetermined reason), causing all 12 of its disks to become unavailable.
This backend was primarily used by fs8.cs (with a portion of it used for extra redundancy on fs2.cs). To restore redundancy to fs8, we have started the process of re-mirroring fs8's storage onto the disks of our hot spare backend. This process is automated and will continue overnight (and perhaps longer) until all of fs8's storage is once again fully redundant.
Since this re-mirroring process does a lot of IO itself, filesystems on fs8 will respond more slowly while it's happening. People with home directories on fs8.cs may notice symptoms such as slower IMAP sessions or their editors having delays when saving files.
fs2.cs's storage remains heavily redundant even with the failure of this backend. Since fs2.cs holds filesystems that are very sensitive to the IO load of a re-mirroring, such as /var/mail, we are not going to re-mirror its storage onto the hot spare backend until later (perhaps during a quiet time this weekend).