Archive for the ‘Uncategorized’ Category

File locking and IMAP writes

Wednesday, March 25th, 2009

Occasionally in the lab, users have had problems accessing or copying email from their inbox.  The cause is often kernel file locking.  Here are the two situations I’ve seen the problem occur in:

  1. User has a problem with Pine, and it crashes or is killed in a way that doesn’t release the kernel lock on the file (kill -9).
  2. An offline operation to move a message (“UID COPY”) from one IMAP folder to another fails in some way, saying “Timeout while waiting for lock”.

For us, the first case is much more common. Yesterday was the first time I’ve seen something like #2. A user was unable to copy a message from his inbox to another IMAP folder in Apple Mail. Our server is running Dovecot, so I checked and saw that a .lock file was being created and that the imap process moving the email was frozen.

I did the following:

  • stopped the imap process and closed the client mail app
  • ls -li imapfolder to get the inode number of the file (we use mbox format so all imap folders are files)
  • cat /proc/locks to check and see if there was a kernel lock on the file
  • mv imapfolder imapfolder_tmp; cp imapfolder_tmp imapfolder (this creates a new file with a new inode number, which can be written to)
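
The inode check in the steps above can be scripted. This is a minimal sketch, assuming the usual /proc/locks layout (id, lock type, advisory/mandatory, access, pid, major:minor:inode, start, end, with "->" marking a blocked waiter). The function names are my own and not part of any tool.

```python
import os


def parse_proc_locks(text):
    """Parse the contents of /proc/locks into a list of dicts."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Blocked waiters look like "2: -> FLOCK ADVISORY WRITE ..."
        blocked = "->" in fields
        if blocked:
            fields.remove("->")
        if len(fields) < 8:
            continue  # not a lock line we understand
        _id, kind, mode, access, pid, dev_inode = fields[:6]
        # The sixth column is major:minor:inode; the inode is the last part.
        inode = int(dev_inode.rsplit(":", 1)[1])
        entries.append({
            "kind": kind,
            "mode": mode,
            "access": access,
            "pid": int(pid),
            "inode": inode,
            "blocked": blocked,
        })
    return entries


def locks_on_inode(text, inode):
    """Return only the lock entries that refer to the given inode."""
    return [e for e in parse_proc_locks(text) if e["inode"] == inode]


def locks_on_file(path, locks_path="/proc/locks"):
    """Look up the inode of path (like ls -li) and list kernel locks on it."""
    inode = os.stat(path).st_ino
    with open(locks_path) as handle:
        return locks_on_inode(handle.read(), inode)
```

Calling locks_on_file("imapfolder") on the mail server would list the pid of any process holding or waiting on a kernel lock for that mbox file. Inode numbers are only unique per filesystem, so a match should be double-checked against the device column if several filesystems are mounted.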

Using the ADMIN$ file share to review log files

Wednesday, March 11th, 2009

I found a useful way to review the Windows folder on clients via file sharing. Basically, an ADMIN$ share is accessible on every client machine the Active Directory admin has access to (for example, \\clientname\ADMIN$). It allows you to view and alter files in the client’s C:\Windows directory.
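
As a small illustration of how the share is addressed, here is a sketch that builds the UNC path to a file under ADMIN$. The host name lab-pc-01 and the choice of WindowsUpdate.log are only examples, and the helper name is hypothetical.

```python
def admin_share_path(host, relative_path=""):
    """Build the UNC path to a file under a client's ADMIN$ share.

    relative_path is relative to the client's Windows directory and may
    use either forward or backward slashes.
    """
    base = "\\\\" + host + "\\ADMIN$"
    if not relative_path:
        return base
    cleaned = relative_path.strip("\\/").replace("/", "\\")
    return base + "\\" + cleaned


# Example (hypothetical host): prints \\lab-pc-01\ADMIN$\WindowsUpdate.log
print(admin_share_path("lab-pc-01", "WindowsUpdate.log"))
```

The resulting path can be pasted into Explorer or opened from a script running under an account with admin rights on the client.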

Active Directory Machine Account Problem

Monday, March 9th, 2009

Occasionally, I’ve noticed that some computers in our Active Directory domain stop updating their group policy. When the computer is restarted, the error message given is:

Windows cannot find the machine account, No authority could be contacted for authentication.

The solution to this issue, or at least a work-around, is to power off the affected computer and disconnect the power until the motherboard loses power. After it is powered back up, it resumes normal function in the Active Directory domain and group policy updates again.

Cluster Upgrade and Active Directory move

Monday, March 2nd, 2009

It’s been a while since I’ve posted, and in that time we’ve managed to clear a few things off the task list. I’m going to talk a little bit about them here. The cluster update was a success, but I found a side effect of having the path include a reference to itself (to allow cluster-specific binaries to be available). When adding packages or changing the default partitioning scheme for the cluster nodes, the process would fail if the root environment was accessed via su by a user who had this path setting. The distribution would build, but the install process would then fail because key files weren’t copied into the hierarchy. Specifically, updates.img and stage2.img would be missing.

The Active Directory move was accomplished by using several VMware instances to test out the various stages. There is still some work to be done with the security certificates, and then the old domain structure can be removed. The cs disk space move was accomplished without incident.

Swap file problems in CentOS (Rocks Cluster)

Thursday, October 16th, 2008

We’ve been experiencing an interesting problem on our cluster nodes which causes them to freeze up. It appears to be related to the way the Linux kernel in CentOS deals with memory allocation requests. The issue is caused by the swap partition on a machine filling completely, which freezes the system. Any attempt to start a new process hangs, waiting for swap space to become available (which it never does). There are several ways of trying to deal with this. The first is to rely on the OOM killer, a kernel mechanism that steps in when the system runs out of memory and kills a process it decides can be sacrificed for the greater good. This is a good description of the memory issues and how to test for them.
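
One way to catch a node before it reaches that state is to watch swap usage. The sketch below reads the SwapTotal and SwapFree fields from /proc/meminfo; the 90% threshold and the function names are assumptions of mine, not something we have tuned on the cluster.

```python
def swap_usage(meminfo_text):
    """Return (used_kb, total_kb, fraction_used) from /proc/meminfo text."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])  # values are reported in kB
    total = values.get("SwapTotal", 0)
    free = values.get("SwapFree", 0)
    used = total - free
    fraction = used / total if total else 0.0
    return used, total, fraction


def swap_nearly_full(meminfo_text, threshold=0.9):
    """True when the used fraction of swap meets the (assumed) threshold."""
    return swap_usage(meminfo_text)[2] >= threshold


def node_swap_nearly_full(path="/proc/meminfo", threshold=0.9):
    """Convenience wrapper that reads the live file on a node."""
    with open(path) as handle:
        return swap_nearly_full(handle.read(), threshold)
```

Run periodically on each node (from cron, for example), node_swap_nearly_full() gives an early warning while it is still possible to log in and kill the offending job.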

The little binary that could: Dellmgr

Friday, September 19th, 2008

The cluster downtime was avoided, thanks to some helpful advice from Dell. Our cluster uses a pair of PowerVault 220S enclosures which are configured in a RAID 5 array. When two of the drives in each enclosure went into predictive failure mode, we needed to replace them to ensure the integrity of our data. Since we’re running Rocks for the cluster OS, the Dell OpenManage tools that would allow us to do the hot swap while the cluster was running weren’t available.

I didn’t want to install the OpenManage software since it seemed to have a large number of modules and really looked like it might be work (which I try to avoid). A Dell rep I talked to recommended Dellmgr instead. He told me to ignore the other rpms and install only the following: Dellmgr-5.25-0.i386.rpm. It installs one file, dellmgr.bin, that talks to the PERC controller card and gives you an interface very similar to the one used in the card’s BIOS, no restart required. I was able to fail the faulty drives and do the rebuild without having to alter the cluster’s running state at all.

It’s a shame that Dell no longer supports it and hasn’t released a version for the new controller cards.

DirectX issues with group policy

Monday, September 8th, 2008

I’ve been experiencing problems putting together a silent DirectX installation that will work with group policy. It runs correctly when run silently in an interactive session, only to fail when it is run as a startup script from a group policy object. It usually fails to copy some files from a temporary location into the system folder. I’m going to continue looking into this to see if a work-around is possible.

PBRT on Rocks Cluster OS tests

Wednesday, August 6th, 2008

We’re trying out PBRT (Physically Based Rendering) on our cluster for some of the students. Rocks OS doesn’t include it as part of the installation package (it has to be installed from source), and it depends on OpenEXR, a package that was developed at ILM to provide a high dynamic-range (HDR) image file format.

OpenEXR does have an rpm package available, but not in the repository Rocks OS uses. I downloaded it from DAG along with openexr-devel, using the version which corresponds to CentOS 4 (Red Hat EL 4), since I believe that is the source for the Rocks OS version I am using. The only gotcha during the compile was having to change the include directories for OpenEXR in pbrt’s Makefile from /usr/local/include to /usr/include and the lib directory to lib64, since we’re running 64 bit.

If pbrt works okay on the test node, I’ll install it on the rest of the cluster.

Maya 3DS Max Auto install

Tuesday, July 29th, 2008

I think I’ve figured out why my autoinstallers for 3DS Max 9 and Maya 2008 didn’t work: they required the DirectX redistributable package to run, specifically d3dx9_34.dll for Maya and some earlier version for 3DS Max. In addition to the isscript AutoIt install (required because the InstallShield msi file will break all other versions of isscript.msi if installed via group policy), a DirectX silent install will be required to ensure the installation can roll out on lab machines.

SP3 install results

Thursday, July 24th, 2008

Due to some problems with the upgrade for the 3.0 .NET Framework (which many of the lab machines had), XP Service Pack 3 was not able to install. The workaround was to manually install the 3.5 framework, which includes all the updates to the previous versions of the .NET Framework. I was having a little trouble installing it using psexec, but that was caused by the framework executable displaying an Open File Security Warning when it ran. By right-clicking on the file and selecting “Properties”, I was able to click the “Unblock” button and remove the dialog. The remaining machines were then able to apply the framework, and afterwards SP3. The one holdout had a problem with RAM which was causing Windows Update and other services to crash. Removing the faulty RAM appears to have resolved the problem.