Dude, where's my disk?

Today I took a look at a client’s 10g Oracle RAC environment where one node had apparently started misbehaving quite some time ago. The scenario was that both nodes were up, Cluster Ready Services were up, nodeapps were up and ASM was up, but the database instance was down.

Starting from the top with the database alert log, it was apparent that ASM had no available disk groups for this database. Okay, let’s ask ASM how it feels, then. Looking in the ASM log file, it was equally obvious that the ASM instance was online but didn’t service any disk groups. That’s odd.

Let’s bring out the whole toolbox and work through it from the top. It didn’t take very long for the first problem to appear: ASMTOOL couldn’t see any stamped disks. Even worse, running asmtool -list gave me OS Error 21 - disk not ready. The server saw the disks via Disk Manager, but ASM couldn’t see the disk headers. That’s novel. Not very helpful, but novel.

Considering what might interfere with disk headers and disk presentation in general, volume managers and similar tools come to mind. Knowing that the customer runs NetApp in the organization, I took a look at SnapDrive. SnapDrive should never be installed (or, heaven forbid, be running!) on a RAC cluster machine. The node that worked actually HAD SnapDrive, albeit disabled. The node that didn’t work also had SnapDrive, running happily. Time to end that.

One reboot later I was back to square one: I still couldn’t see the ASM stamps. Time to check the automount setting in DISKPART. It turned out to be disabled, so that was rectified:

DISKPART> automount enable

Automatic mounting of new volumes enabled.

But still no disk groups for me, despite a reboot and a CRS restart. On a whim I took the disks offline and brought them online again. To my surprise, that was the push that awoke the ASM stack. From that point everything worked like a charm: ASM saw the stamps, ASM could be persuaded to mount the disk groups, and then the database started without any issues.

Some points to take home:

  • Just because something is ancient, don’t expect it to have been retired (i.e. old knowledge always comes in handy).
  • When people tell you that they haven’t done anything to the environment - politely smile and expect that they have.
  • Oracle RAC is a finicky beast - always pay attention to the support matrix.
  • RAC is only as highly available as the administrator tasked with supporting it.
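
For reference, the offline/online cycle that finally woke up ASM can presumably also be driven from DISKPART rather than from Disk Manager. This is only a sketch, assuming a Windows version whose DISKPART supports the offline disk and online disk commands (Windows Server 2008 and later, as far as I know). Disk 1 is a made-up placeholder, so pick the real ASM disk from the list disk output first:

DISKPART> rem disk 1 is a placeholder, not the client's actual disk number
DISKPART> list disk
DISKPART> select disk 1
DISKPART> offline disk
DISKPART> online disk

Afterwards, asmtool -list should show the stamped disks again, provided the cycle has the same effect as it did in this case.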