Wednesday, May 4, 2016

Ocrcheck: Logical corruption check failed: How to backup and recover OLR

Oracle Local Registry (OLR) was introduced with 11gR2/12c Grid Infrastructure. It contains the node-specific configuration required by OHASD and is not shared between nodes; in other words, every node has its own OLR. This note provides steps to back up or restore the OLR.


Solution


OLR location

The OLR location pointer file is '/etc/oracle/olr.loc' or '/var/opt/oracle/olr.loc' depending on platform. The default location after installing Oracle Clusterware is:
GI Cluster: <GI_HOME>/cdata/<hostname>.olr
GI Standalone (Oracle Restart): <GI_HOME>/cdata/localhost/<hostname>.olr
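
To find the OLR actually in use on a node, the pointer file can be read directly. The following is a minimal sketch, assuming the pointer file contains a line of the form olrconfig_loc=<path>; the function names olr_path_from_loc and find_olr are made up for this illustration and are not Oracle tools.

```shell
#!/bin/sh
# Print the OLR path recorded in an olr.loc pointer file.
# Assumption: the file holds a line of the form olrconfig_loc=<path>.
olr_path_from_loc() {
    loc_file=$1
    # Keep only the value after "olrconfig_loc=" and stop at the first match.
    sed -n 's/^olrconfig_loc=//p' "$loc_file" | head -n 1
}

# Try the two pointer-file locations named above and use the first readable one.
find_olr() {
    for f in /etc/oracle/olr.loc /var/opt/oracle/olr.loc; do
        if [ -r "$f" ]; then
            olr_path_from_loc "$f"
            return 0
        fi
    done
    echo "no olr.loc pointer file found" >&2
    return 1
}
```

Calling find_olr on a configured node should print the OLR file path; on a host without Grid Infrastructure it prints an error and returns non-zero.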


To backup

The OLR is backed up during GI configuration (installation or upgrade). In contrast to the OCR, the OLR is NOT backed up automatically again after GI is configured; any further backups must be taken manually. To take a backup of the OLR, use the following command:
# <GI_HOME>/bin/ocrconfig -local -manualbackup


To list backups

To list the backups currently available:
# <GI_HOME>/bin/ocrconfig -local -showbackup
node1 2010/12/14 14:33:20 /u01/app/oracle/grid/11.2.0.1/cdata/node1/backup_20101214_143320.olr
node1 2010/12/14 14:33:17 /u01/app/oracle/grid/11.2.0.1/cdata/node1/backup_20101214_143317.olr
 
Clusterware maintains the history of the five most recent manual backups and will not update or delete a manual backup after it has been created.
Note that ocrconfig -local -showbackup lists the manual backups recorded in the registry, even if the backup files have since been removed or archived on the OS file system with OS commands.
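
Because the listing can include backups whose files no longer exist, it is worth checking each listed path before relying on it. The following is a small sketch, assuming every output line has the form "<node> <date> <time> <path>" as in the listing above; check_olr_backups is a hypothetical helper name.

```shell
#!/bin/sh
# Read `ocrconfig -local -showbackup` output on stdin and report, per entry,
# whether the backup file is still present on disk.
# Assumption: each line is "<node> <date> <time> <path>".
check_olr_backups() {
    missing=0
    while read -r bk_node bk_date bk_time bk_path; do
        # Skip blank or incomplete lines.
        [ -n "$bk_path" ] || continue
        if [ -f "$bk_path" ]; then
            echo "OK      $bk_path"
        else
            echo "MISSING $bk_path"
            missing=1
        fi
    done
    # Return 1 if at least one listed backup file was not found.
    return "$missing"
}
```

It would be used as: <GI_HOME>/bin/ocrconfig -local -showbackup | check_olr_backups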




To restore

Make sure the GI stack is completely down and ohasd.bin is not running. Use the following command to confirm:


# ps -ef | grep ohasd.bin | grep -v grep

This should return no process. If ohasd.bin is still running, stop it on the local node:


# <GI_HOME>/bin/crsctl stop crs -f  <========= for GI Cluster

OR

# <GI_HOME>/bin/crsctl stop has  <========= for GI Standalone
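
The check-then-stop sequence above can be combined into one step. This is only a sketch: the GI_HOME environment variable, the function names, and the "cluster"/"standalone" argument are assumptions made for the example, while the crsctl commands are the ones shown above.

```shell
#!/bin/sh
# Succeed if the `ps -ef` output on stdin shows an ohasd.bin process.
# Lines belonging to a grep command are ignored so the check does not match itself.
ohasd_running() {
    awk '/ohasd\.bin/ && !/grep/ { found = 1 } END { exit !found }'
}

# Stop the GI stack only if ohasd.bin is still up.
# Assumptions: GI_HOME is set by the caller; $1 is "cluster" or "standalone".
stop_gi_if_running() {
    mode=$1
    if ps -ef | ohasd_running; then
        case $mode in
            cluster)    "$GI_HOME/bin/crsctl" stop crs -f ;;
            standalone) "$GI_HOME/bin/crsctl" stop has ;;
            *) echo "usage: stop_gi_if_running cluster|standalone" >&2
               return 2 ;;
        esac
    else
        echo "ohasd.bin is not running"
    fi
}
```

For example, as root on a cluster node: GI_HOME=<GI_HOME> stop_gi_if_running cluster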

Once it's down, restore with the following command:
# <GI_HOME>/bin/ocrconfig -local -restore <olr-backup>

If the command fails, create a dummy OLR file, set the correct ownership and permissions, and retry the restore command:


# cd <OLR location>
# touch <hostname>.olr
# chmod 600 <hostname>.olr
# chown <grid>:<oinstall> <hostname>.olr
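
The four commands above can be wrapped in a small helper so that a failure in any step stops the sequence. This is a sketch only; create_dummy_olr is a hypothetical name, and the directory, host name, owner, and group must all be supplied by the caller.

```shell
#!/bin/sh
# Create an empty placeholder OLR file with mode 600 and the given owner and group.
# Arguments: <OLR directory> <hostname> <owner> <group>
# Nothing is looked up automatically; every value comes from the caller.
create_dummy_olr() {
    olr_dir=$1
    host=$2
    owner=$3
    group=$4
    olr_file="$olr_dir/$host.olr"
    # Each step runs only if the previous one succeeded.
    touch "$olr_file" &&
    chmod 600 "$olr_file" &&
    chown "$owner:$group" "$olr_file" &&
    echo "$olr_file"
}
```

After it prints the path of the new file, rerun the ocrconfig -local -restore command shown earlier.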

Once it's restored, GI can be brought up:
# <GI_HOME>/bin/crsctl start crs   <========= for GI Cluster

OR

$ <GI_HOME>/bin/crsctl start has  <========= for GI Standalone, this must be done as grid user.


Error:
====
ocrcheck
Status of Oracle Cluster Registry is as follows :
  Version : 2
  Total space (kbytes) : 1024372
  Used space (kbytes) : 3876
  Available space (kbytes) : 1020496
  ID : 916021765
  Device/File Name : /dev/ocr_disk1
  Device/File integrity check succeeded
  Device/File Name : /dev/ocr_disk2
  Device/File integrity check succeeded

  Cluster registry integrity check succeeded


  Logical corruption check failed


Changes

Most likely an OCR parameter was incorrectly set.


Cause

In this case, the -force option was used to set the diagwait parameter while CRS 11.1.0.7 was up and running on one or more nodes.
This resulted in two keys pointing to the same key name.


Solution

To fix the logical corruption:
1. Restore a consistent backup of the OCR. The backup must be from before the change that introduced the corruption.
See the steps in OCR / Vote disk Maintenance Operations: (ADD/REMOVE/REPLACE/MOVE) (Doc ID 428681.1)

2. If no OCR backup is available, rebuild the OCR.
See the steps in How to Deconfigure/Reconfigure(Rebuild OCR) or Deinstall Grid Infrastructure (Doc ID 1377349.1)

References

NOTE:428681.1 - OCR / Vote disk Maintenance Operations: (ADD/REMOVE/REPLACE/MOVE)
NOTE:1377349.1 - How to Deconfigure/Reconfigure(Rebuild OCR) or Deinstall Grid Infrastructure