Wednesday, February 17, 2016

What is BBU and Learn Cycle

Battery Monitoring via Learn Cycles

Learn cycles are run periodically to fully discharge and then recharge the battery. When a cycle completes, the battery backup unit (BBU) determines the new charge capacity the battery can hold. This value is reported in the "Full Charge Capacity" field of the MegaCLI BBU output and is updated after each learn cycle; see the Battery Charge Condition Requirements section below for an example. Failing to run learn cycles at the recommended intervals may shorten the usable life of the battery, because the full charge capacity declines more rapidly, leading to a premature end of service life.

When a learn cycle is initiated, the charging circuit automatically places any virtual drives that are in write-back (WB) mode into write-through (WT) mode for the duration of the cycle, which temporarily reduces write performance. Once the learn cycle completes, the virtual drives are automatically switched back to WB mode, provided the battery can still hold the required charge. Learn cycle time varies depending on the BBU type.

For BBU07, the complete learn cycle, with the cache in WT mode, is expected to take 6 to 8 hours.
For BBU08, the complete learn cycle, with the cache in WT mode, is expected to take 2 to 3 hours.

Note: when a new BBU is installed into a system, it will be in a depleted charge state. Any attached virtual drives will be forced into WT cache mode while a full learn cycle is performed. A charge sufficient to maintain the cache is usually reached once this cycle completes, which may take 24 hours or longer.

To determine the Battery Type, run the following:
# ./MegaCli64 -AdpBbuCmd -a0 | grep BatteryType
BatteryType: iBBU08
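
Once the battery type is known, the expected time in WT mode quoted above can be looked up from it. The following is only a minimal sketch: the function name expected_wt_hours is hypothetical, and it assumes the BatteryType value contains "BBU07" or "BBU08" as in the output above.

```shell
# Hypothetical helper: map a BatteryType string (e.g. "iBBU08") to the
# expected learn cycle duration quoted in this post.
expected_wt_hours() {
    case "$1" in
        *BBU07*) echo "6 to 8 hours" ;;
        *BBU08*) echo "2 to 3 hours" ;;
        *)       echo "unknown" ;;
    esac
}

# Example usage on a live system (the MegaCli64 location is an assumption):
# bbu_type=$(./MegaCli64 -AdpBbuCmd -a0 | grep BatteryType | awk -F': *' '{print $2}')
# expected_wt_hours "$bbu_type"
```

Any other battery type falls through to "unknown" rather than guessing a duration.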


To find out which ASIC version of the LSI controller is installed:
# lspci | grep RAID
13:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2108 [Liberator] (rev 05)
#


To check the firmware (FW) version:
# ./MegaCli64 -AdpAllInfo -a0 | grep "FW Package Build"
FW Package Build: 12.12.0-0178
#

By default, learn cycles on Exadata are configured as follows:
    On Storage Cells with image 11.2.1.2.x, the learn cycle occurs monthly, counted from first power-on.
    On Storage Cells with image 11.2.1.3.1 or later, the learn cycle is manually scheduled to run quarterly.
    On Database nodes, the learn cycle is scheduled automatically and occurs every 30 days from first power-on.

To change the start time on Storage Cells for when the learn cycle occurs, use a command similar to the following. The time reverts to the default learn cycle time after the cycle completes:

CellCLI> ALTER CELL bbuLearnCycleTime="2011-01-22T02:00:00-08:00"

To check the current bbuLearnCycleTime:

CellCLI> list cell attributes bbuLearnCycleTime
         2016-04-17T02:00:00+01:00

CellCLI>


To check the learn cycle status reported by the controller:

# ./MegaCli64 -AdpBbuCmd -a0 | grep "Learn Cycle"
  Learn Cycle Requested        : No
  Learn Cycle Active           : No
  Learn Cycle Status           : OK
  Learn Cycle Timeout          : No
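
For scripting, the same fields can be reduced to a yes/no answer. This is a rough sketch only: learn_cycle_in_progress is a hypothetical name, and it assumes the "Learn Cycle Requested" and "Learn Cycle Active" lines look like the output above.

```shell
# Hypothetical helper: reads "MegaCli64 -AdpBbuCmd" output on stdin and
# exits 0 when a learn cycle is requested or active, 1 otherwise.
learn_cycle_in_progress() {
    awk -F': *' '
        /Learn Cycle (Requested|Active)/ && $2 ~ /^Yes/ { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Example usage on a live system (the MegaCli64 location is an assumption):
# if ./MegaCli64 -AdpBbuCmd -a0 | learn_cycle_in_progress; then
#     echo "Learn cycle running: expect virtual drives in WT mode"
# fi
```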


Battery Charge Condition Requirements & Replacement Guidelines

The absolute minimum BBU07 charge required to meet the minimum 48-hour hold-up time is 600mAh.
When the BBU07 can no longer hold this much charge, MegaCli64 reports it by changing the "Remaining Capacity Low" setting from the normal "No" to "Yes". This may serve as an early warning to check whether the "Full Charge Capacity" is getting low.
The absolute minimum BBU08 charge required to meet the minimum 48-hour hold-up time is 674mAh.
Note: on BBU08 this may be flagged prematurely because of a firmware bug (Sun CR 7018730) that sets the threshold too high, at 960mAh, based on incorrect operational assumptions. If the flag is raised because of this bug, ignore the alert as long as the "Full Charge Capacity" value is over 800mAh.

# ./MegaCli64 -AdpBbuCmd -a0 | grep "Remaining Capacity Low"
  Remaining Capacity Low       : Yes
   
# ./MegaCli64 -AdpBbuCmd -a0 | grep "Capacity: "
Remaining Capacity: 597 mAh
Full Charge Capacity: 612 mAh
Design Capacity: 1215 mAh


In this condition, the BBU can no longer support the cache for the required duration and needs to be replaced immediately. All virtual drives on a system with this flag set are forced into WT mode to protect data, which reduces performance until the BBU is replaced.
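
The rule for telling a genuine "Remaining Capacity Low" alert from the BBU08 firmware bug can be sketched as follows. Both function names are hypothetical, and the parsing assumes the "Full Charge Capacity: NNN mAh" format shown above; the 800mAh cut-off is the one quoted for Sun CR 7018730.

```shell
# Hypothetical helpers; field names match the MegaCli64 output shown above.

# Print the "Full Charge Capacity" value in mAh from MegaCli64 output on stdin.
full_charge_capacity() {
    awk -F': *' '/Full Charge Capacity/ { print $2 + 0; exit }'
}

# Classify a "Remaining Capacity Low: Yes" alert.
# $1 = battery type (e.g. iBBU08), $2 = full charge capacity in mAh
classify_low_capacity_alert() {
    case "$1" in
        *BBU08*)
            if [ "$2" -gt 800 ]; then
                echo "ignore"        # likely the firmware bug (Sun CR 7018730)
            else
                echo "investigate"
            fi ;;
        *)  echo "investigate" ;;
    esac
}
```

With the sample output above (612 mAh), the alert is classified as "investigate" for either battery type.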

Another parameter to check is Max Error, which indicates how accurate the battery condition reading is. A Max Error below 10% is considered a valid condition reading. If it is 10% or greater, the battery condition cannot be reported reliably and the BBU is treated as failed.

Replace the battery immediately in the following cases:

BBU07 -
    1. Check the battery status and replace the battery if the full charge capacity after a learn cycle is less than 600 mAh.
BBU08 -
    1. Check the battery status and replace the battery if the full charge capacity after a learn cycle is less than 674 mAh, regardless of any other BBU output field.
    2. Check the battery status and replace the battery if the Max Error rate reported is 10% or greater.
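
The immediate-replacement checks above can be expressed as one test. This is a sketch under stated assumptions: needs_immediate_replacement is a hypothetical name, the values are passed in as plain integers rather than parsed from MegaCli64, and the Max Error check is applied to both battery types on the basis of the general Max Error description earlier.

```shell
# Hypothetical helper.
# $1 = battery type (e.g. iBBU07, iBBU08)
# $2 = full charge capacity in mAh, read after a learn cycle
# $3 = Max Error in percent (integer)
# Exit 0 means "replace now", 1 means "no immediate replacement",
# 2 means the battery type was not recognised.
needs_immediate_replacement() {
    local min_mah
    case "$1" in
        *BBU07*) min_mah=600 ;;
        *BBU08*) min_mah=674 ;;
        *) echo "unknown battery type: $1" >&2; return 2 ;;
    esac
    [ "$2" -lt "$min_mah" ] || [ "$3" -ge 10 ]
}

# Example:
# needs_immediate_replacement iBBU08 612 2 && echo "Replace the BBU now"
```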

Guidelines for proactive battery replacement (recommended within the next 60 days) are as follows:

BBU07 -
    1. Replace the Battery Module after a 3-year service life, assuming the battery temperature does not exceed 45C. If the temperature exceeds 45C (battery temperature shall not exceed 49C), replace the battery every 2 years.
BBU08 -
    1. Replace the Battery Module after a 3-year service life, assuming the battery temperature does not exceed 45C. If the temperature exceeds 45C (battery temperature shall not exceed 55C), replace the battery every 2 years.
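
The proactive guidelines above amount to a small lookup on battery type and temperature. A minimal sketch, assuming the temperature is supplied as a whole number of degrees C; replacement_interval_years is a hypothetical name.

```shell
# Hypothetical helper.
# $1 = battery type (e.g. iBBU07, iBBU08)
# $2 = battery temperature in degrees C (integer)
# Prints the replacement interval in years. Returns 1 if the temperature
# is above the stated maximum for that battery type, 2 for an unknown type.
replacement_interval_years() {
    local limit
    case "$1" in
        *BBU07*) limit=49 ;;
        *BBU08*) limit=55 ;;
        *) echo "unknown battery type: $1" >&2; return 2 ;;
    esac
    if [ "$2" -gt "$limit" ]; then
        echo "temperature ${2}C exceeds the ${limit}C limit" >&2
        return 1
    fi
    if [ "$2" -gt 45 ]; then echo 2; else echo 3; fi
}

# Example:
# replacement_interval_years iBBU08 47
```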

# ./MegaCli64 -AdpBbuCmd -a0 | grep "Temperature"
Temperature: 47 C
Temperature             : High
Over Temperature        : Yes


In this example the battery is over temperature (47 C), so the virtual drive on this DB node is currently in WT mode and will remain so until the temperature drops and the BBU resumes charging.

# ./MegaCli64 -LDInfo -LALL -aALL | grep "Cache Policy"
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Disk Cache Policy: Disabled
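
Comparing the default and current cache policy lines is a quick way to spot drives that have fallen back to WT. The following sketch assumes the line format shown above; wb_drives_in_wt is a hypothetical name.

```shell
# Hypothetical helper: reads "MegaCli64 -LDInfo -LALL -aALL" output on stdin.
# Exits 0 when any virtual drive defaults to WriteBack but is currently
# running WriteThrough, 1 otherwise.
wb_drives_in_wt() {
    awk '
        /Default Cache Policy:/ { default_wb = ($0 ~ /WriteBack/) }
        /Current Cache Policy:/ && default_wb && /WriteThrough/ { degraded = 1 }
        END { exit (degraded ? 0 : 1) }
    '
}

# Example usage on a live system (the MegaCli64 location is an assumption):
# if ./MegaCli64 -LDInfo -LALL -aALL | wb_drives_in_wt; then
#     echo "At least one virtual drive has fallen back to WT mode"
# fi
```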

3 comments:

  1. Hi,

    1) How often does the battery need to be replaced?
    2) Is there any change in the battery learn cycle in the new Exadata version?

  2. 1) It depends on the inlet ambient temperature maintained in the data center. If it is below 25 degrees Celsius, the battery can last 3 years, whereas below 32 degrees Celsius it will last only 2 years.
    2) Yes. Starting from X6, the servers have no batteries. Oracle has introduced Cache Vault (CVPM02), which is a supercapacitor and not a battery.

  3. Thanks a lot for your immediate reply. I admire your passion and in-depth knowledge of Exadata. Keep rocking.
