2011-03-11

"Invalid parameters, or failed to bring up VIP" after upgrade to AIX 6.1

Two MetaLink notes describe bugs in Oracle CRS that show up after upgrading AIX to 6.1:

Bug 8725020: VIP WONT RUN (LHEA) ADAPTER
The entstat output for the Logical Host Ethernet Adapter (LHEA) is different from that of a regular adapter. The racgvip script uses entstat output to determine whether the interface is up. See MetaLink Note 959746.1 for details.
Bug 6608472: RACGVIP IN RAC FOR AIX FAILS EVEN THOUGH THE PUBLIC INTERFACE IS UP
This fix is required when using the IBM Logical Host Ethernet Adapter (LHEA) for the Oracle RAC Public or VIP interfaces. See MetaLink Note 567286.1 for details.

Sometimes, however, the problem is solved by applying the workaround found here: http://oracledba-expert.blogspot.com/2010/01/unable-to-start-vip-on-aix-6.html



# Solution: Follow the steps below to resolve the issue.
$ cd $CRS_HOME/bin
Back up the original racgvip file, then edit it to replace the entries below.

CRS script "racgvip", lines 263 and 275:

_O1=`$NETSTAT -n -I $_IF | $AWK "{ if (/^$_IF/) {print \$5; exit}}"`

and

_O2=`$NETSTAT -n -I $_IF | $AWK "{ if (/^$_IF/) {print \$5; exit}}"`

awk is asked for the wrong column on AIX 6.1: field 6 is the desired field on AIX 6, whereas field 5 applies to AIX 5. So, on AIX 6.1, replace $5 with $6 in both lines.
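The effect of the field number can be tried out on placeholder input before touching the real script. This is only a sketch: `pick_field` is a hypothetical helper that mirrors the awk filter quoted above, and the sample lines are made-up labels, not real netstat output.

```shell
#!/bin/sh
# Hypothetical helper mirroring the awk filter used in racgvip:
# print field N of the first input line that starts with the interface name.
pick_field() {
    _IF=$1      # interface name, e.g. en0
    _N=$2       # field number to print
    awk -v n="$_N" "{ if (/^$_IF/) { print \$n; exit } }"
}

# Placeholder input: a header line plus one line for en0 with labelled fields.
SAMPLE='Name f2 f3 f4 f5 f6
en0 b c d five six'

printf '%s\n' "$SAMPLE" | pick_field en0 5   # prints: five
printf '%s\n' "$SAMPLE" | pick_field en0 6   # prints: six
```

On a real system the same comparison can be made by running the netstat command from the script by hand and counting the columns on the interface's line.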

MetaLink notes:
RAC on AIX: With Virtual Interfaces Racgvip Fails Even Though Public Interface is Up [ID 567286.1]


Applies to:

Oracle Server - Enterprise Edition - Version: 10.2.0.3 and later   [Release: 10.2 and later ]
IBM AIX on POWER Systems (64-bit)
IBM AIX Based Systems (64-bit)

Symptoms

If EtherChannel and VLAN devices are used for the public network, the racgvip script may fail and cause the VIP to go offline as a result.

Cause

Currently, the racgvip script issues the following command to check if the public network is up:

$ENTSTAT -d $_IF | $GREP -iEq '.*lan.*state.*:.*operational.*|.*link.*status.*:.*up.*'


However on some of the new AIX network devices, the output from the following command is different, so the above grep check on the entstat output fails:

entstat -d <interface name>


Bug:6608472 addresses this problem. 

The fix is in 10.2.0.3 patch 6851901 (MLR #16).
Although the text of bug 6608472 does not mention patch 6851901, that patch ships the new racgvip, which now issues the following to check the health of the public network:

$ENTSTAT -d $_IF | $GREP -iEq '.*lan.*state.*:.*operational.*|.*link.*status.*:.*up.*|.*port.*operational.*state.*:.*up.*'

The fix for bug 6608472 is also included in 10.2.0.4.
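The difference between the two checks can be reproduced with grep alone. The sketch below copies both patterns from the commands quoted above; `is_up` is a hypothetical helper, and the sample lines are illustrative text written to exercise each clause, not captured entstat output.

```shell
#!/bin/sh
# Patterns copied from the two versions of the racgvip check quoted above.
OLD_RE='.*lan.*state.*:.*operational.*|.*link.*status.*:.*up.*'
NEW_RE='.*lan.*state.*:.*operational.*|.*link.*status.*:.*up.*|.*port.*operational.*state.*:.*up.*'

# Hypothetical helper: report whether a line of text would pass the check.
is_up() {
    # $1 = extended regex, $2 = text standing in for entstat output
    if printf '%s\n' "$2" | grep -iEq "$1"; then
        echo up
    else
        echo down
    fi
}

# Illustrative lines (assumptions, not captured entstat output).
is_up "$OLD_RE" 'Link Status : Up'              # prints: up
is_up "$OLD_RE" 'Port Operational State: Up'    # prints: down
is_up "$NEW_RE" 'Port Operational State: Up'    # prints: up
```

To find out which case applies on a given node, run `entstat -d <interface name>` and look for a line that any of the clauses would match.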

Solution

To resolve this issue, apply patch 6851901.

Again, this patch provides the new racgvip script that can handle the new / current AIX interface types.

The fix for the bug 6608472 is also in 10.2.0.4, so upgrading the CRS to 10.2.0.4 will resolve the problem produced by the bug 6608472.


VIP on AIX 5.3TL9+ Fails to Come Up with "Invalid Parameters, Or Failed To Bring Up VIP" [ID 959746.1]

Applies to:

Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 10.2.0.4
IBM AIX Based Systems (64-bit)

Symptoms

On AIX 5.3 TL9+, AIX 6 or AIX 6.1, the VIP fails to come up with the error "Invalid Parameters, Or Failed To Bring Up VIP".

Tracing the racgvip command shows that it fails when checking to see if the public interface (NIC) is up.
However, the public interface is up.

An error message similar to the following may be seen in logs:

2009-07-23 17:40:05.812: [ RACG][1] [270490][1][ora.srvr0101.vip]: Thu Jul 23 17:40:05 BST 2009 [ 159774 ] IsIfAlive: /usr/bin/entstat -d en0 failed. Return = 1 (host=srvr0101)
Thu Jul 23 17:40:05 BST 2009 [ 159774 ] checkIf: end for if=en0
Invalid parameters, or failed to bring up VIP (host=srvr0101)

Changes

The adapter type for the public network is LHEA (IBM Logical Host Ethernet Adapter):

# /usr/bin/entstat -d en0
-------------------------------------------------------------
ETHERNET STATISTICS (en0) :
Device Type: Host Ethernet Adapter (l-hea)
...

Cause

The entstat output for LHEA is different from a regular adapter. 

The racgvip script uses entstat output to determine if the specified interface is up or not; because the entstat output for LHEA is different, the check fails, therefore the VIP will not come up.

This is a known issue reported in the following bug:
Bug 8725020 - VIP WONT RUN (LHEA) ADAPTER 5.3 TL9

Solution

The following workaround fixes the racgvip script so that it does not fail for LHEA adapters:

1. Back up the racgvip script.
2. Edit this line in the script using vi:
$ENTSTAT -d $_IF | $GREP -iEq '.*lan.*state.*:.*operational.*|.*link.*status.*:.*up.*|.*port.*operational.*state.*:.*up.*'

and replace it with this:
$ENTSTAT -d $_IF | $GREP -iEq '.*lan.*state.*:.*operational.*|.*link.*status.*:.*up.*|.*port.*operational.*state.*:.*up.*|.*driver.*flags.*:.*up.*'

(Notice that an extra regexp clause has been tacked on the end of the grep argument.)

3. Make sure that no stray characters have been introduced.
4. Save the racgvip file.
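
As an alternative to editing by hand in vi, the same change can be rehearsed with sed on a scratch copy first. This is a minimal sketch, assuming the check line appears in racgvip exactly as quoted above and ends with the closing quote; the scratch file and the sample "Driver Flags" line are stand-ins, not the real script or captured entstat output. Running the sed step twice would append the clause twice, so always start from the backup.

```shell
#!/bin/sh
# Scratch file standing in for racgvip (assumption: the real script
# contains the check line exactly as quoted in the note).
WORK=$(mktemp -d)
SCRIPT="$WORK/racgvip"
cat > "$SCRIPT" <<'EOF'
$ENTSTAT -d $_IF | $GREP -iEq '.*lan.*state.*:.*operational.*|.*link.*status.*:.*up.*|.*port.*operational.*state.*:.*up.*'
EOF

# Step 1: back up the script.
cp -p "$SCRIPT" "$SCRIPT.orig"

# Step 2: append the extra clause just before the closing quote of the
# grep argument, reading from the backup and writing the working copy.
sed "/GREP -iEq/ s/'\$/|.*driver.*flags.*:.*up.*'/" "$SCRIPT.orig" > "$SCRIPT"

# Step 3: review the change; only the check line should differ.
diff "$SCRIPT.orig" "$SCRIPT" || true

# The added clause on its own, tried against an illustrative line.
CLAUSE='.*driver.*flags.*:.*up.*'
printf '%s\n' 'Driver Flags: Up Broadcast Running' | grep -iEq "$CLAUSE" && echo matched
```

The diff output doubles as the check for stray characters in step 3: anything other than the appended clause means the edit went wrong and the backup should be restored.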


References

BUG:8725020 - VIP WONT RUN (LHEA) ADAPTER 5.3 TL9
NOTE:567286.1 - RAC on AIX: With Virtual Interfaces Racgvip Fails Even Though Public Interface is Up 
