2011-05-04

gipchaLowerProcessNode: no valid interfaces found to node

11.2.0.2 Grid Infrastructure upgrade/install on >1 node cluster failing with "gipchaLowerProcessNode: no valid interfaces found to node" in crsd.log (Doc ID 1280234.1)

Symptoms

11.2.0.2 Grid Infrastructure upgrade or install on a cluster with more than one node
rootcrs.pl fails and the following is found in crsd.log:


...
2010-11-29 10:52:38.603: [GIPCHALO][2314] gipchaLowerProcessNode: no valid interfaces found to node for 2614824036 ms, node 111ea99b0 { host 'racdb1', haName '1e0b-174e-37bc-a515', srcLuid 2612fa8e-3db4fcb7, dstLuid 00000000-00000000 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [55 : 55], createTime 2614768983, flags 0x4 }
2010-11-29 10:52:42.299: [ CRSMAIN][515] Policy Engine is not initialized yet!
2010-11-29 10:52:43.554: [ OCRMAS][3342]proath_connect_master:1: could not yet connect to master retval1 = 203, retval2 = 203
2010-11-29 10:52:43.554: [ OCRMAS][3342]th_master:110': Could not yet connect to new master [1]
2010-11-29 10:52:43.605: [GIPCHALO][2314] gipchaLowerProcessNode: no valid interfaces found to node for 2614829038 ms, node 111ea99b0 { host 'racdb1', haName '1e0b-174e-37bc-a515', srcLuid 2612fa8e-3db4fcb7, dstLuid 00000000-00000000 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [60 : 60], createTime 2614768983, flags 0x4 }
2010-11-29 10:52:43.754: [ OCRMAS][3342]proath_master:100b: Polling, connect to master not complete retval1 = 203, retval2 = 203
2010-11-29 10:52:43.955: [ OCRMAS][3342]proath_master:100b: Polling, connect to master not complete retval1 = 203, retval2 = 203
...
2010-11-29 11:13:49.817: [ OCRMAS][3342]proath_master:100b: Polling, connect to master not complete retval1 = 203, retval2 = 203
2010-11-29 11:13:50.018: [ OCRMAS][3342]proath_master:100b: Polling, connect to master not complete retval1 = 203, retval2 = 203
...


Changes

Upgrade or install of 11.2.0.2 Grid Infrastructure on a cluster with more than one node

Cause

Two causes have been found for this symptom: one is AIX-specific and the other is UNIX-generic.

1) AIX-specific cause

udp_sendspace is at its default of 9216 bytes, which is smaller than the 10240 bytes used by CRS.

# no -o udp_sendspace

shows the current setting.
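
To check the value on every cluster node in one pass, a minimal sketch run as root (node names racdb1 and racdb2 are examples; passwordless ssh between the nodes is assumed):

# for node in racdb1 racdb2; do echo "$node: $(ssh $node 'no -o udp_sendspace')"; done

Any node reporting a value below 10240 confirms this cause.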


2) UNIX-generic cause

Netmask mismatch between the nodes. The private interface must have the same netmask on all nodes; a mismatch between nodes can produce this symptom.



Solution

The two causes have two separate solutions.


1) Solution for AIX-specific cause

Increase udp_sendspace to >= 10240.

# no -o udp_sendspace=65536
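
To confirm the new value is in effect:

# no -o udp_sendspace

Note that without the -p flag the change does not survive a reboot; a persistent variant is sketched after the documentation reference below.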

Note that the 11gR2 documentation recommends setting udp_sendspace to 65536:
Network tuning parameter    Recommended value
ipqmaxlen                   512
rfc1323                     1
sb_max                      4194304
tcp_recvspace               65536
tcp_sendspace               65536
udp_recvspace               655360
udp_sendspace               65536

See the Oracle Grid Infrastructure Installation Guide 11g Release 2 (11.2) for IBM AIX on POWER Systems (64-Bit), section 2.11.7 "Configuring Network Tuning Parameters", for more details:
http://download.oracle.com/docs/cd/E11882_01/install.112/e17210/preaix.htm#CWAIX219
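
Assuming the standard AIX no flags (-p persists a runtime tunable across reboots, -r sets a load-time tunable for the next reboot), a sketch that applies all of the documented values as root:

# no -r -o ipqmaxlen=512      (load-time tunable, takes effect after the next reboot)
# no -p -o rfc1323=1
# no -p -o sb_max=4194304
# no -p -o tcp_recvspace=65536
# no -p -o tcp_sendspace=65536
# no -p -o udp_recvspace=655360
# no -p -o udp_sendspace=65536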

If the problem happens during rootupgrade.sh (usually on the 2nd node), do the following:

1) Increase udp_sendspace to 65536:

# no -o udp_sendspace=65536

2) Stop CRS on both nodes:

# crsctl stop crs -f
# ps -ef | grep d.bin     (to ensure there are no leftover CRS processes)

3) Restart CRS on node 1:

# crsctl start crs

Wait until CRS is fully started on node 1 (see the optional check after these steps).

4) On node 2, rerun rootupgrade.sh:

# rootupgrade.sh

It should complete on node 2 this time.
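
As an optional check before step 4, the following standard clusterware commands can be run on node 1 to confirm the stack is fully online:

# crsctl check crs
# crsctl stat res -t -init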


2) Solution for UNIX-generic cause

Check that the netmask on the private interface is the same on all nodes.


[grid@mynode1 ~]$ ifconfig eth1
eth1 Link encap:Ethernet HWaddr 00:19:B9:1E:6D:97
inet addr:192.168.1.110 Bcast:192.168.1.255 Mask:255.255.255.0
...
[grid@mynode2 ~]$ ifconfig eth1
eth1 Link encap:Ethernet HWaddr 00:19:B9:1E:6D:97
inet addr:192.168.1.111 Bcast:192.168.1.255 Mask:255.255.255.0
...


In case of a mismatch, the system administrator must correct the netmask on the private interface(s) where it is wrong.
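
To compare the masks from one node, a minimal sketch (node names mynode1/mynode2 and interface eth1 are the examples above; adjust to the actual private interconnect interface):

[grid@mynode1 ~]$ for n in mynode1 mynode2; do echo "$n: $(ssh $n '/sbin/ifconfig eth1' | grep Mask)"; done

The Mask value must be identical on every node.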
