2011-12-15

gipchaInternalResolve: failed to resolve ret gipcretKeyNotFound

CRSD Fails to Start due to GIPC Communication Failure with Master [ID 1337730.1]

Applies to:

Oracle Server - Enterprise Edition - Version: 11.2.0.2 to 11.2.0.2 - Release: 11.2 to 11.2
Information in this document applies to any platform.

Symptoms

11gR2 Grid Infrastructure CRSD fails to join the cluster with the following error even though there's no network issue:
  • $GRID_HOME/log/<node>/crsd/crsd.log or $GRID_HOME/log/<node>/gipcd/gipcd.log
2011-04-10 22:22:39.677: [GIPCHALO][14636816] gipchaLowerDropMsg: dropping because of node failure msg 0x7fefec041c08 { len 1160, seq 181, type gipchaHdrTypeRecvEstablish (5), lastSeq 0, lastAck 0, minAck 180, flags 0x1, srcLuid 11d91062-ea2cdc97, dstLuid 00000000-00000000, msgId 179 }, node 0x7fefe801dd40 { host 'racnode1', haName 'e2cc-aa06-08f2-2397', srcLuid fc67760b-7a856aaa, dstLuid 11d91062-ea2cdc97 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [90 : 90], createTime 4294237860, flags 0xa0 }
...
2011-04-10 22:23:09.759: [GIPCHALO][14636816] gipchaLowerProcessNode: no valid interfaces found to node for 4294357960 ms, node 0x7fefe801d970 { host 'racnode1', haName 'e2cc-aa06-08f2-2397', srcLuid fc67760b-e83bdc10, dstLuid 11d91062-ea2cdc97 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [31 : 31], createTime 4294327890, flags 0x4 }
2011-04-10 22:23:10.707: [GIPCHALO][14636816] gipchaLowerDropMsg: dropping because of sequence timeout, waited 30010, msg 0x7fefec046d28 { len 1160, seq 183, type gipchaHdrTypeRecvEstablish (5), lastSeq 0, lastAck 0, minAck 182, flags 0x1, srcLuid 11d91062-ea2cdc97, dstLuid 00000000-00000000, msgId 181 }, node 0x7fefe801d970 { host 'racnode1', haName 'e2cc-aa06-08f2-2397', srcLuid fc67760b-e83bdc10, dstLuid 11d91062-ea2cdc97 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [31 : 31], createTime 4294327890, flags 0x4 }
...
2011-04-10 22:23:21.317: [ CRSMAIN][31647504] Policy Engine is not initialized yet!
  • $GRID_HOME/log/<node>/crsd/crsd.log from OCR/PE master
2011-04-10 22:19:37.582: [  OCRSRV][2157287184]s_u_r_c_i: Connecting to node with host name [racnode2]
2011-04-10 22:19:37.582: [  OCRSRV][2157287184]s_update_remote_cache_int: Connecting to node [1] at address [racnode2:d633-1505-1365-7a21] for invaldiation.
2011-04-10 22:19:37.583: [GIPCXCPT][2157287184] gipchaInternalResolve: failed to resolve ret gipcretKeyNotFound (36), host 'racnode1', port 'b015-632e-0f93-4247', hctx 0x19f0dd0 [0000000000000010] { gipchaContext : host 'racnode1', name 'e2cc-aa06-08f2-2397', luid '11d91062-00000000', numNode 2, numInf 2, usrFlags 0x0, flags 0x1 }, ret gipcretKeyNotFound (36)
2011-04-10 22:19:37.583: [GIPCHGEN][2157287184] gipchaResolveF [gipcmodGipcResolve : gipcmodGipc.c : 795]: EXCEPTION[ ret gipcretKeyNotFound (36) ]  failed to resolve ctx 0x19f0dd0 [0000000000000010] { gipchaContext : host 'racnode1', name 'e2cc-aa06-08f2-2397', luid '11d91062-00000000', numNode 2, numInf 2, usrFlags 0x0, flags 0x1 }, host 'racnode1', port 'b015-632e-0f93-4247', flags 0x0
2011-04-10 22:19:37.588: [GIPCHGEN][1946154768] gipchaNodeCreate: adding new node 0x7fa3680207d0 { host 'racnode2', haName 'cc4b-6b7e-e26a-1e05', srcLuid 11d91062-ea2cdc97, dstLuid 00000000-00000000 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [0 : 0], createTime 8356304, flags 0x0 }
2011-04-10 22:19:37.589: [GIPCHALO][2178299664] gipchaLowerSend: deffering startup of hdr 0x7fa36c021db8 { len 232, seq 0, type gipchaHdrTypeSend (1), lastSeq 0, lastAck 0, minAck 0, flags 0x0, srcLuid 00000000-00000000, dstLuid 00000000-00000000, msgId 0 }, node 0x7fa3680207d0 { host 'racnode2', haName 'cc4b-6b7e-e26a-1e05', srcLuid 11d91062-ea2cdc97, dstLuid 00000000-00000000 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [0 : 0], createTime 8356304, flags 0x0 }
2011-04-10 22:19:37.589: [GIPCHALO][2178299664] gipchaLowerProcessNode: no valid interfaces found to node for 8356304 ms, node 0x7fa3680207d0 { host 'racnode2', haName 'cc4b-6b7e-e26a-1e05', srcLuid 11d91062-ea2cdc97, dstLuid 00000000-00000000 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [0 : 0], createTime 8356304, flags 0x4 }
...
2011-04-10 22:21:41.349: [  OCRSRV][2157287184]proas_u_r_c: No reconfig received for node [1]after waiting for 93000 msec
2011-04-10 22:21:41.351: [  OCRSRV][2157287184]proas_u_r_c: Invoking conn clean on node [1]. Recovery state [0].
2011-04-10 22:21:41.351: [  OCRSRV][2157287184]proas_u_r_c: Issued a reconnect on node [1] host [racnode2]. Recovery state changed to [1]
...
2011-04-10 22:23:45.173: [  OCRSRV][2157287184]proas_u_r_c: No reconfig received for node [1]after waiting for 93000 msec
2011-04-10 22:23:45.175: [  OCRSRV][2157287184]proas_u_r_c: Invoking member kill on node [1]. Recovery state [1].
2011-04-10 22:23:45.175: [ CSSCLNT][2157287184]clssgsmbrkill: Member kill request: Members map 0x00000002
2011-04-10 22:23:45.177: [ CSSCLNT][2157287184]clssgsmbrkill: Success from kill call rc 0
[  OCRSRV][2157287184](:PROCR00000:)proas_u_r_c: Successfully issued clssgsmbrkill() node [1] host [racnode2].  Recovery state changed to [2]
..
2011-04-10 22:23:45.197: [  OCRMSG][2161489680]prom_send: Failed to send [13]
2011-04-10 22:23:45.198: [  OCRMSG][2161489680]GIPC error [13] msg [gipcretConnectionInvalid]
2011-04-10 22:23:45.199: [  OCRSRV][2161489680]s_update_remote_cache_int: FAILED TO SEND MESSAGE to node 1 retcode 203
2011-04-10 22:23:45.200: [  OCRSRV][2161489680]s_update_remote_cache_int: returning con FAILURE for nodenum [1]
Other likely symptoms is OCR slave restarts after OCR master node is terminated.

Cause

Caused by unpublished bug 9593552

Solution

bug 9593552 is fixed in 11.2.0.2 PSU3, 11.2.0.3 and above.
It's recommended to apply latest patchset and PSU, but if it's not feasible, please engage Oracle Support to request interim patch 9593552

Niciun comentariu:

Trimiteți un comentariu