2011-08-07

How to Upgrade to Grid Infrastructure 11.2

Things to Consider Before Upgrading to 11.2.0.2 Grid Infrastructure [ID 1312225.1]
  

Applies to:

Oracle Server - Enterprise Edition - Version: 11.2.0.2 and later   [Release: 11.2 and later ]
Information in this document applies to any platform.

Purpose

This document lists known issues to avoid and areas to consider in order to have a smooth upgrade to 11gR2 Grid Infrastructure.

Scope and Application

This document is intended for Oracle Clusterware Administrators and Oracle Support engineers.

Things to Consider Before Upgrading to 11.2.0.2 Grid Infrastructure

11.2.0.2 is a full release, so any pre-11.2 Oracle Clusterware (CRS) can be upgraded to 11.2.0.2 directly. Starting with 11gR2, in-place patchset upgrade is not supported; any patchset must be installed into a new HOME (out-of-place upgrade).

Validate Existing Clusterware

Before starting OUI to install 11.2.0.2 Grid Infrastructure (GI for short), verify the existing environment against the following to reduce the chance of hitting known issues.

  • The 11.2.0.2 GI owner must be the same as the existing clusterware owner (crs owner for pre-11.2 or GI owner for 11.2.0.1) to avoid the issue in note 1067606.1
  • ASMCA may fail due to non-default ASM instance names or different usernames for CRS and ASM in pre-11.2; refer to note 1113073.1 for details.
  • Avoid setting any OS user's home directory or ORACLE_BASE to /opt/oracle, in order to avoid bug 10362257. This is explained in note 1281913.1
  • Unset Oracle-related environment variables, especially ORA_CRS_HOME, before starting OUI or before executing any root script, for both the root and grid users. For details, refer to note 952925.1 - NETCA & ASMCA Fail during Upgrade of CRS/ASM to Grid Infrastructure 11gR2
  • The output of the following CVU command should be free of errors. Refer to Section "Start OUI" for known CVU issues and to note 986822.1 for instructions on how to enable CVU tracing.
$INSTALL_DIR/runcluvfy.sh stage -pre crsinst -n <node1>,<node2>,<node3>.. -verbose
  • The IP addresses for the public and private networks should not be in a link local subnet (169.254.*.*) and should be in unrelated, separate subnets to avoid the issue in note 1062682.1. Refer to note 1054902.1 Section "G. Basics of Subnet" for more information on subnet.
  • The output of "ifconfig" for the private network adapter and the cluster_interconnect part of the "oifcfg getif" command output should be consistent with regard to the network adapter name and the subnet.
  • Public/VIP network information should be consistent to avoid the issue documented in note 1296579.1 - PRVF-10205 : The VIPs do not all share the same subnet.
  • The length of GI owner uid should be less than 7 digits in order to avoid the issue documented in note 10379703.8
  • The OS should meet the basic version requirements to avoid the issues in the following notes. Refer to www.oracle.com/pls/db112/portal.portal_db?selected=11 for a list of installation documents.
  note 983713.1 - HP-UX: 11gR2 runInstaller Fails with "An internal error occurred within cluster verification framework"

  note 1085154.1 - Solaris: 11gR2 runInstaller Fails with "An internal error occurred within cluster verification framework"

  note 986657.1 - AIX: runInstaller Warns INS-13001 while CLUVFY Reports "Reference data is not available for verifying prerequisites on this operating system distribution"
  • The CSS parameter "diagwait" ("crsctl get css diagwait") should either be unset (output shows "Configuration parameter diagwait is not defined") or set to a small value (13 or less) to avoid the issue in note 1102283.1 - 11gR2 rootupgrade.sh Fails as cssvfupgd Can not Upgrade Voting Disk
  • Output of "ocrcheck" should be clean to avoid the issue in note 1093014.1
  • On AIX, rootpre.sh should finish successfully with no "Text file busy" messages before starting OUI to avoid the issue in note 1053055.1 - 11gR2 rootupgrade.sh Fails as "Symbol ras_arg_t (number 8) is not exported from dependent module libttsh11.so". Output of rootpre.sh is saved in /tmp/rootpre.out_${TIMESTAMP}.
  • On Windows, the network adapter should not have an invalid netmask, to avoid the issue in note 1252334.1 - Grid Infrastructure Configuration Fails as MDNSD can not Register on Network Adapter with Invalid Mask 0.0.0.0
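
Some of the checks in the list above are mechanical enough to script. The following is a minimal sketch, not an Oracle-supplied tool: all function names are hypothetical, and the parsing of the "crsctl get css diagwait" output assumes that the value is the last number printed. It covers only the link-local subnet, uid length and diagwait items.

```shell
# Hypothetical helpers; none of these names come from Oracle tooling.

# Succeeds when the IPv4 address given as $1 is in the link-local
# range 169.254.*.*, which should not be used for public or private networks.
is_link_local() {
  case "$1" in
    169.254.*) return 0 ;;
    *)         return 1 ;;
  esac
}

# Succeeds when the numeric uid given as $1 has fewer than 7 digits.
uid_length_ok() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty or not purely numeric
  esac
  [ "${#1}" -lt 7 ]
}

# Reads the output of "crsctl get css diagwait" on stdin. Succeeds when
# diagwait is not defined or is 13 or less.
# Assumption: when diagwait is set, its value is the last number in the output.
diagwait_ok() {
  dw_out="$(cat)"
  case "$dw_out" in
    *"is not defined"*) return 0 ;;
  esac
  dw_val="$(printf '%s\n' "$dw_out" | grep -Eo '[0-9]+' | tail -n 1)"
  [ -n "$dw_val" ] && [ "$dw_val" -le 13 ]
}

# Example use (the user name "grid" is an assumption):
#   uid_length_ok "$(id -u grid)" || echo "GI owner uid has 7 or more digits"
#   crsctl get css diagwait | diagwait_ok || echo "review diagwait, see note 1102283.1"
```

The helpers only report a condition; they do not change any setting. A failed check still has to be resolved as described in the referenced notes.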

Patch Pre-upgrade Clusterware Home

Rolling Upgrade Related:

The following issues may be encountered during a rolling ASM upgrade (migration). These issues will not happen when performing a non-rolling upgrade.
To perform a non-rolling upgrade, stop the pre-upgrade CRS/GI on all nodes, install the new GI (i.e. 11.2.0.2) on all nodes, and then execute rootupgrade.sh on one node at a time, sequentially. If the new GI was installed while the pre-upgrade clusterware stack was up, either execute config.sh from the new GI_HOME/crs/config or remove and reinstall when the pre-upgrade clusterware stack is down.
  • Unpublished bug 9329767: it happens only when upgrading from pre-11.2 to 11.2.0.1 or from 11.2.0.1 to a higher version (i.e. 11.2.0.2). It could happen on any node, and when it happens, ASM fails to start with ORA-00600 [kjbmmchkinteg:from]. The patch needs to be applied to the 11.2.0.1 home. The original one-off patch 9329767 cannot be applied in a rolling fashion; bug 12652740 contains the same fix but can be applied in a rolling fashion.
  • Unpublished bug 10229886 - ASM rolling upgrade to 11.2.0.2 could fail as lmd process fails to start with ORA-7445 [kernel_vsyscall] / ORA-7445 [SKGXPIVRCV] / ORA-7445 [GSIGNAL], fixed in 11.2.0.3
  Call stack of bug 10229886/10321596:
  gsignal abort assert_fail skgxp_assert skgxpivrcv skgxpvrcv ksxpqrcvb ..
  • bug 9941757 - Upgrade from 11.2.0.1 ASM INSTANCES FAILED TO START with error "ORA-15163: cluster not in rolling downgrade" after restart of GI on all nodes but before rootupgrade.sh finishes on the last node, fixed in 11.2.0.3. The workaround is to:
- Log in as sysasm on a node that is already upgraded to 11.2.0.2 and execute: alter system start rolling migration to '11.2.0.1.0';
- Start 11.2.0.1 ASM on the problematic node, preferably via crsctl
- Upgrade ASM to 11.2.0.2 on the problem node (ex: through rootupgrade.sh script)
- Confirm all ASM instances are running 11.2.0.2
- If rolling upgrade is still in progress, run 'alter system stop rolling migration' on any ASM instance.

Other Issues:


Start OUI to Install GI 11.2.0.2

OUI calls CVU to check most components relevant for the clusterware installation. For common 11gR2 GI OUI errors and explanations/solutions, refer to note 1056713.1; to turn on debugging for OUI, refer to note 1056322.1.


Below is a list of various OUI and CVU errors/warnings and explanations/solutions.

  • note 887471.1 - PRVF-4664 PRVF-4657: Found inconsistent name resolution entries for SCAN name
  • note 1210863.1 - Device Checks for ASM Fails with PRVF-5150: Path ORCL:<diskname> is not a valid path, bug 10026970 is fixed in 11.2.0.3
  • note 1267569.1 - PRVF-5449 : Check of Voting Disk location "ORCL:<diskname>(ORCL:<diskname>)" failed
  • note 1233505.1 - Checklist for PRVF-10037 : Failed to retrieve storage type for xx on node xx
  • note 1271996.1 - 11.2.0.2 Grid Install Fails with SEVERE: [FATAL] [INS-13013], and PRVF-5640 or a Warning in "Task resolv.conf Integrity"
  • note 1135933.1 - PRVF-4195 Reported while Executing "cluvfy comp ocr"
  • note 1051763.1 - INS-20802 PRVF-4172 Reported after Successful Upgrade to 11gR2 Grid Infrastructure
  • note 1056195.1 - INS-20702 Reported during 11gR2 Installation on getSharedPartitionListCVU
  • note 948382.1 - INS-08109 While Trying to Install Grid Infrastructure
  • note 970166.1 - INS-20702 "checkFreeDiskSpace" Reported During 11gR2 Installation
  • note 1056693.1 - How to Configure NTP or Windows Time to Resolve CLUVFY Error PRVF-5436 PRVF-9652
  • note 974481.1 - INS-20802 PRVF-9802 PRVF-5184 PRVF-5186 Reported after Successful Upgrade to 11gR2 Grid Infrastructure

Patch 11.2.0.2 GI Before Executing rootupgrade.sh

GI bundle patches/PSUs are cumulative and contain fixes for the most critical issues. It is recommended to apply the latest available bundle patch/PSU to avoid: 1) known issues that may prevent the root script from succeeding; 2) known issues that are not particular to the upgrade but could happen while the clusterware is being upgraded. At the time of this writing, PSU 2 patch 12311357 is the latest one.

To apply patches to the 11.2.0.2 GI home before the rootupgrade.sh script is executed, only run the "opatch napply" command. For example, to apply PSU 2, as grid user execute the following on all nodes:

$ <11.2.0.2GI_HOME>/OPatch/opatch napply -oh <11.2.0.2GI_HOME> -local <UNZIPPED_PATCH_LOCATION>/12311357
$ <11.2.0.2GI_HOME>/OPatch/opatch napply -oh <11.2.0.2GI_HOME> -local <UNZIPPED_PATCH_LOCATION>/11724916
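
Because the same "opatch napply" command is repeated once per unzipped patch directory and on every node, it can help to generate the command lines first and review them before running anything. The sketch below only prints the commands; the function name and the example paths are hypothetical, and the opatch options are the ones shown above.

```shell
# Hypothetical helper: prints one "opatch napply" command per patch directory.
#   $1         = 11.2.0.2 GI home
#   $2 onwards = unzipped patch directories
# Nothing is executed; the output is meant to be reviewed and then run
# as the grid user on each node.
print_napply_commands() {
  gi_home="$1"
  shift
  for patch_dir in "$@"; do
    printf '%s/OPatch/opatch napply -oh %s -local %s\n' \
      "$gi_home" "$gi_home" "$patch_dir"
  done
}

# Example with assumed paths:
#   print_napply_commands /u01/app/11.2.0.2/grid /stage/12311357 /stage/11724916
```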

  • One of the top issues that causes the rootupgrade.sh script to fail in 11.2.0.2 is multicast not working for group 230.0.1.0. Patch 9974223 introduces support for the additional group 224.0.0.251 and has been included in bundle 1 and above; refer to note 1212703.1 for more details. Patch 9974223 may cause an issue on AIX; refer to note 1329597.1 for solutions.
  • The failure to start HAIP may cause the rootupgrade.sh script to fail. The fix for bug 11077756 allows the script to continue even if HAIP fails to start, and has been included in bundle 2 and above. For more issues in the HAIP area, refer to note 1210883.1.
  • Failure to install Oracle Kernel Service could cause the rootupgrade.sh script to fail; for details, refer to note 1265276.1 - ACFS-9327 ACFS-9121 ACFS-9310: ADVM/ACFS installation failed
  • ora.crf resource may fail to start on Solaris with "CRS-2674: Start of 'ora.crf' on '<nodename>' failed", refer to note 1289922.1 for details
  • Watch the root script output for "Configure Oracle Grid Infrastructure for a Cluster ... failed". If it shows up, the root script has failed and corrective action is needed; refer to note 969254.1 for details.
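
The failure message in the last item can be checked for mechanically if the root script output has been captured to a file (for example with tee). This is a minimal sketch: the function name is hypothetical, and the file to search is whatever file the output was saved to.

```shell
# Hypothetical helper: succeeds when the captured root script output in
# file $1 contains the failure message quoted in the list above.
# The pattern allows any text between "Cluster" and "failed".
root_script_failed() {
  grep -Eq 'Configure Oracle Grid Infrastructure for a Cluster.*failed' "$1"
}

# Example with an assumed capture file:
#   root_script_failed /tmp/rootupgrade.out && echo "root script failed, see note 969254.1"
```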

Execute rootupgrade.sh

When switching to the root user in order to execute rootupgrade.sh, "su -" or "su - root" provides the full root environment, while sudo, pbrun, "su root" or "su" or similar facilities don't always do the same. It is recommended to execute rootupgrade.sh with full root environment to avoid issues documented in the following notes:

  • note 1315203.1 - ACFS Drivers Fail To Install During Root.Sh Execution Of 11.2.0.2 GI Standalone On AIX
  • note 1235944.1 - 11gR2 root.sh Fails as crsd.bin Does not Come up due to Wrong ulimit
  • note 1259874.1 - root.sh Fails as the ora.asm resource is not ONLINE or PROTL-16 due to Wrong umask
  • note 1141963.1 - 11gR2 rootupgrade.sh Fails as Environment Variable PATH Points to Wrong crsctl First
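
Before executing rootupgrade.sh, the environment that root will actually run with can be inspected. The sketch below prints the current user, umask, resource limits and the directory of the first crsctl found on PATH; it does not decide which values are correct, since those are described in the notes above. Both function names are hypothetical.

```shell
# Hypothetical helper: prints the first directory in the PATH-style
# string $1 that contains an executable crsctl; fails when none does.
first_crsctl_dir() {
  old_ifs="$IFS"
  IFS=:
  for dir in $1; do
    if [ -x "$dir/crsctl" ]; then
      IFS="$old_ifs"
      printf '%s\n' "$dir"
      return 0
    fi
  done
  IFS="$old_ifs"
  return 1
}

# Hypothetical helper: prints the settings named in the notes above so
# they can be compared between "su -" and other ways of becoming root.
report_root_env() {
  printf 'user:   %s\n' "$(id -un)"
  printf 'umask:  %s\n' "$(umask)"
  printf 'crsctl: %s\n' "$(first_crsctl_dir "$PATH" || echo 'not found on PATH')"
  ulimit -a
}

# Example: run as root right before rootupgrade.sh
#   report_root_env
```

Running report_root_env once after "su -" and once after the site's usual sudo or pbrun method shows whether the two environments differ.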

If rootupgrade.sh fails, refer to note 1050908.1 and note 1053970.1 for troubleshooting steps.

Patch 11.2.0.2 GI After Executing rootupgrade.sh

Here is a list of known critical issues that affect GI 11.2.0.2; many of them have been fixed by the latest PSU.

  • bug 11871469 - ORAAGENT CHECK TASK IS TIMING OUT WHICH IS FORCING THE AGENT TO ABORT AND EXIT, fixed in 11.2.0.3, 12.1 and one-off patch 12347844 exists
  • bug 10034417 - OHASD.BIN TAKING 95-100% CPU ON AN IDLE SYSTEM, fixed in 11.2.0.2 Bundle2, 11.2.0.3
  • bug 10374874 - RBAL GOT UNRESPONSIVE WAITING FOR A RESPONSE FROM OCSSD, fixed in 11.2.0.3, one-off patches exist
  • bug 10131381 - PROCESS PERSISTS AFTER INSTANCE SHUTDOWN, fixed in 11.2.0.2 Bundle1, 11.2.0.3
  • bug 9336825 - Repeated error "CRS-2332:Error pushing GPnP profile to "mdns:service:gpnp._tcp.local.://racnode1:16739/agent=gpnpd,cname=crs,host=racnode1,pid=17182/gpnpd h:racnode1 c:crs"" in clusterware alert<nodename>.log, fixed in 11.2.0.2 bundle2, 11.2.0.3
  • bug 9897335 - Instance alert.log Flooded With "NOTE: [emcrsp.bin@racnode1 (TNS V1-V3) 3159] opening OCR file", refer to note 1307063.1 for details.
  • bug 10056593 - Failed to add (property/value):('OLD_OCR_ID/'-1') for checkpoint:ROOTCRS_OLDHOMEINFO.Error code is 256, fixed in 11.2.0.3, the warning is ignorable
  • bug 10190153 - ORA.CTSSD AND ORA.CRSD GOES OFFLINE AFTER KILL GIPC ON CRS MASTER, fixed in 11.2.0.2 Bundle3, 11.2.0.3
  • bug 10371451 - CSSD aborting from thread GMClientListener, refer to note 1306137.1 - "ocssd.bin Fails to Start: clssgmPeerListener: connected to 1 of 3" for details


For known issues affecting each bundle, refer to note 1272288.1 - 11.2.0.2.X Grid Infrastructure Bundle Known Issues

Miscellaneous


bug 10205230 - ORA-600 or Data Corruption possible during shutdown normal/transactional/immediate of RAC instances in a rolling fashion, refer to note 1318986.1 for details

bug 10121931 - DBCA CONFIGURE DATABASE OPTION DISABLED IF 11201 DATABASE PRESENT, refer to note 10121931.8 for more details, one-off patches exist

bug 11069614 - RDBMS INSTANCE CRASH DUE TO SLOW REAP OF GIPC MESSAGES ON CMT SYSTEMS, refer to note 1287709.1 - "ocssd.bin High CPU Usage and Instance Crashes With ORA-29770" for details

Refer to note 1179474.1 for 11.2.0.2 Patch Set Availability and Known Issues

Refer to note 948456.1 for Known Issues Between Pre-11.2 Database and 11gR2 Grid Infrastructure

Refer to note 810394.1 for RAC and Oracle Clusterware Starter Kit and Best Practices

