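When an Oracle database hangs, taking a systemstate dump or running a hang analysis is an effective way to investigate and resolve the problem; at the very least, it gives you more useful information to include when filing an SR. If you can still connect to the database and run commands, oradebug is the simplest and quickest way to do this.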
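For the case where a connection is still possible, the oradebug route might look roughly like the sketch below. The helper name oradebug_script and the default level of 10 are assumptions made for illustration; the oradebug commands themselves are the usual ones for a systemstate dump.

```shell
#!/bin/sh
# Sketch: print the oradebug commands for a systemstate dump.
# oradebug_script is a hypothetical helper name; the level defaults to 10.
oradebug_script() {
    level="${1:-10}"
    cat <<EOF
oradebug setmypid
oradebug unlimit
oradebug dump systemstate $level
oradebug tracefile_name
EOF
}

# Example use (needs a working SYSDBA connection, so it is left commented out):
#   oradebug_script 10 | sqlplus -S "/ as sysdba"
```

Printing the commands instead of running them directly makes it easy to review them before piping them into sqlplus.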

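Sometimes, however, the hang prevents sqlplus from connecting at all (on 10g you can try connecting with sqlplus -prelim). In that case, an operating-system debugger can be used to dump the Oracle system state. In an earlier post, "Notes on handling an unresponsive (hung) Oracle database", dbx was used to take a systemstate dump, which revealed the cause of the problem and eventually led to its resolution. Here is the dbx session from that incident: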

# dbx -a 446910
Waiting to attach to process 446910 ...
Successfully attached to oracle.
Type 'help' for help.
reading symbolic information ...
stopped in iosl.select at 0x9000000000c94d8 ($t2)
0x9000000000c94d8 (select+0xfffffffffff06318) e8410028 ld r2,0x28(r1)
(dbx) print ksudss(10)

Segmentation fault in slrac at 0x100083aa0 ($t2)
0x100083aa0 (slrac+0xe4) 88030000 lbz r0,0x0(r3)
(dbx) detach

As the session above shows, taking a dump with dbx goes as follows:

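  • Find the PID of the abnormal process, for example one with very high CPU usage or one that is hung. For a system-wide systemstate dump, another Oracle process can be used instead.
  • dbx -a <pid>
  • print ksudss(10) -- this calls the ksudss function inside the Oracle executable directly with dump level 10, which is equivalent to running oradebug dump systemstate 10 in sqlplus
  • detach
  • quit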

On Linux, gdb can be used instead. Here is an example:

[oracle@xty ~]$ ps -ef | grep LOCAL
oracle 3765 3764 1 05:55 ? 00:00:00 oraclexty (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 3767 3668 0 05:55 pts/2 00:00:00 grep LOCAL
[oracle@xty ~]$ gdb $ORACLE_HOME/bin/oracle 3765
GNU gdb Red Hat Linux (6.1post-1.20040607.62rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...(no debugging symbols found)...Using host libthread_db library "/lib/tls/libthread_db.so.1".

Attaching to program: /u01/app/oracle/product/10.1.0/db_1/bin/oracle, process 3765
Reading symbols from /u01/app/oracle/product/10.1.0/db_1/lib/libskgxp10.so...(no debugging symbols found)...done.
Loaded symbols for /u01/app/oracle/product/10.1.0/db_1/lib/libskgxp10.so
Reading symbols from /u01/app/oracle/product/10.1.0/db_1/lib/libhasgen10.so...done.
Loaded symbols for /u01/app/oracle/product/10.1.0/db_1/lib/libhasgen10.so
Reading symbols from /u01/app/oracle/product/10.1.0/db_1/lib/libskgxn2.so...done.
Loaded symbols for /u01/app/oracle/product/10.1.0/db_1/lib/libskgxn2.so
Reading symbols from /u01/app/oracle/product/10.1.0/db_1/lib/libocr10.so...done.
Loaded symbols for /u01/app/oracle/product/10.1.0/db_1/lib/libocr10.so
Reading symbols from /u01/app/oracle/product/10.1.0/db_1/lib/libocrb10.so...done.
Loaded symbols for /u01/app/oracle/product/10.1.0/db_1/lib/libocrb10.so
Reading symbols from /u01/app/oracle/product/10.1.0/db_1/lib/libocrutl10.so...done.
Loaded symbols for /u01/app/oracle/product/10.1.0/db_1/lib/libocrutl10.so
Reading symbols from /u01/app/oracle/product/10.1.0/db_1/lib/libjox10.so...done.
Loaded symbols for /u01/app/oracle/product/10.1.0/db_1/lib/libjox10.so
Reading symbols from /u01/app/oracle/product/10.1.0/db_1/lib/libclsra10.so...done.
Loaded symbols for /u01/app/oracle/product/10.1.0/db_1/lib/libclsra10.so
Reading symbols from /u01/app/oracle/product/10.1.0/db_1/lib/libdbcfg10.so...done.
Loaded symbols for /u01/app/oracle/product/10.1.0/db_1/lib/libdbcfg10.so
Reading symbols from /u01/app/oracle/product/10.1.0/db_1/lib/libnnz10.so...done.
Loaded symbols for /u01/app/oracle/product/10.1.0/db_1/lib/libnnz10.so
Reading symbols from /usr/lib/libaio.so.1...done.
Loaded symbols for /usr/lib/libaio.so.1
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/tls/libm.so.6...done.
Loaded symbols for /lib/tls/libm.so.6
Reading symbols from /lib/tls/libpthread.so.0...done.
[Thread debugging using libthread_db enabled]
[New Thread -1219938624 (LWP 3765)]
Loaded symbols for /lib/tls/libpthread.so.0
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
0x006967a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) print ksudss(10)
[Switching to Thread -1219938624 (LWP 3765)]
$1 = 213658428
(gdb) detach
Detaching from program: /u01/app/oracle/product/10.1.0/db_1/bin/oracle, process 3765
(gdb) quit
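
The same three gdb commands could also be placed in a command file and run non-interactively. The following is only a sketch: the helper name write_gdb_cmds and the file path in the usage note are made up for illustration, and the gdb invocation is left as a comment because it needs a live Oracle process.

```shell
#!/bin/sh
# Sketch: print the gdb commands used in the session above.
# write_gdb_cmds is a hypothetical helper name; the level defaults to 10.
write_gdb_cmds() {
    level="${1:-10}"
    cat <<EOF
print ksudss($level)
detach
quit
EOF
}

# Example use (not executed here; 3765 is the PID from the session above):
#   write_gdb_cmds 10 > /tmp/systemstate.gdb
#   gdb -batch -x /tmp/systemstate.gdb "$ORACLE_HOME/bin/oracle" 3765
```

Here -x tells gdb to read commands from the file and -batch makes it exit once they have run, so the process stays attached to the debugger for as short a time as possible.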

Afterwards, the dump output can be found in a trace file.
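
A minimal sketch for locating that file follows. It assumes the dump is written to the trace file of the process the debugger was attached to, that the file is named <sid>_ora_<pid>.trc, and that you know the dump directory (typically the user_dump_dest setting); all three should be verified on your own system. The helper name trace_file_for_pid is hypothetical.

```shell
#!/bin/sh
# Sketch: print the expected trace file path for a server process.
# trace_file_for_pid is a hypothetical helper; the <sid>_ora_<pid>.trc
# naming and the dump directory are assumptions to verify locally.
trace_file_for_pid() {
    dump_dir="$1"
    sid="$2"
    pid="$3"
    candidate="$dump_dir/${sid}_ora_${pid}.trc"
    if [ -f "$candidate" ]; then
        printf '%s\n' "$candidate"
    else
        echo "no trace file at $candidate" >&2
        return 1
    fi
}

# Example use for the gdb session above (directory is a placeholder):
#   trace_file_for_pid /path/to/udump xty 3765
```

If nothing is found at that path, listing the most recently modified .trc files in the dump directory is a reasonable fallback.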
