在这里,给我的好友崔华(dbnake)做一个关于Oracle Recovery进阶培训的宣传。

关于崔华,其实不用做过多的介绍,国内研究oracle interal的专家,我常常与他进行oracle方面的讨论,从他那里我获益非浅。

培训时间是2010-10- 21 ~ 2010-10-24, 合计四天课程,培训大纲如下:

培训模块内容

培训提纲

学时

OracleRedo机制解析

掌握Oracleredo log写入原理----Physiological Logging

3学时

深入解析redo log结构----SCNRBAChange VectorCheckpoint Structure

解析oracle recovery的实现机制-Page Fix、写日志优先、Checkpoint

OracleUndo机制解析

回滚段的作用与结构

3学时

深入解析XIDUBA的结构与作用

深入解析KTUXC的结构与作用

深入解析undo chain的结构与作用

Oracle恢复的内部实现

BBED的用法

3学时

Control文件的格式和内容解析

利用control文件借尸还魂的实例

OS Header Block的格式和内容解析

9i中通过修改OS header block实现跨平台迁移的实例

Datafile Header的格式和内容解析----Datafile Header的常规结构、独家披露Datafile Header中一个我们从来未触及过的区域及其作用

通过修改Datafile Header规避ORA-01190: controlfile or data file XXX is from before the last RESETLOGS错误的实例

OracleOffline文件的解析与恢复

offline datafile的实质

3学时

缺失归档日志后的offline datafile的强制恢复实例

offline tablespace的种类与实质

缺失datafile和归档日志后的offline tablespace的强制恢复实例

OracleRedoUndo损坏后的恢复

current redo log损坏后的恢复

3学时

缺失current online redo log后的恢复实例

深入解析undo损坏后的恢复及恢复实例

Oracle中常规恢复案例精讲

oracledelete操作的实质,如何恢复被delete掉的数据及恢复实例

3学时

oracledrop操作的实质,如何恢复被drop掉的表、存储过程及恢复实例

oracletruncate操作的实质,如何恢复被truncate的表及恢复实例

oracle中数据块损坏的种类及恢复实例,从单个坏块中抢救出数据的恢复实例

Oracle中特殊恢复案例精讲

9i中跨越resetlog恢复的实例

6学时

如何跳过缺失的归档及恢复实例

Allow 1 Corruption失效后我们如何处理stuck recovery

深入解析ORA-600[4000]错误及恢复实例

深入解析system回滚段损坏导致的ora-600[4193]/ora-600[4194]错误及恢复实例

深入解析DEPENDENCY$对象的恢复及恢复实例

深入解析ASM Disk Header的结构及ASM Disk Header损坏后的恢复实例

关于培训,详情可见《我第一次独挑大梁的培训》

在以前的一篇文章中,我提到千万不能将Oracle数据库的global_name更新为空。这不,事儿来了。我的一个同事,提到了一个解决办法,不过那个办法实际上是一种不完全恢复的办法,如果没有备份,就行不通。如果没有备份,可以使用BBED来修改块来解决这个问题,不过使用bbed仍然比较麻烦。

下面是我一时心血来潮进行的一次测试。测试环境,10.2.0.4 for Linux AS 5.5。注意,不要在生产库上模仿。

首先UPDATE GLOBAL_NAME为空,COMMIT后以abort方式关闭数据库,以abort方式只是为了增加点难度。之后再启动数据库。

SQL> update global_name set global_name='';

1 row updated.

SQL> commit;

Commit complete.

SQL> shutdown abort
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area  209715200 bytes
Fixed Size                  1266632 bytes
Variable Size              75500600 bytes
Database Buffers          130023424 bytes
Redo Buffers                2924544 bytes
Database mounted.
ORA-01092: ORACLE instance terminated. Disconnection forced

启动失败,不出意料出现ORA-600 [18062]错误:

Mon Sep  6 15:43:31 2010
Errors in file /oracle/app/oracle/admin/xty/udump/xty_ora_3149.trc:
ORA-00600: internal error code, arguments: [18062], [], [], [], [], [], [], []
QMNC started with pid=16, OS id=3151
Mon Sep  6 15:43:33 2010
Errors in file /oracle/app/oracle/admin/xty/udump/xty_ora_3149.trc:
ORA-00600: internal error code, arguments: [18062], [], [], [], [], [], [], []
Mon Sep  6 15:43:33 2010
Errors in file /oracle/app/oracle/admin/xty/udump/xty_ora_3149.trc:
ORA-00600: internal error code, arguments: [18062], [], [], [], [], [], [], []

下面来解决这个问题。

第1步,重启数据库到MOUNT状态:

[oracle@xty ~]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Mon Sep 6 15:43:47 2010

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.

Connected to an idle instance.

SQL> startup mount
ORACLE instance started.

Total System Global Area  209715200 bytes
Fixed Size                  1266632 bytes
Variable Size              75500600 bytes
Database Buffers          130023424 bytes
Redo Buffers                2924544 bytes
Database mounted.

第2步,在另一个窗口中,使用gdb

[oracle@xty ~]$ ps -ef | grep LOCAL
oracle    3186  3156  0 15:43 ?        00:00:00 oraclexty (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle    3188  2978  0 15:44 pts/3    00:00:00 grep LOCAL
[oracle@xty ~]$ gdb $ORACLE_HOME/bin/oracle 3186
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5)
...(省略)...
Reading symbols from /oracle/app/oracle/product/10.2.0/bin/oracle...(no debugging symbols found)...done.
Attaching to program: /oracle/app/oracle/product/10.2.0/bin/oracle, process 3186
Reading symbols from /etc/libcwait.so...(no debugging symbols found)...done.
Loaded symbols for /etc/libcwait.so
Reading symbols from /oracle/app/oracle/product/10.2.0/lib/libskgxp10.so...(no debugging symbols found)...done.
Loaded symbols for /oracle/app/oracle/product/10.2.0/lib/libskgxp10.so
Reading symbols from /oracle/app/oracle/product/10.2.0/lib/libhasgen10.so...(no debugging symbols found)...done.
...(省略)...
Reading symbols from /oracle/app/oracle/product/10.2.0/lib/libnnz10.so...(no debugging symbols found)...done.
Loaded symbols for /oracle/app/oracle/product/10.2.0/lib/libnnz10.so
Reading symbols from /usr/lib/libaio.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libaio.so.1
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_files.so.2
0x0025d402 in __kernel_vsyscall ()
(gdb)  break kokiasg
Breakpoint 1 at 0xa3d404d
(gdb) continue
Continuing.

第3步,OPEN数据库:

SQL> alter database open;

约等一会儿,在alert日志里面可以看到:

Mon Sep  6 15:44:41 2010
SMON: enabling cache recovery
SMON: enabling tx recovery
Mon Sep  6 15:44:41 2010
Database Characterset is ZHS16GBK

在gdb的输出可以看到:

Breakpoint 1, 0x0a3d404d in kokiasg ()

第4步,在gdb那里中止OPEN:

(gdb) kill
Kill the program being debugged? (y or n) y     
(gdb) quit

sqlplus会提示:

alter database open
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel

这一次,Instance并没有terminated。只是Server process被KILL了。

第5步,还原GLOBAL_NAME:

[oracle@xty ~]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Mon Sep 6 15:45:09 2010

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> select status from v$instance;

STATUS
------------
OPEN

SQL> update global_name set global_name='XTY';

1 row updated.

SQL> commit
  2  ;
commit
*
ERROR at line 1:
ORA-01109: database not open

虽然可以执行UPDATE,但是不能COMMIT。再试试能不能做DDL:

SQL> create table t1 ( a int);

Table created.

成功了。

在另会一个会话中查看GLOBAL_NAME:

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
XTY

GLOBAL_NAME回来了。

这里通过DDL的隐式提交特性来UPDATE GLOBAL_NAME。其实还有更简单的办法:OCI主动断开连接时的自动提交。如果UPDATE之后,直接退出sqlplus,UPDATE GLOBAL_NAME的事务实际也提交了。看起来ORACLE这时只是不能执行显式的COMMIT语句。

第6步,重启数据库:

SQL> shutdown immediate
ORA-00604: error occurred at recursive SQL level 1
ORA-01109: database not open


SQL> shutdown abort
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area  209715200 bytes
Fixed Size                  1266632 bytes
Variable Size              75500600 bytes
Database Buffers          130023424 bytes
Redo Buffers                2924544 bytes
Database mounted.
Database opened.

看起来只能以abort方式关闭数据库:
成功了,数据库起来了。没有数据丢失,没有使用备份。不需要基于时间点的恢复,不需要BBED,不需要Resetlog。

,