在以前的一篇文章中,我提到千万不能将Oracle数据库的global_name更新为空。这不,事儿来了。我的一个同事,提到了一个解决办法,不过那个办法实际上是一种不完全恢复的办法,如果没有备份,就行不通。如果没有备份,可以使用BBED来修改块来解决这个问题,不过使用bbed仍然比较麻烦。

下面是我一时心血来潮进行的一次测试。测试环境,10.2.0.4 for Linux AS 5.5。注意,不要在生产库上模仿。

首先UPDATE GLOBAL_NAME为空,COMMIT后以abort方式关闭数据库,以abort方式只是为了增加点难度。之后再启动数据库。

SQL> update global_name set global_name='';

1 row updated.

SQL> commit;

Commit complete.

SQL> shutdown abort
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area  209715200 bytes
Fixed Size                  1266632 bytes
Variable Size              75500600 bytes
Database Buffers          130023424 bytes
Redo Buffers                2924544 bytes
Database mounted.
ORA-01092: ORACLE instance terminated. Disconnection forced

启动失败,不出意料出现ORA-600 [18062]错误:

Mon Sep  6 15:43:31 2010
Errors in file /oracle/app/oracle/admin/xty/udump/xty_ora_3149.trc:
ORA-00600: internal error code, arguments: [18062], [], [], [], [], [], [], []
QMNC started with pid=16, OS id=3151
Mon Sep  6 15:43:33 2010
Errors in file /oracle/app/oracle/admin/xty/udump/xty_ora_3149.trc:
ORA-00600: internal error code, arguments: [18062], [], [], [], [], [], [], []
Mon Sep  6 15:43:33 2010
Errors in file /oracle/app/oracle/admin/xty/udump/xty_ora_3149.trc:
ORA-00600: internal error code, arguments: [18062], [], [], [], [], [], [], []

下面来解决这个问题。

第1步,重启数据库到MOUNT状态:

[oracle@xty ~]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Mon Sep 6 15:43:47 2010

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.

Connected to an idle instance.

SQL> startup mount
ORACLE instance started.

Total System Global Area  209715200 bytes
Fixed Size                  1266632 bytes
Variable Size              75500600 bytes
Database Buffers          130023424 bytes
Redo Buffers                2924544 bytes
Database mounted.

第2步,在另一个窗口中,使用gdb

[oracle@xty ~]$ ps -ef | grep LOCAL
oracle    3186  3156  0 15:43 ?        00:00:00 oraclexty (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle    3188  2978  0 15:44 pts/3    00:00:00 grep LOCAL
[oracle@xty ~]$ gdb $ORACLE_HOME/bin/oracle 3186
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5)
...(省略)...
Reading symbols from /oracle/app/oracle/product/10.2.0/bin/oracle...(no debugging symbols found)...done.
Attaching to program: /oracle/app/oracle/product/10.2.0/bin/oracle, process 3186
Reading symbols from /etc/libcwait.so...(no debugging symbols found)...done.
Loaded symbols for /etc/libcwait.so
Reading symbols from /oracle/app/oracle/product/10.2.0/lib/libskgxp10.so...(no debugging symbols found)...done.
Loaded symbols for /oracle/app/oracle/product/10.2.0/lib/libskgxp10.so
Reading symbols from /oracle/app/oracle/product/10.2.0/lib/libhasgen10.so...(no debugging symbols found)...done.
...(省略)...
Reading symbols from /oracle/app/oracle/product/10.2.0/lib/libnnz10.so...(no debugging symbols found)...done.
Loaded symbols for /oracle/app/oracle/product/10.2.0/lib/libnnz10.so
Reading symbols from /usr/lib/libaio.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libaio.so.1
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_files.so.2
0x0025d402 in __kernel_vsyscall ()
(gdb)  break kokiasg
Breakpoint 1 at 0xa3d404d
(gdb) continue
Continuing.

第3步,OPEN数据库:

SQL> alter database open;

约等一会儿,在alert日志里面可以看到:

Mon Sep  6 15:44:41 2010
SMON: enabling cache recovery
SMON: enabling tx recovery
Mon Sep  6 15:44:41 2010
Database Characterset is ZHS16GBK

在gdb的输出可以看到:

Breakpoint 1, 0x0a3d404d in kokiasg ()

第4步,在gdb那里中止OPEN:

(gdb) kill
Kill the program being debugged? (y or n) y     
(gdb) quit

sqlplus会提示:

alter database open
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel

这一次,Instance并没有terminated。只是Server process被KILL了。

第5步,还原GLOBAL_NAME:

[oracle@xty ~]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Mon Sep 6 15:45:09 2010

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> select status from v$instance;

STATUS
------------
OPEN

SQL> update global_name set global_name='XTY';

1 row updated.

SQL> commit
  2  ;
commit
*
ERROR at line 1:
ORA-01109: database not open

虽然可以执行UPDATE,但是不能COMMIT。再试试能不能做DDL:

SQL> create table t1 ( a int);

Table created.

成功了。

在另会一个会话中查看GLOBAL_NAME:

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
XTY

GLOBAL_NAME回来了。

这里通过DDL的隐式提交特性来UPDATE GLOBAL_NAME。其实还有更简单的办法:OCI主动断开连接时的自动提交。如果UPDATE之后,直接退出sqlplus,UPDATE GLOBAL_NAME的事务实际也提交了。看起来ORACLE这时只是不能执行显式的COMMIT语句。

第6步,重启数据库:

SQL> shutdown immediate
ORA-00604: error occurred at recursive SQL level 1
ORA-01109: database not open


SQL> shutdown abort
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area  209715200 bytes
Fixed Size                  1266632 bytes
Variable Size              75500600 bytes
Database Buffers          130023424 bytes
Redo Buffers                2924544 bytes
Database mounted.
Database opened.

看起来只能以abort方式关闭数据库:
成功了,数据库起来了。没有数据丢失,没有使用备份。不需要基于时间点的恢复,不需要BBED,不需要Resetlog。

,
Trackback

26 comments untill now

  1. “看起来只能以abort方式关闭数据库:”这句话何解?

    [回复]

    老熊 回复:

    @boypoo, shutdown immediate那一句报了ORA-00604错误,shutdown失败呀。

    [回复]

  2. 我测试了一下。用bbed恢复不了。大师能否把bbed也写出来。谢谢!!

    [回复]

  3. 能ddl不就已经提交了吗?还需要做ddl trigger吗?

    [回复]

    老熊 回复:

    @koko, 对,你说的很对。

    [回复]

  4. 在aix平台上,用那个工具代替gdb,你的步骤是基于open状态做的,现实是在aix平台上一旦设置global_name为空后,再也无无法打开了。也就无法执行update操作了。。

    [回复]

    老熊 回复:

    @wenxing, 在AIX上用dbx。关于你说的这个问题,请仔细阅读文章

    [回复]

  5. “我的一个同事,提到了一个解决办法,不过那个办法实际上是一种不完全恢复的办法,如果没有备份,就行不通。”

    我尝试了一下这个方法,在没有备份的情况下,也是可以打开的。
    不过我想不通,数据库还可以恢复到比当前SCN小的scn,怪?

    [回复]

    老熊 回复:

    @wenxing, 那种在UPDATE之后的块没有写到数据文件上就可以实现。

    [回复]

  6. 在执行update global_name set global_name=”后
    commit;
    这个时候后台报600 18062,数据库没有shutdown,
    但是已经无法登录了:
    SQL>
    SQL> conn / as sysdba
    ERROR:
    ORA-01075: you are currently logged on
    Warning: You are no longer connected to ORACLE.
    然后shutdwon后,在start mount
    再open时,实例直接shutdown

    SQL> alter database open;
    alter database open
    *
    ERROR at line 1:
    ORA-01092: ORACLE instance terminated. Disconnection forced

    我不知道是否你使用了gdb的关系导致数据库可以open起来,我去尝试一下。。

    [回复]

    老熊 回复:

    @wenxing, 这个测试最关键的地方就是使用gdb,使数据库能够处于open状态。

    [回复]

  7. 果然可以,把在aix平台上的结果贴一下,感谢laoxiang,顺便问一下,为啥使用gdb,dbx就可以open起来?
    在第一个窗口执行:
    SQL> startup mount;
    ORA-32004: obsolete and/or deprecated parameter(s) specified
    ORACLE instance started.

    Total System Global Area 1610612736 bytes
    Fixed Size 2073192 bytes
    Variable Size 654314904 bytes
    Database Buffers 939524096 bytes
    Redo Buffers 14700544 bytes
    Database mounted.
    在第二个窗口执行:
    oracle@ccicdbsrv2:/oracle/admin/ccicdev/bdump/ps -ef | grep LOCAL|grep ccicdev
    oracle 1687838 2371748 0 11:07:55 – 0:00 oracleccicdev (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
    oracle@ccicdbsrv2:/oracle/admin/ccicdev/bdump/dbx -a 1687838
    Waiting to attach to process 1687838 …
    Successfully attached to oracle.
    回到第一个窗口执行:
    Database mounted.
    SQL> alter database open;

    hang住了。。
    新开第三个窗口:
    SQL> conn / as sysdba
    Connected.
    SQL> select status from v$instance;

    STATUS
    ————
    OPEN
    SQL> conn / as sysdba
    Connected.
    SQL> select status from v$instance;

    STATUS
    ————
    OPEN

    SQL> update global_name set global_name=’CCICDEV';

    1 row updated.

    SQL> exit
    Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 – 64bit Production
    With the Partitioning, OLAP and Data Mining options
    oracle@ccicdbsrv2:/home/oracle/sqlplus /nolog

    SQL*Plus: Release 10.2.0.3.0 – Production on Tue Sep 7 11:20:13 2010

    Copyright (c) 1982, 2006, Oracle. All Rights Reserved.

    SQL> conn / as sysdba
    Connected.
    SQL> select global_name from global_name;

    GLOBAL_NAME
    ——————————————————————————–
    CCICDEV

    再中断dbx进程后,第一个窗口:
    SQL> alter database open;

    alter database open
    *
    ERROR at line 1:
    ORA-03113: end-of-file on communication channel

    SQL> SQL> SQL> SQL

    [回复]

  8. “那种在UPDATE之后的块没有写到数据文件上就可以实现。”
    我是这样测试的:
    SQL>update global_name set global_name=”;
    SQL>commit;

    在alert日志中:
    Errors in file /oracle/admin/ccicdev/udump/ccicdev_ora_536662.trc:
    ORA-00600: internal error code, arguments: [18062], [], [], [], [], [], [], []

    然后我shutdown immediate–这个时候数据库应该写回到物理文件了吧?
    –之后的步骤都是按照文全的步骤做的。。

    [回复]

    老熊 回复:

    @wenxing, shutdown immediate方式,内存中的数据已经写回到了数据文件。

    [回复]

  9. 老熊:
    我按照你的步骤在aix平台操作,发现数据open后又报ora-600 18062错误,接着还是shutdown了,但是他可以自动mount,你那里的环境有问题吗?
    LGWR: STARTING ARCH PROCESSES COMPLETE
    ARC1 started with pid=24, OS id=356806
    Tue Sep 14 09:42:59 2010
    Thread 1 advanced to log sequence 5
    Thread 1 opened at log sequence 5
    Current log# 1 seq# 5 mem# 0: /oradata/ccicdev/redo01.log
    Successful open of redo thread 1
    Tue Sep 14 09:42:59 2010
    MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
    Tue Sep 14 09:42:59 2010
    ARC1: Becoming the ‘no FAL’ ARCH
    ARC1: Becoming the ‘no SRL’ ARCH
    Tue Sep 14 09:42:59 2010
    ARC0: Becoming the heartbeat ARCH
    Tue Sep 14 09:42:59 2010
    SMON: enabling cache recovery
    Tue Sep 14 09:43:00 2010
    Successfully onlined Undo Tablespace 5.
    Tue Sep 14 09:43:00 2010
    SMON: enabling tx recovery
    Tue Sep 14 09:43:00 2010
    Database Characterset is AL32UTF8
    replication_dependency_tracking turned off (no async multimaster replication found)
    Starting background process QMNC
    QMNC started with pid=25, OS id=2891958
    Tue Sep 14 09:43:02 2010
    Errors in file /oracle/admin/ccicdev/udump/ccicdev_ora_2293938.trc:
    ORA-00600: internal error code, arguments: [18062], [], [], [], [], [], [], []
    Tue Sep 14 09:43:03 2010
    Errors in file /oracle/admin/ccicdev/udump/ccicdev_ora_266640.trc:
    ORA-00600: internal error code, arguments: [18062], [], [], [], [], [], [], []
    Tue Sep 14 09:43:03 2010
    Errors in file /oracle/admin/ccicdev/udump/ccicdev_ora_2293938.trc:
    ORA-00600: internal error code, arguments: [18062], [], [], [], [], [], [], []
    Tue Sep 14 09:43:03 2010
    Errors in file /oracle/admin/ccicdev/udump/ccicdev_ora_2293938.trc:
    ORA-00600: internal error code, arguments: [18062], [], [], [], [], [], [], []
    Tue Sep 14 09:43:03 2010
    Error 600 happened during db open, shutting down database
    USER: terminating instance due to error 600
    Instance terminated by USER, pid = 2293938
    ORA-1092 signalled during: alter database open…
    Tue Sep 14 09:47:12 2010
    Starting ORACLE instance (normal)
    LICENSE_MAX_SESSION = 0
    LICENSE_SESSIONS_WARNING = 0
    Picked latch-free SCN scheme 3
    Autotune of undo retention is turned on.
    IMODE=BR
    ILAT =97
    LICENSE_MAX_USERS = 0
    SYS auditing is disabled
    ksdpec: called for event 13740 prior to event group initialization
    Starting up ORACLE RDBMS Version: 10.2.0.3.0.
    System parameters with non-default values:
    processes = 800
    sessions = 885
    __shared_pool_size = 620756992
    __large_pool_size = 16777216
    __java_pool_size = 16777216
    __streams_pool_size = 0
    sga_target = 1610612736
    control_files = /oradata/ccicdev/control01.ctl, /oradata/ccicdev/control02.ctl, /oradata/ccicdev/control03.ctl
    db_block_size = 8192
    __db_cache_size = 939524096
    compatible = 10.2.0.3.0
    log_archive_dest_1 = LOCATION=/oradata/archlog/ccicdev
    log_archive_format = %t_%s_%r.dbf
    db_file_multiblock_read_count= 16
    _allow_resetlogs_corruption= TRUE
    _allow_terminal_recovery_corruption= TRUE
    _corrupted_rollback_segments= _SYSSMU1
    undo_management = AUTO
    undo_tablespace = UNDOTBS2
    remote_login_passwordfile= EXCLUSIVE
    db_domain =
    dispatchers = (PROTOCOL=TCP) (SERVICE=ccicdevXDB)
    job_queue_processes = 10
    background_dump_dest = /oracle/admin/ccicdev/bdump
    user_dump_dest = /oracle/admin/ccicdev/udump
    core_dump_dest = /oracle/admin/ccicdev/cdump
    audit_file_dest = /oracle/admin/ccicdev/adump
    db_name = ccicdev
    open_cursors = 800
    sql_trace = FALSE
    pga_aggregate_target = 821035008
    Deprecated system parameters with specified values:
    sql_trace
    End of deprecated system parameter listing
    PMON started with pid=2, OS id=3428352
    PSP0 started with pid=3, OS id=2695392
    MMAN started with pid=4, OS id=1462656
    DBW0 started with pid=5, OS id=2891966
    LGWR started with pid=6, OS id=2662642
    CKPT started with pid=7, OS id=2576400
    SMON started with pid=8, OS id=1810502
    RECO started with pid=9, OS id=1413338
    CJQ0 started with pid=10, OS id=1220834
    MMON started with pid=11, OS id=127044
    Tue Sep 14 09:47:12 2010
    starting up 1 dispatcher(s) for network address ‘(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))’…
    MMNL started with pid=12, OS id=4493448
    Tue Sep 14 09:47:13 2010
    starting up 1 shared server(s) …
    Tue Sep 14 09:47:13 2010
    ALTER DATABASE MOUNT
    Tue Sep 14 09:47:17 2010
    Setting recovery target incarnation to 5
    Tue Sep 14 09:47:17 2010
    Successful mount of redo thread 1, with mount id 2879843617
    Tue Sep 14 09:47:17 2010
    Database mounted in Exclusive Mode
    Completed: ALTER DATABASE MOUNT

    [回复]

  10. 如果你操作正常,下面一行应该不会出现:
    replication_dependency_tracking turned off (no async multimaster replication found)

    [回复]

  11. solaris10下测了一下,报错,哪里出了问题?
    -bash-3.00$ ps -ef|grep LOCAL
    oracle 4100 4069 0 15:04:05 ? 0:00 oracletjdrms14 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
    oracle 4105 3986 0 15:04:49 pts/4 0:00 grep LOCAL
    -bash-3.00$ gdb $ORACLE_HOME/bin/oracle 4100
    GNU gdb 6.8
    Copyright (C) 2008 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law. Type “show copying”
    and “show warranty” for details.
    This GDB was configured as “sparc-sun-solaris2.10″…
    (no debugging symbols found)
    Attaching to program `/export/home/orahome/oracle/product/10.2.0/bin/oracle’, process 4100
    /proc/4100: Value too large for defined data type.
    do_attach: couldn’t save traced faults.
    /export/home/orahome/oracle/4100: No such file or directory.

    [回复]

    老熊 回复:

    @jyc, 这应该是gdb的问题。
    This GDB was configured as “sparc-sun-solaris2.10″…
    看上去是编译成32位的。
    应该用64位的gdb

    [回复]

  12. 我在fedora 12上用oracle 11.2.0.1的版本测试的时候,
    gdb:
    (gdb)break kokiasg
    Breakpoint 1 at 0x8c8ea13
    (gdb) continue
    Continuing.
    Breakpoint 1, 0x08c8ea13 in kokiasg ()

    alter中:
    SMON: enabling tx recovery
    Archived Log entry 178 added for thread 1 sequence 278 ID 0x4a7839fd dest 1:
    Database Characterset is ZHS16GBK

    这时候如果在gdb中kill

    (gdb) kill
    Kill the program being debugged? (y or n) y
    (gdb) quit

    alter.log中:
    ARC3: Archival started
    ARC0: STARTING ARCH PROCESSES COMPLETE
    Tue Oct 12 15:23:51 2010
    PMON (ospid: 22523): terminating the instance due to error 397
    Instance terminated by PMON, pid = 22523
    Instance看起来也被中断了。

    这时候sqlplus 是
    SYS@orcl>select status from v$instance;
    select status from v$instance
    *
    ERROR at line 1:
    ORA-01034: ORACLE not available
    Process ID: 0
    Session ID: 0 Serial number: 0

    sqlplus / as sysdba

    SQL*Plus: Release 11.2.0.1.0 Production on Tue Oct 12 15:28:00 2010

    Copyright (c) 1982, 2009, Oracle. All rights reserved.

    Connected to an idle instance.

    SYS@orcl>

    上面如果没有在gdb中kill,设置了break后,alter database open,执行后hang住,从另外的窗口中select status from v$instance的时候是open的,再执行update成功。

    这是oracle 版本的差别么,即10.2版本中这时候只中断Server process,而11.2中instance也terminated?还是我执行的问题?

    [回复]

    老熊 回复:

    这毕竟是比较internal的东西,Oracle官方是不支持的。随着版本的不同,内部行为肯定有所变化。

    [回复]

  13. break kokiasg
    kokiasg是什么意思?怎么知道在这里break?

    [回复]

    老熊 回复:

    @cc, kokiasg其作用类似于检查一些数据库里面的基础数据,以保证数据库能够正常。所以知道了这一点,在这个地方break,就是阻止进一步的检查。

    [回复]

  14. 甲骨虫虫 @ 2011-01-07 13:40

    “break kokiasg
    kokiasg是什么意思?怎么知道在这里break?”

    –这个才是关键了,但是一般人肯定不知道在哪里设置断点。

    [回复]

    老熊 回复:

    @甲骨虫虫, 这个来自Oracle Support internal的东西,所以这玩意儿只能照着做。

    [回复]

  15. (gdb) break kokiasg
    Breakpoint 1 at 0xa3d404d
    (gdb) continue
    Continuing
    这个啥意思哈,如何操作!

    [回复]

  16. […] 老熊在“UPDATE GLOBAL_NAME为空之后的恢复”这篇文章里用的恢复方法中用到了gdb和kokiasg的特性,这种方法虽然很简洁,但是过于internal。 […]

Add your comment now