otak-otak | favorite evening food

December 30, 2011

ORA-600 17182 Block corruption

Filed under: troubleshoot — ora62 @ 2:12 pm

Suddenly one of my customer make a call at night about ora-600 of their core system…

logging into machine and grep the alert log, found the ora-600; database can be opened for a while (about 2 minutes) and then terminate.

Doing block recovery for file 7 block 15312
Block recovery from logseq 147867, block 3707 to scn 3789052035
Recovery of Online Redo Log: Thread 1 Group 6 Seq 147867 Reading mem 0
Mem# 0: /redolog1/dbname/redo06.log
Block recovery stopped at EOT rba 147867.3776.16
Block recovery completed at rba 147867.3776.16, scn 0.3789052035
Thu Dec 29 21:35:26 2011
Completed: alter database open
Thu Dec 29 21:35:29 2011
Starting background process CJQ0
Thu Dec 29 21:35:29 2011
CJQ0 started with pid=45, OS id=192
Thu Dec 29 21:35:29 2011
Sweep Incident[95434]: completed
Errors in file /oracle/diag/diag/rdbms/dbnameg/dbname/trace/dbname_smon_29913.trc:
SVR4 Error: 2: No such file or directory
Additional information: 3
ORACLE Instance dbname (pid = 13) – Error 600 encountered while recovering transaction (203, 27) on object 79208.
Errors in file /oracle/diag/diag/rdbms/dbnameg/dbname/trace/dbname_smon_29913.trc:
ORA-00600: internal error code, arguments: [17182], [0xFFFFFFFF7B81D750], [], [], [], [], [], []
Errors in file /oracle/diag/diag/rdbms/dbnameg/dbname/trace/dbname_smon_29913.trc (incident=95435):
ORA-00600: internal error code, arguments: [KSMFPG2], [0xFFFFFFFF7B81D000], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [17182], [0xFFFFFFFF7B81D750], [], [], [], [], [], []
Incident details in: /oracle/diag/diag/rdbms/dbnameg/dbname/incident/incdir_95435/dbname_smon_29913_i95435.trc
Thu Dec 29 21:35:32 2011
Trace dumping is performing id=[cdmp_20111229213532]
Thu Dec 29 21:35:33 2011
Fatal internal error happened while SMON was doing active transaction recovery.
Errors in file /oracle/diag/diag/rdbms/dbnameg/dbname/trace/dbname_smon_29913.trc:
ORA-00600: internal error code, arguments: [KSMFPG2], [0xFFFFFFFF7B81D000], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [17182], [0xFFFFFFFF7B81D750], [], [], [], [], [], []
SMON (ospid: 29913): terminating the instance due to error 474
Instance terminated by SMON, pid = 29913

from the alert log I found there are corruption at block level, but oracle actualy can recovered it automatically, but it seems when there is undo application for a transaction got stuck due to block corrupt at the other block.

find the object name as informed by alertlog:

select owner,object_name from dba_objects where object_id=79208;

now time to scan all the logical corruption by rman validate command:

run {
allocate channel d1 type disk;
backup check logical validate database;
release channel d1;
}

confirmed there are 2 block corruption on the table:

Error backing up file 7, block 15312: logical corruption
Error backing up file 7, block 18714: logical corruption

another check by dbms_repair to investigate the real corruption, and marked it as corrupt:
declare
begin
dbms_repair.admin_tables(
table_name => ‘REPAIR_TABLE_99’,
table_type => dbms_repair.repair_table,
action => dbms_repair.create_action,
tablespace => ‘USERS’);
end;
/
select owner,object_name,object_type
from dba_objects
where object_name like ‘%REPAIR_TABLE%’;

declare
begin
dbms_repair.admin_tables(
table_name => ‘ORPHAN_REP_TBL_99’,
table_type => dbms_repair.orphan_table,
action => dbms_repair.create_action,
tablespace => ‘USERS’);
end;
/

select owner,object_name,object_type
from dba_objects
where object_name like ‘%ORPHAN%’;

set serveroutput on

declare
rpr_count int;
begin
rpr_count :=0;

dbms_repair.check_object(
schema_name => ‘TEST’,
object_name => ‘TBL1’,
repair_table_name => ‘REPAIR_TABLE_99’,
corrupt_count => rpr_count);

dbms_output.put_line(‘rep cnt : ‘||to_char(rpr_count));
end;
/

select object_name,block_id,corrupt_type,marked_corrupt,corrupt_description,repair_description from repair_table_99;

OBJECT_NAME BLOCK_ID CORRUPT_TYPE MARKED_COR
—————————— ———- ———— ———-
CORRUPT_DESCRIPTION
——————————————————————————–
REPAIR_DESCRIPTION
——————————————————————————–
TBL1 15312 6130 FALSE
Block Checking: DBA = 29375440, Block Type = KTB-managed data block
data header at 0x3d85b007c
kdbchk: fseo(223) < fsbo(282)
mark block software corrupt
TBL1 18714 6110 FALSE
Block Checking: DBA = 29378842, Block Type = KTB-managed data block
data header at 0x3d756e0ac
kdbchk: the amount of space used is not equal to block size
used=7403 fsc=0 avsp=769 dtl=8016
mark block software corrupt

mark those block as corrupted:

declare
l_fix_count int;
begin
l_fix_count:=0;

dbms_repair.fix_corrupt_blocks(
schema_name => ‘TEST’,
object_name => ‘TBL1’,
object_type => dbms_repair.table_object,
repair_table_name => ‘REPAIR_TABLE_99’,
fix_count => l_fix_count);

dbms_output.put_line(‘fix cnt : ‘||to_char(l_fix_count));
end;
/

select object_name,block_id,corrupt_type,marked_corrupt,corrupt_description,repair_description from repair_table_99;

OBJECT_NAME BLOCK_ID CORRUPT_TYPE MARKED_COR
—————————— ———- ———— ———-
CORRUPT_DESCRIPTION
——————————————————————————–
REPAIR_DESCRIPTION
——————————————————————————–
TBL1 15312 6130 TRUE
Block Checking: DBA = 29375440, Block Type = KTB-managed data block
data header at 0x3d85b007c
kdbchk: fseo(223) < fsbo(282)
mark block software corrupt

TBL1 18714 6110 TRUE
Block Checking: DBA = 29378842, Block Type = KTB-managed data block
data header at 0x3d756e0ac
kdbchk: the amount of space used is not equal to block size
used=7403 fsc=0 avsp=769 dtl=8016
mark block software corrupt

now time to recover the table
skip corrupted block from reading, and get the good data:

alter session set event=’10231 trace name context forever, level 10’;
create table test.tmp as select * from test.tbl1;

ok done, now the application guy who have to look into the data and fix them
have a nice sleep then :)

December 1, 2011

perl script to automate restore, recover rman backup

Filed under: barman — ora62 @ 11:26 am

I created ‘simple-silly’ perl script to automate restore and recover of rman backup. this script will only need supply rman backup log.

how to execute:
1. to restore & recover until all archivelog in the backup.
./restore_arc.sh <full_path_rman_backup_log>

2. to restore & recover until all archivelog (until cancel).
./restore_arc_last.sh <full_path_rman_backup_log>

has been tested on linux env, you may use this script for your needs and please take your own risk :)

cat restore_arc.sh

#!/bin/bash
. /home/ujang/.bashrc

SCRPATH=/home/ujang/docs/orascript/backup-recovery/rman
LOGPATH=/home/ujang/docs/orascript/backup-recovery/rman
TAG=`date +%Y%m%d`
CMDFILE=${SCRPATH}/restore.txt
LOGFILE=${LOGPATH}/restore_${TAG}.log
RMANLOG=$1
cd $SCRPATH
export ORACLE_HOME=/apps/oracle/product/9.2
export ORACLE_SID=PROD
./restore.pl $RMANLOG spfile=NO lastarc=NO
##$ORACLE_HOME/bin/rman target / nocatalog cmdfile=$CMDFILE msglog $LOGFILE

cat restore_arc_last.sh

#!/bin/bash
. /home/ujang/.bashrc

SCRPATH=/home/ujang/docs/orascript/backup-recovery/rman
LOGPATH=/home/ujang/docs/orascript/backup-recovery/rman
TAG=`date +%Y%m%d`
CMDFILE=${SCRPATH}/restore.txt
LOGFILE=${LOGPATH}/restore_${TAG}.log
RMANLOG=$1
cd $SCRPATH
export ORACLE_HOME=/apps/oracle/product/9.2
export ORACLE_SID=PROD
./restore.pl $RMANLOG spfile=NO lastarc=YES
##$ORACLE_HOME/bin/rman target / nocatalog cmdfile=$CMDFILE msglog $LOGFILE

cat restore.pl

#!/usr/bin/perl
my @array = qq[input archive log thread];
my @match = (“sequence=”,”thread=”,”DBID=”,”piece handle=”,”CONTROL”,”SPFILE”);
my @seqid;
my $thread=0;
my $dbid=0;
my $ctlfile=”XXX”;
my $initfile=”XXX”;
my $orahome=”XXX”;
my $orasid=”X”;
foreach $key (keys(%ENV)) {
if ($orahome eq “XXX”) {
if ($key eq “ORACLE_HOME”) {
$orahome=$ENV{$key};
}
}
if ($orasid eq “X”) {
if ($key eq “ORACLE_SID”) {
$orasid=$ENV{$key};
}
}
}

if ($orahome eq “XXX” || $orasid eq “X”)
{
print “environment variable ORACLE_HOME dan ORACLE_SID harus diset!!\n”;
exit 0;
}

if ($ARGV[0] eq “help=yes” || $ARGV[0] eq “help=y” || $ARGV[0] eq “-h” || $ARGV[0] eq “–h”)
{
print “Cara penggunaan :\n”;
print “1. anda harus berada pada directory dimana file ini dijalankan, misalnya cd /oradata/rman/scr”;
print “2. ketik \n”;
print ” recover spfile=YES|NO\n”;
print ” misal: \n”;
print ” recover /oradata/rman/log/rman_bkp_20110504.log spfile=NO\n”;
exit 0;
}

my $rmanlog=$ARGV[0];
my $spfileloc=$ARGV[1];
my $lastarc=$ARGV[2];
my $spfileloc=substr($spfileloc,index($spfileloc,”=”,0)+1,3);
my $lastarc=substr($lastarc,index($lastarc,”=”,0)+1,3);

if (length($rmanlog)<=0)
{
print “anda harus passing parameter, gunakan restore.sh -h untuk bantuan\n”;
exit 0;
}

open(OLD, “<“, $rmanlog) or die “can’t open $rmanlog”;
while ()
{
my $line = $_;
if ($dbid == 0) {
my $seq = (split /$match[2]/, $line)[1];
$dbid=substr($seq,0,index($seq,”)”,0));
}
if ($ctlfile eq “XXX”) {
my $seq = (split /^$match[3]/, $line)[1];
if($seq =~ m/$match[4]/i) {
$ctlfile=substr($seq,0,index($seq,chr(32),0));
}
}

if ($initfile eq “XXX”) {
my $seq = (split /$match[3]/, $line)[1];
if($seq =~ m/$match[5]/i) {
$initfile=substr($seq,0,index($seq,chr(32),0));
}
}
if ($line =~ /^@array/) {
my $seq = (split /$match[0]/, $line)[1];
push @seqid, substr($seq,0,index($seq,chr(32),0));
if ($thread == 0) {
my $seq = (split /$match[1]/, $line)[1];
$thread=substr($seq,0,index($seq,chr(32),0));
}
}
}
@seqid=sort(@seqid);
my $seqmax = @seqid;
close(OLD);

system(“rm -f restore.txt”);

($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);
$year=$year+1900;
$new=”restore.txt”;
open(NEW, “>”, $new) or die “can’t open $new: $!”;
if ($spfileloc eq “YES” or $spfileloc eq “yes”) {
print NEW “host ‘mv $orahome/dbs/spfile$orasid\.ora $orahome/dbs/spfile$orasid\_$year$mon$mday$hour$min\.ora’;\n” or die “can’t write $new: $!”;
}
print NEW “set dbid=$dbid;\n” or die “can’t write $new: $!”;
print NEW “startup force nomount;\n” or die “can’t write $new: $!”;
if ($spfileloc eq “YES” or $spfileloc eq “yes”) {
print NEW “restore spfile to ‘$orahome/dbs/spfile$orasid\.ora’ from ‘$initfile’;\n” or die “can’t write $new: $!”;
}
print NEW “shutdown abort;\n” or die “can’t write $new: $!”;
print NEW “set dbid=$dbid;\n” or die “can’t write $new: $!”;
print NEW “startup nomount;\n” or die “can’t write $new: $!”;
print NEW “restore controlfile from ‘$ctlfile’;\n” or die “can’t write $new: $!”;
print NEW “alter database mount;\n” or die “can’t write $new: $!”;
print NEW “restore database;\n” or die “can’t write $new: $!”;
print NEW “restore archivelog from sequence $seqid[0] until sequence $seqid[$seqmax-1];\n” or die “can’t write $new: $!”;

if ($lastarc eq “YES” or $lastarc eq “yes”) {
print NEW “sql ‘RECOVER DATABASE UNTIL CANCEL USING BACKUP CONTROLFILE;’;\n” or die “can’t write $new: $!”;
} else {
print NEW “recover database until logseq $seqid[$seqmax-1] thread $thread;\n” or die “can’t write $new: $!”;
}
print NEW “sql ‘alter database open resetlogs’;\n” or die “can’t write $new: $!”;
close(NEW) or die “can’t close $new: $!”;

Blog at WordPress.com.