System Necromancy

This page collects quick tips for bringing systems back from the dead.

E3500 Boot from Solaris 2.6 Media

 {e} ok boot cdrom
 Boot device: /sbus@3,0/SUNW,fas@3,8800000/sd@6,0:f  File and args:
 SunOS Release 5.6 Version Generic [UNIX(R) System V Release 4.0]
 Copyright (c) 1983-1997, Sun Microsystems, Inc.
 BAD TRAP: cpu=14 type=0x34 rp=0x104034b8 addr=0xfad006aa mmu_fsr=0x0
 : alignment error:
 addr=0xfad006aa
 pid=0, pc=0x10029dc4, sp=0x10403548, tstate=0x880001e04, context=0x0
 g1-g7: 1040fc00, 6000601c, 46ce3, 16, 0, 0, 10404040
 Begin traceback... sp = 10403548
 Called from 1002a2ac, fp=104035a8, args=18c 31c 0 1040ff8c 0 0
 Called from 1003bf8c, fp=10403618, args=10407ff4 0 9 0 1040ff88 c70
 Called from 1002215c, fp=104036a8, args=1041197c 60338000 1 10412674 

Perhaps the media is bad? No!

Machines with 400 MHz CPUs will panic if you attempt to boot from the Solaris 2.6 5/98 media.

The fix is to limit the E-cache size at the ok prompt until the kernel is patched:

limit-ecache-size


Now it works:

{e} ok boot cdrom -s
Boot device: /sbus@3,0/SUNW,fas@3,8800000/sd@6,0:f  File and args: -s
SunOS Release 5.6 Version Generic [UNIX(R) System V Release 4.0]
Copyright (c) 1983-1997, Sun Microsystems, Inc.
Configuring devices...
|
INIT: SINGLE USER MODE
#

Until the kernel is patched to at least 105181-20, you will need to run limit-ecache-size at every boot.
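To check whether an installed system already carries that patch, a minimal sketch can compare the output of uname -v against the 105181-20 threshold. It assumes a patched Solaris 2.6 kernel reports its version as Generic_105181-NN (an unpatched one reports just Generic, as in the boot banner above); the helper name kernel_patch_ok is hypothetical. It sticks to old Bourne shell syntax so it also runs under the Solaris 2.6 /bin/sh.

```shell
#!/bin/sh
# Sketch: does the running kernel carry patch 105181 at revision 20 or later?
# Assumes `uname -v` prints Generic_105181-NN on a patched Solaris 2.6 kernel.
# Plain Bourne shell syntax (backticks, no "local") for Solaris 2.6 /bin/sh.

# kernel_patch_ok is a hypothetical helper; $1 is the kernel version string.
kernel_patch_ok() {
    case "$1" in
        Generic_105181-*)
            rev=`echo "$1" | sed 's/^Generic_105181-//'`
            # Reject anything that is not a plain revision number.
            case "$rev" in
                ''|*[!0-9]*) return 1 ;;
            esac
            [ "$rev" -ge 20 ]
            ;;
        *)
            # "Generic" alone (unpatched) or some other kernel.
            return 1
            ;;
    esac
}

if kernel_patch_ok "`uname -v`"; then
    echo "kernel is at 105181-20 or later"
else
    echo "kernel not at 105181-20: run limit-ecache-size at the ok prompt before booting"
fi
```

Only the kernel version string is inspected; on a system where uname -v does not follow that pattern the sketch simply reports "not patched", so check showrev -p by hand if in doubt.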


 

Recovering from Metadevice Corruption

On some older, unpatched systems, failed metadevices can put the machine into a panic-reboot loop.

If there are no obvious disk problems, try re-enabling the failed component with metareplace -e <metadevice> <component>.

But sometimes even this fails.

In that case, boot the system as described below and rip out SVM.

First check with metastat whether you have soft partitions. If you do, things get more interesting, as you may lose them.

Copy /etc/system to /etc/system.old (cp -p /etc/system /etc/system.old), then edit the rootdev entry in /etc/system to match the physical path of the root disk (/pci...).
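The block that metaroot writes into /etc/system typically looks like the first part below; the edit swaps the metadevice for the physical device path of the root slice. Both the metadevice (d0) and the /pci path are hypothetical examples. The real physical path of a slice is the target of its /dev/dsk symlink: ls -l /dev/dsk/c0t0d0s0 (again a hypothetical slice) points into ../../devices/..., and the part after /devices is what goes after rootdev:.

```
* Before: root on metadevice d0 (example)
* Begin MDD root info (do not edit)
rootdev:/pseudo/md@0:0,0,blk
* End MDD root info (do not edit)

* After: root on the physical slice (path is a hypothetical example)
* Begin MDD root info (do not edit)
rootdev:/pci@1f,4000/scsi@3/sd@0,0:a
* End MDD root info (do not edit)
```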

Modify /etc/vfstab: change the / and swap entries back to the bare disk slices, and comment out the entries for all other disks.
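For illustration only, with hypothetical names (root on mirror d0, swap on d1, another file system on d30, and c0t0d0 as the disk you boot from), the vfstab change looks roughly like this. Take the real slice names from your metastat output: they are the components of the submirrors on the boot disk.

```
#device            device              mount    FS    fsck  mount    mount
#to mount          to fsck             point    type  pass  at boot  options
#
# Before: / and swap on metadevices
/dev/md/dsk/d1     -                   -        swap  -     no       -
/dev/md/dsk/d0     /dev/md/rdsk/d0     /        ufs   1     no       -
/dev/md/dsk/d30    /dev/md/rdsk/d30    /export  ufs   2     yes      -

# After: / and swap on the bare slices, other metadevices commented out
/dev/dsk/c0t0d0s1  -                   -        swap  -     no       -
/dev/dsk/c0t0d0s0  /dev/rdsk/c0t0d0s0  /        ufs   1     no       -
#/dev/md/dsk/d30   /dev/md/rdsk/d30    /export  ufs   2     yes      -
```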

Attempt a normal boot; the system should now come up normally.

If you have got this far, the system is running on bare disk slices.

Save a copy of the metastat and metadb output (on paper).
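Alongside the paper copy, a small sketch like the following keeps the same output on disk. The helper name save_md_config and the directory passed to it are hypothetical. metastat -p prints the configuration in md.tab format, which is handy when recreating the metadevices later; copy the files off this machine as well, since the recovery works on the disks they live on.

```shell
#!/bin/sh
# Sketch: save the SVM configuration to files before the metadb is destroyed.
# save_md_config and the directory you pass it are hypothetical.

save_md_config() {
    dir=$1
    mkdir -p "$dir" || return 1
    # md.tab-style listing: one line per metadevice / hot spare pool
    metastat -p > "$dir/metastat-p.out"
    # Full human-readable status, including the slices behind each device
    metastat    > "$dir/metastat.out"
    # Replica locations and status flags
    metadb -i   > "$dir/metadb.out"
}

# Usage on the live system, for example:
#   save_md_config /var/tmp/svm-before-recovery
#   lp /var/tmp/svm-before-recovery/*.out
```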

Now nuke the metadb:

metadb -d -f <all replica slices>, then delete all entries in /etc/lvm/md.cf (take a copy first). See the other tips for backing up the metadb to a file.
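To avoid typing every replica location by hand, the slice list can be pulled out of the metadb output. This sketch assumes the usual metadb layout (a header line, then one line per replica ending in its /dev/dsk path); replica_slices is a hypothetical helper name. A slice carrying several replicas (created with -c) is listed once per replica, so duplicates are dropped.

```shell
#!/bin/sh
# Sketch: print each slice that holds a state database replica, once.
# Reads `metadb` output on stdin; replica_slices is a hypothetical helper.

replica_slices() {
    # The last field of each line is the replica's device ("count" on the
    # header line). Keep only /dev/dsk paths, strip the prefix to get the
    # c0t0d0s7 form used on the metadb command line, list each slice once.
    awk '{ print $NF }' | grep '^/dev/dsk/' | sed 's;^/dev/dsk/;;' | sort -u
}

# Usage (destructive; save the metastat/metadb output first):
#   metadb -d -f `metadb | replica_slices`
```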

Reboot again to free the kernel from any knowledge of SVM.

Now recreate the state database replicas with metadb -a -f -c 3 <devices>

and recreate the metadevices you had before, using your saved list.


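If you had soft partitions, now run metarecover -v d3 -p -d (answer yes), where d3 is the metadevice or slice the soft partitions were built on, and your data might be back on the volumes.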

fsck the volumes once they are recovered.

At this point, take a good backup if you have the time.


The root and other system disks still need to be encapsulated again to bring the system back up to production-ready status.


Booting troubled systems

boot -savb

should get you out of most trouble.


Use mount -o remount <dev> / to make the root file system writable.



Install/recover boot block

 /usr/sbin/installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/xxx