
Why is the Sun Solaris System Corefile helpful ?


The System Corefile is helpful during problem analysis on a SUN Solaris Computer.

When is a System corefile produced ?

A system corefile is produced when the panic() routine calls vfs_syncall() to flush in-memory file system data to the appropriate disks and dumpsys() to write the current kernel image to the dump device. When savecore runs during bootup, it scans the top end of the primary swap partition and creates a unix.0 file and a corresponding vmcore.0 file. The numeric suffix is incremented automatically as additional corefiles are captured, and the .bounds file keeps track of the current increment.

panic() is called when a situation occurs that would compromise the data integrity of the running system. The philosophy is that continuing would be worse than stopping and rebooting.
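The numbering scheme can be illustrated with a short shell sketch. It assumes that the bounds file contains a single integer, the suffix savecore will use next, and that a missing bounds file means the first dump gets suffix 0. The function name next_corefile_names is hypothetical and is not part of Solaris.

```shell
#!/bin/sh
# Hypothetical helper: print the file names savecore would be expected to
# create next, given the path to the bounds file.
# Assumption: the bounds file holds one integer, the next suffix to use.
next_corefile_names() {
    bounds_file=$1
    if [ -r "$bounds_file" ]; then
        # Read only the first line of the bounds file.
        n=`sed 1q "$bounds_file"`
    else
        # Assumption: with no bounds file, numbering starts at 0.
        n=0
    fi
    echo "unix.$n vmcore.$n"
}
```

Called with the path to the bounds file in the savecore directory, it prints the pair of names to look for after the next panic.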

What to do when a System hangs ?

These are the steps to follow when a Sun Solaris system hangs:

  • The first goal is to get the system to the OK> prompt. Press the 'Stop' and 'A' keys together, send a 'break' signal from a TTY, or unplug and replug the console keyboard.

[Stop]-[A]
OK>

  • A system corefile can then be manually produced by typing:

OK> sync

Whether a corefile is captured successfully depends on the patch level and on the type of device used for primary swap space. Sun Support has a utility named core_check.sh that reports whether the system is at the proper patch level and is configured properly to capture a corefile. It is available from Sun Support upon request.

What is captured in a System Corefile ?

All kernel memory pages are saved, active pages in the kernel segment map are saved, and running user process stacks are saved. By default, the kernel memory pages of active processes are saved. Setting the appropriate switches with dumpadm -u -c all forces all memory pages to be captured. However, most of this data is not useful, and capturing it creates extremely large corefiles. Our advice is not to enable this feature unless directed by Sun Support. See the dumpadm manpage for more details.

Why is the System Corefile valuable to analyze a Problem ?

A system corefile is a snapshot of kernel memory at the moment of the panic. This data shows which threads are running on each CPU, the process table, the current threads on the dispatch queue, and the kernel memory structures. Through corefile analysis, Sun is able to reconstruct the events that led to the panic.

Based on this information, Sun can usually determine whether the problem was caused by hardware or software, which part caused the panic, and what code the CPU was running when the panic condition occurred, and can then search for an existing bug and patch fix. The fact that a CPU reported the panic does not mean that CPU was the cause.

How to get the Panic Strings ?

It's important to extract the panic strings and provide them when opening a case with Sun. If this is a known problem, it could save hours of effort to find a solution.

# strings vmcore.* | head

How is Savecore enabled ?

In Solaris 2.5 through 2.6, savecore is normally not enabled. The system administrator must enable it by editing the /etc/init.d/sysetup file. If the system panics, the /var/adm/messages file will show 'dumping pages....', which indicates that the system has captured a corefile. If savecore has not been enabled, it can be run manually shortly after reboot: cd into a directory with sufficient space to hold the system corefile and type the command savecore -v . (including the trailing dot). The dot tells savecore to write the corefile 'here', in the current directory, and -v prints a verbose status message reporting whether the corefile could be processed.
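The check of the messages file can be scripted before deciding to run savecore by hand. The following is a minimal sketch: the function name dump_was_logged is hypothetical, and matching on the word 'dumping' is an assumption, since the exact wording of the message may differ between releases.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given log file contains the
# 'dumping' message that follows a panic.
# Assumption: the message contains the literal word "dumping".
dump_was_logged() {
    log_file=${1:-/var/adm/messages}
    # Output is redirected instead of using grep -q, which not every
    # grep supports.
    grep 'dumping' "$log_file" >/dev/null 2>&1
}
```

If dump_was_logged succeeds and savecore was not enabled, the manual procedure described above (cd into a directory with enough space, then savecore -v .) is the next step.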

In Solaris 7 and above, savecore is enabled by default and is controlled by the dumpadm command. You can run the dumpadm command without arguments to get the current configuration. Starting with Solaris 7, the system corefile is automatically compressed to conserve room in the primary swap partition.

# dumpadm

Dump content: kernel pages
Dump device: /dev/dsk/c0t0d0s3 (swap)
Savecore directory: /var/crash/diamond
Savecore enabled: yes
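
A script that needs one of these settings can pick it out of the dumpadm output. The sketch below assumes that each setting is printed as "Label: value" on its own line, as in the sample output above. The function name dumpadm_field is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read dumpadm-style output on standard input and
# print the value of one setting, for example "Savecore directory".
# Assumption: settings appear as "Label: value", one per line.
dumpadm_field() {
    label=$1
    # Strip the label and the following colon, print only matching lines.
    # '|' is the sed delimiter because the values contain '/'.
    sed -n "s|^ *$label: *||p"
}
```

For example, dumpadm | dumpadm_field "Savecore directory" would print /var/crash/diamond for the configuration shown above.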

If you are using something other than a raw primary swap partition, there is a risk that a savecore may not be produced. For instance, if the 'vxfs' driver caused the panic, the savecore may not work if swap is under 'vxfs' control. The fewer layers of drivers involved, the better the chance of capturing a useful corefile.

It's critical that the directory where /etc/init.d/sysetup puts the corefile:

  • Has enough space available

  • Is mounted at the time savecore is going to run

Savecore normally runs as part of /etc/rc2.d/SXXsysetup. /var is normally mounted right away, before run level 2, so this should be fine as long as there is enough room on /var for the corefile.
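
Both conditions can be checked ahead of time with a small shell function. This is only a sketch: the name has_room_for_corefile is hypothetical, the required size must be supplied by the administrator, and the parsing assumes that the last line of df -k output ends with the columns "avail capacity mountpoint" and that the mount point contains no spaces.

```shell
#!/bin/sh
# Hypothetical helper: succeed if directory $1 exists and its file system
# has at least $2 kilobytes available.
# Assumption: the last line of `df -k` output ends with
# "avail capacity mountpoint", so the available space is the third
# field from the end.
has_room_for_corefile() {
    dir=$1
    needed_kb=$2
    # The directory must exist (and therefore be reachable) right now.
    [ -d "$dir" ] || return 1
    avail_kb=`df -k "$dir" | sed -n '$p' | awk '{ print $(NF-2) }'`
    [ "$avail_kb" -ge "$needed_kb" ]
}
```

A call such as has_room_for_corefile /var/crash/`uname -n` 2000000 would test for roughly 2 GB of free space. The figure is purely illustrative and should be replaced with a size appropriate to the system's memory.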

Configuration Files and Setup

/etc/init.d/sysetup        Check if savecore is enabled
/etc/dumpadm.conf          Configuration File for dumpadm
/var/crash/`uname -n`      Location of Crash Dump Directory
/usr/bin/savecore          Save a crash dump