

Encoding the Structure of the Program

The elements of the program structure that stabs encode include the name of the main function, the names of the source and include files, the line numbers, procedure names and types, and the beginnings and ends of blocks of code.

Main Program

Most languages allow the main program to have any name. The N_MAIN stab type tells the debugger the name that is used in this program. Only the string field is significant; it is the name of a function which is the main program. Most C compilers do not use this stab (they expect the debugger to assume that the name is main), but some C compilers emit an N_MAIN stab for the main function. I'm not sure how XCOFF handles this.
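As a rough illustration of the fallback described above, a debugger might look for an N_MAIN stab and assume `main' otherwise. This is only a sketch: the (type_name, string) tuple layout and the function name main_function_name are hypothetical, not part of any real debugger.

```python
def main_function_name(stabs, default="main"):
    """Return the string of the first N_MAIN stab, or `default` if there is none.

    `stabs` is an iterable of (type_name, string) tuples; this layout is
    hypothetical and chosen only for this illustration.
    """
    for type_name, string in stabs:
        if type_name == "N_MAIN":
            # Only the string field of N_MAIN is significant.
            return string
    return default
```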

Paths and Names of the Source Files

Before any other stabs occur, there must be a stab specifying the source file. This information is contained in a symbol of stab type N_SO; the string field contains the name of the file. The value of the symbol is the start address of the portion of the text section corresponding to that file.

Some compilers use the desc field to indicate the language of the source file. Sun's compilers started this usage, and the first constants are derived from their documentation. Languages added by gcc/gdb start at 0x32 to avoid conflict with languages Sun may add in the future. A desc field with a value of 0 indicates that no language has been specified via this mechanism.

N_SO_AS (0x1)           Assembly language
N_SO_C (0x2)            K&R traditional C
N_SO_ANSI_C (0x3)       ANSI C
N_SO_CC (0x4)           C++
N_SO_FORTRAN (0x5)      Fortran
N_SO_PASCAL (0x6)       Pascal
N_SO_FORTRAN90 (0x7)    Fortran90
N_SO_OBJC (0x32)        Objective-C
N_SO_OBJCPLUS (0x33)    Objective-C++
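
A reader of stabs might decode the desc field of an N_SO symbol with a simple lookup. The following sketch uses only the constants listed above; the names N_SO_LANGUAGES and source_language are invented for this example.

```python
# Language codes for the desc field of an N_SO stab, as listed above.
N_SO_LANGUAGES = {
    0x1: "Assembly language",
    0x2: "K&R traditional C",
    0x3: "ANSI C",
    0x4: "C++",
    0x5: "Fortran",
    0x6: "Pascal",
    0x7: "Fortran90",
    0x32: "Objective-C",
    0x33: "Objective-C++",
}


def source_language(desc):
    """Describe the language encoded in an N_SO desc field.

    Returns None when desc is 0 (no language specified), the listed name
    for a known code, and an "unknown (0x..)" string for anything else.
    """
    if desc == 0:
        return None
    return N_SO_LANGUAGES.get(desc, "unknown (0x%x)" % desc)
```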

Some compilers (for example, GCC2 and SunOS4 `/bin/cc') also include the directory in which the source was compiled, in a second N_SO symbol preceding the one containing the file name. This symbol can be distinguished by the fact that it ends in a slash. Code from the cfront C++ compiler can have additional N_SO symbols for nonexistent source files after the N_SO for the real source file; these are believed to contain no useful information.

For example:

.stabs "/cygint/s1/users/jcm/play/",100,0,0,Ltext0     # 100 is N_SO
.stabs "hello.c",100,0,0,Ltext0
        .text
Ltext0:

Instead of N_SO symbols, XCOFF uses a .file assembler directive which assembles to a C_FILE symbol; explaining this in detail is outside the scope of this document.

If it is useful to indicate the end of a source file, this is done with an N_SO symbol with an empty string for the name. The value is the address of the end of the text section for the file. For some systems, there is no indication of the end of a source file, and you have to assume that it ended when you see an N_SO for a different source file, or a symbol ending in .o (which at least some linkers insert to mark the start of a new .o file).
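
The N_SO conventions described in this section (an optional directory ending in a slash, then the file name, then possibly an empty-string end marker) could be handled roughly as follows. This is a sketch under assumptions: the (type_name, string, value) tuple layout is hypothetical, and treating a file name that starts with a slash as already complete is a choice made for this example, not something the stabs format specifies.

```python
def source_files(stabs):
    """Yield (path, start_address) for each source file named by N_SO stabs.

    `stabs` is an iterable of (type_name, string, value) tuples; this
    layout is hypothetical and chosen only for this illustration.
    """
    directory = ""
    for type_name, string, value in stabs:
        if type_name != "N_SO":
            continue
        if string == "":
            # Empty name: end-of-file marker, not a new source file.
            directory = ""
        elif string.endswith("/"):
            # Compilation directory; the file name follows in the next N_SO.
            directory = string
        else:
            # Assumption: an absolute file name needs no directory prefix.
            path = string if string.startswith("/") else directory + string
            yield path, value
            directory = ""
```

Applied to the two N_SO stabs of the example above, this would yield the single path /cygint/s1/users/jcm/play/hello.c together with the address of Ltext0.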

Names of Include Files

There are several schemes for dealing with include files: the traditional N_SOL approach, Sun's N_BINCL approach, and the XCOFF C_BINCL approach (which despite the similar name has little in common with N_BINCL).

An N_SOL symbol specifies which include file subsequent symbols refer to. The string field is the name of the file and the value is the text address corresponding to the end of the previous include file and the start of this one. To specify the main source file again, use an N_SOL symbol with the name of the main source file.
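
In the N_SOL scheme a reader only needs to remember the most recent file name. A minimal sketch, assuming a hypothetical (type_name, string, value) tuple for each stab and an invented helper name:

```python
def annotate_with_file(stabs, main_file):
    """Yield (file_name, stab) for every stab other than N_SOL.

    `stabs` is an iterable of (type_name, string, value) tuples
    (hypothetical layout); `main_file` is the name from the N_SO stab.
    """
    current = main_file
    for stab in stabs:
        type_name, string, _value = stab
        if type_name == "N_SOL":
            # Subsequent symbols refer to this file, which may be the
            # main source file again.
            current = string
        else:
            yield current, stab
```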

The N_BINCL approach works as follows. An N_BINCL symbol specifies the start of an include file. In an object file, only the string is significant; the linker puts data into some of the other fields. The end of the include file is marked by an N_EINCL symbol (which has no string field). In an object file, there is no significant data in the N_EINCL symbol. N_BINCL and N_EINCL can be nested.

If the linker detects that two source files have identical stabs between an N_BINCL and N_EINCL pair (as will generally be the case for a header file), then it only puts out the stabs once. Each additional occurrence is replaced by an N_EXCL symbol. I believe the GNU linker and the Sun (both SunOS4 and Solaris) linkers are the only ones which support this feature.

A linker which supports this feature will set the value of an N_BINCL symbol to the total of all the characters in the stabs strings included in the header file, omitting any file numbers. The value of an N_EXCL symbol is the same as the value of the N_BINCL symbol it replaces. This information can be used to match up N_EXCL and N_BINCL symbols which have the same filename. The N_EINCL value, and the values of the other and description fields for all three, appear to always be zero.
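
The matching of N_EXCL to N_BINCL symbols by file name and value might look like the following. This is a sketch only: the tuple layout is hypothetical, and picking the first N_BINCL with a given name and value is an assumption made for the example.

```python
def resolve_excl(stabs):
    """Map the index of each N_EXCL stab to the index of a matching N_BINCL.

    `stabs` is a list of (type_name, string, value) tuples (hypothetical
    layout). An N_EXCL is matched to the first N_BINCL that has the same
    file name and the same value; unmatched N_EXCL stabs map to None.
    """
    first_bincl = {}
    for index, (type_name, string, value) in enumerate(stabs):
        if type_name == "N_BINCL":
            first_bincl.setdefault((string, value), index)
    return {
        index: first_bincl.get((string, value))
        for index, (type_name, string, value) in enumerate(stabs)
        if type_name == "N_EXCL"
    }
```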

For the start of an include file in XCOFF, use the `.bi' assembler directive, which generates a C_BINCL symbol. A `.ei' directive, which generates a C_EINCL symbol, denotes the end of the include file. Both directives are followed by the name of the source file in quotes, which becomes the string for the symbol. The value of each symbol, produced automatically by the assembler and linker, is the offset into the executable of the beginning (inclusive, as you'd expect) or end (inclusive, as you would not expect) of the portion of the COFF line table that corresponds to this include file. C_BINCL and C_EINCL do not nest.

Line Numbers

An N_SLINE symbol represents the start of a source line. The desc field contains the line number and the value contains the code address for the start of that source line. On most machines the address is absolute; for stabs in sections (see section Using Stabs in Their Own Sections), it is relative to the function in which the N_SLINE symbol occurs.

GNU documents N_DSLINE and N_BSLINE symbols for line numbers in the data or bss segments, respectively. They are identical to N_SLINE but are relocated differently by the linker. They were intended to be used to describe the source location of a variable declaration, but I believe that GCC2 actually puts the line number in the desc field of the stab for the variable itself. GDB has been ignoring these symbols (unless they contain a string field) since at least GDB 3.5.

For single source lines that generate discontiguous code, such as flow of control statements, there may be more than one line number entry for the same source line. In this case there is a line number entry at the start of each code range, each with the same line number.
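
A line table can be collected from N_SLINE stabs along the following lines. The sketch assumes a hypothetical (type_name, string, desc, value) tuple per stab; the relative_to_function flag stands in for the stabs-in-sections case, where values are offsets from the enclosing function.

```python
def line_table(stabs, relative_to_function=False):
    """Return a sorted list of (address, line) pairs from N_SLINE stabs.

    `stabs` is an iterable of (type_name, string, desc, value) tuples
    (hypothetical layout). With relative_to_function=True, each N_SLINE
    value is treated as an offset from the most recent named N_FUN stab.
    """
    function_start = 0
    table = []
    for type_name, string, desc, value in stabs:
        if type_name == "N_FUN" and string:
            function_start = value
        elif type_name == "N_SLINE":
            base = function_start if relative_to_function else 0
            # desc holds the line number; the same line may appear
            # several times when its code is discontiguous.
            table.append((base + value, desc))
    return sorted(table)
```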

XCOFF does not use stabs for line numbers. Instead, it uses COFF line numbers (which are outside the scope of this document). Standard COFF line numbers cannot deal with include files, but in XCOFF this is fixed with the C_BINCL method of marking include files (see section Names of Include Files).

Procedures

All of the following stabs normally use the N_FUN symbol type. However, Sun's acc compiler on SunOS4 uses N_GSYM and N_STSYM, which means that the value of the stab for the function is useless and the debugger must get the address of the function from the non-stab symbols instead. On systems where non-stab symbols have leading underscores, the stabs will lack underscores and the debugger needs to know about the leading underscore to match up the stab and the non-stab symbol. BSD Fortran is said to use N_FNAME with the same restriction; the value of the symbol is not useful (I'm not sure it really does use this, because GDB doesn't handle this and no one has complained).

A function is represented by an `F' symbol descriptor for a global (extern) function, and `f' for a static (local) function. For a.out, the value of the symbol is the address of the start of the function; it is already relocated. For stabs in ELF, the SunPRO compiler version 2.0.1 and GCC put out an address which gets relocated by the linker. In a future release SunPRO is planning to put out zero, in which case the address can be found from the ELF (non-stab) symbol. That approach has three drawbacks: looking things up in the ELF symbols would probably be slow, I'm not sure how to find which symbol of that name is the right one, and it doesn't provide any way to deal with nested functions. It would probably be better to make the value of the stab an address relative to the start of the file, or just absolute. See section Having the Linker Relocate Stabs in ELF for more information on linker relocation of stabs in ELF files. For XCOFF, the stab uses the C_FUN storage class and the value of the stab is meaningless; the address of the function can be found from the csect symbol (XTY_LD/XMC_PR).

The type information of the stab represents the return type of the function; thus `foo:f5' means that foo is a function returning type 5. There is no need to try to get the line number of the start of the function from the stab for the function; it is in the next N_SLINE symbol.

Some compilers (such as Sun's Solaris compiler) support an extension for specifying the types of the arguments. I suspect this extension is not used for old (non-prototyped) function definitions in C. If the extension is in use, the type information of the stab for the function is followed by type information for each argument, with each argument preceded by `;'. An argument type of 0 means that additional arguments are being passed, whose types and number may vary (`...' in ANSI C). GDB has tolerated this extension (parsed the syntax, if not necessarily used the information) since at least version 4.8; I don't know whether all versions of dbx tolerate it. The argument types given here are not redundant with the symbols for the formal parameters (see section Parameters); they are the types of the arguments as they are passed, before any conversions might take place. For example, if a C function which is declared without a prototype takes a float argument, the value is passed as a double but then converted to a float. Debuggers need to use the types given in the symbols for the formal parameters when printing values, but when calling the function they need to use the argument types given in the stab for the function.

If the return type and types of arguments of a function which is defined in another source file are specified (i.e., a function prototype in ANSI C), traditionally compilers emit no stab; the only way for the debugger to find the information is if the source file where the function is defined was also compiled with debugging symbols. As an extension the Solaris compiler uses symbol descriptor `P' followed by the return type of the function, followed by the arguments, each preceded by `;', as in a stab with symbol descriptor `f' or `F'. This use of symbol descriptor `P' can be distinguished from its use for register parameters (see section Passing Parameters in Registers) by the fact that it has symbol type N_FUN.

The AIX documentation also defines symbol descriptor `J' as an internal function. I assume this means a function nested within another function. It also says symbol descriptor `m' is a module in Modula-2 or extended Pascal.

Procedures (functions which do not return values) are represented as functions returning the void type in C. I don't see why this couldn't be used for all languages (inventing a void type for this purpose if necessary), but the AIX documentation defines `I', `P', and `Q' for internal, global, and static procedures, respectively. These symbol descriptors are unusual in that they are not followed by type information.

The following example shows a stab for a function main which returns type number 1. The _main specified for the value is a reference to an assembler label which is used to fill in the start address of the function.

.stabs "main:F1",36,0,0,_main      # 36 is N_FUN
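
The string of such a stab can be taken apart with a short pattern. This is a minimal sketch that assumes the type information is a plain decimal type number, as in the examples here, and that the name contains no colon; real stabs can carry more elaborate type information. The function name parse_function_stab is invented for this example.

```python
import re

# name ':' descriptor ('F' or 'f') followed by a plain numeric type number.
_FUNCTION_STAB = re.compile(r"^([^:]+):([Ff])(\d+)")


def parse_function_stab(string):
    """Return (name, is_global, return_type_number), or None if no match.

    is_global is True for descriptor 'F' and False for descriptor 'f'.
    """
    match = _FUNCTION_STAB.match(string)
    if match is None:
        return None
    name, descriptor, type_number = match.groups()
    return name, descriptor == "F", int(type_number)
```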

The stab representing a procedure is located immediately following the code of the procedure. This stab is in turn directly followed by a group of other stabs describing elements of the procedure. These other stabs describe the procedure's parameters, its block local variables, and its block structure.

If functions can appear in different sections, then the debugger may not be able to find the end of a function. Recent versions of GCC will mark the end of a function with an N_FUN symbol with an empty string for the name. The value is the address of the end of the current function. Without such a symbol, there is no indication of the address of the end of a function, and you must assume that it ended at the starting address of the next function or at the end of the text section for the program.
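
The two ways of finding the end of a function described above might be combined as follows. This is a sketch under assumptions: stabs are hypothetical (type_name, string, value) tuples holding absolute addresses in address order, and text_end is supplied by the caller.

```python
def function_ranges(stabs, text_end):
    """Return {name: (start, end)} computed from N_FUN stabs.

    `stabs` is an iterable of (type_name, string, value) tuples
    (hypothetical layout) with absolute addresses; `text_end` is the
    address of the end of the text section.
    """
    ranges = {}
    current = None  # (name, start) of the function still awaiting an end
    for type_name, string, value in stabs:
        if type_name != "N_FUN":
            continue
        if string == "":
            # Explicit end marker: value is the end of the current function.
            if current is not None:
                ranges[current[0]] = (current[1], value)
                current = None
        else:
            # No end marker seen: assume the previous function ended here.
            if current is not None:
                ranges[current[0]] = (current[1], value)
            current = (string.partition(":")[0], value)
    if current is not None:
        # Last function without a marker: assume it runs to the end of text.
        ranges[current[0]] = (current[1], text_end)
    return ranges
```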

Nested Procedures

For any of the symbol descriptors representing procedures, the symbol descriptor and the type information may be followed by a scope specifier. This consists of a comma, the name of the procedure, another comma, and the name of the enclosing procedure. The first name is local to the scope specified, and seems to be redundant with the name of the symbol (before the `:'). This feature is used by GCC, and presumably Pascal, Modula-2, etc., compilers, for nested functions.

If procedures are nested more than one level deep, only the immediately containing scope is specified. For example, this code:

int
foo (int x)
{
  int bar (int y)
    {
      int baz (int z)
        {
          return x + y + z;
        }
      return baz (x + 2 * y);
    }
  return x + bar (3 * x);
}

produces the stabs:

.stabs "baz:f1,baz,bar",36,0,0,_baz.15         # 36 is N_FUN
.stabs "bar:f1,bar,foo",36,0,0,_bar.12
.stabs "foo:F1",36,0,0,_foo
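
Because each stab names only its immediately enclosing procedure, the full nesting has to be rebuilt by following those links. A minimal sketch, assuming the type information contains no commas (as in the stabs above); the helper names parse_scope and nesting_path are invented for this example.

```python
def parse_scope(string):
    """Split a procedure stab string into (name, enclosing_name_or_None)."""
    name, _, rest = string.partition(":")
    fields = rest.split(",")
    # fields[0] is the descriptor and type; a scope specifier adds two
    # more fields: the local name and the enclosing procedure.
    enclosing = fields[2] if len(fields) >= 3 else None
    return name, enclosing


def nesting_path(name, parents):
    """Follow enclosing-procedure links outward from `name`.

    `parents` maps each procedure name to its enclosing procedure or None.
    """
    path = [name]
    while parents.get(path[-1]) is not None:
        path.append(parents[path[-1]])
    return path
```

For the three stabs above, building parents with dict(parse_scope(s) for s in strings) and calling nesting_path("baz", parents) walks from baz through bar to foo.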

Block Structure

The program's block structure is represented by the N_LBRAC (left brace) and the N_RBRAC (right brace) stab types. The variables defined inside a block precede the N_LBRAC symbol for most compilers, including GCC. Other compilers, such as the Convex, Acorn RISC machine, and Sun acc compilers, put the variables after the N_LBRAC symbol. The values of the N_LBRAC and N_RBRAC symbols are the start and end addresses of the code of the block, respectively. For most machines, they are relative to the starting address of this source file. For the Gould NP1, they are absolute. For stabs in sections (see section Using Stabs in Their Own Sections), they are relative to the function in which they occur.

The N_LBRAC and N_RBRAC stabs that describe the block scope of a procedure are located after the N_FUN stab that represents the procedure itself.

Sun documents the desc field of N_LBRAC and N_RBRAC symbols as containing the nesting level of the block. However, dbx seems to not care, and GCC always sets desc to zero.
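
Since the desc field cannot be relied on for the nesting level, a reader can recover it by pairing the braces with a stack. The following is a sketch only; the (type_name, value) tuple layout and the name pair_blocks are hypothetical.

```python
def pair_blocks(stabs):
    """Pair N_LBRAC/N_RBRAC stabs into (start, end, depth) tuples.

    `stabs` is an iterable of (type_name, value) tuples (hypothetical
    layout). Depth is computed from the nesting itself, not from the
    desc field; the outermost block has depth 0. Blocks are returned in
    the order in which they are closed.
    """
    open_blocks = []
    blocks = []
    for type_name, value in stabs:
        if type_name == "N_LBRAC":
            open_blocks.append(value)
        elif type_name == "N_RBRAC":
            if not open_blocks:
                raise ValueError("N_RBRAC without a matching N_LBRAC")
            start = open_blocks.pop()
            blocks.append((start, value, len(open_blocks)))
    if open_blocks:
        raise ValueError("N_LBRAC without a matching N_RBRAC")
    return blocks
```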

For XCOFF, block scope is indicated with C_BLOCK symbols. If the name of the symbol is `.bb', then it is the beginning of the block; if the name of the symbol is `.eb', it is the end of the block.

Coalesced Symbol Blocks

On Mac OS X, a coalesced symbol is a true definition of a symbol that may appear one or more times in the compilation units generated by the compiler. The semantics of coalesced symbols are similar to those of sections with SEC_LINK_DUPLICATES_DISCARD (COMDAT) set, with the difference that coalesced symbols are processed on a per-symbol basis, rather than on a per-section basis. Currently, coalesced symbols are implemented only on Mac OS X.

The static link editor allows multiple definitions of a coalesced symbol without any warnings or errors. The static link editor outputs only one instance of each coalesced symbol, using the first instance it encounters in the object files being linked. The static link editor always outputs an instance of a coalesced symbol if it appears in the object files being linked, even if it also appears in the dynamic libraries being referenced. The dynamic link editor then relocates such that only one instance of each coalesced symbol is used throughout the program.

Coalesced symbols are placed by the compiler tools into sections flagged with the type bit S_COALESCED. The static and dynamic linkers look at the section type to determine that a given symbol is a coalesced symbol and therefore to allow multiple definitions.

The static link editor divides up a coalesced section on the boundaries of the symbols in that section, associating the bytes of the section after each symbol with the preceding symbol. An object file is considered malformed if a coalesced section does not have a symbol at the first address of the section.

To allow the linker to properly manage the debug information for coalesced symbols, the stabs entries for a given coalesced symbol must be preceded by N_BNSYM and terminated with N_ENSYM. The value of the N_BNSYM stab should be the start address of the coalesced portion (the same value as the symbol that begins the portion), and the value of the N_ENSYM stab should be the end address of the coalesced portion (the same value as the next coalesced symbol, or the end of the whole coalesced section).
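
A tool that wants to keep or discard the stabs of a coalesced symbol as a unit could group them as follows. This is a sketch that assumes N_BNSYM/N_ENSYM pairs do not nest (the text above does not say either way) and uses a hypothetical (type_name, string, value) tuple for each stab.

```python
def coalesced_groups(stabs):
    """Return a list of (start, end, inner_stabs), one per N_BNSYM/N_ENSYM pair.

    `stabs` is an iterable of (type_name, string, value) tuples
    (hypothetical layout). Stabs outside any pair are ignored.
    """
    groups = []
    current = None  # (start_address, stabs collected so far)
    for stab in stabs:
        type_name, _string, value = stab
        if type_name == "N_BNSYM":
            current = (value, [])
        elif type_name == "N_ENSYM" and current is not None:
            groups.append((current[0], value, current[1]))
            current = None
        elif current is not None:
            current[1].append(stab)
    return groups
```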

Alternate Entry Points

Some languages, like Fortran, have the ability to enter procedures at some place other than the beginning. One can declare an alternate entry point. The N_ENTRY stab is for this; however, the Sun FORTRAN compiler doesn't use it. According to AIX documentation, only the name of a C_ENTRY stab is significant; the address of the alternate entry point comes from the corresponding external symbol. A previous revision of this document said that the value of an N_ENTRY stab was the address of the alternate entry point, but I don't know the source for that information.

