The elements of the program structure that stabs encode include the name of the main function, the names of the source and include files, the line numbers, procedure names and types, and the beginnings and ends of blocks of code.
Most languages allow the main program to have any name. The N_MAIN stab type tells the debugger the name that is used in this program. Only the string field is significant; it is the name of a function which is the main program. Most C compilers do not use this stab (they expect the debugger to assume that the name is main), but some C compilers emit an N_MAIN stab for the main function. I'm not sure how XCOFF handles this.
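The rule above can be sketched in C as follows. This is a minimal illustration, assuming a hypothetical `struct stab` that holds only the type and string fields, and a hypothetical function `main_program_name`; N_MAIN is the stab type code 0x2a.

```c
#include <stddef.h>

#define N_MAIN 0x2a /* stab type naming the main program */

/* Hypothetical minimal view of one stab: its type and string field. */
struct stab {
    unsigned char type;
    const char *str;
};

/* Returns the name the debugger should treat as the main program: the
   string of the first N_MAIN stab if there is one, otherwise "main",
   which is what most C compilers expect the debugger to assume. */
static const char *main_program_name(const struct stab *stabs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (stabs[i].type == N_MAIN)
            return stabs[i].str;
    return "main";
}
```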
Before any other stabs occur, there must be a stab specifying the source file. This information is contained in a symbol of stab type N_SO; the string field contains the name of the file. The value of the symbol is the start address of the portion of the text section corresponding to that file.
Some compilers use the desc field to indicate the language of the source file. Sun's compilers started this usage, and the first constants are derived from their documentation. Languages added by gcc/gdb start at 0x32 to avoid conflict with languages Sun may add in the future. A desc field with a value 0 indicates that no language has been specified via this mechanism.
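N_SO_AS (0x1): Assembly language
N_SO_C (0x2): K&R traditional C
N_SO_ANSI_C (0x3): ANSI C
N_SO_CC (0x4): C++
N_SO_FORTRAN (0x5): Fortran
N_SO_PASCAL (0x6): Pascal
N_SO_FORTRAN90 (0x7): Fortran90
N_SO_OBJC (0x32): Objective-C
N_SO_OBJCPLUS (0x33): Objective-C++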
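A debugger reading the desc field of an N_SO could decode it roughly like this. The sketch defines the constants locally with the values listed above rather than relying on any system header; the function name `so_language_name` and the returned label strings are choices made for this illustration.

```c
#include <stddef.h>

/* Values of the N_SO desc field, as listed above. */
enum so_language {
    N_SO_AS = 0x1,
    N_SO_C = 0x2,
    N_SO_ANSI_C = 0x3,
    N_SO_CC = 0x4,
    N_SO_FORTRAN = 0x5,
    N_SO_PASCAL = 0x6,
    N_SO_FORTRAN90 = 0x7,
    N_SO_OBJC = 0x32,
    N_SO_OBJCPLUS = 0x33
};

/* Maps an N_SO desc value to a printable language label.  Returns NULL
   for 0 (no language specified) and for values this table does not know. */
static const char *so_language_name(unsigned short desc)
{
    switch (desc) {
    case N_SO_AS:        return "assembly";
    case N_SO_C:         return "K&R C";
    case N_SO_ANSI_C:    return "ANSI C";
    case N_SO_CC:        return "C++";
    case N_SO_FORTRAN:   return "Fortran";
    case N_SO_PASCAL:    return "Pascal";
    case N_SO_FORTRAN90: return "Fortran 90";
    case N_SO_OBJC:      return "Objective-C";
    case N_SO_OBJCPLUS:  return "Objective-C++";
    default:             return NULL;
    }
}
```

Returning NULL for unknown values lets the caller fall back to guessing the language from the file name, which is what it must do anyway when desc is 0.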
Some compilers (for example, GCC2 and SunOS4 `/bin/cc') also include the directory in which the source was compiled, in a second N_SO symbol preceding the one containing the file name. This symbol can be distinguished by the fact that its string ends in a slash. Code from the cfront C++ compiler can have additional N_SO symbols for nonexistent source files after the N_SO for the real source file; these are believed to contain no useful information.
For example:
.stabs "/cygint/s1/users/jcm/play/",100,0,0,Ltext0     # 100 is N_SO
.stabs "hello.c",100,0,0,Ltext0
        .text
Ltext0:
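Using the two N_SO strings from this example, a reader could recover the full source path as sketched below. The helpers `so_is_directory` and `so_full_path` are hypothetical; the sketch assumes that an absolute file name should not be prefixed with the directory.

```c
#include <stdio.h>
#include <string.h>

/* Nonzero if an N_SO string names a compilation directory rather than a
   source file; such strings end in a slash. */
static int so_is_directory(const char *s)
{
    size_t len = strlen(s);
    return len > 0 && s[len - 1] == '/';
}

/* Combines the optional directory N_SO (dir may be NULL) with the file
   name N_SO that follows it.  An absolute file name is kept unchanged.
   Returns the snprintf result, so a value >= outlen means truncation. */
static int so_full_path(const char *dir, const char *file,
                        char *out, size_t outlen)
{
    if (dir == NULL || file[0] == '/')
        return snprintf(out, outlen, "%s", file);
    return snprintf(out, outlen, "%s%s", dir, file);
}
```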
Instead of N_SO symbols, XCOFF uses a .file assembler directive which assembles to a C_FILE symbol; explaining this in detail is outside the scope of this document.
If it is useful to indicate the end of a source file, this is done with an N_SO symbol with an empty string for the name. The value is the address of the end of the text section for the file. On some systems there is no indication of the end of a source file, and you must assume that it ended when you see an N_SO for a different source file, or a symbol ending in .o (which at least some linkers insert to mark the start of a new .o file).
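The end-of-file rules above can be collected into one predicate. This is a sketch, assuming a hypothetical `struct stab` holding only the type and string, and a hypothetical `ends_source_file` function; N_SO is 100 (0x64). Note that it would also treat the extra cfront N_SO symbols for nonexistent files as the start of a new file.

```c
#include <stddef.h>
#include <string.h>

#define N_SO 0x64 /* 100 */

/* Hypothetical minimal view of one stab. */
struct stab {
    unsigned char type;
    const char *str;
};

/* Decides whether stab E ends the source file named CURRENT, using the
   rules in the text: an N_SO with an empty string, an N_SO naming some
   other file, or any other symbol whose name ends in ".o". */
static int ends_source_file(const struct stab *e, const char *current)
{
    size_t len = strlen(e->str);

    if (e->type == N_SO)
        return len == 0 || strcmp(e->str, current) != 0;
    return len >= 2 && strcmp(e->str + len - 2, ".o") == 0;
}
```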
There are several schemes for dealing with include files: the traditional N_SOL approach, Sun's N_BINCL approach, and the XCOFF C_BINCL approach (which despite the similar name has little in common with N_BINCL).
An N_SOL symbol specifies which include file subsequent symbols refer to. The string field is the name of the file and the value is the text address corresponding to the end of the previous include file and the start of this one. To specify the main source file again, use an N_SOL symbol with the name of the main source file.
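Because each N_SOL value marks where one file's code stops and the next begins, mapping an address back to a file is a range lookup. A minimal sketch, assuming the main N_SO and the N_SOL symbols have already been collected, in order of ascending address, into a hypothetical `struct file_switch` array:

```c
#include <stddef.h>

/* One file switch: the main N_SO or an N_SOL, with its text address. */
struct file_switch {
    const char *name;
    unsigned long addr; /* address where this file's code starts */
};

/* Given the switches in the order they appear (addresses ascending),
   returns the name of the file whose code contains ADDR, or NULL if
   ADDR precedes them all.  Each switch starts a range that runs until
   the next one. */
static const char *file_for_address(const struct file_switch *sw, size_t n,
                                    unsigned long addr)
{
    const char *name = NULL;
    for (size_t i = 0; i < n && sw[i].addr <= addr; i++)
        name = sw[i].name;
    return name;
}
```

A linear scan keeps the sketch short; a real reader with many switches would binary-search the array instead.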
The N_BINCL approach works as follows. An N_BINCL symbol specifies the start of an include file. In an object file, only the string is significant; the linker puts data into some of the other fields. The end of the include file is marked by an N_EINCL symbol (which has no string field). In an object file, there is no significant data in the N_EINCL symbol. N_BINCL and N_EINCL can be nested.
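Since N_BINCL and N_EINCL nest, a reader can track the current file with a stack. The sketch below is illustrative only: `struct include_tracker`, `track_include`, `current_file`, and the depth limit `MAX_INCLUDE_DEPTH` are hypothetical, and the type codes 0x82 and 0xa2 are the usual values of N_BINCL and N_EINCL.

```c
#include <stddef.h>

#define N_BINCL 0x82
#define N_EINCL 0xa2
#define MAX_INCLUDE_DEPTH 32 /* arbitrary limit for this sketch */

/* Hypothetical minimal view of one stab. */
struct stab {
    unsigned char type;
    const char *str;
};

/* Tracks which file subsequent stabs belong to while N_BINCL/N_EINCL
   pairs nest.  main_file is used when no include file is open. */
struct include_tracker {
    const char *main_file;
    const char *stack[MAX_INCLUDE_DEPTH];
    int depth;
};

/* Feeds one stab to the tracker.  Returns 0 on success, -1 on an
   unbalanced N_EINCL or when the nesting limit is exceeded. */
static int track_include(struct include_tracker *t, const struct stab *s)
{
    if (s->type == N_BINCL) {
        if (t->depth == MAX_INCLUDE_DEPTH)
            return -1;
        t->stack[t->depth++] = s->str;   /* only the string matters */
    } else if (s->type == N_EINCL) {
        if (t->depth == 0)
            return -1;
        t->depth--;                      /* N_EINCL carries no string */
    }
    return 0;
}

/* Name of the file that the next stab belongs to. */
static const char *current_file(const struct include_tracker *t)
{
    return t->depth > 0 ? t->stack[t->depth - 1] : t->main_file;
}
```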
If the linker detects that two source files have identical stabs between an N_BINCL and N_EINCL pair (as will generally be the case for a header file), then it only puts out the stabs once. Each additional occurrence is replaced by an N_EXCL symbol. I believe the GNU linker and the Sun (both SunOS4 and Solaris) linker are the only ones which support this feature.
A linker which supports this feature will set the value of an N_BINCL symbol to the total of all the characters in the stabs strings included in the header file, omitting any file numbers. The value of an N_EXCL symbol is the same as the value of the N_BINCL symbol it replaces. This information can be used to match up N_EXCL and N_BINCL symbols which have the same filename. The N_EINCL value, and the values of the other and description fields for all three, appear to always be zero.
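The following sketch shows one way to compute such a value and to match an N_EXCL with its N_BINCL. It rests on two stated assumptions: "total" is read as the sum of the character codes, and a file number is taken to be the run of digits right after `(` in a type reference such as `(1,2)`. The names `bincl_sum_string`, `struct incl_ref`, and `excl_matches_bincl` are hypothetical, and a real linker may differ in details such as how nested N_BINCL pairs are counted.

```c
#include <ctype.h>
#include <string.h>

/* Adds the characters of one stab string to an N_BINCL value.
   Assumptions: "total" means the sum of the character codes, and the
   digits immediately after '(' (the file number in a type reference
   like "(1,2)") are left out. */
static unsigned long bincl_sum_string(unsigned long sum, const char *s)
{
    while (*s != '\0') {
        sum += (unsigned char)*s;
        if (*s == '(') {
            s++;
            while (isdigit((unsigned char)*s))
                s++;          /* skip the file number */
        } else {
            s++;
        }
    }
    return sum;
}

/* File name and value of an N_BINCL or N_EXCL symbol. */
struct incl_ref {
    const char *name;
    unsigned long value;
};

/* An N_EXCL refers to an N_BINCL when both the name and value agree. */
static int excl_matches_bincl(const struct incl_ref *excl,
                              const struct incl_ref *bincl)
{
    return excl->value == bincl->value && strcmp(excl->name, bincl->name) == 0;
}
```

Skipping the file number is what lets the same header produce the same value in every compilation unit, even though each unit numbers its files differently.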
For the start of an include file in XCOFF, use the `.bi' assembler directive, which generates a C_BINCL symbol. A `.ei' directive, which generates a C_EINCL symbol, denotes the end of the include file. Both directives are followed by the name of the source file in quotes, which becomes the string for the symbol. The value of each symbol, produced automatically by the assembler and linker, is the offset into the executable of the beginning (inclusive, as you'd expect) or end (inclusive, as you would not expect) of the portion of the COFF line table that corresponds to this include file. C_BINCL and C_EINCL do not nest.
An N_SLINE symbol represents the start of a source line. The desc field contains the line number and the value contains the code address for the start of that source line. On most machines the address is absolute; for stabs in sections (see section Using Stabs in Their Own Sections), it is relative to the function in which the N_SLINE symbol occurs.
GNU documents N_DSLINE and N_BSLINE symbols for line numbers in the data or bss segments, respectively. They are identical to N_SLINE but are relocated differently by the linker. They were intended to be used to describe the source location of a variable declaration, but I believe that GCC2 actually puts the line number in the desc field of the stab for the variable itself. GDB has been ignoring these symbols (unless they contain a string field) since at least GDB 3.5.
For single source lines that generate discontiguous code, such as flow of control statements, there may be more than one line number entry for the same source line. In this case there is a line number entry at the start of each code range, each with the same line number.
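Mapping a program counter to a line with N_SLINE entries can be sketched as below. The sketch assumes the entries are sorted by address and represented by a hypothetical `struct sline`; `line_for_pc` is likewise hypothetical. For stabs in sections the caller passes the function's start address so the relative values become absolute; otherwise it passes 0. A line with discontiguous code needs no special handling, since it simply appears in several entries.

```c
#include <stddef.h>

/* One N_SLINE stab: desc holds the line number, value the address. */
struct sline {
    unsigned short line;
    unsigned long value;
};

/* Returns the line whose code contains PC, or 0 if PC precedes every
   entry.  Entries must be sorted by address.  func_start is added to
   each value: the function's start for stabs in sections, or 0 when
   the values are already absolute. */
static unsigned short line_for_pc(const struct sline *t, size_t n,
                                  unsigned long func_start, unsigned long pc)
{
    unsigned short line = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned long addr = func_start + t[i].value;
        if (addr > pc)
            break;
        line = t[i].line;
    }
    return line;
}
```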
XCOFF does not use stabs for line numbers. Instead, it uses COFF line numbers (which are outside the scope of this document). Standard COFF line numbers cannot deal with include files, but in XCOFF this is fixed with the C_BINCL method of marking include files (see section Names of Include Files).
All of the following stabs normally use the N_FUN symbol type. However, Sun's acc compiler on SunOS4 uses N_GSYM and N_STSYM, which means that the value of the stab for the function is useless and the debugger must get the address of the function from the non-stab symbols instead. On systems where non-stab symbols have leading underscores, the stabs will lack underscores and the debugger needs to know about the leading underscore to match up the stab and the non-stab symbol. BSD Fortran is said to use N_FNAME with the same restriction; the value of the symbol is not useful (I'm not sure it really does use this, because GDB doesn't handle this and no one has complained).
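Matching such a stab to its non-stab symbol might look like the sketch below. It assumes the debugger already knows whether the target prefixes C names with an underscore (passed in as `leading_underscore`); the function `stab_matches_symbol` is hypothetical. The name part of the stab is everything before the `:`.

```c
#include <string.h>

/* Compares the name in a function stab such as "main:F1" (which has no
   leading underscore) with a non-stab linker symbol such as "_main".
   leading_underscore says whether the target prefixes C names with '_'. */
static int stab_matches_symbol(const char *stab, const char *sym,
                               int leading_underscore)
{
    size_t len = strcspn(stab, ":");   /* name part, before the ':' */

    if (leading_underscore) {
        if (sym[0] != '_')
            return 0;
        sym++;
    }
    return strlen(sym) == len && strncmp(sym, stab, len) == 0;
}
```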
A function is represented by an `F' symbol descriptor for a global (extern) function, and `f' for a static (local) function. For a.out, the value of the symbol is the address of the start of the function; it is already relocated. For stabs in ELF, the SunPRO compiler version 2.0.1 and GCC put out an address which gets relocated by the linker. In a future release SunPRO is planning to put out zero, in which case the address can be found from the ELF (non-stab) symbol. That approach has drawbacks: looking things up in the ELF symbols would probably be slow, I'm not sure how to find which symbol of that name is the right one, and it doesn't provide any way to deal with nested functions. It would probably be better to make the value of the stab an address relative to the start of the file, or just absolute. See section Having the Linker Relocate Stabs in ELF for more information on linker relocation of stabs in ELF files. For XCOFF, the stab uses the C_FUN storage class and the value of the stab is meaningless; the address of the function can be found from the csect symbol (XTY_LD/XMC_PR).
The type information of the stab represents the return type of the function; thus `foo:f5' means that foo is a function returning type 5. There is no need to try to get the line number of the start of the function from the stab for the function; it is in the next N_SLINE symbol.
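Splitting a string like `foo:f5' into its parts can be sketched as follows. This is a simplification: it assumes the return type is a plain type number, whereas real stabs may use forms such as `(1,2)' or inline type definitions, which the sketch rejects or ignores. `struct func_stab`, `parse_func_stab`, and the 64-byte name buffer are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Result of splitting a function stab string such as "foo:f5". */
struct func_stab {
    char name[64];     /* arbitrary size for this sketch */
    char descriptor;   /* 'F' global, 'f' static */
    long return_type;  /* type number after the descriptor */
};

/* Parses NAME:DESC TYPE, assuming the type is a plain type number.
   Anything after the number (such as ";" argument types) is left
   unparsed.  Returns 0 on success, -1 otherwise. */
static int parse_func_stab(const char *s, struct func_stab *out)
{
    const char *colon = strchr(s, ':');
    char *end;
    size_t len;

    if (colon == NULL || (colon[1] != 'F' && colon[1] != 'f'))
        return -1;
    len = (size_t)(colon - s);
    if (len >= sizeof out->name)
        return -1;
    memcpy(out->name, s, len);
    out->name[len] = '\0';
    out->descriptor = colon[1];
    out->return_type = strtol(colon + 2, &end, 10);
    if (end == colon + 2)
        return -1;           /* no type number */
    return 0;
}
```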
Some compilers (such as Sun's Solaris compiler) support an extension for specifying the types of the arguments. I suspect this extension is not used for old (non-prototyped) function definitions in C. If the extension is in use, the type information of the stab for the function is followed by type information for each argument, with each argument preceded by `;'. An argument type of 0 means that additional arguments are being passed, whose types and number may vary (`...' in ANSI C). GDB has tolerated this extension (parsed the syntax, if not necessarily used the information) since at least version 4.8; I don't know whether all versions of dbx tolerate it. The argument types given here are not redundant with the symbols for the formal parameters (see section Parameters); they are the types of the arguments as they are passed, before any conversions might take place. For example, if a C function which is declared without a prototype takes a float argument, the value is passed as a double but then converted to a float. Debuggers need to use the types given in the arguments when printing values, but when calling the function they need to use the types given in the symbol defining the function.
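The `;'-separated argument list can be read as sketched below, again assuming plain type numbers. The parser starts just past the return type, so for `printf:F1;2;0' it would read type 2 and then the 0 that stands for `...'. `struct arg_types`, `parse_arg_types`, and `MAX_ARGS` are hypothetical names for this illustration.

```c
#include <stdlib.h>

#define MAX_ARGS 16 /* arbitrary limit for this sketch */

/* Argument types from the ';'-separated extension. */
struct arg_types {
    long types[MAX_ARGS];
    int count;      /* number of fixed argument types */
    int varargs;    /* nonzero if a 0 type ('...') was seen */
};

/* P points just past the return type.  Parses ";TYPE" repeatedly,
   assuming plain type numbers, and stops at the first other character.
   Returns 0 on success, -1 on a malformed entry or too many arguments. */
static int parse_arg_types(const char *p, struct arg_types *out)
{
    out->count = 0;
    out->varargs = 0;
    while (*p == ';') {
        char *end;
        long t = strtol(p + 1, &end, 10);
        if (end == p + 1)
            return -1;
        if (t == 0)
            out->varargs = 1;
        else if (out->count == MAX_ARGS)
            return -1;
        else
            out->types[out->count++] = t;
        p = end;
    }
    return 0;
}
```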
If the return type and types of arguments of a function which is defined in another source file are specified (i.e., a function prototype in ANSI C), traditionally compilers emit no stab; the only way for the debugger to find the information is if the source file where the function is defined was also compiled with debugging symbols. As an extension the Solaris compiler uses symbol descriptor `P' followed by the return type of the function, followed by the arguments, each preceded by `;', as in a stab with symbol descriptor `f' or `F'.
This use of symbol descriptor `P' can be distinguished from its use for register parameters (see section Passing Parameters in Registers) by the fact that it has symbol type N_FUN.
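The distinction can be expressed as a small classifier keyed on the symbol type. A sketch, assuming N_FUN is 36 (0x24) as in the examples in this section; `enum p_kind` and `classify_p_stab` are hypothetical, and any non-N_FUN type with descriptor `P' is simply treated as a register parameter.

```c
#include <string.h>

#define N_FUN 0x24 /* 36 */

enum p_kind { P_NOT_P, P_PROTOTYPE, P_REGISTER_PARAM };

/* Classifies a stab whose symbol descriptor may be 'P': with type
   N_FUN it is a prototype for a function defined elsewhere; otherwise
   it is taken to describe a parameter passed in a register. */
static enum p_kind classify_p_stab(unsigned char type, const char *str)
{
    const char *colon = strchr(str, ':');

    if (colon == NULL || colon[1] != 'P')
        return P_NOT_P;
    return type == N_FUN ? P_PROTOTYPE : P_REGISTER_PARAM;
}
```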
The AIX documentation also defines symbol descriptor `J' as an internal function. I assume this means a function nested within another function. It also says symbol descriptor `m' is a module in Modula-2 or extended Pascal.
Procedures (functions which do not return values) are represented as functions returning the void type in C. I don't see why this couldn't be used for all languages (inventing a void type for this purpose if necessary), but the AIX documentation defines `I', `P', and `Q' for internal, global, and static procedures, respectively. These symbol descriptors are unusual in that they are not followed by type information.
The following example shows a stab for a function main which returns type number 1. The _main specified for the value is a reference to an assembler label which is used to fill in the start address of the function.
.stabs "main:F1",36,0,0,_main # 36 is N_FUN
The stab representing a procedure is located immediately following the code of the procedure. This stab is in turn directly followed by a group of other stabs describing elements of the procedure. These other stabs describe the procedure's parameters, its block local variables, and its block structure.
If functions can appear in different sections, then the debugger may not be able to find the end of a function. Recent versions of GCC will mark the end of a function with an N_FUN symbol with an empty string for the name. The value is the address of the end of the current function. Without such a symbol, there is no indication of the address of the end of a function, and you must assume that it ended at the starting address of the next function or at the end of the text section for the program.
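The fallback order described above can be sketched as a scan for the next N_FUN after a function's own stab. A hypothetical `struct stab` holds type, string, and value; `function_end` reports through `exact` whether it found an explicit empty-name marker, since the fallbacks are only guesses (and are wrong when functions live in different sections).

```c
#include <stddef.h>

#define N_FUN 0x24 /* 36 */

/* Hypothetical minimal view of one stab. */
struct stab {
    unsigned char type;
    const char *str;
    unsigned long value;
};

/* Finds the end address of the function whose N_FUN is at index I.
   *exact is set to 1 only if an empty-name N_FUN end marker was found. */
static unsigned long function_end(const struct stab *s, size_t n, size_t i,
                                  unsigned long text_end, int *exact)
{
    for (size_t j = i + 1; j < n; j++) {
        if (s[j].type != N_FUN)
            continue;
        /* Empty name: explicit end marker.  Otherwise this is the next
           function, whose start is assumed to be our end. */
        *exact = (s[j].str[0] == '\0');
        return s[j].value;
    }
    *exact = 0;
    return text_end;   /* last function: assume it runs to end of text */
}
```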
For any of the symbol descriptors representing procedures, after the symbol descriptor and the type information is optionally a scope specifier. This consists of a comma, the name of the procedure, another comma, and the name of the enclosing procedure. The first name is local to the scope specified, and seems to be redundant with the name of the symbol (before the `:'). This feature is used by GCC, and presumably Pascal, Modula-2, etc., compilers, for nested functions.
If procedures are nested more than one level deep, only the immediately containing scope is specified. For example, this code:
int
foo (int x)
{
  int bar (int y)
    {
      int baz (int z)
        {
          return x + y + z;
        }
      return baz (x + 2 * y);
    }
  return x + bar (3 * x);
}
produces the stabs:
.stabs "baz:f1,baz,bar",36,0,0,_baz.15         # 36 is N_FUN
.stabs "bar:f1,bar,foo",36,0,0,_bar.12
.stabs "foo:F1",36,0,0,_foo
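Extracting the enclosing procedure from strings like these can be sketched as below. The sketch assumes the type information contains no commas of its own, so the first comma after the `:` starts the scope specifier; it ignores the redundant first name. `struct scope_spec`, `copy_part`, and `parse_scope` are hypothetical.

```c
#include <string.h>

/* Pieces of a nested-function stab such as "baz:f1,baz,bar". */
struct scope_spec {
    char name[64];        /* arbitrary sizes for this sketch */
    char enclosing[64];   /* empty if there is no scope specifier */
};

/* Copies LEN bytes of SRC into DST (size DSTSZ) and NUL-terminates.
   Returns -1 if it does not fit. */
static int copy_part(char *dst, size_t dstsz, const char *src, size_t len)
{
    if (len >= dstsz)
        return -1;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return 0;
}

/* Extracts the symbol name (before ':') and, if present, the enclosing
   procedure from the trailing ",NAME,ENCLOSING" scope specifier.
   Assumes the type information itself contains no commas. */
static int parse_scope(const char *s, struct scope_spec *out)
{
    const char *colon = strchr(s, ':');
    const char *comma1, *comma2;

    if (colon == NULL ||
        copy_part(out->name, sizeof out->name, s, (size_t)(colon - s)) != 0)
        return -1;
    out->enclosing[0] = '\0';
    comma1 = strchr(colon, ',');
    if (comma1 == NULL)
        return 0;                         /* not nested */
    comma2 = strchr(comma1 + 1, ',');
    if (comma2 == NULL)
        return -1;
    return copy_part(out->enclosing, sizeof out->enclosing,
                     comma2 + 1, strlen(comma2 + 1));
}
```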
The program's block structure is represented by the N_LBRAC (left brace) and the N_RBRAC (right brace) stab types. The variables defined inside a block precede the N_LBRAC symbol for most compilers, including GCC. Other compilers, such as the Convex, Acorn RISC machine, and Sun acc compilers, put the variables after the N_LBRAC symbol. The values of the N_LBRAC and N_RBRAC symbols are the start and end addresses of the code of the block, respectively. For most machines, they are relative to the starting address of this source file. For the Gould NP1, they are absolute. For stabs in sections (see section Using Stabs in Their Own Sections), they are relative to the function in which they occur.
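Pairing the braces and turning their values into addresses can be sketched with a stack. The caller supplies the base to add: the source file's start address on most machines, the function's start for stabs in sections, or 0 where the values are absolute. The type codes 0xc0 and 0xe0 are the usual values of N_LBRAC and N_RBRAC; `struct stab`, `struct block`, `collect_blocks`, and `MAX_DEPTH` are hypothetical. Blocks are emitted innermost first, in the order their N_RBRAC appears.

```c
#include <stddef.h>

#define N_LBRAC 0xc0
#define N_RBRAC 0xe0
#define MAX_DEPTH 32   /* arbitrary limit for this sketch */

/* Hypothetical minimal view of one stab. */
struct stab {
    unsigned char type;
    unsigned long value;
};

/* One lexical block with absolute start and end addresses. */
struct block {
    unsigned long start, end;
    int depth;         /* 0 for the outermost block */
};

/* Pairs N_LBRAC/N_RBRAC stabs into blocks, adding BASE to each value.
   Returns the number of blocks written to OUT, or -1 if the braces are
   unbalanced or a limit is exceeded. */
static int collect_blocks(const struct stab *s, size_t n, unsigned long base,
                          struct block *out, size_t max_out)
{
    unsigned long open[MAX_DEPTH];
    int depth = 0;
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (s[i].type == N_LBRAC) {
            if (depth == MAX_DEPTH)
                return -1;
            open[depth++] = base + s[i].value;
        } else if (s[i].type == N_RBRAC) {
            if (depth == 0 || count == max_out)
                return -1;
            depth--;
            out[count].start = open[depth];
            out[count].end = base + s[i].value;
            out[count].depth = depth;
            count++;
        }
    }
    return depth == 0 ? (int)count : -1;
}
```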
The N_LBRAC and N_RBRAC stabs that describe the block scope of a procedure are located after the N_FUN stab that represents the procedure itself.
Sun documents the desc field of N_LBRAC and N_RBRAC symbols as containing the nesting level of the block. However, dbx seems not to care, and GCC always sets desc to zero.
For XCOFF, block scope is indicated with C_BLOCK symbols. If the name of the symbol is `.bb', then it is the beginning of the block; if the name of the symbol is `.be', it is the end of the block.
On Mac OS X, a coalesced symbol is a true definition of a symbol that may appear one or more times in the compilation units generated by the compiler. The semantics of coalesced symbols are similar to those of sections with SEC_LINK_DUPLICATES_DISCARD (COMDAT) set, with the difference that coalesced symbols are processed on a per-symbol basis, rather than on a per-section basis. Currently, coalesced symbols are implemented only on Mac OS X.
The static link editor allows multiple definitions of a coalesced symbol without any warnings or errors. The static link editor outputs only one instance of each coalesced symbol, using the first instance it encounters in the object files being linked. The static link editor always outputs an instance of a coalesced symbol if it appears in the object files being linked, even if it also appears in the dynamic libraries being referenced. The dynamic link editor then relocates such that only one instance of each coalesced symbol is used throughout the program.
Coalesced symbols are placed by the compiler tools into sections flagged with the type bit S_COALESCED. The static and dynamic linkers look at the section type to determine that a given symbol is a coalesced symbol and therefore to allow multiple definitions.
The static link editor divides up a coalesced section on the boundaries of the symbols in that section, associating the bytes of the section after each symbol with the preceding symbol. An object file is considered malformed if a coalesced section does not have a symbol at the first address of the section.
To allow the linker to properly manage the debug information for coalesced symbols, the stabs entries for a given coalesced symbol must be preceded by N_BNSYM and terminated with N_ENSYM. The value of the N_BNSYM stab should be the start address of the coalesced portion (the same value as the symbol that begins the portion), and the value of the N_ENSYM stab should be the end address of the coalesced portion (the same value as the next coalesced symbol, or the end of the whole coalesced section).
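Putting together the division rule and the N_BNSYM/N_ENSYM values, a tool could compute each portion as sketched below. The sketch assumes the symbol addresses are given in ascending order; `struct portion` and `coalesced_portions` are hypothetical. It rejects a section with no symbol at its first address, as the malformed-object rule requires, and for simplicity also rejects duplicate, unsorted, or out-of-range addresses.

```c
#include <stddef.h>

/* Start and end of one coalesced symbol's portion; these are the values
   its N_BNSYM and N_ENSYM stabs should carry. */
struct portion {
    unsigned long bnsym_value, ensym_value;
};

/* Divides the coalesced section [sect_start, sect_start + sect_size) at
   the ascending symbol addresses in SYMS: each symbol owns the bytes up
   to the next symbol, or up to the end of the section for the last one.
   Returns 0 on success, -1 if no symbol sits at the section's first
   address or the addresses are not strictly increasing within it. */
static int coalesced_portions(unsigned long sect_start, unsigned long sect_size,
                              const unsigned long *syms, size_t n,
                              struct portion *out)
{
    unsigned long sect_end = sect_start + sect_size;

    if (n == 0 || syms[0] != sect_start)
        return -1;                      /* malformed coalesced section */
    for (size_t i = 0; i < n; i++) {
        unsigned long end = (i + 1 < n) ? syms[i + 1] : sect_end;
        if (syms[i] >= end)
            return -1;
        out[i].bnsym_value = syms[i];
        out[i].ensym_value = end;
    }
    return 0;
}
```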
Some languages, like Fortran, have the ability to enter procedures at some place other than the beginning. One can declare an alternate entry point. The N_ENTRY stab is for this; however, the Sun FORTRAN compiler doesn't use it. According to AIX documentation, only the name of a C_ENTRY stab is significant; the address of the alternate entry point comes from the corresponding external symbol. A previous revision of this document said that the value of an N_ENTRY stab was the address of the alternate entry point, but I don't know the source for that information.