When functions (routines) call other functions (subroutines), they may need to pass arguments to the called functions. The called functions access those arguments as parameters. Conversely, some functions return a result or return value to their callers. Both arguments and results can be passed using the 32-bit PowerPC architecture registers or the runtime stack, depending on the data type of the values involved. For the successful and efficient passing of values between routines and subroutines, GCC follows strict rules when it generates a program’s object code.
This article describes the data types that can be used to manipulate the arguments and results of function calls, how routines pass arguments to the subroutines they call, and how functions pass results to their callers. It also lists the registers available in the 32-bit PowerPC architecture and whether their value is preserved after a function call.
Data Types and Data Alignment
Function Calls
Using the correct data types for your variables and setting the appropriate data alignment for your data can maximize the performance and portability of your programs. Data alignment specifies how data is laid out in memory.
Table 1 lists the ANSI C scalar data types and their sizes and natural alignment in this environment.
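Data type | Size and natural alignment (in bytes) |
---|---|
_Bool, bool | 4 |
unsigned char | 1 |
char, signed char | 1 |
unsigned short | 2 |
signed short | 2 |
unsigned int | 4 |
signed int | 4 |
unsigned long | 4 |
signed long | 4 |
unsigned long long | 8 |
signed long long | 8 |
float | 4 |
double | 8 |
long double | 16* |
pointer | 4 |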
(*) In Mac OS X v10.4 and later and GCC 4.0 and later, the size of the long double extended-precision data type is 16 bytes (it’s made up of two 8-byte doubles). In earlier versions of Mac OS X and GCC, long double is equivalent to double. You should not use the long double type when you use GCC 4.0 or later to develop programs targeted at Mac OS X versions earlier than 10.4.
These are some important details about the 32-bit PowerPC environment:
This environment uses the big-endian byte ordering scheme to store numeric and pointer data types. That is, the most significant bytes go first, followed by the least significant bytes.
This environment uses the two’s-complement binary representation for signed integer data types.
Arithmetic for the 64-bit integer data types must be synthesized by the compiler since the 32-bit PowerPC architecture does not implement 64-bit integer math operations.
The float and double data types conform to the IEEE-754 standard representation. For the value range and precise format of floating-point data types, see PowerPC Numerics in Performance Documentation.
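The point about synthesized 64-bit arithmetic can be illustrated with a minimal sketch in C. It adds two 64-bit values using only 32-bit operations, propagating the carry from the low word to the high word. The type u64_pair and the functions add64, to_u64, and add64_example are hypothetical names chosen for this illustration; the instruction sequence a compiler actually emits may differ.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value held as two 32-bit words, high word first (big-endian order). */
typedef struct {
    uint32_t hi;
    uint32_t lo;
} u64_pair;

/* Add two 64-bit values using only 32-bit operations: add the low words,
   then add the high words plus the carry out of the low-word addition. */
u64_pair add64(u64_pair a, u64_pair b) {
    u64_pair r;
    r.lo = a.lo + b.lo;                 /* wraps modulo 2^32 */
    uint32_t carry = (r.lo < a.lo);     /* 1 if the low-word addition overflowed */
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* Helper for comparing against the host's native 64-bit arithmetic. */
uint64_t to_u64(u64_pair p) {
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* Usage: 0x00000000FFFFFFFF + 1 carries into the high word. */
void add64_example(void) {
    u64_pair a = { 0x00000000u, 0xFFFFFFFFu };
    u64_pair b = { 0x00000000u, 0x00000001u };
    u64_pair r = add64(a, b);
    assert(r.hi == 1u && r.lo == 0u);
}
```

Each 64-bit addition costs at least two 32-bit additions plus carry handling, which is one reason to avoid 64-bit integer types in hot loops when 32 bits are enough.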
This environment supports multiple data alignment modes. The alignment of data types falls into two categories:
Natural alignment. The alignment of a data type when allocated in memory or assigned a memory address.
The natural alignment of a data type is its size. Table 1 shows the natural alignment of each data type supported by this environment.
Embedding alignment. The alignment of a data type within a composite data structure.
For example, the alignment of an unsigned short variable on the stack may differ from that of an unsigned short element embedded in a data structure.
The embedding alignment for data structures varies depending on the alignment mode selected. Generally, you can set the alignment mode using compiler options or #pragma statements. You should consider the compatibility and performance issues described later in this section when choosing a particular alignment mode.
These are the embedding alignment modes available in the 32-bit PowerPC environment:
Power alignment mode is derived from the alignment rules used by the IBM XLC compiler for the AIX operating system. It is the default alignment mode for the PowerPC-architecture version of GCC used on AIX and Mac OS X. Because this mode is most likely to be compatible between PowerPC-architecture compilers from different vendors, it’s typically used with data structures that are shared between different programs.
The rules for power alignment are:
The embedding alignment of the first element in a data structure is equal to the element’s natural alignment.
For subsequent elements with a natural alignment less than 4 bytes, the embedding alignment of each element is equal to its natural alignment.
For subsequent elements that have a natural alignment greater than 4 bytes, the embedding alignment is 4, unless the element is a vector.
The embedding alignment for vector elements is always 16 bytes.
The embedding alignment of a composite data type (array or data structure) is determined by the largest embedding alignment of its members.
The total size of a composite type is rounded up to a multiple of its embedding alignment, and is padded with null bytes.
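The power alignment rules above can be turned into a small layout calculator. The following is a sketch only, not how GCC is implemented: the member type and the functions round_up, power_embedding_align, power_layout, and power_layout_example are hypothetical names, and nested composite members are not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* One structure member: size and natural alignment in bytes.
   is_vector marks a 16-byte vector element. */
typedef struct {
    size_t size;
    size_t natural_align;
    int is_vector;
} member;

/* Round n up to the next multiple of align (align must be a power of two). */
size_t round_up(size_t n, size_t align) {
    return (n + align - 1) & ~(align - 1);
}

/* Embedding alignment of member i under the power alignment rules. */
size_t power_embedding_align(const member *m, size_t i) {
    if (m[i].is_vector) return 16;            /* vectors are always 16 */
    if (i == 0) return m[i].natural_align;    /* first element: natural alignment */
    if (m[i].natural_align > 4) return 4;     /* later elements are capped at 4 */
    return m[i].natural_align;
}

/* Fill in member offsets and the structure alignment; return the padded size. */
size_t power_layout(const member *m, size_t count,
                    size_t *offsets, size_t *struct_align) {
    size_t offset = 0, max_align = 1;
    for (size_t i = 0; i < count; i++) {
        size_t a = power_embedding_align(m, i);
        offset = round_up(offset, a);
        offsets[i] = offset;
        offset += m[i].size;
        if (a > max_align) max_align = a;
    }
    *struct_align = max_align;
    return round_up(offset, max_align);       /* tail padding */
}

void power_layout_example(void) {
    size_t off[3], align, size;

    /* struct { char c; double d; short s; }: d is only 4-byte aligned */
    member a[] = { {1, 1, 0}, {8, 8, 0}, {2, 2, 0} };
    size = power_layout(a, 3, off, &align);
    assert(off[0] == 0 && off[1] == 4 && off[2] == 12);
    assert(align == 4 && size == 16);

    /* struct { double d; char c; }: a leading double keeps 8-byte alignment */
    member b[] = { {8, 8, 0}, {1, 1, 0} };
    size = power_layout(b, 2, off, &align);
    assert(off[0] == 0 && off[1] == 8);
    assert(align == 8 && size == 16);
}
```

The two cases show why member order matters in this mode: the same double is 8-byte aligned when it comes first and only 4-byte aligned when it follows another member.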
Because the natural alignment of the double and long long data types is greater than 4 bytes, they may not be appropriately aligned in power alignment mode. Any misalignment impairs performance when such data members are accessed. When you use these data types for any element after the first element, the compiler pads the structure to align the elements to the next multiple of their natural alignment.
Mac68K alignment mode is usually used with legacy data structures inherited from Mac OS 9 and earlier systems. New code should not need to use this alignment mode except to preserve compatibility with older data structures.
The rules for Mac68K alignment are:
The embedding alignment of the char data type is 1 byte.
The embedding alignment of all other data types (except vector) is 2 bytes.
The embedding alignment for the vector data type is 16 bytes.
The total size of a composite data type is rounded up to a multiple of 2 bytes.
Natural alignment mode uses the natural alignment of each data type as its embedding alignment. Use this alignment mode to obtain the highest performance when using the double, long long, and long double data types.
Packed alignment mode contains no alignment padding between elements (the alignment for all data types is 1 byte). Use this alignment mode when you need a data structure to use as little memory as possible. Note, however, that packed alignment can significantly lower the performance of your application.
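The effect of packing can be observed on any host with offsetof and sizeof. The sketch below is an approximation of the idea rather than the mechanism this article describes: it uses the GCC and Clang __attribute__((packed)) extension instead of the alignment-mode pragma, and the structure and function names are hypothetical. The comparisons assume a host where int has an alignment greater than 1, which is the common case.

```c
#include <assert.h>
#include <stddef.h>

/* Default layout: the compiler pads after c so that i is aligned. */
struct padded_record {
    char c;
    int  i;
};

/* Packed layout: no padding between elements (GCC/Clang extension). */
struct __attribute__((packed)) packed_record {
    char c;
    int  i;
};

void packed_layout_example(void) {
    /* Packed: i immediately follows c; the size is the sum of the members. */
    assert(offsetof(struct packed_record, i) == sizeof(char));
    assert(sizeof(struct packed_record) == sizeof(char) + sizeof(int));

    /* Default: padding pushes i to a higher offset and enlarges the struct. */
    assert(offsetof(struct padded_record, i) > offsetof(struct packed_record, i));
    assert(sizeof(struct padded_record) > sizeof(struct packed_record));
}
```

The packed structure saves memory, but every access to i may be unaligned, which is the performance cost mentioned above.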
Note: Data items passed as parameters in a function call have special alignment rules. See “Stack Structure” for more information.
Table 2 lists the alignment for structure fields of the fundamental data types and composite data types in the supported alignment modes.
Data type | Power alignment | Natural alignment | Mac68K alignment | Packed alignment |
---|---|---|---|---|
_Bool, bool | 4 | 4 | 2 | 1 |
char | 1 | 1 | 1 | 1 |
short | 2 | 2 | 2 | 1 |
int | 4 | 4 | 2 | 1 |
long | 4 | 4 | 2 | 1 |
long long | 4 or 8 | 8 | 2 | 1 |
float | 4 | 4 | 2 | 1 |
double | 4 or 8 | 8 | 2 | 1 |
long double | | | 2 | 1 |
vector | 16 | 16 | 16 | 1 |
Composite (data structure or array) | 4, 8, or 16 | 1, 2, 4, 8, or 16 | 2 | 1 |
With GCC you can control data-structure alignment by adding #pragma statements to your source code or by using command-line options. The power alignment mode is used if you do not specify otherwise.
To set the alignment mode, use the gcc flags -malign-power, -malign-mac68k, and -malign-natural. To use a specific alignment mode in a data structure, add this statement just before the data-structure declaration:
    #pragma option align=<mode>
Replace <mode> with power, mac68k, natural, or packed. To restore the previous alignment mode, use reset as the alignment mode in a #pragma statement:
    #pragma option align=reset
This section details the process of calling a function and passing arguments to it, and how functions return values to their callers.
Note: These argument-passing conventions are part of the Apple standard for procedural programming interfaces. Object-oriented languages may use different rules for their own method calls. For example, the conventions for C++ virtual function calls may be different from those for C functions.
This environment uses a stack that grows downward and contains linkage information, local variables, and a function’s parameter information, as shown in Figure 1. (To help prevent the execution of malicious code on the stack, GCC protects the stack against execution.)
The stack pointer (SP) points to the bottom of the stack. Each stack frame has a fixed size, which is known at compile time.
The calling routine’s stack frame includes a parameter area and some linkage information. The parameter area has the arguments the caller passes to the called function or space for them, depending on the type of each argument and the availability of registers (see “Passing Arguments” for details). Since the calling routine may call several functions, in the 32-bit PowerPC environment the parameter area is normally large enough to accommodate the largest argument list of all the functions the caller calls. It is the calling routine’s responsibility to set up the parameter area before each function call. The called function is responsible for accessing the arguments placed in the parameter area.
The first 32 bytes in the parameter area correspond to the general-purpose registers GPR3 through GPR10. When data is placed in a general-purpose register and not duplicated in the parameter area, the corresponding section in the parameter area is reserved in case the called function needs to copy the value in the register to the stack. Table 3 shows the correspondence of parameter area locations to the general-purpose registers that can be used to pass arguments.
Stack frame location | Register |
---|---|
SP+24 | GPR3 |
SP+28 | GPR4 |
SP+32 | GPR5 |
SP+36 | GPR6 |
SP+40 | GPR7 |
SP+44 | GPR8 |
SP+48 | GPR9 |
SP+52 | GPR10 |
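The correspondence in Table 3 reduces to one line of arithmetic. The sketch below assumes that the parameter area starts immediately after a 24-byte linkage area (the linkageArea value used in Listing 1) and that each general-purpose register owns one 4-byte word. The names gpr_home_offset and LINKAGE_AREA_SIZE are hypothetical.

```c
#include <assert.h>

enum {
    LINKAGE_AREA_SIZE = 24,   /* assumed size of the caller's linkage area */
    FIRST_PARAM_GPR   = 3,
    LAST_PARAM_GPR    = 10
};

/* Offset from the caller's SP of the parameter-area word reserved for a GPR. */
int gpr_home_offset(int gpr) {
    assert(gpr >= FIRST_PARAM_GPR && gpr <= LAST_PARAM_GPR);
    return LINKAGE_AREA_SIZE + 4 * (gpr - FIRST_PARAM_GPR);
}
```

For example, gpr_home_offset(3) yields 24 and gpr_home_offset(10) yields 52, so the eight registers cover the first 32 bytes of the parameter area.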
These are the alignment rules followed when parameters are placed in the parameter area or in GPR3 through GPR10:
All nonvector parameters are aligned on 4-byte boundaries.
Vector parameters are aligned on 16-byte boundaries.
Noncomposite parameters (that is, parameters that are not arrays or data structures) smaller than 4 bytes occupy the high-order bytes of their 4-byte area.
Composite parameters (arrays, structures, and unions) 1 or 2 bytes in size occupy the low-order bytes of their 4-byte area. They are preceded by padding to 4 bytes.
This rule is inconsistent with other 32-bit PowerPC binary interfaces. In AIX and Mac OS 9 (and earlier), padding bytes always follow the data structure even in the case of composite parameters smaller than 4 bytes.
Composite parameters 3 bytes or larger in size occupy the high-order bytes of their 4-byte area. They are followed by padding to make a multiple of 4 bytes, with the padding bytes being undefined.
For example, consider the foo function, declared like this:
    void foo(SInt32 i1, float f1, double d1, SInt16 s1, double d2,
             UInt8 c1, UInt16 s2, float f2, SInt32 i2);
Table 4 shows how the function’s arguments are assigned locations in the parameter area. The assignment takes into account the 4-byte alignment required for each argument.
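Parameter | Type | Location | Data size and padding (in bytes) |
---|---|---|---|
i1 | SInt32 | SP+24 | 4, 0 |
f1 | float | SP+28 | 4, 0 |
d1 | double | SP+32 | 8, 0 |
s1 | SInt16 | SP+40 | 2, 2 |
d2 | double | SP+44 | 8, 0 |
c1 | UInt8 | SP+52 | 1, 3 |
s2 | UInt16 | SP+56 | 2, 2 |
f2 | float | SP+60 | 4, 0 |
i2 | SInt32 | SP+64 | 4, 0 |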
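The locations for the arguments of foo follow mechanically from the 4-byte alignment rule. The sketch below recomputes them, assuming the parameter area begins at SP+24 and that only nonvector arguments are involved (vector arguments need 16-byte alignment and are not modeled). The names assign_param_offsets and foo_param_area_example are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { PARAM_AREA_START = 24 };   /* assumed offset of the parameter area from SP */

/* Give each argument a location; its size is padded to a multiple of 4 bytes. */
void assign_param_offsets(const size_t *sizes, size_t count,
                          size_t *offsets, size_t *padding) {
    size_t offset = PARAM_AREA_START;
    for (size_t i = 0; i < count; i++) {
        size_t slot = (sizes[i] + 3) & ~(size_t)3;   /* round up to 4 bytes */
        offsets[i] = offset;
        padding[i] = slot - sizes[i];
        offset += slot;
    }
}

void foo_param_area_example(void) {
    /* Sizes of i1, f1, d1, s1, d2, c1, s2, f2, i2 in declaration order. */
    const size_t sizes[9]    = { 4, 4, 8, 2, 8, 1, 2, 4, 4 };
    const size_t expected[9] = { 24, 28, 32, 40, 44, 52, 56, 60, 64 };
    size_t offsets[9], padding[9];

    assign_param_offsets(sizes, 9, offsets, padding);
    for (size_t i = 0; i < 9; i++)
        assert(offsets[i] == expected[i]);
    assert(padding[3] == 2 && padding[5] == 3 && padding[6] == 2);
}
```

Note that d2 lands at SP+44, which is 4-byte aligned but not 8-byte aligned; the parameter area does not give doubles their natural alignment.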
The calling routine’s linkage area holds a number of values, some of which are saved by the calling routine and some by the called function. The elements within the linkage area are:
The link register (LR). Its value is saved at 8(SP) by the called function if it chooses to do so. The link register holds the return address, that is, the address of the instruction that follows a branch-and-link instruction.
The condition register (CR). Its value may be saved at 4(SP) by the called function. The condition register holds the results of comparison operations. As with the link register, the called function is not required to save this value.
The stack pointer (SP). Its value may be saved at 0(SP) by the called function as part of its stack frame. Leaf functions are not required to save the stack pointer. A leaf function is a function that does not call any other functions.
The linkage area is at the top of the stack, adjacent to the stack pointer. This positioning is necessary so that the calling routine can find and restore the values stored there and so that the called function can find the caller’s parameter area. This placement means that a routine cannot push and pop parameters from the stack once the stack frame is set up.
The stack frame also includes space for the called function’s local variables. However, some registers are also available for use by the called function; see “Register Preservation” for details. If the subroutine contains more local variables than would fit in the registers, it uses additional space on the stack. The size of the local-variable area is determined at compile time. Once a stack frame is allocated, the size of the local-variable area does not change.
The called function is responsible for allocating its own stack frame, making sure to preserve 16-byte alignment in the stack. This operation is accomplished by a section of code called the prolog, which the compiler places before the body of the subroutine. After the body of the subroutine, the compiler places an epilog to restore the processor to the state it was in prior to the subroutine call.
The compiler-generated prolog code does the following:
Decrements the stack pointer to account for the new stack frame and writes the previous value of the stack pointer to its own linkage area, which ensures the stack can be restored to its original state after returning from the call.
It is important that the decrement and update tasks happen atomically (for example, with stwu, stwux, stdu, or stdux) so that the stack pointer and back-link are in a consistent state. Otherwise, asynchronous signals or interrupts could corrupt the stack.
Saves all nonvolatile general-purpose and floating-point registers into the saved-registers area. Note that if the called function does not change a particular nonvolatile register, it does not save it.
Saves the link-register and condition-register values in the caller’s linkage area, if needed.
Listing 1 shows an example of a subroutine prolog. Notice that the order of these actions differs from the order previously described.
Listing 1 Example prolog
    linkageArea = 24                     ; size in 32-bit PowerPC ABI
    params      = 32                     ; callee parameter area
    localVars   = 0                      ; callee local variables
    numGPRs     = 0                      ; nonvolatile GPRs saved by callee
    numFPRs     = 0                      ; nonvolatile FPRs saved by callee
    spaceToSave = linkageArea + params + localVars + 4*numGPRs + 8*numFPRs
    spaceToSaveAligned = ((spaceToSave+15) & (-16))    ; 16-byte-aligned stack

    _functionName:                       ; PROLOG
        mflr r0                          ; extract return address
        stw  r0, 8(SP)                   ; save the return address
        stwu SP, -spaceToSaveAligned(SP) ; skip over caller save area
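The frame-size arithmetic in Listing 1 can be checked with a few lines of C. This sketch mirrors the listing's formula, keeping the 24-byte linkage area and 32-byte parameter area as constants. The function names frame_size and frame_size_example are hypothetical.

```c
#include <assert.h>

/* Stack frame size as computed in Listing 1, rounded up to 16 bytes. */
unsigned frame_size(unsigned local_vars, unsigned num_gprs, unsigned num_fprs) {
    const unsigned linkage_area = 24;   /* linkageArea in Listing 1 */
    const unsigned params       = 32;   /* params in Listing 1 */
    unsigned space = linkage_area + params + local_vars
                   + 4 * num_gprs + 8 * num_fprs;
    return (space + 15u) & ~15u;        /* same effect as & (-16) */
}

void frame_size_example(void) {
    /* With the values in Listing 1, 56 bytes round up to 64. */
    assert(frame_size(0, 0, 0) == 64);
    /* 8 bytes of locals, two GPRs, one FPR: 56 + 8 + 8 + 8 = 80. */
    assert(frame_size(8, 2, 1) == 80);
}
```

Adding 15 and clearing the low four bits is the usual way to round up to a multiple of 16, so the stack pointer stays 16-byte aligned after the stwu instruction.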
At the end of the subroutine, the compiler-generated epilog does the following:
Restores the nonvolatile general-purpose and floating-point registers that were saved in the stack frame.
Nonvolatile registers are saved in the new stack frame before the stack pointer is updated only when they fit within the red zone, that is, the space beneath the stack pointer where a new stack frame would normally be allocated. The red zone is by definition large enough to hold all nonvolatile general-purpose and floating-point registers but not the nonvolatile vector registers. See “The Red Zone” for details.
Restores the condition-register and link-register values that were stored in the linkage area.
Returns control to the calling routine using the address stored in the link register.
Listing 2 shows an example epilog.
Listing 2 Example epilog
                                             ; EPILOG
        lwz  r0, spaceToSaveAligned + 8(SP)  ; get the return address
        mtlr r0                              ; into the link register
        addi SP, SP, spaceToSaveAligned      ; restore stack pointer
        blr                                  ; and branch to the return address
The VRSAVE register is used to specify which vector registers must be saved during a thread or process context switch. Listing 3 shows an example prolog that sets up VRSAVE so that vector registers V0 through V2 are saved. Listing 3 also includes the epilog that restores VRSAVE to its previous state.
Listing 3 Example usage of the VRSAVE register
    #define VRSAVE 256       // VRSAVE is SPR# 256

    _functionName:
        mfspr r2, VRSAVE     ; get vector of live VRs
        oris  r0, r2, 0xE000 ; set bits 0-2 since we use V0..V2
        mtspr VRSAVE, r0     ; update live VR vector before using any VRs
        ; Now, V0..V2 can be safely used.
        ; Function body goes here.
        mtspr VRSAVE, r2     ; restore VRSAVE
        blr                  ; return to caller
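The constant 0xE000 in Listing 3 follows from PowerPC bit numbering, in which bit 0 is the most significant bit, and from the fact that oris applies its immediate to the upper halfword. The following sketch derives the same mask in C. The functions vrsave_bit, vrsave_mask, and vrsave_example are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* VRSAVE bit for vector register n; bit 0 is the most significant bit. */
uint32_t vrsave_bit(unsigned n) {
    assert(n < 32);
    return UINT32_C(0x80000000) >> n;
}

/* Mask covering vector registers first..last, inclusive. */
uint32_t vrsave_mask(unsigned first, unsigned last) {
    assert(first <= last && last < 32);
    uint32_t mask = 0;
    for (unsigned n = first; n <= last; n++)
        mask |= vrsave_bit(n);
    return mask;
}

void vrsave_example(void) {
    /* V0..V2 occupy the top three bits: 0xE0000000. */
    assert(vrsave_mask(0, 2) == UINT32_C(0xE0000000));
    /* oris shifts its 16-bit immediate into the upper halfword. */
    assert(vrsave_mask(0, 2) == (UINT32_C(0xE000) << 16));
}
```

A function that used V20 through V23 instead would need bits in the lower halfword, so it would use ori rather than oris to set them.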
The space beneath the stack pointer, where a new stack frame would normally be allocated by a subroutine, is called the red zone. The red zone, shown in Figure 2, is considered part of the current stack frame. This area is not modified by asynchronous pushes, such as signals or interrupt handlers. Therefore, the red zone may be used for any purpose as long as a new stack frame does not need to be added to the stack. However, the contents of the red zone are assumed to be destroyed by any synchronous call.
For example, because a leaf function does not call any other functions—and, therefore, does not allocate a parameter area on the stack—it can use the red zone. Furthermore, such a function does not need to use the stack to store local variables; it needs to save only the nonvolatile registers it uses for local variables. Since, by definition, no more than one leaf function is active at any time within a thread, there is no possibility of multiple leaf functions competing for the same red zone space.
A leaf function may or may not allocate a stack frame and decrement the stack pointer. When it doesn’t allocate a stack frame, a leaf function stores the link register and condition register values in the linkage area of the routine that calls it (if necessary) and stores the values of any nonvolatile registers it uses in the red zone. This streamlining means that a leaf function’s prolog and epilog do minimal work; they do not have to set up and take down a stack frame.
The size of the red zone is 224 bytes, which is enough space to store the values of nineteen 32-bit general-purpose registers and eighteen 64-bit floating-point registers, rounded up to the nearest 16-byte boundary. If a leaf function’s red zone usage would exceed the red zone size, it must set up a stack frame, just as functions that call other functions do.
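The 224-byte figure can be reproduced directly from the register counts. This is a minimal sketch of that arithmetic; the function names red_zone_size and red_zone_example are hypothetical.

```c
#include <assert.h>

/* Space for all nonvolatile GPRs and FPRs, rounded up to 16 bytes. */
unsigned red_zone_size(void) {
    const unsigned gprs = 19;             /* nineteen 32-bit registers */
    const unsigned fprs = 18;             /* eighteen 64-bit registers */
    unsigned raw = gprs * 4 + fprs * 8;   /* 76 + 144 = 220 bytes */
    return (raw + 15u) & ~15u;            /* round up to a 16-byte boundary */
}

void red_zone_example(void) {
    assert(red_zone_size() == 224);
}
```

The raw total is 220 bytes, so the rounding adds only 4 bytes of slack.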
In the C language, functions can declare their parameters using one of three conventions:
The types of all parameters are specified in the function’s prototype. For example:
    int foo(int, short);
In this case, the types of all the function’s parameters are known at compile time.
The function’s prototype declares some fixed parameters and some nonfixed parameters. The group of nonfixed parameters is also called a variable argument list. For example:
    int foo(int, ...);
In this case, the type of one of the function’s parameters is known at compile time. The types of the nonfixed parameters are not known.
The function has no prototype or uses a pre–ANSI C declaration. For example:
    int foo();
In this case, the types of all the function’s parameters are unknown at compile time.
When the compiler generates the prolog for a function call, it uses the information from the function’s declaration to decide how arguments are passed to the function. When the compiler knows the type of a parameter, it passes it in the most efficient way possible. But when the type is unknown, it passes the parameter using the safest approach, which may involve placing data both in registers and in the parameter area. For called functions to access their parameters correctly, it’s important that they know when parameters are passed in the stack or in registers.
Arguments are passed in the stack, in registers, or both, depending on their types and the availability of registers. There are three types of registers: general purpose, floating point, and vector. General-purpose registers (GPRs) are 32-bit registers that can manipulate integral values and pointers. Floating-point registers (FPRs) are 64-bit registers that can manipulate single-precision and double-precision floating-point values. Vector registers are 128-bit registers that can manipulate 4 through 16 chunks of data in parallel.
The registers that can be used to pass arguments to called functions are the general-purpose registers GPR3 through GPR10, the floating-point registers FPR1 through FPR13, and the vector registers V2 through V13 (see “Register Preservation” for details). These registers are also known as parameter registers.
Important: Only the low 32 bits in each of the general-purpose registers available on the 64-bit PowerPC architecture are used in this environment. That is, only the low 32 bits of nonvolatile registers are saved and restored. However, all 64 bits are saved across asynchronous events, such as signals and preemptions. Therefore, you can use the 64 bits in each register between function calls. You control this feature through the gcc options -arch and -mcpu.
Typically, the called routine obtains arguments from registers. However, the caller generates a parameter area in the caller’s stack frame that is large enough to hold all the arguments passed to the called function, regardless of how many of the arguments are actually passed in registers. (You can think of the parameter area as a data structure that has space to hold all the arguments in a given call.) There are several reasons for this scheme:
It provides the called function with space in the stack to store a register-based parameter if it wants to use one of the parameter registers for some other purpose. For example, the callee can use this space to pass arguments to a function it calls.
Functions with variable argument lists must often access their parameters from RAM, not from registers. Such functions must reserve 32 bytes (8 registers) in the parameter area to hold the parameter values.
To simplify debugging, GCC writes parameters from the parameter registers into the parameter area in the stack frame. This allows you to see all the parameters by looking only at the parameter area.
The compiler uses the following rules when passing arguments to subroutines:
Parameters whose type is known at compile time are processed as follows:
Scalar, non–floating-point elements are placed in the general-purpose registers GPR3 through GPR10. As each register is used, the caller allocates the register’s corresponding section in the parameter area, as described in “Stack Structure.” When general-purpose registers are exhausted, the caller places scalar, non–floating-point elements in the parameter area.
The caller places floating-point parameters in the floating-point registers FPR1 through FPR13. As each floating-point register is used, the caller skips one or more general-purpose registers, based on the size of the parameter. (For example, a float element causes one (4-byte) general-purpose register to be skipped. A double element causes two general-purpose registers to be skipped.) When floating-point registers are exhausted, the caller places floating-point elements in the parameter area.
The caller places structures (struct elements) with only one noncomposite member in general-purpose or floating-point registers, depending on whether the member is an integer or a floating-point value. For example, the caller places a structure composed of a single float member in a floating-point register, not a general-purpose register. When registers of the required type are exhausted, the caller places structures in the parameter area.
The caller places vector parameters in vector registers V2 through V13. For procedures with a fixed number of parameters, the presence of vectors doesn’t affect the allocation of general-purpose registers and floating-point registers. The caller doesn’t allocate space for vector elements in the parameter area of its stack frame unless the number of vector elements exceeds the number of usable vector registers.
When the number of parameters exceeds the number of usable registers, the caller places the excess parameters in the parameter area.
Parameters whose type is not known at compile time (functions with variable-argument lists or using pre–ANSI C prototypes) are processed as follows:
The caller places nonvector elements both in general-purpose registers and in floating-point registers.
Because the compiler doesn’t know the type of the parameter, it cannot determine whether the argument should be passed in a general-purpose register or in a floating-point register. Therefore, callers place each argument in a floating-point register and the corresponding general-purpose registers based on the argument’s size.
The caller places vector elements in vector registers and general-purpose registers (each vector element requires four general-purpose registers). The caller also allocates space in the parameter area that corresponds to the general-purpose registers used.
Important: When the return type of the called function is a composite value (for example, struct or union), the caller passes a pointer in GPR3 as an implicit first parameter of the called function. Therefore, the function’s declared parameters start at GPR4. The pointer points to a section of memory large enough to hold the return value. See “Returning Results” for more information.
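The implicit first parameter can be pictured as a source-level rewrite. The sketch below is an analogy in portable C, not compiler output: make_point is an ordinary function that returns a structure, and make_point_lowered shows roughly the shape the call takes once the caller supplies the result storage. All names here are hypothetical.

```c
#include <assert.h>

struct point {
    int x;
    int y;
};

/* What the source code says: return a structure by value. */
struct point make_point(int x, int y) {
    struct point p = { x, y };
    return p;
}

/* Roughly what the convention does: the caller passes the address of the
   result storage as a hidden first argument (in GPR3), so the declared
   parameters x and y move to the following registers. */
void make_point_lowered(struct point *result, int x, int y) {
    result->x = x;
    result->y = y;
}

void composite_return_example(void) {
    struct point a = make_point(3, 4);
    struct point b;
    make_point_lowered(&b, 3, 4);   /* caller owns the storage for b */
    assert(a.x == b.x && a.y == b.y);
}
```

Because the hidden pointer consumes GPR3, a function that returns a structure has one fewer general-purpose register available for its declared parameters.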
For example, consider the foo function, declared like this:
    void foo(SInt32 i1, float f1, double d1, SInt16 s1, double d2,
             UInt8 c1, UInt16 s2, float f2, SInt32 i2);
The caller places each argument to foo in a general-purpose register, a floating-point register, or the parameter area, depending on the parameter’s data type and register availability. Table 5 describes this process.
Parameter | Type | Placed in | Reason |
---|---|---|---|
i1 | SInt32 | GPR3 | Noncomposite, non–floating-point element. |
f1 | float | FPR1 | Floating-point element. GPR4 is skipped. |
d1 | double | FPR2 | Double-precision, floating-point element. GPR5 and GPR6 are skipped. |
s1 | SInt16 | GPR7 | Noncomposite, non–floating-point element. |
d2 | double | FPR3 | Double-precision, floating-point element. GPR8 and GPR9 are skipped. |
c1 | UInt8 | GPR10 | Noncomposite, non–floating-point element. |
s2 | UInt16 | SP+56 | No general-purpose registers available. |
f2 | float | FPR4 | Floating-point element. |
i2 | SInt32 | SP+64 | No general-purpose registers available. |
Note: In this case, arguments that the caller places in general-purpose registers or floating-point registers are not also copied into the parameter area.
Figure 3 illustrates the assignment of the foo parameters to registers and the parameter area. Keep in mind that the only parameters placed in the parameter area are s2 and i2.
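The assignment of the foo arguments can be replayed with a small simulation of the rules for parameters whose types are known. This is a simplified sketch: it models only scalar integer, float, and double arguments (no long long, structures, or vectors), and the names arg_kind, assign_registers, and foo_assignment_example are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { ARG_INT, ARG_FLOAT, ARG_DOUBLE } arg_kind;

/* Record where each argument goes: "GPRn", "FPRn", or "stack". */
void assign_registers(const arg_kind *args, int count, char out[][16]) {
    int gpr = 3;   /* next general-purpose register, GPR3..GPR10 */
    int fpr = 1;   /* next floating-point register, FPR1..FPR13 */
    for (int i = 0; i < count; i++) {
        if (args[i] == ARG_INT) {
            if (gpr <= 10) snprintf(out[i], 16, "GPR%d", gpr);
            else           strcpy(out[i], "stack");
            gpr += 1;
        } else {
            if (fpr <= 13) snprintf(out[i], 16, "FPR%d", fpr++);
            else           strcpy(out[i], "stack");
            /* A floating-point argument skips the GPRs it shadows. */
            gpr += (args[i] == ARG_DOUBLE) ? 2 : 1;
        }
    }
}

void foo_assignment_example(void) {
    /* i1, f1, d1, s1, d2, c1, s2, f2, i2 */
    const arg_kind args[9] = { ARG_INT, ARG_FLOAT, ARG_DOUBLE, ARG_INT,
                               ARG_DOUBLE, ARG_INT, ARG_INT, ARG_FLOAT, ARG_INT };
    const char *expected[9] = { "GPR3", "FPR1", "FPR2", "GPR7", "FPR3",
                                "GPR10", "stack", "FPR4", "stack" };
    char where[9][16];

    assign_registers(args, 9, where);
    for (int i = 0; i < 9; i++)
        assert(strcmp(where[i], expected[i]) == 0);
}
```

The simulation makes the cost of skipped registers visible: the two doubles consume four general-purpose register slots, which is why s2 and i2 end up in the parameter area even though only three integer arguments precede s2.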
A function with a variable argument list can access its fixed parameters as usual. To access the nonfixed parameters, however, it copies the general-purpose registers to the parameter area and reads the values from there. Listing 4 shows a routine that accesses its nonfixed parameters by walking through the stack.
Listing 4 A variable-argument procedure
    #include <stdarg.h>

    double dsum(int count, ...) {
        double sum = 0.0;
        double val;
        va_list arg;
        va_start(arg, count);
        while (count > 0) {
            val = va_arg(arg, double);
            sum += val;
            count--;
        }
        va_end(arg);
        return sum;
    }
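A short usage sketch for Listing 4 follows; dsum is repeated so that the fragment is self-contained, and dsum_example is a hypothetical name. The important detail is that every variable argument must really be a double, because va_arg(arg, double) reads each one as a double regardless of what the caller passed.

```c
#include <assert.h>
#include <stdarg.h>

/* Sum `count` double arguments (same logic as Listing 4). */
double dsum(int count, ...) {
    double sum = 0.0;
    va_list arg;
    va_start(arg, count);
    while (count > 0) {
        sum += va_arg(arg, double);   /* each argument is read as a double */
        count--;
    }
    va_end(arg);
    return sum;
}

void dsum_example(void) {
    /* Pass double literals; an int such as 1 here would be undefined behavior. */
    assert(dsum(3, 1.0, 2.0, 3.5) == 6.5);
    assert(dsum(0) == 0.0);
}
```

Writing dsum(2, 1, 2) compiles without a prototype-based conversion for the variable arguments, so the function would misread the integer values; casting or using floating-point literals avoids the problem.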
The following list describes where a function’s return value is passed to the caller.
Scalars smaller than 4 bytes (such as char and short) are placed in the low word of GPR3. The register’s high word is undefined.
Scalars 4 bytes in size (such as long, int, and pointers, including array pointers) are placed in GPR3.
Values of type long long are returned with the high word in GPR3 and the low word in GPR4.
Floating-point values are placed in FPR1.
Composite values (such as struct and union) and values larger than 4 bytes are placed at the location pointed to by GPR3. See “Passing Arguments” for more information.
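The long long case can be illustrated by splitting a 64-bit value into the two 32-bit words that the registers would carry. The sketch reads the rule as placing the high word in GPR3 and the low word in GPR4, which is an interpretation of the text above; return_pair, split_long_long, and split_example are hypothetical names, and the structure merely stands in for the two registers.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the register pair that carries a long long result. */
typedef struct {
    uint32_t gpr3;   /* high word */
    uint32_t gpr4;   /* low word */
} return_pair;

/* Split a 64-bit value into the words carried by GPR3 and GPR4. */
return_pair split_long_long(int64_t value) {
    uint64_t bits = (uint64_t)value;
    return_pair r;
    r.gpr3 = (uint32_t)(bits >> 32);
    r.gpr4 = (uint32_t)(bits & UINT32_C(0xFFFFFFFF));
    return r;
}

void split_example(void) {
    return_pair r = split_long_long(INT64_C(0x0123456789ABCDEF));
    assert(r.gpr3 == UINT32_C(0x01234567));
    assert(r.gpr4 == UINT32_C(0x89ABCDEF));
}
```

The ordering matches the big-endian layout used elsewhere in this environment: the most significant word comes first.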
Table 6 lists the 32-bit PowerPC architecture registers used in this environment and their volatility in function calls. Registers that must preserve their value after a function call are called nonvolatile.
Type | Name | Preserved | Notes |
---|---|---|---|
General-purpose register | GPR0 | No | |
| GPR1 | Yes | Used as the stack pointer to store parameters and other temporary data items. |
| GPR2 | No | Available for general use. |
| GPR3 | No | The caller passes parameter values to the called procedure in GPR3 through GPR10. The caller may also pass the address to storage where the callee places its return value in this register. |
| GPR4–GPR10 | No | Used by callers to pass parameter values to called functions (see notes for GPR3). |
| GPR11 | Yes in nested functions. No in leaf functions. | In nested functions, the caller passes its stack frame to the nested function in this register. In leaf functions, the register is available. For details on nested functions, see the GCC documentation. This register is also used by lazy stubs in dynamic code generation to point to the lazy pointer. |
| GPR12 | No | Set to the address of the branch target before an indirect call for dynamic code generation. This register is not set for a function that has been called directly; therefore, functions that may be called directly should not depend on this register being set up correctly. See Mach-O Programming Topics for more information. |
| GPR13–GPR31 | Yes | |
Floating-point register | FPR0 | No | |
| FPR1–FPR13 | No | Used to pass floating-point parameters in function calls. |
| FPR14–FPR31 | Yes | |
Vector register | V0–V19 | No | The caller passes vector parameters in V2 to V13 during a function call. |
| V20–V31 | Yes | |
Special-purpose vector register | VRSAVE | Yes | 32-bit special-purpose register. Each bit in this register indicates whether the corresponding vector register must be saved during a thread or process context switch. |
Link register | LR | No | Stores the return address of the calling routine that called the current subroutine. |
Count register | CTR | No | |
Fixed-point exception register | XER | No | |
Condition register fields | CR0, CR1 | No | |
| CR2–CR4 | Yes | |
| CR5–CR7 | No | |
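The register rows of Table 6 can be condensed into a few predicates, which is sometimes convenient when writing or checking hand-coded assembly. This is a sketch only; the function names are hypothetical, and GPR11 is handled with an extra flag because its status depends on whether the code is a nested function.

```c
#include <assert.h>
#include <stdbool.h>

/* True if GPRn must keep its value across a function call (per Table 6). */
bool gpr_is_preserved(int n, bool in_nested_function) {
    assert(n >= 0 && n <= 31);
    if (n == 1)  return true;                 /* stack pointer */
    if (n == 11) return in_nested_function;   /* context dependent */
    return n >= 13;                           /* GPR13-GPR31 */
}

/* True if FPRn must keep its value across a function call. */
bool fpr_is_preserved(int n) {
    assert(n >= 0 && n <= 31);
    return n >= 14;                           /* FPR14-FPR31 */
}

/* True if vector register Vn must keep its value across a function call. */
bool vr_is_preserved(int n) {
    assert(n >= 0 && n <= 31);
    return n >= 20;                           /* V20-V31 */
}

void preservation_example(void) {
    assert(!gpr_is_preserved(3, false));   /* parameter register */
    assert(gpr_is_preserved(31, false));
    assert(!fpr_is_preserved(13));
    assert(vr_is_preserved(20));
}
```

Counting the registers for which these predicates return true from GPR13 and FPR14 upward gives nineteen and eighteen, the same counts used to size the red zone.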
© 2009 Apple Inc. All Rights Reserved. (Last updated: 2009-02-04)