An important improvement you can make to your application’s performance is to reduce the number of virtual memory pages used by the application at any given time. This set of pages is referred to as the working set and consists of application code and runtime data. Reducing the amount of in-memory data is a function of your algorithms, but reducing the amount of in-memory code can be achieved by a process called scatter loading. This technique is also referred to as improving the locality of reference of your code.
Normally, compiled code for methods and functions is organized by source file in the resulting binary. Scatter loading changes this organization and instead groups related methods and functions together, independent of the original locations of those methods and functions. This process allows the kernel to keep an active application’s most frequently referenced executable pages in the smallest memory space possible. Not only does this reduce the footprint of the application, it reduces the likelihood of those pages being paged out.
Important: You should generally wait until very late in the development cycle to scatter load your application. Code tends to get moved around during development, which can invalidate prior profiling results.
Profiling Code With gprof
Profiling With the Monitor Functions
Organizing Code at Compile Time
Reordering the __text Section
Reordering Other Sections
Reordering Assembly Language Code
Given profiling data collected at runtime, gprof
produces an execution profile of a program. The effect of called routines is incorporated in the profile of each caller. The profile data is taken from the call graph profile file (gmon.out
by default), which is created by a program compiled and linked with the -pg
option. The symbol table in the executable is correlated with the call graph profile file. If more than one profile file is specified, the gprof
output shows the sum of the profile information in the given profile files.
The gprof
tool is useful for many purposes, including the following:
cases where the Sampler application doesn’t work well, such as command-line tools or applications that quit after a short period of time
cases where you want a call graph that includes all the code that might be called in a given program, rather than a periodic sampling of calls
cases where you want to change the link order of your code to optimize the code locality
Before you can profile your application, you must set up your project to generate profiling information. To generate profiling information for your Xcode project, you must modify your target or build-style settings to include the “Generate profiling code” option. (See the Xcode Help for information on enabling target and build-style settings.)
The profiling code inside your program generates a file called gmon.out
with the profiling information. (Usually, this file is placed in the current working directory.) To analyze the data in this file, copy it to the directory containing your executable prior to calling gprof
, or just specify the path to gmon.out
when you run gprof
.
In addition to profiling your own code, you can find out how much time is spent in Carbon and Cocoa frameworks by linking against profile versions of those frameworks. To do this, add the DYLD_IMAGE_SUFFIX
setting to your target or build style and set its value to _profile
. The dynamic linker combines this suffix with the framework name to link against the profile version of the framework. To determine which frameworks support profiling, look at the frameworks themselves. For example, the Carbon library comes with both profile and debug versions.
Note: Profile and debug versions of libraries are installed as part of the developer tools package and may not be available on user systems. Make sure your shipping executable does not link against one of these libraries.
An order file contains an ordered sequence of lines, each line consisting of a source file name and a symbol name, separated by a colon with no other white space. Each line represents a block to be placed in a section of the executable. If you modify the file by hand, you must follow this format exactly so the linker can process the file. If the object file name:symbol name pair is not exactly the name seen by the linker, it tries its best to match up the names with the objects being linked.
The lines in an order file for procedure reordering consist of an object filename and procedure name (function, method, or other symbol). The sequence in which procedures are listed in the order file represents the sequence in which they are linked into the __text
section of the executable.
To create an order file from the profiling data generated by using a program, run gprof
using the -S
option (see the man page for gprof (1)
). For example,
gprof -S MyApp.profile/MyApp gmon.out |
The -S
option produces four mutually exclusive order files:
|
Ordering based on a “closest is best” analysis of the profiling call graph. Calls that call each other frequently are placed close together. |
|
Routines sorted by the number of calls made to each routine, largest numbers first. |
|
Routines sorted by the order in which they are called. |
|
Routines sorted by the amount of CPU time spent, largest times first. |
You should try using each of these files to see which provides the largest performance improvement, if any. See “Using pagestuff to Examine Pages on Disk” for a discussion of how to measure the results of the ordering.
These order files contain only those procedures used during profiling. The linker keeps track of missing procedures and links them in their default order after those listed in the order file. Static names for library functions are generated in the order file only if the project directory contains a file generated by the linker’s -whatsloaded
option; see “Creating a Default Order File” for details.
The gprof -S
option doesn’t work with executables that have already been linked using an order file.
After you generate your order files, you should look through them and make sure they are correct. There are a number of cases where you need to edit your order files manually, including the following:
Your executable contained assembly-language files.
You profiled a stripped executable.
Your executable contained files compiled without the -g
option.
Your project defines internal labels (typically for goto
statements).
You want to preserve the order of routines in a particular object file.
If the definition of a symbol is located in an assembly file, a stripped executable file, or a file compiled without the -g
option, gprof
omits the source file name from the symbol’s entry in the order file. If your project uses such files, you must edit the order file manually and add the appropriate source filenames. Alternatively, you can delete the symbol references altogether to force the corresponding routines to be linked in default order.
If your code contains internal labels, you must remove those labels from the order files; otherwise, the function that defines the label will be split apart during the linking phase. You can prevent the inclusion of internal labels in assembly files altogether by prefixing them with the string L_
. The assembler interprets symbols with this prefix as local to a particular function and strips them to prevent access by other tools such as gprof
.
To preserve the order of routines in a particular object file, use the special symbol .section_all
. For example, if the object file foo.o
comes from assembly source and you want to link all of the routines without reordering them, delete any existing references to foo.o
and insert the following line in the order file:
foo.o:.section_all |
This option is useful for object files compiled from assembly source, or for which you don’t have the source.
Once you’ve generated an order file, you can link the program using the -sectorder
and -e
start
options:
cc -o
outputFile inputFile.o …
-sectorder __TEXT __text
orderFile -e
start
To use an order file with a Xcode project, modify the “Other Linker Flags” option in the Deployment build style of your project. Add the text -sectorder __TEXT __text
orderFile to this setting to specify your order file.
If any inputFile is a library rather than an object file, you may need to edit the order file before linking to replace all references to the object file with references to the appropriate library file. Again, the linker does its best to match names in the order file with the sources it is editing.
With these options, the linker constructs the executable file outputFile so that the contents of the __TEXT
segment’s __text
section are constructed from blocks of the input files’ __text
sections. The linker arranges the routines in the input files in the order listed in orderFile.
As the linker processes the order file, it places the procedures whose object-file and symbol-name pairs aren’t listed in the order file into the __text
section of outputFile. It links these symbols in the default order. Object-file and symbol-name pairs listed more than once always generate a warning, and the first occurrence of the pair is used.
By default, the linker prints a summary of the number of symbol names in the linked objects that are not in the order file, the number of symbol names in the order file that are not in the linked objects, and the number of symbol names it tried to match that were ambiguous. To request a detailed listing of these symbols, use the -sectorder_detail
option.
The linker’s -e start
option preserves the executable’s entry point. The symbol start
(note the lack of a leading “_
”) is defined in the C runtime shared library /usr/bin/crt1.o
; it represents the first text address in your program when you link normally. When you reorder your procedures, you have to use this option to fix the entry point. Another way to do this is to make the line /usr/lib/crt1.o:start
or /usr/lib/crt1.o:section_all
the first line of your order file.
The .order
files generated by gprof
contain only those functions that were called or sampled when the executable was run. For library functions to appear correctly in the order file, a whatsloaded
file produced by the linker should exist in the working directory.
The -S
option does not work with executables that have already been linked with an order file.
Production of the gmon.order
file can take a long time—it can be suppressed with the -x
parameter.
Filenames will be missing for the following items:
files compiled without the -g
parameter
routines generated from assembly-language source
executables that have had their debugging symbols removed (as with the strip
tool)
The file /usr/include/monitor.h
declares a set of functions that you can use to programmatically profile specific sections of your code. You can use these functions to gather statistics only for certain sections of your code, or for all of your code. You can then use the gprof
tool to build call graph and other performance analysis data from the resulting file. Listing 1 shows how to use the monitor functions.
Listing 1 Using monitor functions
#include <monitor.h> |
/* To start profiling: */ |
moninit(); |
moncontrol(1); |
/* To stop, and dump to a file */ |
moncontrol(0); |
monoutput("/tmp/myprofiledata.out"); |
monreset(); |
The GCC compiler lets you specify attributes on any function or variable you declare. The section
attribute lets you tell GCC which segment and section you want a particular piece of code to be placed.
Warning:
Do not use section
attributes unless you understand the structure of Mach-O executables and know the rules for placing functions and data in the corresponding segments. Placing a function or global variable in an inappropriate section may break your program.
The section
attribute takes several parameters that control where the resulting code is placed. At a minimum, you must specify a segment and section name for the code you want to place. Other options are also available and are described in the GCC documentation.
The following listing shows how you use the section
attribute for a function. In this example, the section
attribute is added to a forward declaration of the function. The attribute tells the compiler to place the function in a specific __text
section of the executable.
void MyFunction (int a) __attribute__((section("__TEXT,__text.10"))); |
The following listing shows some examples of how you can organize your global variables using the section
attribute.
extern const int x __attribute__((section("__TEXT,__my_const"))); |
const int x=2; |
extern char foo_string[] __attribute__((section("__DATA,__my_data"))); |
char foo_string[] = "My text string\n"; |
For detailed information on specifying section
attributes, see the documentation for the GCC compiler in /Developer/Documentation/DeveloperTools/gcc3
.
As described in “Overview of the Mach-O Executable Format,” the __TEXT
segment holds the actual code and read-only portions of your program. The compiler tools, by convention, place procedures from your Mach-O object files (with extension .o
) in the __text
section of the __TEXT
segment.
As your program runs, pages from the __text
section are loaded into memory on demand, as routines on these pages are used. Code is linked into the __text
section in the order in which it appears in a given source file, and source files are linked in the order in which they are listed on the linker command line (or in the order specifiable in Xcode). Thus, code from the first object file is linked from start to finish, followed by code from the second file and third file and so on.
Loading code in the source file declaration order is rarely optimal. For example, suppose that certain methods or functions in your code are invoked repeatedly, while others are seldom used. Reordering the procedures to place frequently used code at the beginning of the __text
section minimizes the average number of pages your application uses and thereby reduces paging activity.
As another example, suppose that all the objects defined by your code are initialized at the same time. Because the initialization routine for each class is defined in a separate source file, the initialization code is ordinarily distributed across the __text
section. By contiguously reordering initialization code for all classes, you reduce the number of pages that need to be read in, enhancing initialization performance. The application requires just the small number of pages containing initialization code, rather than a larger number of pages, each containing a small piece of initialization code.
Depending on the size and complexity of your application, you should pursue a strategy for ordering code that best improves the performance of your executable. Like most performance tuning, the more time you spend measuring and retuning your procedure order, the more memory you save. You can easily obtain a good first-cut ordering by running your application and sorting the routines by call frequency. The steps for this strategy are listed below and explained in more detail in the sections that follow:
Build a profile version of your application. This step generates an executable containing symbols used in the profiling and reordering procedures.
Run and use the application to create a set of profile data. Perform a series of functional tests, or have someone use the program for a test period.
Create order files. Order files list procedures in optimized order. The linker uses order files to reorder procedures in the executable.
Run the linker using the order files. This creates an executable with procedures linked into the __text
section as specified in the order file.
For information on profiling your code and generating and linking an order file, see “Profiling Code With gprof.”
For many programs, the ordering generated by the steps just described brings a substantial improvement over unordered procedures. For a simple application with few features, such an ordering represents most of the gains to be had by procedure reordering. However, larger applications and other large programs can benefit greatly from additional analysis. While order files based on call frequency or the call graph are a good start, you can use your knowledge of the structure of your application to further reduce the virtual-memory working set.
If you want to reorder an application’s procedures using techniques other than those described above, you may want to skip the profiling steps and just start with a default order file that lists all the routines of your application. Once you have a list of the routines in suitable form, you can then rearrange the entries by hand or by using a sorting technique of your choice. You can then use the resulting order file with the linker’s -sectorder
option as described in “Linking with an Order File.”
To create a default order file, first run the linker with the -whatsloaded
option:
cc -o
outputFile inputFile.o -whatsloaded >
loadedFile
This creates a file, loadedFile, that lists the object files loaded in the executable, including any in frameworks or other libraries. The -whatsloaded
option can also be used to make sure that order files generated by gprof
-S
include names for procedures in static libraries.
Using the file loadedFile, you can run nm
with the -onjls
options and the __TEXT __text
argument:
nm -onjls __TEXT __text `cat
loadedFile`
>
orderFile
The content of the file orderFile is the symbol table for the text section. Procedures are listed in the symbol table in their default link order. You can rearrange entries in this file to change the order in which you want procedures to be linked, then run the linker as described in “Linking with an Order File.”
The pagestuff
tool helps you measure the effectiveness of your procedure ordering by telling you which pages of the executable file are likely to be loaded in memory at a given time. This section briefly describes how to use this tool. See the pagestuff
man page for more information.
The pagestuff
tool prints out the symbols on a particular page of executable code. The following is the syntax for the command:
pagestuff
filename [pageNumber | -a
]
The output of pagestuff
is a list of procedures contained in filename on page pageNumber. To view all the pages of the file, use the -a
option in place of the page number. This output allows you to determine if each page associated with the file in memory is optimized. If it isn’t, you can rearrange entries in the order file and link the executable again to maximize performance gains. For example, move two related procedures together so they are linked on the same page. Perfecting the ordering may require several cycles of linking and tuning.
Why generate profile data for individual operations of your application? The strategy is based on the assumption that a large application has three general groups of routines:
Hot routines run during the most common usage of the application. These are often primitive routines that provide a foundation for the application’s features (for example, routines for accessing a document’s data structures) or routines that implement the core features of an application, such as routines that implement typing in a word processor. These routines should be clustered together in the same set of pages.
Warm routines implement specific features of the application. Warm routines are usually associated with particular features that user performs only occasionally (such as launching, printing, or importing graphics). Because these routines are used reasonably often, cluster them in the same small set of pages so they will load quickly. However, because there are long periods when users aren’t accessing this functionality, these routines should not be located in the hot category.
Cold routines are rarely used in the application. Cold routines implement obscure features or cover boundary or error cases. Group these routines together to avoid wasting space on a hot or warm page.
At any given time, you should expect most of the hot pages to be resident, and you should expect the warm pages to be resident for the features that the user is currently using. Only very rarely should a cold page be resident.
To achieve this ideal ordering, gather a number of profile data sets. First, gather the hot routines. As described above, compile the application for profiling, launch it, and use the program. Using gprof
-S
, generate a frequency sorted order file called hot.order
from the profile data.
After creating a hot order file, create order files for features that users occasionally use, such as routines that only run when the application is launched. Printing, opening documents, importing images and using various non-document windows and tools are other examples of features that users use occasionally but not continually, and are good candidates for having their own order files. Naming these order files after the feature being profiled (for example, feature.order
) is recommended.
Finally, to generate a list of all routines, build a “default” order file default.order
(as described in “Reordering Procedures”).
Once you have these order files, you can combine them using the code shown in Listing 2. You can use this listing to build a command-line utility that removes duplicate lines in the order files while retaining the ordering of the original data.
// |
// unique |
// |
// A command for combining files while removing |
// duplicate lines of text. The order of other lines of text |
// in the input files is preserved. |
// |
// Build using this command line: |
// |
// cc -ObjC -O -o unique -framework Foundation Unique.c |
// |
// Note that "unique" differs from the BSD command "uniq" in that |
// "uniq" combines duplicate adjacent lines, while "unique" does not |
// require duplicate lines to be adjacent. "unique" is also spelled |
// correctly. |
// |
#import <stdio.h> |
#import <string.h> |
#import <Foundation/NSSet.h> |
#import <Foundation/NSData.h> |
#define kBufferSize 8*1024 |
void ProcessFile(FILE *fp) |
{ |
char buf[ kBufferSize ]; |
static id theSet = nil; |
if( theSet == nil ) |
{ |
theSet = [[NSMutableSet alloc] init]; |
} |
while( fgets(buf, kBufferSize, fp) ) |
{ |
id dataForString; |
dataForString = [[NSData alloc] initWithBytes:buf length:strlen(buf)]; |
if( ! [theSet containsObject:dataForString] ) |
{ |
[theSet addObject:dataForString]; |
fputs(buf, stdout); |
} |
[dataForString release]; |
} |
} |
int main( int argc, char *argv[] ) |
{ |
int i; |
FILE * theFile; |
int status = 0; |
if( argc > 1 ) |
{ |
for( i = 1; i < argc; i++ ) |
{ |
if( theFile = fopen( argv[i], "r" ) ) |
{ |
ProcessFile( theFile ); |
fclose( theFile ); |
} |
else |
{ |
fprintf( stderr, "Could not open ‘%s’\n", argv[i] ); |
status = 1; |
break; |
} |
} |
} |
else |
{ |
ProcessFile( stdin ); |
} |
return status; |
} |
Once built, you would use the program to generate your final order file with syntax similar to the following:
unique hot.order feature1.order ... featureN.order default.order > final.order |
Of course, the real test of the ordering is the amount by which paging I/O is reduced. Run your application, use different features, and examine how well your ordering file is performing under different conditions. You can use the top
tool (among others) to measure paging performance.
After reordering you will usually have a region of pages with cold routines that you expect to be rarely used, often at the end of your text ordering. However, one or two hot routines might slip through the cracks and land in this cold section. This is a costly mistake, because using one of these hot routines now requires an entire page to be resident, a page that is otherwise filled with cold routines that are not likely to be used.
Check that the cold pages of your executable are not being paged in unexpectedly. Look for pages that are resident with high-page offsets in the cold region of your application’s text segment. If there is an unwanted page, you need to find out what routine on that page is being called. One way to do this is to profile during the particular operation that is touching that page, and use the grep
tool to search the profiler output for routines that reside on that page. Alternatively, a quick way to identify the location where a page is being touched is to run the application under the gdb
debugger and use the Mach call vm_protect
to disallow all access to that page:
(gdb) p vm_protect(task_self(), startpage_addr, vm_page_size, FALSE, 0); |
After clearing the page protections, any access to that page causes a memory fault, which breaks the program in the debugger. At this point you can simply look at the function call stack (using the bt
command) to learn why the routine was being called.
You can use the -sectorder
option of the linker to organize blocks in most sections of the executable. Sections that might occasionally benefit from reordering are literal sections, such as the __TEXT
segment’s __cstring
section, and the __DATA
segment’s __data
section.
The lines in the order file for literal sections can most easily be produced with the ld
and otool
tools. For literal sections, otool
creates a specific type of order file for each type of literal section:
For C string literal sections, the order-file format is one literal C string per line (with ANSI C escape sequences allowed in the C string). For example, a line might look like
Hello world\n |
For 4-byte literal sections, the order-file format is one 32-bit hex number with a leading 0x per line with the rest of the line treated as a comment. For example, a line might look like
0x3f8ccccd (1.10000002384185790000e+00) |
For 8-byte literal sections, the order file line consists of two 32-bit hexadecimal numbers per line separated by white space each with a leading 0x, with the rest of the line treated as a comment. For example, a line might look like:
0x3ff00000 0x00000000 (1.00000000000000000000e+00) |
For literal pointer sections, the format of the lines in the order file represents the pointers, one per line. A literal pointer is represented by the segment name, the section name of the literal pointer, and then the literal itself. These are separated by colons with no extra white space. For example, a line might look like:
__OBJC:__selector_strs:new |
For all the literal sections, each line in the order file is simply entered into the literal section and appears in the output file in the order of the order file. No check is made to see if the literal is in the loaded objects.
To reorder a literal section, first create a “whatsloaded” file using the ld
-whatsloaded
option as described in section “Creating a Default Order File.” Then, run otool
with the appropriate options, segment and section names, and filenames. The output of otool
is a default order file for the specified section. For example, the following command line produces an order file listing the default load order for the __TEXT
segment’s __cstring
section in the file cstring_order
:
otool -X -v -s __TEXT __cstring `cat whatsloaded` > cstring_order |
Once you’ve created the file cstring_order
, you can edit the file and rearrange its entries to optimize locality of reference. For example, you can place literal strings used most frequently by your program (such as labels that appear in your user interface) at the beginning of the file. To produce the desired load order in the executable, use the following command:
cc -o hello hello.o -sectorder __TEXT __cstring cstring_order |
There are currently no tools to measure code references to data symbols. However, you might know a program’s data-referencing patterns and might be able to get some savings by separating data for seldom-used features from other data. One way to approach __data
section reordering is to sort the data by size so small data items end up on as few pages as possible. For example, if a larger data item is placed across two pages with two small items sharing each of these pages, the larger item must be paged in to access the smaller items. Reordering the data by size can minimize this sort of inefficiency. Because this data would normally need to be written to the virtual-memory backing store, this could be a major savings in some programs.
To reorder the __data
section, first create an order file listing source files and symbols in the order in which you want them linked (order file entries are described at the beginning of “Generating Order Files”). Then, link the program using the -sectorder
command-line option:
cc -o
outputFile inputFile.o …
-sectorder __DATA __data
orderFile -e
start
To use an order file with a Xcode project, modify the “Other Linker Flags” option in the Deployment build style of your project. Add the text -sectorder __DATA __data
orderFile to this setting to specify your order file.
Some additional guidelines to keep in mind when reordering routines coded in assembly language:
temporary labels in assembly code
Within hand-coded assembly code, be careful of branches to temporary labels that branch over a non temporary label. For example, if you use a label that starts with “L” or a d label (where d is a digit), as in this example
foo: b 1f |
... |
bar: ... |
1: ... |
The resulting program won’t link or execute correctly, because only the symbols foo
and bar
make it into the object file’s symbol table. References to the temporary label 1
are compiled as offsets; as a result, no relocation entry is generated for the instruction b
1f
. If the linker does not place the block associated with the symbol bar
directly after that associated with foo
, the branch to 1f
will not go to the correct place. Because there is no relocation entry, the linker doesn’t know to fix up the branch. The source-code change to fix this problem is to change the label 1
to a non temporary label (bar1
for example). You can avoid problems with object files containing hand-coded assembly code by linking them whole, without reordering.
the pseudo-symbol .section_start
If the specified section in any input file has a non-zero size and there is no symbol with the value of the beginning of its section, the linker uses the pseudo symbol .section_start
as the symbol name it associates with the first block in the section. The purpose of this symbol is to deal with literal constants whose symbols do not persist into the object file. Because literal strings and floating-point constants are in literal sections, this not a problem for Apple compilers. You might see this symbol used by assembly-language programs or non-Apple compilers. However, you should not reorder such code and you should instead link the file whole, without reordering (see“Linking with an Order File”).
© 2003, 2006 Apple Computer, Inc. All Rights Reserved. (Last updated: 2006-06-28)