Mach-O is the native executable format of binaries in Mac OS X and is the preferred format for shipping code. An executable format determines the order in which the code and data in a binary file are read into memory. The ordering of code and data has implications for memory usage and paging activity and thus directly affects the performance of your program.
A Mach-O binary is organized into segments. Each segment contains one or more sections, and each section holds code or data of a particular type. Segments always start on a page boundary, but sections are not necessarily page-aligned. The size of a segment is the total number of bytes in all the sections it contains, rounded up to the next virtual memory page boundary. Thus, on a system with 4096-byte (4 KB) virtual memory pages, a segment is always a multiple of 4096 bytes, and 4096 bytes is the minimum size.
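The rounding rule can be sketched in C. This is a minimal illustration that assumes a power-of-two page size and ignores any alignment padding the linker may insert between sections; the function names are hypothetical and not part of any Apple API.

```c
#include <stddef.h>

/* Round bytes up to the next multiple of page_size.
 * Assumes page_size is a power of two (4096 on the systems described here). */
static size_t round_up_to_page(size_t bytes, size_t page_size)
{
    return (bytes + page_size - 1) & ~(page_size - 1);
}

/* Hypothetical helper: virtual memory size of a segment whose sections have
 * the given sizes. Padding between sections is ignored for simplicity. */
size_t segment_vm_size(const size_t *section_sizes, size_t count, size_t page_size)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += section_sizes[i];   /* sum the bytes in every section */
    return round_up_to_page(total, page_size);
}
```

With 4096-byte pages, sections of 100, 200, and 50 bytes (350 bytes in all) produce a one-page, 4096-byte segment, while sections of 4000 and 200 bytes (4200 bytes) spill into a second page, for 8192 bytes.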
The segments and sections of a Mach-O executable are named according to their intended use. The convention for segment names is to use all-uppercase letters preceded by double underscores (for example, __TEXT); the convention for section names is to use all-lowercase letters preceded by double underscores (for example, __text).
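The naming conventions are simple enough to check mechanically. The following C sketch uses hypothetical helper names. It accepts underscores and digits after the leading double underscore (real section names such as __la_symbol_ptr contain underscores) and enforces the 16-character limit that comes from the fixed-size name fields in Mach-O load commands. It checks naming style only.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Mach-O load commands store segment and section names in 16-byte fields. */
#define MACHO_NAME_MAX 16

/* True if name is "__" followed by letters of the requested case.
 * Underscores and digits are accepted after the prefix. */
static bool follows_name_convention(const char *name, bool uppercase)
{
    size_t len = strlen(name);
    if (len <= 2 || len > MACHO_NAME_MAX || strncmp(name, "__", 2) != 0)
        return false;
    for (size_t i = 2; i < len; i++) {
        unsigned char c = (unsigned char)name[i];
        if (c == '_' || isdigit(c))
            continue;
        if (uppercase ? !isupper(c) : !islower(c))
            return false;
    }
    return true;
}

/* Hypothetical helpers for the two conventions. */
bool is_conventional_segment_name(const char *name) { return follows_name_convention(name, true); }
bool is_conventional_section_name(const char *name) { return follows_name_convention(name, false); }
```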
There are several possible segments within a Mach-O executable, but only two of them are of interest in relation to performance: the __TEXT segment and the __DATA segment.
The __TEXT Segment: Read Only
The __DATA Segment: Read/Write
Mach-O Performance Implications
The __TEXT Segment: Read Only
The __TEXT segment is a read-only area containing executable code and constant data. By convention, the compiler tools create every executable file with at least one read-only __TEXT segment. Because the segment is read-only, the kernel can map the __TEXT segment directly from the executable into memory just once. Once the segment is mapped into memory, it can be shared among all processes interested in its contents. (This is primarily the case with frameworks and other shared libraries.) The read-only attribute also means that the pages that make up the __TEXT segment never have to be saved to backing store. If the kernel needs to free up physical memory, it can discard one or more __TEXT pages and re-read them from disk when they are needed.
Table 1 lists some of the more important sections that can appear in the __TEXT segment. For a complete list of sections, see Mach-O Runtime Architecture.
The __DATA Segment: Read/Write
The __DATA segment contains the non-constant data for an executable. This segment is both readable and writable. Because it is writable, the __DATA segment of a framework or other shared library is logically copied for each process linking with the library. The kernel marks these readable and writable pages copy-on-write. This technique defers copying a page until one of the processes sharing that page attempts to write to it. When that happens, the kernel creates a private copy of the page for that process.
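The copy-on-write behavior can be observed with POSIX fork(), which gives the child process copy-on-write access to the parent’s writable pages. The C sketch below is an analogy rather than a framework example: a writable global, which on Mach-O would live in the __DATA segment, is modified by the child, and the parent still sees its original value. The program can observe only that the write stayed private, not when the kernel actually copied the page. The function name cow_demo is hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* A writable, initialized global: on Mach-O it is placed in the __DATA segment. */
static int shared_counter = 42;

/* Returns the value the parent sees after its child has written to
 * shared_counter, or -1 on error. */
int cow_demo(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: this write lands in the child's private copy of the page. */
        shared_counter = 99;
        _exit(shared_counter == 99 ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    /* Parent: still 42, because the child's write did not affect this copy. */
    return shared_counter;
}
```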
The __DATA segment has a number of sections, some of which are used only by the dynamic linker. Table 2 lists some of the more important sections that can appear in the __DATA segment. For a complete list of sections, see Mach-O Runtime Architecture.
Mach-O Performance Implications
The composition of the __TEXT and __DATA segments of a Mach-O executable file has a direct bearing on performance. The techniques and goals for optimizing these segments are different; however, they share a common goal: greater efficiency in the use of memory.
Most of a typical Mach-O file consists of executable code, which occupies the __TEXT,__text section. As noted in “The __TEXT Segment: Read Only,” the __TEXT segment is read-only and is mapped directly to the executable file. Thus, if the kernel needs to reclaim the physical memory occupied by some __text pages, it does not have to save the pages to backing store and page them in later. It only needs to free up the memory and, when the code is later referenced, read it back in from disk. Although this is cheaper than swapping, because it involves one disk access instead of two, it can still be expensive, especially if many pages have to be recreated from disk.
One way to improve this situation is to improve your code’s locality of reference through procedure reordering, as described in “Improving Locality of Reference.” This technique groups methods and functions together based on the order in which they are executed, how often they are called, and the frequency with which they call one another. If pages in the __text section group functions logically in this way, they are less likely to be freed and read back in multiple times. For example, if you put all of your launch-time initialization functions on one or two pages, those pages do not have to be recreated after the initializations have occurred.
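The effect of grouping can be estimated by counting how many distinct pages a sequence of calls touches, since each touched page that the kernel has reclaimed costs a read from disk. The C sketch below is a simplified model, not the toolchain’s reordering mechanism: the function pages_touched, its parameters, and the layouts in the usage note are hypothetical, and pages beyond the first MAX_PAGES are ignored.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PAGES 64  /* hypothetical cap on tracked pages for this sketch */

/* Count the distinct pages of the __text section touched by a call trace.
 * offsets[f] is the byte offset of function f in the section and sizes[f]
 * its size in bytes (must be nonzero). A function that straddles a page
 * boundary touches every page it overlaps. */
size_t pages_touched(const size_t *offsets, const size_t *sizes,
                     const int *trace, size_t trace_len, size_t page_size)
{
    bool seen[MAX_PAGES] = { false };
    size_t count = 0;
    for (size_t i = 0; i < trace_len; i++) {
        int f = trace[i];
        size_t first = offsets[f] / page_size;
        size_t last = (offsets[f] + sizes[f] - 1) / page_size;
        for (size_t p = first; p <= last && p < MAX_PAGES; p++) {
            if (!seen[p]) {   /* count each page only once */
                seen[p] = true;
                count++;
            }
        }
    }
    return count;
}
```

For four 1000-byte launch functions called in the order 0, 1, 2, 3, 0, placing them at offsets 0, 8192, 16384, and 24576 touches four 4096-byte pages, whereas placing them back to back at offsets 0, 1000, 2000, and 3000 touches only one.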
Unlike the __TEXT segment, the __DATA segment can be written to, and thus the pages in the __DATA segment are not shareable once a process writes to them. The non-constant global variables in frameworks can have an impact on performance because each process that links with the framework gets its own copy of these variables. The main solution to this problem is to move as many global variables as possible to the __TEXT,__const section by declaring them const. “Reducing Shared Memory Pages” describes this and related techniques. This is not usually a problem for applications because the __DATA segment in an application is not shared with other applications.
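The following C sketch contrasts writable and constant global tables. On Mach-O, compilers typically place fully constant, initialized data such as retry_limits_const in __TEXT,__const, while the writable mutable_retry_limits goes in the __DATA segment; exact placement depends on the compiler and its options. An array of string pointers needs const on both the characters and the pointers, and because those pointers require relocation in position-independent code, the compiler may still place such a table in a __DATA section rather than in __TEXT. All names are hypothetical.

```c
#include <stddef.h>

/* Writable: every process that links the framework gets a private copy (__DATA). */
int mutable_retry_limits[] = { 1, 2, 4, 8 };

/* Constant and fully initialized: a candidate for the shared __TEXT,__const section. */
static const int retry_limits_const[] = { 1, 2, 4, 8 };

/* Both the strings and the pointer array are const. The pointers still need
 * relocation in position-independent code, so the compiler may emit this
 * table into a __DATA section instead of __TEXT. */
static const char *const level_names[] = { "low", "medium", "high" };

int retry_limit(size_t i) { return retry_limits_const[i]; }
const char *level_name(size_t i) { return level_names[i]; }
```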
The compiler stores different types of non-constant global data in different sections of the __DATA segment. Two such types are uninitialized static data and symbols consistent with the ANSI C notion of a “tentative definition” that aren’t declared extern. Uninitialized static data is in the __bss section of the __DATA segment. Tentative-definition symbols are in the __common section of the __DATA segment.
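The two kinds of declarations look like this in C. The sketch reflects the placement described above, with one caveat: recent compilers (GCC 10 and Clang 11 onward) default to -fno-common, under which a tentative definition such as cache_hits is typically emitted into __bss rather than __common unless you compile with -fcommon. Both variables have static storage duration, so both start at zero. The names are hypothetical.

```c
/* Uninitialized static data: placed in the __bss section of __DATA. */
static int launch_count;

/* File-scope tentative definition, not declared extern: traditionally placed
 * in the __common section of __DATA (with -fcommon). */
int cache_hits;

/* Hypothetical accessor; launch_count starts at zero, so the first call returns 1. */
int record_launch(void)
{
    return ++launch_count;
}
```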
The ANSI C and C++ standards specify that the system must set uninitialized static variables to zero. (Uninitialized local, or automatic, variables are not zeroed.) Because uninitialized static variables and tentative-definition symbols are stored in separate sections, the system needs to treat them differently. But when variables are in different sections, they are more likely to end up on different memory pages and thus can be swapped in and out separately, which can slow your code. The solution to these problems, as described in “Reducing Shared Memory Pages,” is to consolidate the non-constant global data in one section of the __DATA segment.
© 2003, 2006 Apple Computer, Inc. All Rights Reserved. (Last updated: 2006-06-28)