This chapter offers practical advice on how to tune your programs. It suggests areas you should monitor with the performance tools and provides a list of practical tips for improving performance.
Common Areas to Monitor
Fundamental Optimization Tips
Many performance problems can be traced to specific parts of your program. As you design and implement your code, you should monitor those areas to make sure they meet the performance targets you set.
As you design your program, consider the tasks or workflows that users will encounter the most. During your implementation phase, be sure to monitor the code for those tasks and make sure their performance does not drop below acceptable levels. If it does, you should take immediate action to correct the problems.
The key tasks performed by a program vary from program to program. For example, a word processor might need to be fast during text input and display, while a file utility program would need to be fast at scanning the files and directories on a hard disk. It is up to you to decide which tasks your users are most likely to perform.
For information on how to identify and fix slow operations in your program, see Code Speed Performance Guidelines.
Most programs do some amount of drawing. If your program uses only standard windows and controls, then you probably do not need to worry too much about drawing performance. However, if you do any custom drawing, you need to monitor your drawing code and make sure it is performing at acceptable levels. In particular, if you support any of the following, you should investigate ways to optimize your drawing code.
Live resizing
Custom view drawing code, especially if portions of the view can be updated without updating the whole view
Textured graphics
Entirely opaque views
For information on how to optimize drawing performance, see Drawing Performance Guidelines.
Launch time is the time when you initialize your program’s data structures and prepare to receive user input. However, many programs do much more work at launch time than is necessary. In many cases, tasks performed at launch time can be deferred until after the application has started processing user events. This deferral gives the user the perception that your application is fast, which is a good first impression to make.
For applications that need to run in Mac OS X version 10.3.3 and earlier, another way to improve launch times is to prebind your application. Prebinding involves precalculating library address ranges and storing those values in your application binary. This step eliminates the need for the dynamic loader (dyld) to calculate those address ranges at launch time. Improvements in dyld for Mac OS X version 10.3.4 make prebinding largely unnecessary in that and later releases.
For information on how to improve launch-time performance, see Launch Time Performance Guidelines.
The file system is a bottleneck for getting information into memory and the CPU. In the time it takes to access a file, tens of millions of instructions may be executed. It is therefore imperative that you examine the way your program uses files to be sure that the files you use are needed and are used properly.
Minimizing the number of files you use is one way to improve file-related performance. When you must access files, do so judiciously and keep the following in mind:
Understand how the system caches work and know how to optimize the use of those caches. Avoid caching data unless you plan to refer to it more than once.
Read and write data sequentially whenever possible. Jumping around a file takes extra time to seek to the new location.
Read larger blocks of data from files whenever possible, keeping in mind that reading too much data at once might cause different problems. For example, reading the entire contents of a 32 MB file might trigger paging of those contents before the operation is complete.
Avoid closing and reopening files unnecessarily. If caching is enabled, doing so may cause the cache to be refreshed even if the data did not change.
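The sequential-access and block-size tips above can be sketched in portable C. This is a minimal illustration, not a tuned implementation: the function name count_bytes_sequentially and the 64 KB block size are assumptions made for the example, and the right block size for your program is something you should measure.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical block size; measure to find a good value for your workload. */
#define READ_BLOCK_SIZE (64 * 1024)

/*
 * Reads the file at `path` from start to end in fixed-size blocks and
 * returns the total number of bytes read, or -1 on error.
 * The file is opened once, read front to back, and closed once.
 */
long count_bytes_sequentially(const char *path)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    char *block = malloc(READ_BLOCK_SIZE);
    if (block == NULL) {
        fclose(fp);
        return -1;
    }

    long total = 0;
    size_t got;

    /* Each fread continues where the previous one stopped, so this
       function never repositions the file. */
    while ((got = fread(block, 1, READ_BLOCK_SIZE, fp)) > 0)
        total += (long)got;

    int failed = ferror(fp);
    free(block);
    fclose(fp);
    return failed ? -1 : total;
}
```

A real program would process each block inside the loop instead of only counting bytes. The block is deliberately much smaller than the file so that a large file is never held in memory all at once.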
For information on how to identify and fix file-related performance problems, see File-System Performance Guidelines.
The size of your code can have a tremendous effect on system performance. The more memory pages used by your program, the fewer there are available for the system and other programs. This memory pressure can eventually lead to paging and an overall system slowdown.
Managing your code footprint is all about organizing your code and data structures. You need to make sure you have the right pieces in memory and that you are not causing any memory pages to be read or written unnecessarily. Some of the problems that cause a large memory footprint are as follows:
Code pages contain unused code. The compiler typically organizes code by compilation module, which is not always the best way to organize code. Alternatively, a function might have been excised from the active code path but remains in the code module.
Static or constant data is stored on writable pages. During paging, this data is written to disk unnecessarily.
Too many frameworks are included by the program. Load only the code you need.
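As one illustration of the second problem above, declaring fixed tables as const gives the compiler and linker the opportunity to place them in a read-only section instead of on writable data pages. The table and function names below are hypothetical, and the exact section a given toolchain chooses is an assumption you should verify by inspecting your own binary.

```c
#include <stddef.h>

/* Never modified after build time, so it is declared const. A non-const
   version of this table would normally be placed with writable data. */
static const unsigned char kBitsSetInNibble[16] = {
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
};

/* Both the pointers and the strings they point to are const. Writing
   `static const char *kUnitNames[]` would leave the pointer array writable. */
static const char *const kUnitNames[] = { "bytes", "KB", "MB", "GB" };

/* Counts the bits set in `byte` by looking up each 4-bit half. */
unsigned count_set_bits(unsigned char byte)
{
    return kBitsSetInNibble[byte & 0x0F] + kBitsSetInNibble[byte >> 4];
}

/* Returns the unit name at `index`, or NULL if `index` is out of range. */
const char *unit_name(size_t index)
{
    size_t count = sizeof kUnitNames / sizeof kUnitNames[0];
    return index < count ? kUnitNames[index] : NULL;
}
```

Read-only pages that back onto the executable file can be discarded and reread later, so the system has no need to write them out during paging.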
For information on how to find and fix code footprint problems, see Code Size Performance Guidelines.
Programs allocate memory for storing both permanent and temporary data structures. Each memory allocation has a cost associated with it, both in CPU time and in memory consumption. Understanding when your program allocates memory and how that memory is used can help you reduce both of those costs.
Understanding your program’s memory usage can help determine ways to reduce that usage. You can find out if autoreleased Objective-C objects are being deallocated before they cause too much paging. You can find memory leaks caused by bugs in your code. You can also watch the number of times you call malloc, which might point out places where you can reuse existing memory blocks rather than create new ones.
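One way to reuse an existing memory block, sketched here under assumptions, is to keep a scratch buffer alive across calls and grow it only when a request exceeds its current capacity. The ScratchBuffer type and its functions are hypothetical names invented for this example. The allocations field exists only so you can observe how rarely the allocator is called.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable scratch buffer that is reused across calls. */
typedef struct {
    char         *data;        /* NULL until the first reservation */
    size_t        capacity;    /* bytes currently owned by `data` */
    unsigned long allocations; /* number of trips to the allocator */
} ScratchBuffer;

/*
 * Ensures the buffer holds at least `needed` bytes and returns it.
 * The allocator is called only when the buffer must grow.
 * Returns NULL if growing fails; the old block stays valid in that case.
 */
char *scratch_reserve(ScratchBuffer *buf, size_t needed)
{
    assert(buf != NULL);

    if (needed > buf->capacity) {
        char *grown = realloc(buf->data, needed);
        if (grown == NULL)
            return NULL;
        buf->data = grown;
        buf->capacity = needed;
        buf->allocations++;
    }
    return buf->data;
}

/* Releases the block and resets the buffer to its empty state. */
void scratch_free(ScratchBuffer *buf)
{
    assert(buf != NULL);

    free(buf->data);
    buf->data = NULL;
    buf->capacity = 0;
}
```

Start with a zero-initialized ScratchBuffer. Calling scratch_reserve a hundred times with the same size then allocates once, where a malloc and free pair per call would allocate a hundred times. The trade-off is that the block stays resident between calls, so this pattern suits buffers that really are used repeatedly.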
One important rule to follow when allocating memory is to be lazy. Defer memory allocations until you actually need the memory being used. For some additional ways you can be lazy with memory allocations, see “Be Lazy.”
For information about optimizing your memory allocation patterns, see Memory Usage Performance Guidelines.
Before you begin implementing a new program, there are several performance enhancements you should consider adding. Although you might not be able to take advantage of all of these enhancements in every case, you should at least consider them during your design phase.
All modern Mac OS X applications should be using the Carbon Event Manager or another event-based model for responding to system events. The old way of retrieving events by polling the system is highly inefficient. In fact, when there are no events to process, polling code is a 100 percent waste of CPU time. Using more modern event-based APIs can lead to the following benefits:
It makes your program more responsive to the user.
It reduces your application’s CPU usage.
It minimizes your application’s working set—the number of code pages loaded in memory at any given time.
It allows the system to manage power aggressively.
The Cocoa framework incorporates Carbon Event Manager calls into its classes and methods to implement an event-driven model for you. Applications written in Cocoa automatically take advantage of this behavior and require no additional modifications. Carbon applications must support the Carbon Event Manager calls explicitly.
Event-based handlers are not limited to supporting user events, such as mouse and keyboard events. Each thread has its own run loop to provide on-demand responses to timers, network events, and other incoming data. Applications support run loops using either the Core Foundation (CFRunLoop) or Cocoa (NSRunLoop) interfaces.
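The difference between polling and waiting for an event can be shown without any Carbon or Cocoa code. The sketch below does not use the Carbon Event Manager or the run loop interfaces. It substitutes the POSIX poll function, which, despite its name, puts the calling thread to sleep until a file descriptor becomes readable or a timeout expires. The function name wait_until_readable is an assumption made for this example.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Blocks until `fd` has data to read or `timeout_ms` milliseconds pass
 * (a negative timeout waits indefinitely).
 * Returns 1 if data is ready, 0 on timeout, and -1 on error, including
 * interruption by a signal. The thread is not scheduled while it waits.
 */
int wait_until_readable(int fd, int timeout_ms)
{
    assert(fd >= 0);

    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    int ready = poll(&pfd, 1, timeout_ms);
    if (ready < 0)
        return -1;
    if (ready == 0)
        return 0;
    return (pfd.revents & POLLIN) ? 1 : -1;
}
```

A loop that repeatedly checks the descriptor with a zero timeout would produce the same answers but keep the processor busy the whole time. A run loop applies the same blocking idea to many event sources at once.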
Supporting multiple threads is a good way to improve both the perceived and actual performance of your program. On hardware containing multiple processors, a multithreaded program often has significantly better performance than a single-threaded program. By distributing tasks across all available processors, an application can perform multiple operations simultaneously. Even on a single-processor machine, the use of additional threads can provide a perceived speed boost by leaving your main thread free to handle user events.
Before you begin adding support for multiple threads, though, be sure to put some thought into how your program might use those threads effectively. Because threads require a fair amount of overhead to create, you should carefully choose which tasks you want to assign to separate threads. If all of your program’s tasks are small and performed at different times, you would probably not want to create separate threads for each one. Instead, creating a single long-lived worker thread might be more appropriate.
Another consideration with threading is how to protect your data structures. Problems can occur when multiple threads modify the same data without first checking to see if it is safe to do so. Your code needs to use locks rigorously to protect its data structures. You might also need to synchronize specific blocks of code to prevent them from being executed by multiple threads at once.
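As a minimal sketch of protecting a shared data structure with a lock, the following example uses POSIX threads and a mutex. The SharedCounter type, the function names, and the iteration count are assumptions chosen for illustration. Your program might use a different threading interface.

```c
#include <pthread.h>

#define INCREMENTS_PER_THREAD 100000

/* Shared data together with the lock that guards it. */
typedef struct {
    pthread_mutex_t lock;
    long            value;
} SharedCounter;

/* Every access to `value` happens with the lock held. */
void counter_add(SharedCounter *c, long amount)
{
    pthread_mutex_lock(&c->lock);
    c->value += amount;
    pthread_mutex_unlock(&c->lock);
}

long counter_get(SharedCounter *c)
{
    pthread_mutex_lock(&c->lock);
    long snapshot = c->value;
    pthread_mutex_unlock(&c->lock);
    return snapshot;
}

/* Thread entry point: adds 1 to the shared counter many times. */
static void *counter_worker(void *arg)
{
    SharedCounter *c = arg;
    for (int i = 0; i < INCREMENTS_PER_THREAD; i++)
        counter_add(c, 1);
    return NULL;
}

/*
 * Runs `thread_count` workers (1 to 16) against one counter and returns
 * the final value, or -1 if the arguments or thread creation fail.
 */
long run_counter_demo(int thread_count)
{
    enum { MAX_THREADS = 16 };
    pthread_t threads[MAX_THREADS];
    SharedCounter counter;
    int started;

    if (thread_count < 1 || thread_count > MAX_THREADS)
        return -1;

    counter.value = 0;
    if (pthread_mutex_init(&counter.lock, NULL) != 0)
        return -1;

    for (started = 0; started < thread_count; started++) {
        if (pthread_create(&threads[started], NULL, counter_worker, &counter) != 0)
            break;
    }

    /* Wait for every thread that was started before reading the result. */
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    long result = (started == thread_count) ? counter_get(&counter) : -1;
    pthread_mutex_destroy(&counter.lock);
    return result;
}
```

With the lock in place, run_counter_demo(4) returns exactly 4 times INCREMENTS_PER_THREAD. Without the calls to pthread_mutex_lock and pthread_mutex_unlock, increments from different threads can overwrite one another and the total can come up short.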
For information on how to support additional threads in your program, see Threading Programming Guide.
If your application performs a lot of mathematical computations on scalar data, you should consider using the Accelerate framework (Accelerate.framework) to accelerate those calculations. The Accelerate framework takes advantage of any available vector processing units (such as the PowerPC AltiVec extensions, also known as Velocity Engine, or the Intel x86 SSE extensions) to perform multiple calculations in parallel. By coding to the framework, you can avoid having to create separate code paths for each platform architecture. The Accelerate framework is highly tuned for all of the architectures Mac OS X supports.
Tools such as Shark can help point out portions of your program that might benefit from using the Accelerate framework. For more information about Shark and other tools, see “Performance Tools.”
A very simple way to improve performance is to make sure your application does not perform any unnecessary work. Each moment of an application’s time should be spent responding to the user’s current request, not predicting future requests. If you do not need a resource right away, such as a nib file containing a preferences window, don’t load it. Such an action takes time to execute because it accesses the file system, and if the user never opens that preferences window, the process of loading its nib file is a waste of time.
The basic rule is to wait until the user requests something from your application, then use the necessary resources to fulfill the request. You should cache data only in situations where there is a measurable performance benefit. Preloading caches on the assumption that the rest of the application will run faster can actually degrade performance in low-memory situations. In such a situation, your cached data may be paged to disk before it can be used. Thus, any savings you gained by caching the data turn into a loss because that data must now be read from disk twice before it is ever used. If you really want to cache data, wait until a given operation has been performed once before you cache any data from it.
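The rule of loading on first request and caching only afterward might look like the following sketch. The LazyResource type and its functions are hypothetical, and load_contents stands in for whatever slow work your program performs, such as reading a file. The code is not thread-safe as written.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical resource that is expensive to load. */
typedef struct {
    char    *contents;   /* NULL until the resource is first requested */
    unsigned load_count; /* how many times the slow load actually ran */
} LazyResource;

/* Stand-in for slow work such as reading and parsing a file. */
static char *load_contents(void)
{
    const char *text = "preferences";
    char *copy = malloc(strlen(text) + 1);
    if (copy != NULL)
        strcpy(copy, text);
    return copy;
}

/*
 * Returns the resource contents, loading them on the first call only.
 * Later calls return the cached copy. Returns NULL if loading fails.
 */
const char *lazy_resource_get(LazyResource *r)
{
    assert(r != NULL);

    if (r->contents == NULL) {
        r->contents = load_contents();
        if (r->contents != NULL)
            r->load_count++;
    }
    return r->contents;
}

/* Discards the cached copy; the next request loads it again. */
void lazy_resource_release(LazyResource *r)
{
    assert(r != NULL);

    free(r->contents);
    r->contents = NULL;
}
```

If the user never asks for the resource, lazy_resource_get is never called and no memory or file-system time is spent on it. Nothing is cached until the operation has run once.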
Some other things to be lazy about include the following:
Defer memory allocation until the point where you actually need the memory.
Don’t zero-initialize blocks of memory. Call the calloc function to do it for you lazily.
Give the system the chance to load your code lazily. Profile and organize your code so that the system loads only the code needed for the current operation.
Defer reading the contents of a file until you actually need the information.
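The calloc tip above can be illustrated by comparing two ways of obtaining a zeroed array. Both function names are hypothetical. Whether calloc can defer the zeroing depends on the allocator and the size of the request, so treat any performance difference as something to measure, not a guarantee.

```c
#include <stdlib.h>
#include <string.h>

/* Discouraged: allocates, then immediately writes to every byte,
   which touches all of the memory whether or not it is ever used. */
double *make_zeroed_samples_eagerly(size_t count)
{
    double *samples = malloc(count * sizeof(double));
    if (samples != NULL)
        memset(samples, 0, count * sizeof(double));
    return samples;
}

/* Preferred: one call that returns zero-filled memory, leaving the
   allocator free to decide when the zeroing actually happens. */
double *make_zeroed_samples(size_t count)
{
    return calloc(count, sizeof(double));
}
```

Both functions return memory that reads as 0.0 on platforms with IEEE 754 floating-point. The caller releases it with free in either case.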
The perception of performance is just as effective as actual performance in many cases. Many program tasks can be performed in the background, on a separate thread, or at idle time. Doing this makes the program interface feel more responsive to the user. Of course, creating the perception of performance does not work in every case. For example, the perception may be lost if the data being processed in the background is needed by the user immediately.
As you design your program, think about which tasks can be moved to the background effectively. For example, if your program needs to scan a number of files, do it on a background thread. Similarly, if you need to perform lengthy calculations, do them in the background so that the user may continue to manipulate your program’s user interface.
Another way to improve perceived performance is to make sure your application launches quickly. At launch time, defer any tasks that do not contribute to the immediate presentation of your application interface. For example, defer the creation of large data structures you do not need immediately until after your application has finished launching. You should also avoid loading plug-ins until the moment their code is actually needed.
If you have a Carbon application that is based on the Code Fragment Manager Preferred Executable Format (PEF), you should consider switching to the Mach-O executable format for several reasons. Foremost among them is that Mach-O is designed and optimized for use with the Mac OS X virtual memory system. Other reasons include the following:
PEF executables are not supported on Intel-based Macintosh computers.
In Mac OS X, the libraries that implement the Carbon environment use the Mach-O executable format. Mach-O executables use a calling convention different from that used by PEF executables. Calls made to or from PEF code fragments must be translated at runtime. While the translation overhead is small, it is unnecessary if you are using Mach-O.
Apple’s Mac OS X development environment supports only Mach-O. Whether or not you use Apple’s development environment for Mac OS X, the Mac OS X performance tools are significantly easier to use with Mach-O executables than with PEF executables.
Mach-O executables can directly call other Mach-O shared libraries and BSD API routines in the kernel.
Mach-O supports just-in-time binding, where a link to a function is resolved when that function is first called. All links in a PEF-based application (and all PEF libraries it links to) must be resolved when the application is launched.
Although Mach-O is not supported in Mac OS 9, using Mach-O does not require you to abandon Mac OS 9 as a delivery platform. You can build an application package that runs a PEF binary in Mac OS 9 and a Mach-O binary in Mac OS X. This allows you to optimize your executable for each operating system that you wish to support. For more information, see Bundle Programming Guide.
For an overview of the Mach-O format and how you can take advantage of that format for performance tuning, see “Overview of the Mach-O Executable Format” in Code Size Performance Guidelines.
© 2004, 2006 Apple Computer, Inc. All Rights Reserved. (Last updated: 2006-10-03)