For many years, maximum computer performance was limited largely by the speed of a single microprocessor at the heart of the computer. As the speed of individual processors started reaching their practical limits, however, designers switched to multicore designs, which let the computer perform multiple tasks simultaneously. Of course, software plays a crucial role in keeping each core in a multicore machine busy. This is where concurrency plays a role.
This chapter introduces the concept of concurrency and its effects on application design. You do not have to understand the implementation details of threads to read this chapter. The purpose of this chapter is to get you thinking about whether concurrency (and threading) is an appropriate tool for you to use in your application.
This chapter covers the following topics:

- About Concurrency
- Terminology
- Mac OS X Support
- Design Considerations
- Design Tips
## About Concurrency

The foundations of Mac OS X were built to support concurrency, both at the system level and at the application level. At the system level, multiple applications run side by side, each receiving an appropriate amount of execution time based on its needs and the needs of other programs. At the application level, a single application can have multiple paths of execution perform different tasks simultaneously, or in a nearly simultaneous manner. This document focuses on the latter form of concurrency—application-level concurrency—along with its benefits and hazards.
In a nonconcurrent application, there is only one path (or thread) of execution through the application’s code. That path starts and ends with the application’s main routine and branches one by one into the different methods and functions that implement the application’s overall behavior. By contrast, an application that supports concurrency starts with one path of execution and adds more paths as needed. Each new path has its own custom start and end routines and runs in parallel with the application’s main routine. There are two important reasons to have multiple paths of execution in an application:
- Multiple paths can improve an application’s perceived responsiveness.
- Multiple paths can improve an application’s real-time performance on multicore systems.
If your application has only a single execution path, that one path does everything. It responds to user events, draws to your windows, and does all of the computations you need to implement your application’s behavior. The problem is that it can do only one thing at a time, so what happens when one of your computations takes a long time to finish? While your code is computing the needed values, your application stops responding to user events or drawing to its windows. A user seeing this behavior might think your application is hung and try to forcibly kill it. But if you moved all of your custom computations onto a separate path of execution, you could free up the main path to respond to the user and make sure events got handled and windows got updated.
With multicore computers all but ubiquitous these days, concurrency also offers a way to increase performance in some types of applications. Tasks that are truly parallel can now be run on different processor cores, making it possible for an application to increase the amount of work it does in a given amount of time by multiple factors.
Of course, along with the benefits of concurrency come potential problems. As you might expect, having multiple paths of execution in an application can add a considerable amount of complexity to your code. Because all of your application’s execution paths share the same memory space, two paths modifying the same block of memory at the same time can corrupt each other’s changes and cause your application to misbehave. Even with protection in place to prevent that occurrence, you still have to watch out for compiler optimizations that introduce subtle (and not so subtle) bugs into your code. Fortunately, Mac OS X has tools to help overcome many of the problems associated with concurrency and to help you reap its benefits.
## Terminology

Before getting too far into discussions about concurrency and the technologies used to implement it, it would be good to settle on some basic terminology.
If you are familiar with Carbon’s Multiprocessing Services API or with UNIX systems, you may find that the term “task” is used differently by this document. In earlier versions of Mac OS, the term “task” distinguished threads created using Multiprocessing Services from those created using the Carbon Thread Manager API. On UNIX systems, the term “task” is also used at times to refer to a running process. In practical terms, a Multiprocessing Services task is equivalent to a preemptively scheduled thread in Mac OS X.
This document adopts the following terminology:
- The term thread is used to refer to a separate path of execution for code.
- The term process is used to refer to a running executable, which can encompass multiple threads.
- The term task is used to refer to the abstract concept of the job being performed by a thread.
## Mac OS X Support

Mac OS X provides numerous technologies to help you implement concurrency in your applications. The following sections summarize these technologies and how you can use them.
Threads are the fundamental technology underlying application-level concurrency. More lightweight than processes, threads provide the basic constructs needed to implement separate paths of execution inside a process. The kernel provides direct support for threads and runs them using a preemptive scheduling model. This implementation prevents any one thread from dominating the processor and also provides support for fundamental features, such as locks and the ability to put threads to sleep when there is nothing to do.
Although the kernel provides the basic implementation for threads, application-level threads are based on BSD threads and the POSIX threading API. This API provides the application-level support needed to create and manage threads. In addition, Mac OS X implements several higher-level technologies that provide a more streamlined or sophisticated interface to the basic POSIX threads API. Although higher-level technologies are usually the natural choice, they are by no means the only choice. Stepping down and using lower-level APIs is completely supported and may be necessary at times to use features not readily accessible in the higher-level technology.
At the application level, threads in Mac OS X behave in basically the same way as on other platforms. After starting a thread, the thread runs in one of three main states: running, ready, or blocked. If a thread is not currently running, it is either blocked and waiting for input or it is ready to run but not scheduled to do so yet. The thread continues moving back and forth among these states until it finally exits and moves to the terminated state.
When you create a thread, you specify an entry-point function (or an entry-point method in the case of Cocoa threads). This entry-point function constitutes the code you want to run on the thread. The function can perform a fixed amount of work and then exit or it can set up a run loop and keep running for as long as you want. (For information about run loops, see “Run Loops.”) When the entry-point function exits, or when you terminate the thread explicitly, the thread stops permanently and is reclaimed by the system.
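To make this concrete, here is a minimal sketch of creating a thread with the POSIX API; the entry-point function, its argument, and the launcher function are illustrative names, not part of any system interface.

```c
#include <pthread.h>
#include <stdio.h>

/* The entry-point routine: this code runs on the new thread. When the
   function returns, the thread exits and the system reclaims it. */
static void *MyThreadEntryPoint(void *param)
{
    printf("Working on: %s\n", (const char *)param);
    return NULL;
}

void LaunchWorkerThread(void)
{
    pthread_t thread;
    if (pthread_create(&thread, NULL, MyThreadEntryPoint, "parse file") == 0) {
        /* Detach so the system cleans up when the thread exits. */
        pthread_detach(thread);
    }
}
```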
For more information about the available threading technologies and how to use them, see “Creating and Managing Threads.”
Operation objects provide an easy way to add concurrency to your Cocoa applications without creating threads yourself. Introduced in Mac OS X v10.5, operation objects separate out the custom behavior of your application from the threads used to run that behavior. An operation object encapsulates the code and data associated with a particular task in your application. All you have to do to perform that task is create the operation object and either run it directly or add it to an operation queue. The operation queue infrastructure then takes over by setting up the runtime environment and running your task. By default, an operation queue runs each operation in a separate thread, but you can customize the environment for each operation object as needed.
Because operation objects provide a clean and simple encapsulation model, they promote a better (and simpler) design model than raw threads. In addition, letting operation objects create threads for you is often more efficient than doing it yourself. Operation queues work directly with the kernel to ensure that an optimal number of operation objects are run in the most efficient way possible. They take into account system-specific factors, such as the number of available cores and the system load, and use that information to decide how many operations to run and when. This kernel support also extends to the creation of the threads themselves, which are often maintained in thread pools to reduce the startup costs associated with creating new threads.
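As a brief sketch of this workflow (the worker object, its processData: method, and the EnqueueTask function are hypothetical stand-ins for your own code), wrapping an existing method in an operation object and handing it to a queue might look like this:

```objc
#import <Foundation/Foundation.h>

// 'queue' is an NSOperationQueue your application created earlier;
// 'worker' is any object with a processData: method.
void EnqueueTask(NSOperationQueue *queue, id worker, id data)
{
    NSInvocationOperation *op = [[NSInvocationOperation alloc]
            initWithTarget:worker
                  selector:@selector(processData:)
                    object:data];
    [queue addOperation:op];   // The queue retains op until it finishes.
    [op release];
}
```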
For more information about using operation objects to support concurrency, see “Creating and Managing Operation Objects.”
One of the hazards of concurrent program design is the fact that although there are multiple threads of execution, there is often just one set of resources that those threads have to share. If multiple threads try to modify the same resource at the same time, problems can occur. One way to alleviate shared resource problems is to eliminate them altogether and make sure each thread has its own set of resources, but sometimes maintaining completely separate resources is not an option. In those situations, you can synchronize access to the resource using locks, conditions, atomic operations, and other techniques.
Locks provide a brute force form of protection for code that can be executed by only one thread at a time. The most common type of lock is the mutual exclusion lock, also known as a mutex. When a thread tries to acquire a mutex that is currently held by another thread, it blocks until the lock is released by the other thread. Several system frameworks provide support for mutex locks, although they are all based on the same underlying technology. In addition, Cocoa provides several variants of the mutex lock to support different types of behavior, such as recursion. For more information about the types of locks available in Mac OS X, see “Locks.”
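For example, a Cocoa mutex protecting a shared collection might look like the following sketch (the lock, the array, and the function name are illustrative; both are assumed to be created once during setup):

```objc
#import <Foundation/Foundation.h>

NSLock *gListLock;            // Created once, before any threads run.
NSMutableArray *gSharedList;  // The resource the lock protects.

void AddValueSafely(id value)
{
    [gListLock lock];         // Blocks if another thread holds the lock.
    [gSharedList addObject:value];
    [gListLock unlock];       // Always balance lock with unlock.
}
```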
In addition to locks, Mac OS X provides support for conditions, which ensure the proper sequencing of tasks within your application. A condition acts as a gatekeeper, blocking a given thread until the condition it represents becomes true. When that happens, the condition releases the thread and allows it to continue. Mac OS X provides direct support for conditions in both POSIX and Cocoa. If you use operation objects, you can also configure dependencies among your operation objects to sequence the execution of tasks, which is very similar to the behavior offered by conditions.
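Here is a sketch of that gatekeeper pattern using Cocoa’s NSCondition class; the condition object, the flag, and the two functions are illustrative, and both threads must share the same condition object.

```objc
#import <Foundation/Foundation.h>

NSCondition *gCondition;      // Created once, shared by both threads.
BOOL gWorkAvailable = NO;     // The condition the gatekeeper represents.

void WaitForWork(void)        // Runs on the consumer thread.
{
    [gCondition lock];
    while (!gWorkAvailable)   // Re-check: threads can wake up spuriously.
        [gCondition wait];
    gWorkAvailable = NO;
    // ...consume the work...
    [gCondition unlock];
}

void PostWork(void)           // Runs on the producer thread.
{
    [gCondition lock];
    gWorkAvailable = YES;
    [gCondition signal];      // Wakes one waiting thread.
    [gCondition unlock];
}
```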
Although locks and conditions are very common in concurrent design, atomic operations are another way to protect and synchronize access to data. Atomic operations offer a lightweight alternative to locks in situations where you want to perform mathematical or logical operations on scalar data types. Atomic operations take advantage of hardware instructions to ensure that modifications to a variable are completed before other threads have a chance to access it.
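For instance, incrementing a shared counter with one of the OSAtomic functions avoids any lock entirely; a minimal sketch (the counter and function name are illustrative):

```c
#include <libkern/OSAtomic.h>

volatile int32_t gCounter = 0;

/* Safe to call from any thread; the hardware performs the
   increment as a single, indivisible operation. */
void IncrementCounter(void)
{
    OSAtomicIncrement32Barrier(&gCounter);
}
```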
For more information about the available synchronization tools for Mac OS X, see “Synchronization Tools.”
A run loop is a piece of infrastructure used to manage events arriving asynchronously on a thread. A run loop is created for each thread automatically by the system, but that run loop must be configured before it can be used. The infrastructure provided by both Cocoa and Carbon handles the configuration of the main thread’s run loop for you automatically. If you plan to create long-lived secondary threads, however, you must configure the run loop for those threads yourself.
A run loop works by monitoring one or more attached event sources. If no events are present and ready to be handled, the run loop puts the thread to sleep. The thread stays asleep until one of the run loop’s sources signals that the thread should be woken up. At that point, the kernel wakes up the thread and hands control back to the run loop, which then dispatches the event to the appropriate handler routine.
You are not required to use a run loop with any threads you create, but doing so can provide a better experience for the user. Run loops make it possible to create long-lived threads, and to put those threads to sleep when there is nothing to do. This behavior is much more efficient than polling for events, which wastes CPU time. The run loop infrastructure is also very flexible and can be configured to support different runtime modes and application-specific messaging systems.
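As a sketch, the entry-point method of a long-lived secondary thread might configure and run its run loop as follows; the checkForWork: method is a hypothetical placeholder for your own periodic handler.

```objc
// Entry point for a long-lived secondary thread.
- (void)threadMain
{
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];

    // A run loop with no attached sources exits immediately, so attach
    // at least one input source or timer before calling run.
    NSTimer *timer = [NSTimer timerWithTimeInterval:1.0
                                             target:self
                                           selector:@selector(checkForWork:)
                                           userInfo:nil
                                            repeats:YES];
    [[NSRunLoop currentRunLoop] addTimer:timer forMode:NSDefaultRunLoopMode];

    // The run loop puts the thread to sleep until an event arrives.
    [[NSRunLoop currentRunLoop] run];

    [pool release];
}
```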
Details about run loops and examples of how to use them are provided in “Run Loop Management.”
Although a good design minimizes the amount of required communication, at some point, communication between threads becomes necessary. A thread’s job is to do work for your application, but if the results of that job are never used, what good is it? Threads may need to process new job requests or report their progress to your application’s main thread. In these situations, you need a way to get information from one thread to another. Fortunately, the fact that threads share the same process space means you have lots of options for communication.
There are many ways to communicate between threads, each with its own advantages and disadvantages. The following table lists the most common communication mechanisms you can use in Mac OS X. The techniques in this table are listed in order of increasing complexity.
| Mechanism | Description |
| --- | --- |
| Direct messaging | Cocoa applications support the ability to perform selectors directly on other threads. This capability means that one thread can essentially execute a method on any other thread. Because they are executed in the context of the target thread, messages sent this way are automatically serialized on that thread. (A brief sketch of this mechanism follows the table.) For information about input sources, see “Cocoa Perform Selector Sources.” |
| Global variables, shared memory, and objects | Another simple way to communicate information between two threads is to use a global variable, shared object, or shared block of memory. Although shared variables are fast and simple, they are also more fragile than direct messaging. Shared variables must be carefully protected with locks or other synchronization mechanisms to ensure the correctness of your code. Failure to do so could lead to race conditions, corrupted data, or crashes. |
| Conditions | Conditions are a synchronization tool that you can use to control when a thread executes a particular portion of code. You can think of conditions as gatekeepers, letting a thread run only when the stated condition is met. For information on how to use conditions, see “Using Conditions.” |
| Run loop sources | A custom run loop source is one that you set up to receive application-specific messages on a thread. Because they are event driven, run loop sources put your thread to sleep automatically when there is nothing to do, which improves your thread’s efficiency. For information about run loops and run loop sources, see “Run Loop Management.” |
| Ports and sockets | Port-based communication is a more elaborate way to communicate between two threads, but it is also a very reliable technique. More importantly, ports and sockets can be used to communicate with external entities, such as other processes and services. For efficiency, ports are implemented using run loop sources, so your thread sleeps when there is no data waiting on the port. For information about run loops and about port-based input sources, see “Run Loop Management.” |
| Message queues | Multiprocessing Services defines a first-in, first-out (FIFO) queue abstraction for managing incoming and outgoing data. Although message queues are simple and convenient, they are not as efficient as some other communication techniques. For more information about how to use message queues, see Multiprocessing Services Programming Guide. |
| Cocoa distributed objects | Distributed objects is a Cocoa technology that provides a high-level implementation of port-based communication. Although it is possible to use this technology for interthread communication, doing so is highly discouraged because of the amount of overhead it incurs. Distributed objects is much more suitable for communicating with other processes, where the overhead of going between processes is already high. For more information, see Distributed Objects Programming Topics. |
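Here is the promised sketch of the direct-messaging mechanism; the reportProgress: and updateProgress: methods are hypothetical names for methods you would implement yourself.

```objc
// Called from a secondary thread to report progress back to the
// main thread. The selector executes in the main thread's context,
// so the update is automatically serialized with other main-thread work.
- (void)reportProgress:(NSNumber *)progress
{
    [self performSelectorOnMainThread:@selector(updateProgress:)
                           withObject:progress
                        waitUntilDone:NO];
}
```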
One aspect of concurrency that is often forgotten is that threads are not the only option available. Threads solve the specific problem of how to run tasks in parallel inside the same process. There may be cases, however, where the overhead associated with threads is too great for the intended task, or where other options are easier to implement. Table 1-2 lists some of the alternatives to threads along with the situations in which you might use them.
Table 1-2  Alternatives to threads

| Technology | Description |
| --- | --- |
| Idle-time notifications | For tasks that are relatively short and very low priority, idle-time notifications let you perform the task at a time when your application is not as busy. Cocoa provides support for idle-time notifications using the NSNotificationQueue object. To request an idle-time notification, post a notification to the default NSNotificationQueue object using the NSPostWhenIdle posting style. |
| Asynchronous functions | Mac OS X provides many asynchronous functions that provide automatic concurrency for you. These APIs may use system daemons and processes or create custom threads to perform their task and return the results to you. (The actual implementation is irrelevant because it is separated from your code.) As you design your application, look for functions that offer asynchronous behavior and consider using them instead of using the equivalent synchronous function on a custom thread. |
| Timers | You can use timers on your application’s main thread to perform periodic tasks that are too trivial to require a thread, but which still require servicing at regular intervals. For information on timers, see “Timer Sources.” |
| Separate processes | Although more heavyweight than threads, creating a separate process might be useful in cases where the task is only tangentially related to your application. You might use a process if a task requires a significant amount of memory or must be executed using root privileges. For example, you might use a 64-bit server process to compute a large data set while your 32-bit application displays the results to the user. |
Warning: When launching separate processes using the fork function, you must always follow a call to fork with a call to exec or a similar function. Applications that depend on the Core Foundation, Cocoa, or Core Data frameworks (either explicitly or implicitly) must make a subsequent call to an exec function or those frameworks may behave improperly.
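A minimal sketch of the required fork-then-exec pattern; the helper tool path and the function name are hypothetical.

```c
#include <unistd.h>

void LaunchHelperProcess(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: immediately replace the process image with the helper
           tool; make no framework calls between fork and exec. */
        execl("/usr/local/bin/myhelper", "myhelper", (char *)NULL);
        _exit(1);   /* Reached only if exec fails. */
    }
    /* Parent continues here; pid identifies the child process. */
}
```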
## Design Considerations

As computers gain more and more cores, support for concurrency is quickly becoming less of an option and more of a requirement for software designers. But does this mean you should start creating large numbers of threads in your code? Absolutely not. Supporting concurrency requires a careful analysis of your program’s behavior to determine which portions (if any) might benefit from running independently. This set of tasks then has to be balanced against the costs of supporting concurrency, which are not trivial. When analyzing your application, you should look for tasks that exhibit as many of the following characteristics as possible:
- The task shares as few data structures as possible with other tasks.
- The task is as modular as possible.
- The task performs a relatively large amount of work (more than 10 milliseconds’ worth).
Tasks that share data or code tend to require much more careful coding than those that do not. Shared data structures require the use of locks to synchronize access to those structures. Although locks are a useful tool, they are also a performance bottleneck because their acquisition takes a nontrivial amount of time. Avoiding locks by using separate data structures is preferable, especially when that separation comes naturally. Forcibly separating a large data structure into several smaller chunks may avoid the need for a lock, but creating those data structures and reintegrating them has a cost associated with it as well.
It is also important to remember that there are costs to supporting concurrency. Threads and other thread-related data structures consume system resources. If you choose tasks that take relatively little time to complete, the cost of allocating the needed resources may outweigh the potential benefits. This is not to say that you should never perform short tasks on background threads. Operation objects are optimized to use thread pools, which often alleviates many of the costs associated with setting up a thread. You can also configure a thread to be long-lived and process multiple requests on demand, although doing so requires more effort and increases the complexity of your code, which increases the potential for bugs.
As far as knowing whether your application is right for concurrency, the most important thing you can do is understand your application’s data model and expected behavior. Understanding your data model is something only you can do, but the remaining sections in this chapter (and the rest of this document) are here to help you understand where the potential pitfalls lie and how you might avoid them.
In the context of concurrency, your application’s expected behavior comprises two factors. First, you must define what your application does and what is considered to be “correct” behavior. Second, you should define the expected performance for your application when it is behaving correctly. Defining both of these pieces up front is necessary for determining whether your actual implementation is working correctly.
The absolute correctness of your code is of the utmost importance in a concurrent application. By its nature, concurrency introduces the potential for data to be misinterpreted or corrupted due to timing errors. Ensuring that the data in your data structures is accessed safely, and in the proper sequence, must always be part of your overall design. Document all of the key data structures in your program and the steps it takes to access and modify them correctly. Understand how your key data structures affect and influence each other. In a concurrent application it is easier to modify isolated data structures than it is to modify structures with large sets of dependencies.
If you are using concurrency to increase the real-time performance of your application, you should also define some performance goals. Resource contention and an improperly designed set of tasks are both factors that can degrade performance significantly. If those problems are serious enough, they can even make performance in the concurrent case worse than in the single-threaded case. Having definite, but reasonable, performance goals helps you track whether the addition of concurrency is having the intended effect.
As part of your performance goals, you should also consider the constraints for your application. Sure, having an application that runs fast on a Mac Pro with eight processors and 8 gigabytes of memory is great, but what happens when you run it on an older Mac mini? The point of having constraints is that they help influence the decisions you make as you try to achieve your goals. Ask yourself the following questions during your planning:
- Are you trying to boost application responsiveness? If so, what level of latency is expected?
- What is the minimum level of acceptable performance? (A 10% gain? A 50% gain?)
- What level of additional memory usage is allowable? (Threads use up additional memory, so setting an upper limit may limit the level of concurrency you support.)
After you have a set of goals for your application’s expected behavior, you need to think about how you can factor your application’s tasks to support those goals. Just because you can create concurrent threads of execution does not mean you should. Each task should be considered carefully to determine whether running it concurrently would benefit your application or cause potential problems. You should already know the expected behaviors of your application, so this exercise is all about identifying which of those behaviors are suited for concurrency. For example, searching for a string in a large block of text or performing a large calculation might be well suited for concurrency, but things like low-level event handling and drawing (with some exceptions) typically are not.
When deciding which tasks to make concurrent, there are several factors you should consider.
- Are there alternatives to performing the task yourself? Asynchronous methods or system technologies may already perform the same task, and may be able to do so concurrently. If so, using them might be simpler than doing the task yourself.
- How long does the task take to execute? Longer tasks are generally better suited for running in the background than short tasks. However, if a shorter task runs at regular intervals or can share a thread with other tasks, you might consider creating a long-lived thread to run those tasks.
- What shared resources must be manipulated by the task? If the task must manipulate complex data structures, or shared data structures, it may encounter more synchronization issues than if it manipulated only local data.
- What is the benefit to running the task separately? If running the task in the background would offer significant performance improvements, the benefit of doing so may outweigh other factors.
- How much intertask communication is required? If a task would spend a lot of its time sending messages or coordinating with other parts of your application, you might reconsider the benefit of executing the task concurrently. The task might end up spending much more time blocked and waiting on other parts of your application to respond than doing real work.
These factors are by no means the only criteria to consider, nor should you avoid selecting a task because it manipulates one data structure too many. All of the decisions you make must be measured against the goals for your application.
The basic problem with creating threads yourself is that it adds uncertainty to your code. Threads are a relatively low-level tool for implementing concurrency and their use is fraught with pitfalls. If you do not fully understand the implications of your design choices, you might encounter synchronization or timing issues, the severity of which can range from subtle behavioral changes to your application imploding gloriously and destroying user data. (Granted, it takes a lot of effort for you to cause your application to implode gloriously, but the fact that it is possible should serve as a warning not to skimp on your planning efforts.)
As part of your planning, you should consider using system technologies that eliminate the need for you to implement threads yourself. Mac OS X itself takes advantage of threads in many places, and as multicore machines become more common, system concurrency is only going to increase. Building your code on top of asynchronous functions or using operation objects (see “Creating and Managing Operation Objects”) may not totally eliminate the need to think about concurrency, but they certainly make implementing it easier. Building on top of these technologies also means your code will benefit from any future improvements to them.
## Design Tips

As you design your application, here are some guidelines to help you implement concurrency and ensure the correctness of your code.
Writing thread-creation code manually is tedious and potentially error-prone, and you should avoid it whenever possible. Mac OS X provides implicit support for concurrency through other APIs. Rather than create a thread yourself, consider using asynchronous APIs or operation objects to do the work. These technologies do the thread-related work behind the scenes for you and are guaranteed to do it correctly. In addition, technologies such as operation objects are designed to manage threads much more efficiently than your own code ever could, adjusting the number of active threads based on the current system load. For more information, see “Creating and Managing Operation Objects.”
If you decide to handle thread creation and management yourself, remember that threads consume precious system resources. You should do your best to make sure that any tasks you assign to threads are reasonably long-lived and productive. At the same time, you should not be afraid to terminate threads that are spending most of their time idle. Threads use a nontrivial amount of memory, some of it wired, so releasing an idle thread not only helps reduce your application’s memory footprint, it also frees up more physical memory for other system processes to use.
Important: Before you start terminating idle threads, you should always record a set of baseline measurements of your application’s current performance. After trying your changes, take additional measurements to verify that the changes are actually improving performance, rather than hurting it.
The simplest and easiest way to avoid thread-related resource conflicts is to give each thread in your program its own copy of whatever data it needs. Parallel code works best when you minimize the communication and resource contention among your threads.
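For example, instead of letting a new thread read a mutable array that the main thread continues to modify, hand the thread its own snapshot; a sketch, in which the method names are hypothetical:

```objc
- (void)startWorkerWithItems:(NSArray *)sourceItems
{
    // Give the worker thread its own copy of the data; no lock is
    // needed afterward because nothing is shared. NSThread retains
    // the argument for the lifetime of the thread.
    NSArray *snapshot = [[sourceItems copy] autorelease];
    [NSThread detachNewThreadSelector:@selector(processItems:)
                             toTarget:self
                           withObject:snapshot];
}
```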
Creating a multithreaded application is hard. Even if you are very careful and lock shared data structures at all the right junctures in your code, your code may still be semantically unsafe. For example, your code could run into problems if it expected shared data structures to be modified in a specific order. Changing your code to a transaction-based model to compensate could subsequently negate the performance advantage of having multiple threads. Eliminating the resource contention in the first place often results in a simpler design with excellent performance.
If your application has a graphical user interface, it is recommended that you receive user-related events and initiate interface updates from your application’s main thread. This approach helps avoid synchronization issues associated with handling user events and drawing window content. Some frameworks, such as Cocoa, generally require this behavior, but it also has the advantage of simplifying the logic for managing your user interface.
There are a few notable cases where it is advantageous to perform graphical operations from other threads. For example, the QuickTime API includes a number of operations that can be performed from secondary threads, including opening movie files, rendering movie files, compressing movie files, and importing and exporting images. Using secondary threads for these operations can greatly increase performance. Similarly, in Carbon and Cocoa you can use secondary threads to create and process images and perform other image-related calculations. There are likely other exceptions, but if you’re not sure about a particular graphical operation, plan on doing it from your main thread.
For more information about QuickTime thread safety, see Technical Note TN2125: “Thread-Safe Programming in QuickTime.” For more information about Cocoa thread safety, see “Thread Safety Summary for Mac OS X.” For more information about drawing in Cocoa, see Cocoa Drawing Guide.
A process runs until all nondetached threads have exited. By default, only the application’s main thread is created as nondetached, but you can create other threads that way as well. When the user quits an application, it is usually considered appropriate behavior to terminate all detached threads immediately, because the work done by detached threads is considered optional. If your application is using background threads to save data to disk or do other critical work, however, you may want to create those threads as nondetached to prevent the loss of data when the application exits.
Creating threads as nondetached (also known as joinable) requires extra work on your part. Because most high-level thread technologies do not create joinable threads by default, you may have to use the POSIX API to create your thread. In addition, you must add code to your application’s main thread to join with the nondetached threads when they do finally exit. For information on creating joinable threads, see “Setting the Detached State of a Thread.”
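The following sketch shows one way to do this with the POSIX API; SaveDataToDisk and the two wrapper functions are hypothetical names for your own code.

```c
#include <pthread.h>

extern void *SaveDataToDisk(void *context);  /* Your entry-point routine. */

static pthread_t sSaveThread;

void StartCriticalSave(void *context)
{
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    /* Nondetached (joinable), so another thread can wait for it. */
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
    pthread_create(&sSaveThread, &attr, SaveDataToDisk, context);
    pthread_attr_destroy(&attr);
}

/* At quit time, on the main thread: block until the save finishes. */
void WaitForSaveToFinish(void)
{
    pthread_join(sSaveThread, NULL);
}
```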
If you are writing a Cocoa application, you can also use the applicationShouldTerminate: delegate method to delay the termination of the application until a later time or to cancel it altogether. When delaying termination, your application would need to wait until any critical threads have finished their tasks and then invoke the replyToApplicationShouldTerminate: method. For more information on these methods, see NSApplication Class Reference.
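Sketched below for an application delegate; the hasActiveCriticalThreads and criticalWorkFinished methods are hypothetical bookkeeping you would implement yourself.

```objc
// In your application delegate's implementation:
- (NSApplicationTerminateReply)applicationShouldTerminate:(NSApplication *)sender
{
    if ([self hasActiveCriticalThreads])
        return NSTerminateLater;   // Defer the decision until later.
    return NSTerminateNow;
}

// Invoked on the main thread once the last critical thread finishes.
- (void)criticalWorkFinished
{
    [NSApp replyToApplicationShouldTerminate:YES];
}
```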
Exception handling mechanisms rely on the current call stack to perform any necessary clean up when an exception is thrown. Because each thread has its own call stack, each thread is therefore responsible for catching its own exceptions. Failing to catch an exception in a secondary thread is the same as failing to catch an exception in your main thread: the owning process is terminated. You cannot throw an uncaught exception to a different thread for processing.
If you need to notify another thread (such as the main thread) of an exceptional situation in the current thread, you should catch the exception and simply send a message to the other thread indicating what happened. Depending on your model and what you are trying to do, the thread that caught the exception can then continue processing (if that is possible), wait for instructions, or simply exit.
Note: In Cocoa, an NSException object is a self-contained object that can be passed from thread to thread once it has been caught.
In some cases, an exception handler may be created for you automatically. For example, the @synchronized directive in Objective-C contains an implicit exception handler.
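Putting this advice together, a secondary thread’s entry method might catch exceptions locally and forward them to the main thread, as in this sketch (doRiskyWork and handleThreadFailure: are hypothetical methods of your own):

```objc
// Entry point for a secondary thread.
- (void)workerThreadMain
{
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    @try {
        [self doRiskyWork];
    }
    @catch (NSException *exception) {
        // NSException objects are self-contained, so the caught object
        // can be handed safely to the main thread for reporting.
        [self performSelectorOnMainThread:@selector(handleThreadFailure:)
                               withObject:exception
                            waitUntilDone:NO];
    }
    [pool release];
}
```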
The best way for a thread to exit is naturally, by letting it reach the end of its main entry point routine. Although there are functions to terminate threads immediately, those functions should be used only when absolutely necessary. Terminating a thread before it has reached its natural end point prevents the thread from cleaning up after itself. If the thread has allocated memory, opened a file, or acquired other types of resources, your code may be unable to reclaim those resources, resulting in memory leaks or other potential problems.
For more information on the proper way to exit a thread, see “Terminating a Thread.”
Although an application developer has control over whether an application executes with multiple threads, library developers do not. When developing libraries, you must assume that the calling application is multithreaded or could switch to being multithreaded at any time. As a result, you should always use locks for critical sections of code.
For library developers, it is unwise to create locks only when an application becomes multithreaded. If you need to lock your code at some point, create the lock object early in the use of your library, preferably in some sort of explicit call to initialize the library. Although you could also use a static library initialization function to create such locks, try to do so only when there is no other way. Execution of an initialization function adds to the time required to load your library and could adversely affect performance.
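One common pattern is to create the lock inside an explicit initialization call, guarded so that the creation runs exactly once even under concurrent callers; a sketch, with MyLibraryInitialize as an illustrative name:

```c
#include <pthread.h>

static pthread_mutex_t sLibraryLock;
static pthread_once_t  sLockInitialized = PTHREAD_ONCE_INIT;

static void CreateLibraryLock(void)
{
    pthread_mutex_init(&sLibraryLock, NULL);
}

/* Clients call this before using any other library function. */
void MyLibraryInitialize(void)
{
    /* pthread_once guarantees one-time creation even if several
       threads call the initializer at the same time. */
    pthread_once(&sLockInitialized, CreateLibraryLock);
}
```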
Note: Always remember to balance calls to lock and unlock a mutex lock within your library. You should also remember to lock library data structures rather than rely on the calling code to provide a thread-safe environment.
If you are developing a Cocoa library, you can register as an observer for the NSWillBecomeMultiThreadedNotification if you want to be notified when the application becomes multithreaded. You should not rely on receiving this notification, though, as it might be dispatched before your library code is ever called.
© 2008 Apple Inc. All Rights Reserved. (Last updated: 2008-02-08)