< Previous PageNext Page > Hide TOC

System Tracing

Shark’s System Trace configuration records an exact trace of system-level events, such as system calls, thread scheduling decisions, interrupts, and virtual memory faults. System Trace allows you to measure and understand how your code interacts with Mac OS X and how the threads in your multi-threaded application interact with each other. If you would like to gain a clear understanding of the multi-threaded behavior of a given program, characterize user vs. system processor utilization, or understand the virtual memory paging behavior of your application, then System Trace can give you the necessary insight.

In this section:

Tracing Methodology
Basic Usage
Interpreting Sessions
Sign Posts
Tips and Tricks

Tracing Methodology

System Trace complements Shark’s default Time Profiling configuration (see “Time Profiling”) by allowing you to see all transitions between OS and user code, a useful complement to the statistical sampling techniques used by Time Profiling. For example, in “Basic Usage” we can see an example of a thread of execution sampled by both a Time Profile (on top) and a System Trace (on the bottom). Time profiling takes evenly-spaced samples over the course of time, allowing us to get an even distribution of samples from various points of execution in both the user and system code segments of the application. This gives us a statistical view of what the processor was doing, but it does not allow us to see all user-kernel transitions — the first, brief visit to the kernel is completely missed, because it falls between two sample points. System tracing, in contrast, records an event for each user-kernel transition. Unlike time profiling, this does not give us an overview of all execution, because we are looking only at the transition borders and never in the execution time between them, but we gain an exact view of all of the transitions. This precision is often more useful when debugging user-kernel and multithreading problems, because these issues frequently hinge upon managing the precise timing of interaction events properly in order to minimize the time that threads spend waiting for resources (blocked), as opposed to minimizing execution time.

Figure 3-1  Time Profile vs. System Trace Comparison

Time Profile vs. System Trace Comparison

As with most other profiling options available from Shark, System Trace requires no modification of your binaries, and can be used on released products. However, it is possible to insert arbitrary events into the System Trace Session using Sign Posts, which are discussed in detail in “Sign Posts,” if the built-in selection of events simply does not record enough information.

Basic Usage

In its simplest usage, all you need to do is select System Trace from the Configuration Popup and start sampling. Shark will then capture up to 1,000,000 system events per-processor (a user-tunable limit), process the results, and display a session (see “Interpreting Sessions”). In most cases, this should be more than enough data to interpret. Note that you cannot select a single process before you start tracing using the standard pop-up menu (from “Main Window”); instead, if you only want to see a trace for a specific process or thread, you must narrow the scope of the traced events after the session is recorded (see “Interpreting Sessions”).

In order to allow direct correlation of system events to your application’s code, Shark records the following information with each event:

In the course of profiling your application, it may become necessary to trim or expand the number of events recorded. Most of the typical options are tunable by displaying the Mini Config Editor, depicted in Figure 3-2.

Figure 3-2  System Trace Mini Config Editor

System Trace Mini Configuration Editor

The Mini Config Editor adds the following profiling controls to the main Shark window:

  1. Start Delay— Amount of time to wait after the user selects “Start” before data collection actually begins.

  2. Time Limit— The maximum amount of time to record samples. This is ignored if Windowed Time Facility is enabled, or if Sample Limit is reached before the time limit expires.

  3. Sample Limit — The maximum number of samples to record. Specifying a maximum of N samples will result in at most N samples being taken, even on a multi-processor system, so this should be scaled up as larger systems are sampled. On the other hand, you may need to reduce the sample limit if Shark runs out of memory when you attempt to start a system trace, because it must be able to allocate a buffer in RAM large enough to hold this number of samples. When the sample limit is reached, data collection automatically stops, unless the Windowed Time Facility is enabled (see below). The Sample Limit is always enforced, and cannot be disabled.

  4. Windowed Time Facility— If enabled, Shark will collect samples until you explicitly stop it. However, it will only store the last N samples, where N is the number entered into the Sample Limit field. This mode is described in “Windowed Time Facility (WTF).”

If the user-level callstacks associated with each system event are of no interest to you, it is possible to disable their collection from within the Plugin Editor for the System Trace Data Source (see “System Trace Data Source PlugIn Editor”), further reducing the overhead of system tracing. With callstack recording disabled, Shark will still record the instruction pointer associated with each event, but will not record a full callstack. Since most system calls come from somewhere within library code, you may lose some ability to relate system events to your code with callstack recording disabled.

Out of memory errors?: If you see these when starting a system trace, then just reduce the Sample Limit value until Shark is able to successfully allocate a buffer for itself.

Interpreting Sessions

Upon opening a System Trace session, Shark will present you with three different views, each in a separate tab. Each viewer has different strengths and typical usage:

All views show trace events from a configurable scope of your System Trace. The default scope is the entire system, but you can also focus the session view on a specific process, and even a specific thread within that process. Independently, you can choose to only view events from a single CPU. For example, when focusing on CPU 1 and thread 2, you will see only the events caused by thread 2 that also occurred on CPU 1. The current settings for the scope are set using the three popup menus at the bottom of all System Trace session windows, which select process, thread, and CPU, respectively.

Summary View In-depth

The Summary View is the starting point for most types of analysis, and is shown in Figure 3-3. Its most salient feature is a pie chart that gives an overview of where time was spent during the session. Time is broken down between user, system call, virtual memory fault, interrupt, idle, and other kernel time.

Figure 3-3  Summary View

Summary View

Underneath the pie chart, there are individual summaries of the various event types. Each of these is discussed in turn in the following subsections.

Scheduler Summary

The Scheduler Summary tab, shown in Figure 3-4, summarizes the overall scheduling behavior of the threads running in the system during the trace. Each thread is listed in the outline underneath its owning process, as shown at (1). To the left of each thread’s name, Shark displays the number of run intervals of that thread (or all threads within a process) that it recorded in the course of this session.

Figure 3-4  Summary View: Scheduler

Summary View: Scheduler

The Total, Avg, Min and Max columns list the total, average, minimum, and maximum values for the selected metric. A popup button below the outline (2) lists the supported metrics:

Information about how your application’s threads are being scheduled can be used to verify that what is actually happening on the system matches your expectations. Because the maximum time that a thread can run before it is suspended to run something else (the maximum quantum or time slice) is 10ms on Mac OS X, you can expect that a CPU-bound thread will generally be scheduled for several milliseconds per thread tenure. If you rewrite your CPU-bound, performance-critical code with multiple threads, and System Trace shows that these threads are only running for very short intervals (on the order of microseconds), this may indicate that the amount of work given to any worker thread is too small to amortize the overhead of creating and synchronizing your threads, or that there is a significant amount of serialization between these threads.

Note on Thread IDs: Thread IDs on Mac OS X are not necessarily unique across the duration of a System Trace Session. The Thread IDs reported by the kernel are not static, single use identifiers - they are actually memory addresses. When you destroy a thread, and then create a new one immediately thereafter, there is a very high probability that the new thread will have the same thread ID as the one you just destroyed. This can lead to some confusion when profiling multithreaded applications that create and destroy many threads.

There are a couple of distinct ways to avoid this confusion. Your first option is to park your threads when you are profiling; if you don’t let any new threads exit for the duration of profiling, it is not possible to get duplicate thread IDs. A better option is to utilize a work queue model: create enough threads to fully populate the processors in your system, and instead of destroying the threads when they are out of work, put them on a work queue to be woken up later with more work.

System Calls Summary

The System Calls Summary tab, shown in Figure 3-5, allows you to see the breakdown of system call time for each specific system call. Because System Trace normally records the user callstack for each system call, you can use this view to correlate system call time (and other metrics) directly to the locations in your application’s code that make system calls. This view is quite similar to Time Profile’s standard profile view, described in “Profile Browser.” In fact, many of the same tools are available. For example, the system call profile can be viewed as either a heavy or tree view (selected using the popup menu at the bottom). Similarly, data mining can be used to simplify complex system call callstacks (see “Data Mining”).

More settings for modifying this display are available in the Advanced Settings drawer, and are described in “Summary View Advanced Settings.”

Figure 3-5  Summary View: System Calls

Summary View: System Calls

Note on System Trace callstacks: In rare cases, it is not possible for System Trace to accurately determine the user callstack for the currently active thread. In this case, it may just copy the callstack from the previous sample. While it occurs so rarely that it is usually not a problem, this “interpolation” can occasionally result in bad callstack information. As a result, you should carefully analyze Shark’s system trace callstacks when the callstack information seems unusual or impossible.

Virtual Memory (VM) Faults Summary

The VM Faults Summary tab, depicted in Figure 3-6, allows you to see what code is causing virtual memory faults to occur. The purpose of this view is to help you find behavior that is normally transparent to software, and is obscured in a statistical time profile. Functionally, it acts just like the System Calls Summary tab.

By default, virtual memory faults are broken down by type:

In some cases, virtual memory faults can represent a significant amount of overhead. For example, if you see a large amount of time being spent in zero fill faults, and correlate it to repeated allocation and subsequent deallocation of temporary memory in your code, you may be able to instead reuse a single buffer for the entire loop, reallocating it only when more space is needed. This type of optimization is especially useful in the case of very large (multiple page) allocations.

More settings for modifying this display are available in the Advanced Settings drawer, and are described in “Summary View Advanced Settings.”

Figure 3-6  Summary View: VM Faults

Summary View: VM Faults

Summary View Advanced Settings

When you are viewing the System Calls Summary and VM Faults Summary tabs, several options are available in the Advanced Settings drawer (see “Advanced Settings Drawer”), as seen in Figure 3-7:

  1. Weight By Popup— the summary can construct the profile of system calls using several metrics. The following metrics can be selected:

    • CPU Time— Total system call or fault time spent actively running on a processor

    • Wait Time— Total system call or fault time spent blocked waiting on a resource or event

    • Total Time— Total system call or fault time, including both CPU and waiting time

    • Count— Total system call or fault count

    • Size— (VM fault tab only) Total number of bytes faulted, since a single fault can sometimes affect more than one page

  2. Display Popup— This lets you choose between having the Self and Total columns display the raw Value for the selected metric or the percent of total system call or fault time. You can also change the columns between these modes individually by double clicking on either column.

  3. Group by System Call/VM Fault Type— By default, the metrics are listed for each of the different types of system calls or faults in the trace. You can see summary statistics for all system calls or faults by deselecting this.

  4. Callstack Data Mining— The System Call and VM Fault summaries support Shark’s data mining options, described in “Data Mining,” which can also be used to customize the presentation of the data.

Figure 3-7  Summary View Advanced Settings Drawer

Summary View Advanced Settings Drawer

Trace View In-depth

The Trace View lists all of the events that occurred in the currently selected scope. Because events are most commonly viewed with “System” scope (all processes and all CPUs), each event list has a Process and a Thread column describing the execution context in which it took place.

As with the Summary View, the Trace View is sub-divided according to the class of event. The events in each event class (scheduler events, system calls, and VM Faults) are separately indexed, starting from 0. Each tab in the Trace View lists the index of the event (specific to each event class) in the Index Column.

Each tab in the Trace View supports multi-level (hierarchical) sorting of the event records, based on the order that you click column header row cells. This provides an extremely flexible means for searching the event lists. For example, clicking on the “CPU Time” column title will immediately sort by CPU Time. If you then click on the “Name” column title Shark will group events by Name, and then within each group of identically named events it will sort secondarily by CPU Time.

You may click on events in the table to select them. As with most Mac tables, you may Shift-click to extend a contiguous selection or Command-click to select discontiguous events. Below the main Trace View table, Shark presents a line summarizing key features of the selected trace event(s). This is particularly convenient if you select multiple events at once, because Shark will automatically add up key features of the selected events together and present the totals here.

You should note that double-clicking on any event in the trace list will jump to that event in the Timeline View. This is a helpful way to go directly to a spot of interest in a System Trace, because you can reliably scroll to the same point by double-clicking on the same trace element.

The remainder of this section will examine the three different tabs in the trace view window.

Scheduler Trace

The Scheduler Trace Tab, shown in Figure 3-8, lists the intervals that threads were running on the system. The meanings of the columns are as follows:

Figure 3-8  Trace View: Scheduler

Trace View: Scheduler

System Call Trace

The System Call Trace Tab, shown in Figure 3-9, lists the system call events that occurred during the trace. In most respects, this Tab behaves much like the scheduler tab described previously, but it does have a couple of new features.

You can inspect the first five integer arguments to each system call by selecting an entry in the table and looking at the arg fields near the bottom of the window.

The current user callstack is recorded for each system call. You can toggle the display of the Callstack Table by clicking the Callstack Button button in the lower right corner of the trace table. When visible, the Callstack Table displays the user callstack for the currently selected system call entry.

The meanings of the columns are as follows:

Figure 3-9  Trace View: System Calls

Trace View: System Calls

VM Fault Trace

The VM Fault Trace Tab, illustrated in Figure 3-10, lists the virtual memory faults that occurred. The current user callstack, if any, is recorded for each VM fault.

You can toggle the display of the Callstack Table, which displays the user callstack for the currently selected VM fault entry, by clicking the Callstack Button button in the lower right corner of the trace table.

The columns in the trace view have the following meanings in this tab:

Figure 3-10  Trace View: VM Faults

Trace View: VM Faults

Timeline View In-depth

The Timeline View, displayed in Figure 3-11, allows you to visualize a complete picture of system events and threading behavior in detail, instead of just summaries. Each row in the timeline corresponds to a traced thread, with the horizontal axis representing time. At a glance, you can see when and why the threads in your system start and stop, and how they interact with the system when they are running.

Session timelines are initially shown in their entirety. Because a typical session has thousands or millions of events in it, this initially means that you can only see a broad overview of when threads are running. Events such as system calls and VM faults are labeled with icons, but these icons are automatically hidden if there is insufficient space to display them, as is usually the case when the trace is completely zoomed out. Hence, to see and examine these events, you will need to zoom in on particular parts of the timeline display and then maneuver around to see other parts of the display. There are 3 main ways to perform this navigation:

If you have too many threads, the Timeline View allows you to limit the scope of what is displayed using the standard scope-control menus described in “Interpreting Sessions,” at the bottom of the window. There are also many options for filtering out events in the Advanced Settings drawer, as described in “Timeline View Advanced Settings.”

Figure 3-11  Timeline View

Timeline View

Thread Run Intervals

Each time interval that a thread is actively running on a CPU is a thread run interval. Thread run intervals are depicted as solid rectangles in the Timeline View, as is shown in Figure 3-12, with lines depicting context switches joining the ends of the two threads running before and after each context switch.

Figure 3-12  Timeline View: Thread Run Intervals

Timeline View: Thread Run Intervals

Thread interval lines can be colored according to several metrics, including the CPU on which the thread ran, its dynamic priority, and the reason the thread was switched out. See “Timeline View Advanced Settings” for more information on how to control this.

You can inspect any thread run interval in the Timeline View by clicking on it. The inspector (see Figure 3-13) will indicate the amount of time spent running in user mode and supervisor mode, the reason the thread was switched out, and its dynamic priority.

Figure 3-13  Thread Run Interval Inspector

Thread Run Interval Inspector

There are five basic reasons a thread will be switched out by the system to run another thread:

System Calls

System calls represent explicit work done on behalf of the calling process. They are a secure way for user-space applications to employ kernel APIs. On Mac OS X, these APIs can be divided into four groups:

Figure 3-14  Timeline View: System Calls

Timeline View: System Calls

Calls from all of these groups are visible in Figure 3-14. Clicking on the icon for a system call will bring up the System Call Inspector, as seen in Figure 3-15. The resulting inspector displays many useful pieces of information which you can use to correlate the system call to you application’s code. The listed information includes:

Figure 3-15  System Call Inspector

System Call Inspector

VM Faults

As is the case with almost all modern operating systems, Mac OS X implements a virtual memory system. Virtual memory works by dividing up the addressable space (typically 4GB on a 32-bit machine, currently 256 TB on 64-bit machines) into pages (typically 4 KB in size). Pages are brought into available physical memory from a backing store (typically a hard disk) on demand through the page fault mechanism. In addition to efficiently managing a system’s available physical memory, this added level of indirection provided by the virtual to physical address mapping allows for memory protection, shared memory, and other modern operating system capabilities. There are five virtual memory events on Mac OS X, all of which are faults (running code is interrupted to handle them the first time the page is touched) except for page outs, which are completed asynchronously.

Figure 3-16  Timeline View: VM Faults

Timeline View: VM Faults

Three of these types of faults are visible in Figure 3-16. A zero-fill fault is circled to highlight it. Clicking on a VM Fault Icon will bring up the VM Fault Inspector, as seen in Figure 3-17. This inspector functions much like the System Call Inspector, except instead of listing arguments and return values, the VM Fault Inspector lists the fault address, size, and — for code faults — the library in which it occurred.

Figure 3-17  VM Fault Inspector

VM Fault Inspector


Interrupts are asynchronous signals that external hardware devices use to communicate to the processor that they require servicing. Most are associated with I/O devices, and signal either that new data has been received by an input device or that an output device needs more data to send. However, there are also other sources of interrupts inside of the computer system, such as DMA controllers and clock timers.

Here is an interrupt icon: Interrupt

Because interrupts occur asynchronously, there is no correlation between the source of the interrupt and the thread being interrupted. As a result, and because most users of System Trace are primarily interested in examining the threading behavior of their own programs, the display of interrupt events in the Timeline View is disabled by default. See “Timeline View Advanced Settings” for instructions on how to enable these.

Clicking on an Interrupt icon will bring up the Interrupt Inspector. This inspector lists the amount of time the interrupt consumed, broken down by CPU and wait time.

Figure 3-18  Interrupt Inspector

Interrupt Inspector

Sign Posts

You can often get a good idea of your application’s current state by inspecting the user callstacks associated with the built-in VM fault and system call events that occur in your application. But an even more precise technique is to instrument your application with Sign Post events at critical points in your code (see “Sign Posts” for instructions as to how you can do this). For example, if you are developing an application that operates on video frames, you can insert a sign post marker whenever the processing of a new frame starts.

Here is a sign post icon: Sign Post

Sign posts are displayed in the timeline alongside the other events. You can inspect each sign post by clicking its icon in the timeline. The inspector will indicate which thread it came from, its event name, the 4 user-specified ancillary data values for both the begin and the end point, and the associated user callstack (if any). If the sign post is an “interval” sign post, an under-bar will indicate its duration on the timeline, and the inspector will list the amount of time spent on the CPU and time spent Waiting between the begin and end event. Since you can supply different arguments to the start and end points of an interval sign post, the inspector supplies “Begin” and “End” tabs that display the arguments supplied to the start and end points, respectively.

Figure 3-19  Sign Post Inspector

Sign Post Inspector

Timeline View Advanced Settings

While you are in Timeline View, several options are available in the Advanced Settings drawer (see “Advanced Settings Drawer”), as seen in Figure 3-20.

  1. Enable Thread Coloring— When enabled (the default), Shark attempts to color-code the threads in the timeline display using one of the algorithms selected below.

  2. Color By— Assuming that you do choose to use thread coloring, this menu allows you to choose from among several different color schemes that allow you to see different trends:

    • CPU— Each CPU is assigned a different color, so you can see both how a processor is bouncing from one thread to another or, alternately, how a thread is bouncing from one processor to another. This is the default, and usually the most useful coloration.

    • Priority— Thread priority is assigned a color on a gradient from deep blue-violet colors (low priority) to bright, hot yellows and oranges (high priority). Mac OS X changes thread priority dynamically, depending upon how much CPU time each thread gets, so this color scheme is useful to see the effects of these dynamic changes over time.

    • Reason— Colors each thread run tenure based on the reason that it ended. See “Thread Run Intervals” for a list of the possible reasons that are used to color-code thread tenures.

  3. Color Key— Displays the colors that Shark is currently using to display your threads.

  4. Draw Context Switch Lines— Check this to enable (default) or disable the thin gray lines that show context switches, linking the thread tenures before and after the switch that ran on the same CPU core.

  5. Detailed Event Icons— Deselecting this instructs Shark turn off the icons that identify the various types of VM faults and system calls and just replace them with generic “plain page” VM fault and “gray phone” system call icons.

  6. Label Events— These checkboxes allow you to enable or disable the display of event icons either entirely, by type group, or on an individual, type-by-type basis. For example, you can use them to enable interrupt icons or to remove icons for events, such as VM faults, that you may not be interested in at the present time.

Figure 3-20  Timeline View Advanced Settings Drawer

Timeline View Advanced Settings Drawer

Sign Posts

Even with all of the system-level instrumentation already included in Mac OS X, you may sometimes find that it is helpful or even necessary to further instrument your code. Whether to orient yourself within a long trace, or to time certain operations, Sign Posts can be inserted in your code to accomplish these and other tasks.

Shark supports two types of Sign Posts:

Point events can be used to orient yourself within a larger trace. For example, if you are developing an application that operates on video frames, you can insert a sign post marker whenever the processing of a new frame has begun. Interval events can be used to time operations, such as the length a particular lock is held, or how long it takes to decode a frame of video.

In order to use Sign Posts, you must first make Shark aware of your Sign Posts’ definitions. Do so by creating a Sign Post File. In your home directory, Shark creates a folder in which you can place Sign Post Files: ~/Library/Application Support/Shark/KDebugCodes. A Sign Post File is any file that contains specially formatted lines which associate a code value with a Sign Post name. This name can be anything you like. To create a Sign Post File, simply create a new text file in the above directory and edit it, adding one Sign Post definition per line. Each line should only contain a hexadecimal value followed by an event name, as illustrated in the following example:

Listing 3-1  ~/Library/Application Support/Shark/KDebugCodes/myFirstSignPosts

0x31 LoopTimer
0x32 LockHeld

Sign Post values can take any value from 0x0 to 0x3FFF, inclusive.

Note on changes to Sign Post Files: If you created your Sign Post File while Shark was running, you might need to relaunch Shark for your new Sign Posts to appear in a System Trace. If you’ve already taken a System Trace that generated Sign Posts, but no Sign Posts are displayed in the Viewers, then save your session and relaunch Shark with your Sign Post Definition file(s) in place.

Once you’ve added your Sign Post File(s) to the KDebugCodes directory, you can add Sign Posts to your code. There are two ways to accomplish this, depending on where your code runs:

In both cases, the caller can record up to 4 values of user-defined data with each Sign Post that will be recorded and displayed with the session.

The Sign Post type must be one of the following:

When using interval Sign Posts, the start and end points will be coalesced into one Sign Post with a duration equal to the elapsed time between the two events. You must ensure that the same code value is given for both the start and end points of the interval Sign Post. It is possible to “nest” sign posts - just be sure you match the code value for each start and end point.

The example uses the Sign Post defined in the above Sign Post File to create an interval Sign Post that times a loop in a user-space application:

Listing 3-2  signPostExample.c

#include <CHUD/CHUD.h>
#include <stdint.h>
/* This corresponds to the sign post defined above, LoopTimer */
#define LOOP_TIMER 0x31
#define ITERATIONS 1000
uint32_t ii;
/* The last 4 arguments are user-defined, ancillary data */
chudRecordSignPost(LOOP_TIMER, chudBeginIntervalSignPost, ITERATIONS, 0, 0, 0);
for(ii = 0; ii < ITERATIONS; ii++) {
/* notice that the code value used here matches that used above */
chudRecordSignPost(LOOP_TIMER, chudEndIntervalSignPost, 0, 0, 0, 0);

Note: To compile the above example, you’ll need to instruct gcc to use the CHUD.framework:

gcc -framework CHUD -F/System/Library/PrivateFrameworks signPostExample.c

To accomplish the same task in a kernel extension, use kernel_debug() as follows:

Listing 3-3  testKernelSignPost.c

#include <sys/kdebug.h>
#define LOOP_TIMER 0x31
#define ITERATIONS 1000
uint32_t ii;
 * Use the kernel_debug() method when in the kernel (arg5 is unused),
 * DBG_FUNC_START corresponds to chudBeginIntervalSignPost.
    ITERATIONS, 0, 0, 0, 0);
for(ii = 0; ii < ITERATIONS; ii++) {
/* remember to use the same debugid value, with DBG_FUNC_END */
    0, 0, 0, 0, 0);

You should note that when using Sign Posts in the kernel, it is not necessary to add CHUD to the list of linked frameworks. Adding the above code to your drivers will cause Sign Posts to be created in the System Trace session without it. Similar code using the syscall(SYS_kdebug_trace,... invocation instead of kernel_debug does exactly the same thing, but works from user code, instead.

Tips and Tricks

This section will list common things to look for in a System Trace, what they may mean, and how to improve your application’s code using the information presented. The tips and tricks listed herein are organized according to the view most commonly used to infer the associated behavior.

< Previous PageNext Page > Hide TOC

© 2008 Apple Inc. All Rights Reserved. (Last updated: 2008-04-14)

Did this document help you?
Yes: Tell us what works for you.
It’s good, but: Report typos, inaccuracies, and so forth.
It wasn’t helpful: Tell us what would have helped.