In general, Core Data is very efficient. For many applications, an implementation that uses Core Data may be more efficient than a comparable application that does not. It is possible, however, to use the framework in such a way that its efficiency is reduced. This article describes how to get the most out of Core Data.
Introduction
Fetching Managed Objects
Faulting Behavior
Reducing Memory Overhead
Large Data Objects (BLOBs)
Analyzing Performance
Core Data is a rich and sophisticated object graph management framework capable of dealing with large volumes of data. The SQLite store can scale to terabyte sized databases with billions of rows/tables/columns. Unless your entities themselves have very large attributes (although see “Large Data Objects (BLOBs)”) or large numbers of properties, 10,000 objects is considered to be a fairly small size for a data set.
For a very simple application it is certainly the case that Core Data adds some overhead (compare a vanilla Cocoa document-based application with a Cocoa Core Data document-based application); however, Core Data adds significant functionality. For a small overhead, even a simple Core Data-based application supports undo and redo, validation, and object graph maintenance, and provides the ability to save objects to a persistent store. If you implemented this functionality yourself, it is quite likely that the overhead would exceed that imposed by Core Data. As the complexity of an application increases, the proportionate overhead that Core Data imposes typically decreases while at the same time the benefit typically increases (supporting undo and redo in a large application, for example, is usually hard).
NSManagedObject uses an internal storage mechanism for data that is highly optimized. In particular, it leverages the information about the types of data that is available through introspecting the model. When you store and retrieve data in a manner that is key-value coding and key-value observing compliant, it is likely that using NSManagedObject will be faster than any other storage mechanism—including for the simple get/set cases. In a modern Cocoa application that leverages Cocoa Bindings, given that Cocoa Bindings is reliant upon key-value coding and key-value observing, it would be difficult to build a raw data storage mechanism that provides the same level of efficiency as Core Data.
Like all technologies, however, Core Data can be abused. Using Core Data does not free you from the need to consider basic Cocoa patterns, such as memory management. You should also consider how you fetch data from a persistent store. If you find that your application is not performing as well as you would like, you should use profiling tools such as Shark to determine where the problem lies (see Performance & Debugging).
Each round trip to the persistent store (each fetch) incurs an overhead, both in accessing the store and in merging the returned objects into the persistence stack. You should avoid executing multiple requests if you can instead combine them into a single request that will return all the objects you require. You can also minimize the number of objects you have in memory.
How you use predicates can significantly affect the performance of your application. If a fetch request requires a compound predicate, you can make the fetch more efficient by ensuring that the most restrictive predicate is the first, especially if the predicate involves text matching (contains, endsWith, like, and matches) since correct Unicode searching is slow. If the predicate combines textual and non-textual comparisons, then it is likely to be more efficient to specify the non-textual predicates first: for example, (salary > 5000000) AND (lastName LIKE 'Quincey') is better than (lastName LIKE 'Quincey') AND (salary > 5000000). For more about creating predicates, see Predicate Programming Guide.
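For example, a compound predicate with the comparisons in the more efficient order might be built as follows (a minimal sketch; the fetch request variable, request, is assumed to exist):
// The inexpensive numeric comparison is evaluated before the Unicode text match.
NSPredicate *predicate = [NSPredicate predicateWithFormat:
    @"(salary > 5000000) AND (lastName like 'Quincey')"];
[request setPredicate:predicate];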
You can set a limit to the number of objects a fetch will return using the method setFetchLimit:, as shown in the following example.
NSFetchRequest *request = [[[NSFetchRequest alloc] init] autorelease];
[request setFetchLimit:100];
If you are using the SQLite store, you can use a fetch limit to minimize the working set of managed objects in memory, and so improve the performance of your application.
If you do need to retrieve a large number of objects, you can make your application appear more responsive by executing two fetches. In the first fetch, you retrieve a comparatively small number of objects—for example, 100—and populate the user interface with these objects. You then execute a second fetch to retrieve the complete result set (that is, you execute a fetch without a fetch limit).
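A minimal sketch of this two-fetch pattern, assuming a context moc, a configured fetch request request, and a hypothetical updateUIWithObjects: method, might look like this:
NSError *error = nil;

// First fetch: retrieve a small batch quickly to populate the user interface.
[request setFetchLimit:100];
NSArray *firstBatch = [moc executeFetchRequest:request error:&error];
[self updateUIWithObjects:firstBatch];    // hypothetical method that displays the results

// Second fetch: retrieve the complete result set (a fetch limit of 0 means no limit).
[request setFetchLimit:0];
NSArray *allObjects = [moc executeFetchRequest:request error:&error];
[self updateUIWithObjects:allObjects];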
Note that there is no way to "batch" fetches (or in database terms, to set a cursor). That is, you cannot fetch the "first" 100 objects, then the second 100, then the third, and so on.
In general, however, you are encouraged to use predicates to ensure that you retrieve only those objects you require.
Firing faults can be a comparatively expensive process (potentially requiring a round trip to the persistent store), and you may wish to avoid unnecessarily firing a fault. You can safely invoke the following methods on a fault without causing it to fire: isEqual:, hash, superclass, class, self, zone, isProxy, isKindOfClass:, isMemberOfClass:, conformsToProtocol:, respondsToSelector:, retain, release, autorelease, retainCount, description, managedObjectContext, entity, objectID, isInserted, isUpdated, isDeleted, and isFault.
Since isEqual: and hash do not cause a fault to fire, managed objects can typically be placed in collections without firing a fault. Note, however, that invoking key-value coding methods on the collection object might in turn result in an invocation of valueForKey: on a managed object, which would fire a fault. In addition, although the default implementation of description does not cause a fault to fire, if you implement a custom description method that accesses the object’s persistent properties, this will cause a fault to fire.
Note that just because a managed object is a fault, it does not necessarily mean that the data for the object are not in memory—see the definition for isFault.
When you execute a fetch, Core Data fetches just instances of the entity you specify. In some situations (see “Faulting”), the destination of a relationship is represented by a fault. Core Data automatically resolves (fires) the fault when you access data in the fault. This lazy loading of the related objects is much better for memory use, and much faster for fetching objects related to rarely used (or very large) objects. It can also, however, lead to a situation where Core Data executes separate fetch requests for a number of individual objects.
If data for a fault has not been cached (see “Faulting”), Core Data performs a separate round trip to the persistent store for each fault fired. This incurs a comparatively high overhead. Since Core Data automatically resolves the fault when you access data in the fault, this can lead to an inefficient pattern where Core Data executes fetch requests for a number of individual objects. For example, given a model in which an Employee entity has a to-one relationship, department, to a Department entity, you might fetch a number of Employees and ask each in turn for its Department's name, as shown in the following code fragment.
// create fetch request for Employees -- includes predicate
// (if you don't have a predicate here, you should probably just fetch all the Departments...)
NSError *error = nil;
NSArray *fetchedEmployees = [moc executeFetchRequest:employeesFetch error:&error];
for (Employee *employee in fetchedEmployees)
{
    NSLog(@"%@ -> %@ department", employee.name, employee.department.name);
}
This might lead to the following behavior:
Jack -> Sales [fault fires]
Jill -> Marketing [fault fires]
Benjy -> Sales
Gillian -> Sales
Hector -> Engineering [fault fires]
Michelle -> Marketing
Here, there are four round trips to the persistent store (one for the original fetch of Employees, and three for individual Departments) which represents a considerable overhead on top of the minimum (two—one for each entity).
There are two techniques you can use to mitigate this effect—batch faulting and pre-fetching.
You can batch fault a collection of objects by executing a fetch request using a predicate with an IN operator, as illustrated by the following example. (In a predicate, self represents the object being evaluated—see Predicate Format String Syntax.)
NSArray *array = [NSArray arrayWithObjects:fault1, fault2, ..., nil];
NSPredicate *predicate = [NSPredicate predicateWithFormat:@"self IN %@", array];
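To fire the faults in a single round trip, you then execute a fetch against the faults' entity using this predicate. A minimal sketch, assuming the faults are Department objects and the context is moc:
NSFetchRequest *batchFetch = [[[NSFetchRequest alloc] init] autorelease];
[batchFetch setEntity:[NSEntityDescription entityForName:@"Department"
                                   inManagedObjectContext:moc]];
[batchFetch setPredicate:predicate];

NSError *error = nil;
// Because Core Data uniques objects, the returned objects are the (now-realized) faults.
NSArray *departments = [moc executeFetchRequest:batchFetch error:&error];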
On Mac OS X v10.5, when you create a fetch request you can use the NSFetchRequest method setReturnsObjectsAsFaults: to ensure that managed objects are not returned as faults.
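For example (a one-line sketch, assuming a fetch request named request):
// Return fully materialized objects rather than faults.
[request setReturnsObjectsAsFaults:NO];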
Pre-fetching is in effect a special case of batch-faulting, performed immediately after another fetch. The idea behind pre-fetching is the anticipation of future needs. When you fetch some objects, sometimes you know that soon after you will also need related objects which may be represented by faults. To avoid the inefficiency of individual faults firing, you can pre-fetch the objects at the destination.
On Mac OS X v10.5, you can use the NSFetchRequest method setRelationshipKeyPathsForPrefetching: to specify an array of relationship keypaths to prefetch along with the entity for the request. For example, given an Employee entity with a relationship to a Department entity: if you fetch all the employees then for each print out their name and the name of the department to which they belong, you can avoid the possibility of a fault being fired for each Department instance by prefetching the department relationship, as illustrated in the following code fragment:
NSManagedObjectContext *context = /* get the context */;
NSEntityDescription *employeeEntity = [NSEntityDescription
    entityForName:@"Employee" inManagedObjectContext:context];
NSFetchRequest *request = [[NSFetchRequest alloc] init];
[request setEntity:employeeEntity];
[request setRelationshipKeyPathsForPrefetching:
    [NSArray arrayWithObject:@"department"]];
On Mac OS X v10.4, you create a fetch request to fetch just those instances of the destination entity that are related to the source objects you just retrieved; this reduces the number of fetches to two (the minimum). How (or whether) you implement the pre-fetch depends on the cardinality of the relationship.
If the inverse relationship is a to-one, you can use a predicate with the format @"%K IN %@", where the first argument is the key name for the inverse relationship and the second argument is an array of the original objects.
If the inverse relationship is a to-many, you first collect the object IDs from the faults you care about (being careful not to touch other attributes). You then create a predicate with the format @"SELF IN %@", where the argument is the array of object IDs.
If the relationship is a many-to-many, pre-fetching is not recommended.
You could implement pre-fetching for the department relationship in the previous example as follows.
NSEntityDescription *deptEntity = [NSEntityDescription entityForName:@"Department"
    inManagedObjectContext:moc];
NSArray *deptOIDs = [fetchedEmployees valueForKeyPath:@"department.objectID"];
NSPredicate *deptsPredicate = [NSPredicate predicateWithFormat:
    @"SELF in %@", deptOIDs];
NSFetchRequest *deptFetch = [[[NSFetchRequest alloc] init] autorelease];
[deptFetch setEntity:deptEntity];
[deptFetch setPredicate:deptsPredicate];
// execute fetch...
If you know something about how the data will be accessed or presented, you can further refine the fetch predicate to reduce the number of objects fetched. Note, though, that this technique can be fragile—if the application changes and needs a different set of data, then you can end up pre-fetching the wrong objects.
For more about faulting, and in particular the meaning of the value returned from isFault, see “Faulting and Uniquing.”
It is sometimes the case that you want to use managed objects on a temporary basis, for example to calculate an average value for a particular attribute. This causes your object graph, and memory consumption, to grow. You can reduce the memory overhead by re-faulting individual managed objects that you no longer need, or you can reset a managed object context to clear an entire object graph. You can also use patterns that apply to Cocoa programming in general.
You can re-fault an individual managed object using NSManagedObjectContext's refreshObject:mergeChanges: method. This has the effect of clearing its in-memory property values thereby reducing its memory overhead. (Note that this is not the same as setting the property values to nil—the values will be retrieved on demand if the fault is fired—see “Faulting and Uniquing.”)
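For example, assuming a context moc and a managed object employee that you have finished with, you might re-fault it as follows:
// Turn the object back into a fault, discarding its in-memory property values.
// Pass NO for mergeChanges because there are no unsaved changes to preserve.
[moc refreshObject:employee mergeChanges:NO];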
On Mac OS X v10.5, when you create a fetch request you can set includesPropertyValues to NO to reduce memory overhead by avoiding creation of objects to represent the property values. You should typically only do so, however, if you are sure that either you will not need the actual property data or you already have the information in the row cache; otherwise you will incur multiple trips to the persistent store.
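For example (a sketch, assuming a fetch request named request):
// Fetch only object identity; no property data is loaded for the returned faults.
[request setIncludesPropertyValues:NO];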
You can use NSManagedObjectContext's reset method to remove all managed objects associated with a context and "start over" as if you'd just created it. Note that any managed object associated with that context will be invalidated, so you will need to discard any references to those objects and re-fetch any in which you are still interested.
Objects returned by fetching and other API are usually autoreleased as required by the Cocoa programming guidelines. If you iterate over a lot of objects, you may need to allocate and release your own autorelease pools to gain a finer-grain level of memory management.
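For example, when iterating over a large number of fetched objects you might create and drain a local pool periodically, as in the following sketch (the batch size of 500 is an arbitrary illustration):
NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
NSUInteger index = 0;
for (NSManagedObject *object in fetchedObjects)
{
    // ... do the per-object work here ...
    if (++index % 500 == 0)
    {
        // Drain the pool periodically so autoreleased objects do not accumulate.
        [pool drain];
        pool = [[NSAutoreleasePool alloc] init];
    }
}
[pool drain];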
If you do not intend to use Core Data's undo functionality, you can reduce your application's resource requirements by setting the context’s undo manager to nil. This may be especially beneficial for background worker threads, as well as for large import or batch operations.
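For example (assuming a context named managedObjectContext):
// Disable undo registration for this context to avoid the cost of recording changes.
[managedObjectContext setUndoManager:nil];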
Finally, Core Data does not by default retain managed objects (unless they have unsaved changes). If you have lots of objects in memory, you should determine why they are still retained. Managed objects do retain each other through relationships, which can easily create cycles. You can break retain cycles by re-faulting objects (again by using NSManagedObjectContext's refreshObject:mergeChanges: method).
If your application uses large BLOBs ("Binary Large OBjects" such as image and sound data), you need to take care to minimize overheads. The exact definition of "small", "modest", and "large" is fluid and depends on an application's usage. A loose rule of thumb is that objects in the order of kilobytes in size are of a "modest" size and those in the order of megabytes in size are "large" sized. Some developers have achieved good performance with 10MB BLOBs in a database. On the other hand, if an application has millions of rows in a table, even 128 bytes might be a "modest" sized CLOB (Character Large OBject) that needs to be normalized into a separate table.
In general, if you need to store BLOBs in a persistent store, you should use an SQLite store. The XML and binary stores require that the whole object graph reside in memory, and store writes are atomic (see “Persistent Store Features”) which means that they do not efficiently deal with large data objects. SQLite can scale to handle extremely large databases. Properly used, SQLite provides good performance for databases up to 100GB, and a single row can hold up to 1GB (although of course reading 1GB of data into memory is an expensive operation no matter how efficient the repository).
A BLOB often represents an attribute of an entity—for example, a photograph might be an attribute of an Employee entity. For small to modest sized BLOBs (and CLOBs), you should create a separate entity for the data and create a to-one relationship in place of the attribute. For example, you might create Employee and Photograph entities with a one-to-one relationship between them, where the relationship from Employee to Photograph replaces the Employee's photograph attribute. This pattern maximizes the benefits of object faulting (see “Faulting and Uniquing”). Any given photograph is only retrieved if it is actually needed (if the relationship is traversed).
It is better, however, if you are able to store BLOBs as resources on the filesystem, and to maintain links (such as URLs or paths) to those resources. You can then load a BLOB as and when necessary.
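For example, you might store only the path to the resource as a string attribute of the managed object and load the data on demand. In the following sketch, photographPath is an assumed attribute name:
// photographPath is a hypothetical string attribute holding the file's location.
NSString *photoPath = [employee valueForKey:@"photographPath"];
NSData *photoData = [NSData dataWithContentsOfFile:photoPath];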
With Mac OS X version 10.4.3 and later, you can use the user default com.apple.CoreData.SQLDebug to log to stderr the actual SQL sent to SQLite. (Note that user default names are case sensitive.) For example, you can pass the following as an argument to the application:
-com.apple.CoreData.SQLDebug 1
Higher levels of debug numbers produce more information, although this is likely to be of diminishing utility.
The information the output provides can be useful when debugging performance problems—in particular it may tell you when Core Data is performing a large number of small fetches (such as when firing faults individually). The output differentiates between fetches that you execute using a fetch request and fetches that are performed automatically to realize faults.
With Mac OS X version 10.5, you can use the Instruments application (by default in /Developer/Applications/) to analyze the behavior of your application. There are several Instruments probes specific to Core Data:
Core Data Fetches
Records invocations of executeFetchRequest:error:, providing information about the entity against which the request was made, the number of objects returned, and the time taken for the fetch.
Core Data Saves
Records invocations of save: and the time taken to do the save.
Core Data Faults
Records information about object and relationship fault firing. For object faults, records the object being faulted; for relationship faults, records the source object and the relationship being fired. In both cases, records the time taken to fire the fault.
Core Data Cache Misses
Traces fault behavior that specifically results in filesystem activity—indicating that a fault was fired for which no data was available—and records the time taken to retrieve the data.
All the instruments provide a stack trace for each event so that you can see what caused it to happen.
When analyzing your application, you should of course also take into account factors not directly related to Core Data, such as overall memory footprint, object allocations, use and abuse of other API such as the key-value technologies and so on.
© 2004, 2009 Apple Inc. All Rights Reserved. (Last updated: 2009-03-04)