
File Encodings and Fonts

Unicode is generally considered the native encoding for Mac OS X and should be used in nearly all situations. Previous versions of Mac OS supported file encodings such as MacRoman, but most modern Mac OS X libraries support Unicode inherently. If you use Cocoa or Core Foundation routines, you will probably never need to worry about other file encodings. If your software imports legacy file formats, however, you might need to consider file encoding issues. The following sections describe some of the issues related to Unicode support and legacy file encodings.

Contents:

File Systems and Unicode Support
Getting Canonical Strings
Carbon and QuickDraw Issues
Cocoa Issues


File Systems and Unicode Support

Different file systems in Mac OS X have different levels of Unicode support:

Locking the canonical decomposition to a particular version of Unicode does not exclude usage of characters defined in a newer version of Unicode. Because the Unicode consortium has guaranteed not to add any more precomposed characters, applications can expect to store characters defined in future versions of Unicode without compatibility issues.

Note: Because of implementation differences, erroneous Unicode in filenames on HFS+ volumes may display correctly when entered on Mac OS 9 but appear garbled on Mac OS X. Similarly, erroneous Unicode entered on Mac OS X may appear garbled in Mac OS 9.

All BSD system functions expect their string parameters to be in UTF-8 encoding and nothing else. Code that calls BSD system routines should ensure that the contents of all const char * parameters are in canonical UTF-8 encoding. In a canonical UTF-8 string, all decomposable characters are decomposed; for example, é (U+00E9) is represented as e (U+0065) followed by the combining acute accent ´ (U+0301). To convert a string to canonical UTF-8 encoding, use the “file-system representation” interfaces defined in Cocoa and Carbon (including Core Foundation).
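To make the decomposition concrete, the following is a minimal sketch in Python rather than Cocoa or Core Foundation. It assumes that standard Unicode normalization form D (NFD) is a close enough stand-in for the canonical decomposition described above; the actual file-system representation may differ in details, so on Mac OS X the file-system representation interfaces remain the right tool. The function name to_decomposed_utf8 is hypothetical.

```python
import unicodedata


def to_decomposed_utf8(text: str) -> bytes:
    """Decompose all decomposable characters (NFD), then encode as UTF-8.

    Assumption: standard NFD approximates the canonical form the file
    system expects; it is not the file-system representation itself.
    """
    return unicodedata.normalize("NFD", text).encode("utf-8")


precomposed = "\u00e9"          # é as a single code point
already_decomposed = "e\u0301"  # e followed by the combining acute accent

# Both spellings yield the same bytes: 0x65 for "e", then 0xCC 0x81,
# which is the UTF-8 encoding of U+0301.
print(to_decomposed_utf8(precomposed))         # b'e\xcc\x81'
print(to_decomposed_utf8(already_decomposed))  # b'e\xcc\x81'
```

Because both inputs produce identical bytes, two filenames that look the same but were typed with different compositions compare equal once they are in this form.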

Getting Canonical Strings

Both Cocoa and Core Foundation provide routines for accessing canonical and non-canonical Unicode strings. Cocoa string manipulations are all handled through the NSString class and its subclasses. In Core Foundation, you can use the CFStringGetCString and CFStringGetCStringPtr functions to obtain a C string with the desired encoding.

Carbon and QuickDraw Issues

If you have existing QuickDraw code and want to draw text, you should be aware that the QuickDraw Text routines do not directly support Unicode. The Carbon File Manager has some file-system calls that return Mac encodings and others that return Unicode. If you pass this Unicode text directly to a QuickDraw routine, you may run into problems. Similarly, if you retrieve text in a Mac encoding and want to use it with Cocoa or with Carbon’s Apple Type Services for Unicode Imaging (ATSUI) API, you must convert the text to Unicode first.
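The conversion step can be illustrated outside Carbon with a short Python sketch. It assumes the legacy text is in MacRoman and relies on Python's built-in mac_roman codec; the function name mac_roman_to_unicode and the sample bytes are hypothetical, and real code on Mac OS X would use the system's own text conversion facilities.

```python
def mac_roman_to_unicode(raw: bytes) -> str:
    """Decode bytes in the MacRoman encoding into a Unicode string."""
    return raw.decode("mac_roman")


# In MacRoman, byte 0x8E is "é"; in Unicode the same character is U+00E9.
legacy_bytes = b"caf\x8e"
text = mac_roman_to_unicode(legacy_bytes)

print(text == "caf\u00e9")   # True
print(text.encode("utf-8"))  # b'caf\xc3\xa9'
```

The same byte value means different things in different encodings, which is why text retrieved in a Mac encoding cannot be handed unchanged to an API that expects Unicode.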

Generally, the encoding that is used depends upon the API you use and not on the font. Fonts are not necessarily limited to particular encodings. TrueType fonts, for example, declare the set of glyphs they implement and provide encoding tables that map those glyphs to character values in particular encodings. PostScript fonts have similar encoding tables. Various parts of the operating system know how to map characters from one encoding to another. Cocoa and ATSUI use Unicode as the “destination” mapping for a font. QuickDraw Text in Carbon uses the Mac encodings, selected according to the script that the ‘FOND’ resource of the font corresponds to.

The fonts that are installed with Mac OS X have large character sets supporting a wide range of encodings and scripts. For example, Lucida Grande, the system font, supports extended Latin, Greek, Cyrillic, Arabic, Hebrew, and Thai. But if you draw text through QuickDraw Text, you have access only to the MacRoman repertoire. To access the rest, you must use Cocoa or ATSUI. Similarly, the Hiragino fonts have a large repertoire of characters beyond that supported by MacJapanese, and these are accessible only through Cocoa or ATSUI. Both Cocoa and ATSUI also substitute glyphs from other fonts when the requested one isn't available; however, their algorithms for font substitution are different.
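One way to see the limit of the MacRoman repertoire is to test whether a string can be encoded in it at all. The sketch below is Python, not QuickDraw; it assumes Python's mac_roman codec matches the repertoire discussed above, and the function name fits_mac_roman is hypothetical.

```python
def fits_mac_roman(text: str) -> bool:
    """Return True if every character in text has a MacRoman encoding."""
    try:
        text.encode("mac_roman")
    except UnicodeEncodeError:
        return False
    return True


print(fits_mac_roman("caf\u00e9"))  # True: precomposed é is in MacRoman
print(fits_mac_roman("Ελληνικά"))   # False: Greek text is outside the repertoire
print(fits_mac_roman("Привет"))     # False: Cyrillic is outside the repertoire
```

Text that fails this check would need a Unicode-based drawing path, such as Cocoa or ATSUI, even when the font itself contains the required glyphs.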

For information on file encodings in the context of multiscript support, see “Guidelines for Adding MultiScript Support.”

Cocoa Issues

Cocoa employs Unicode for character encoding, making any Cocoa application capable of displaying most human languages. Although Cocoa supports vertical and bidirectional text, the NSTypesetter class only supports layout for horizontal text. If you want to lay out vertical text, you need to define your own custom typesetter class.





© 2003, 2009 Apple Inc. All Rights Reserved. (Last updated: 2009-01-06)

