Important: The information in this document is obsolete and should not be used for new development.

Programming With the Text Encoding Conversion Manager



Character Data in Programming Languages

The C char type is required to be large enough to store any member of the basic execution character set. If a character from that set is stored in a char object, its value is the integer code for the character and is non-negative. A char also occupies exactly one byte, and whether plain char is signed or unsigned is implementation-defined.
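The two properties described above can be checked at run time. The following is a minimal sketch; the function names plain_char_is_signed and all_nonnegative are hypothetical helpers introduced only for illustration.

```c
#include <limits.h>

/* Returns 1 if plain char is a signed type on this implementation,
   0 if it is unsigned. CHAR_MIN is 0 only when char is unsigned. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Returns 1 if every character of the null-terminated string s has a
   non-negative value when read through a plain char, 0 otherwise. */
int all_nonnegative(const char *s)
{
    for (; *s != '\0'; ++s) {
        if (*s < 0)
            return 0;
    }
    return 1;
}
```

A string made up only of letters and digits, such as "ABCxyz019", should pass all_nonnegative on any conforming implementation, while the result of plain_char_is_signed varies from one compiler to another.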

C does not fix the size of a byte; it requires only that a byte have at least 8 bits. In principle, then, a byte could be made large enough for a char to accommodate multi-octet characters and Unicode characters. In most implementations, however, bytes and char objects are 8 bits, and a multi-octet character requires a sequence of char objects.
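As a rough illustration, the sketch below reports the byte size through CHAR_BIT and counts the char objects that an encoded string occupies. The helper names bits_per_char and char_units are hypothetical, and the sample bytes 0x82 0xA0 are assumed here to be the Shift-JIS encoding of a single hiragana character.

```c
#include <limits.h>
#include <string.h>

/* Number of bits in a byte, and therefore in a char, on this
   implementation. The C standard guarantees at least 8. */
int bits_per_char(void)
{
    return CHAR_BIT;
}

/* Number of char objects occupied by a null-terminated encoded string,
   not counting the terminator. For a multi-octet encoding this can be
   larger than the number of characters the string represents. */
size_t char_units(const char *s)
{
    return strlen(s);
}
```

With 8-bit bytes, char_units("\x82\xA0") is 2 even though the two bytes represent one character, whereas char_units("A") is 1.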

For characters that do not fit in a char, C provides the wide character type, wchar_t. It is intended to be large enough to hold the largest character in any extended execution character set supported by the implementation (including MBCS encodings). It permits internal processing using fixed-size characters; C library functions such as mbstowcs() and wcstombs() convert between SBCS/MBCS strings and wide character strings. However, the size of wchar_t is implementation-defined: although it is usually 16 or 32 bits, on some implementations it is the same size as char.
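The conversion functions named above can be exercised with a simple round trip. This is a sketch under stated assumptions: wide_count, round_trips, and BUF_LEN are hypothetical names, the input is assumed to be shorter than BUF_LEN characters, and the result depends on the current locale (the default "C" locale handles plain ASCII text).

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

#define BUF_LEN 64  /* illustrative capacity, in wide characters */

/* Converts src to wide characters and returns how many were produced,
   or (size_t)-1 if src is invalid in the current locale or too long. */
size_t wide_count(const char *src)
{
    wchar_t wide[BUF_LEN];
    size_t n = mbstowcs(wide, src, BUF_LEN);

    if (n == (size_t)-1 || n >= BUF_LEN)
        return (size_t)-1;
    return n;
}

/* Converts src to a wide string and back again. Returns 1 if the
   result matches the original bytes, 0 on any conversion failure. */
int round_trips(const char *src)
{
    wchar_t wide[BUF_LEN];
    char back[BUF_LEN * MB_LEN_MAX];
    size_t n, m;

    n = mbstowcs(wide, src, BUF_LEN);
    if (n == (size_t)-1 || n >= BUF_LEN)
        return 0;   /* invalid sequence, or no room for the terminator */

    m = wcstombs(back, wide, sizeof back);
    if (m == (size_t)-1 || m >= sizeof back)
        return 0;   /* unrepresentable character, or output truncated */

    return strcmp(src, back) == 0;
}
```

For the ASCII string "Hello", wide_count returns 5 and round_trips returns 1. A program that processes MBCS text would normally call setlocale() first so that the library uses the intended encoding.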

Java takes a different approach: Bytes remain 8 bits, but a Java char is a 16-bit unit intended to contain a Unicode character.

Finally, programming languages generally provide some abstraction away from encoding details. For example, the C character constant 'A' may have the value 0x41 in an ASCII-based implementation but 0xC1 in an EBCDIC-based implementation. Nevertheless, programs may make more subtle assumptions about character encodings, such as assuming that A-Z have contiguous code points (not true in EBCDIC).
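One way to avoid the contiguity assumption is to look a letter up in a table instead of subtracting 'A'. The sketch below uses the hypothetical helper names uppercase_is_contiguous and letter_index; it is one possible approach, not the only one.

```c
#include <string.h>

static const char letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

/* Returns 1 if A-Z occupy 26 consecutive code points in the execution
   character set (true for ASCII, false for EBCDIC), 0 otherwise. */
int uppercase_is_contiguous(void)
{
    int i;

    for (i = 0; i < 26; ++i) {
        if (letters[i] != 'A' + i)
            return 0;
    }
    return 1;
}

/* Returns 0..25 for 'A'..'Z' and -1 for any other character, without
   assuming that the letters have contiguous codes. */
int letter_index(char c)
{
    const char *p;

    if (c == '\0')
        return -1;  /* strchr would otherwise match the terminator */
    p = strchr(letters, c);
    return p ? (int)(p - letters) : -1;
}
```

Code that computes c - 'A' gives the same answer as letter_index(c) only where uppercase_is_contiguous() returns 1; the table lookup works for either kind of encoding.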


© 1999 Apple Computer, Inc. – (Last Updated 13 Dec 99)
