Important: The information in this document is obsolete and should not be used for new development.
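A recent meeting on character sets organized by the Internet Architecture Board proposed a 7-layer architectural model for the transmission of text data. The first three layers are required for specifying the content of a transmitted text stream "on the wire"; higher layers specify language, locale, and so forth. As specified in the minutes of that meeting, the first three layers are the coded character set (CCS), a mapping from a set of abstract characters to a set of integers; the character encoding scheme (CES), a mapping from a CCS (or several) to a set of octets; and the transfer encoding syntax (TES), a transformation applied to character data encoded using a CCS and possibly a CES to allow it to be transmitted.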
Other documents define a CCS in slightly different terms, for example as a repertoire of abstract characters, a range of numbers, and a mapping from numbers to characters (not necessarily invertible). Each of the integers in the set used to represent a CCS is called a code point.
Note: The term integer is used in this appendix in its mathematical sense; that is, it does not refer to the integer size on a particular CPU. Also, the term octet is used here instead of byte because the latter has not always meant an 8-bit unit; octet is explicitly defined to be an ordered sequence of 8 bits considered as a unit (the term is from ISO character set standards).
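As a rough illustration of a coded character set, the following Python sketch treats Unicode as the CCS (an assumption made here for convenience; the text above does not single out any particular CCS). The helper names code_points and characters are hypothetical and exist only for this example.

```python
def code_points(text: str) -> list[int]:
    """Map each abstract character to its integer code point (Unicode assumed)."""
    return [ord(ch) for ch in text]


def characters(points: list[int]) -> str:
    """Map code points back to characters; Unicode happens to be invertible."""
    return "".join(chr(p) for p in points)


sample = "A\u00e9\u20ac"  # LATIN CAPITAL A, LATIN SMALL E WITH ACUTE, EURO SIGN
points = code_points(sample)
print(points)  # [65, 233, 8364]
print(characters(points) == sample)  # True
```

Note that the code points are plain mathematical integers: nothing at this layer says how many octets each one occupies.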
A CES might be more accurately described as a mapping from a sequence of elements in one or more CCSs to a sequence of octets. This definition suggests that the mapping from a single CCS element to its representation in the CES does not fully characterize the CES, which may include additional octets to set or change state information.
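The distinction can be made concrete with a short Python sketch. It uses the standard library codecs for UTF-8, UTF-16 (big-endian), and ISO-2022-JP as example encoding schemes; the choice of these three is an assumption for illustration, not something this appendix prescribes. ISO-2022-JP is included because it draws on more than one CCS and inserts escape sequences to switch between them.

```python
ESC = b"\x1b"  # the octet that introduces an ISO 2022 escape sequence

euro = "\u20ac"  # one code point, 8364 (EURO SIGN)

# The same code point becomes different octet sequences
# under different encoding schemes.
utf8_octets = euro.encode("utf-8")       # three octets: E2 82 AC
utf16_octets = euro.encode("utf-16-be")  # two octets: 20 AC

# A stateful scheme emits extra octets that represent no character at all.
hiragana_a = "\u3042"  # HIRAGANA LETTER A
stateful_octets = hiragana_a.encode("iso2022_jp")

print(utf8_octets.hex(), utf16_octets.hex())
print(stateful_octets)                  # escape, character octets, escape
print(stateful_octets.startswith(ESC))  # True: begins by changing state
```

The single character in the last case yields more octets than its own two-octet code, because the encoder must first switch character sets and then switch back to ASCII at the end of the stream.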
A TES is usually used to send 8-bit data through a transport mechanism that is only safe for 7-bit data, and even then may perform special handling for certain 7-bit values.
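As a minimal sketch of a transfer encoding, the following Python example passes UTF-8 octets through Base64 and quoted-printable, two transfer encodings chosen here purely as familiar examples (the text above names none). The helper is_seven_bit is hypothetical. Quoted-printable also shows the special handling of a 7-bit value: the equal sign is itself escaped.

```python
import base64
import quopri


def is_seven_bit(data: bytes) -> bool:
    """Return True if every octet has its high bit clear."""
    return all(octet < 0x80 for octet in data)


# UTF-8 octets for "café = coffee"; the accented letter needs 8-bit octets.
payload = "caf\u00e9 = coffee".encode("utf-8")

b64 = base64.b64encode(payload)
qp = quopri.encodestring(payload)

print(is_seven_bit(payload))            # False
print(is_seven_bit(b64), is_seven_bit(qp))  # True True
print(qp)  # the 7-bit "=" is escaped along with the 8-bit octets

# Both transformations are reversible, so the receiver recovers the octets.
print(base64.b64decode(b64) == payload, quopri.decodestring(qp) == payload)
```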
This appendix frequently uses the shorter term character set to mean coded character set, and the terms character encoding or encoding scheme to encompass both character sets and more complex character encoding schemes.