The Cocoa text system is responsible for the processing and display of all visible text in Cocoa. It provides a complete set of high-quality typographical services through the Application Kit classes. This article defines typographical concepts relevant to the text system.
Characters and Glyphs
Typefaces and Fonts
Text Layout
A character is the smallest unit of written language that carries meaning. Characters can correspond to a particular sound in the spoken form of the language, as do the letters of the Roman alphabet; they can represent entire words, such as Chinese ideographs; or they can represent independent concepts, such as mathematical symbols. In every case, however, a character is an abstract concept.
Although characters must be represented in a display area by a recognizable shape, they are not identical to that shape. That is, a character can be drawn in various forms and remain the same character. For example, an “uppercase A” character can be drawn with a different size or a different stroke thickness, it can lean or be vertical, and it can have certain optional variations in form, such as serifs. Any one of these various concrete forms of a character is called a glyph. Figure 1 shows different glyphs that all represent the character “uppercase A.”
Characters and glyphs do not have a one-to-one correspondence. In some cases a character may be represented by multiple glyphs, such as an “é” which may be an “e” glyph combined with an acute accent glyph “´”. In other cases, a single glyph may represent multiple characters, as in the case of a ligature, or joined letter. Figure 2 shows individual characters and the single-glyph ligature often used when they are adjacent. A ligature is an example of a contextual form in which the glyph used to represent a character changes depending on the characters next to it. Other contextual forms include alternate glyphs for characters beginning or ending a word.
Computers store characters as numbers mapped by encoding tables to their corresponding characters. The encoding scheme native to Mac OS X is called Unicode. The Unicode standard provides a unique number for every character in every modern written language in the world, independent of the platform, program, and programming language being used. This universal standard solves a longstanding problem of different computer systems using hundreds of conflicting encoding schemes. Unicode provides the encoding used to store characters in Cocoa. It also has features that simplify handling bidirectional text and contextual forms.
Glyphs are also represented by numeric codes called glyph codes. The glyphs used to depict characters are selected by the Cocoa layout manager (NSLayoutManager) during composition and layout processing. The layout manager determines which glyphs to use and where to place them in the display, or view. The layout manager caches the glyph codes in use and provides methods to convert between characters and glyphs and between characters and view coordinates. (See “Text Layout” for more information about the layout process.)
A typeface is a set of visually related shapes for some or all of the characters in a written language. For example, Times is a typeface, designed by Stanley Morrison in 1931 for The Times newspaper of London. All of the letter forms in Times are related in appearance, having consistent proportions between stems (vertical strokes) and counters (rounded shapes in letter bodies) and other elements. When laid out in blocks of text, the shapes in a typeface work together to enhance readability.
A typestyle, or simply style, is a distinguishing visual characteristic of a typeface. For example, roman typestyle is characterized by upright letters having serifs and stems thicker than horizontal lines. In italic typestyle, the letters slant to the right and are rounded, similar to cursive or handwritten letter shapes. A typeface usually has several associated typestyles.
A font is a series of glyphs depicting the characters in a consistent size, typeface, and typestyle. A font is intended for use in a specific display environment. Fonts contain glyphs for all the contextual forms, such as ligatures, as well as the normal character forms.
A font family is a group of fonts that share a typeface but differ in typestyle. So, for example, Times is the name of a font family (as well as the name of its typeface). Times Roman and Times Italic are the names of individual fonts belonging to the family. Figure 3 shows several of the fonts in the Times font family.
In some cases a particular style of a typeface is a separate font, but in other cases where such a font is not available, the system can produce the typestyle programmatically by varying certain characteristics of another font in the family. For example, an algorithm can create a bold typestyle by increasing the line weight of the regular font of the same family. The Cocoa text system can substitute these variations as necessary. Styles, also called traits, that are available in Cocoa include variations such as bold, italic, condensed, expanded, narrow, small caps, poster fonts, and fixed pitch.
Text layout is the process of arranging glyphs on a display device, in an area called a text view, which represents an area similar to a page in traditional typesetting. The way in which glyphs are laid out relative to each other is called text direction. In English and other languages derived from Latin, glyphs are placed side by side to form words that are separated by spaces. Words are laid out in lines beginning at the top left of the text view proceeding from left to right until the text reaches the right side of the view. Text then begins a new line at the left side of the view under the beginning of the previous line, and layout proceeds in the same manner to the bottom of the text view.
In other languages, glyph layout can be quite different. For example, some languages lay out glyphs from right to left or vertically instead of horizontally. It is common, especially in technical writing, to mix languages with differing text direction, such as English and Hebrew, in the same line. Some writing systems even alternate layout direction in every other line (an arrangement called boustrophedonic writing). Some languages do not group glyphs into words separated by spaces. Moreover, some applications call for arbitrary arrangements of glyphs; a graphic layout may require glyphs to be arranged on a nonlinear path.
The Cocoa layout manager (an instance of the NSLayoutManager class) lays out glyphs along an invisible line called the baseline. In Roman text, the baseline is horizontal, and the bottom edge of most of the glyphs rest on it. Some glyphs extend below the baseline, including those for characters like “g” that have descenders, or “tails,” and large rounded characters like “O” that must extend slightly below the baseline to compensate for optical effects. Other writing systems place glyphs below or centered on the baseline. Every glyph includes an origin point that the layout manager uses to align it properly with the baseline.
Glyph designers provide a set of measurements with a font, called metrics, which describe the spacing around each glyph in the font. The layout manager uses these metrics to determining glyph placement. In horizontal text, the glyph has a metric called the advance width, which measures the distance along the baseline to the origin point of the next glyph. Typically there is some space between the origin point and the left side of the glyph, which is called the left-side bearing. There may also be space between the right side of the glyph and the point described by the advance width, which is called the right-side bearing. The vertical dimension of the glyph is provided by two metrics called the ascent and the descent. The ascent is the distance from the origin (on the baseline) to the top of the tallest glyphs in the font. The descent, which is the distance below the baseline to the bottom of the font’s deepest descenders. The rectangle enclosing the visible parts of the glyph is called the bounding rectangle or bounding box. Figure 4 illustrates these metrics and similar ones used for vertical glyph placement.
By default, typesetters place glyphs side-by-side using the advance width, resulting in a standard interglyph space. However, in some combinations text is made more readable by kerning, which is shrinking or stretching the space between two glyphs. A very common example of kerning occurs between an uppercase W and uppercase A, as shown in Figure 5. Type designers include kerning information in the metrics for a font. The Cocoa text system provides methods to turn kerning off, use the default settings provided with the font, or tighten or loosen the kerning throughout a selection of text.
Type systems usually measure font metrics in units called points, which in Mac OS X measure exactly 72 per inch. Adding the distance of the ascent and the descent of a font provides the font’s point size.
Space added during typesetting between lines of type is called leading, after the slugs of lead used for that purpose in traditional metal-type page layout. (Leading is sometimes also called linegap.) The total amount of ascent plus descent plus leading provides a font’s line height.
Although the typographic concepts of type design are somewhat esoteric, most people who have created documents on a computer or typewriter are familiar with the elements of text layout on a page. For example, the margins are the areas of white space between the edges of the page and the text area where the layout engine places glyphs. Alignment describes the way text lines are placed relative to the margins. For example, horizontal text can be aligned right, left, or centered, as shown in Figure 6.
Lines of text can also be justified; for horizontal text the lines are aligned on both right and left margins, as shown in Figure 7.
To create lines from a string of glyphs, the layout engine must perform line breaking by finding a point at which to end one line and begin the next. In the Cocoa text system, you can specify line breaking at either word or glyph boundaries. In Roman text, a word broken between glyphs requires insertion of a hyphen glyph at the breakpoint. Once the text stream has been broken into lines, the system performs alignment and justification, if requested.
© 1997, 2009 Apple Inc. All Rights Reserved. (Last updated: 2009-04-08)