
NSCharacterSet Class Reference

Inherits from
NSObject
Conforms to
NSCoding
NSCopying
NSMutableCopying
Framework
/System/Library/Frameworks/Foundation.framework
Availability
Available in Mac OS X v10.0 and later.
Declared in
NSCharacterSet.h

Overview

An NSCharacterSet object represents a set of Unicode-compliant characters. NSString and NSScanner objects use NSCharacterSet objects to group characters together for searching operations, so that they can find any of a particular set of characters during a search. The cluster’s two public classes, NSCharacterSet and NSMutableCharacterSet, declare the programmatic interface for static and dynamic character sets, respectively.

The objects you create using these classes are referred to as character set objects (and when no confusion will result, merely as character sets). Because of the nature of class clusters, character set objects aren’t actual instances of the NSCharacterSet or NSMutableCharacterSet classes but of one of their private subclasses. Although a character set object’s class is private, its interface is public, as declared by these abstract superclasses, NSCharacterSet and NSMutableCharacterSet. The character set classes adopt the NSCopying and NSMutableCopying protocols, making it convenient to convert a character set of one type to the other.

The NSCharacterSet class declares the programmatic interface for an object that manages a set of Unicode characters (see the NSString class cluster specification for information on Unicode). NSCharacterSet’s principal primitive method, characterIsMember:, provides the basis for all other instance methods in its interface. A subclass of NSCharacterSet needs only to implement this method, plus mutableCopyWithZone:, for proper behavior. For optimal performance, a subclass should also override bitmapRepresentation, which otherwise works by invoking characterIsMember: for every possible Unicode value.
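The cost of the default bitmapRepresentation behavior described above can be pictured with a small C sketch. It is only an illustration, not Foundation's implementation: the member_fn callback is a hypothetical stand-in for characterIsMember:, and build_bitmap and is_ascii_digit are names invented for this example. The loop calls the predicate once for each of the 65,536 16-bit values, which is why a subclass that already stores its members compactly benefits from overriding the method.

```c
#include <stdint.h>
#include <string.h>

#define BITMAP_BYTES 8192  /* 2^16 bits, one per 16-bit Unicode value */

/* Hypothetical stand-in for characterIsMember:.
   Returns nonzero if c belongs to the set. */
typedef int (*member_fn)(uint16_t c);

/* Fill `bitmap` by asking `is_member` about every 16-bit value. */
static void build_bitmap(member_fn is_member,
                         unsigned char bitmap[BITMAP_BYTES])
{
    memset(bitmap, 0, BITMAP_BYTES);
    for (uint32_t n = 0; n <= 0xFFFF; n++) {
        if (is_member((uint16_t)n)) {
            bitmap[n >> 3] |= (unsigned char)(1u << (n & 7));
        }
    }
}

/* Example predicate: the ASCII digits '0' through '9'. */
static int is_ascii_digit(uint16_t c)
{
    return c >= '0' && c <= '9';
}
```

Calling build_bitmap(is_ascii_digit, bitmap) leaves exactly ten bits set, at positions 48 through 57.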

NSCharacterSet is “toll-free bridged” with its Core Foundation counterpart, CFCharacterSet. This means that the Core Foundation type is interchangeable in function or method calls with the bridged Foundation object. Therefore, in a method where you see an NSCharacterSet * parameter, you can pass a CFCharacterSetRef, and in a function where you see a CFCharacterSetRef parameter, you can pass an NSCharacterSet instance (you cast one type to the other to suppress compiler warnings). See Interchangeable Data Types for more information on toll-free bridging.

The mutable subclass of NSCharacterSet is NSMutableCharacterSet.

Adopted Protocols

NSCoding
NSCopying
NSMutableCopying

Tasks

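Creating a Standard Character Set

+ alphanumericCharacterSet
+ capitalizedLetterCharacterSet
+ controlCharacterSet
+ decimalDigitCharacterSet
+ decomposableCharacterSet
+ illegalCharacterSet
+ letterCharacterSet
+ lowercaseLetterCharacterSet
+ newlineCharacterSet
+ nonBaseCharacterSet
+ punctuationCharacterSet
+ symbolCharacterSet
+ uppercaseLetterCharacterSet
+ whitespaceAndNewlineCharacterSet
+ whitespaceCharacterSet

Creating a Custom Character Set

+ characterSetWithCharactersInString:
+ characterSetWithRange:
- invertedSet

Creating and Managing Character Sets as Bitmap Representations

+ characterSetWithBitmapRepresentation:
+ characterSetWithContentsOfFile:
- bitmapRepresentation

Testing Set Membership

- characterIsMember:
- hasMemberInPlane:
- isSupersetOfSet:
- longCharacterIsMember: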

Class Methods

alphanumericCharacterSet

Returns a character set containing the characters in the categories Letters, Marks, and Numbers.

+ (id)alphanumericCharacterSet

Return Value

A character set containing the characters in the categories Letters, Marks, and Numbers.

Discussion

Informally, this set is the set of all characters used as basic units of alphabets, syllabaries, ideographs, and digits.

Declared In
NSCharacterSet.h

capitalizedLetterCharacterSet

Returns a character set containing the characters in the category of Titlecase Letters.

+ (id)capitalizedLetterCharacterSet

Return Value

A character set containing the characters in the category of Titlecase Letters.

Declared In
NSCharacterSet.h

characterSetWithBitmapRepresentation:

Returns a character set containing characters determined by a given bitmap representation.

+ (id)characterSetWithBitmapRepresentation:(NSData *)data

Parameters
data

A bitmap representation of a character set.

Return Value

A character set containing characters determined by data.

Discussion

This method is useful for creating a character set object with data from a file or other external data source.

A raw bitmap representation of a character set is a byte array of 2^16 bits (that is, 8192 bytes). The value of the bit at position n represents the presence in the character set of the character with decimal Unicode value n. To add a character with decimal Unicode value n to a raw bitmap representation, use a statement such as the following:

unsigned char bitmapRep[8192] = {0};  /* 2^16 bits, initially an empty set */
bitmapRep[n >> 3] |= (((unsigned int)1) << (n & 7));  /* set the bit for n */

To remove that character:

bitmapRep[n >> 3] &= ~(((unsigned int)1) << (n & 7));
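The two statements above can be wrapped in small helpers. The following is a minimal C sketch, assuming a raw 8192-byte bitmap as described in this section; the names bitmap_init, bitmap_add, and bitmap_remove are invented for illustration and are not part of Foundation.

```c
#include <string.h>

#define BITMAP_BYTES 8192  /* 2^16 bits */

/* Start from an empty set: every bit cleared. */
static void bitmap_init(unsigned char *bitmapRep)
{
    memset(bitmapRep, 0, BITMAP_BYTES);
}

/* Mark the character with Unicode value n (0 to 65535) as present.
   n >> 3 selects the byte, n & 7 selects the bit within that byte. */
static void bitmap_add(unsigned char *bitmapRep, unsigned int n)
{
    bitmapRep[n >> 3] |= (unsigned char)(1u << (n & 7));
}

/* Mark the character with Unicode value n (0 to 65535) as absent. */
static void bitmap_remove(unsigned char *bitmapRep, unsigned int n)
{
    bitmapRep[n >> 3] &= (unsigned char)~(1u << (n & 7));
}
```

For example, 'A' has the value 65, so bitmap_add(bitmapRep, 'A') sets bit 1 of byte 8. A buffer built this way could then be wrapped in an NSData object and passed to characterSetWithBitmapRepresentation:.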
Declared In
NSCharacterSet.h

characterSetWithCharactersInString:

Returns a character set containing the characters in a given string.

+ (id)characterSetWithCharactersInString:(NSString *)aString

Parameters
aString

A string containing characters for the new character set.

Return Value

A character set containing the characters in aString. Returns an empty character set if aString is empty.

Declared In
NSCharacterSet.h

characterSetWithContentsOfFile:

Returns a character set read from the bitmap representation stored in the file at a given path.

+ (id)characterSetWithContentsOfFile:(NSString *)path

Parameters
path

A path to a file containing a bitmap representation of a character set. The path name must end with the extension .bitmap.

Return Value

A character set read from the bitmap representation stored in the file at path.

Discussion

To read a bitmap representation from any file, use the NSData method dataWithContentsOfFile:options:error: and pass the result to characterSetWithBitmapRepresentation:.

This method doesn’t use filenames to check for the uniqueness of the character sets it creates. To prevent duplication of character sets in memory, cache them and make them available through an API that checks whether the requested set has already been loaded.

Declared In
NSCharacterSet.h

characterSetWithRange:

Returns a character set containing characters with Unicode values in a given range.

+ (id)characterSetWithRange:(NSRange)aRange

Parameters
aRange

A range of Unicode values.

aRange.location is the value of the first character to return; aRange.location + aRange.length - 1 is the value of the last.

Return Value

A character set containing characters whose Unicode values are given by aRange. If aRange.length is 0, returns an empty character set.

Discussion

This code excerpt creates a character set object containing the lowercase English alphabetic characters:

NSRange lcEnglishRange;
NSCharacterSet *lcEnglishLetters;
 
lcEnglishRange.location = (unsigned int)'a';
lcEnglishRange.length = 26;
lcEnglishLetters = [NSCharacterSet characterSetWithRange:lcEnglishRange];
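The range arithmetic (first value at location, last value at location + length - 1) can be checked against a raw bitmap. This is a sketch in plain C under the assumption of a 16-bit, 8192-byte bitmap; bitmap_from_range is a hypothetical helper, not how characterSetWithRange: is implemented.

```c
#include <string.h>

#define BITMAP_BYTES 8192  /* 2^16 bits */

/* Set the bits for the values location .. location + length - 1.
   A length of 0 leaves the bitmap empty. Values above 0xFFFF do not
   fit in this 16-bit sketch and are ignored. */
static void bitmap_from_range(unsigned char bitmap[BITMAP_BYTES],
                              unsigned long location,
                              unsigned long length)
{
    memset(bitmap, 0, BITMAP_BYTES);
    for (unsigned long i = 0; i < length; i++) {
        unsigned long n = location + i;
        if (n > 0xFFFF) {
            break;
        }
        bitmap[n >> 3] |= (unsigned char)(1u << (n & 7));
    }
}
```

With location 'a' (97) and length 26, the bits for 97 through 122 ('z') are set and the neighboring values 96 and 123 stay clear.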
Declared In
NSCharacterSet.h

controlCharacterSet

Returns a character set containing the characters in the categories of Control or Format Characters.

+ (id)controlCharacterSet

Return Value

A character set containing the characters in the categories of Control or Format Characters.

Discussion

These characters are specifically the Unicode values U+0000 to U+001F and U+007F to U+009F.

Declared In
NSCharacterSet.h

decimalDigitCharacterSet

Returns a character set containing the characters in the category of Decimal Numbers.

+ (id)decimalDigitCharacterSet

Return Value

A character set containing the characters in the category of Decimal Numbers.

Discussion

Informally, this set is the set of all characters used to represent the decimal values 0 through 9. These characters include, for example, the decimal digits of the Indic scripts and Arabic.

Declared In
NSCharacterSet.h

decomposableCharacterSet

Returns a character set containing all individual Unicode characters that can also be represented as composed character sequences.

+ (id)decomposableCharacterSet

Return Value

A character set containing all individual Unicode characters that can also be represented as composed character sequences (such as for letters with accents), by the definition of “standard decomposition” in version 3.2 of the Unicode character encoding standard.

Discussion

These characters include compatibility characters as well as pre-composed characters.

Note: This character set doesn’t currently include the Hangul characters defined in version 2.0 of the Unicode standard.

Declared In
NSCharacterSet.h

illegalCharacterSet

Returns a character set containing values in the category of Non-Characters or that have not yet been defined in version 3.2 of the Unicode standard.

+ (id)illegalCharacterSet

Return Value

A character set containing values in the category of Non-Characters or that have not yet been defined in version 3.2 of the Unicode standard.

Declared In
NSCharacterSet.h

letterCharacterSet

Returns a character set containing the characters in the categories Letters and Marks.

+ (id)letterCharacterSet

Return Value

A character set containing the characters in the categories Letters and Marks.

Discussion

Informally, this set is the set of all characters used as letters of alphabets and ideographs.

Declared In
NSCharacterSet.h

lowercaseLetterCharacterSet

Returns a character set containing the characters in the category of Lowercase Letters.

+ (id)lowercaseLetterCharacterSet

Return Value

A character set containing the characters in the category of Lowercase Letters.

Discussion

Informally, this set is the set of all characters used as lowercase letters in alphabets that make case distinctions.

Declared In
NSCharacterSet.h

newlineCharacterSet

Returns a character set containing the newline characters.

+ (id)newlineCharacterSet

Return Value

A character set containing the newline characters (U+000A–U+000D, U+0085).

Declared In
NSCharacterSet.h

nonBaseCharacterSet

Returns a character set containing the characters in the category of Marks.

+ (id)nonBaseCharacterSet

Return Value

A character set containing the characters in the category of Marks.

Discussion

This set is also defined as all legal Unicode characters with a non-spacing priority greater than 0. Informally, this set is the set of all characters used as modifiers of base characters.

Declared In
NSCharacterSet.h

punctuationCharacterSet

Returns a character set containing the characters in the category of Punctuation.

+ (id)punctuationCharacterSet

Return Value

A character set containing the characters in the category of Punctuation.

Discussion

Informally, this set is the set of all non-whitespace characters used to separate linguistic units in scripts, such as periods, dashes, parentheses, and so on.

Declared In
NSCharacterSet.h

symbolCharacterSet

Returns a character set containing the characters in the category of Symbols.

+ (id)symbolCharacterSet

Return Value

A character set containing the characters in the category of Symbols.

Discussion

These characters include, for example, the dollar sign ($) and the plus (+) sign.

Declared In
NSCharacterSet.h

uppercaseLetterCharacterSet

Returns a character set containing the characters in the categories of Uppercase Letters and Titlecase Letters.

+ (id)uppercaseLetterCharacterSet

Return Value

A character set containing the characters in the categories of Uppercase Letters and Titlecase Letters.

Discussion

Informally, this set is the set of all characters used as uppercase letters in alphabets that make case distinctions.

Declared In
NSCharacterSet.h

whitespaceAndNewlineCharacterSet

Returns a character set containing only the whitespace characters space (U+0020) and tab (U+0009) and the newline and nextline characters (U+000A–U+000D, U+0085).

+ (id)whitespaceAndNewlineCharacterSet

Return Value

A character set containing only the whitespace characters space (U+0020) and tab (U+0009) and the newline and nextline characters (U+000A–U+000D, U+0085).

Declared In
NSCharacterSet.h

whitespaceCharacterSet

Returns a character set containing only the in-line whitespace characters space (U+0020) and tab (U+0009).

+ (id)whitespaceCharacterSet

Return Value

A character set containing only the in-line whitespace characters space (U+0020) and tab (U+0009).

Discussion

This set doesn’t contain the newline or carriage return characters.
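The difference between whitespaceCharacterSet and whitespaceAndNewlineCharacterSet can be expressed as two predicates. The C sketch below goes only by the code points this reference lists (space, tab, U+000A through U+000D, and U+0085); the function names are invented for illustration, and the sets returned on a given system may contain additional characters.

```c
#include <stdint.h>

/* In-line whitespace as listed in this reference: space and tab. */
static int is_listed_inline_whitespace(uint16_t c)
{
    return c == 0x0020 || c == 0x0009;
}

/* Newline and nextline characters as listed in this reference:
   U+000A through U+000D, plus U+0085. */
static int is_listed_newline(uint16_t c)
{
    return (c >= 0x000A && c <= 0x000D) || c == 0x0085;
}

/* Union of the two groups above. */
static int is_listed_whitespace_or_newline(uint16_t c)
{
    return is_listed_inline_whitespace(c) || is_listed_newline(c);
}
```

Under these definitions a carriage return (U+000D) passes the combined test but fails the in-line whitespace test, which matches the note that whitespaceCharacterSet does not contain newline or carriage return characters.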

Declared In
NSCharacterSet.h

Instance Methods

bitmapRepresentation

Returns an NSData object encoding the receiver in binary format.

- (NSData *)bitmapRepresentation

Return Value

An NSData object encoding the receiver in binary format.

Discussion

This format is suitable for saving to a file or otherwise transmitting or archiving.

A raw bitmap representation of a character set is a byte array of 2^16 bits (that is, 8192 bytes). The value of the bit at position n represents the presence in the character set of the character with decimal Unicode value n. To test for the presence of a character with decimal Unicode value n in a raw bitmap representation, use an expression such as the following:

unsigned char bitmapRep[8192];
if (bitmapRep[n >> 3] & (((unsigned int)1) << (n & 7))) {
    /* Character is present. */
}
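The membership test above can also be written as a function and reused, for instance to count the members of a set. This is a minimal C sketch over a raw 8192-byte bitmap; bitmap_contains and bitmap_count are hypothetical helper names, not Foundation API.

```c
#define BITMAP_BYTES 8192  /* 2^16 bits */

/* Returns 1 if the character with Unicode value n (0 to 65535)
   is present in the raw bitmap, otherwise 0. */
static int bitmap_contains(const unsigned char *bitmapRep, unsigned int n)
{
    return (bitmapRep[n >> 3] & (1u << (n & 7))) != 0;
}

/* Counts the members among the 65,536 16-bit values. */
static unsigned int bitmap_count(const unsigned char *bitmapRep)
{
    unsigned int count = 0;
    for (unsigned int n = 0; n <= 0xFFFF; n++) {
        count += (unsigned int)bitmap_contains(bitmapRep, n);
    }
    return count;
}
```

For example, a bitmap whose only nonzero byte is bitmapRep[8] == 0x02 contains exactly one character, 'A' (value 65, byte 8, bit 1).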
Declared In
NSCharacterSet.h

characterIsMember:

Returns a Boolean value that indicates whether a given character is in the receiver.

- (BOOL)characterIsMember:(unichar)aCharacter

Parameters
aCharacter

The character to test for membership in the receiver.

Return Value

YES if aCharacter is in the receiving character set, otherwise NO.

Declared In
NSCharacterSet.h

hasMemberInPlane:

Returns a Boolean value that indicates whether the receiver has at least one member in a given character plane.

- (BOOL)hasMemberInPlane:(uint8_t)thePlane

Parameters
thePlane

A character plane.

Return Value

YES if the receiver has at least one member in thePlane, otherwise NO.

Discussion

This method makes it easier to find the plane containing the members of the current character set. The Basic Multilingual Plane is plane 0.
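Unicode divides its code space into 17 planes of 65,536 code points each, so the plane of a code point is simply its value shifted right by 16 bits. The following C sketch shows that arithmetic; plane_of, plane_first, and plane_last are names invented for this example.

```c
#include <stdint.h>

/* Plane number (0 to 16) of a Unicode code point.
   Plane 0 is the Basic Multilingual Plane. */
static uint8_t plane_of(uint32_t codePoint)
{
    return (uint8_t)(codePoint >> 16);
}

/* First code point in the given plane. */
static uint32_t plane_first(uint8_t plane)
{
    return (uint32_t)plane << 16;
}

/* Last code point in the given plane. */
static uint32_t plane_last(uint8_t plane)
{
    return ((uint32_t)plane << 16) | 0xFFFFu;
}
```

A caller scanning a large set could skip any plane for which hasMemberInPlane: returns NO and only test the code points between plane_first and plane_last for the remaining planes.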

Declared In
NSCharacterSet.h

invertedSet

Returns a character set containing only characters that don’t exist in the receiver.

- (NSCharacterSet *)invertedSet

Return Value

A character set containing only characters that don’t exist in the receiver.

Discussion

Inverting an immutable character set is much more efficient than inverting a mutable character set.
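On a raw bitmap, the inverse of a set is obtained by flipping every bit. The C sketch below illustrates that idea for the 8192-byte representation described in this reference; bitmap_invert is a hypothetical helper and makes no claim about how invertedSet is implemented.

```c
#include <stddef.h>

#define BITMAP_BYTES 8192  /* 2^16 bits */

/* Flip every bit in place: members become non-members and vice versa. */
static void bitmap_invert(unsigned char *bitmap)
{
    for (size_t i = 0; i < BITMAP_BYTES; i++) {
        bitmap[i] = (unsigned char)~bitmap[i];
    }
}
```

Inverting twice restores the original bitmap.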

Declared In
NSCharacterSet.h

isSupersetOfSet:

Returns a Boolean value that indicates whether the receiver is a superset of another given character set.

- (BOOL)isSupersetOfSet:(NSCharacterSet *)theOtherSet

Parameters
theOtherSet

A character set.

Return Value

YES if the receiver is a superset of theOtherSet, otherwise NO.
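In terms of raw bitmaps, one set is a superset of another when the other set has no bit that is missing from the first. The following C sketch expresses that test for two 8192-byte bitmaps; bitmap_is_superset is an invented name, and the code is an illustration rather than Foundation's implementation.

```c
#include <stddef.h>

#define BITMAP_BYTES 8192  /* 2^16 bits */

/* Returns 1 if every character present in `other` is also present
   in `receiver`, otherwise 0. */
static int bitmap_is_superset(const unsigned char *receiver,
                              const unsigned char *other)
{
    for (size_t i = 0; i < BITMAP_BYTES; i++) {
        /* Any bit set in `other` but clear in `receiver` disqualifies. */
        if (other[i] & ~receiver[i]) {
            return 0;
        }
    }
    return 1;
}
```

Under this definition every set is a superset of itself and of the empty set.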

Declared In
NSCharacterSet.h

longCharacterIsMember:

Returns a Boolean value that indicates whether a given long character is a member of the receiver.

- (BOOL)longCharacterIsMember:(UTF32Char)theLongChar

Parameters
theLongChar

A UTF-32 character.

Return Value

YES if theLongChar is in the receiver, otherwise NO.

Discussion

This method supports the specification of 32-bit characters.
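Because unichar is a 16-bit type, a character outside the Basic Multilingual Plane is stored in a string as a pair of surrogate code units and has to be combined into a single 32-bit value before it can be passed to longCharacterIsMember:. The C sketch below shows the standard UTF-16 surrogate arithmetic; the function names are invented for this example.

```c
#include <stdint.h>

/* High (leading) surrogates occupy 0xD800 through 0xDBFF. */
static int is_high_surrogate(uint16_t u)
{
    return u >= 0xD800 && u <= 0xDBFF;
}

/* Low (trailing) surrogates occupy 0xDC00 through 0xDFFF. */
static int is_low_surrogate(uint16_t u)
{
    return u >= 0xDC00 && u <= 0xDFFF;
}

/* Combine a valid surrogate pair into a 32-bit code point.
   The caller is expected to have checked both units first. */
static uint32_t utf32_from_surrogates(uint16_t high, uint16_t low)
{
    return 0x10000u
         + (((uint32_t)high - 0xD800u) << 10)
         + ((uint32_t)low - 0xDC00u);
}
```

For example, the pair 0xD83D, 0xDE00 combines to U+1F600, a value that fits in a UTF32Char but not in a unichar.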

Declared In
NSCharacterSet.h

Constants

NSOpenStepUnicodeReservedBase

Specifies the lower bound for a Unicode character range reserved for Apple’s corporate use.

enum {
   NSOpenStepUnicodeReservedBase = 0xF400
};

Constants
NSOpenStepUnicodeReservedBase

Specifies the lower bound for a Unicode character range reserved for Apple’s corporate use (the range is 0xF400–0xF8FF).

Available in Mac OS X v10.0 and later.

Declared in NSCharacterSet.h.

Declared In
NSCharacterSet.h



© 2008 Apple Inc. All Rights Reserved. (Last updated: 2008-10-15)

