Framework | /System/Library/Frameworks/Foundation.framework |
Availability | Available in Mac OS X v10.0 and later. |
Declared in | NSCharacterSet.h |
An NSCharacterSet object represents a set of Unicode-compliant characters. NSString and NSScanner objects use NSCharacterSet objects to group characters for searching operations, so that they can find any of a particular set of characters during a search. NSCharacterSet is implemented as a class cluster. The cluster’s two public classes, NSCharacterSet and NSMutableCharacterSet, declare the programmatic interface for static and dynamic character sets, respectively.
The objects you create using these classes are referred to as character set objects (and, when no confusion will result, simply as character sets). Because of the nature of class clusters, character set objects aren’t actual instances of the NSCharacterSet or NSMutableCharacterSet classes but of one of their private subclasses. Although a character set object’s class is private, its interface is public, as declared by the abstract superclasses NSCharacterSet and NSMutableCharacterSet. The character set classes adopt the NSCopying and NSMutableCopying protocols, so you can convert a character set of one type to the other by sending it copy or mutableCopy.
The NSCharacterSet class declares the programmatic interface for an object that manages a set of Unicode characters (see the NSString class cluster specification for information on Unicode). NSCharacterSet’s principal primitive method, characterIsMember:, provides the basis for all other instance methods in its interface. A subclass of NSCharacterSet needs only to implement this method, plus mutableCopyWithZone:, for proper behavior. For optimal performance, a subclass should also override bitmapRepresentation, which otherwise works by invoking characterIsMember: for every possible Unicode value.
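The cost of that fallback can be illustrated with a small C sketch. This is an illustration under stated assumptions, not Foundation’s implementation. A plain membership predicate stands in for characterIsMember:, and the hypothetical function bitmap_from_predicate fills the 8192-byte raw bitmap format described under characterSetWithBitmapRepresentation:. It does so by calling the predicate once for each of the 65,536 16-bit values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for characterIsMember: (a 16-bit membership test). */
typedef bool (*is_member_fn)(uint16_t c);

enum { BITMAP_BYTES = 8192 };  /* 2^16 bits, one per 16-bit value */

/* Hypothetical helper: builds a raw bitmap by querying the predicate for
   all 65,536 values, as a subclass that doesn't override
   bitmapRepresentation would effectively do. */
static void bitmap_from_predicate(is_member_fn is_member,
                                  unsigned char bitmap[BITMAP_BYTES])
{
    assert(is_member != NULL);
    memset(bitmap, 0, BITMAP_BYTES);
    for (uint32_t n = 0; n <= 0xFFFF; n++) {
        if (is_member((uint16_t)n)) {
            bitmap[n >> 3] |= (unsigned char)(1u << (n & 7));
        }
    }
}

/* Example predicate for illustration: ASCII decimal digits only. */
static bool is_ascii_digit(uint16_t c)
{
    return c >= '0' && c <= '9';
}
```

Every one of those calls is a full membership test. That is why a subclass with a cheaper way to produce the bitmap benefits from overriding bitmapRepresentation.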
NSCharacterSet is “toll-free bridged” with its Core Foundation counterpart, CFCharacterSet (see CFCharacterSet Reference). This means that the Core Foundation type is interchangeable in function or method calls with the bridged Foundation object. Therefore, in a method where you see an NSCharacterSet * parameter, you can pass a CFCharacterSetRef, and in a function where you see a CFCharacterSetRef parameter, you can pass an NSCharacterSet instance (you cast one type to the other to suppress compiler warnings). See Interchangeable Data Types for more information on toll-free bridging.
The mutable subclass of NSCharacterSet is NSMutableCharacterSet.
+ alphanumericCharacterSet
+ capitalizedLetterCharacterSet
+ controlCharacterSet
+ decimalDigitCharacterSet
+ decomposableCharacterSet
+ illegalCharacterSet
+ letterCharacterSet
+ lowercaseLetterCharacterSet
+ newlineCharacterSet
+ nonBaseCharacterSet
+ punctuationCharacterSet
+ symbolCharacterSet
+ uppercaseLetterCharacterSet
+ whitespaceAndNewlineCharacterSet
+ whitespaceCharacterSet
Returns a character set containing the characters in the categories Letters, Marks, and Numbers.
+ (id)alphanumericCharacterSet
A character set containing the characters in the categories Letters, Marks, and Numbers.
Informally, this set is the set of all characters used as basic units of alphabets, syllabaries, ideographs, and digits.
NSCharacterSet.h
Returns a character set containing the characters in the category of Titlecase Letters.
+ (id)capitalizedLetterCharacterSet
A character set containing the characters in the category of Titlecase Letters.
NSCharacterSet.h
Returns a character set containing characters determined by a given bitmap representation.
+ (id)characterSetWithBitmapRepresentation:(NSData *)data
A bitmap representation of a character set.
A character set containing characters determined by data.
This method is useful for creating a character set object with data from a file or other external data source.
A raw bitmap representation of a character set is a byte array of 2^16 bits (that is, 8192 bytes). The value of the bit at position n represents the presence in the character set of the character with decimal Unicode value n. To add a character with decimal Unicode value n to a raw bitmap representation, use a statement such as the following:
unsigned char bitmapRep[8192] = {0};  /* all bits clear: an empty set */
bitmapRep[n >> 3] |= (((unsigned int)1) << (n & 7));
To remove that character:
bitmapRep[n >> 3] &= ~(((unsigned int)1) << (n & 7));
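For reference, here is a self-contained C sketch that wraps the statements above in helper functions and adds a membership test. The names bitmap_add, bitmap_remove, and bitmap_contains are hypothetical. The sketch assumes n is a 16-bit value, because the raw bitmap described here holds exactly 2^16 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { BITMAP_BYTES = 8192 };  /* 2^16 bits */

/* Hypothetical helper: adds the character with Unicode value n. */
static void bitmap_add(unsigned char bitmap[BITMAP_BYTES], uint32_t n)
{
    assert(n <= 0xFFFF);  /* the raw bitmap covers 16-bit values only */
    bitmap[n >> 3] |= (unsigned char)(1u << (n & 7));
}

/* Hypothetical helper: removes the character with Unicode value n. */
static void bitmap_remove(unsigned char bitmap[BITMAP_BYTES], uint32_t n)
{
    assert(n <= 0xFFFF);
    bitmap[n >> 3] &= (unsigned char)~(1u << (n & 7));
}

/* Hypothetical helper: true if the character with Unicode value n is present. */
static bool bitmap_contains(const unsigned char bitmap[BITMAP_BYTES], uint32_t n)
{
    assert(n <= 0xFFFF);
    return (bitmap[n >> 3] >> (n & 7)) & 1u;
}
```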
NSCharacterSet.h
Returns a character set containing the characters in a given string.
+ (id)characterSetWithCharactersInString:(NSString *)aString
A string containing characters for the new character set.
A character set containing the characters in aString. Returns an empty character set if aString is empty.
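As a rough model of what this method produces, the following C sketch builds the raw bitmap described above (under characterSetWithBitmapRepresentation:) from an array of UTF-16 code units. bitmap_from_utf16 and bitmap_count are hypothetical helpers. As a simplifying assumption, each code unit is treated as one character, so surrogate pairs are not combined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { BITMAP_BYTES = 8192 };  /* 2^16 bits */

/* Hypothetical helper: builds a raw bitmap from chars[0..len).
   Repeated characters set the same bit again; len == 0 yields an empty set.
   Simplification: each code unit counts as one character (no surrogate pairs). */
static void bitmap_from_utf16(const uint16_t *chars, size_t len,
                              unsigned char bitmap[BITMAP_BYTES])
{
    assert(chars != NULL || len == 0);
    memset(bitmap, 0, BITMAP_BYTES);
    for (size_t i = 0; i < len; i++) {
        uint16_t c = chars[i];
        bitmap[c >> 3] |= (unsigned char)(1u << (c & 7));
    }
}

/* Hypothetical helper: counts the members of a raw bitmap. */
static size_t bitmap_count(const unsigned char bitmap[BITMAP_BYTES])
{
    size_t count = 0;
    for (size_t i = 0; i < BITMAP_BYTES; i++) {
        for (unsigned b = 0; b < 8; b++) {
            count += (bitmap[i] >> b) & 1u;
        }
    }
    return count;
}
```

For the string “hello”, the resulting set has four members (h, e, l, o), since the repeated l sets the same bit twice.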
NSCharacterSet.h
Returns a character set read from the bitmap representation stored in the file at a given path.
+ (id)characterSetWithContentsOfFile:(NSString *)path
A path to a file containing a bitmap representation of a character set. The path name must end with the extension .bitmap.
A character set read from the bitmap representation stored in the file at path.
To read a bitmap representation from any file, regardless of its extension, use the NSData method dataWithContentsOfFile:options:error: and pass the result to characterSetWithBitmapRepresentation:.
This method doesn’t use filenames to check for the uniqueness of the character sets it creates. To prevent duplication of character sets in memory, cache them and make them available through an API that checks whether the requested set has already been loaded.
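One way to follow this advice is a small cache keyed by path. The C sketch below is an illustration that rests on several assumptions. charset_cache_get, charset_loader_fn, and the demo loader are hypothetical. The loader stands in for whatever actually reads the file (for example, a call to characterSetWithContentsOfFile:). Paths are compared as plain strings without normalization, so two spellings of the same path are cached separately. Once the fixed capacity is reached, newly loaded sets are returned but not cached.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical loader type: reads the character set stored at path. */
typedef void *(*charset_loader_fn)(const char *path);

enum { CACHE_CAPACITY = 16, CACHE_PATH_LEN = 256 };

struct charset_cache_entry {
    char path[CACHE_PATH_LEN];
    void *set;
};

struct charset_cache {
    struct charset_cache_entry entries[CACHE_CAPACITY];
    size_t count;
};

/* Hypothetical helper: returns the cached set for path, calling the loader
   only the first time that exact path string is requested. */
static void *charset_cache_get(struct charset_cache *cache, const char *path,
                               charset_loader_fn load)
{
    assert(cache != NULL && path != NULL && load != NULL);
    for (size_t i = 0; i < cache->count; i++) {
        if (strcmp(cache->entries[i].path, path) == 0) {
            return cache->entries[i].set;  /* already loaded */
        }
    }
    void *set = load(path);
    if (set != NULL && cache->count < CACHE_CAPACITY
        && strlen(path) < CACHE_PATH_LEN) {
        struct charset_cache_entry *e = &cache->entries[cache->count++];
        strcpy(e->path, path);
        e->set = set;
    }
    return set;
}

/* Example loader for illustration only: counts how often it runs and
   returns a dummy object instead of reading a file. */
static int demo_load_calls = 0;
static int demo_set_object;

static void *demo_loader(const char *path)
{
    (void)path;
    demo_load_calls++;
    return &demo_set_object;
}
```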
NSCharacterSet.h
Returns a character set containing characters with Unicode values in a given range.
+ (id)characterSetWithRange:(NSRange)aRange
A range of Unicode values.
aRange.location is the value of the first character to return; aRange.location + aRange.length - 1 is the value of the last.
A character set containing characters whose Unicode values are given by aRange. If aRange.length is 0, returns an empty character set.
This code excerpt creates a character set object containing the lowercase English alphabetic characters:
NSRange lcEnglishRange;
NSCharacterSet *lcEnglishLetters;
lcEnglishRange.location = (unsigned int)'a';
lcEnglishRange.length = 26;
lcEnglishLetters = [NSCharacterSet characterSetWithRange:lcEnglishRange];
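The same range arithmetic can be sketched in plain C against the raw bitmap format described under characterSetWithBitmapRepresentation:. struct char_range is a hypothetical stand-in for NSRange. The sketch only accepts ranges that stay within the 16-bit values the raw bitmap can hold.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { BITMAP_BYTES = 8192 };  /* 2^16 bits */

/* Hypothetical, simplified stand-in for NSRange. */
struct char_range {
    uint32_t location;
    uint32_t length;
};

/* Hypothetical helper: sets the bits for location .. location + length - 1.
   A zero length leaves the bitmap empty. */
static void bitmap_from_range(struct char_range r,
                              unsigned char bitmap[BITMAP_BYTES])
{
    assert(r.location <= 0x10000 && r.length <= 0x10000 - r.location);
    memset(bitmap, 0, BITMAP_BYTES);
    for (uint32_t n = r.location; n < r.location + r.length; n++) {
        bitmap[n >> 3] |= (unsigned char)(1u << (n & 7));
    }
}
```

With location 'a' and length 26, the set runs from 'a' through 'z'. The characters just outside the range, '`' and '{', stay absent.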
NSCharacterSet.h
Returns a character set containing the characters in the categories of Control or Format Characters.
+ (id)controlCharacterSet
A character set containing the characters in the categories of Control or Format Characters.
These characters are specifically the Unicode values U+0000 to U+001F and U+007F to U+009F.
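Taking the two ranges above at face value, a membership test and a simple filter look like this in C. is_listed_control and strip_listed_controls are hypothetical names. The predicate checks only the listed ranges, so it is not a substitute for the full Control and Format categories.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical predicate: true for U+0000-U+001F and U+007F-U+009F only. */
static bool is_listed_control(uint16_t c)
{
    return c <= 0x001F || (c >= 0x007F && c <= 0x009F);
}

/* Hypothetical helper: removes the listed control characters from buf
   in place and returns the new length. */
static size_t strip_listed_controls(uint16_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        if (!is_listed_control(buf[i])) {
            buf[out++] = buf[i];
        }
    }
    return out;
}
```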
NSCharacterSet.h
Returns a character set containing the characters in the category of Decimal Numbers.
+ (id)decimalDigitCharacterSet
A character set containing the characters in the category of Decimal Numbers.
Informally, this set is the set of all characters used to represent the decimal values 0 through 9. These characters include, for example, the decimal digits of the Indic scripts and Arabic.
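In the scripts sampled here, the ten digits occupy consecutive code points starting at that script’s zero, so a digit’s value is its offset from the zero. This C sketch assumes a small sample of scripts (ASCII, Arabic-Indic, and Devanagari) rather than the whole Decimal Numbers category. sample_digit_value and parse_sample_digits are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Zero digits of a few scripts: a small sample, not the full category. */
static const uint16_t kDigitZeros[] = {
    0x0030,  /* DIGIT ZERO (ASCII) */
    0x0660,  /* ARABIC-INDIC DIGIT ZERO */
    0x0966,  /* DEVANAGARI DIGIT ZERO */
};

/* Hypothetical helper: returns 0-9 for a sampled digit, or -1 otherwise. */
static int sample_digit_value(uint16_t c)
{
    for (size_t i = 0; i < sizeof kDigitZeros / sizeof kDigitZeros[0]; i++) {
        if (c >= kDigitZeros[i] && c <= kDigitZeros[i] + 9) {
            return c - kDigitZeros[i];
        }
    }
    return -1;
}

/* Hypothetical helper: parses s[0..len) as sampled digits into *out.
   Returns false if any code unit is not a sampled digit.
   Overflow is not checked in this sketch. */
static bool parse_sample_digits(const uint16_t *s, size_t len, long *out)
{
    assert(out != NULL && (s != NULL || len == 0));
    long value = 0;
    for (size_t i = 0; i < len; i++) {
        int d = sample_digit_value(s[i]);
        if (d < 0) {
            return false;
        }
        value = value * 10 + d;
    }
    *out = value;
    return true;
}
```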
NSCharacterSet.h
Returns a character set containing all individual Unicode characters that can also be represented as composed character sequences.
+ (id)decomposableCharacterSet
A character set containing all individual Unicode characters that can also be represented as composed character sequences (such as for letters with accents), by the definition of “standard decomposition” in version 3.2 of the Unicode character encoding standard.
These characters include compatibility characters as well as pre-composed characters.
Note: This character set doesn’t currently include the Hangul characters defined in version 2.0 of the Unicode standard.
NSCharacterSet.h
Returns a character set containing values in the category of Non-Characters or that have not yet been defined in version 3.2 of the Unicode standard.
+ (id)illegalCharacterSet
A character set containing values in the category of Non-Characters or that have not yet been defined in version 3.2 of the Unicode standard.
NSCharacterSet.h
Returns a character set containing the characters in the categories Letters and Marks.
+ (id)letterCharacterSet
A character set containing the characters in the categories Letters and Marks.
Informally, this set is the set of all characters used as letters of alphabets and ideographs.
NSCharacterSet.h
Returns a character set containing the characters in the category of Lowercase Letters.
+ (id)lowercaseLetterCharacterSet
A character set containing the characters in the category of Lowercase Letters.
Informally, this set is the set of all characters used as lowercase letters in alphabets that make case distinctions.
NSCharacterSet.h
Returns a character set containing the newline characters.
+ (id)newlineCharacterSet
A character set containing the newline characters (U+000A–U+000D, U+0085).
NSCharacterSet.h
Returns a character set containing the characters in the category of Marks.
+ (id)nonBaseCharacterSet
A character set containing the characters in the category of Marks.
This set is also defined as all legal Unicode characters with a non-spacing priority greater than 0. Informally, this set is the set of all characters used as modifiers of base characters.
NSCharacterSet.h
Returns a character set containing the characters in the category of Punctuation.
+ (id)punctuationCharacterSet
A character set containing the characters in the category of Punctuation.
Informally, this set is the set of all non-whitespace characters used to separate linguistic units in scripts, such as periods, dashes, parentheses, and so on.
NSCharacterSet.h
Returns a character set containing the characters in the category of Symbols.
+ (id)symbolCharacterSet
A character set containing the characters in the category of Symbols.
These characters include, for example, the dollar sign ($) and the plus (+) sign.
NSCharacterSet.h
Returns a character set containing the characters in the categories of Uppercase Letters and Titlecase Letters.
+ (id)uppercaseLetterCharacterSet
A character set containing the characters in the categories of Uppercase Letters and Titlecase Letters.
Informally, this set is the set of all characters used as uppercase letters in alphabets that make case distinctions.
NSCharacterSet.h
Returns a character set containing only the whitespace characters space (U+0020) and tab (U+0009) and the newline and nextline characters (U+000A–U+000D, U+0085).
+ (id)whitespaceAndNewlineCharacterSet
A character set containing only the whitespace characters space (U+0020) and tab (U+0009) and the newline and nextline characters (U+000A–U+000D, U+0085).
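Under the definitions given in this document, this set is the union of the in-line whitespace set (space and tab) and the newline set. The C sketch below expresses that union as predicates and uses it to count words. All function names are hypothetical, and the predicates follow this document’s character lists rather than any particular Foundation release.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical predicate: in-line whitespace as listed here (space, tab). */
static bool is_inline_whitespace(uint32_t c)
{
    return c == 0x0020 || c == 0x0009;
}

/* Hypothetical predicate: newline characters as listed here. */
static bool is_newline(uint32_t c)
{
    return (c >= 0x000A && c <= 0x000D) || c == 0x0085;
}

/* Hypothetical predicate: the union of the two sets. */
static bool is_whitespace_or_newline(uint32_t c)
{
    return is_inline_whitespace(c) || is_newline(c);
}

/* Hypothetical helper: counts runs of characters separated by
   whitespace or newlines. */
static size_t count_words(const uint16_t *s, size_t len)
{
    assert(s != NULL || len == 0);
    size_t words = 0;
    bool in_word = false;
    for (size_t i = 0; i < len; i++) {
        if (is_whitespace_or_newline(s[i])) {
            in_word = false;
        } else if (!in_word) {
            in_word = true;
            words++;
        }
    }
    return words;
}
```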
NSCharacterSet.h
Returns a character set containing only the in-line whitespace characters space (U+0020) and tab (U+0009).
+ (id)whitespaceCharacterSet
A character set containing only the in-line whitespace characters space (U+0020) and tab (U+0009).
This set doesn’t contain the newline or carriage return characters.
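The difference matters when trimming text. This C sketch is a rough analog of trimming an NSString with stringByTrimmingCharactersInSet: and this set. Leading and trailing spaces and tabs are removed, but a trailing newline is kept. trim_inline_whitespace is a hypothetical helper that works on an array of UTF-16 code units.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical predicate: space and tab only; newlines are excluded. */
static bool is_inline_whitespace(uint16_t c)
{
    return c == 0x0020 || c == 0x0009;
}

/* Hypothetical helper: trims leading and trailing in-line whitespace from
   s[0..len). Stores the index of the first kept code unit in *start and
   returns the number of code units kept. */
static size_t trim_inline_whitespace(const uint16_t *s, size_t len, size_t *start)
{
    assert(start != NULL && (s != NULL || len == 0));
    size_t begin = 0, end = len;
    while (begin < end && is_inline_whitespace(s[begin])) {
        begin++;
    }
    while (end > begin && is_inline_whitespace(s[end - 1])) {
        end--;
    }
    *start = begin;
    return end - begin;
}
```

For the input " \thi\n ", the kept text is "hi\n". The trailing space goes, but the newline stops the trim.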
NSCharacterSet.h
Returns an NSData object encoding the receiver in binary format.
- (NSData *)bitmapRepresentation
An NSData object encoding the receiver in binary format.
This format is suitable for saving to a file or otherwise transmitting or archiving.
A raw bitmap representation of a character set is a byte array of 2^16 bits (that is, 8192 bytes). The value of the bit at position n represents the presence in the character set of the character with decimal Unicode value n. To test for the presence of a character with decimal Unicode value n in a raw bitmap representation, use an expression such as the following:
unsigned char bitmapRep[8192];  /* e.g., the bytes of the NSData returned by bitmapRepresentation */
if (bitmapRep[n >> 3] & (((unsigned int)1) << (n & 7))) {
    /* Character is present. */
}
NSCharacterSet.h
Returns a Boolean value that indicates whether a given character is in the receiver.
- (BOOL)characterIsMember:(unichar)aCharacter
The character to test for membership of the receiver.
YES if aCharacter is in the receiving character set, otherwise NO.
NSCharacterSet.h
Returns a Boolean value that indicates whether the receiver has at least one member in a given character plane.
- (BOOL)hasMemberInPlane:(uint8_t)thePlane
A character plane.
YES if the receiver has at least one member in thePlane, otherwise NO.
This method makes it easier to find the plane containing the members of the current character set. The Basic Multilingual Plane is plane 0.
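A code point’s plane is the code point shifted right by 16 bits, so U+0041 is in plane 0 and U+1F600 is in plane 1. To show the check, this C sketch models a character set as a plain array of code points. plane_of and has_member_in_plane are hypothetical, and this is not how Foundation stores sets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the plane of a code point (0 = Basic Multilingual Plane). */
static uint8_t plane_of(uint32_t c)
{
    assert(c <= 0x10FFFF);  /* largest Unicode code point */
    return (uint8_t)(c >> 16);
}

/* Hypothetical analog of hasMemberInPlane: for a set given as an array of
   code points: true if at least one member lies in the plane. */
static bool has_member_in_plane(const uint32_t *members, size_t count,
                                uint8_t plane)
{
    assert(members != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (plane_of(members[i]) == plane) {
            return true;
        }
    }
    return false;
}
```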
NSCharacterSet.h
Returns a character set containing only characters that don’t exist in the receiver.
- (NSCharacterSet *)invertedSet
A character set containing only characters that don’t exist in the receiver.
Inverting an immutable character set is much more efficient than inverting a mutable character set.
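On the raw bitmap representation described under bitmapRepresentation, inversion is a bitwise complement of every byte. This C sketch shows only that idea. bitmap_invert is a hypothetical helper, and it covers only the 16-bit values that the raw bitmap holds. It does not model the performance difference between mutable and immutable sets mentioned above.

```c
#include <assert.h>
#include <stddef.h>

enum { BITMAP_BYTES = 8192 };  /* 2^16 bits */

/* Hypothetical helper: writes the complement of src into dst, so every
   absent 16-bit value becomes present and vice versa. */
static void bitmap_invert(const unsigned char src[BITMAP_BYTES],
                          unsigned char dst[BITMAP_BYTES])
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < BITMAP_BYTES; i++) {
        dst[i] = (unsigned char)~src[i];
    }
}
```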
See also: invert (NSMutableCharacterSet)
NSCharacterSet.h
Returns a Boolean value that indicates whether the receiver is a superset of another given character set.
- (BOOL)isSupersetOfSet:(NSCharacterSet *)theOtherSet
A character set.
YES if the receiver is a superset of theOtherSet, otherwise NO.
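With two raw bitmaps, the superset test reduces to checking that the other set has no bit that the receiver lacks. bitmap_is_superset in this C sketch is hypothetical. It is limited to the 16-bit values of the raw bitmap format described under bitmapRepresentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { BITMAP_BYTES = 8192 };  /* 2^16 bits */

/* Hypothetical helper: true if every member of other is also in set,
   that is, no byte of other has a bit that is clear in set. */
static bool bitmap_is_superset(const unsigned char set[BITMAP_BYTES],
                               const unsigned char other[BITMAP_BYTES])
{
    assert(set != NULL && other != NULL);
    for (size_t i = 0; i < BITMAP_BYTES; i++) {
        if (other[i] & (unsigned char)~set[i]) {
            return false;
        }
    }
    return true;
}
```

Every set is a superset of itself, and the empty set is a subset of every set.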
NSCharacterSet.h
Returns a Boolean value that indicates whether a given long character is a member of the receiver.
- (BOOL)longCharacterIsMember:(UTF32Char)theLongChar
A UTF32 character.
YES if theLongChar is in the receiver, otherwise NO.
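Unlike characterIsMember:, which takes a 16-bit unichar, this method accepts a 32-bit UTF32Char, so it can test characters outside the Basic Multilingual Plane.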
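Characters outside the Basic Multilingual Plane appear in an NSString as a UTF-16 surrogate pair. To test one of them, you first combine the pair into a single 32-bit value. The C sketch below shows the standard UTF-16 decoding arithmetic. The helper names are hypothetical, and uint32_t stands in for UTF32Char.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true for a high (leading) surrogate, U+D800-U+DBFF. */
static bool is_high_surrogate(uint16_t u)
{
    return u >= 0xD800 && u <= 0xDBFF;
}

/* Hypothetical helper: true for a low (trailing) surrogate, U+DC00-U+DFFF. */
static bool is_low_surrogate(uint16_t u)
{
    return u >= 0xDC00 && u <= 0xDFFF;
}

/* Hypothetical helper: combines a surrogate pair into the 32-bit value
   that a 32-bit membership test such as longCharacterIsMember: expects. */
static uint32_t combine_surrogates(uint16_t high, uint16_t low)
{
    assert(is_high_surrogate(high) && is_low_surrogate(low));
    return 0x10000u + (((uint32_t)high - 0xD800u) << 10)
                    + ((uint32_t)low - 0xDC00u);
}
```

For example, the pair 0xD83D, 0xDE00 combines to U+1F600, which lies in plane 1.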
NSCharacterSet.h
Specifies the lower bound of a Unicode character range reserved for Apple’s corporate use.
enum { NSOpenStepUnicodeReservedBase = 0xF400 };
NSOpenStepUnicodeReservedBase
Specifies the lower bound of a Unicode character range reserved for Apple’s corporate use (the range is 0xF400–0xF8FF).
Available in Mac OS X v10.0 and later.
Declared in NSCharacterSet.h.
© 2008 Apple Inc. All Rights Reserved. (Last updated: 2008-10-15)