FSSpecs and FSRefs

Contains information and coding techniques useful in migrating your source from FSSpecs to FSRefs.

Differences between FSSpecs and FSRefs

    struct FSSpec {
        short       vRefNum;
        long        parID;
        StrFileName name;        /* a Str63 */
    };

    struct FSRef {
        UInt8       hidden[80];  /* private to File Manager */
    };

The differences which will probably have the biggest impact on your code are that FSRefs cannot represent items which do not exist, and an

Converting FSSpecs to FSRefs and back

To convert an
To obtain an
How can I tell if an FSRef is valid?

    Boolean FSRefIsValid( const FSRef &fsRef )
    {
        return ( FSGetCatalogInfo( &fsRef, kFSCatInfoNone, NULL, NULL, NULL, NULL ) == noErr );
    }

How can I tell if they reference the same item?

    if ( FSCompareFSRefs( &fsRef1, &fsRef2 ) == noErr )

Getting the parent directory of an FSRef

    err = FSGetCatalogInfo( &fsRef, kFSCatInfoNone, NULL, NULL, NULL, &parentFSRef );

How do I specify non-existent items, such as files you plan to create?

    struct ExtFSRef {
        FSRef        parentFSRef;
        HFSUniStr255 name;
    };

    class CExtFSRef {
        FSRef        parentFSRef;
        HFSUniStr255 name;
        // ... Some useful member functions
    };

This technique is especially useful when storing data returned by

    struct ExtFSRef2 {
        FSRef       parentFSRef;
        CFStringRef name;
    };

    class CExtFSRef2 {
        FSRef       parentFSRef;
        CFStringRef name;
        // ... Some useful member functions
    };

Apple events

Don't pass FSRefs in AppleEvents. Because FSRefs are not guaranteed to be valid across processes in Mac OS X, you shouldn't send them in AppleEvents. MoreFinderEvents contains code demonstrating how to pass aliases to the Finder through AppleEvents.

Persistent storage

Like FSSpecs, FSRefs are not guaranteed to be valid across boots in Mac OS 9 or Mac OS X, across processes in Mac OS X, or even across separate launches of the same application in Mac OS X, so don't use them when you need persistent storage. For persistent storage, aliases are still the recommended approach. (Alias Manager)

Can I continue to use FSSpecs?

Yes, they continue to be valid file references. An

It depends on your application. The QuickTime and Drag Manager APIs still require FSSpecs, but creating temporary FSSpecs from FSRefs is an easy operation.
You have to use the new

Note: If you use kWindowModalityAppModal for any of the new

How do I get an FSRef to my application?

    OSErr GetCurrentProcessFSSpec( FSSpec *outFSSpec )
    {
        ProcessSerialNumber currentProcess = { 0, kCurrentProcess };
        ProcessInfoRec processInfo;

        processInfo.processInfoLength = sizeof(ProcessInfoRec);
        processInfo.processName = NULL;  /* don't need the process name */
        processInfo.processAppSpec = outFSSpec;

        return GetProcessInformation( &currentProcess, &processInfo );
    }

If your application is bundled, this will get an

    OSErr GetMyBundleFSRef( FSRef *outFSRef )
    {
        ProcessSerialNumber currentProcess = { 0, kCurrentProcess };

        return( GetProcessBundleLocation( &currentProcess, outFSRef ) );
    }

LaunchServices

LaunchServices is a set of Mac OS X-only APIs for working with files. Read through <LaunchServices.h> if you want to be up-to-date on files in Mac OS X, where there are some new issues like bundled applications, display names, new rules for application binding, and so on. Technical Note TN2017, 'Using Launch Services for discovering document binding and launching applications', also contains a wealth of information.
    OSStatus LSIsApplication( const FSRef *inRef, Boolean *outIsApplication, Boolean *outIsBundled )
    {
        LSItemInfoRecord info;
        OSStatus err = LSCopyItemInfoForRef( inRef, kLSRequestBasicFlagsOnly, &info );

        if ( err == noErr )
        {
            *outIsApplication = ( kLSItemInfoIsApplication & info.flags ) != 0;
            *outIsBundled = ( kLSItemInfoIsPackage & info.flags ) != 0;
        }
        return( err );
    }

Use Use Use Use

Getting a file path

The most straightforward approach to getting a file's path is with the API:

    OSStatus FSRefMakePath( const FSRef * ref, UInt8 * path, UInt32 maxPathSize);

This will return a UTF-8 encoded path to the object specified by the

    CFURLRef url = CFURLCreateFromFSRef( kCFAllocatorDefault, &fsRef );
    CFStringRef cfString = NULL;
    if ( url != NULL )
    {
        cfString = CFURLCopyFileSystemPath( url, inPathStyle );
        CFRelease( url );
    }

    Boolean GetPathManually( const FSRef *inFSRef, CFMutableStringRef ioPath, UniChar inSepChar )
    {
        // ioPath should already be created with CFStringCreateMutablexxx.
        FSCatalogInfo catalogInfo;
        int n;
        int i;
        HFSUniStr255 names[100];
        FSRef localRef = *inFSRef;
        OSStatus err = noErr;

        // Make sure the first loop test does not read an uninitialized node ID.
        catalogInfo.nodeID = 0;

        // Start with an empty path.
        CFStringDelete( ioPath, CFRangeMake( 0, CFStringGetLength( ioPath ) ) );

        // Walk up from the item to the volume's root directory, collecting names.
        for ( n = 0; err == noErr && catalogInfo.nodeID != fsRtDirID && n < 100; n++ )
        {
            err = FSGetCatalogInfo( &localRef, kFSCatInfoNodeID, &catalogInfo,
                                    &names[n], NULL, &localRef );
        }

        // Append the names in root-to-leaf order, separated by inSepChar.
        for ( i = n - 1; i >= 0; --i )
        {
            CFStringAppendCharacters( ioPath, names[i].unicode, names[i].length );
            if ( i > 0 )
                CFStringAppendCharacters( ioPath, &inSepChar, 1 );
        }
        return( err == noErr );
    }

Additional Notes

Because the contents of an

CarbonLib

If you're contemplating a CarbonLib project, be aware that FSRefs were introduced with the new HFS+ APIs in Mac OS 9 and hence require Mac OS 9 or later.
CarbonLib provides a wrapper around

FSRefs and long Unicode file names

How do I get the name of an item from an FSRef?

    OSErr FSRefGetName( const FSRef *fsRef, HFSUniStr255 *name )
    {
        return( FSGetCatalogInfo(fsRef, kFSCatInfoNone, NULL, name, NULL, NULL) );
    }

An HFSUniStr255 is defined as:

    struct HFSUniStr255 {
        UInt16  length;        /* number of unicode characters */
        UniChar unicode[255];  /* unicode characters */
    };

Since HFSUniStr255s occupy 512 bytes you may want to store names as CFStringRefs:

    strRef = CFStringCreateWithCharacters( kCFAllocatorDefault, name.unicode, name.length );

In addition to conserving memory, Core Foundation provides a wealth of APIs for testing and manipulating CFStrings. There are no such APIs for working with

Note that

Note: While technically correct, the definition of

Notes about Unicode strings

Strictly speaking, the issues here are independent of the source of the CFString, but they are often encountered when dealing with Unicode file names. Many of us need to display the name of a file or folder in our applications. Since Mac OS X supports long Unicode file names, there are some related issues. Unicode has a number of things going on under the hood which you wouldn't expect if you are unfamiliar with Unicode and how it works. The following are some basic points to remember when working with Unicode file names.

A Unicode string (speaking from the viewpoint of Mac OS X) is a string of UniChars. Such a string can be converted to and from a

A single Unicode code point may require multiple UniChars, so never modify a Unicode string by simply removing a range of UniChars or inserting UniChars at an arbitrary offset. Doing so can produce a string which is not what you expect, which is incorrect, or which is no longer a legal Unicode string.
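To make the last point concrete, here is a minimal sketch in plain C (no Carbon headers) of why cutting at an arbitrary offset can produce an illegal string. It is an illustration, not code from the File Manager or Core Foundation: the helper names utf16_encode and utf16_is_well_formed are hypothetical, and uint16_t stands in for UniChar. The encoder follows the standard UTF-16 surrogate mapping; the checker only looks for unpaired surrogates and knows nothing about combining marks.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; uint16_t stands in for UniChar. */
static int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Encode one code point as UTF-16. Returns the number of 16-bit units
   written (1 or 2), or 0 if cp is not a valid Unicode scalar value. */
static size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;                   /* BMP: one unit */
        return 1;
    }
    cp -= 0x10000;                               /* leaves a 20-bit value */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate: upper 10 bits */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate: lower 10 bits */
    return 2;
}

/* Returns 1 if every surrogate in s[0..len) belongs to a high-low pair. */
static int utf16_is_well_formed(const uint16_t *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        if (is_high_surrogate(s[i])) {
            if (i + 1 >= len || !is_low_surrogate(s[i + 1]))
                return 0;                        /* high surrogate with no partner */
            i += 2;
        } else if (is_low_surrogate(s[i])) {
            return 0;                            /* stray low surrogate */
        } else {
            i += 1;
        }
    }
    return 1;
}
```

For example, U+1D11E encodes as the two units 0xD834 0xDD1E. A four-unit string consisting of 'a', that pair, and 'b' is well formed, but the first two units alone are not, because the cut leaves an unpaired high surrogate.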
Truncating Unicode strings by width

    Boolean TruncateWidth( CFMutableStringRef ioString, SInt16 inMaxWidth, TruncCode inTruncCode,
                           ThemeFontID inThemeFontID, ThemeDrawState inState )
    {
        Boolean wasTruncated = false;
        OSStatus err = TruncateThemeText( ioString, inThemeFontID, inState, inMaxWidth,
                                          inTruncCode, &wasTruncated );
        return( (err == noErr) && wasTruncated );
    }

Truncating Unicode strings by length

Unfortunately, there is no simple API available which you can use to correctly truncate a Unicode string to a certain number of characters. To truncate a file name based on length, you'll need to convert the name to a UniChar string and use

Concatenation

You can concatenate Unicode strings at will. The individual pieces will retain their original meaning. For example, you can append ".txt" to a Unicode string without changing the meaning of the existing string. Or, you could concatenate English and Arabic (a right-to-left script) and get the desired result.

Determining the width of strings

Don't try to estimate the width of a Unicode string based on the number of UniChars in the string. In addition to the issues of combining characters and surrogate pairs, Unicode text can contain invisible characters which are not rendered. Unicode goes beyond the simple encoding of characters and scripts. There are several code point values which can be used to provide hints or instructions to rendering software, but are never rendered themselves.
    SInt16 GetWidth( const CFStringRef inString, ThemeFontID inFontID, ThemeDrawState inState )
    {
        Point pt;
        SInt16 baseline;

        GetThemeTextDimensions( inString, inFontID, inState, false, &pt, &baseline );
        return( pt.h );
    }

Encoding file names in other formats

Again, this is not an issue limited to file names, but is included because people often make the mistake of assuming that the size of the buffer needed when converting a CFString to a C string is

    char* CreateUTF8CStringFromCFString( const CFStringRef inString )
    {
        // DisposePtr(cStr) must be called when done with the result if successful.
        CFIndex max;
        char *cStr;
        CFIndex len = CFStringGetLength( inString );

        // Worst-case number of UTF-8 bytes for len UniChars (excluding the terminator).
        max = CFStringGetMaximumSizeForEncoding( len, kCFStringEncodingUTF8 );
        cStr = NewPtr( 1 + max );
        if ( cStr != NULL )
        {
            // Pass the size of the buffer in bytes, not the length of the string in UniChars.
            if ( !CFStringGetCString( inString, cStr, 1 + max, kCFStringEncodingUTF8 ) )
            {
                DisposePtr( cStr );
                cStr = NULL;
            }
            if ( cStr != NULL )
                SetPtrSize( cStr, strlen( cStr ) + 1 );
            // SetPtrSize has no effect in Mac OS X if the new size is less than
            // the old size. If you really want to shrink the pointer to the
            // amount actually used, in Mac OS X, you'll need to allocate a new
            // pointer whose size is strlen(cStr) + 1 and copy the contents of
            // cStr to the new pointer.
        }
        return( cStr );
    }

How file names are encoded

HFS+ disks store file names as UTF-16 in an Apple-modified form of Normalization Form D (decomposed). This form excludes certain compatibility decompositions and parts of the symbol blocks, in order to assure round-trip of file names to Mac OS encodings (applications using the HFS APIs assume they get the same bytes out that they put in). In Mac OS X 10.2, the decomposition rules used were changed from Unicode 2.0.x (based on an intermediate draft) plus the above-mentioned Apple modifications, to Unicode 3.2 plus the above-mentioned Apple modifications. The Unicode Consortium has committed to not changing the decomposition rules after Unicode 3.2, so we shouldn't have to do this again.
The change from 2.0.x to 3.2 was necessary because A) lots of new decompositions had been added, and B) the 2.0.x data was full of errors. Other file systems use different storage formats. UFS disks use UTF-8, HFS disks use Mac OS encodings. AFP (AppleShare) uses Mac OS encodings prior to 3.0, and UTF-16 for 3.0 or later.

Notes About Using Unicode

This could also be called "Unicode for File Names," as there are many aspects of Unicode which won't be discussed here because they aren't needed if all you're doing with Unicode is working with file names in Mac OS X. The reason for focusing on this particular area is that it's an area which every Mac OS X application should be prepared to support. If you're writing a Unicode-savvy word processor, you're going to need a lot more understanding than any glossary notes can provide. Most of the information presented here is from the book Unicode Demystified by Richard Gillam. However, it's 800 pages and may be overkill if all you want to do is handle file names properly in Mac OS X.

What is Unicode?

Unicode is a universal text encoding standard for representing written language in a format suitable for use and storage by computers. Its goal is to allow the encoding of all, or at least all significant, forms of writing in use in the world today, as well as many which are no longer used but are of historical or scholarly interest.

Unicode terminology

There are two major challenges for those new to Unicode. First is getting a handle on the terminology. Second, and directly related to the first, is understanding what constitutes a character in a written language, in Unicode, and how the two are related (i.e., how characters are encoded in Unicode). English is one of the simplest, if not the simplest, of all the world's languages to write and encode for use by computers. Understandably, people whose native language is English tend to make incorrect assumptions about how other languages are written and encoded into Unicode.
When code is written based on those false assumptions, it will not work correctly for all languages. Following are some terms often used in Unicode discussions.

A character is an abstract linguistic concept such as "the Latin letter A" or "the Chinese character for 'sun.'" Every character defined in the Unicode standard is assigned a single 21-bit abstract code point value. Apple refers to a code point value in Unicode as a Unicode Scalar Value. MacTypes.h has the following to say:

Table 1: MacTypes.h definitions
In Unicode terms:

The Basic Multilingual Plane or BMP refers to the code point values from U+0000 to U+FFFF, and was the original Unicode encoding space. Later, when it was realized additional space was needed, 16 supplementary planes were added to the encoding space and code point values were extended from 16 bits to 21 bits. Hence the BMP contains the code point values which can be converted to corresponding UniChars by simply lopping off the upper five zero bits. The nth supplementary plane contains code point values in the range U+n0000 to U+nFFFF, where n ranges from 0x01 to 0x10. Thus the full range of Unicode code point values is 0x0000 to 0x10FFFF. Planes 3 - 13 (U+30000 to U+DFFFF) are currently unused and available for future use.

Surrogate pairs - Unicode sets aside 2,048 code point values (U+D800 - U+DFFF) in the BMP which will never be assigned to actual characters. They are reserved for defining paired combinations to represent characters outside the BMP. These values are called surrogates. The first 1,024 surrogate values (U+D800 - U+DBFF) are called high-surrogates, and the remaining 1,024 surrogate values (U+DC00 - U+DFFF) are called low-surrogates. A supplementary-plane character (a character not in the BMP) is represented by a high-surrogate followed by a low-surrogate. Note that surrogates are only legal when they occur in high-low pairs. An unpaired surrogate is considered an error in Unicode.

In case you're just dying to know how a 21-bit code point value is mapped to a surrogate pair, it goes like this: First, subtract 0x10000 from the original code point value to get a 20-bit value. Split those 20 bits down the middle to get two 10-bit sequences. The first 10-bit sequence becomes the lower 10 bits of the high-surrogate value and the second 10-bit sequence becomes the lower 10 bits of the low-surrogate value.

Combining marks are code point values which do not represent characters themselves, but apply a mark to a base character which precedes them.
Diacritical marks are one kind of combining mark. For example:

    é = e + ´
    (U+0065 LATIN SMALL LETTER E) + (U+0301 COMBINING ACUTE ACCENT)

A grapheme is a minimal writing unit in some written language; a mark that is considered a single "character" by an average reader or writer of a particular written language.

A grapheme cluster is a sequence of one or more Unicode code points (UniChars) that should be treated as an indivisible unit by most processes operating on Unicode text, such as searching and sorting, hit testing, arrow key movement, and so on. References to the term "cluster" in documentation, or in the headers, such as

A glyph is a concrete visual representation of a character. It's what you see on screen or in print.

Truncating and other manipulations

The original intention was for Unicode to represent every character with a single UniChar, but it quickly became obvious that it isn't possible to do this. More than 95,000 characters are now defined in the Unicode standard, far more than can be represented by a single 16-bit value. Only code point values in the Basic Multilingual Plane can be represented with a single UniChar. Furthermore, a significant number of characters are represented as a base character plus one or more diacritical or other combining marks.

Assuming that there's a one-to-one relationship between characters and the UniChars which represent them leads to one of the most common errors in code which manipulates Unicode strings, which is to truncate a Unicode string at an inappropriate offset. Always use appropriate Unicode-aware APIs to truncate a Unicode string or determine where to insert or remove characters. (See truncation comments.)

(Encoded into Unicode as is done by the File Manager in Mac OS X, the string "résumé" contains eight UniChars. Lop off the last one and you'll have a Unicode string for "résume".)
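The "résumé" case can be sketched in plain C. This is an illustration under stated assumptions, not a substitute for the Unicode-aware APIs recommended above: the function name safe_truncation_length is hypothetical, uint16_t stands in for UniChar, and only the Combining Diacritical Marks block (U+0300 to U+036F) is treated as combining, which is far from the full set of combining characters.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplifying assumption: only U+0300..U+036F counts as a combining mark.
   Correct handling of real text needs full Unicode character data. */
static int is_combining_mark(uint16_t u) { return u >= 0x0300 && u <= 0x036F; }
static int is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Hypothetical helper: returns the largest length <= max_units at which
   s[0..len) can be cut without splitting a surrogate pair or separating a
   base character from the combining marks that follow it. */
static size_t safe_truncation_length(const uint16_t *s, size_t len, size_t max_units)
{
    size_t cut = (max_units < len) ? max_units : len;

    /* If the unit just after the cut belongs with what precedes it,
       move the cut back until it sits in front of a base character. */
    while (cut > 0 && cut < len &&
           (is_combining_mark(s[cut]) || is_low_surrogate(s[cut])))
        cut--;
    return cut;
}
```

With the decomposed form of "résumé" (r, e, U+0301, s, u, m, e, U+0301), asking for at most seven units gives six: the final e is dropped together with its accent, leaving "résum" where a naive cut would leave "résume".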
Final Unicode comments

A 32-bit encoding would allow Unicode to provide a direct 1-1 correspondence between code point values and their encoded values, which in turn would eliminate most of those issues about where you can safely insert or truncate characters. 21 bits provides support for about a million characters, roughly 10 times the number currently encoded. But the smallest data type used by computers that's easily manipulated and will contain 21 bits is 32 bits. The downside is that an encoding scheme based on a 32-bit data type would waste a lot of space. If Unicode used a 32-bit encoding scheme, which would allow encoding every code point value in a single code value, it would waste at least 11 bits for every character, and at least 16 bits/character for the vast majority of characters in common use. For example, a 32-bit based

Document Revision History
Posted: 2003-05-06