Using language-tagged QuickTime UserData text APIs with CFStrings

Q: When I use AddUserDataText to add UTF-8 text intended for a Japanese OS, I keep getting strange results. This API is documented as taking a text string with a language code (itlRegionTag). How do I map UTF-8 strings to the correct language? Ideally I would just work with CFStrings.

A: AddUserDataText, GetUserDataText, and many other QuickTime APIs that take or return text strings assume that the text is in one of the Traditional Mac OS language-specific encodings, for example kTextEncodingMacJapanese. The value passed in the itlRegionTag parameter of these APIs should therefore be the matching language code, for example langJapanese.

If the string you have is UTF-8 (or UTF-16), you must convert it to the appropriate Traditional Mac OS language-specific encoding before calling AddUserDataText.

CFString can perform this conversion: call CFStringGetBytes and pass in the appropriate CFStringEncoding.

Note: CFStringEncoding is an integer type for constants used to specify supported string encodings in various CFString functions; the values are exactly the same as the Text Encoding Converter's TextEncoding type and can be found in TextCommon.h.

You can map a Traditional Mac OS language code to the appropriate TextEncoding for CFStringGetBytes by calling GetTextEncodingFromScriptInfo, which converts any combination of Traditional Mac OS script code, language code, and region code to a TextEncoding.
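
To make the mapping concrete, the following portable C sketch shows the kind of lookup GetTextEncodingFromScriptInfo performs for a language code. It is an illustration only, not the system implementation: the Sketch-prefixed names are hypothetical, the table covers just three languages, and the numeric values are repeated here on the assumption that they match Script.h and TextCommon.h. Real code should call GetTextEncodingFromScriptInfo rather than maintain its own table.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a few constants from Script.h (language codes)
   and TextCommon.h (text encodings). The values are assumed to match those
   headers and are repeated only so this sketch compiles anywhere. */
enum { kSketchLangEnglish = 0, kSketchLangJapanese = 11, kSketchLangArabic = 12 };
enum { kSketchEncodingMacRoman = 0, kSketchEncodingMacJapanese = 1,
       kSketchEncodingMacArabic = 4 };

typedef struct {
    int16_t  language;   /* Traditional Mac OS language code */
    uint32_t encoding;   /* matching Traditional Mac OS text encoding */
} SketchLangMapEntry;

static const SketchLangMapEntry kSketchLangMap[] = {
    { kSketchLangEnglish,  kSketchEncodingMacRoman    },
    { kSketchLangJapanese, kSketchEncodingMacJapanese },
    { kSketchLangArabic,   kSketchEncodingMacArabic   },
};

/* Looks up the encoding for a language code.
   Returns 0 and stores the encoding on success, or -1 (leaving *outEncoding
   untouched) if the language is not in this deliberately tiny table. */
int SketchEncodingForLanguage(int16_t language, uint32_t *outEncoding)
{
    size_t i;
    size_t n = sizeof(kSketchLangMap) / sizeof(kSketchLangMap[0]);

    for (i = 0; i < n; i++) {
        if (kSketchLangMap[i].language == language) {
            *outEncoding = kSketchLangMap[i].encoding;
            return 0;
        }
    }
    return -1;
}
```

The point of the sketch is that the language code alone is enough to select an encoding, which is why the script and region arguments can be left as "don't care" values when only a language code is available.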

Listing 1 demonstrates how to add a user data text item from a CFString using the above technique, while Listing 2 demonstrates retrieving a text user data item as a CFString. Because QuickTime requires language-tagged text, you will always need to use the Traditional Mac OS language codes found in Script.h with these UserData APIs.

Note: A conversion isn't always possible. For example, a CFString containing a mixture of Japanese and Arabic can't be converted to any single Traditional Mac OS encoding, so the conversion fails unless you allow lossy conversion. CFStringGetBytes performs lossy conversion when you pass it a nonzero "loss byte": if a character cannot be converted, CFStringGetBytes substitutes the loss byte and conversion proceeds. Passing 0, as Listing 1 does, disables lossy conversion.

See Converting Between String Encodings for more information.
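
The loss byte behavior described in the note can be sketched in portable C. This is not CFStringGetBytes itself: SketchGetBytes is a hypothetical helper, and 7-bit ASCII stands in for a Traditional Mac OS encoding so that the example needs no system frameworks. It mirrors the same contract: a loss byte of 0 stops at the first unconvertible character, a NULL buffer only measures, and the return value is the number of characters converted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper modeled on the CFStringGetBytes contract.
   Converts UTF-16 code units to 7-bit ASCII (a stand-in target encoding).
   - lossByte == 0: stop at the first character that cannot be converted
   - lossByte != 0: substitute lossByte for such characters and continue
   - buffer == NULL: measure only; nothing is written and maxLen is ignored
   Returns the number of characters converted; *usedLen (if non-NULL)
   receives the number of bytes written or needed. */
size_t SketchGetBytes(const uint16_t *chars, size_t count, uint8_t lossByte,
                      uint8_t *buffer, size_t maxLen, size_t *usedLen)
{
    size_t converted = 0;
    size_t used = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        uint8_t out;

        if (chars[i] < 0x80) {
            out = (uint8_t)chars[i];   /* representable in the target */
        } else if (lossByte != 0) {
            out = lossByte;            /* lossy: substitute and go on */
        } else {
            break;                     /* strict: stop here */
        }

        if (buffer != NULL) {
            if (used >= maxLen) break; /* destination is full */
            buffer[used] = out;
        }
        used++;
        converted++;
    }

    if (usedLen != NULL) *usedLen = used;
    return converted;
}
```

With a loss byte of 0, the three-character input 'A', U+3042, 'B' reports one character converted, which is fewer than the string length. That shortfall is the same signal Listing 1 checks before it commits to the conversion. With a loss byte of '?', all three characters convert and the output is "A?B".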

Listing 1: Adding a UserData item as text using a language code.

/* AddUserDataTextFromCFString
 *
    Description:
        Add a user data item as text to a user data list from a CFString
        performing character conversion to a specified language implemented
        using a Traditional Mac OS encoding if possible

    Parameters:
    inUserData - the user data list for this operation
    inUDType - the type that is to be assigned to the new item
    inIndex - the item to which the text is to be added
    inLanguageCode - a language code implemented using a particular Mac OS
                     encoding (eg. langEnglish, langJapanese etc.)
    inCFString - a CFString containing the user data text to be added

    Returns:
        noErr or appropriate error code on failure
 *
 */
OSStatus AddUserDataTextFromCFString(UserData inUserData, SInt32 inUDType, SInt32 inIndex,
                                     SInt16 inLanguageCode, CFStringRef inCFString)
{
    // the string encoding of the characters to copy, the values are the same
    // as Text Encoding Converter TextEncoding
    CFStringEncoding encoding = 0;
    CFIndex numberOfCharsConverted = 0, usedBufferLength = 0;
    CFRange range = { 0, CFStringGetLength(inCFString)};
    OSStatus status;

    // convert any combination of a Mac OS script code, a language code and a region
    // code to a text encoding
    // the characters in the CFString passed in must be representable in this encoding
    status = GetTextEncodingFromScriptInfo(kTextScriptDontCare, inLanguageCode,
                                           kTextRegionDontCare, &encoding);
    if (noErr == status) {
        // grab the characters from a CFString object into a byte buffer after
        // converting the characters to a specified encoding
        // we initially pass NULL for the destination buffer to make sure the
        // conversion will succeed then we check to make sure the entire string can be
        // converted as we are not using lossy conversion
        numberOfCharsConverted = CFStringGetBytes(inCFString, range, encoding, 0, false,
                                                  NULL, 0, &usedBufferLength);
        if ((numberOfCharsConverted == CFStringGetLength(inCFString)) && (usedBufferLength > 0)) {
            // conversion will work so do it for real this time
            Handle hData = NewHandleClear(usedBufferLength);
            if (NULL != hData) {
                HLock(hData);

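                // the handle's master pointer is a char *, so cast it to the
                // UInt8 * buffer type CFStringGetBytes expects
                numberOfCharsConverted = CFStringGetBytes(inCFString, range, encoding, 0,
                                                          false, (UInt8 *)*hData,
                                                          usedBufferLength,
                                                          &usedBufferLength);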
                status = AddUserDataText(inUserData, hData, inUDType, inIndex,
                                         inLanguageCode);

                DisposeHandle(hData);
            } else {
                status = MemError();
            }
        } else {
            // conversion did not work
            status = kTextUnsupportedEncodingErr;
        }
    }

    return status;
}

Listing 2: Retrieving language-tagged UserData text as a CFString.

/* GetUserDataTextAsCFString
 *
    Description:
        Retrieves language code tagged text from an item in a user data list
        as a CFString performing character conversion to the appropriate text
        encoding if possible

    Parameters:
    inUserData - the user data list for this operation
    inUDType - the type of the item to be retrieved
    inIndex - the index of the item from which the text is to be retrieved
    inLanguageCode - a language code implemented using a particular
                     Mac OS encoding (langEnglish, langJapanese etc.)

    Returns:
        a CFString containing the text or NULL on failure

    Note:
        it is the responsibility of the caller to release the returned CFString
 *
 */
CFStringRef GetUserDataTextAsCFString(UserData inUserData, SInt32 inUDType, SInt32 inIndex,
                                      SInt16 inLanguageCode)
{
    TextEncoding encoding = 0; // the encoding of the characters in the buffer
    CFStringRef string = NULL;
    Handle hData = NULL;
    OSStatus status;

    hData = NewHandle(0);
    if (NULL == hData || noErr != MemError()) return NULL;

    status = GetUserDataText(inUserData, hData, inUDType, inIndex, inLanguageCode);
    if (noErr == status && (GetHandleSize(hData) > 0)) {
        // convert any combination of a Mac OS script code, a language code, a region
        // code to a text encoding
        status = GetTextEncodingFromScriptInfo(kTextScriptDontCare, inLanguageCode,
                                               kTextRegionDontCare, &encoding);
        if (noErr == status) {
            // create a CFString object from a buffer containing characters in a
            // specified encoding
            HLock(hData);
            string = CFStringCreateWithBytes(kCFAllocatorDefault, (const UInt8 *)*hData,
                                             GetHandleSize(hData), encoding, false);
        }
    }

    DisposeHandle(hData);

    return string;
}

Document Revision History

Date Notes
2005-02-11 First Version

Posted: 2005-02-11

