Important: The information in this document is obsolete and should not be used for new development.
Sniffs a text stream of unknown encoding, based on an array of possible encodings, and returns the probable encodings in a ranked list.
pascal OSStatus TECSniffTextEncoding (
For a specified stream of bytes in an unknown encoding and an array of possible encodings, TECSniffTextEncoding returns counts of "errors" and "features" for each of the encodings. Each error indicates a code point or sequence that is illegal in the specified encoding; each feature indicates a sequence that is characteristic of that encoding. Table 3-1 shows sample output from a sniffer run.
Encoding | Errors | Features |
EUC | 0 | 8 |
JIS | 0 | 0 |
Mac OS Japanese | 20 | 20 |
For example, the byte sequence that is interpreted in Mac OS Roman as "äøéö" could legally be interpreted either as Mac OS Roman text or as Mac OS Japanese text. Both sniffers would return zero errors, but the Mac OS Japanese sniffer would also return two features of Mac OS Japanese (representing two legal 2-byte characters).
The arrays are returned in a ranked list with the most likely text encodings first. The results are sorted first by number of errors (fewest to most), then by number of features (most to fewest), and then by the original order in the list. Upon return from the function, you can assume the correct encoding is in testEncodings[0], or possibly testEncodings[1].
If any of the available encodings are not examined, their number of errors and number of features are set to 0xFFFFFFFF, and they sort to the end of the list.
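The ranking rule described above can be sketched in portable C. This is only an illustration of the sort order (errors ascending, then features descending, then original position), not the Text Encoding Conversion Manager's implementation. The SniffResult struct, the SNIFF_NOT_EXAMINED macro, and the rank_results and was_examined helpers are hypothetical names introduced for this sketch; none of them is part of the TEC API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record for one candidate encoding; not a TEC type. */
typedef struct {
    const char *name;      /* display label for the encoding */
    uint32_t    errors;    /* illegal code points or sequences found */
    uint32_t    features;  /* characteristic sequences found */
    size_t      position;  /* index in the caller's original list */
} SniffResult;

/* Value stored in both counts when an encoding was not examined. */
#define SNIFF_NOT_EXAMINED 0xFFFFFFFFu

/* Order: fewest errors, then most features, then original position. */
static int compare_results(const void *a, const void *b)
{
    const SniffResult *x = a;
    const SniffResult *y = b;

    if (x->errors != y->errors)
        return x->errors < y->errors ? -1 : 1;
    if (x->features != y->features)
        return x->features > y->features ? -1 : 1;
    if (x->position != y->position)
        return x->position < y->position ? -1 : 1;
    return 0;
}

/* Sort results in place, most likely encoding first. */
void rank_results(SniffResult *results, size_t count)
{
    assert(results != NULL || count == 0);

    /* Record the original order so that ties keep it after qsort,
       which is not guaranteed to be stable on its own. */
    for (size_t i = 0; i < count; i++)
        results[i].position = i;

    qsort(results, count, sizeof results[0], compare_results);
}

/* Returns 0 for an entry carrying the "not examined" marker in both counts. */
int was_examined(const SniffResult *r)
{
    return !(r->errors == SNIFF_NOT_EXAMINED &&
             r->features == SNIFF_NOT_EXAMINED);
}
```

Because an unexamined entry carries the largest possible error count, the errors-first comparison alone pushes it to the end of the list. A caller reading the top-ranked entry may still want to check was_examined before trusting it, in case no encoding was examined at all.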
The function TECCountAvailableSniffers
The function TECGetAvailableSniffers
The function TECCreateSniffer