TECSniffTextEncoding

Sniffs a text stream of unknown encoding, based on an array of possible encodings, and returns the probable encodings in a ranked list.

pascal OSStatus TECSniffTextEncoding (
                     TECSnifferObjectRef encodingSniffer,
                     TextPtr inputBuffer,
                     ByteCount inputBufferLength,
                     TextEncoding testEncodings[],
                     ItemCount numTextEncodings,
                     ItemCount numErrsArray[],
                     ItemCount maxErrs,
                     ItemCount numFeaturesArray[],
                     ItemCount maxFeatures);

encodingSniffer

A reference to a sniffer object, of type TECSnifferObjectRef.

inputBuffer

The text to be sniffed.

inputBufferLength

The length, in bytes, of the input buffer.

testEncodings[]

An array of text encoding specifications. On input, you must specify which text encodings you want to sniff for. On output, this array contains the input array rearranged in the order of most likely to least likely text encodings.

numTextEncodings

A value of type ItemCount that specifies the number of entries in the testEncodings[] parameter.

numErrsArray[]

An array of type ItemCount. This array must contain at least numTextEncodings elements. On return, numErrsArray[] holds the number of errors found for each possible text encoding. The entries are in the same order as the entries in the testEncodings[] parameter on output.

maxErrs

The maximum number of errors to count for any one encoding. While filling the numErrsArray list, the sniffer stops examining an encoding once this number of errors is reached.

numFeaturesArray[]

An array of type ItemCount. This array must contain at least numTextEncodings elements. On return, numFeaturesArray[] holds the number of features found for each possible text encoding. The entries are in the same order as the entries in the testEncodings[] parameter on output.

maxFeatures

The maximum number of features to count for any one encoding. While filling the numFeaturesArray list, the sniffer stops examining an encoding once this number of features is reached.

function result

A result code. See Text Encoding Conversion Manager Result Codes for a list of possible values. If this function returns a result code other than noErr, then one of the conversion plug-ins accessed by the converter encountered an error condition while accessing a sniffer function.

DISCUSSION

For a specified stream of bytes in an unknown encoding and an array of possible encodings, TECSniffTextEncoding returns counts of "errors" and "features" for each of the encodings. Each error indicates a code point or sequence that is illegal in the specified encoding, and each feature indicates the presence of a sequence that is characteristic of that encoding. Table 3-1 shows sample output from a sniffer run.

Table 3-1 Sample Sniffer Output

Encoding	Errors	Features
EUC	0	8
JIS	0	0
Mac OS Japanese	20	20

For example, the byte sequence that is interpreted in Mac OS Roman as "äøéö" could legally be interpreted either as Mac OS Roman text or as Mac OS Japanese text. Both sniffers would return zero errors, but the Mac OS Japanese sniffer would also return two features of Mac OS Japanese (representing two legal 2-byte characters).
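The following is a minimal sketch of how error and feature counts could arise for the "äøéö" example. It is not the algorithm used by any actual sniffer plug-in. The function count_sjis_like and its helpers are hypothetical, and the byte ranges are simplified Shift-JIS-style rules chosen for illustration. In Mac OS Roman, "äøéö" is the byte sequence 0x8A 0xBF 0x8E 0x9A.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers: simplified Shift-JIS-style byte ranges.
   These are illustrative assumptions, not the real Mac OS Japanese rules. */
static int is_lead_byte(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

static int is_trail_byte(unsigned char b) {
    return (b >= 0x40 && b <= 0x7E) || (b >= 0x80 && b <= 0xFC);
}

/* Counts "errors" (bytes that fit no rule) and "features"
   (well-formed 2-byte characters) in buf. */
void count_sjis_like(const unsigned char *buf, size_t len,
                     unsigned long *errs, unsigned long *features) {
    assert(buf != NULL && errs != NULL && features != NULL);
    *errs = 0;
    *features = 0;

    size_t i = 0;
    while (i < len) {
        unsigned char b = buf[i];
        if (b < 0x80 || (b >= 0xA1 && b <= 0xDF)) {
            /* Legal single byte: neither an error nor a feature. */
            i += 1;
        } else if (is_lead_byte(b) && i + 1 < len && is_trail_byte(buf[i + 1])) {
            /* Legal 2-byte character: characteristic of the encoding. */
            *features += 1;
            i += 2;
        } else {
            /* Undefined byte, or a lead byte without a valid trail byte. */
            *errs += 1;
            i += 1;
        }
    }
}
```

Under these assumed rules, 0x8A 0xBF and 0x8E 0x9A each form one legal 2-byte character, which gives zero errors and two features, matching the counts described above.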

The arrays are returned in a ranked list with the most likely text encodings first. The results are sorted first by number of errors (fewest to most), then by number of features (most to fewest), and then by the original order in the list. Upon return from the function, the most likely encoding is in testEncodings[0], with testEncodings[1] as the next most likely candidate.
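The sort order described above can be sketched as a comparison function. This is an illustration of the documented ordering only, not the function's internal implementation. The SniffResult structure, the fixed-width integer types standing in for TextEncoding and ItemCount, and the function names are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record that bundles one entry from each parallel array. */
typedef struct {
    uint32_t encoding;       /* stand-in for a TextEncoding value */
    uint32_t errs;           /* stand-in for a numErrsArray entry */
    uint32_t features;       /* stand-in for a numFeaturesArray entry */
    size_t   originalIndex;  /* position in the caller's input list */
} SniffResult;

/* Fewest errors first, then most features, then original order. */
static int compare_results(const void *a, const void *b) {
    const SniffResult *x = a;
    const SniffResult *y = b;
    if (x->errs != y->errs)
        return x->errs < y->errs ? -1 : 1;
    if (x->features != y->features)
        return x->features > y->features ? -1 : 1;
    if (x->originalIndex != y->originalIndex)
        return x->originalIndex < y->originalIndex ? -1 : 1;
    return 0;
}

/* Records each entry's input position, then sorts into ranked order. */
void rank_results(SniffResult *results, size_t count) {
    if (count == 0)
        return;
    assert(results != NULL);
    for (size_t i = 0; i < count; i++)
        results[i].originalIndex = i;
    qsort(results, count, sizeof results[0], compare_results);
}
```

Because qsort is not guaranteed to be stable, the original index is stored and used as the final tie-breaker. Applied to the counts in Table 3-1 in any input order, this ordering puts EUC first, JIS second, and Mac OS Japanese last.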

If any of the available encodings are not examined, their number of errors and number of features are set to 0xFFFFFFFF, and they sort to the end of the list.
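A caller may want to ignore entries that were never examined. The sketch below counts how many entries in the output arrays carry real results. The names SNIFF_NOT_EXAMINED and count_examined are hypothetical, and uint32_t is used here as an assumed stand-in for ItemCount.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Marker value described above for encodings that were not examined. */
#define SNIFF_NOT_EXAMINED 0xFFFFFFFFu

/* Returns the number of entries whose counts are real results,
   that is, entries not marked as unexamined in both arrays. */
size_t count_examined(const uint32_t errs[], const uint32_t features[],
                      size_t count) {
    assert(count == 0 || (errs != NULL && features != NULL));
    size_t examined = 0;
    for (size_t i = 0; i < count; i++) {
        if (errs[i] == SNIFF_NOT_EXAMINED && features[i] == SNIFF_NOT_EXAMINED)
            continue;  /* this encoding was skipped by the sniffer */
        examined += 1;
    }
    return examined;
}
```

Because unexamined encodings sort to the end of the list, the returned count also gives the number of leading entries in testEncodings[] that are worth considering.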
