| 
    
        |  | This technical note describes how to modify the standard string comparison by
constructing an itl2resource. Developers may want to modify the
standard string comparison if Apple's comparison doesn't meet their needs or if
Apple has not written a string comparison routine for the language that
concerns them. [Mar 01 1988] |  
 
 
 
 
 General Structure
The itl2resource contains a number of procedures that are used for
accurate comparison of text by the International Utilities Package. Refer to
Inside Macintosh, volume V for an explanation of the algorithm used. The
defaultitl2for standard English text, which does no special
processing, has the following form: 
	|     ; normal Include/Load statements
    Include 'hd:mpw:aincludes:ScriptEqu.a'
    Print    On,NoMDir
    String    AsIs
;------------------------------------------------------------------------
;     dispatch table at the front of the code.
;------------------------------------------------------------------------
Intl1        Proc
    With    IUSortFrame,IUStrData
HookDispatch
    dc.w    ReturnEQ-HookDispatch    ; InitProc = 0
    dc.w    ReturnEQ-HookDispatch    ; FetchHook = 2
    dc.w    ReturnEQ-HookDispatch    ; VernierHook = 4
    dc.w    ReturnEQ-HookDispatch    ; ProjectHook = 6
    dc.w    ReturnEQ-HookDispatch    ; ReservedHook1 = 8
    dc.w    ReturnEQ-HookDispatch    ; ReservedHook2 = 10
;------------------------------------------------------------------------
; Some common exit points
;------------------------------------------------------------------------
ReturnNE
    tst.w     MinusOne        ; set cc NE
    rts
ReturnEQ
    cmp.w    d0,d0        ; set cc EQ
    rts
;------------------------------------------------------------------------
    EndWith
    EndWith
 |  
If modifications need to be made to the comparison process, then one or more of
the dispatches will be modified to point to different routines: 
	|     dc.w    InitProc-HookDispatch        ; InitProc = 0
    dc.w    FetchProc-HookDispatch        ; FetchHook = 2
    dc.w    VernierProc-HookDispatch    ; VernierHook = 4
 |  
There are a number of different changes that can be made to the comparison
routines. Some of the common modifications include: 
	Comparing two bytes as one character
	Yugoslavian "l" < "lj" < "m"; Japanese...    [InitProc, FetchProc]Comparing characters in different order
	Norwegian "z" < "å"                [ProjectProc]Comparing one character as twoh
	German "ä" ~ "ae"                [ProjectProc]Ignoring characters unless strings are otherwise equal:
	"blackbird" < "black-bird" < "blackbirds"    [ProjectProc]Changing the secondary orderingh
	Bibliographic "a" < "A"            [VernierProc] 
The comparison hook procedures are all assembly language based, with arguments
described below. Since the routines may be called once per character in both
strings, the routines should be as fast as possible. 
The condition codes are used to return information about the status of the hook
routine. Typically the normal processing of characters will be skipped if the
CCRis set toNE, so the default return should always haveEQset. Each of these routines has access to the stack frame(A6)used in the comparison routine, which has the following form: 
	| IUSortFrame    Record    {oldA6},Decrement
result        ds.w    1
argTop        equ    *
aStrText    ds.l    1
bStrText    ds.l    1
aStrLen    ds.w    1
bStrLen    ds.w    1
argSize    equ    argTop-*
return        ds.l    1
oldA6        ds.l    1
aInfo        ds    IUStrData
bInfo        ds    IUStrData
wantMag     ds.b    1    ; 1-MagStrig 0-MagIdString.
weakEq        ds.b    1    ; Signals at most weak equality
msLock        ds.b    1    ; high byte of master ptr.
weakMag     ds.b    1    ; -1 weak, 1 strong compare
supStorage    ds.b    18    ; extra storage.
localSize    equ    *    ; frame size.
 |  
There are three fields in this frame that are of interest for altering text
comparison. The supStoragefield is an area reserved for use by the
comparison hook procedures as they see fit. TheaInfoandbInforecords contain information about the current byte positions in
the two compared strings A and B, and information about the status of current
characters in those string. TheIUStrDatarecord has the following
form: 
	| IUStrData    Record        0
curChar    ds.w        1    ; current character.
mapChar     ds.w        1    ; projected character.
decChar     ds.w        1    ; decision char for weak equality
bufChar     ds.b        1    ; buffer for expansion.
justAfter    ds.b        1    ; boolean for AE vs ligature-AE.
ignChar    ds.b        1    ; flag: ignore char.
noFetch    ds.b        1    ; flag: no fetch of next.
strCnt        ds.w        1    ; length word.
strPtr        ds.l        1    ; current ptr to string.
 |  Back to top The Init Procedure
The Init Procedure is used to initialize the comparison process. The main use
for this procedure is for double-byte scripts. As an optimization, the
International Utilities will perform an initial check on the two strings,
comparing for simple byte-to-byte equality. Thus any common initial substrings
are checked before the Init procedure is called. The string pointers and
lengths in the IUStrDatarecords have been updated to point just past
the common substrings. 
Languages such as Japanese or Yugoslavian, which may consider two bytes to be
one character, may have to back up one byte, as shown below. 
	| ;------------------------------------------------------------------------
; Routine    InitProc
; Input    A6          Local Frame
; Output    CCR         NE to skip entire sort (usually set EQ)
; Trashes    Standard regs:    A0/A1/D0-D2
; Function    Initialize any special international hooks.
;        Double-byte scripts must synchronize AInfo.StrPtr &
;                  BInfo.StrPtr here!
;------------------------------------------------------------------------
; Note: this should also check for single-byte nigori or maru, as below
InitProc
    move.w        AStrLen(a6), d0    ; A length
    sub.w        AInfo.StrCnt(a6),d0    ; see if its changed
    beq.s        @FixB            ; A is done if not
    sub.l        #2,sp            ; return param
    move.l        AStrText(a6),-(sp)    ; textBuf
    move.w        d0,-(sp)        ; textOffset
    _CharByte
    tst.w        (sp)+            ; on character boundary?
    ble.s        @FixB            ; yes, continue
    sub.l        #1,AInfo.StrPtr(A6)    ; adjust pointer
    add.w        #1,AInfo.StrCnt(A6)    ; adjust count
@FixB
    move.w        BStrLen(a6), d0    ; B length
    sub.w        BInfo.StrCnt(a6),d0    ; see if its changed
    beq.s        Quit Init        ; B is done if not
    sub.l        #2,sp            ; return param
    move.l        BStrText(a6), -(sp)    ; textBuf
    move.w        d0, -(sp)        ; textOffset
    _CharByte
    tst.w        (sp)+            ; on character boundary?
    ble.w        @QuitInit        ; yes, continue
    sub.l        #1,BInfo.StrPtr(A6)    ; adjust pointer
    add.w        #1,BInfo.StrCnt(A6)    ; adjust count
@QuitInit
    bra.s        ReturnEQ        ; return to the caller.
 |  Back to top The Fetch Procedure
The Fetch Procedure is used to fetch a character from a string, updating the
pointer and length to reflect the remainder of the string. For example, the
following code changes the  text comparison for Yugoslavian: 
	| ;------------------------------------------------------------------------
; Routine    FetchProc
; Input    A2          String Data Structure
;        A3          String pointer (one past fetched char)
;         A6          Local Frame
;        D4.W        Character: top byte is fetched character, bottom
;                              is zero
;        D5.B        1 if string is empty, otherwise 0
; Output    D4.W        Character: top byte set to character, bottom to
;                    extension
;        D5.B        1 if string is empty, otherwise 0
; Trashes    Standard regs:    A0/A1/D0-D2
; Function    This routine returns the characters that are fetched from
;        the string, if they are not just a sequence of single bytes.
;------------------------------------------------------------------------
FetchProc
    tst.b        d5            ; more characters in string?
    bne.s        ReturnEq        ; no -> bail out.
    move.w        d4,d0            ; load high byte.
    move.b        (a3),d0        ; load low byte.
    lea        pairTable,a1        ; load table address
@compareChar
    move.w        (a1)+,d1        ; pair = 0?
    beq.s        ReturnEq        ; yes -> end of table.
    cmp.w        d0,d1            ; legal character pair?
    bne.s        @compareChar        ; no -> try the next one.
    add.w        #1,a3            ; increment pointer.
    sub.w        #1,StrCnt(a2)        ; decrement length.
    addx.w        d5,d5            ; empty -> set the flag.
    move.w        d0,d4            ; copy character pair.
    rts                    ; return to caller with CCR=NE
pairTable
    dc.b    'Lj'            ; Lj
    dc.b    'LJ'            ; LJ
    dc.b    'lJ'             ; lJ
    dc.b    'lj'            ; lj
    dc.b    'Nj'            ; Nj
    dc.b    'NJ'            ; NJ
    dc.b    'nJ'            ; nJ
    dc.b    'nj'            ; nj
    dc.b    'D', $be        ; Dz-hat
    dc.b    'D', $ae        ; DZ-hat
    dc.b    'd', $ae        ; dZ-hat
    dc.b    'd', $be        ; dz-hat
    DC.B    $00, $00        ; table end
FetchProc
    with        IUStrData
    tst.b        d5            ; empty string?
    bne.s        ReturnEq        ; exit if length = 0
; if we have a double-byte char, add the second byte
    lea        CurChar(a2),a0    ; pass pointer
    move.w        d4,(a0)        ; set value at ptr
    clr.w        d0            ; pass length
    sub.l        #2,SP            ; allocate return
    move.l        a0,-(sp)        ; pointer
    move.w        d0,-(sp)        ; offset
    _CharByte
    tst.w        (sp)+            ; test return
    bmi.s        @DoubleByte        ; skip if high byte (first two)
; we don't have a double byte, but two special cases combine second bytes
    move.b        (a3),d0         ; get next byte
    cmp.b        #$DE,d0        ; nigori?
    beq.s        @DoubleByte        ; add in
    cmp.b        #$DF,d0        ; maru?
    bne.s        ReturnEq        ; exit: single byte
@DoubleByte
    move.b        (a3)+,d4        ; get next byte
    subq.w        #1,StrCnt(A2)        ; dec string length
    addx.w        d5,d5            ; set x=1 if string len = 0
 |  Back to top The Project Procedure
The Project Procedure is used to find the primary ordering for a character.
This routine will map characters that differ only in the secondary ordering
onto a single character, typically the unmodified, uppercase character. For
example, the following changes the comparison order for some Norwegian
characters, so that they occur after 'Z.' 
	| ;------------------------------------------------------------------------
; Routine    ProjectProc
; Input    A2          String Data Structure
;        D4.W        Character (top byte is char, bottom is extension
;                    (the extension is zero unless set by FetchProc))
; Output    D4.W        Projected Character
;        CCR         NE to skip normal Project
; Trashes    Standard regs:    A0/A1/D0-D2
; Function    This routine projects the secondary characters onto primary
;                  characters.
;        Example: a,Ä,Ä -> A
;------------------------------------------------------------------------
ProjectProc
    lea        ProjTable,A1    ; load table address.
@findChar
    move.l        (a1)+,D0        ; get entry
    cmp.w        d0,d4            ; original >= entry?
    bhi.s        @findChar        ; no, try the next entry.
    bne.s        ReturnEq        ; not equal, process normally
@replaceChar
    swap        d0            ; get replacement
    move.w        d0,d4            ; set new character word.
@doneChar
    rts                    ; CCR is NE to skip project.
ProjTable
;    Table contains entries of the form  r1, r2, o1, o2,
;    where r1,r2 are the replacement word, and
;    o1, o2 are the original character.
;    The entries are sorted by o1,o2 for use in the above algorithm
    DC.B    'Z', 3, 'Å', 0    ; Å after Ø
    DC.B    'Z', 3, 'Å', 0    ; Å after Ø
    DC.B    'Z', 1, 'Æ', 0    ; Æ after Z
    DC.B    'Z', 2, 'Ø', 0    ; Ø after Æ
    DC.B    'Z', 1, 'Æ', 0    ; Æ after Z
    DC.B    'Z', 2, 'Ø', 0    ; Ø after Æ
 |  
The Project procedure can also be used to undo the effects of the normal
projection. For example, suppose that "oe" is not to be expanded into "oe": in
that case, a simple test can be made against 'oe',0, returning NEif
there is a match, so that the normal processing is not done. To expand one
character into two, the routine should return the first replacement character
inD4.W, and modify two fields in theIUStrDatafield. For
example, given thatA1points to a table entry of the form(primaryCharacter: Word; secondaryCharacters: Word), the following code could
be used: 
	|     ...
    move.w        (a1)+,d4        ; return first, primary character
    move.w        (a1)+,CurChar(A2)    ; original => first, modified char.
    addq.b        #1,JustAfter(A2)    ; set to one (otherwise zero)
    move.b        (a1),BufChar(A2)    ; store second character (BYTE!)
 |  
CurCharis where the original character returned byFetchCharis
stored. If characters are different even after being projected onto their
respective primary characters, then theCurCharvalues for each string
will be compared.JustAfterindicates that the expanded character
should sort after the corresponding unexpanded form. This field must be set
wheneverCurCharis modified in order for the comparison to be fully
ordered.BufCharstores the next byte to be retrieved from the string
byFetchChar.
 
To handle the case where characters are ignored unless the two compared strings
are otherwise equal, the IgnCharflag can be set. This can be used to
handle characters such as the hyphen in English, or vowels in Arabic. 
	|     ...
    cmp.w    #hyphen,d0        ; is it a ignorable?
    seq    IgnChar(a2)        ; set whether or not
 |  Back to top The Vernier Procedure
The Vernier Procedure is used to make a final comparison among characters that
have the same primary ordering. It is only needed if the CurCharvalues are not ordered properly. For example, according to the binary encoding,
å < Ã. To change this ordering so that uppercase letters are
before lowercase letters, Ã is mapped to $7F in normal comparison. Notice
that only the characters in the secondary ordering are affected: Ã can be
mapped onto Z, but not onto Ä, since that would cause a collision. 
	| ;------------------------------------------------------------------------
; Routine    VernierProc
; Input    D4.B               High byte of character
;        D5.B               Low byte of character
; Output    D4.B               High byte of character
;        D5.B               Low byte of character
;        CCR                NE if to skip standard Vernier
; Trashes    Standard regs:    A0/A1/D0-D2
; Function    The Vernier routine compares characters within the secondary
;                      ordering if two strings are otherwise equal.
;                      Example: (a,A,Ä,Ä)
;------------------------------------------------------------------------
VernierProc
    not.b        d4        ; invert secondary ordering
    not.b        d5        ; ditto for lower byte
 |  Back to top Installing an itl2 resource 
To write an itl2resource, follow the guidelines in
M.PT.StandAloneCode for writing standalone code in MPW. The code should be
written in assembly language, and it must follow the specifications given in
this technical note or serious system errors could occur whenever string
comparisons are made. 
The default comparison routine is in the itl2resource of the System
file. In order to use a comparison routine other than the standard one, you
should include anitl2resource in your application with the same name
and resource ID as the one in the System file that you wish to change. The
Resource Manager will look for the resource in the application resource file
before it looks in the System resource file, so your string comparison routine
will be used instead of the default one. 
It is generally a dangerous practice to change a system resource since other
applications may depend on it, but if you have good reasons to permanently
change the system itl2resource so that all applications use a
different comparison routine, then you should write an installer script to
change theitl2resource in the System resource file. Writing an
installer script is documented in M.PT.Installer. You are required to write an
installer script if you are planning to ship your application on a licensed
system software disk and your application makes a permanent change to any
resources in the System file. We strongly discourage changing the Systemitl2as that would change the behavior of string comparison and
sorting for all applications. If that is your intent, then you should write an
installer script. However, if you are changing theitl2resource in
the System file for academic or internal use, then you can use a resource
editor such as ResEdit to copy youritl2resource into the System
file. Back to top ReferencesThe International Utilities M.TP.Installer M.PT.StandAloneCode Back to top Downloadables
            
               | 
 | Acrobat version of this Note (64K). | Download |  
 |