| 
    
        |  | This Technical Note documents cache behavior, manipulation
         of processor caches, and manipulation of external caches on
         Macintosh models that incorporate these features. It also
         describes how system software uses a memory management unit
         (when available) to implement special caching options. [Oct 01 1991] |  
 
 
 
 
 Cache MachinesThe Motorola MC68020 microprocessor includes a 256-byte
         internal instruction cache. The MC68030 includes a
         similar-size instruction cache plus a 256-byte writethrough
         data cache. The MC68040 has much larger caches, 4K of
         instructions, and 4K of data. It also supports copyback
         caching in addition to the writethrough caching used by the
         MC68030. The difference between writethrough and copyback caching
         is a matter of whether data writes go directly and
         immediately to main memory, or whether they go only as far
         as the data cache to be copied back to main memory later (if
         necessary) in a highly optimized fashion. The MC68030 and MC68040 include memory management units
         internally. Besides the ability to divide memory into
         logical pages and provide memory access control, these
         memory management units can also associate cacheability
         attributes with individual pages of memory, affecting how
         data is cached on a page-by-page basis. Back to top Stale Data (Baked Fresh Daily)Caching greatly improves overall system performance but
         introduces the problem of stale data, or inconsistency
         between cached data and the data in actual memory (RAM). In
         certain cases, cache maintenance instructions are necessary
         to maintain coherency between cache and main memory. Stale InstructionsThe first time when stale data becomes a problem occurs
         when writing self-modifying code on the MC68020 or any other
         processor with an instruction cache. The instruction cache
         remembers, separately from main memory, many of the
         instructions it has recently executed. If the processor
         executes an instruction, later changes that instruction in
         memory, and then tries to execute the new instruction at the
         same address, there is a probability that the original
         instruction is still cached. Since the cache is used before
         main memory, the old instruction may be executed instead of
         the new one, resulting in incorrect program operation. To prevent this, any time a program changes an executable
         instruction in memory, it must flush the instruction cache
         before attempting to execute the modified instruction.
         Flushing a cache invalidates its entries, forcing the
         processor to refill cache entries from main memory. In part,
         this defeats the purpose of a cache and hurts performance.
         Nevertheless, the alternative is incorrect program
         operation. This serves to emphasize that caches must always
         be flushed judiciously in order to maintain correct
         operation and optimal performance. As described, self-modifying code is not just code that
         changes itself directly as it executes. It can be much more
         subtle. Code that modifies a jump table entry is modifying
         executable code and must flush the instruction cache. Patch
         installation code often copies code from one block of memory
         into another and may modify one or more JMPinstruction operands in order to get back to the original
         routine--either technique requires flushing the instruction
         cache. Stale DataWith the addition of the data cache in the MC68030,
         performance is further enhanced, but another cache offers
         another source of stale data. Let's say that you have a whizzy disk controller card
         that supports DMA. The board reads command buffers from the
         main CPU's memory area and writes status information back to
         the command buffer when done. Before the command is started,
         the MC68030 sets up the command buffer and zeroes the status
         code (the following figures are not to scale). 
 Figure 1. Write (Writethrough Cache) At this point the cache and the memory both contain the
         value 0, since the MC68030's cache is writethrough (that is,
         it always writes data to memory immediately). Now the
         MC68030 starts the command running and waits for an
         interrupt from the disk controller card. It then reads back
         the status from the command buffer, which is modified by the
         DMA card. 
 Figure 2. Read (From Cache) Oops! Because the status code's value is already cached,
         the MC68030 thinks that the status is 0, even though the
         actual value in memory is -23. This type of thing can cause
         some very hard-to-find bugs in your driver. Copyback Data and Stale InstructionsThere is another type of cache called a copyback cache
         that is supported by more advanced microprocessors like the
         MC68040. A copyback cache further improves system
         performance by writing data to external memory only when
         necessary to make room in the cache for more recent
         accesses, or when explicitly "pushed" by system software.
         This is extremely valuable for relatively small, short-lived
         data that are accessed frequently but don't need to persist
         for a long time, like local stack frames in C and Pascal
         function calls. This increase in performance again comes at some cost in
         terms of maintaining cache coherency. Here, the problem is
         twofold. Fundamentally, a datum that is "written to memory"
         isn't really in memory (meaning main RAM) until it's pushed
         out of the data cache. When performing DMA, it is necessary
         to push data cache contents into memory before instructing
         alternate bus masters to look for it; they'll only find
         stale data if it's still cached. Second, and perhaps even
         more important, the instruction and data caches are
         completely independent of each other. When fetching
         instructions, the processor looks in only two places: first
         the instruction cache, then main memory. It does not look in
         the data cache. When performing the types of operations
         described above that can cause a stale instruction cache,
         one must remember that it is impossible to make the
         instruction cache and memory coherent if memory itself is
         stale! The data cache must be flushed; then and only then
         can the instruction cache refill with the valid data the
         processor has written. Here, some code writes the _LoadSegtrap to memory as
         part of a jump table update. Figure 3 indicates what happens
         if only the instruction cache is flushed. When execution
         later proceeds through that jump table entry, the processor
         fetches the opcode from that address and gets zonked with an
         illegal F-line exception. Why?_LoadSegis still in the data
         cache. The code responsible for maintaining the jump table
         failed to push the contents of the data cache before
         invalidating the instruction cache. This certainly causes
         problems on the MC68040. 
 Figure 3. Write (Copyback Cache) and Fetch Another similar problem applies to the time at which
         cache flushing is performed. When using a writethrough data
         cache, it is acceptable to invalidate the instruction cache
         first and then modify instructions in memory. With a
         copyback data cache, it is imperative to make changes to
         memory first and then flush caches. Again, this ensures that
         copyback data is written to memory before the instruction
         cache attempts to refill from memory. The key point to
         remember is that the MC68040 instruction cache always reads
         from memory, never from the data cache. Figure 4 shows the path that an instruction properly
         takes when it is first written as data by a program that
         modifies instructions in memory. 
 Figure 4. Write
         (Copyback Cache), Push, and Fetch It's worth noting here that although pushing copyback
         data to memory and invalidating (flushing) the cache are
         conceptually different operations, they are at least for the
         MC68040 irrevocably connected. This makes flushing the data
         cache for the sake of pushing its contents to memory a
         potentially expensive one. Valid cache data is essentially
         lost when it is pushed and must be read from main memory if
         it is to be accessed again. This should be another
         reinforcement that cache flushing must be performed
         judiciously. It is possible to flush only a portion of the
         MC68040 caches, and software that flushes caches frequently
         should consider this optimization to avoid unnecessary
         performance degradation when running on this processor. See
         the interfaces provided below. What Is Apple's Part in This?There are two answers to this question. First, there are
         things that Apple has done in ROM to make life easier while
         dealing with a caching processor. Second, there are
         functions provided in ROM or in system software to allow
         developers to take some control of their own destinies. Things That Happen for YouEver since the Macintosh II made its debut, it has been
         flushing the instruction cache. It does so at a number of
         critical points where code may be moved to a new location,
         potentially leaving memory and the instruction cache
         incoherent. Specifically, there are a number of traps that
         have the potential to move code around memory. In each of
         these cases, the instruction cache is flushed by system
         software or ROM. _BlockMove _LoadSeg
 _Read _UnloadSeg
 Warning: The _BlockMovetrap is not
         guaranteed to flush caches for block sizes 12 bytes or less.
         This is a performance optimization._BlockMoveis
         called often by system software to move small blocks of
         memory that are not executable instructions. Flushing the
         cache in all such cases causes significant performance
         degradation. When moving very small blocks of code with
         _BlockMove, use one of the explicit cache flushing routines
         described below. 
 
| Note:C programmers should not
         assume that the standard library function
 memcpy()invokes_BlockMove. An explicit cache flush is required after moving
         code withmemcpy(). |  
 In general, there may be others. As a rule of thumb, the
         instruction cache needs to be flushed explicitly only as a
         result of actions taken by user code, not as the result of
         anything a trap might have done. Traps can take care of
         themselves. A memory management unit allows individual pages of
         memory to be marked noncachable. In current Macintosh
         implementations, NuBus(TM) and I/O address spaces are always
         marked noncachable--the processor won't cache memory stored
         at NuBus or I/O addresses. This solves any problems of stale
         data when processor/DMA "mailboxes" are located in NuBus
         memory and eliminates the fundamental problem of stale data
         at memory-mapped I/O locations. Data at RAM and ROM
         addresses are cachable, which makes sense and maximizes
         performance. Since DMA still poses a problem when common buffers are
         located in main RAM, it would seem that there should be
         greater intrinsic support for specifying cachability. There
         is. In order for DMA masters to be compatible with abstract
         memory architectures like those defined by the Macintosh
         IIci and even more so by virtual memory, they must use the
         GetPhysical routine. Before using GetPhysical, a range must
         always be locked withLockMemory. Since this sequence is so
         commonly required when performing DMA, the LockMemory
         routine has the effect of either disabling the data cache or
         marking the corresponding pages noncachable, depending on
         what's possible and what makes the most sense. In many
         cases, therefore, it is unnecessary to explicitly flush the
         data cache. If common DMA buffers are locked with
         LockMemory, the operating system ensures cache coherency at
         least for those buffers. To ensure compatibility with existing code while taking
         advantage of copyback cache mode, the FlushInstructionCachefunction on an MC68040 actually flushes both caches using
         the CPUSHA BC instruction. This prevents the need for
         modification of correct existing code which properly flushes
         the instruction cache withFlushInstructionCache. If code is
         written properly for the MC68020 and MC68030, it will work
         on the MC68040 as well, without modification. If code is
         written incorrectly or directly manipulates the CACR
         register of these processors it will fail on the MC68040.
         When modifying code in memory or moving code about memory,
         use FlushInstructionCache before executing that code. Facilities That Are Provided for YouApple provides some system calls that let you flush the
         data and instruction caches without using privileged
         instructions (which is, as you should all know by now, a
         major no-no). Following are the interfaces for these calls, for MPW
         Pascal and C (respectively): 
	|    FUNCTION SwapInstructionCache (cacheEnable: BOOLEAN) : BOOLEAN;
   pascal Boolean SwapInstructionCache (Boolean cacheEnable); |  This call enables or disables the instruction cache
         according to the state passed in cacheEnableand
         returns the previous state of the instruction cache as a
         result. 
	| PROCEDURE FlushInstructionCache;
pascal void FlushInstructionCache (void); |  This call flushes the current contents of the instruction
         cache. This has an adverse effect on CPU performance, so
         only call it when absolutely necessary. 
	|    FUNCTION SwapDataCache (cacheEnable: BOOLEAN) : BOOLEAN;
   pascal Boolean SwapDataCache (Boolean cacheEnable); |  This call enables or disables the data cache according to
         the state passed in cacheEnableand returns the
         previous state of the data cache as a result. 
	| PROCEDURE FlushDataCache;
pascal void FlushDataCache (void); |  This call flushes the current contents of the data cache.
         This has an adverse effect on CPU performance, so only call
         it when absolutely necessary. 
 
| Note:Before you call any of these routines, make
         sure that the
 _HwPriv($A198) trap is implemented, or your
         program will crash._HwPrivis implemented in the Macintosh
         IIx ROMs and later, as well as System 6.0.3 and later. The
         correct way to check for the trap is using the TrapAvailable
         function documented in Inside Macintosh Volume VI
         (pages 3-7 to 3-9). 
These calls are provided as part of the MPW 3.1 library.
         For those of you without MPW 3.1 or later, you can use the
         following MPW assembly-language glue: |  
 
	|             CASE OFF
_HwPriv     OPWORD  $A198
SwapInstructionCache PROC EXPORT
            MOVEA.L (A7)+,A1        ; save return address
            MOVEQ   #0,D0           ; clear D0 before we shove Boolean into it
            MOVE.B  (A7)+,D0        ; D0 <- new mode
            MOVE.L  D0,A0           ; _HwPriv wants mode in A0
            CLR.W   D0              ; set low word to 0 (routine selector)
            _HwPriv
            MOVE.W  A0,D0           ; move old state of cache to D0
            TST.W   D0              ; if nonzero, cache was enabled
            BEQ.S   WasFalse        ; if zero, leave result false
            MOVEQ   #1,D0           ; set result to true
WasFalse:
            MOVE.B  D0,(A7)         ; save result on stack
            JMP     (A1)
            ENDPROC
FlushInstructionCache PROC EXPORT
            MOVEA.L (A7)+,A1        ; save return address
            MOVEQ   #1,D0           ; set low word to 1 (routine selector)
            _HwPriv
            JMP     (A1)
            ENDPROC
SwapDataCache PROC EXPORT
            MOVEA.L (A7)+,A1        ; save return address
            MOVEQ   #0,D0           ; clear D0 before we shove Boolean into it
            MOVE.B  (A7)+,D0        ; D0 <- new mode
            MOVE.L  D0,A0           ; _HwPriv wants mode in A0
            MOVE.W  #2,D0           ; set low word to 2 (routine selector)
            _HwPriv
            MOVE.W  A0,D0           ; move old state of cache to D0
            TST.W   D0              ; if nonzero, cache was enabled
            BEQ.S   WasFalse        ; if zero, leave result false
            MOVEQ   #1,D0           ; set result to true
WasFalse:
            MOVE.B  D0,(A7)         ; save result on stack
            JMP     (A1)
            ENDPROC
FlushDataCache PROC EXPORT
            MOVEA.L (A7)+,A1        ; save return address
            MOVEQ   #$3,D0          ; set low word to 3 (routine selector)
            _HwPriv
            JMP     (A1)
            ENDPROC |  There are two additional calls whose interfaces follow.
         Each requires a little explanation. The first call is FlushCodeCache, which simply
         invokes the_CacheFlush ($A0BD) trap. This
         trap's function is to make the instruction cache coherent
         with memory. On the MC68020 and MC68030 it simply flushes
         the instruction cache. On the MC68040 it also flushes the
         data cache for copyback compatibility. The advantage ofFlushCodeCacheas opposed toFlushInstructionCacheis that it was implemented
         before the_HwPrivtrap, and thus can be used on
         the Macintosh II while running older system software. In general, FlushInstructionCacheis still the
         preferred application-level cache flushing mechanism.FlushInstructionCachecallsFlushCodeCacheand is therefore a higher-level call conceptually.FlushCodeCachemay be useful whereFlushInstructionCacheproves unsuitable, or as an
         alternative to the next call,FlushCodeCacheRange.
         Obviously, before callingFlushCodeCache, be
         certain that_CacheFlushis implemented. 
 
| Note:If the processor has a cache to flush, this
         trap should be properly implemented, because ROM and system
         software use this trap's vector to do their own cache
         flushing. In fact,
 FlushInstructionCacheitself
         uses this vector. This should be of particular interest to
         accelerator card developers. |  
 
	|             MACRO
            _FlushCodeCache
            _CacheFlush
            ENDM
PROCEDURE FlushCodeCache;
    INLINE $A0BD;
void FlushCodeCache (void) = 0xA0BD; |  The second call is FlushCodeCacheRange.FlushCodeCacheRangeis an optimization ofFlushCodeCachedesigned for processors like the
         MC68040 which support flushing only a portion of the cache.
         (The MC68020 and MC68030 do not support this feature andFlushCodeCacheRangesimply flushes the entire
         instruction cache on those processors.) As described
         earlier, pushing and flushing cache entries are linked and
         flushing the entire cache after a small change like a jump
         table entry can be expensive.FlushCodeCacheRangeallows one to request that only a specific memory range be
         flushed, leaving the rest of the cache intact. Note that
         this is only a request and that more than the requested
         range may be flushed if it proves inefficient to satisfy the
         request exactly. Also,FlushCodeCacheRangemay not
         be implemented for some older versions of system software
         that are not MC68040-aware. If not,FlushCodeCacheRangereturnshwParamErr(-502) and it is necessary to flush the entire cache
         instead, probably usingFlushCodeCache. IfFlushCodeCacheRangesucceeds it returnsnoErr(0). Before callingFlushCodeCacheRange, be certain that_HwPrivis implemented. 
	| ; _FlushCodeCacheRange takes/returns the following parameters:
;    -> A0.L =   Base of range to flush
;    -> A1.L =   Length of range to flush
;    <- D0.W =   Result code (noErr = 0, hwParamErr = -502)
            MACRO
            _FlushCodeCacheRange
            moveq   #9,d0
            _HwPriv
            ENDM
FUNCTION FlushCodeCacheRange (address: UNIV Ptr; count: LongInt) : OSErr;
    INLINE $225F,   { MOVEA.L (SP)+,A1 }
           $205F,   { MOVEA.L (SP)+,A0 }
           $7009,   { MOVEQ   #9,D0    }
           $A198,   { _HwPriv          }
           $3E80;   { MOVE.W  D0,(SP)  }
// MPW C 3.2 makes register-based inline calls very efficient.
#pragma parameter __D0 FlushCodeCacheRange(__A0,__A1)
OSErr FlushCodeCacheRange (void *address, unsigned long count) =
    {0x7009, 0xA198};
/* MPW C 3.1 and earlier, and THINK C(TM) should declare the function as   */
/* "pascal" and use the same inline constants as the Pascal interface: */
pascal OSErr FlushCodeCacheRange (void *address, unsigned long count) =
    {0x225F, 0x205F, 0x7009, 0xA198, 0x3E80}; |  Back to top Caching Consequences - LocalTalkAs noted above, altering the state of the data/code cache
         significantly affects the performance of the 68040
         processor. This change in effective CPU speed may affect any
         background process that is dependent on the processor speed
         remaining constant. LocalTalk is an example of one such
         affected process, as it employs speed sensitive timing
         loops. The change in CPU speed affects the LocalTalk timers,
         to the extent that the LocalTalk no longer functions
         correctly if it is the current AppleTalk connection. Fortunately, the AppleTalk Transition Queue mechanism can
         be used to notify LocalTalk of the change in effective CPU
         speed. Upon notification, LocalTalk recalculates its timer
         values to match the current CPU speed. Refer to Inside
         Macintosh Volume VI, page 32-17, and to Technical Note
         #311, "What's New In AppleTalk Phase 2," for additional
         information on the use of the AppleTalk Transition
         Queue. The following code demonstrates the use of the
         ATEventprocedure to send theATTransSpeedChangeevent. TheATEventcall
         is provided as part of the MPW 3.2 library. 
 
| Note:Issue the
 ATTransSpeedChangeevent only atSystemTasktime! |  
 
	| USES AppleTalk;  { ATEvent prototyped in AppleTalk unit, MPW 3.2 }
CONST
    ATTransSpeedChange = 'sped'; {change in cpu speed transition }
PROCEDURE NotifyLocalTalkSpeedChange;
    BEGIN
        if LAPMgrExists THEN { check LAP Manager exists, see Technote 311 }
                        { for the code for LAPMgrExists }
            ATEvent(longint(ATTransSpeedChange), NIL);
                        { notify speed change event }
    END; |  Note that only LocalTalk drivers that are included with
         AppleTalk version 57 or greater, respond to the
         ATTranSpeedChangeevent. System 7.0.1 for the Quadra's, is
         supplied with AppleTalk version 56. AppleTalk version 57 is
         available by using the AppleTalk Remote Access Installation
         program, or the Network Software Installer version 1.1.
         Licensing for AppleTalk can be arranged by contacting Apple
         Software Licensing. Software Licensing can be reached as
         follows: Software LicensingApple Computer, Inc.
 20525 Mariani Avenue, M/S 38-ICupertino, CA 95014
 MCI: 312-5360
 AppleLink: SW.LICENSE
 Internet: SW.LICENSE@AppleLink.Apple.com
 (408) 974-4667
 AppleTalk version 53 or greater is required to handle the
         ATEvent call, however, nothing bad will happen if you issue
         the ATTranSpeedChangetransition event under AppleTalk
         versions 53 - 56. It is important to check that the LAP
         Manager is implemented before issuing theATEventcall. See Technote 311 for a description of theLAPMgrExistsfunction. Back to top External CachesThe Macintosh IIci and Macintosh IIsi support external
         cache cards. Because of the way these caches work, cache
         coherency is not much of a problem. In fact these caches are
         usually enabled full-time and their operations are totally
         transparent to all well-behaved hardware and software.
         Still, there are corresponding cache control functions to
         enable, disable, and flush these cache cards. If _HwPriv is
         implemented, the following routines may be used: 
	|             MACRO
            _EnableExtCache
            moveq   #4,d0
            _HwPriv
            ENDM
PROCEDURE EnableExtCache;
    INLINE $7004,$A198;
void EnableExtCache (void) = {0x7004, 0xA198};
            MACRO
            _DisableExtCache
            moveq   #5,d0
            _HwPriv
            ENDM
PROCEDURE DisableExtCache;
    INLINE $7005,$A198;
void DisableExtCache (void) = {0x7005, 0xA198};
            MACRO
            _FlushExtCache
            moveq   #6,d0
            _HwPriv
            ENDM
PROCEDURE FlushExtCache;
    INLINE $7006,$A198;
void FlushExtCache (void) = {0x7006, 0xA198}; |  Back to top ReferencesInside Macintosh, Volume V, Operating System
            Utilities Inside Macintosh, Volume VI, Compatibility
            Guidelines Designing Cards and Drivers for the Macintosh
            Family M68000 Family Programmer's Reference
            Manual M68020 32-Bit Microprocessor User's
            Manual M68030 Enhanced 32-Bit Microprocessor User's
            Manual M68040 32-Bit Third-Generation Microprocessor
            User's Manual NuBus(TM) is a trademark of Texas Instruments. THINK C is a trademark of Symantec Corp. Back to top 
 Change History
            
               | 01-October-1991 | Described use of
         AppleTalk Transition Queue event, ATTransSpeedChange, when
         altering the 68040 cache state on the fly. This call must be
         issued so that LocalTalk can reevaluate its timers.
         Otherwise LocalTalk becomes disabled. |  
 Back to top 
 Downloadables
            
               | 
 | Acrobat version of this Note (144K) | Download |  
 Back to top |