This section discusses the Mac OS services available to maximize PCI throughput.
Beginning with the Mac OS version 7.5.2 release, a DSL (Driver Services Library) that implements all programming interface services is available for drivers. The complete API for the DSL is documented in Driver Services Library.
To coordinate I/O operations that transfer buffers between system memory and PCI address space, the Mac OS provides two functions in the DSL: PrepareMemoryForIO and CheckpointIO. The PrepareMemoryForIO function allocates resident system memory to buffers, provides logical and physical address information, and, in conjunction with CheckpointIO, manages coherency between system memory and the PowerPC caches. CheckpointIO is called after the buffer transfer is complete; it either relinquishes the memory back to the OS and adjusts the processor caches for coherency, or prepares for another I/O transfer.
The "IO" in PrepareMemoryForIO should not be confused with PCI I/O space. The function applies to buffers whether they are located in PCI memory space or PCI I/O space.
PrepareMemoryForIO is an example of a service in the DSL. PCI cards that have DMA hardware should use PrepareMemoryForIO to locate physical addresses in system memory, whereas older I/O expansion cards would typically use the Toolbox call GetPhysical. To be fully compatible with present and future Mac OS releases, drivers should use only the DSL services described in Driver Services Library.
Because PCI address space defaults to cache-inhibited mode, an area of PCI memory space must be set to cacheable before the PowerPC can burst to it. This can be done with the SetProcessorCacheMode function described in SetProcessorCacheMode. Set the desired PCI address space to kProcessorCacheModeCopyBack for cache line writes and kProcessorCacheModeWriteThrough for cache line reads.
For burst writes to PCI address space, take extreme care to perform the appropriate cache flushing.
Be advised that SetProcessorCacheMode has an undocumented limitation. The PowerPC address space is divided into sixteen 256-Mbyte segments that are distinguished by the upper 4 bits of the effective address. SetProcessorCacheMode can change the cache setting for only one contiguous section of memory per 256-Mbyte segment. Therefore, if two PCI cards are configured so that both have PCI address assignments in the same segment, only one card can change its address space cache setting.
For example, if two cards (card x and card y) have addresses mapped into segment 8, one at 0x80800000 and another at 0x80801000, the first call to SetProcessorCacheMode from the driver of card x to make a cacheable address space in segment 8 will work. A second call, say from the driver of card y, to modify the cache setting in segment 8 will not work, nor will it report an error. This scenario will most likely result in lower-than-expected performance for card y, because card y's address space is actually cache inhibited, which disables PCI transactions of 32-byte cache lines. If the two cards are mapped into different segments, such as 8 and A, then both can modify the cache settings within their respective segments. This limitation will be relaxed in the future.
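The segment arithmetic behind this limitation can be sketched in a few lines of C. This is only an illustration of the address calculation described above, not part of the DSL: the function names segment_of and shares_segment are hypothetical, and a driver would still have to obtain each card's base address from its own properties.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not a DSL routine): the segment number is the
   upper 4 bits of a 32-bit effective address, giving sixteen
   256-Mbyte segments numbered 0x0 through 0xF. */
unsigned segment_of(uint32_t effective_address)
{
    return (unsigned)(effective_address >> 28);
}

/* Hypothetical helper: true when two base addresses lie in the same
   256-Mbyte segment, which is the case in which only one of the two
   cache-mode changes described in the text takes effect. */
bool shares_segment(uint32_t base_a, uint32_t base_b)
{
    return segment_of(base_a) == segment_of(base_b);
}
```

With the addresses from the example, 0x80800000 and 0x80801000 both fall in segment 8, so shares_segment reports a conflict. A card based at 0xA0000000 would fall in segment A and would not conflict with either.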
Extensions to the BlockMove routine have been incorporated in the DSL that optimize performance on the PowerPC CPU family. In particular, BlockMoveData has been optimized for data that is cacheable and BlockMoveDataUncached for data that is cache inhibited. The difference between the cached and uncached versions of these routines is that BlockMoveData uses the PowerPC dcbz instruction to avoid the logically unnecessary read of the destination cache blocks. BlockMoveDataUncached does not use the dcbz instruction because dcbz is extremely slow for address space marked cache inhibited or write-through.
The difference between the BlockMove and BlockMoveData versions is whether or not the block being moved contains 68K instructions. If the data does contain 68K instructions, BlockMove must be called, which also flushes the DR (Dynamic Recompilation) Emulator's cache. This is costly in time, so if the block does not contain 68K instructions, be sure to use BlockMoveData or BlockMoveDataUncached. Also with performance in mind, the BlockMove routines will, when appropriate, align the source and destination addresses to utilize floating-point load and store instructions.
For transfers of large buffers between PCI cards, the BlockMoveData or BlockMoveDataUncached function should be used, depending on whether the destination address space is marked write-back cacheable or not. Native PCI drivers most likely will not need to consider the non-Data variant of the BlockMove routines, because destination buffers in either PCI address space or system memory will probably not contain 68K code.
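The selection rules in the preceding paragraphs can be summarized as a small decision function. This is a minimal sketch, not DSL code: the enum MoveRoutine and the function choose_move_routine are hypothetical names introduced here, and the sketch only reports which of the three routines named in the text applies rather than calling them.

```c
#include <stdbool.h>

/* Hypothetical labels for the three routines named in the text. */
typedef enum {
    USE_BLOCK_MOVE,               /* BlockMove: block may contain 68K code */
    USE_BLOCK_MOVE_DATA,          /* BlockMoveData: copy-back cacheable destination */
    USE_BLOCK_MOVE_DATA_UNCACHED  /* BlockMoveDataUncached: inhibited or write-through */
} MoveRoutine;

/* Hypothetical helper: pick a routine from the two properties the
   text discusses. 68K code takes priority because only BlockMove
   flushes the emulator's cache. */
MoveRoutine choose_move_routine(bool contains_68k_code,
                                bool destination_is_copyback_cacheable)
{
    if (contains_68k_code) {
        return USE_BLOCK_MOVE;
    }
    if (destination_is_copyback_cacheable) {
        return USE_BLOCK_MOVE_DATA;
    }
    return USE_BLOCK_MOVE_DATA_UNCACHED;
}
```

For a native PCI driver copying plain data, the first argument would normally be false, so the choice reduces to the cache mode of the destination.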
To initiate a PCI burst of a cache line, use the BlockMoveData function. Provided the PCI address space is marked cacheable as explained earlier, the BlockMoveData function forces the IB chip to burst 32-byte cache lines -- eight-beat data phases per PCI command transaction.
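Because bursts operate on whole 32-byte cache lines, it can be useful to estimate how much of a buffer is eligible for bursting. The sketch below is an assumption-laden illustration, not DSL code: the constants and the function whole_cache_lines are hypothetical names, and the 4-byte beat size assumes the 32-bit PCI bus implied by eight beats per 32-byte line.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants taken from the figures in the text. */
enum {
    CACHE_LINE_BYTES = 32,  /* PowerPC cache line size */
    BEATS_PER_BURST  = 8,   /* data phases per burst transaction */
    BYTES_PER_BEAT   = 4    /* assumed: 32-bit PCI data path */
};

/* Hypothetical helper: count the whole, 32-byte-aligned cache lines
   contained in the range [base, base + length). Only these lines can
   be transferred as full eight-beat bursts; any unaligned head or
   tail bytes fall outside a complete line. */
size_t whole_cache_lines(uint32_t base, size_t length)
{
    uint32_t mask    = (uint32_t)(CACHE_LINE_BYTES - 1);
    uint32_t first   = (base + mask) & ~mask;  /* round base up to a line */
    uint32_t skipped = first - base;           /* unaligned head bytes */

    if (length < skipped) {
        return 0;
    }
    return (length - skipped) / CACHE_LINE_BYTES;
}
```

A buffer that starts on a 32-byte boundary and whose length is a multiple of 32 consists entirely of whole cache lines, so aligning buffers this way leaves no partial lines at either end.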
To read or write PCI I/O space, the Expansion Bus Manager provides routines to transfer data -- byte, word, or long word (8, 16, or 32 bits, respectively) -- using PCI I/O Read and I/O Write commands. The Expansion Bus Manager is part of the ROM firmware in PCI Power Macintosh CPUs. These routines also perform appropriate byte swapping. For a further description, refer to Expansion Bus Manager. PCI cards that are limited to I/O space, and do not incorporate PCI memory space, are limited to PCI I/O Read and I/O Write commands to transfer data between the PowerPC processor and the PCI target. If PCI I/O data needs to be processed quickly, note that there is a significant performance hit when using the Expansion Bus Manager routines. These routines are intended for PCI targets that have I/O registers or low-bandwidth I/O buffers. The IB chip bursts neither PCI I/O Read nor PCI I/O Write commands.
As described in Fast I/O Space Cycle Generation, the PCI property assigned-addresses provides vector entries that represent physical addresses on PCI cards. Using the AAPL,address property, a driver can locate the logical address of a physical I/O resource. When the driver accesses the logical I/O address, the IB chip generates the appropriate PCI I/O command. A driver can therefore generate PCI I/O commands without using the Expansion Bus Manager routines, in the same way it accesses PCI memory space. This is the fastest way to access I/O space, but note that it does not perform the byte swapping provided by the Expansion Bus Manager routines.
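A driver that accesses I/O space directly therefore has to do its own byte swapping. The following is a minimal sketch in portable C, assuming a big-endian PowerPC host and a device register that holds little-endian data. The names swap16, swap32, and read_le32 are hypothetical helpers, not Expansion Bus Manager or DSL routines.

```c
#include <stdint.h>

/* Hypothetical helper: reverse the two bytes of a 16-bit value. */
uint16_t swap16(uint16_t value)
{
    return (uint16_t)((value << 8) | (value >> 8));
}

/* Hypothetical helper: reverse the four bytes of a 32-bit value. */
uint32_t swap32(uint32_t value)
{
    return ((value & 0x000000FFu) << 24) |
           ((value & 0x0000FF00u) << 8)  |
           ((value & 0x00FF0000u) >> 8)  |
           ((value & 0xFF000000u) >> 24);
}

/* Hypothetical helper: read a 32-bit register through its logical
   address and swap the result. Assumes the host is big-endian and
   the register contents are little-endian. The volatile qualifier
   keeps the compiler from caching or eliding the access. */
uint32_t read_le32(const volatile uint32_t *logical_address)
{
    return swap32(*logical_address);
}
```

Swapping in software keeps the speed advantage of the direct access while restoring the byte order that the Expansion Bus Manager routines would otherwise have provided.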
Note also that the Expansion Bus Manager provides OS services to generate PCI Configuration Read, Configuration Write, Interrupt Acknowledge, and Special Cycle commands.
To maximize bus performance, utilize the services available in the Driver Services Library, and pay close attention to PCI chip selection, in particular, chips that can execute cache line burst transactions with Memory Read Line, Memory Read Multiple, and Memory Write and Invalidate commands.
To maximize your PCI card's performance on the Power Macintosh platform, observe the following. As a PCI target, your card should
As a PCI master, your card should