This chapter describes the C data types and intrinsics for use in programming SSE. It also shows how to detect the availability of SSE3 at run time.
Data Types and Intrinsics
Detecting SSE3
As with AltiVec, there is a C programming interface for SSE. The two follow the same general design:
The SIMD vector register is described in C as a special 128-bit data type.
A series of function-like intrinsics is used to perform SIMD-style operations on variables of that type.
A notable difference is that many more of the intrinsics in the Intel C programming extensions do not correspond 1:1 with instructions in the ISA. Some developers may choose to limit themselves to the intrinsics that do map 1:1 to ISA instructions, so as not to introduce hidden, expensive calculations.
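To make the distinction concrete, the following sketch contrasts an intrinsic that typically maps to one instruction with one that does not. The function names Add4 and Make4 are hypothetical, and the exact instructions emitted depend on the compiler and its optimization settings, so inspect the generated assembly if the cost matters.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* _mm_add_ps corresponds to a single ADDPS instruction. */
__m128 Add4( __m128 a, __m128 b )
{
    return _mm_add_ps( a, b );
}

/* _mm_set_ps has no single-instruction equivalent: building a vector from
   four run-time scalars typically costs several moves, unpacks or shuffles.
   Note that its arguments are listed from the highest element to the lowest. */
__m128 Make4( float e0, float e1, float e2, float e3 )
{
    return _mm_set_ps( e3, e2, e1, e0 );
}
```

Inside a loop, a call like Make4 is the kind of hidden cost the paragraph above warns about; hoisting it out of the loop or loading the data directly from memory is usually cheaper.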
Data Types
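Intel defines three basic data types for SSE programming in C: __m128 (four single-precision floats), __m128d (two double-precision floats), and __m128i (integer data).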
These types are portable across the GNU C Compiler, the Intel C Compiler and various x86 C compilers targeted towards the Windows™ operating system.
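As a rough sketch of how the three Intel types (__m128, __m128d and __m128i, as declared in xmmintrin.h and emmintrin.h) are used, the helpers below store each one to memory and sum its elements. The helper names are hypothetical and exist only for illustration.

```c
#include <emmintrin.h>  /* SSE2; also includes xmmintrin.h, which declares __m128 */

/* __m128 holds four single-precision floats. */
float SumFloats( __m128 v )
{
    float e[4];
    _mm_storeu_ps( e, v );                   /* unaligned store of all four floats */
    return e[0] + e[1] + e[2] + e[3];
}

/* __m128d holds two double-precision floats. */
double SumDoubles( __m128d v )
{
    double e[2];
    _mm_storeu_pd( e, v );
    return e[0] + e[1];
}

/* __m128i holds 128 bits of integer data. The type does not say how those
   bits are divided up; this function chooses to read them as four 32-bit ints. */
int SumInt32s( __m128i v )
{
    int e[4];
    _mm_storeu_si128( (__m128i *) e, v );
    return e[0] + e[1] + e[2] + e[3];
}
```

All three types are 16 bytes wide. Only the intrinsic you apply to a __m128i decides whether it is treated as sixteen bytes, eight shorts, four ints or two 64-bit integers.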
One shortcoming of this set of data types is that the __m128i type does not describe the type and number of integer elements in the vector. Both Intel and Microsoft have defined extensions to this set to build in that information, and Apple is no exception. The Accelerate.framework defines a series of vector types that may be used for both AltiVec and SSE programming. It is recommended that you use these, since the extra information makes your code easier to read and makes it possible for gdb and Xcode to format vector data properly. In addition, it allows you to share data types with AltiVec, which may simplify some programming tasks. To use the types described below, use the following #include line:
```c
#include <Accelerate/Accelerate.h>
```
| | 8-bit | 16-bit | 32-bit | 64-bit |
|---|---|---|---|---|
| signed | vSInt8 | vSInt16 | vSInt32 | vSInt64 |
| unsigned | vUInt8 | vUInt16 | vUInt32 | vUInt64 |
| floating point | - | - | vFloat | vDouble |
Please note that while the 64-bit types are indeed defined for AltiVec by Accelerate.framework (and do work in the sense that you can load and store vectors full of 64-bit data types in and out of AltiVec registers), there are no intrinsics (or instructions) defined by AltiVec itself to do SIMD-style operations on elements of this size. The Accelerate.framework vBasicOps.h header declares some functions that allow you to do packed 64-bit integer operations. (These functions use AltiVec intrinsics for smaller element sizes to build up the larger operations; see the available source code for vBasicOps.) Certain C language operators (e.g. +, -, *, /) may function with the vDouble type on GCC 4.0 and later on PowerPC. However, these simply map the vector type to the scalar FPU and do standard arithmetic on the data using scalar code.
Intrinsics
Intel also defines a set of function-like intrinsics for programming SSE in C. These are similar to those provided by AltiVec, with some small differences. The Intel intrinsics use _mm_ instead of vec_ as the operator prefix. In addition, where AltiVec relies on C++ style function overloading to decide, based on argument type, which particular flavor of add to use among many, Intel has encoded this information as a suffix on the intrinsic name (for example, the _ps in _mm_add_ps).
The suffixes are defined as follows:
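As an illustration, a few of the suffixes found in Intel's headers are shown here on the add operation. This is a sketch, not a complete list, and the wrapper names are hypothetical.

```c
#include <emmintrin.h>  /* SSE2; also includes xmmintrin.h */

/* _ps: packed single precision. Adds all four floats. */
__m128 AddPackedFloats( __m128 a, __m128 b ) { return _mm_add_ps( a, b ); }

/* _ss: scalar single precision. Adds only the lowest float;
   the upper three elements of the result are copied from a. */
__m128 AddLowFloat( __m128 a, __m128 b ) { return _mm_add_ss( a, b ); }

/* _pd: packed double precision. Adds both doubles. */
__m128d AddPackedDoubles( __m128d a, __m128d b ) { return _mm_add_pd( a, b ); }

/* _epi32: the __m128i is treated as four 32-bit integers. */
__m128i AddInt32s( __m128i a, __m128i b ) { return _mm_add_epi32( a, b ); }

/* _epi8: the same __m128i type, now treated as sixteen 8-bit integers.
   Carries do not cross byte boundaries, so each byte wraps on its own. */
__m128i AddInt8s( __m128i a, __m128i b ) { return _mm_add_epi8( a, b ); }
```

The last two functions take identical argument types and differ only in the suffix, which is why the element width has to be spelled out in the intrinsic name rather than inferred from the arguments.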
The intrinsics are declared in four headers, one each for MMX, SSE, SSE2, and SSE3. Each intrinsic is found in the header for the ISA extension in which it first appeared:
| ISA extension | Header |
|---|---|
| MMX | mmintrin.h |
| SSE | xmmintrin.h |
| SSE2 | emmintrin.h |
| SSE3 | pmmintrin.h |
The complete set of operations available for the Intel architecture is detailed in the Intel Architecture Software Developer's Manual (Volume 2, see link in the Introduction at top of page). There is a partial AltiVec to SSE translation table in the Universal Binary Programming Guide, Appendix B. More thorough conversion tables appear in various segments entitled Algorithms/Conversions in the part of this document to follow.
In addition, GCC has its own set of native, non-portable intrinsics, described in the GCC documentation. Please note that these are subject to change. GCC can and does regularly remove __builtins from the programming environment.
Sample function
Here is a function that calculates the distances from the origin {0,0} of a set of 4 {x,y} pairs in AltiVec:
```c
#include <Accelerate/Accelerate.h>  // contains data types used

vFloat Distance( vFloat x, vFloat y )
{
    vFloat x2 = vec_madd( x, x, (vFloat) (-0.0f) );  // x * x
    vFloat distance2 = vec_madd( y, y, x2 );         // x*x + y*y
    return vsqrtf( distance2 );                      // from Accelerate.framework
}
```
and here is the same thing in SSE:
```c
#include <Accelerate/Accelerate.h>  // contains data types used
#include <xmmintrin.h>              // declares _mm_* intrinsics

vFloat Distance( vFloat x, vFloat y )
{
    vFloat x2 = _mm_mul_ps( x, x );                          // x * x
    vFloat distance2 = _mm_add_ps( _mm_mul_ps( y, y ), x2 ); // x*x + y*y
    return vsqrtf( distance2 );                              // from Accelerate.framework
}
```
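For comparison, here is a sketch of the same calculation written against the Intel headers alone, for code that cannot depend on Accelerate.framework. It assumes the Intel type __m128 in place of vFloat and the SSE intrinsic _mm_sqrt_ps in place of vsqrtf; the name DistanceSSE is hypothetical.

```c
#include <xmmintrin.h>  /* declares __m128 and the _mm_* SSE intrinsics */

/* Distance from the origin for four {x,y} pairs at once. */
__m128 DistanceSSE( __m128 x, __m128 y )
{
    __m128 x2 = _mm_mul_ps( x, x );                           /* x * x */
    __m128 distance2 = _mm_add_ps( _mm_mul_ps( y, y ), x2 );  /* x*x + y*y */
    return _mm_sqrt_ps( distance2 );                          /* four square roots */
}
```

Called with x = {3, 6, 5, 8} and y = {4, 8, 12, 15}, this returns {5, 10, 13, 17}. The trade-off is that the function no longer shares its signature with the AltiVec version.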
If you wish to tie yourself to GCC specific features, you may investigate GCC's unified vector programming interfaces. That would allow you to write the following and compile for both platforms:
```c
#include <Accelerate/Accelerate.h>

// Not portable to other compilers!
vFloat Distance( vFloat x, vFloat y )
{
    return vsqrtf( x*x + y*y );  // from Accelerate.framework
}
```
Since this is a new feature, it is suggested that you inspect the generated code thoroughly. In addition, there are clearly other ways to do the same thing, such as inline functions or macros built on the more traditional interfaces, that may preserve your compiler independence.
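One possible shape for the inline-function approach is sketched here. The names Float4, Float4_Zero, Float4_MulAdd and DistanceSquared are hypothetical, and the AltiVec branch is an untested assumption about how the same wrappers would be spelled there. The sketch stops at the squared distance, since AltiVec itself has no square root instruction.

```c
#if defined(__SSE__)
    #include <xmmintrin.h>
    typedef __m128 Float4;

    static inline Float4 Float4_Zero( void ) { return _mm_setzero_ps(); }

    /* a*b + c, as a separate multiply and add */
    static inline Float4 Float4_MulAdd( Float4 a, Float4 b, Float4 c )
    {
        return _mm_add_ps( _mm_mul_ps( a, b ), c );
    }
#elif defined(__VEC__)
    #include <altivec.h>  /* may be unnecessary, depending on compiler flags */
    typedef vector float Float4;

    static inline Float4 Float4_Zero( void ) { return (vector float) vec_splat_u32( 0 ); }

    /* a*b + c, as a single fused multiply-add */
    static inline Float4 Float4_MulAdd( Float4 a, Float4 b, Float4 c )
    {
        return vec_madd( a, b, c );
    }
#else
    #error "No supported vector unit"
#endif

/* x*x + y*y for four points at once, written once for both architectures. */
Float4 DistanceSquared( Float4 x, Float4 y )
{
    Float4 x2 = Float4_MulAdd( x, x, Float4_Zero() );
    return Float4_MulAdd( y, y, x2 );
}
```

Because vec_madd rounds once and the SSE version rounds twice, the two branches may differ in the last bit of the result. Code that needs bit-identical output on both architectures would have to account for that.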
Detecting SSE3
SSE3 is an optional hardware feature on Mac OS X for Intel. If you wish to use SSE3 features, you must detect them first, similar to how you are required to check for AltiVec. The same interfaces are used, just with a different sysctlbyname() selector:
```c
#include <sys/types.h>
#include <sys/sysctl.h>

int IsSSE3Present( void )
{
    int hasSSE3 = 0;
    size_t length = sizeof( hasSSE3 );
    int error = sysctlbyname( "hw.optional.sse3", &hasSSE3, &length, NULL, 0 );
    if( 0 != error ) return 0;  // selector unavailable: assume no SSE3
    return hasSSE3;
}
```
Similar selectors exist for MMX, SSE and SSE2, but since those are required features of Mac OS X for Intel, software intended solely for Mac OS X for Intel need not test for them before using those vector extensions. (SSE is not available in any form on Mac OS X for PowerPC, and AltiVec is not available on Mac OS X for Intel. When writing code for Universal Binaries to run on Mac OS X, you should conditionalize your code using appropriate symbols such as __VEC__ and __SSE2__, so that for each fork of the universal binary the compiler never sees vector code for an unsupported architecture.)
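One way to act on the result of such a check is to choose an implementation once, at run time, and call it through a function pointer. The following is a minimal sketch under several assumptions: the names HasSSE3, SumSSE, SumSSE3 and HorizontalSum are hypothetical; the target attribute and, off Mac OS X, the __builtin_cpu_supports() call are features of reasonably recent GCC-compatible compilers and are not part of the interfaces described above. With older compilers, the SSE3 function would instead live in a file compiled with SSE3 enabled.

```c
#include <stddef.h>     /* NULL, size_t */
#include <xmmintrin.h>  /* SSE */
#include <pmmintrin.h>  /* SSE3: _mm_hadd_ps */

#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
static int HasSSE3( void )
{
    int hasSSE3 = 0;
    size_t length = sizeof( hasSSE3 );
    if( 0 != sysctlbyname( "hw.optional.sse3", &hasSSE3, &length, NULL, 0 ) ) return 0;
    return hasSSE3;
}
#else
/* Assumption: GCC/Clang builtin used as a stand-in where sysctlbyname() is absent. */
static int HasSSE3( void ) { return __builtin_cpu_supports( "sse3" ) != 0; }
#endif

/* SSE only: sum the four floats with shuffles and adds. */
static float SumSSE( __m128 v )
{
    __m128 hi   = _mm_movehl_ps( v, v );                 /* { v2, v3, v2, v3 } */
    __m128 sum2 = _mm_add_ps( v, hi );                   /* { v0+v2, v1+v3, ... } */
    __m128 odd  = _mm_shuffle_ps( sum2, sum2, _MM_SHUFFLE( 1, 1, 1, 1 ) );
    return _mm_cvtss_f32( _mm_add_ss( sum2, odd ) );     /* (v0+v2) + (v1+v3) */
}

/* SSE3: the same sum using the horizontal add instruction. */
__attribute__((target("sse3")))
static float SumSSE3( __m128 v )
{
    __m128 t = _mm_hadd_ps( v, v );                      /* { v0+v1, v2+v3, v0+v1, v2+v3 } */
    t = _mm_hadd_ps( t, t );                             /* every element holds the total */
    return _mm_cvtss_f32( t );
}

/* Picks an implementation on first use, then reuses it. */
float HorizontalSum( __m128 v )
{
    static float (*impl)( __m128 ) = NULL;
    if( NULL == impl ) impl = HasSSE3() ? SumSSE3 : SumSSE;
    return impl( v );
}
```

The two implementations add the elements in a different order, so their results can differ in the last bit for inputs that do not sum exactly. The lazy initialization shown is not thread-safe in a strict sense, although both threads would store the same pointer.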
© 2005 Apple Computer, Inc. All Rights Reserved. (Last updated: 2005-09-08)