Previous: Darwin Options, Up: Submodel Options



3.17.3 Intel 386 and AMD x86-64 Options

These -m options are defined for the i386 and x86-64 family of computers:

-mcpu=cpu-type
Tune to cpu-type everything applicable about the generated code, except for the ABI and the set of available instructions. The choices for cpu-type are i386, i486, i586, i686, pentium, pentium-mmx, pentiumpro, pentium2, pentium3, pentium4, k6, k6-2, k6-3, athlon, athlon-tbird, athlon-4, athlon-xp, athlon-mp, winchip-c6, winchip2 and c3.

While picking a specific cpu-type will schedule things appropriately for that particular chip, the compiler will not generate any code that does not run on the i386 without the -march=cpu-type option being used. i586 is equivalent to pentium and i686 is equivalent to pentiumpro. k6 and athlon are the AMD chips as opposed to the Intel ones.

-march=cpu-type
Generate instructions for the machine type cpu-type. The choices for cpu-type are the same as for -mcpu. Moreover, specifying -march=cpu-type implies -mcpu=cpu-type.
-m386
-m486
-mpentium
-mpentiumpro
These options are synonyms for -mcpu=i386, -mcpu=i486, -mcpu=pentium, and -mcpu=pentiumpro respectively. These synonyms are deprecated.
-mfpmath=unit
generate floating point arithmetics for selected unit unit. the choices for unit are:
387
Use the standard 387 floating point coprocessor present majority of chips and emulated otherwise. Code compiled with this option will run almost everywhere. The temporary results are computed in 80bit precision instead of precision specified by the type resulting in slightly different results compared to most of other chips. See -ffloat-store for more detailed description.

This is the default choice for i386 compiler.

sse
Use scalar floating point instructions present in the SSE instruction set. This instruction set is supported by Pentium3 and newer chips, in the AMD line by Athlon-4, Athlon-xp and Athlon-mp chips. The earlier version of SSE instruction set supports only single precision arithmetics, thus the double and extended precision arithmetics is still done using 387. Later version, present only in Pentium4 and the future AMD x86-64 chips supports double precision arithmetics too.

For i387 you need to use -march=cpu-type, -msse or -msse2 switches to enable SSE extensions and make this option effective. For x86-64 compiler, these extensions are enabled by default.

The resulting code should be considerably faster in majority of cases and avoid the numerical instability problems of 387 code, but may break some existing code that expects temporaries to be 80bit.

This is the default choice for x86-64 compiler.

sse,387
Attempt to utilize both instruction sets at once. This effectively double the amount of available registers and on chips with separate execution units for 387 and SSE the execution resources too. Use this option with care, as it is still experimental, because gcc register allocator does not model separate functional units well resulting in instable performance.

-masm=dialect
Output asm instructions using selected dialect. Supported choices are intel or att (the default one).
-mieee-fp
-mno-ieee-fp
Control whether or not the compiler uses IEEE floating point comparisons. These handle correctly the case where the result of a comparison is unordered.
-msoft-float
Generate output containing library calls for floating point. Warning: the requisite libraries are not part of GCC. Normally the facilities of the machine's usual C compiler are used, but this can't be done directly in cross-compilation. You must make your own arrangements to provide suitable library functions for cross-compilation.

On machines where a function returns floating point results in the 80387 register stack, some floating point opcodes may be emitted even if -msoft-float is used.

-mno-fp-ret-in-387
Do not use the FPU registers for return values of functions.

The usual calling convention has functions return values of types float and double in an FPU register, even if there is no FPU. The idea is that the operating system should emulate an FPU.

The option -mno-fp-ret-in-387 causes such values to be returned in ordinary CPU registers instead.

-mno-fancy-math-387
Some 387 emulators do not support the sin, cos and sqrt instructions for the 387. Specify this option to avoid generating those instructions. This option is the default on FreeBSD, OpenBSD and NetBSD. This option is overridden when -march indicates that the target cpu will always have an FPU and so the instruction will not need emulation. As of revision 2.6.1, these instructions are not generated unless you also use the -funsafe-math-optimizations switch.
-malign-double
-mno-align-double
Control whether GCC aligns double, long double, and long long variables on a two word boundary or a one word boundary. Aligning double variables on a two word boundary will produce code that runs somewhat faster on a Pentium at the expense of more memory.

Warning: if you use the -malign-double switch, structures containing the above types will be aligned differently than the published application binary interface specifications for the 386 and will not be binary compatible with structures in code compiled without that switch.

-m128bit-long-double
Control the size of long double type. i386 application binary interface specify the size to be 12 bytes, while modern architectures (Pentium and newer) prefer long double aligned to 8 or 16 byte boundary. This is impossible to reach with 12 byte long doubles in the array accesses.

Warning: if you use the -m128bit-long-double switch, the structures and arrays containing long double will change their size as well as function calling convention for function taking long double will be modified.

-m96bit-long-double
Set the size of long double to 96 bits as required by the i386 application binary interface. This is the default.
-msvr3-shlib
-mno-svr3-shlib
Control whether GCC places uninitialized local variables into the bss or data segments. -msvr3-shlib places them into bss. These options are meaningful only on System V Release 3.
-mrtd
Use a different function-calling convention, in which functions that take a fixed number of arguments return with the ret num instruction, which pops their arguments while returning. This saves one instruction in the caller since there is no need to pop the arguments there.

You can specify that an individual function is called with this calling sequence with the function attribute stdcall. You can also override the -mrtd option by using the function attribute cdecl. See Function Attributes.

Warning: this calling convention is incompatible with the one normally used on Unix, so you cannot use it if you need to call libraries compiled with the Unix compiler.

Also, you must provide function prototypes for all functions that take variable numbers of arguments (including printf); otherwise incorrect code will be generated for calls to those functions.

In addition, seriously incorrect code will result if you call a function with too many arguments. (Normally, extra arguments are harmlessly ignored.)

-mregparm=num
Control how many registers are used to pass integer arguments. By default, no registers are used to pass arguments, and at most 3 registers can be used. You can control this behavior for a specific function by using the function attribute regparm. See Function Attributes.

Warning: if you use this switch, and num is nonzero, then you must build all modules with the same value, including any libraries. This includes the system libraries and startup modules.

-mpreferred-stack-boundary=num
Attempt to keep the stack boundary aligned to a 2 raised to num byte boundary. If -mpreferred-stack-boundary is not specified, the default is 4 (16 bytes or 128 bits), except when optimizing for code size (-Os), in which case the default is the minimum correct alignment (4 bytes for x86, and 8 bytes for x86-64).

On Pentium and PentiumPro, double and long double values should be aligned to an 8 byte boundary (see -malign-double) or suffer significant run time performance penalties. On Pentium III, the Streaming SIMD Extension (SSE) data type __m128 suffers similar penalties if it is not 16 byte aligned.

To ensure proper alignment of this values on the stack, the stack boundary must be as aligned as that required by any value stored on the stack. Further, every function must be generated such that it keeps the stack aligned. Thus calling a function compiled with a higher preferred stack boundary from a function compiled with a lower preferred stack boundary will most likely misalign the stack. It is recommended that libraries that use callbacks always use the default setting.

This extra alignment does consume extra stack space, and generally increases code size. Code that is sensitive to stack space usage, such as embedded systems and operating system kernels, may want to reduce the preferred alignment to -mpreferred-stack-boundary=2.

-mmmx
-mno-mmx
-msse
-mno-sse
-msse2
-mno-sse2
-m3dnow
-mno-3dnow
These switches enable or disable the use of built-in functions that allow direct access to the MMX, SSE and 3Dnow extensions of the instruction set.

See X86 Built-in Functions, for details of the functions enabled and disabled by these switches.

To have SSE/SSE2 instructions generated automatically from floating-point code, see -mfpmath=sse.

-mpush-args
-mno-push-args
Use PUSH operations to store outgoing parameters. This method is shorter and usually equally fast as method using SUB/MOV operations and is enabled by default. In some cases disabling it may improve performance because of improved scheduling and reduced dependencies.
-maccumulate-outgoing-args
If enabled, the maximum amount of space required for outgoing arguments will be computed in the function prologue. This is faster on most modern CPUs because of reduced dependencies, improved scheduling and reduced stack usage when preferred stack boundary is not equal to 2. The drawback is a notable increase in code size. This switch implies -mno-push-args.
-mthreads
Support thread-safe exception handling on Mingw32. Code that relies on thread-safe exception handling must compile and link all code with the -mthreads option. When compiling, -mthreads defines -D_MT; when linking, it links in a special thread helper library -lmingwthrd which cleans up per thread exception handling data.
-mno-align-stringops
Do not align destination of inlined string operations. This switch reduces code size and improves performance in case the destination is already aligned, but gcc don't know about it.
-minline-all-stringops
By default GCC inlines string operations only when destination is known to be aligned at least to 4 byte boundary. This enables more inlining, increase code size, but may improve performance of code that depends on fast memcpy, strlen and memset for short lengths.
-momit-leaf-frame-pointer
Don't keep the frame pointer in a register for leaf functions. This avoids the instructions to save, set up and restore frame pointers and makes an extra register available in leaf functions. The option -fomit-frame-pointer removes the frame pointer for all functions which might make debugging harder.

These -m switches are supported in addition to the above on AMD x86-64 processors in 64-bit environments.

-m32
-m64
Generate code for a 32-bit or 64-bit environment. The 32-bit environment sets int, long and pointer to 32 bits and generates code that runs on any i386 system. The 64-bit environment sets int to 32 bits and long and pointer to 64 bits and generates code for AMD's x86-64 architecture.
-mno-red-zone
Do not use a so called red zone for x86-64 code. The red zone is mandated by the x86-64 ABI, it is a 128-byte area beyond the location of the stack pointer that will not be modified by signal or interrupt handlers and therefore can be used for temporary data without adjusting the stack pointer. The flag -mno-red-zone disables this red zone.
-mcmodel=small
Generate code for the small code model: the program and its symbols must be linked in the lower 2 GB of the address space. Pointers are 64 bits. Programs can be statically or dynamically linked. This is the default code model.
-mcmodel=kernel
Generate code for the kernel code model. The kernel runs in the negative 2 GB of the address space. This model has to be used for Linux kernel code.
-mcmodel=medium
Generate code for the medium model: The program is linked in the lower 2 GB of the address space but symbols can be located anywhere in the address space. Programs can be statically or dynamically linked, but building of shared libraries are not supported with the medium model.
-mcmodel=large
Generate code for the large model: This model makes no assumptions about addresses and sizes of sections. Currently GCC does not implement this model.