As m4 reads its input, it separates it into tokens. A
token is either a name, a quoted string, or any single character that
is not part of either a name or a string. Input to m4 can also
contain comments. GNU m4 does not yet understand
locales; all operations are byte-oriented rather than
character-oriented. However, m4 is eight-bit clean, so you can
use non-ASCII characters in quoted strings (see section Changing the quote characters),
comments (see section Changing comment delimiters), and macro names (see section Indirect call of macros), with the
exception of the NUL character (the zero byte `'\0'').
2.1 Names | Macro names
2.2 Quoted strings | Quoting input to m4
2.3 Comments | Comments in m4 input
2.4 Other tokens | Other kinds of input tokens
2.5 Input Processing | How m4 copies input to output
2.1 Names
A name is any sequence of letters, digits, and the character _
(underscore), where the first character is not a digit. m4 will
use the longest such sequence found in the input. If a name has a
macro definition, it will be subject to macro expansion
(see section How to invoke macros). Names are case-sensitive.
Examples of legal names are: `foo', `_tmp', and `name01'.
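The longest-match rule for names can be sketched in Python. This is only an illustration, not m4's own scanner; the helper name `leading_name` and the ASCII-only regular expression are assumptions made for the example.

```python
import re

# A name: a letter or underscore, followed by letters, digits, or underscores.
# ASCII-only by assumption. The pattern is greedy, so it takes the longest match.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def leading_name(text):
    """Return the longest name at the start of text, or None if there is none."""
    match = NAME.match(text)
    return match.group(0) if match else None

print(leading_name("name01+rest"))  # name01
print(leading_name("_tmp)"))        # _tmp
print(leading_name("9lives"))       # None: a name cannot start with a digit
```

Because matching is case-sensitive, `Foo` and `foo` come back as two distinct names.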
2.2 Quoted strings
A quoted string is a sequence of characters surrounded by quote strings, defaulting to ` and ', where the nested begin and end quotes within the string are balanced. The value of a string token is the text, with one level of quotes stripped off. Thus

    `'
    ⇒

is the empty string, and double-quoting turns into single-quoting.

    ``quoted''
    ⇒`quoted'

The quote characters can be changed at any time, using the builtin macro
changequote. See section Changing the quote characters, for more information.
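The balancing and stripping of quotes can be sketched as follows. This Python fragment is a rough illustration under stated assumptions, not m4's implementation: the function name `read_quoted`, its parameters, and the decision to test for the end quote before the begin quote are all choices made for the example.

```python
def read_quoted(text, start=0, lquote="`", rquote="'"):
    """Read a balanced quoted string that begins at text[start].

    Returns (value, end): value is the text with one level of quotes
    stripped, and end is the index just past the closing quote.
    """
    if not text.startswith(lquote, start):
        raise ValueError("no opening quote at this position")
    depth = 1
    pos = start + len(lquote)
    body_start = pos
    while pos < len(text):
        if text.startswith(rquote, pos):
            depth -= 1
            if depth == 0:
                return text[body_start:pos], pos + len(rquote)
            pos += len(rquote)
        elif text.startswith(lquote, pos):
            depth += 1          # a nested quoted string begins
            pos += len(lquote)
        else:
            pos += 1
    raise ValueError("end of input inside a quoted string")

print(read_quoted("`'"))          # ('', 2)
print(read_quoted("``quoted''"))  # ("`quoted'", 10)
```

Only the outermost pair of quotes is removed; inner pairs are kept as part of the value.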
2.3 Comments
Comments in m4 are normally delimited by the characters `#'
and newline. All characters between the comment delimiters are ignored,
but the entire comment (including the delimiters) is passed through to
the output--comments are not discarded by m4.
Comments cannot be nested, so the first newline after a `#' ends the comment. The commenting effect of the begin-comment string can be inhibited by quoting it.

    `quoted text' # `commented text'
    ⇒quoted text # `commented text'
    `quoting inhibits' `#' `comments'
    ⇒quoting inhibits # comments

The comment delimiters can be changed to any string at any time, using
the builtin macro changecom. See section Changing comment delimiters, for more
information.
2.4 Other tokens
Any character that is not part of a name, a quoted string, or a comment is a token by itself. When not in the context of macro expansion, all of these tokens are just copied to output. However, during macro expansion, whitespace characters (space, tab, newline, formfeed, carriage return, vertical tab), parentheses (`(' and `)'), comma (`,'), and dollar (`$') have additional roles, explained later.
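The four kinds of input described so far (names, quoted strings, comments, and single characters) can be combined into a small scanner. The Python sketch below is illustrative only and is not how m4 itself is written; the function name `tokenize`, its parameters, the ASCII-only name pattern, and the choice to test for a comment before a quote are assumptions made for the example.

```python
import re

NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def tokenize(text, lquote="`", rquote="'", bcomm="#", ecomm="\n"):
    """Yield (kind, value) pairs; kind is 'name', 'string', 'comment' or 'char'."""
    pos = 0
    while pos < len(text):
        if text.startswith(bcomm, pos):
            # A comment runs through its end delimiter (or to the end of
            # the input) and keeps both delimiters.
            end = text.find(ecomm, pos + len(bcomm))
            end = len(text) if end < 0 else end + len(ecomm)
            yield "comment", text[pos:end]
            pos = end
        elif text.startswith(lquote, pos):
            # Scan to the matching end quote, tracking the nesting depth.
            depth, scan = 1, pos + len(lquote)
            while depth and scan < len(text):
                if text.startswith(rquote, scan):
                    depth -= 1
                    scan += len(rquote)
                elif text.startswith(lquote, scan):
                    depth += 1
                    scan += len(lquote)
                else:
                    scan += 1
            if depth:
                raise ValueError("end of input inside a quoted string")
            # The token value has one level of quotes stripped.
            yield "string", text[pos + len(lquote):scan - len(rquote)]
            pos = scan
        else:
            match = NAME.match(text, pos)
            if match:
                yield "name", match.group(0)
                pos = match.end()
            else:
                # Anything else is a token by itself.
                yield "char", text[pos]
                pos += 1

line = "`quoting inhibits' `#' `comments'"
print("".join(value for _, value in tokenize(line)))  # quoting inhibits # comments
```

Since none of the tokens in the sample line is a macro name, joining the token values reproduces what would be copied to the output.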
2.5 Input Processing
As m4 reads the input token by token, it copies each token
to the output immediately.
The exception is when it finds a name with a macro definition. In that
case m4 will calculate the macro's expansion, possibly reading
more input to get the arguments. It then inserts the expansion in front
of the remaining input. In other words, the resulting text from a macro
call will be read and parsed into tokens again.
m4 expands a macro as soon as possible. If it finds a macro call
when collecting the arguments to another, it will expand the second
call first. For a running example, examine how m4 handles this
input:
    format(`Result is %d', eval(`2**15'))
First, m4 sees that the token `format' is a macro name, so
it collects the tokens `(', ``Result is %d'', `,',
and ` ', before encountering another potential macro. Sure
enough, `eval' is a macro name, so the nested argument collection
picks up `(', ``2**15'', and `)', invoking the eval macro
with the lone argument of `2**15'. The expansion of
`eval(2**15)' is `32768', which is then rescanned as the five
tokens `3', `2', `7', `6', and `8'; and
combined with the next `)', the format macro now has all its
arguments, as if the user had typed:
    format(`Result is %d', 32768)
The format macro expands to `Result is 32768', and we have another round of scanning for the tokens `Result', ` ', `is', ` ', `3', `2', `7', `6', and `8'. None of these are macros, so the final output is
    ⇒Result is 32768
The order in which m4 expands the macros can be explored using
the tracing facilities of GNU m4 (see section Tracing macro calls).
This process continues until there are no more macro calls to expand and all the input has been consumed.
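The rescanning loop described in this section can be modelled with a short Python sketch. It is a toy under stated assumptions, not m4's implementation: the names `expand`, `toy_eval`, `toy_format`, and `MACROS` are hypothetical, comments are not handled, the quotes are fixed to ` and ', and the two stand-in macros cover only what the running example needs.

```python
import re

NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def expand(text, macros):
    """Expand text; macros maps a name to a function of a list of argument strings."""
    rest = text  # unread input; expansions are pushed onto its front

    def next_token():
        nonlocal rest
        if rest.startswith("`"):
            depth, scan = 1, 1
            while depth and scan < len(rest):
                if rest[scan] == "`":
                    depth += 1
                elif rest[scan] == "'":
                    depth -= 1
                scan += 1
            if depth:
                raise ValueError("end of input inside a quoted string")
            token = ("string", rest[1:scan - 1])  # one quote level stripped
        else:
            match = NAME.match(rest)
            scan = match.end() if match else 1
            token = ("name" if match else "char", rest[:scan])
        rest = rest[scan:]
        return token

    def collect_args():
        nonlocal rest
        next_token()              # consume the opening "("
        args, current, depth = [], "", 1
        rest = rest.lstrip()      # unquoted leading whitespace is skipped
        while rest:
            kind, value = next_token()
            if kind == "char" and value == ")" and depth == 1:
                return args + [current]
            if kind == "char" and value == "," and depth == 1:
                args.append(current)
                current = ""
                rest = rest.lstrip()
                continue
            if kind == "char" and value in "()":
                depth += 1 if value == "(" else -1
            current += expand_token(kind, value)  # nested calls expand first
        raise ValueError("end of input inside an argument list")

    def expand_token(kind, value):
        nonlocal rest
        if kind != "name" or value not in macros:
            return value          # ordinary text is passed along unchanged
        args = collect_args() if rest.startswith("(") else []
        rest = macros[value](args) + rest  # push the expansion back for rescanning
        return ""

    output = ""
    while rest:
        output += expand_token(*next_token())
    return output

def toy_eval(args):
    # Stand-in for eval: handles only "base**exponent".
    base, exponent = args[0].split("**")
    return str(int(base) ** int(exponent))

def toy_format(args):
    # Stand-in for format: handles only integer (%d) conversions.
    return args[0] % tuple(int(arg) for arg in args[1:])

MACROS = {"eval": toy_eval, "format": toy_format}
print(expand("format(`Result is %d', eval(`2**15'))", MACROS))  # Result is 32768
```

The key step is in `expand_token`: the expansion is not appended to the output but prepended to the unread input, so it is split into tokens and examined again, exactly as the text of the running example is.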
This document was generated by System Administrator on September, 23 2007 using texi2html 1.70.