7. Input control

This chapter describes various builtin macros for controlling the input to m4.

7.1 Deleting whitespace in input

The builtin dnl stands for "Discard to Next Line":

Builtin: dnl

All characters, up to and including the next newline, are discarded without performing any macro expansion.

The expansion of dnl is void.

It is often used in connection with define, to remove the newline that follows the call to define. Thus

define(`foo', `Macro `foo'.')dnl A very simple macro, indeed.
foo
⇒Macro foo.

The input up to and including the next newline is discarded, as opposed to the way comments are treated (see section Comments).
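
As a minimal sketch of this difference (the macro foo is redefined here only for illustration), a comment is copied to the output without expansion, whereas dnl discards the rest of its line together with the newline, so the following line of output is joined to the current one:

define(`foo', `bar')dnl
foo # foo is not expanded inside a comment
⇒bar # foo is not expanded inside a comment
foo dnl this text and the newline are discarded
foo
⇒bar bar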

Usually, dnl is immediately followed by an end of line or some other whitespace. GNU m4 will produce a warning diagnostic if dnl is followed by an open parenthesis. In this case, dnl will collect and process all arguments, looking for a matching close parenthesis. All predictable side effects resulting from this collection will take place, but dnl itself will produce no output. The input following the matching close parenthesis, up to and including the next newline on whichever line contains it, will still be discarded.

dnl(`args are ignored, but side effects occur',
define(`foo', `like this')) while this text is ignored: undefine(`foo')
error-->m4:stdin:2: Warning: excess arguments to builtin `dnl' ignored
See how `foo' was defined, foo?
⇒See how foo was defined, like this?

If the end of file is encountered without a newline character, a warning is issued and dnl stops consuming input.

define(`hi', `HI')
⇒
m4wrap(`m4wrap(`2 hi
')0 hi dnl 1 hi')
⇒
^D
error-->m4: Warning: end of file treated as newline
⇒0 HI 2 HI

7.2 Changing the quote characters

The default quote delimiters can be changed with the builtin changequote:

Builtin: changequote ([start = ``'], [end = `''])

This sets start as the new begin-quote delimiter and end as the new end-quote delimiter. If any of the arguments are missing, the default quotes (` and ') are used instead of the void arguments.

The expansion of changequote is void.

changequote(`[', `]')
⇒
define([foo], [Macro [foo].])
⇒
foo
⇒Macro foo.

The quotation strings can safely contain eight-bit characters. If no single character is appropriate, start and end can be of any length.

changequote(`[[[', `]]]')
⇒
define([[[foo]]], [[[Macro [[[[[foo]]]]].]]])
⇒
foo
⇒Macro [[foo]].

Changing the quotes to the empty strings will effectively disable the quoting mechanism, leaving no way to quote text.

define(`foo', `Macro `FOO'.')
⇒
changequote(, )
⇒
foo
⇒Macro `FOO'.
`foo'
⇒`Macro `FOO'.'

There is no way in m4 to quote a string containing an unmatched begin-quote, except using changequote to change the current quotes.

If the quotes should be changed from, say, `[' to `[[', temporary quote characters have to be defined. To achieve this, two calls of changequote must be made, one for the temporary quotes and one for the new quotes.
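
A minimal sketch of the two-step change, assuming `{' and `}' as the temporary quotes (an arbitrary choice for illustration; any pair that clashes with neither the old nor the new delimiters should do):

changequote(`[', `]')
⇒
changequote([{], [}])
⇒
changequote({[[}, {]]})
⇒
define([[foo]], [[Macro [[foo]].]])
⇒
foo
⇒Macro foo.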

Macros are recognized in preference to the begin-quote string, so if a prefix of start can be recognized as a potential macro name, the quoting mechanism is effectively disabled. Unless you use changeword (see section Changing the lexical structure of words), this means that start should not begin with a letter or `_' (underscore).

define(`hi', `HI')
⇒
changequote(`q', `Q')
⇒
q hi Q hi
⇒q HI Q HI
changequote
⇒
changequote(`-', `EOF')
⇒
- hi EOF hi
⇒ hi  HI

Quotes are recognized in preference to argument collection. In particular, if start is a single `(', then argument collection is effectively disabled. For portability with other implementations, it is a good idea to avoid `(', `,', and `)' as the first character in start.

define(`echo', `$#:$@:')
⇒
define(`hi', `HI')
⇒
changequote(`(',`)')
⇒
echo(hi)
⇒0::hi
changequote
⇒
changequote(`((', `))')
⇒
echo(hi)
⇒1:HI:
echo((hi))
⇒0::hi
changequote
⇒
changequote(`,', `)')
⇒
echo(hi,hi)bye)
⇒1:HIhibye:

If end is a prefix of start, the end-quote will be recognized in preference to a nested begin-quote. In particular, changing the quotes to have the same string for start and end disables nesting of quotes. When quote nesting is disabled, it is impossible to double-quote strings across macro expansions, so using the same string is not done very often.

define(`hi', `HI')
⇒
changequote(`""', `"')
⇒
""hi"""hi"
⇒hihi
""hi" ""hi"
⇒hi hi
""hi"" "hi"
⇒hi" "HI"
changequote
⇒
`hi`hi'hi'
⇒hi`hi'hi
changequote(`"', `"')
⇒
"hi"hi"hi"
⇒hiHIhi

It is an error if the end of file occurs within a quoted string.

`dangling quote
^D
error-->m4:stdin:1: ERROR: end of file in string

7.3 Changing comment delimiters

The default comment delimiters can be changed with the builtin macro changecom:

Builtin: changecom ([start], [end])

This sets start as the new begin-comment delimiter and end as the new end-comment delimiter. If only one argument is provided, newline becomes the new end-comment delimiter. The comment delimiters can be of any length. Omitting the first argument, or using the empty string as the first argument, disables comments.

The expansion of changecom is void.

define(`comment', `COMMENT')
⇒
# A normal comment
⇒# A normal comment
changecom(`/*', `*/')
⇒
# Not a comment anymore
⇒# Not a COMMENT anymore
But: /* this is a comment now */ while this is not a comment
⇒But: /* this is a comment now */ while this is not a COMMENT
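
The one-argument form can be sketched in the same way. In this illustration, `//' is an arbitrary choice of begin-comment delimiter and the macro hi is defined only for the example; the comment then runs to the end of the line, and `#' no longer starts a comment:

define(`hi', `HI')
⇒
changecom(`//')
⇒
hi // hi
⇒HI // hi
# hi
⇒# HI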

Note how comments are copied to the output, much as if they were quoted strings. If you want the text inside a comment expanded, quote the begin-comment delimiter.
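
As a small sketch with the default delimiters (the macro comment is defined here only for illustration), quoting the `#' prevents it from starting a comment, so the rest of the line is expanded as usual:

define(`comment', `COMMENT')
⇒
# a comment
⇒# a comment
`#' a comment
⇒# a COMMENT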

Calling changecom without any arguments, or with an empty string for the first argument, disables the commenting mechanism completely. To restore the original comment start of `#', you must explicitly ask for it.

define(`comment', `COMMENT')
⇒
changecom
⇒
# Not a comment anymore
⇒# Not a COMMENT anymore
changecom(`#')
⇒
# comment again
⇒# comment again

The comment strings can safely contain eight-bit characters.

Comments are recognized in preference to macros. However, this is not compatible with other implementations, where macros and even quoting take precedence over comments, so it may change in a future release. For portability, this means that start should not begin with a letter or `_' (underscore), and that neither the start-quote nor the start-comment string should be a prefix of the other.

define(`hi', `HI')
⇒
changecom(`q', `Q')
⇒
q hi Q hi
⇒q hi Q HI

Comments are recognized in preference to argument collection. In particular, if start is a single `(', then argument collection is effectively disabled. For portability with other implementations, it is a good idea to avoid `(', `,', and `)' as the first character in start.

define(`echo', `$#:$@:')
⇒
define(`hi', `HI')
⇒
changecom(`(',`)')
⇒
echo(hi)
⇒0::(hi)
changecom
⇒
changecom(`((', `))')
⇒
echo(hi)
⇒1:HI:
echo((hi))
⇒0::((hi))
changecom(`,', `)')
⇒
echo(hi,hi)bye)
⇒1:HI,hi)bye:

It is an error if the end of file occurs within a comment.

changecom(`/*', `*/')
⇒
/*dangling comment
^D
error-->m4:stdin:1: ERROR: end of file in comment

7.4 Changing the lexical structure of words

The macro changeword and all associated functionality are experimental. They are only available if the `--enable-changeword' option was given to configure at GNU m4 installation time. The functionality will go away in the future, to be replaced by other new features that are more efficient at providing the same capabilities. Do not rely on it. Please report comments about it the same way you would report bugs.

A file being processed by m4 is split into quoted strings, words (potential macro names) and simple tokens (any other single character). Initially a word is defined by the following regular expression:

[_a-zA-Z][_a-zA-Z0-9]*

Using changeword, you can change this regular expression:

Optional builtin: changeword (regex)

Changes the regular expression for recognizing macro names to be regex. If regex is empty, use `[_a-zA-Z][_a-zA-Z0-9]*'. regex must obey the constraint that every prefix of the desired final pattern is also accepted by the regular expression. If regex contains grouping parentheses, the macro invoked is the portion that matched the first group, rather than the entire matching string.

The expansion of changeword is void. The macro changeword is recognized only with parameters.

Relaxing the lexical rules of m4 might be useful (for example) if you wanted to apply translations to a file of numbers:

ifdef(`changeword', `', `errprint(` skipping: no changeword support
')m4exit(`77')')dnl
changeword(`[_a-zA-Z0-9]+')
⇒
define(`1', `0')1
⇒0

Tightening the lexical rules is less useful, because it will generally make some of the builtins unavailable. You could use it to prevent accidental call of builtins, for example:

ifdef(`changeword', `', `errprint(` skipping: no changeword support
')m4exit(`77')')dnl
define(`_indir', defn(`indir'))
⇒
changeword(`_[_a-zA-Z0-9]*')
⇒
esyscmd(`foo')
⇒esyscmd(foo)
_indir(`esyscmd', `echo hi')
⇒hi
⇒

Because m4 constructs its words a character at a time, the regular expressions that may be passed to changeword are restricted: if your regular expression accepts `foo', it must also accept `f' and `fo'.

changeword has another function. If the regular expression supplied contains any grouped subexpressions, then text outside the first of these is discarded before symbol lookup. So:

ifdef(`changeword', `', `errprint(` skipping: no changeword support
')m4exit(`77')')dnl
changecom(`/*', `*/')dnl
define(`foo', `bar')dnl
changeword(`#\([_a-zA-Z0-9]*\)')
⇒
#esyscmd(`echo foo \#foo')
⇒foo bar
⇒

m4 now requires a `#' mark at the beginning of every macro invocation, so one can use m4 to preprocess plain text without losing various words like `divert'.

In m4, macro substitution is based on text, while in TeX, it is based on tokens. changeword can throw this difference into relief. For example, here is the same idea represented in TeX and m4. First, the TeX version:

\def\a{\message{Hello}}
\catcode`\@=0
\catcode`\\=12
@a
@bye
⇒Hello

Then, the m4 version:

ifdef(`changeword', `', `errprint(` skipping: no changeword support
')m4exit(`77')')dnl
define(`a', `errprint(`Hello')')dnl
changeword(`@\([_a-zA-Z0-9]*\)')
⇒
@a
⇒errprint(Hello)

In the TeX example, the first line defines a macro a to print the message `Hello'. The second line defines @ to be usable instead of \ as an escape character. The third line defines \ to be a normal printing character, not an escape. The fourth line invokes the macro a. So, when TeX is run on this file, it displays the message `Hello'.

When the m4 example is passed through m4, it outputs `errprint(Hello)'. The reason for this is that TeX does lexical analysis of a macro definition when the macro is defined, whereas m4 just stores the text, postponing the lexical analysis until the macro is used.

You should note that using changeword will slow m4 down by a factor of about seven, once it is changed to something other than the default regular expression. You can invoke changeword with the empty string to restore the default word definition, and regain the parsing speed.

7.5 Saving input

It is possible to `save' some text until the end of the normal input has been seen; m4 then reads the saved text again once the normal input has been exhausted. This feature is normally used to initiate cleanup actions before normal exit, e.g., deleting temporary files.

To save input text, use the builtin m4wrap:

Builtin: m4wrap ([string], …)

Stores string in a safe place, to be reread when end of input is reached. As a GNU extension, additional arguments are appended to string, each separated from the previous one by a space.

The expansion of m4wrap is void. The macro m4wrap is recognized only with parameters.

define(`cleanup', `This is the `cleanup' action.
')
⇒
m4wrap(`cleanup')
⇒
This is the first and last normal input line.
⇒This is the first and last normal input line.
^D
⇒This is the cleanup action.
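
The GNU extension for additional arguments can be sketched as follows (the saved words are arbitrary and chosen only for illustration); the two arguments are joined with a single space before being saved:

m4wrap(`Saved', `text.
')
⇒
^D
⇒Saved text.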

The saved input is only reread when the end of normal input is seen, and not if m4exit is used to exit m4.
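
A short sketch of this case (the saved text is arbitrary): because the input ends with a call to m4exit rather than with a normal end of file, the saved text should never appear in the output.

m4wrap(`This text is never reread.
')
⇒
m4exit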

It is safe to call m4wrap from saved text, but then the order in which the saved text is reread is undefined. If m4wrap is not used recursively, the saved pieces of text are reread in the opposite order in which they were saved (LIFO--last in, first out). However, this behavior is likely to change in a future release, to match POSIX, so you should not depend on this order.
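
As a sketch of the LIFO order described above, which applies only as long as this release's behavior holds and should not be relied upon, two non-recursive calls are reread with the later one first:

m4wrap(`first
')
⇒
m4wrap(`second
')
⇒
^D
⇒second
⇒first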

Here is an example of implementing a factorial function using m4wrap:

define(`f', `ifelse(`$1', `0', `Answer: 0!=1
', eval(`$1>1'), `0', `Answer: $2$1=eval(`$2$1')
', `m4wrap(`f(decr(`$1'), `$2$1*')')')')
⇒
f(`10')
⇒
^D
⇒Answer: 10*9*8*7*6*5*4*3*2*1=3628800

Invocations of m4wrap at the same recursion level are concatenated and rescanned as usual:

define(`aa', `AA
')
⇒
m4wrap(`a')m4wrap(`a')
⇒
^D
⇒AA

However, the transition between recursion levels behaves like an end of file condition between two input files.

m4wrap(`m4wrap(`)')len(abc')
⇒
^D
error-->m4: ERROR: end of file in argument list

This document was generated by System Administrator on September, 23 2007 using texi2html 1.70.