m4 provides a number of builtins for manipulating text in various ways: extracting substrings, searching, substituting, and so on.
The length of a string can be calculated by len:
Builtin: len (string)
Expands to the length of string, as a decimal number.
The macro len is recognized only with parameters.
    len()
    ⇒0
    len(`abcdef')
    ⇒6
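Because arguments are expanded before len sees them, quoting changes what is measured. The following is a minimal sketch of that difference; the macro name greeting is a hypothetical one chosen for illustration, and the default quote characters are assumed.

```m4
dnl `greeting' is a hypothetical macro, defined only for this sketch.
define(`greeting', `hello world')dnl
dnl Quoted: the length of the macro name itself (8 characters).
len(`greeting')
⇒8
dnl Unquoted: the length of the expansion (11 characters).
len(greeting)
⇒11
```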
Searching for substrings is done with index:
Builtin: index (string, substring)
Expands to the index of the first occurrence of substring in string. The first character in string has index 0. If substring does not occur in string, index expands to `-1'.
The macro index is recognized only with parameters.
    index(`gnus, gnats, and armadillos', `nat')
    ⇒7
    index(`gnus, gnats, and armadillos', `dag')
    ⇒-1
Omitting substring evokes a warning, but still produces output.
    index(`abc')
    error-->m4:stdin:1: Warning: too few arguments to builtin `index'
    ⇒0
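Since index expands to `-1' when there is no match, it can be combined with the ifelse builtin to test whether one string contains another. This is only a sketch; the macro name contains is hypothetical and is not part of m4.

```m4
dnl `contains' is a hypothetical helper: expands to yes or no.
define(`contains', `ifelse(index(`$1', `$2'), `-1', `no', `yes')')dnl
contains(`gnus, gnats, and armadillos', `nat')
⇒yes
contains(`gnus, gnats, and armadillos', `dag')
⇒no
```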
Searching for regular expressions is done with the builtin regexp:
Builtin: regexp (string, regexp, [replacement])
Searches for regexp in string. The syntax for regular expressions is the same as in GNU Emacs. See Syntax of Regular Expressions in the GNU Emacs Manual.
If replacement is omitted, regexp expands to the index of the first match of regexp in string. If regexp does not match anywhere in string, it expands to -1.
If replacement is supplied, and there was a match, regexp changes the expansion to this argument, with `\n' substituted by the text matched by the nth parenthesized sub-expression of regexp, up to nine sub-expressions. The escape `\&' is replaced by the text of the entire regular expression matched. For all other characters, `\' treats the next character literally. A warning is issued if there were fewer sub-expressions than the `\n' requested, or if there is a trailing `\'. If there was no match, regexp expands to the empty string.
The macro regexp is recognized only with parameters.
    regexp(`GNUs not Unix', `\<[a-z]\w+')
    ⇒5
    regexp(`GNUs not Unix', `\<Q\w*')
    ⇒-1
    regexp(`GNUs not Unix', `\w\(\w+\)$', `*** \& *** \1 ***')
    ⇒*** Unix *** nix ***
    regexp(`GNUs not Unix', `\<Q\w*', `*** \& *** \1 ***')
    ⇒
Here are some more examples on the handling of backslash:
    regexp(`abc', `\(b\)', `\\\10\a')
    ⇒\b0a
    regexp(`abc', `b', `\1\')
    error-->m4:stdin:2: Warning: sub-expression 1 not present
    error-->m4:stdin:2: Warning: trailing \ ignored in replacement
    ⇒
    regexp(`abc', `\(\(d\)?\)\(c\)', `\1\2\3\4\5\6')
    error-->m4:stdin:3: Warning: sub-expression 4 not present
    error-->m4:stdin:3: Warning: sub-expression 5 not present
    error-->m4:stdin:3: Warning: sub-expression 6 not present
    ⇒c
Omitting regexp evokes a warning, but still produces output.
    regexp(`abc')
    error-->m4:stdin:1: Warning: too few arguments to builtin `regexp'
    ⇒0
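The two modes of regexp described above can be seen side by side in a small sketch that picks the first two numeric components out of a string. The input string and the labels major and minor are made up for illustration.

```m4
dnl Without a replacement: the index of the first match.
regexp(`release 1.4.10', `[0-9]+\.[0-9]+')
⇒8
dnl With a replacement: \1 and \2 name the parenthesized groups.
regexp(`release 1.4.10', `\([0-9]+\)\.\([0-9]+\)', `major=\1 minor=\2')
⇒major=1 minor=4
```

Note that text outside the match (here `release ' and `.10') does not appear in the expansion when a replacement is given.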
Substrings are extracted with substr:
Builtin: substr (string, from, [length])
Expands to the substring of string, which starts at index from, and extends for length characters, or to the end of string, if length is omitted. The starting index of a string is always 0. The expansion is empty if there is an error parsing from or length, if from is beyond the end of string, or if length is negative.
The macro substr is recognized only with parameters.
    substr(`gnus, gnats, and armadillos', `6')
    ⇒gnats, and armadillos
    substr(`gnus, gnats, and armadillos', `6', `5')
    ⇒gnats
Omitting from evokes a warning, but still produces output.
    substr(`abc')
    error-->m4:stdin:1: Warning: too few arguments to builtin `substr'
    ⇒abc
    substr(`abc',)
    error-->m4:stdin:2: empty string treated as 0 in builtin `substr'
    ⇒abc
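A common use of substr is to split a string around a delimiter located with index. The sketch below assumes a single-character delimiter that is present in the string; the macro names before_delim and after_delim are hypothetical.

```m4
dnl Hypothetical helpers; both assume a one-character delimiter in $2.
dnl Text before the delimiter: from index 0, for index(...) characters.
define(`before_delim', `substr(`$1', `0', index(`$1', `$2'))')dnl
dnl Text after the delimiter: start one past the delimiter's index.
define(`after_delim', `substr(`$1', incr(index(`$1', `$2')))')dnl
before_delim(`key=value', `=')
⇒key
after_delim(`key=value', `=')
⇒value
```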
Character translation is done with translit:
Builtin: translit (string, chars, [replacement])
Expands to string, with each character that occurs in chars translated into the character from replacement with the same index.
If replacement is shorter than chars, the excess characters are deleted from the expansion. If replacement is omitted, all characters in string that are present in chars are deleted from the expansion.
As a GNU extension, both chars and replacement can contain character-ranges, e.g., `a-z' (meaning all lowercase letters) or `0-9' (meaning all digits). To include a dash `-' in chars or replacement, place it first or last.
It is not an error for the last character in the range to be `smaller' than the first. In that case, the range runs backwards, i.e., `9-0' means the string `9876543210'.
The macro translit is recognized only with parameters.
    translit(`GNUs not Unix', `A-Z')
    ⇒s not nix
    translit(`GNUs not Unix', `a-z', `A-Z')
    ⇒GNUS NOT UNIX
    translit(`GNUs not Unix', `A-Z', `z-a')
    ⇒tmfs not fnix
The first example deletes all uppercase letters, the second converts lowercase to uppercase, and the third `mirrors' all uppercase letters, while converting them to lowercase. The first two cases are by far the most common.
Omitting chars evokes a warning, but still produces output.
    translit(`abc')
    error-->m4:stdin:1: Warning: too few arguments to builtin `translit'
    ⇒abc
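The rules for a short replacement, a literal dash, and concatenated ranges can be illustrated together. This is a sketch with made-up inputs, assuming the GNU range extension is available.

```m4
dnl replacement is shorter than chars: `l' maps to `L', `o' is deleted.
translit(`hello', `lo', `L')
⇒heLL
dnl A lone dash in chars is taken literally; with no replacement
dnl given, every dash is deleted.
translit(`2007-09-23', `-')
⇒20070923
dnl Two ranges in replacement (n-z followed by a-m) rotate the
dnl lowercase alphabet by 13 places.
translit(`hello world', `a-z', `n-za-m')
⇒uryyb jbeyq
```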
Global substitution in a string is done by patsubst:
Builtin: patsubst (string, regexp, [replacement])
Searches string for matches of regexp, and substitutes replacement for each match. The syntax for regular expressions is the same as in GNU Emacs (see section Searching for regular expressions).
The parts of string that are not covered by any match of regexp are copied to the expansion. Whenever a match is found, the search proceeds from the end of the match, so a character from string will never be substituted twice. If regexp matches a string of zero length, the start position for the search is incremented, to avoid infinite loops.
When a replacement is to be made, replacement is inserted into the expansion, with `\n' substituted by the text matched by the nth parenthesized sub-expression of regexp, for up to nine sub-expressions. The escape `\&' is replaced by the text of the entire regular expression matched. For all other characters, `\' treats the next character literally. A warning is issued if there were fewer sub-expressions than the `\n' requested, or if there is a trailing `\'.
The replacement argument can be omitted, in which case the text matched by regexp is deleted.
The macro patsubst is recognized only with parameters.
    patsubst(`GNUs not Unix', `^', `OBS: ')
    ⇒OBS: GNUs not Unix
    patsubst(`GNUs not Unix', `\<', `OBS: ')
    ⇒OBS: GNUs OBS: not OBS: Unix
    patsubst(`GNUs not Unix', `\w*', `(\&)')
    ⇒(GNUs)() (not)() (Unix)()
    patsubst(`GNUs not Unix', `\w+', `(\&)')
    ⇒(GNUs) (not) (Unix)
    patsubst(`GNUs not Unix', `[A-Z][a-z]+')
    ⇒GN not 
    patsubst(`GNUs not Unix', `not', `NOT\')
    error-->m4:stdin:6: Warning: trailing \ ignored in replacement
    ⇒GNUs NOT Unix
Here is a slightly more realistic example, which capitalizes individual words or whole sentences, by substituting calls of the macros upcase and downcase into the strings.
Composite: upcase (text)
Composite: downcase (text)
Composite: capitalize (text)
Expand to text, but with capitalization changed: upcase changes all letters to upper case, downcase changes all letters to lower case, and capitalize changes the first character of each word to upper case and the remaining characters to lower case.
    define(`upcase', `translit(`$*', `a-z', `A-Z')')dnl
    define(`downcase', `translit(`$*', `A-Z', `a-z')')dnl
    define(`capitalize1', `regexp(`$1', `^\(\w\)\(\w*\)', `upcase(`\1')`'downcase(`\2')')')dnl
    define(`capitalize', `patsubst(`$1', `\w+', `capitalize1(`\&')')')dnl
    capitalize(`GNUs not Unix')
    ⇒Gnus Not Unix
While regexp replaces the whole input with the replacement as soon as there is a match, patsubst replaces each occurrence of a match and preserves non-matching pieces:
    define(`patreg',
    `patsubst($@)
    regexp($@)')dnl
    patreg(`bar foo baz Foo', `foo\|Foo', `FOO')
    ⇒bar FOO baz FOO
    ⇒FOO
    patreg(`aba abb 121', `\(.\)\(.\)\1', `\2\1\2')
    ⇒bab abb 212
    ⇒bab
Omitting regexp evokes a warning, but still produces output.
    patsubst(`abc')
    error-->m4:stdin:1: Warning: too few arguments to builtin `patsubst'
    ⇒abc
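As a further sketch of sub-expression references in patsubst, the following swaps each pair of adjacent words and collapses runs of spaces. The input strings are made up for illustration.

```m4
dnl \1 and \2 refer to the first and second parenthesized groups;
dnl text between matches (the comma and space) is copied unchanged.
patsubst(`john smith, jane doe', `\(\w+\) \(\w+\)', `\2 \1')
⇒smith john, doe jane
dnl Each run of one or more spaces becomes a single space.
patsubst(`a   b    c', ` +', ` ')
⇒a b c
```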
Formatted output can be made with format:
Builtin: format (format-string, ...)
Works much like the C function printf. The first argument format-string can contain `%' specifications which are satisfied by additional arguments, and the expansion of format is the formatted string.
The macro format is recognized only with parameters.
Its use is best described by a few examples:
    define(`foo', `The brown fox jumped over the lazy dog')
    ⇒
    format(`The string "%s" uses %d characters', foo, len(foo))
    ⇒The string "The brown fox jumped over the lazy dog" uses 38 characters
    format(`%.0f', `56789.9876')
    ⇒56790
    len(format(`%-*X', `300', `1'))
    ⇒300
Using the forloop macro defined in the section Loops and recursion, this example shows how format can be used to produce tabular output.
    include(`forloop.m4')
    ⇒
    forloop(`i', `1', `10', `format(`%6d squared is %10d
    ', i, eval(i**2))')
    ⇒     1 squared is          1
    ⇒     2 squared is          4
    ⇒     3 squared is          9
    ⇒     4 squared is         16
    ⇒     5 squared is         25
    ⇒     6 squared is         36
    ⇒     7 squared is         49
    ⇒     8 squared is         64
    ⇒     9 squared is         81
    ⇒    10 squared is        100
    ⇒
The builtin format is modeled after the ANSI C `printf' function, and supports these `%' specifiers: `c', `s', `d', `o', `x', `X', `u', `e', `E', `f', `F', `g', `G', and `%'; it supports field widths and precisions, and the modifiers `+', `-', ` ', `0', `#', `h' and `l'. For more details on the functioning of printf, see the C Library Manual.
For now, unrecognized specifiers are silently ignored, but it is anticipated that a future release of GNU m4 will support more specifiers, and give warnings when problems are encountered. Likewise, escape sequences are not yet recognized.
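The field widths, precisions and modifiers listed above behave as they do in printf. The following sketch shows a few of them on made-up values; the square brackets are ordinary text, used only to make the padding visible.

```m4
dnl `0' modifier with a field width of 5.
format(`%05d', `42')
⇒00042
dnl Hexadecimal (lower and upper case) and octal conversions.
format(`%x %X %o', `255', `255', `8')
⇒ff FF 10
dnl `-' modifier: left-justify within a field of width 8.
format(`[%-8s]', `abc')
⇒[abc     ]
dnl A precision of 3 digits after the decimal point.
format(`%.3f', `3.14159')
⇒3.142
```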
This document was generated by System Administrator on September 23, 2007 using texi2html 1.70.