29 May 2009 zoem 1.002, 09-149
zoem — macro processor for the Zoem macro/programming language.
zoem [-i <file name>[.azm] (entry file name)] [-I <file name> (entry file name)] [-o <file name> (output file name)] [-d <device> (set device key)]
zoem
(enter interactive mode - happens when none of -i,
-I, -o is given)
zoem -i <file name>[.azm] (entry file name) -I <file name> (entry file name) [-o <file name> (output file name)] [-d <device> (set device key)] [-x (enter interactive mode on error)] [-s <key>[=<val>] (set key to val)] [-e <any> (evaluate any, exit)] [-E <any> (evaluate any, proceed)] [-chunk-size <num> (process chunks of size num)] [--trace (trace mode, default)] [--trace-all-long (long trace mode)] [--trace-all-short (short trace mode)] [--trace-regex (trace regexes)] [-trace k (trace mode, explicit)] [--stats (show symbol table stats after run)] [--split (assume \writeto usage, set \__split__)] [--stress-write (make \write#3 recover)] [--unsafe (prompt for \system#3)] [--unsafe-silent (simply allow \system#3)] [-allow cmd1[:cmdx]+ (allowable commands)] [--system-honor (require \system#3 to succeed)] [-nuser k (user dict stack size)] [-ndollar k (dollar dict stack size)] [-nsegment k (maximum simple nesting depth)] [-nstack k (maximum eval nesting depth)] [-buser (initial user dict capacity)] [-bzoem (initial zoem dict capacity)] [-tl k (tab length)] [-l <str> (list items)] [-h (show options)] [--apropos (show options)]
Zoem is a macro/programming language. It is fully described in the Zoem User Manual, currently available in HTML only. This manual page documents the zoem processor, not the zoem language.
If the input file is specified using the -i option and is a regular file (i.e. not STDIN - which is specified by using a single hyphen), it must have the extension .azm. This extension can but need not be specified. The zoem key \__fnbase__ will be set to the file base name stripped of the .azm extension and any leading path components. If the input file is specified using the -I option, no extension is assumed, and \__fnbase__ is set to the file base name, period. The file base name is the file name with any leading path components stripped away.
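The naming rule above can be modelled in a few lines of Python. This is only a sketch of the behaviour as described here, not zoem's own code; the function name fnbase and the dash_i flag are hypothetical.

```python
import posixpath

def fnbase(path, dash_i=True):
    # Hypothetical helper modelling how \__fnbase__ is derived.
    # Leading path components are always stripped.
    base = posixpath.basename(path)
    # With -i the .azm extension is implied and stripped when present;
    # with -I the base name is used exactly as given.
    if dash_i and base.endswith(".azm"):
        base = base[:-len(".azm")]
    return base

print(fnbase("doc/manual.azm"))                 # manual
print(fnbase("doc/manual"))                     # manual
print(fnbase("doc/manual.txt", dash_i=False))   # manual.txt
```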
If neither -i nor -o is specified, zoem enters interactive mode. Zoem should fully recover from any error it encounters in the input. If you find an exception to this rule, consider filing a bug report. In interactive mode, zoem starts interpreting once it encounters a line containing a single dot. Zoem's input behaviour can be modified by setting the key \__parmode__. See the section SESSION MACROS for the details. In interactive mode, zoem does not preprocess the interactive input, implying that it does not accept inline files and it does not recognize comments. Both types of sequence will generate syntax errors.
From within the entry file and included files it is possible to open and write to arbitrary files using the \write#3 primitive. Arbitrary files can be read in various modes using the \dofile#2 macro (providing four different modes with respect to file existence and output), \finsert#1, and \zinsert#1. Zoem will write the default output to a single file, the name of which is either specified by the -o option, or constructed as described below. Zoem can split the default output among multiple files. This is governed from within the input files by issuing \writeto#1 calls. Refer to the --split option and the Zoem User Manual.
If none of -i or -o is given, then zoem will enter interactive mode. In this mode, zoem interprets by default chunks of text that are ended by a single dot on a line of its own. This can be useful for testing or debugging. In interactive mode, zoem should recover from any failure it encounters. Interactive mode can also be accessed from within a file by issuing \zinsert{stdia}, and it can be triggered as the mode to enter should an error occur (by adding the -x option to the command line).
If -o is given and -i is not, zoem reads input from STDIN.
If -i is given and -o is not, zoem will construct an output file name as follows. If the -d option was used with argument <dev>, zoem will write to the file which results from expanding \__fnbase__.<dev>. Otherwise, zoem writes to (the expansion of) \__fnbase__.ozm.
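As a rough illustration of the rule just stated, the default output name can be computed as follows. The helper output_name is hypothetical and only mirrors the description; zoem itself obtains the name by expanding \__fnbase__.

```python
def output_name(fnbase, device=None):
    # Hypothetical helper: with -d <dev> the output goes to <fnbase>.<dev>,
    # otherwise to <fnbase>.ozm.
    suffix = device if device is not None else "ozm"
    return f"{fnbase}.{suffix}"

print(output_name("manual", "html"))   # manual.html
print(output_name("manual"))           # manual.ozm
```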
For -i and -o, the argument - is interpreted as stdin and stdout, respectively.
-i <file name>[.azm] (entry file name)
Specify the entry file name. The file must have the .azm extension, but it need not be specified.
-I <file name> (entry file name)
Specify the entry file name, without restrictions on the file name.
-o <file name> (output file name)
Specify the output file name.
-d <device> (set device key)
Set the key \__device__ to <device>.
-x (enter interactive mode on error)
The afterlife option. If zoem encounters an error during regular processing, it will emit error messages as usual, and then enter interactive mode. This allows you e.g. to inspect the values of keys used or defined within the problematic area.
-s <key>[=<val>] (set key to val)
Set the key \key to val if present, 1 otherwise. Any type of key can be set, including keys taking arguments and keys surrounded in quotes. Beware of the shell's quote and backslash interpolation. Currently val is not evaluated, so appending or prepending to a key is not possible.
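The key/value convention of this option can be sketched as below. The function parse_set is hypothetical, and splitting at the first = sign is an assumption; the text above only states that the value defaults to 1 when no val is given.

```python
def parse_set(arg):
    # Hypothetical parser for an argument of the form <key>[=<val>].
    # Assumption: the argument is split at the first "=".
    key, sep, val = arg.partition("=")
    # No "=" present: the key is set to 1.
    return key, (val if sep else "1")

print(parse_set("draft"))         # ('draft', '1')
print(parse_set("title=My doc"))  # ('title', 'My doc')
```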
-e <any> (evaluate any, exit)
This causes zoem to evaluate <any>, write any result text to stdout, and exit.
-E <any> (evaluate any, proceed)
This causes zoem to evaluate <any>, write any result text to stdout, and proceed e.g. with the entry file or an interactive session.
-chunk-size <num> (process chunks of size num)
Zoem reads its input in chunks. It fully processes a chunk before moving on with the next one. This option defines the (minimum) size of the chunks. The size or count of the chunks does not at all affect zoem's output. The default minimum chunk size equals one megabyte.
Zoem will read files in their entirety before further processing if -chunk-size 0 is specified.
Zoem does not chunk input files arbitrarily. It will append to a chunk until it is in the outermost scope (not contained within any block) and the chunk will end with a line that was fully read.
Consequently, if e.g. a file contains a block (delimited by balanced curlies) spanning the entire file then zoem is forced to read it in its entirety.
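The chunking rule can be pictured with the following sketch. It is a simplified model and not zoem's reader: the function chunks is hypothetical, curly depth is tracked by naive counting (escaped curlies such as \{ are not treated specially), and size is measured in characters.

```python
def chunks(lines, min_size):
    # Hypothetical model: keep appending whole lines until the minimum
    # size is reached AND we are back in the outermost scope.
    buf, size, depth = [], 0, 0
    for line in lines:
        buf.append(line)
        size += len(line)
        # Naive scope tracking (assumption: no escaped curlies).
        depth += line.count("{") - line.count("}")
        if min_size > 0 and size >= min_size and depth == 0:
            yield "".join(buf)
            buf, size = [], 0
    if buf:
        # Remainder, or the whole input when min_size is 0.
        yield "".join(buf)

text = ["a\n", "\\group{\n", "b\n", "}\n", "c\n"]
print(len(list(chunks(text, 1))))   # 3
print(len(list(chunks(text, 0))))   # 1
```

In this model a block that spans the whole input keeps the depth above zero until the last line, so the input comes out as a single chunk whatever the minimum size.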
--trace (trace mode, default)
Trace in default mode.
--trace-all-long (long trace mode)
Turns on most trace options in long mode. Trace options xxx not set have their own --trace-xxx entry (see below).
--trace-all-short (short trace mode)
Turns on most trace options in short mode. Trace options xxx not set have their own --trace-xxx entry (see below).
--trace-keys
Trace keys.
--trace-regex (trace regexes)
Trace regexes (i.e. the \inspect#4 primitive).
-trace k (trace mode, explicit)
Set trace options by adding their representing bits.
--stress-write (make \write#3 recover)
This makes \write#3 recover from errors. It is a special purpose option used for creating zoem stress test suites, such as stress.azm in the zoem distribution /examples subdirectory.
--unsafe (prompt for \system#3), --unsafe-silent (simply allow \system#3)
With --unsafe system calls are allowed but the user is prompted for each invocation. The command and its arguments (if any) are shown, but the STDIN information (if any) is withheld. With --unsafe-silent system calls are allowed and the user is never prompted.
-allow cmd1[:cmdx]+ (allowable commands)
Use -allow str or --allow=str to specify a list of allowable commands, as a string in which the commands are separated by colons.
--system-honor (require \system#3 to succeed)
With this option any \system#3 failure (for whatever reason, including safe behaviour) is regarded as a zoem failure. By default, failing system calls are ignored under either safe mode, unsafe mode (--unsafe), or silent unsafe mode (--unsafe-silent).
--split (assume \writeto usage, set \__split__)
This assumes zoem input that allows output to multiple files (e.g. chapters). It sets the default output stream to stdout (anticipating custom output redirection with \writeto#1) and sets the session macro \__split__ to 1.
--stats (show symbol table stats after run)
Show symbol table characteristics. Symbol tables are maintained as hashes.
-tl k (tab length)
Set the tab length. HTML output can be indented according to nesting structure, using tabs which are expanded to simple spaces. By default, the tab length is zero, meaning that no indent is shown. The maximum value the tab length can be set to is four.
-nuser k (user dict stack size), -ndollar k (dollar dict stack size), -nsegment k (maximum simple nesting depth), -nstack k (maximum eval nesting depth), -buser (initial user dict capacity), -bzoem (initial zoem dict capacity)
Probably needed only if you have some obscure and extreme use for zoem. The segment limit applies to simple nesting of macros. The stack limit applies to nesting of macros that evaluate an argument before use. Each such evaluation creates a new stack. The user limit applies to \push{user}, the dollar limit applies to \push{dollar}. The user dict capacity pertains to the initial number of buckets allocated for user and dollar dictionaries, and the zoem dict capacity pertains to the dictionary containing the zoem primitives.
-l <str> (list items)
List items identified by <str>. It can be any of all, filter, legend, builtin, session, trace, or zoem. Multiple identifiers can be joined in a string, e.g. -l legendzoem prints a legend followed by a listing of zoem primitives.
-h (show options)
Show short synopsis of options.
--apropos (show options)
Show one-line synopsis of all options.
1    chomp newlines (remove the newline character)
2    skip empty newlines
4    read paragraphs (an empty line triggers input read)
8    newlines can be escaped using a backslash
16   read large paragraphs (a single dot on a line
     triggers input read)
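The values listed above are powers of two, which suggests that they are combined by adding them, as is done for the trace bits. Under that assumption, and assuming the list belongs to the \__parmode__ key referred to in the description of interactive mode, a value can be decoded as in this sketch. The names PARMODE_BITS and describe_parmode are hypothetical.

```python
# Bit values and meanings copied from the list above.
PARMODE_BITS = {
    1: "chomp newlines",
    2: "skip empty newlines",
    4: "read paragraphs",
    8: "newlines can be escaped using a backslash",
    16: "read large paragraphs",
}

def describe_parmode(value):
    # Assumption: modes are combined by adding their bit values.
    return [name for bit, name in PARMODE_BITS.items() if value & bit]

print(describe_parmode(5))   # ['chomp newlines', 'read paragraphs']
```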
The current output device, set by the command line option -d. The man and faq packages support html and roff as its values.
The base name of the input file name. Leading path components are stripped away. If the -i option is used the input file is required to have the .azm suffix. In that case the suffix is also stripped to obtain the base name.
The name of the entry file.
The file currently being processed.
The name of the default output file.
The leading component of the input file name, possibly empty.
The file that included the current file, if applicable.
This key is set by \write#3 to its first argument. It can be used by macros that are expanded during evaluation of the third argument. Possible usage is to branch on the name of the write output stream. For example a file called index.html may be treated differently from other files. The key is deleted afterwards. Nested invocations of \write#3 may corrupt the status of this key.
The line number in the file currently being processed. This number will be correct for any invocation outside the scope of a macro. For any invocation within a macro the result will be the line number of the closing curly of the outermost containing macro. The following
   \__line__
   \__line__
   \__line__
   \group{
   \__line__
   \group{\__line__}
   \__line__}
results in
1 2 3 7 7 7
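The numbers in this example can be reproduced with a small model of the rule. It is a sketch only: the function reported_lines is hypothetical, and it equates "inside a macro" with "inside curlies", found by plain curly counting.

```python
def reported_lines(text, token="\\__line__"):
    result = []    # reported line number per occurrence, in input order
    pending = []   # occurrences still waiting for the outermost closing curly
    depth, line, i = 0, 1, 0
    while i < len(text):
        if text.startswith(token, i):
            result.append(line)
            if depth > 0:
                pending.append(len(result) - 1)
            i += len(token)
            continue
        ch = text[i]
        if ch == "\n":
            line += 1
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Every occurrence inside the block reports this line.
                for k in pending:
                    result[k] = line
                pending.clear()
        i += 1
    return result

example = "\n".join([
    r"\__line__",
    r"\__line__",
    r"\__line__",
    r"\group{",
    r"\__line__",
    r"\group{\__line__}",
    r"\__line__}",
])
print(reported_lines(example))   # [1, 2, 3, 7, 7, 7]
```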
A vararg containing a list of paths to search when a file is to be included/imported/read/loaded. When you start zoem, this key should contain the location of the man.zmm and faq.zmm package files. It is advisable not to overwrite this key but to append to it instead.
Set to one of ok, towel (that one is a bit lame), or error by either the interpreter, an occurrence of \catch#2, or \try#1.
Set by \try#1 to the possibly truncated result of processing its argument.
Expands to a left curly. It is hard to find a need for this — the zoem stress suite uses it to generate a particular syntax error at a deeper interpretation level.
Expands to a right curly.
The \inspect#4 primitive takes four arguments. The languages accepted by the first two arguments are described below. The third argument is a replacement string or a replacement macro accepting back-references (supplied as an anonymous macro). The fourth argument is the data to be processed.
arg 1
Is a vararg. Currently it accepts a single key mods for which the value should be a comma-separated list over the words posix, icase, dotall, iter-lines, iter-args, match-once, discard-nmp, discard-nil-out, discard-miss, count-matches. Alternatively repeated use of mods is allowed.
arg 2
Is a regular expression. Tilde patterns are expanded according to all of the ZOEM, UNIX, and REGEX schemes. Refer to TILDE EXPANSION for these.
The third argument is a constant string or an anonymous key, the fourth argument is data.
The \tr#2 primitive takes two arguments. The first argument contains key-value pairs. The accepted keys are from and to which must always occur together, and delete and squash. The values of these keys must be valid translation specifications. This primitive transforms the data in the second argument by successively applying translation, deletion and squashing in that order. Only the transformations that are needed need be specified.
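The fixed order of the three transformations can be illustrated with a simplified model. This sketch is not \tr#2 itself: the function tr is hypothetical and takes plain character strings (from and to of equal length) rather than zoem translation specifications, so classes, ranges and repeats are not handled.

```python
def tr(data, frm="", to="", delete="", squash=""):
    # 1. Translation: map each character of frm to its counterpart in to.
    if frm:
        data = data.translate(str.maketrans(frm, to))
    # 2. Deletion: drop every character listed in delete.
    if delete:
        data = data.translate({ord(c): None for c in delete})
    # 3. Squashing: collapse runs of a character listed in squash.
    if squash:
        out = []
        for ch in data:
            if ch in squash and out and out[-1] == ch:
                continue
            out.append(ch)
        data = "".join(out)
    return data

print(tr("hello  world", frm="lo", to="01", delete="h", squash=" "))
# e001 w1r0d
```

Because deletion and squashing run after translation, they act on the translated characters, not on the original ones.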
Translation specifications are subjected to UNIX tilde expansion as described below.
The syntax accepted by translation specifications is almost fully compliant with the syntax accepted by tr(1), with three exceptions. First, repeats are introduced as [*a*20] rather than [a*20]. Second, ranges can (for now) only be entered as X-Y, not as [X-Y]. X and Y can be entered in either octal or hexadecimal notation (see further below). As an additional feature, the magic repeat operator [*a#] stops on both class and range boundaries. Character specifications can be complemented by preceding them with the caret ^.
Specifications may contain ranges of characters such as a-z and 0-9. Posix character classes are allowed. The available classes are
[:alnum:] [:alpha:] [:cntrl:] [:digit:] [:graph:] [:lower:] [:print:] [:punct:] [:space:] [:upper:] [:xdigit:]
Characters can be specified using octal notation, e.g. \012 encodes the newline. Use \173 for the opening curly, \175 for the closing curly, \134 for the backslash, and \136 for the caret if it is the first character in a specification. DON'T use \\, \{, or \} in this case! Hexadecimal notation is written as \x7b (the left curly in this instance).
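To check which character a given octal or hexadecimal code stands for, a decoder along these lines can help. It is a sketch, not zoem's parser: the function decode is hypothetical, and it assumes exactly three octal digits after the backslash or exactly two hexadecimal digits after \x.

```python
import re

# Assumption: \x followed by two hex digits, or a backslash followed by
# three octal digits.
_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})|\\([0-7]{3})")

def decode(spec):
    def repl(m):
        if m.group(1) is not None:
            return chr(int(m.group(1), 16))   # hexadecimal notation
        return chr(int(m.group(2), 8))        # octal notation
    return _ESCAPE.sub(repl, spec)

print(repr(decode(r"\012")))   # '\n'
print(decode(r"\x7b\175"))     # {}
```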
See EXAMPLES for an example of tr#2 usage.
Some primitives interface with UNIX libraries that require backslash escape sequences to encode certain tokens or characters. The backslash is special in zoem too and without further measures it can become very cumbersome to encode the correct escape sequences as it is not always clear which tokens should be escaped or unprotected at what point. It is especially difficult to handle the zoem characters with special meaning, {, } and \.
The two primitives under consideration are \inspect#4 and \tr#2. Both treat the tilde as an additional escape character for certain arguments (as documented in the user manual). These arguments are subjected to tilde expansion, where the tilde and the character it precedes are translated to a new character or character sequence. There are three different sets of tilde escapes, ZOEM, UNIX and REGEX escapes. \tr#2 only accepts UNIX escapes, \inspect#4 accepts all. Tilde expansion is always the last processing step before strings are passed on to external libraries.
The ZOEM scheme contains some convenience escapes, such as ~E to encode a double backslash.
ZOEM tilde expansion
 meta sequence   replacement
.-----------------------------.
|     ~~       |      ~       |
|     ~E       |      \\      |
|     ~e       |      \       |
|     ~I       |      \{      |
|     ~J       |      \}      |
|     ~x       |      \x      |
|     ~i       |      {       |
|     ~j       |      }       |
`-----------------------------'
The zoem tr specification language accepts \x** as hexadecimal notation, e.g. \x0a denotes a newline in the ASCII character set.
UNIX tilde expansion
 meta sequence   replacement
.-----------------------------.
|     ~a       |      \a      |
|     ~b       |      \b      |
|     ~f       |      \f      |
|     ~n       |      \n      |
|     ~r       |      \r      |
|     ~t       |      \t      |
|     ~v       |      \v      |
|     ~0       |      \0      |
|     ~1       |      \1      |
|     ~2       |      \2      |
|     ~3       |      \3      |
`-----------------------------'
REGEX tilde expansion
 meta sequence   replacement
.-----------------------------.
|     ~^       |      \^      |
|     ~.       |      \.      |
|     ~[       |      \[      |
|     ~$       |      \$      |
|     ~(       |      \(      |
|     ~)       |      \)      |
|     ~|       |      \|      |
|     ~*       |      \*      |
|     ~+       |      \+      |
|     ~?       |      \?      |
`-----------------------------'
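The three tables above can be read as lookup tables keyed on the character after the tilde. The following sketch applies them to a string. The names ZOEM, UNIX, REGEX and tilde_expand are hypothetical, and leaving an unlisted tilde sequence unchanged is an assumption, since its treatment is not described here.

```python
# Replacements copied from the three tables; keys are the character
# that follows the tilde.
ZOEM = {"~": "~", "E": "\\\\", "e": "\\", "I": "\\{", "J": "\\}",
        "x": "\\x", "i": "{", "j": "}"}
UNIX = {c: "\\" + c for c in "abfnrtv0123"}
REGEX = {c: "\\" + c for c in "^.[$()|*+?"}

def tilde_expand(s, *schemes):
    table = {}
    for scheme in schemes:
        table.update(scheme)
    out, i = [], 0
    while i < len(s):
        if s[i] == "~" and i + 1 < len(s) and s[i + 1] in table:
            out.append(table[s[i + 1]])
            i += 2
        else:
            # Assumption: anything not listed is copied unchanged.
            out.append(s[i])
            i += 1
    return "".join(out)

# \inspect#4 accepts all three schemes, \tr#2 only the UNIX scheme.
print(tilde_expand("(.*~.png)", ZOEM, UNIX, REGEX))   # (.*\.png)
print(tilde_expand("a~nb", UNIX))                     # a\nb
```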
The environment variable ZOEMSEARCHPATH may contain a colon and/or whitespace separated list of paths. It will be used when searching for files included via one of the dofile aliases \input, \import, \read, and \load. Note that the zoem macro \__searchpath__ contains the location where the zoem macro files were copied at the time of installation of zoem.
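Splitting such a value on colons and/or whitespace can be sketched as follows. The function search_paths is hypothetical and only shows the separator rule; how zoem combines the result with \__searchpath__ is not modelled.

```python
import os
import re

def search_paths(environ=os.environ):
    raw = environ.get("ZOEMSEARCHPATH", "")
    # Paths are separated by colons and/or whitespace; drop empty pieces.
    return [p for p in re.split(r"[:\s]+", raw) if p]

env = {"ZOEMSEARCHPATH": "/usr/share/zoem:/opt/zmm  ./macros"}
print(search_paths(env))   # ['/usr/share/zoem', '/opt/zmm', './macros']
```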
On error, Zoem prints a file name and a line number to which it was able to trace the error. The number reported is the same as the one stored in the session macro \__line__. For an error-triggering macro which is not nested within another macro the line number should be correct. For a macro that does occur nested within another macro the line number will be the line number of the closing curly in the outermost containing macro.
If in despair, use one of the tracing modes; --trace-keys is one of the first to come to mind. Another possibility is to supply the -x option.
No known bugs. \inspect#4 has not received thorough stress-testing, and the more esoteric parts of its interface will probably change.
Portable Unix Documentation provides two mini-languages for authoring in the unix environment. These languages, pud-man and pud-faq, are both written in zoem.
This is a relatively new section, aimed at assembling useful or explanatory snippets.
Create a vararg containing file names matching a pattern (png in this example).
\setx{images}{
   \inspect{
      {mods}{iter-lines,discard-miss}
   }{(.*~.png)}{_#1{{\1}}}{\system{ls}}
}
Use magic boundary stops with tr#2.
\tr{
   {from}{[:lower:][:upper:][:digit:][:space:][:punct:]}
   {to}{[*L#][*U#][*D#][*S#][*P#]}}{
 !"#$%&'()*+,-./0123456789:;<=>?@
ABCDEFGHIJKLMNOPQRSTUVWXYZ
[\\]^_`
abcdefghijklmnopqrstuvwxyz
\{|\}~]}
Stijn van Dongen.