Character sets are the fundamental elements in a regular expression. A character set is a pattern that matches a single character. The syntax of character sets is as follows:
set  := set ['#' set0]
set0 := @char [ '-' @char ]
      | '.'
      | @smac
      | '[' [^] { set } ']'
      | '~' set0

The various character set constructions are:
char
The simplest character set is a single character. Special characters such as [ and . must be escaped by prefixing them with \ (see Section 3.1, “Lexical syntax”, for the list of special characters).
Certain non-printable characters have special escape sequences: \a, \b, \f, \n, \r, \t, and \v. Other characters can be written using their numerical character values, although this may be non-portable; for example, \x0A is equivalent to \n.
Whitespace characters are ignored; to represent a literal space, escape it by writing a backslash followed by the space.
char-char
A range of characters is written by separating two characters with ‘-’; all the characters with codes in the given range are included in the set. Character ranges can also be non-portable.
.
The built-in set ‘.’ matches all characters except newline (\n). Equivalent to the set [\x00-\xff] # \n.
set0 # set1
Matches all the characters in set0 that are not in set1.
[sets]
The union of sets.
[^sets]
The complement of the union of the sets. Equivalent to ‘. # [sets]’.
~set
The complement of set. Equivalent to ‘. # set’.
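The constructions above can be read as ordinary operations on sets of characters. The following Haskell sketch models a set as a membership predicate and restates each construction in those terms. The names CharSet, single, range, dot, minus, union and complement are invented for illustration and are not part of the tool; the definition of dot simply assumes the equivalence [\x00-\xff] # \n stated above.

```haskell
module Main (main) where

-- A character set modelled as a membership predicate. This is an
-- illustrative encoding only, not the tool's internal representation.
type CharSet = Char -> Bool

-- A single character, such as the set written  a
single :: Char -> CharSet
single c = (== c)

-- A range such as  a-z : every character whose code lies between the bounds.
range :: Char -> Char -> CharSet
range lo hi c = lo <= c && c <= hi

-- set0 # set1 : characters in the first set that are not in the second.
minus :: CharSet -> CharSet -> CharSet
minus s t c = s c && not (t c)

-- The built-in set  .  assumed here to be  [\x00-\xff] # \n
dot :: CharSet
dot = range '\x00' '\xff' `minus` single '\n'

-- [sets] : the union of the listed sets.
union :: [CharSet] -> CharSet
union sets c = any ($ c) sets

-- ~set and [^sets] : complement, taken relative to  .
complement :: CharSet -> CharSet
complement s = dot `minus` s

main :: IO ()
main = do
  let notLower = complement (range 'a' 'z')   -- models  ~a-z
  print (notLower 'A')    -- True
  print (notLower 'q')    -- False
  print (notLower '\n')   -- False: newline is not in  .  to begin with
```

Under this reading, a complemented set never matches a newline, because the complement is taken relative to ‘.’ rather than to every possible character.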
A set macro is written as $ followed by an identifier. There are some built-in character set macros:
$white
Matches all whitespace characters, including newline. Equivalent to the set [\ \t\n\f\v\r].
$printable
Matches all printable characters (characters 32 to 126 in ASCII). Equivalent to the set [\32-\126].
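As a quick check of the numeric codes involved, the sketch below enumerates characters 32 to 126 in Haskell, whose character literals happen to accept the same decimal (\32) and hexadecimal (\x0A) escape forms. The name printable is chosen here for illustration; it only mirrors the description of $printable above and is not how the macro is implemented.

```haskell
module Main (main) where

import Data.Char (ord)

-- Characters 32 to 126, the range the text gives for $printable.
-- The name `printable` is illustrative only.
printable :: [Char]
printable = ['\32' .. '\126']

main :: IO ()
main = do
  print (length printable)                               -- 95
  print (map ord [minimum printable, maximum printable]) -- [32,126]
  print (' ' `elem` printable, '\n' `elem` printable)    -- (True,False)
  print ('\x0A' == '\n')                                 -- True
```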
Character set macros can be defined at the top of the file at the same time as regular expression macros (see Chapter 4, Regular Expression). Here are some example character set macros:
$lls      = a-z                   -- little letters
$not_lls  = ~a-z                  -- anything but little letters
$ls_ds    = [a-zA-Z0-9]           -- letters and digits
$sym      = [ \! \@ \# \$ ]       -- the symbols !, @, #, and $
$sym_q_nl = [ \' \! \@ \# \$ \n ] -- the above symbols with ' and newline
$quotable = $printable # \'       -- any graphic character except '
$del      = \127                  -- ASCII DEL
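To make the contents of these example macros concrete, here is a rough Haskell rendering in which each set is spelled out as a list of characters. The Haskell names (lls, lsDs, sym, symQNl, quotable, del) are hypothetical stand-ins for the macros, and printable assumes the 32 to 126 range given for $printable.

```haskell
module Main (main) where

import Data.List ((\\))

-- Assumed contents of $printable: characters 32 to 126.
printable :: [Char]
printable = ['\32' .. '\126']

-- Each list spells out the characters of the corresponding example macro.
lls, lsDs, sym, symQNl, quotable :: [Char]
lls      = ['a' .. 'z']                                  -- models $lls
lsDs     = ['a' .. 'z'] ++ ['A' .. 'Z'] ++ ['0' .. '9']  -- models $ls_ds
sym      = "!@#$"                                        -- models $sym
symQNl   = "'!@#$\n"                                     -- models $sym_q_nl
quotable = printable \\ "'"                              -- models $quotable

-- Models $del, character code 127.
del :: Char
del = '\127'

main :: IO ()
main = do
  print (length lls, length lsDs)   -- (26,62)
  print (length quotable)           -- 94
  print ('\'' `elem` quotable)      -- False
  print (del `elem` printable)      -- False: 127 lies outside 32 to 126
```

Note that $del is not covered by $printable, since its code falls just past the end of the 32 to 126 range.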