std/pegs

Search:
Source   Edit  

Simple PEG (Parsing expression grammar) matching. Uses no memorization, but uses superoperators and symbol inlining to improve performance. Note: Matching performance is hopefully competitive with optimized regular expression engines.

PEG syntax and semantics

A PEG (Parsing expression grammar) is a simple deterministic grammar, that can be directly used for parsing. The current implementation has been designed as a more powerful replacement for regular expressions. UTF-8 is supported.

The notation used for a PEG is similar to that of EBNF:

notationmeaning
A / ... / ZOrdered choice: Apply expressions A, ..., Z, in this order, to the text ahead, until one of them succeeds and possibly consumes some text. Indicate success if one of expressions succeeded. Otherwise, do not consume any text and indicate failure.
A ... ZSequence: Apply expressions A, ..., Z, in this order, to consume consecutive portions of the text ahead, as long as they succeed. Indicate success if all succeeded. Otherwise, do not consume any text and indicate failure. The sequence's precedence is higher than that of ordered choice: A B / C means (A B) / Z and not A (B / Z).
(E)Grouping: Parenthesis can be used to change operator priority.
{E}Capture: Apply expression E and store the substring that matched E into a capture that can be accessed after the matching process.
{}Empty capture: Delete the last capture. No character is consumed.
$iBack reference to the ith capture. i counts forwards from 1 or backwards (last capture to first) from ^1.
$Anchor: Matches at the end of the input. No character is consumed. Same as !..
^Anchor: Matches at the start of the input. No character is consumed.
&EAnd predicate: Indicate success if expression E matches the text ahead; otherwise indicate failure. Do not consume any text.
!ENot predicate: Indicate failure if expression E matches the text ahead; otherwise indicate success. Do not consume any text.
E+One or more: Apply expression E repeatedly to match the text ahead, as long as it succeeds. Consume the matched text (if any) and indicate success if there was at least one match. Otherwise, indicate failure.
E*Zero or more: Apply expression E repeatedly to match the text ahead, as long as it succeeds. Consume the matched text (if any). Always indicate success.
E?Zero or one: If expression E matches the text ahead, consume it. Always indicate success.
[s]Character class: If the character ahead appears in the string s, consume it and indicate success. Otherwise, indicate failure.
[a-b]Character range: If the character ahead is one from the range a through b, consume it and indicate success. Otherwise, indicate failure.
's'String: If the text ahead is the string s, consume it and indicate success. Otherwise, indicate failure.
i's'String match ignoring case.
y's'String match ignoring style.
v's'Verbatim string match: Use this to override a global \i or \y modifier.
i$jString match ignoring case for back reference.
y$jString match ignoring style for back reference.
v$jVerbatim string match for back reference.
.Any character: If there is a character ahead, consume it and indicate success. Otherwise, (that is, at the end of input) indicate failure.
_Any Unicode character: If there is a UTF-8 character ahead, consume it and indicate success. Otherwise, indicate failure.
@ESearch: Shorthand for (!E .)* E. (Search loop for the pattern E.)
{@} ECaptured Search: Shorthand for {(!E .)*} E. (Search loop for the pattern E.) Everything until and excluding E is captured.
@@ ESame as {@} E.
A <- ERule: Bind the expression E to the nonterminal symbol A. Left recursive rules are not possible and crash the matching engine.
\identifierBuilt-in macro for a longer expression.
\dddCharacter with decimal code ddd.
\", etc.Literal ", etc.

Built-in macros

macromeaning
\dany decimal digit: [0-9]
\Dany character that is not a decimal digit: [^0-9]
\sany whitespace character: [ \9-\13]
\Sany character that is not a whitespace character: [^ \9-\13]
\wany "word" character: [a-zA-Z0-9_]
\Wany "non-word" character: [^a-zA-Z0-9_]
\asame as [a-zA-Z]
\Asame as [^a-zA-Z]
\nany newline combination: \10 / \13\10 / \13
\iignore case for matching; use this at the start of the PEG
\yignore style for matching; use this at the start of the PEG
\skip patskip pattern pat before trying to match other tokens; this is useful for whitespace skipping, for example: \skip(\s*) {\ident} ':' {\ident} matches key value pairs ignoring whitespace around the ':'.
\identa standard ASCII identifier: [a-zA-Z_][a-zA-Z_0-9]*
\letterany Unicode letter
\upperany Unicode uppercase letter
\lowerany Unicode lowercase letter
\titleany Unicode title letter
\whiteany Unicode whitespace character

A backslash followed by a letter is a built-in macro, otherwise it is used for ordinary escaping:

notationmeaning
\\a single backslash
\*same as '*'
\tnot a tabulator, but an (unknown) built-in

Supported PEG grammar

The PEG parser implements this grammar (written in PEG syntax):

# Example grammar of PEG in PEG syntax.
# Comments start with '#'.
# First symbol is the start symbol.

grammar <- rule* / expr

identifier <- [A-Za-z][A-Za-z0-9_]*
charsetchar <- "\\" . / [^\]]
charset <- "[" "^"? (charsetchar ("-" charsetchar)?)+ "]"
stringlit <- identifier? ("\"" ("\\" . / [^"])* "\"" /
                          "'" ("\\" . / [^'])* "'")
builtin <- "\\" identifier / [^\13\10]

comment <- '#' @ \n
ig <- (\s / comment)* # things to ignore

rule <- identifier \s* "<-" expr ig
identNoArrow <- identifier !(\s* "<-")
prefixOpr <- ig '&' / ig '!' / ig '@' / ig '{@}' / ig '@@'
literal <- ig identifier? '$' '^'? [0-9]+ / '$' / '^' /
           ig identNoArrow /
           ig charset /
           ig stringlit /
           ig builtin /
           ig '.' /
           ig '_' /
           (ig "(" expr ig ")") /
           (ig "{" expr? ig "}")
postfixOpr <- ig '?' / ig '*' / ig '+'
primary <- prefixOpr* (literal postfixOpr*)

# Concatenation has higher priority than choice:
# ``a b / c`` means ``(a b) / c``

seqExpr <- primary+
expr <- seqExpr (ig "/" expr)*

Note: As a special syntactic extension if the whole PEG is only a single expression, identifiers are not interpreted as non-terminals, but are interpreted as verbatim string:

abc =~ peg"abc" # is true

So it is not necessary to write peg" 'abc' " in the above example.

Examples

Check if s matches Nim's "while" keyword:

s =~ peg" y'while'"

Exchange (key, val)-pairs:

"key: val; key2: val2".replacef(peg"{\ident} \s* ':' \s* {\ident}", "$2: $1")

Determine the #include'ed files of a C file:

for line in lines("myfile.c"):
  if line =~ peg"""s <- ws '#include' ws '"' {[^"]+} '"' ws
                   comment <- '/*' @ '*/' / '//' .*
                   ws <- (comment / \s+)* """:
    echo matches[0]

PEG vs regular expression

As a regular expression \[.*\] matches the longest possible text between '[' and ']'. As a PEG it never matches anything, because a PEG is deterministic: .* consumes the rest of the input, so \] never matches. As a PEG this needs to be written as: \[ ( !\] . )* \] (or \[ @ \]).

Note that the regular expression does not behave as intended either: in the example * should not be greedy, so \[.*?\] should be used instead.

PEG construction

There are two ways to construct a PEG in Nim code:

  1. Parsing a string into an AST which consists of Peg nodes with the peg proc.
  2. Constructing the AST directly with proc calls. This method does not support constructing rules, only simple expressions and is not as convenient. Its only advantage is that it does not pull in the whole PEG parser into your executable.

Types

Captures = object
contains the captured substrings. Source   Edit  
EInvalidPeg = object of ValueError
raised if an invalid PEG has been detected Source   Edit  
NonTerminal = ref NonTerminalObj
Source   Edit  
NonTerminalFlag = enum
  ntDeclared, ntUsed
Source   Edit  
Peg {.shallow.} = object
  case
  of pkEmpty .. pkWhitespace:
    nil
  of pkTerminal, pkTerminalIgnoreCase, pkTerminalIgnoreStyle:
  of pkChar, pkGreedyRepChar:
  of pkCharChoice, pkGreedyRepSet:
  of pkNonTerminal:
  of pkBackRef .. pkBackRefIgnoreStyle:
  else:
type that represents a PEG Source   Edit  
PegKind = enum
  pkEmpty, pkAny,           ## any character (.)
  pkAnyRune,                ## any Unicode character (_)
  pkNewLine,                ## CR-LF, LF, CR
  pkLetter,                 ## Unicode letter
  pkLower,                  ## Unicode lower case letter
  pkUpper,                  ## Unicode upper case letter
  pkTitle,                  ## Unicode title character
  pkWhitespace,             ## Unicode whitespace character
  pkTerminal, pkTerminalIgnoreCase, pkTerminalIgnoreStyle, pkChar, ## single character to match
  pkCharChoice, pkNonTerminal, pkSequence, ## a b c ... --> Internal DSL: peg(a, b, c)
  pkOrderedChoice,          ## a / b / ... --> Internal DSL: a / b or /[a, b, c]
  pkGreedyRep,              ## a*     --> Internal DSL: *a
                             ## a+     --> (a a*)
  pkGreedyRepChar,          ## x* where x is a single character (superop)
  pkGreedyRepSet,           ## [set]* (superop)
  pkGreedyAny,              ## .* or _* (superop)
  pkOption,                 ## a?     --> Internal DSL: ?a
  pkAndPredicate,           ## &a     --> Internal DSL: &a
  pkNotPredicate,           ## !a     --> Internal DSL: !a
  pkCapture,                ## {a}    --> Internal DSL: capture(a)
  pkBackRef,                ## $i     --> Internal DSL: backref(i)
  pkBackRefIgnoreCase, pkBackRefIgnoreStyle, pkSearch, ## @a     --> Internal DSL: !*a
  pkCapturedSearch,         ## {@} a  --> Internal DSL: !*\a
  pkRule,                   ## a <- b
  pkList,                   ## a, b
  pkStartAnchor              ## ^      --> Internal DSL: startAnchor()
Source   Edit  

Consts

MaxSubpatterns = 20
defines the maximum number of subpatterns that can be captured. More subpatterns cannot be captured! Source   Edit  

Procs

func `!`(a: Peg): Peg {....gcsafe, extern: "npegsNotPredicate", raises: [],
                        tags: [], forbids: [].}
constructs a "not predicate" with the PEG a Source   Edit  
func `!*`(a: Peg): Peg {....gcsafe, extern: "npegsSearch", raises: [], tags: [],
                         forbids: [].}
constructs a "search" for the PEG a Source   Edit  
func `!*\`(a: Peg): Peg {....gcsafe, extern: "npgegsCapturedSearch", raises: [],
                          tags: [], forbids: [].}
constructs a "captured search" for the PEG a Source   Edit  
func `$`(r: Peg): string {....gcsafe, extern: "npegsToString", raises: [],
                           tags: [], forbids: [].}
converts a PEG to its string representation Source   Edit  
func `&`(a: Peg): Peg {....gcsafe, extern: "npegsAndPredicate", raises: [],
                        tags: [], forbids: [].}
constructs an "and predicate" with the PEG a Source   Edit  
func `*`(a: Peg): Peg {....gcsafe, extern: "npegsGreedyRep", raises: [], tags: [],
                        forbids: [].}
constructs a "greedy repetition" for the PEG a Source   Edit  
func `+`(a: Peg): Peg {....gcsafe, extern: "npegsGreedyPosRep", raises: [],
                        tags: [], forbids: [].}
constructs a "greedy positive repetition" with the PEG a Source   Edit  
func `/`(a: varargs[Peg]): Peg {....gcsafe, extern: "npegsOrderedChoice",
                                 raises: [], tags: [], forbids: [].}
constructs an ordered choice with the PEGs in a Source   Edit  
func `?`(a: Peg): Peg {....gcsafe, extern: "npegsOptional", raises: [], tags: [],
                        forbids: [].}
constructs an optional for the PEG a Source   Edit  
func any(): Peg {.inline, ...raises: [], tags: [], forbids: [].}
constructs the PEG any character (.) Source   Edit  
func anyRune(): Peg {.inline, ...raises: [], tags: [], forbids: [].}
constructs the PEG any rune (_) Source   Edit  
func backref(index: range[1 .. MaxSubpatterns]; reverse: bool = false): Peg {.
    ...gcsafe, extern: "npegs$1", raises: [], tags: [], forbids: [].}
constructs a back reference of the given index. index starts counting from 1. reverse specifies whether indexing starts from the end of the capture list. Source   Edit  
func backrefIgnoreCase(index: range[1 .. MaxSubpatterns]; reverse: bool = false): Peg {.
    ...gcsafe, extern: "npegs$1", raises: [], tags: [], forbids: [].}
constructs a back reference of the given index. index starts counting from 1. reverse specifies whether indexing starts from the end of the capture list. Ignores case for matching. Source   Edit  
func backrefIgnoreStyle(index: range[1 .. MaxSubpatterns]; reverse: bool = false): Peg {.
    ...gcsafe, extern: "npegs$1", raises: [], tags: [], forbids: [].}
constructs a back reference of the given index. index starts counting from 1. reverse specifies whether indexing starts from the end of the capture list. Ignores style for matching. Source   Edit  
func bounds(c: Captures; i: range[0 .. 20 - 1]): tuple[first, last: int] {.
    ...raises: [], tags: [], forbids: [].}
returns the bounds [first..last] of the i'th capture. Source   Edit  
func capture(a: Peg = Peg(kind: pkEmpty)): Peg {....gcsafe, extern: "npegsCapture",
    raises: [], tags: [], forbids: [].}
constructs a capture with the PEG a Source   Edit  
func ch(p: Peg): char {....raises: [], tags: [], forbids: [].}
Returns the char representation of a given Peg variant object where present. Source   Edit  
func charChoice(p: Peg): ref set[char] {....raises: [], tags: [], forbids: [].}
Returns the charChoice field of a given Peg variant object where present. Source   Edit  
func charSet(s: set[char]): Peg {....gcsafe, extern: "npegs$1", raises: [],
                                  tags: [], forbids: [].}
constructs a PEG from a character set s Source   Edit  
func col(nt: NonTerminal): int {....raises: [], tags: [], forbids: [].}
Gets the column number of the definition of the parent Peg object variant of a given NonTerminal. Source   Edit  
func contains(s: string; pattern: Peg; matches: var openArray[string]; start = 0): bool {.
    ...gcsafe, extern: "npegs$1Capture", raises: [], tags: [RootEffect],
    forbids: [].}
same as find(s, pattern, matches, start) >= 0 Source   Edit  
func contains(s: string; pattern: Peg; start = 0): bool {....gcsafe,
    extern: "npegs$1", raises: [], tags: [RootEffect], forbids: [].}
same as find(s, pattern, start) >= 0 Source   Edit  
func endAnchor(): Peg {.inline, ...raises: [], tags: [], forbids: [].}
constructs the PEG $ which matches the end of the input. Source   Edit  
func endsWith(s: string; suffix: Peg; start = 0): bool {....gcsafe,
    extern: "npegs$1", raises: [], tags: [RootEffect], forbids: [].}
returns true if s ends with the pattern suffix Source   Edit  
func escapePeg(s: string): string {....raises: [], tags: [], forbids: [].}
escapes s so that it is matched verbatim when used as a peg. Source   Edit  
func find(s: string; pattern: Peg; matches: var openArray[string]; start = 0): int {.
    ...gcsafe, extern: "npegs$1Capture", raises: [], tags: [RootEffect],
    forbids: [].}
returns the starting position of pattern in s and the captured substrings in the array matches. If it does not match, nothing is written into matches and -1 is returned. Source   Edit  
func find(s: string; pattern: Peg; start = 0): int {....gcsafe, extern: "npegs$1",
    raises: [], tags: [RootEffect], forbids: [].}
returns the starting position of pattern in s. If it does not match, -1 is returned. Source   Edit  
func findAll(s: string; pattern: Peg; start = 0): seq[string] {....gcsafe,
    extern: "npegs$1", raises: [], tags: [RootEffect], forbids: [].}
returns all matching substrings of s that match pattern. If it does not match, @[] is returned. Source   Edit  
func findBounds(s: string; pattern: Peg; matches: var openArray[string];
                start = 0): tuple[first, last: int] {....gcsafe,
    extern: "npegs$1Capture", raises: [], tags: [RootEffect], forbids: [].}
returns the starting position and end position of pattern in s and the captured substrings in the array matches. If it does not match, nothing is written into matches and (-1,0) is returned. Source   Edit  
func flags(nt: NonTerminal): set[NonTerminalFlag] {....raises: [], tags: [],
    forbids: [].}
Gets the NonTerminalFlag-typed flags field of the parent Peg variant object of a given NonTerminal. Source   Edit  
func index(p: Peg): range[-20 .. 20 - 1] {....raises: [], tags: [], forbids: [].}
Returns the back-reference index of a captured sub-pattern in the Captures object for a given Peg variant object where present. Source   Edit  
func kind(p: Peg): PegKind {....raises: [], tags: [], forbids: [].}
Returns the PegKind of a given Peg object. Source   Edit  
func line(nt: NonTerminal): int {....raises: [], tags: [], forbids: [].}
Gets the line number of the definition of the parent Peg object variant of a given NonTerminal. Source   Edit  
func match(s: string; pattern: Peg; matches: var openArray[string]; start = 0): bool {.
    ...gcsafe, extern: "npegs$1Capture", raises: [], tags: [RootEffect],
    forbids: [].}
returns true if s[start..] matches the pattern and the captured substrings in the array matches. If it does not match, nothing is written into matches and false is returned. Source   Edit  
func match(s: string; pattern: Peg; start = 0): bool {....gcsafe,
    extern: "npegs$1", raises: [], tags: [RootEffect], forbids: [].}
returns true if s matches the pattern beginning from start. Source   Edit  
func matchLen(s: string; pattern: Peg; matches: var openArray[string]; start = 0): int {.
    ...gcsafe, extern: "npegs$1Capture", raises: [], tags: [RootEffect],
    forbids: [].}
the same as match, but it returns the length of the match, if there is no match, -1 is returned. Note that a match length of zero can happen. It's possible that a suffix of s remains that does not belong to the match. Source   Edit  
func matchLen(s: string; pattern: Peg; start = 0): int {....gcsafe,
    extern: "npegs$1", raises: [], tags: [RootEffect], forbids: [].}
the same as match, but it returns the length of the match, if there is no match, -1 is returned. Note that a match length of zero can happen. It's possible that a suffix of s remains that does not belong to the match. Source   Edit  
func name(nt: NonTerminal): string {....raises: [], tags: [], forbids: [].}
Gets the name of the symbol represented by the parent Peg object variant of a given NonTerminal. Source   Edit  
func newLine(): Peg {.inline, ...raises: [], tags: [], forbids: [].}
constructs the PEG newline (\n) Source   Edit  
func newNonTerminal(name: string; line, column: int): NonTerminal {....gcsafe,
    extern: "npegs$1", raises: [], tags: [], forbids: [].}
constructs a nonterminal symbol Source   Edit  
func nonterminal(n: NonTerminal): Peg {....gcsafe, extern: "npegs$1", raises: [],
                                        tags: [], forbids: [].}
constructs a PEG that consists of the nonterminal symbol Source   Edit  
func nt(p: Peg): NonTerminal {....raises: [], tags: [], forbids: [].}
Returns the NonTerminal object of a given Peg variant object where present. Source   Edit  
func parallelReplace(s: string; subs: varargs[tuple[pattern: Peg, repl: string]]): string {.
    ...gcsafe, extern: "npegs$1", raises: [ValueError], tags: [RootEffect],
    forbids: [].}
Returns a modified copy of s with the substitutions in subs applied in parallel. Source   Edit  
func parsePeg(pattern: string; filename = "pattern"; line = 1; col = 0): Peg {.
    ...raises: [ValueError, EInvalidPeg, Exception], tags: [RootEffect],
    forbids: [].}
constructs a Peg object from pattern. filename, line, col are used for error messages, but they only provide start offsets. parsePeg keeps track of line and column numbers within pattern. Source   Edit  
func peg(pattern: string): Peg {....raises: [ValueError, EInvalidPeg, Exception],
                                 tags: [RootEffect], forbids: [].}
constructs a Peg object from the pattern. The short name has been chosen to encourage its use as a raw string modifier:
peg"{\ident} \s* '=' \s* {.*}"
Source   Edit  
func rawMatch(s: string; p: Peg; start: int; c: var Captures): int {....gcsafe,
    extern: "npegs$1", raises: [], tags: [RootEffect], forbids: [].}
low-level matching proc that implements the PEG interpreter. Use this for maximum efficiency (every other PEG operation ends up calling this proc). Returns -1 if it does not match, else the length of the match Source   Edit  
func replace(s: string; sub: Peg;
             cb: proc (match: int; cnt: int; caps: openArray[string]): string): string {.
    ...gcsafe, extern: "npegs$1cb", effectsOf: cb, ...raises: [], tags: [RootEffect],
    forbids: [].}

Replaces sub in s by the resulting strings from the callback. The callback proc receives the index of the current match (starting with 0), the count of captures and an open array with the captures of each match. Examples:

func handleMatches*(m: int, n: int, c: openArray[string]): string =
  result = ""
  if m > 0:
    result.add ", "
  result.add case n:
    of 2: c[0].toLower & ": '" & c[1] & "'"
    of 1: c[0].toLower & ": ''"
    else: ""

let s = "Var1=key1;var2=Key2;   VAR3"
echo s.replace(peg"{\ident}('='{\ident})* ';'* \s*", handleMatches)

Results in:

"var1: 'key1', var2: 'Key2', var3: ''"

Source   Edit  
func replace(s: string; sub: Peg; by = ""): string {....gcsafe, extern: "npegs$1",
    raises: [], tags: [RootEffect], forbids: [].}
Replaces sub in s by the string by. Captures cannot be accessed in by. Source   Edit  
func replacef(s: string; sub: Peg; by: string): string {....gcsafe,
    extern: "npegs$1", raises: [ValueError], tags: [RootEffect], forbids: [].}

Replaces sub in s by the string by. Captures can be accessed in by with the notation $i and $# (see strutils.%). Examples:

"var1=key; var2=key2".replacef(peg"{\ident}'='{\ident}", "$1<-$2$2")

Results in:

"var1<-keykey; val2<-key2key2"

Source   Edit  
func rule(nt: NonTerminal): Peg {....raises: [], tags: [], forbids: [].}
Gets the Peg object representing the rule definition of the parent Peg object variant of a given NonTerminal. Source   Edit  
func sequence(a: varargs[Peg]): Peg {....gcsafe, extern: "npegs$1", raises: [],
                                      tags: [], forbids: [].}
constructs a sequence with all the PEGs from a Source   Edit  
func split(s: string; sep: Peg): seq[string] {....gcsafe, extern: "npegs$1",
    raises: [], tags: [RootEffect], forbids: [].}
Splits the string s into substrings. Source   Edit  
func startAnchor(): Peg {.inline, ...raises: [], tags: [], forbids: [].}
constructs the PEG ^ which matches the start of the input. Source   Edit  
func startsWith(s: string; prefix: Peg; start = 0): bool {....gcsafe,
    extern: "npegs$1", raises: [], tags: [RootEffect], forbids: [].}
returns true if s starts with the pattern prefix Source   Edit  
func term(p: Peg): string {....raises: [], tags: [], forbids: [].}
Returns the string representation of a given Peg variant object where present. Source   Edit  
func term(t: char): Peg {....gcsafe, extern: "npegs$1Char", raises: [], tags: [],
                          forbids: [].}
constructs a PEG from a terminal char Source   Edit  
func term(t: string): Peg {....gcsafe, extern: "npegs$1Str", raises: [], tags: [],
                            forbids: [].}
constructs a PEG from a terminal string Source   Edit  
func termIgnoreCase(t: string): Peg {....gcsafe, extern: "npegs$1", raises: [],
                                      tags: [], forbids: [].}
constructs a PEG from a terminal string; ignore case for matching Source   Edit  
func termIgnoreStyle(t: string): Peg {....gcsafe, extern: "npegs$1", raises: [],
                                       tags: [], forbids: [].}
constructs a PEG from a terminal string; ignore style for matching Source   Edit  
proc transformFile(infile, outfile: string;
                   subs: varargs[tuple[pattern: Peg, repl: string]]) {....gcsafe,
    extern: "npegs$1", raises: [IOError, ValueError],
    tags: [ReadIOEffect, WriteIOEffect, RootEffect], forbids: [].}

reads in the file infile, performs a parallel replacement (calls parallelReplace) and writes back to outfile. Raises IOError if an error occurs. This is supposed to be used for quick scripting.

Note: this proc does not exist while using the JS backend.

Source   Edit  
func unicodeLetter(): Peg {.inline, ...raises: [], tags: [], forbids: [].}
constructs the PEG \letter which matches any Unicode letter. Source   Edit  
func unicodeLower(): Peg {.inline, ...raises: [], tags: [], forbids: [].}
constructs the PEG \lower which matches any Unicode lowercase letter. Source   Edit  
func unicodeTitle(): Peg {.inline, ...raises: [], tags: [], forbids: [].}
constructs the PEG \title which matches any Unicode title letter. Source   Edit  
func unicodeUpper(): Peg {.inline, ...raises: [], tags: [], forbids: [].}
constructs the PEG \upper which matches any Unicode uppercase letter. Source   Edit  
func unicodeWhitespace(): Peg {.inline, ...raises: [], tags: [], forbids: [].}
constructs the PEG \white which matches any Unicode whitespace character. Source   Edit  

Iterators

iterator findAll(s: string; pattern: Peg; start = 0): string {....raises: [],
    tags: [RootEffect], forbids: [].}
yields all matching substrings of s that match pattern. Source   Edit  
iterator items(p: Peg): Peg {.inline, ...raises: [], tags: [], forbids: [].}
Yields the child nodes of a Peg variant object where present. Source   Edit  
iterator pairs(p: Peg): (int, Peg) {.inline, ...raises: [], tags: [], forbids: [].}
Yields the indices and child nodes of a Peg variant object where present. Source   Edit  
iterator split(s: string; sep: Peg): string {....raises: [], tags: [RootEffect],
    forbids: [].}

Splits the string s into substrings.

Substrings are separated by the PEG sep. Examples:

for word in split("00232this02939is39an22example111", peg"\d+"):
  writeLine(stdout, word)

Results in:

"this"
"is"
"an"
"example"

Source   Edit  

Templates

template `=~`(s: string; pattern: Peg): bool

This calls match with an implicit declared matches array that can be used in the scope of the =~ call:

if line =~ peg"\s* {\w+} \s* '=' \s* {\w+}":
  # matches a key=value pair:
  echo("Key: ", matches[0])
  echo("Value: ", matches[1])
elif line =~ peg"\s*{'#'.*}":
  # matches a comment
  # note that the implicit ``matches`` array is different from the
  # ``matches`` array of the first branch
  echo("comment: ", matches[0])
else:
  echo("syntax error")

Source   Edit  
template digits(): Peg
expands to charset({'0'..'9'}) Source   Edit  
template eventParser(pegAst, handlers: untyped): (proc (s: string): int)

Generates an interpreting event parser proc according to the specified PEG AST and handler code blocks. The proc can be called with a string to be parsed and will execute the handler code blocks whenever their associated grammar element is matched. It returns -1 if the string does not match, else the length of the total match. The following example code evaluates an arithmetic expression defined by a simple PEG:

import std/[strutils, pegs]

let
  pegAst = """
Expr    <- Sum
Sum     <- Product (('+' / '-')Product)*
Product <- Value (('*' / '/')Value)*
Value   <- [0-9]+ / '(' Expr ')'
  """.peg
  txt = "(5+3)/2-7*22"

var
  pStack: seq[string] = @[]
  valStack: seq[float] = @[]
  opStack = ""
let
  parseArithExpr = pegAst.eventParser:
    pkNonTerminal:
      enter:
        pStack.add p.nt.name
      leave:
        pStack.setLen pStack.high
        if length > 0:
          let matchStr = s.substr(start, start+length-1)
          case p.nt.name
          of "Value":
            try:
              valStack.add matchStr.parseFloat
              echo valStack
            except ValueError:
              discard
          of "Sum", "Product":
            try:
              let val = matchStr.parseFloat
            except ValueError:
              if valStack.len > 1 and opStack.len > 0:
                valStack[^2] = case opStack[^1]
                of '+': valStack[^2] + valStack[^1]
                of '-': valStack[^2] - valStack[^1]
                of '*': valStack[^2] * valStack[^1]
                else: valStack[^2] / valStack[^1]
                valStack.setLen valStack.high
                echo valStack
                opStack.setLen opStack.high
                echo opStack
    pkChar:
      leave:
        if length == 1 and "Value" != pStack[^1]:
          let matchChar = s[start]
          opStack.add matchChar
          echo opStack

let pLen = parseArithExpr(txt)

The handlers parameter consists of code blocks for PegKinds, which define the grammar elements of interest. Each block can contain handler code to be executed when the parser enters and leaves text matching the grammar element. An enter handler can access the specific PEG AST node being matched as p, the entire parsed string as s and the position of the matched text segment in s as start. A leave handler can access p, s, start and also the length of the matched text segment as length. For an unsuccessful match, the enter and leave handlers will be executed, with length set to -1.

Symbols declared in an enter handler can be made visible in the corresponding leave handler by annotating them with an inject pragma.

Source   Edit  
template ident(): Peg
same as [a-zA-Z_][a-zA-z_0-9]*; standard identifier Source   Edit  
template identChars(): Peg
expands to charset({'a'..'z', 'A'..'Z', '0'..'9', '_'}) Source   Edit  
template identStartChars(): Peg
expands to charset({'A'..'Z', 'a'..'z', '_'}) Source   Edit  
template letters(): Peg
expands to charset({'A'..'Z', 'a'..'z'}) Source   Edit  
template natural(): Peg
same as \d+ Source   Edit  
template whitespace(): Peg
expands to charset({' ', '\9'..'\13'}) Source   Edit