src/fusion/matching

Search:
Group by:

Example:

import src/fusion/matching
{.experimental: "caseStmtMacros".}

case [(1, 3), (3, 4)]:
  of [(1, @a), _]:
    echo a

  else:
    echo "Match failed"

"you can probably make a macro for that" -- Rika, 22-09-2020 10:41:51

Author:haxscramper

This module implements pattern matching for objects, tuples, sequences, key-value pairs, case and derived objects. DSL can also be used to create object trees (AST).

Quick reference

ExampleExplanation
(fld: @val)Field fld into variable @val
Kind()Object with .kind == Kind() 1
of Derived()Match object of derived type
(@val, _)First element in tuple in @val
(@val, @val)Tuple with two equal elements
{"key" : @val}Table with "key", capture into @val 2
[_, _]Sequence with len == 2 3
[_, .._]At least one element
[_, all @val]All elements starting from index 1
[until @val == "2", .._]Capture all elements until first "2" 4
[until @val == 1, @val]All including first match
[all @val == 12]All elements are == 12, capture into @val
[some @val == 12]At least one is == 12, capture all matching into @val
  • 1 Kind fields can use shorted enum names - both nnkStrLit and StrLit will work (prefix nnk can be omitted)
  • 2 Or any object with contains and [] defined (for necessary types)
  • 3 Or any object with len proc or field
  • 4 Note that sequence must match fully and it is necessary to have .._ at the end in order to accept sequences of arbitrary length.

Supported match elements

  • seqs - matched using [Patt1(), Patt2(), ..]. Must have len(): int and iterator items(): T defined.
  • tuples - matched using (Patt1(), Patt2(), ..).
  • pairable - matched using {Key: Patt()}. Must have [Key]: T defined. Key is not a pattern - search for whole collection won't be performed.
  • set - matched using {Val1, Val2, .._}. Must have contains defined. If variable is captured then Val1 must be comparable and collection should also implement items and incl.
  • object - matched using (field: Val). Case objects are matched using Kind(field: Val). If you want to check agains multiple values for kind field (kind: in SomeSetOfKinds)

Element access

To determine whether particular object matches pattern access path is generated - sequence of fields and [] operators that you would normally write by hand, like fld.subfield["value"].len. Due to support for method call syntax there is no difference between field access and proc call, so things like (len: < 12) also work as expected.

(fld: "3") Match field fld against "3". Generated access is expr.fld == "3".

["2"] Match first element of expression agains patt. Generate acess expr[pos] == "2", where pos is an integer index for current position in sequence.

("2") For each field generate access using [1]

{"key": "val"} First check "key" in expr and then expr["key"] == "val". No exception on missing keys, just fail match.

It is possible to have mixed assess for objects. Mixed object access via (gg: _, [], {}) creates the same code for checking. E.g ([_]) is the same as [_], ({"key": "val"}) and is identical to just {"key": "val"}. You can also call functions and check their values (like (len: _(it < 10)) or (len: in {0 .. 10})) to check for sequence length.

Checks

  • Any operators with exception of is (subpattern) and of (derived object subpattern) is considered final comparison and just pasted as-is into generated pattern match code. E.g. fld: in {2,3,4} will generate expr.fld in {2,3,4}
  • (fld: Patt()) - check if expr.fld matches pattern Patt()
  • (fld: _.matchesPredicate()) - if call to matchesPredicate(expr.fld) evaluates to true.

Notation: <expr> refers to any possible combination of checks. For example

  • fld: in {1,2,3} - <expr> is in {1,2,3}
  • [_] - <expr> is _
  • fld: Patt() - <expr> is Patt()

Examples

  • (fld: 12) If rhs for key-value pair is integer, string or identifier then == comparison will be generated.
  • (fld: == ident("33")) if rhs is a prefix of == then == will be generated. Any for of prefix operator will be converted to expr.fld <op> <rhs>.
  • (fld: in {1, 3, 3}) or (fld: in Anything) creates fld.expr in Anything. Either in or notin can be used.

Variable binding

Match can be bound to new variable. All variable declarations happen via @varname syntax.

  • To bind element to variable without any additional checks do: (fld: @varname)
  • To bind element with some additional operator checks do:
    • (fld: @varname <operator> Value) first perform check using <operator> and then add Value to @varname
      • (fld: @hello is ("2" | "3"))
  • Predicate checks: fld: @a.matchPredicate()
  • Arbitrary expression: fld: @a(it mod 2 == 0). If expression has no type it is considered true.

Bind order

Bind order: if check evaluates to true variable is bound immediately, making it possible to use in other checks. [@head, any @tail != head] is a valid pattern. First match head and then any number of @tail elements. Can use any _(if it != head: tail.add it) and declare tail externally.

Variable is never rebound. After it is bound, then it will have the value of first binding.

Bind variable type

  • Any variadics are mapped to sequence
  • Only once in alternative is option
  • Explicitly optional is option
  • Optional with default value is regular value
  • Variable can be used only once if in alternative
PatternInjected variables
[@a]var a: typeof(expr[0])
{"key": @val}var val: typeof(expr["key"])
[all @a]var a: seq[typeof(expr[0])]
[opt @val]var a: Option[typeof(expr[0])]
[opt @val or default]var a: typeof(expr[0])
(fld: @val)var val: typeof(expr.fld)

Matching different things

Sequence matching

Input sequence: [1,2,3,4,5,6,5,6]

PatternResultComment
[_]FailInput sequence size mismatch
[.._]Ok
[@a]FailInput sequence size mismatch
[@a, .._]Ok, a = 1
[any @a, .._]Error
[any @a(it < 10)]Ok, a = [1..6]Capture all elements that match
[until @a == 6, .._]OkAll until first ocurrence of 6
[all @a == 6, .._]Ok a = []All leading 6
[any @a(it > 100)]FailNo elements > 100
[none @a(it in {6 .. 10})]FailThere is an element == 6
[0 .. 2 is < 10, .._]OkFirst three elements < 10
[0 .. 2 is < 10]FailMissing trailing .._

until non-greedy. Match everything until <expr>

- ``until <expr>``: match all until the first element that matches Expr

all greedy. Match everything that matches <expr>

- ``all <expr>``: all elements should match Expr

- ``all @val is <expr>``: capture all elements in ``@val`` if ``<expr>``
  is true for every one of them.

opt

Optional single element match - if sequence contains fewer elements than
necessary element is considered missing. In that case either `default`
fallback (if present) is used as value, or capture is set to `None(T)`.

- ``opt @a``: match optional element and bind it to a

- ``opt @a or "default"``: either match element to a or set a to
  "default"

any greedy. Consume all sequence elements until the end and succeed only if at least one element has matched.

- ``any @val is "d"``: capture all element that match ``is "d"``

none greedy. Consume all sequence elements until the end and succed only if any element has matched. EE

[m .. n @capture] Capture slice of elements from index m to n

Greedy patterns match until the end of a sequence and cannot be followed by anything else.

For sequence to match is must either be completely matched by all subpatterns or have trailing .._ in pattern.

SequencePatternMatch result
[1,2,3][1,2] [1, .._] [1,2,_]Fail Ok Ok

Use examples

  • capture all elements in sequence: [all @elems]
  • get all elements until (not including "d"): [until @a is "d"]
  • All leading "d": [all @leading is "d"]
  • Match first two elements and ignore the rest [_, _, .._]
  • Match optional third element [_, _, opt @trail]
  • Match third element and if not matched use default value [_, _, opt @trail or "default"]
  • Capture all elements until first separator: [until @leading is "sep", @middle is "sep", all @trailing]
  • Extract all conditions from IfStmt: IfStmt([all ElseIf([@cond, _]), .._])

In addition to working with nested subpatterns it is possible to use pattern matching as simple text scanner, similar to strscans. Main difference is that it allows working on arbitrary sequences, meaning it is possible, for example, to operate on tokens, or as in this example on strings (for the sake of simplicity).

func allIs(str: string, chars: set[char]): bool = str.allIt(it in chars)

"2019-10-11 school start".split({'-', ' '}).assertMatch([
  pref @dateParts(it.allIs({'0' .. '9'})),
  pref _(it.allIs({' '})),
  all @text
])

doAssert dateParts == @["2019", "10", "11"]
doAssert text == @["school", "start"]

Tuple matching

Input tuple: (1, 2, "fa")

PatternResultComment
(_, _, _)OkMatch all
(@a, @a, _)Fail
(@a is (1 | 2), @a, _)Fail1
(1, 1 | 2, _)Ok
  • 1 Pattern backtracking is not performed, @a is first bound to 1, and in subsequent match attempts pattern fails.

Tuple element matches support any regular match expression like @capture, and not different from field matches. You can also use opt @capture or "default" in order to assign fallback value on tuple unpacking.

(@a, (@b, _), _) := ("hello", ("world", 11), 0.2)

Object matching

For matching object fields you can use (fld: value) -

type
  Obj = object
    fld1: int8

func len(o: Obj): int = 0

case Obj():
  of (fld1: < -10):
    discard
  
  of (len: > 10):
    # can use results of function evaluation as fields - same idea as
    # method call syntax in regular code.
    discard
  
  of (fld1: in {1 .. 10}):
    discard
  
  of (fld1: @capture):
    doAssert capture == 0

For objects with Option[T] fields it is possible to use field: opt @capture or "default" to either get capture value, or set it to fallback expression.

Variant object matching

Matching on .kind field is a very common operation and has special syntax sugar - ForStmt() is functionally equivalent to (kind: nnkForStmt), but much more concise.

nnk pefix can be omitted - in general if your enum field name folows nep1 naming conventions (each enum name starts with underscore prefix (common for all enum elements), followed PascalCase enum name.

Input AST

ForStmt
  Ident "i"
  Infix
    Ident ".."
    IntLit 1
    IntLit 10
  StmtList
    Command
      Ident "echo"
      IntLit 12
  • ForStmt([== ident("i"), .._]) Only for loops with i as variable
  • ForStmt([@a is Ident(), .._]) Capture for loop variable
  • ForStmt([@a.isTuple(), .._]) for loops in which first subnode satisfies predicate isTuple(). Bind match to a
  • ForStmt([_, _, (len: in {1 .. 10})]) between one to ten statements in the for loop body
  • Using object name for pattern matching ObjectName() does not produce a hard error, but if .kind field does not need to be checked (fld: <pattern>) will be sufficient.
  • To check .kind against multiple operators prefix in can be used - (kind: in {nnkForStmt, nnkWhileStmt})

Custom unpackers

It is possible to unpack regular object using tuple matcher syntax - in this case overload for [] operator must be provided that accepts static[FieldIndex] argument and returns a field.

type
  Point = object
    x: int
    y: int

proc `[]`(p: Point, idx: static[FieldIndex]): auto =
  when idx == 0:
    p.x
  elif idx == 1:
    p.y
  else:
    static:
      error("Cannot unpack `Point` into three-tuple")

let point = Point(x: 12, y: 13)

(@x, @y) := point

assertEq x, 12
assertEq y, 13

Note auto return type for [] proc - it is necessary if different types of fields might be returned on tuple unpacking, but not mandatory.

If different fields have varying types when must and static be used to allow for compile-time code selection.

Predicates and infix operators

Infix operators

By default object fields are either matched using recursive pattern, or compared for equality (when field: "some value" is used). It is also possible to explicitly specify operator, for example using =~ from std/pegs module:

case (parent: (field: "string")):
  of (parent.field: =~ peg"str{\w+}"):
    doAssert matches[0] == "ing"

It should be noted that implicitly injected matches variable is also visible in the case branch.

Custom predicates

Matching expressions using custom predicates is also possible. If it is not necessary to capture matched element placeholder _. should be used as a first argument:

proc lenEq(s: openarray[int], value: int): bool = s.len == value

case [1, 2]:
  of _.lenEq(3):
    # fails
  
  of _.lenEq(2):
    # matches

To capture value using predicate placeholder can be replaced with @capture pattern:

let arr = @[@[1, 2], @[2, 3], @[4]]
discard arr.matches([any @capture.lenEq(2)])
doAssert capture == @[@[1, 2], @[2, 3]]

Ref object matching

It is also possible to match derived ref objects with patterns using of operator. It allows for runtime selection of different derived types.

Note that of operator is necessary for distinguishing between multiple derived objects, or getting fields that are present only in derived types. In addition to it performs isNil() check in the object, so it might be used in cases when you are not dealing with derived types.

Due to isNil() check this pattern only makes sense when working with ref objects.

type
  Base1 = ref object of RootObj
    fld: int
  
  First1 = ref object of Base1
    first: float
  
  Second1 = ref object of Base1
    second: string

let elems: seq[Base1] = @[
  Base1(fld: 123),
  First1(fld: 456, first: 0.123),
  Second1(fld: 678, second: "test"),
  nil
]

for elem in elems:
  case elem:
    of of First1(fld: @capture1, first: @first):
      # Only capture `Frist1` elements
      doAssert capture1 == 456
      doAssert first == 0.123
    
    of of Second1(fld: @capture2, second: @second):
      # Capture `second` field in derived object
      doAssert capture2 == 678
      doAssert second == "test"
    
    of of Base1(fld: @default):
      # Match all *non-nil* base elements
      doAssert default == 123
    
    else:
      doAssert isNil(elem)

KV-pairs matching

Input json string

{"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}
  • Get input ["menu"]["file"] from node and
case inj:
  of {"menu" : {"file": @file is JString()}}:
    # ...
  else:
    raiseAssert("Expected [menu][file] as string, but found " & $inj)

Option matching

Some(@x) and None() is a special case that will be rewritten into (isSome: true, get: @x) and (isNone: true) respectively. This is made to allow better integration with optional types. 9_ .

Note: implementation does not explicitly require to use std/options.Option type, but instead works with anything that provides following functions:

  • isSome(): bool (for Some() pattern check),
  • isNone(): bool (for None() pattern), and
  • get(): T (for getting value if type is some).

Tree matching

For deeply nested AST structures it might be really inconvenient to write one-line expression with lots of ProcDef[@name is Ident() | Postfix[_, @name is Ident()]] and so on. But it is possible to use block syntax for patterns if necessary -

ProcDef:
  @name is Ident() | Postfix[_, @name is Ident()]
  # Other pattern parts

In case of ProcDef: pattern braces can be omitted because it is clear that we are trying to match a case object here.

Tree matching syntax has a nice property of being extremely close (copy-pastable) from treeRepr for NimNode. For a following proc declaration:

proc testProc1(arg1: int) {.unpackProc.} =
  discard

We have an ast

ProcDef
  Ident "testProc1"
  Empty
  Empty
  FormalParams
    Empty
    IdentDefs
      Ident "arg1"
      Ident "int"
      Empty
  Empty
  Empty
  StmtList
    DiscardStmt
      Empty

That can be matched using following pattern:

procDecl.assertMatch:
  ProcDef:
    Ident(strVal: @name) | Postfix[_, Ident(strVal: @name)]
    _ # Term rewriting template
    _ # Generic params
    FormalParams:
      @returnType
      all IdentDefs[@trailArgsName, _, _]
    
    @pragmas
    _ # Reserved
    @implementation

Tree construction

makeTree provides 'reversed' implementation of pattern matching, which allows to construct tree from pattern, using variables. Example of use

type
  HtmlNodeKind = enum
    htmlBase = "base"
    htmlHead = "head"
    htmlLink = "link"
  
  HtmlNode = object
    kind*: HtmlNodeKind
    text*: string
    subn*: seq[HtmlNode]

func add(n: var HtmlNode, s: HtmlNode) = n.subn.add s

discard makeTree(HtmlNode):
  base:
    link(text: "hello")

In order to construct tree, kind= and add have to be defined. Internally DSL just creats resulting object, sets kind= and then repeatedly add elements to it. In order to properties for objects either the field has to be exported, or fld= has to be defined (where fld is the name of property you want to set).

Types

FieldIndex = distinct int
ItemMatchKind = enum
  imkInfixEq,               ## Match item using infix operator
  imkSubPattern,            ## Match item by checking it agains subpattern
  imkPredicate               ## Execute custom predicate to determine if element
                             ## matches pattern.
Type of item pattern match
KVPair = object
Match = ref object
  bindVar*: Option[NimNode]  ## Bound variable (if any)
  declNode*: NimNode         ## Original declaration of match
  isOptional*: bool
  fallback*: Option[NimNode] ## Default value in case match fails
  case kind*: MatchKind
  of kItem:
    case
    of imkInfixEq:
      infix*: string         ## Infix operator used for comparison
      rhsNode*: NimNode      ## Rhs expression to compare against
      isPlaceholder*: bool   ## Always true? `_` pattern is an
                             ## infix expression with `isPlaceholder` equal to true
    of imkSubPattern:
      rhsPattern*: Match     ## SubPattern to compare value against
    of imkPredicate:
      isCall*: bool          ## Predicate is a call expression
                             ## (`@val.matches()`) or a free-standing expression
                             ## (`@val(it.len < 100)`)
      predBody*: NimNode     ## Body of the expression
  of kAlt:
    altElements*: seq[Match] ## Alternatives for matching
  of kSeq:
    seqElements*: seq[SeqStructure] ## Sequence subpatterns
  of kTuple:
    tupleElements*: seq[Match] ## Tuple elements
  of kPairs:
    pairElements*: seq[KVPair]
    nocheck*: bool
  of kSet:
    setElements*: seq[Match]
  of kObject:
    kindCall*: Option[NimNode] ## Optional node with kind
                               ## expression pattern (see `hasKind`)
    isRefKind*: bool
    fieldElements*: seq[tuple[name: string, pattern: Match]]
    kvMatches*: Option[Match] ## Optional key-value matches for
                              ## expressions like `JObject({"key": @val})`
    seqMatches*: Option[Match] ## Optional indexed matches for
                               ## subelement access using `Infix([@op, @lhs, @rhs])` pattern.
Object describing single match for element
MatchError = ref object of CatchableError
Exception indicating match failure
MatchKind = enum
  kItem,                    ## Match single element
  kSeq,                     ## Match sequence of elements
  kTuple,                   ## Mach tuple (anonymous or named)
  kPairs,                   ## Match key-value pairs
  kObject,                  ## Match object, named tuple or object-like value
  kSet,                     ## Match set of elements
  kAlt                       ## Ordered choice - mactch any of patterns.
Different kinds of matching patterns
SeqKeyword = enum
  lkAny = "any",            ## Any element from seq
  lkAll = "all",            ## All elements from seq
  lkNone = "none",          ## None of the elements from seq
  lkOpt = "opt",            ## Optionaly match element in seq
  lkUntil = "until",        ## All elements until
  lkPref = "pref",          ## All elements while
  lkPos,                    ## Exact position
  lkSlice,                  ## Subrange slice
  lkTrail                    ## Variadic placeholder `.._`
Possible special words for seq pattern matching
SeqStructure = object
  ## Original declaration of the node
  bindVar*: Option[NimNode]  ## Optional bound variable
  pattern*: Match            ## Patterh for element matching
  case kind*: SeqKeyword
  of lkSlice:
    slice*: NimNode
  else:
    nil
VarKind = enum
  vkRegular,                ## Regular variable, assigned once
  vkSequence, vkOption, vkSet, vkAlt
Kind of matched variables
VarSpec = object
  decl*: NimNode             ## First time variable has been declared
  case varKind*: VarKind     ## Type of the variable
  of vkAlt:
    prefixMap*: Table[Path, AltSpec]
  else:
    nil
  typePath*: Path            ## Whole path for expression that can be used to
                             ## determine type of the variable.
  foundCount*: int           ## Number of variable occurencies in expression

Consts

nnkFloatKinds = {nnkFloatLit..nnkFloat128Lit}
Set of all nim node kinds for float literal nodes
nnkIdentKinds = {nnkIdent, nnkSym, nnkOpenSymChoice}
Set of all nim node kinds for identifier-like nodes
nnkIntKinds = {nnkCharLit..nnkUInt64Lit}
Set of all nim node kinds for integer literal nodes
nnkStrKinds = {nnkStrLit..nnkTripleStrLit}
Set of all nim node kinds for string nodes
nnkTokenKinds = {nnkEmpty..nnkSym, nnkCharLit..nnkTripleStrLit, nnkOpenSymChoice}
Set of all token-like nodes (primitive type literals or identifiers)

Procs

func `$`(kvp: KVPair): string {....raises: [Exception], tags: [RootEffect],
                                forbids: [].}
func `$`(match: Match): string {....raises: [Exception], tags: [RootEffect],
                                 forbids: [].}
func `$`(path: Path): string {....raises: [], tags: [], forbids: [].}
func `$`(ss: SeqStructure): string {....raises: [Exception], tags: [RootEffect],
                                     forbids: [].}
func `==`(idx: FieldIndex; i: SomeInteger): bool
proc getKindNames(head: NimNode): (string, seq[string]) {....raises: [KeyError],
    tags: [], forbids: [].}
func kind=(node: var NimNode; kind: NimNodeKind) {....raises: [], tags: [],
    forbids: [].}
func makeMatchExpr(m: Match; mainExpr: NimNode; doRaise: bool;
                   originalMainExpr: NimNode): tuple[expr: NimNode,
    vtable: VarTable, mixident: seq[string]] {....raises: [KeyError, Exception],
    tags: [RootEffect], forbids: [].}
Create NimNode for checking whether or not item referred to by mainExpr matches pattern described by Match
func parseMatchExpr(n: NimNode): Match {....raises: [Exception],
    tags: [RootEffect], forbids: [].}
func str(node: NimNode): string {....raises: [], tags: [], forbids: [].}
func str=(node: var NimNode; val: string) {....raises: [], tags: [], forbids: [].}
func toAccs(path: Path; name: NimNode; pathForType: bool): NimNode {....raises: [],
    tags: [], forbids: [].}
Convert path in object to expression for getting element at path.

Macros

macro assertMatch(input, pattern: untyped): untyped
Try to match input using pattern and raise MatchError on failure. For DSL syntax details see start of the document.
macro `case`(n: untyped): untyped
macro expand(body: typed): untyped
macro hasKindImpl(head: typed; kind: untyped): untyped
macro matches(input, pattern: untyped): untyped
Try to match input using pattern and return false on failure. For DSL syntax details see start of the document.

Templates

template `:=`(lhs, rhs: untyped): untyped
Shorthand for assertMatch
template `?=`(lhs, rhs: untyped): untyped
Shorthand for matches
template `[]`(t: tuple; idx: static[FieldIndex]): untyped
template hasKind(head, kindExpr: untyped): untyped
Determine if head has kind value. Either function/procedure kind or field with the same name is expected to be declared. Type of kind must be an enum. Kind expression is a pattern describing expected values. Possible examples of pattern (assuming value of type NimNode is used as head)
  • nnkIntLit - match integer literal
  • IntLit - alternative (preferred) syntax for matching enum values nnk prefix can be omitted.
template makeTree(T: typed; pattern: untyped): untyped
Construct tree from pattern matching expression. For example of use see documentation at the start of the module
template varOfIteration(arg: untyped): untyped