Nim Experimental Features


Authors: Andreas Rumpf
Version: 1.7.3

About this document

This document describes features of Nim that are to be considered experimental. Some of these are not covered by the .experimental pragma or --experimental switch because they are already behind special syntax, and one may want to use Nim libraries that rely on these features without using them oneself.

Note: Unless otherwise indicated, these features are not to be removed, but refined and overhauled.

Void type

The void type denotes the absence of any type. Parameters of type void are treated as non-existent, and void as a return type means that the procedure does not return a value:

proc nothing(x, y: void): void =
  echo "ha"

nothing() # writes "ha" to stdout

The void type is particularly useful for generic code:

proc callProc[T](p: proc (x: T), x: T) =
  when T is void:
    p()
  else:
    p(x)

proc intProc(x: int) = discard
proc emptyProc() = discard

callProc[int](intProc, 12)
callProc[void](emptyProc)

However, a void type cannot be inferred in generic code:

callProc(emptyProc)
# Error: type mismatch: got (proc ())
# but expected one of:
# callProc(p: proc (T), x: T)

The void type is only valid for parameters and return types; other symbols cannot have the type void.

Top-down type inference

In expressions such as:

let a: T = ex

the compiler normally type checks the expression ex by itself and then tries to statically convert the type-checked expression to the given type T as far as it can, while making sure it matches the type. This process is limited, however, because the expression usually has an assumed type of its own that might clash with the given type.

With top-down type inference, the expression is type checked with the extra knowledge that it is supposed to be of type T. For example, the following code does not compile with the former method, but compiles with top-down type inference:

let foo: (float, uint8, cstring) = (1, 2, "abc")

The tuple expression has an expected type of (float, uint8, cstring). Since it is a tuple literal, we can use this information to assume the types of its elements. The expected types for the expressions 1, 2 and "abc" are respectively float, uint8, and cstring; and these expressions can be statically converted to these types.

Without this information, the type of the tuple expression would have been assumed to be (int, int, string). Thus the type of the tuple expression would not match the type of the variable, and an error would be given.
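For comparison, here is a sketch of the same declaration written so that it does not depend on top-down type inference: each element is given its target type explicitly (a float literal, the 'u8 literal suffix and an explicit cstring conversion). It should compile with or without the feature.

```nim
# Each element already has the exact type the tuple type asks for,
# so no expected-type information is needed to type check the literal.
let foo: (float, uint8, cstring) = (1.0, 2'u8, cstring("abc"))

echo foo[0], " ", foo[1], " ", foo[2]   # prints 1.0 2 abc
```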

The extent of this varies, but there are some notable special cases.

Sequence literals

Top-down type inference applies to sequence literals.

let x: seq[seq[float]] = @[@[1, 2, 3], @[4, 5, 6]]

This behavior is tied to the @ overloads in the system module, so overloading @ can disable it. This can be circumvented by calling the system.`@` overload explicitly:

proc `@`(x: string): string = "@" & x

# does not compile:
let x: seq[float] = @[1, 2, 3]
# compiles:
let y: seq[float] = system.`@`([1, 2, 3])

Package level objects

Every Nim module resides in a (nimble) package. An object type can be attached to the package it resides in. If that is done, the type can be referenced from other modules as an incomplete object type. This feature allows breaking up recursive type dependencies across module boundaries. Incomplete object types are always passed byref and can, in general, only be used in pointer-like contexts (var/ref/ptr IncompleteObject), since the compiler does not yet know the size of the object. To complete an incomplete object, the package pragma has to be used; package implies byref.

As long as a type T is incomplete, no runtime type information for T is available.

Example:

# module A (in an arbitrary package)
type
  Pack.SomeObject = object # declare as incomplete object of package 'Pack'
  Triple = object
    a, b, c: ref SomeObject # pointers to incomplete objects are allowed

# Incomplete objects can be used as parameters:
proc myproc(x: SomeObject) = discard

# module B (in package "Pack")
type
  SomeObject* {.package.} = object # Use 'package' to complete the object
    s, t: string
    x, y: int

This feature will likely be superseded in the future by support for recursive module dependencies.

Importing private symbols

In some situations, it may be useful to import all symbols (public or private) from a module. The syntax import foo {.all.} can be used to import all symbols from the module foo. Note that importing private symbols is generally not recommended.

See also the experimental importutils module.

Code reordering

The code reordering feature can implicitly rearrange procedure, template, and macro definitions along with variable declarations and initializations at the top level scope so that, to a large extent, a programmer should not have to worry about ordering definitions correctly or be forced to use forward declarations to preface definitions inside a module.

Example:

{.experimental: "codeReordering".}

proc foo(x: int) =
  bar(x)

proc bar(x: int) =
  echo(x)

foo(10)
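Without the codeReordering switch, the same ordering problem is solved with a forward declaration. A minimal sketch (bar is changed here to return a value so that the result can be checked instead of echoed):

```nim
proc bar(x: int): int     # forward declaration: signature only, no body

proc foo(x: int): int =
  bar(x)                  # accepted because bar has already been declared

proc bar(x: int): int =   # the full definition may come later in the module
  x * 2

echo foo(10)              # prints 20
```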

Variables can also be reordered. Variables that are initialized (i.e. variables whose declaration and assignment are combined in a single statement) can have their entire initialization statement reordered. Be wary of what code is executed at the top level:

{.experimental: "codeReordering".}

proc a() =
  echo(foo)

var foo = 5

a() # outputs: "5"

It is important to note that reordering only works for symbols at top-level scope. Therefore, the following fails to compile:

{.experimental: "codeReordering".}

proc a() =
  b()
  proc b() =
    echo("Hello!")

a()
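Because inner procs are not reordered, the nested example has to define b before its first use inside a. A sketch of a corrected version (here a returns the greeting instead of echoing it, so it can be checked):

```nim
proc a(): string =
  # b is an inner proc, so it is not reordered: define it before using it
  proc b(): string =
    "Hello!"
  b()

echo a()   # prints Hello!
```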

This feature will likely be replaced with a better solution to remove the need for forward declarations.

Special Operators

dot operators

Note: Dot operators are still experimental and so need to be enabled via {.experimental: "dotOperators".}.

Nim offers a special family of dot operators that can be used to intercept and rewrite proc call and field access attempts, referring to previously undeclared symbol names. They can be used to provide a fluent interface to objects lying outside the static confines of the type system such as values from dynamic scripting languages or dynamic file formats such as JSON or XML.

When Nim encounters an expression that cannot be resolved by the standard overload resolution rules, the current scope will be searched for a dot operator that can be matched against a re-written form of the expression, where the unknown field or proc name is passed to an untyped parameter:

a.b # becomes `.`(a, b)
a.b(c, d) # becomes `.`(a, b, c, d)

The matched dot operators can be symbols of any callable kind (procs, templates and macros), depending on the desired effect:

import std/json

{.experimental: "dotOperators".}

template `.`(js: JsonNode, field: untyped): JsonNode = js[astToStr(field)]

var js = parseJson("""{"x": 1, "y": 2}""")
echo js.x # outputs 1
echo js.y # outputs 2

The following dot operators are available:

operator .

This operator will be matched against both field accesses and method calls.

operator .()

This operator will be matched exclusively against method calls. It has higher precedence than the . operator, which allows one to handle expressions like x.y and x.y() differently, for example when interfacing with a scripting language.

operator .=

This operator will be matched against assignments to missing fields.

a.b = c # becomes `.=`(a, b, c)
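A small sketch combining the . and .= operators. DynObj is a hypothetical type that keeps its "fields" in a Table from std/tables; the templates turn the unknown field name into a string key with astToStr.

```nim
import std/tables

{.experimental: "dotOperators".}

type
  DynObj = object                  # hypothetical: field values live in a table
    store: Table[string, int]

# obj.name  ->  `.`(obj, name): look the value up by the field's name
template `.`(d: DynObj, name: untyped): int =
  d.store[astToStr(name)]

# obj.name = value  ->  `.=`(obj, name, value): create or overwrite the entry
template `.=`(d: var DynObj, name: untyped, value: int) =
  d.store[astToStr(name)] = value

var obj = DynObj(store: initTable[string, int]())
obj.width = 3                  # rewritten to `.=`(obj, width, 3)
obj.height = 4
echo obj.width * obj.height    # prints 12
```

Accessing a name that was never assigned raises a KeyError at runtime, since the lookup is an ordinary Table access.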

Call operator

The call operator, (), matches all kinds of unresolved calls and takes precedence over dot operators; however, it does not match missing overloads for existing routines. The experimental callOperator switch must be enabled to use this operator.

{.experimental: "callOperator".}

template `()`(a: int, b: float): untyped = $(a, b)

block:
  let a = 1.0
  let b = 2
  doAssert b(a) == `()`(b, a)
  doAssert a.b == `()`(b, a)

block:
  let a = 1.0
  proc b(): int = 2
  doAssert not compiles(b(a))
  doAssert not compiles(a.b) # `()` not called

block:
  let a = 1.0
  proc b(x: float): int = int(x + 1)
  let c = 3.0
  
  doAssert not compiles(a.b(c)) # gives a type mismatch error same as b(a, c)
  doAssert (a.b)(c) == `()`(a.b, c)

Extended macro pragmas

Macro pragmas as described in the manual can also be applied to type, variable and constant declarations.

For types:

type
  MyObject {.schema: "schema.protobuf".} = object

This is translated into a call to the schema macro with an nnkTypeDef AST node capturing the left-hand side, the remaining pragmas and the right-hand side of the definition. The macro can return either a type section or another nnkTypeDef node, either of which replaces the original row in the type section.

In the future, this nnkTypeDef argument may be replaced with a unary type section node containing the type definition, or some other node that is more convenient to work with. The ability to return nodes other than type definitions may also be supported; however, this is currently not convenient when dealing with mutual type recursion. For now, macros can return an unused type definition where the right-hand node is of kind nnkStmtListType. Declarations in this node will be attached to the same scope as the parent scope of the type section.


For variables and constants, it is largely the same, except that a unary node of the same kind as the section, containing the single definition, is passed to the macro, and the macro can return any expression.

var
  a = ...
  b {.importc, foo, nodecl.} = ...
  c = ...

Assuming foo is a macro or a template, this is roughly equivalent to:

var a = ...
foo:
  var b {.importc, nodecl.} = ...
var c = ...

Symbols as template/macro calls

Templates and macros that take no arguments can be called as lone symbols, i.e. without parentheses. This is useful for repeated uses of complex expressions that cannot conveniently be represented as runtime values.

type Foo = object
  bar: int

var foo = Foo(bar: 10)
template bar: untyped = foo.bar
assert bar == 10
bar = 15
assert bar == 15

In the future, this may require more specific information on template or macro signatures to be used. Specializations for some applications of this may also be introduced to guarantee consistency and circumvent bugs.

Not nil annotation

Note: This is an experimental feature. It can be enabled with {.experimental: "notnil".}.

All types for which nil is a valid value can be annotated with the not nil annotation to exclude nil as a valid value:

{.experimental: "notnil".}

type
  TObj = object
  PObject = ref TObj not nil
  TProc = (proc (x, y: int)) not nil

proc p(x: PObject) =
  echo "not nil"

# compiler catches this:
p(nil)

# and also this:
var x: PObject
p(x)

The compiler ensures that every code path initializes variables which contain non-nilable pointers. The details of this analysis are still to be specified here.

Strict not nil checking

Note: This feature is experimental; you need to enable it with

{.experimental: "strictNotNil".}

or

nim c --experimental:strictNotNil <program>

In the second case, builtin and imported modules are checked as well.

It checks the nilability of ref-like types and makes dereferencing safer based on flow typing and not nil annotations.

Its implementation is different from the notnil one: it is defined under strictNotNil. Keep the difference between the two option names (notnil and strictNotNil) in mind and be careful to distinguish them.

We check several kinds of types for nilability:

  • ref types
  • pointer types
  • proc types
  • cstrings

nil

By default, these types are nilable: they can have the value nil. If you have a non-nilable type T, you can use T nil to get a nilable type for it.

not nil

You can annotate a type where nil isn't a valid value with not nil.

type
  NilableObject = ref object
    a: int
  Object = NilableObject not nil

  Proc = (proc (x, y: int))

proc p(x: Object) =
  echo x.a # ensured to dereference without an error
# compiler catches this:
p(nil)
# and also this:
var x: NilableObject
if x.isNil:
  p(x)
else:
  p(x) # ok

If a type can include nil as a valid value, dereferencing values of that type is checked by the compiler: if a value which might be nil is dereferenced, this produces a warning by default. You can turn this into an error using the compiler option --warningAsError:strictNotNil.

If a type is nilable, you should dereference its values only after an isNil or equivalent check.

local turn on/off

You can still turn off nil checking on the function/module level by using a {.strictNotNil: off.} pragma. Note: test that/TODO for code/manual.

nilability state

Currently, a nilable value can be Safe, MaybeNil or Nil. Internally we also use Parent and Unreachable, but this is an implementation detail (a parent layer has the actual nilability).

  • Safe means it shouldn't be nil at that point: e.g. after an assignment of a non-nil value or after a not a.isNil check.
  • MaybeNil means it might or might not be nil: e.g. an argument, a call argument or a value after an if and else.
  • Nil means it should be nil at that point: e.g. after an assignment of nil or a .isNil check.
  • Unreachable means it shouldn't be possible to access this in this branch, so we generate a warning as well.

We show an error for each dereference ([], .field, [index], () etc.) of a tracked expression that is in the MaybeNil or Nil state.

type nilability

Types are either nilable or non-nilable. When you pass a param or a default value, we use the type: for nilable types we return MaybeNil, and for non-nilable ones Safe.

TODO: fix the manual here. (This is not great, as default values for non-nilables and nilables are usually actually nil, so we should think a bit more about this section.)

params rules

A param's nilability is detected based on type nilability: we use the type of the argument to detect it.

assignment rules

Let's say we have left = right.

When we assign, we pass the right's nilability to the left's expression. Aliasing and compound expressions need special handling, which we specify in their own sections. (Assignment is a possible alias move or move out.)

call args rules

When we call with arguments, we have two cases when we might change the nilability.

callByVar(a)

Here callByVar can re-assign a, so this might change a's nilability, so we change it to MaybeNil. This is also a possible aliasing move out (moving out of a current alias set).

call(a)

Here call can change a field or element of a, so dependant expressions of a, e.g. a.field, become MaybeNil.

branches rules

Branches are the reason we do nil checking like this: with flow checking. Sources of branching are if, while, for, and, or, case, try and combinations with return, break, continue and raise.

We create a new layer/"scope" for each branch where we map expressions to nilability. This happens when we "fork": usually on the beginning of a construct. When branches "join" we usually unify their expression maps or/and nilabilities.

Merging usually merges maps and alias sets: nilabilities are merged like this:

template union(l: Nilability, r: Nilability): Nilability =
  ## unify two states
  if l == r:
    l
  else:
    MaybeNil

Special handling applies to .isNil and == nil, and also to not, and, and or.

The not operator reverses the nilability. The and operator is similar to "forking": the right expression is checked in the layer resulting from the left one. The or operator is similar to "merging": the left and right expressions should both be checked in the original layer.

isNil, == nil make expressions Nil. If there is a not or != nil, they make them Safe. We also reverse the nilability in the opposite branch: e.g. else.
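The merge and branch rules above can be modelled in a few lines. This is a simplified sketch, not the compiler's implementation: the Nilability enum here omits Parent and Unreachable, and reverse is a hypothetical helper standing for the state seen in the opposite branch of an isNil check.

```nim
type
  Nilability = enum
    Safe, MaybeNil, Nil     # subset of the states listed above

proc union(l, r: Nilability): Nilability =
  ## join rule: a state both branches agree on survives, anything else is MaybeNil
  if l == r: l else: MaybeNil

proc reverse(n: Nilability): Nilability =
  ## state of the checked expression in the opposite branch
  case n
  of Safe: Nil
  of Nil: Safe
  of MaybeNil: MaybeNil

# `if x.isNil: ... else: ...` where x starts out as MaybeNil
let thenBranch = Nil                   # x.isNil makes x Nil in the then-branch
let elseBranch = reverse(thenBranch)   # the else-branch sees x as Safe
echo elseBranch                        # prints Safe

# if the then-branch assigns a non-nil value, both branches end Safe
echo union(Safe, elseBranch)           # prints Safe
# without that assignment, the join is MaybeNil
echo union(thenBranch, elseBranch)     # prints MaybeNil
```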

compound expressions: field, index expressions

We also want to track field (dot) and index (bracket) expressions.

We track some of those compound expressions which might be nilable as dependants of their bases: a.field is changed if a is moved (re-assigned), similarly a[index] is dependent on a and a.field.field on a.field.

When we move the base, we update dependants to MaybeNil. Otherwise, we usually start with type nilability.

When we pass arguments to a call, we update the nilability of their dependants to MaybeNil, as calls can usually change them. We might need to check for strictFuncs pure funcs and skip this update for them.

For field expressions a.field, we calculate an integer value based on a hash of the tree and just accept equivalent trees as equivalent expressions.

For item expressions a[index], we also calculate an integer value based on a hash of the tree and accept equivalent trees as equivalent expressions, but for static values only. For now, we support only constant indices: we don't track expressions with non-constant indices. For those we just report a warning, even if they are safe; one can use a local variable to work around this. For loops this might be annoying, so one should be able to turn off the warning locally using {.warning[StrictNotNil]:off.}.

For bracket expressions, in the future we might count a[<any>] as the same general expression. This means we should ignore the index but otherwise handle it the same for assignments (maybe "aliasing" all the non-static elements) and differentiate only for static indices: e.g. a[0] and a[1].

element tracking

When we assign an object construction, we should track the fields as well:

var a = Nilable(field: Nilable()) # a : Safe, a.field: Safe

Usually we just track the result of an expression: probably this should apply for elements in other cases as well. Also related to tracking initialization of expressions/fields.

unstructured control flow rules

Unstructured control flow keywords such as return, break, continue and raise mean that we jump out of a branch. This means that code after the end of that branch only runs if the branch containing the jump was not taken, so it is similar to an else. In those cases, we should use the reversed nilabilities for the expressions local to the condition. E.g.

for a in c:
  if not a.isNil:
    b()
    break
  code # here a: Nil, because otherwise we would have broken out of the loop

aliasing

We support alias detection for local expressions.

We track sets of aliased expressions. We start with all nilable local expressions in separate sets. Assignments and other changes to nilability can move / move out expressions of sets.

move: Moving left to right means we remove left from its current set and unify it with the right's set. This means it stops being aliased with its previous aliases.

var left = b
left = right # moving left to right

move out: Moving out left might remove it from the current set and ensure that it's in its own set as a single element. e.g.

var left = b
left = nil # moving out
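A toy model of the move and move-out operations, assuming each tracked local expression is identified by its name and each alias set by an integer id. AliasSets, fresh, moveTo, moveOut and aliased are hypothetical names used only for illustration; this is not the compiler's data structure.

```nim
import std/tables

type
  AliasSets = object            # hypothetical model of the alias sets
    setOf: Table[string, int]   # tracked expression -> id of its alias set
    nextId: int

proc fresh(s: var AliasSets, e: string) =
  ## put `e` into a new set of its own
  s.setOf[e] = s.nextId
  inc s.nextId

proc moveTo(s: var AliasSets, left, right: string) =
  ## `left = right`: left leaves its old set and joins right's set
  s.setOf[left] = s.setOf[right]

proc moveOut(s: var AliasSets, left: string) =
  ## `left = nil`: left becomes the single element of a new set
  s.fresh(left)

proc aliased(s: AliasSets, a, b: string): bool =
  s.setOf[a] == s.setOf[b]

var s: AliasSets
for e in ["b", "right"]: s.fresh(e)   # locals start out in separate sets
s.moveTo("left", "b")                  # var left = b
echo s.aliased("left", "b")            # prints true
s.moveTo("left", "right")              # left = right
echo s.aliased("left", "b")            # prints false: no longer aliased with b
s.moveOut("left")                      # left = nil
echo s.aliased("left", "right")        # prints false
```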

initialization of non nilable and nilable values

TODO

warnings and errors

We show an error for each dereference ([], .field, [index], () etc.) of a tracked expression that is in the MaybeNil or Nil state.

We might also show a history of the transitions and the reasons for them that might change the nilability of the expression.

Aliasing restrictions in parameter passing

Note: The aliasing restrictions are currently not enforced by the implementation and need to be fleshed out further.

"Aliasing" here means that the underlying storage locations overlap in memory at runtime. An "output parameter" is a parameter of type var T, an input parameter is any parameter that is not of type var.

  1. Two output parameters should never be aliased.
  2. An input and an output parameter should not be aliased.
  3. An output parameter should never be aliased with a global or thread local variable referenced by the called proc.
  4. An input parameter should not be aliased with a global or thread local variable updated by the called proc.

One problem with rules 3 and 4 is that they affect specific global or thread local variables, but Nim's effect tracking only tracks "uses no global variable" via .noSideEffect. Rules 3 and 4 can also be approximated by a different rule:

  5. A global or thread local variable (or a location derived from such a location) can only be passed to a parameter of a .noSideEffect proc.
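Since these rules are not enforced, a violation compiles and silently changes behavior. The hypothetical transfer proc below is correct for distinct arguments but loses the amount when both output parameters refer to the same location, which rule 1 forbids.

```nim
proc transfer(src, dst: var int) =
  ## moves the whole amount from src to dst; assumes src and dst are distinct
  dst += src
  src = 0

var a = 5
var b = 1
transfer(a, b)
echo a, " ", b    # prints 0 6

var c = 5
transfer(c, c)    # two aliased output parameters: violates rule 1
echo c            # prints 0: dst += src gives 10, then src = 0 clears it
```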

Strict funcs

Since version 1.4, a stricter definition of "side effect" is available. In addition to the existing rule that a side effect is calling a function with side effects, the following rule is also enforced:

Any mutation to an object does count as a side effect if that object is reachable via a parameter that is not declared as a var parameter.

For example:

{.experimental: "strictFuncs".}

type
  Node = ref object
    le, ri: Node
    data: string

func len(n: Node): int =
  # valid: len does not have side effects
  var it = n
  while it != nil:
    inc result
    it = it.ri

func mut(n: Node) =
  let m = n # is the statement that connected the mutation to the parameter
  m.data = "yeah" # the mutation is here
  # Error: 'mut' can have side effects
  # an object reachable from 'n' is potentially mutated

The algorithm behind this analysis is described in the view types algorithm.
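For contrast, a sketch of mutations the rule still allows, reusing the Node type from above: a func may mutate an object reached through a var parameter, and an ordinary proc is not subject to the restriction at all. The names setDefault and setData are hypothetical.

```nim
{.experimental: "strictFuncs".}

type
  Node = ref object
    le, ri: Node
    data: string

func setDefault(n: var Node) =
  # accepted: the mutated object is reached through a `var` parameter
  n.data = "default"

proc setData(n: Node, s: string) =
  # an ordinary proc may mutate through a non-var parameter
  n.data = s

var n = Node(data: "start")
setDefault(n)
echo n.data      # prints default
setData(n, "changed")
echo n.data      # prints changed
```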

View types

Tip: --experimental:views is more effective with --experimental:strictFuncs.

A view type is a type that is or contains one of the following types:

  • lent T (view into T)
  • openArray[T] (pair of (pointer to array of T, size))

For example:

type
  View1 = openArray[byte]
  View2 = lent string
  View3 = Table[openArray[char], int]

Exceptions to this rule are types constructed via ptr or proc. For example, the following types are not view types:

type
  NotView1 = proc (x: openArray[int])
  NotView2 = ptr openArray[char]
  NotView3 = ptr array[4, lent int]

The mutability aspect of a view type is not part of the type but part of the locations it's derived from. More on this later.

A view is a symbol (a let, var, const, etc.) that has a view type.

Since version 1.4, Nim allows view types to be used as local variables. This feature needs to be enabled via {.experimental: "views".}.

A local variable of a view type borrows from the locations and it is statically enforced that the view does not outlive the location it was borrowed from.

For example:

{.experimental: "views".}

proc take(a: openArray[int]) =
  echo a.len

proc main(s: seq[int]) =
  var x: openArray[int] = s # 'x' is a view into 's'
  # it is checked that 'x' does not outlive 's' and
  # that 's' is not mutated.
  for i in 0 .. high(x):
    echo x[i]
  take(x)
  
  take(x.toOpenArray(0, 1)) # slicing remains possible
  let y = x  # create a view from a view
  take y
  # it is checked that 'y' does not outlive 'x' and
  # that 'x' is not mutated as long as 'y' lives.


main(@[11, 22, 33])

A local variable of a view type can borrow from a location derived from a parameter, another local variable, a global const or let symbol or a thread-local var or let.

Let p be the proc that is analysed for the correctness of the borrow operation.

Let source be one of:

  • A formal parameter of p. Note that this does not cover parameters of inner procs.
  • The result symbol of p.
  • A local var or let or const of p. Note that this does not cover locals of inner procs.
  • A thread-local var or let.
  • A global let or const.
  • A constant array/seq/object/tuple constructor.

Path expressions

A location derived from source is then defined as a path expression that has source as the owner. A path expression e is defined recursively:

  • source itself is a path expression.
  • Container access like e[i] is a path expression.
  • Tuple access e[0] is a path expression.
  • Object field access e.field is a path expression.
  • system.toOpenArray(e, ...) is a path expression.
  • Pointer dereference e[] is a path expression.
  • An address addr e is a path expression.
  • A type conversion T(e) is a path expression.
  • A cast expression cast[T](e) is a path expression.
  • f(e, ...) is a path expression if f's return type is a view type. Because the view can only have been borrowed from e, we then know that the owner of f(e, ...) is e.

If a view type is used as a return type, the location must borrow from a location that is derived from the first parameter that is passed to the proc. See the manual for details about how this is done for var T.
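A sketch of view types used as return types. Returning lent T does not need the views switch; both results below are derived from the first parameter s through path expressions (s[0], and s[1].field). Obj, first and secondField are illustrative names.

```nim
type
  Obj = object
    field: string

proc first(s: seq[Obj]): lent Obj =
  # container access s[0]: the result borrows from the first parameter
  s[0]

proc secondField(s: seq[Obj]): lent string =
  # container access followed by field access: s[1].field
  s[1].field

let objs = @[Obj(field: "a"), Obj(field: "b")]
echo first(objs).field   # prints a
echo secondField(objs)   # prints b
```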

A mutable view can borrow from a mutable location; an immutable view can borrow from either a mutable or an immutable location.

If a view borrows from a mutable location, the view can be used to update the location. Otherwise it cannot be used for mutations.

The duration of a borrow is the span of commands beginning from the assignment to the view and ending with the last usage of the view.

For the duration of the borrow operation, no mutations to the borrowed locations may be performed except via the view that borrowed from the location. The borrowed location is said to be sealed during the borrow.

{.experimental: "views".}

type
  Obj = object
    field: string

proc dangerous(s: var seq[Obj]) =
  let v: lent Obj = s[0] # seal 's'
  s.setLen 0  # prevented at compile-time because 's' is sealed.
  echo v.field

The scope of the view does not matter:

proc valid(s: var seq[Obj]) =
  let v: lent Obj = s[0]  # begin of borrow
  echo v.field            # end of borrow
  s.setLen 0  # valid because 'v' isn't used afterwards

The analysis requires as much precision about mutations as is reasonably obtainable, so it is more effective with the experimental strict funcs feature. In other words --experimental:views works better with --experimental:strictFuncs.

The analysis is currently control flow insensitive:

proc invalid(s: var seq[Obj]) =
  let v: lent Obj = s[0]
  if false:
    s.setLen 0
  echo v.field

In this example, the compiler assumes that s.setLen 0 invalidates the borrow operation of v even though a human being can easily see that it will never do that at runtime.

Start of a borrow

A borrow starts with one of the following:

  • The assignment of a non-view-type to a view-type.
  • The assignment of a location that is derived from a local parameter to a view-type.

End of a borrow

A borrow operation ends with the last usage of the view variable.

Reborrows

A view v can borrow from multiple different locations. However, the borrow is always the full span of v's lifetime and every location that is borrowed from is sealed during v's lifetime.

Algorithm

The following section is an outline of the algorithm that the current implementation uses. The algorithm performs two traversals over the AST of the procedure or global section of code that uses a view variable. No fixpoint iterations are performed; the complexity of the analysis is O(N), where N is the number of nodes of the AST.

The first pass over the AST computes the lifetime of each local variable based on a notion of an "abstract time", in the implementation it's a simple integer that is incremented for every visited node.

In the second pass, information about the underlying object "graphs" is computed. Let v be a parameter or a local variable. Let G(v) be the graph that v belongs to. A graph is defined by the set of variables that belong to the graph. Initially for all v: G(v) = {v}. Every variable can only be part of a single graph.

Assignments like a = b "connect" two variables, both variables end up in the same graph {a, b} = G(a) = G(b). Unfortunately, the pattern to look for is much more complex than that and can involve multiple assignment targets and sources:

f(x, y) = g(a, b)

connects x and y to a and b: G(x) = G(y) = G(a) = G(b) = {x, y, a, b}. A type based alias analysis rules out some of these combinations, for example a string value cannot possibly be connected to a seq[int].

A pattern like v[] = value or v.field = value marks G(v) as mutated. After the second pass, a set of disjoint graphs has been computed.

For strict functions it is then enforced that there is no graph that is both mutated and has an element that is an immutable parameter (that is a parameter that is not of type var T).
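The graph computation and this check can be sketched with a union-find structure over variable names. This is an illustrative model under that assumption, not the compiler's actual code: Graphs, connect, markMutated and isMutated are hypothetical names, and only the simple a = b pattern is handled.

```nim
import std/tables

type
  Graphs = object                   # hypothetical model of the graphs G(v)
    parent: Table[string, string]   # union-find forest over variable names
    mutated: Table[string, bool]    # mutation flag, stored at each root

proc root(g: var Graphs, v: string): string =
  ## representative of G(v); an unseen variable starts with G(v) = {v}
  if v notin g.parent:
    g.parent[v] = v
  result = v
  while g.parent[result] != result:
    result = g.parent[result]

proc connect(g: var Graphs, a, b: string) =
  ## `a = b`: afterwards G(a) = G(b), keeping any mutation flag
  let ra = g.root(a)
  let rb = g.root(b)
  if ra != rb:
    g.parent[ra] = rb
    g.mutated[rb] = g.mutated.getOrDefault(ra) or g.mutated.getOrDefault(rb)

proc markMutated(g: var Graphs, v: string) =
  ## `v[] = value` or `v.field = value` marks G(v) as mutated
  g.mutated[g.root(v)] = true

proc isMutated(g: var Graphs, v: string): bool =
  g.mutated.getOrDefault(g.root(v))

# body of `mut` from the strict funcs example
var g: Graphs
g.connect("m", "n")       # let m = n
g.markMutated("m")        # m.data = "yeah"
echo g.isMutated("n")     # prints true: n is a non-var parameter -> error

# body of `len`: the walk over n never mutates anything
var h: Graphs
h.connect("it", "n")      # var it = n
echo h.isMutated("n")     # prints false
```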

For borrow checking, a different set of checks is performed. Let v be the view and b the location that is borrowed from.

  • The lifetime of v must not exceed b's lifetime. Note: The lifetime of a parameter is the complete proc body.
  • If v is used for a mutation, b must be a mutable location too.
  • During v's lifetime, G(b) can only be modified by v (and only if v is a mutable view).
  • If v is result then b has to be a location derived from the first formal parameter or from a constant location.
  • A view cannot be used for a read or a write access before it was assigned to.

Concepts

Concepts, also known as "user-defined type classes", are used to specify an arbitrary set of requirements that the matched type must satisfy.

Concepts are written in the following form:

type
  Comparable = concept x, y
    (x < y) is bool
  
  Stack[T] = concept s, var v
    s.pop() is T
    v.push(T)
    
    s.len is Ordinal
    
    for value in s:
      value is T

The concept matches if:

  1. all expressions within the body can be compiled for the tested type
  2. all statically evaluable boolean expressions in the body are true
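A minimal sketch of both conditions in use: smallest is a hypothetical generic proc constrained by the Comparable concept shown above, and Opaque is a hypothetical type without a < operator, so the concept body does not compile for it and the concept does not match.

```nim
type
  Comparable = concept x, y
    (x < y) is bool

proc smallest[T: Comparable](a: openArray[T]): T =
  ## works for any element type whose `<` returns bool
  result = a[0]
  for x in a:
    if x < result:
      result = x

type Opaque = object   # hypothetical type with no `<` operator

echo smallest([3, 1, 2])         # prints 1
echo smallest(["b", "c", "a"])   # prints a
# condition 1 fails for Opaque, so the call is rejected
doAssert not compiles(smallest([Opaque(), Opaque()]))
```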

The identifiers following the concept keyword represent instances of the currently matched type. You can apply any of the standard type modifiers such as var, ref, ptr and static to denote a more specific type of instance. You can also apply the type modifier to create a named instance of the type itself:

type
  MyConcept = concept x, var v, ref r, ptr p, static s, type T
    ...

Within the concept body, types can appear in positions where ordinary values and parameters are expected. This provides a more convenient way to check for the presence of callable symbols with specific signatures:

type
  OutputStream = concept var s
    s.write(string)

In order to check for symbols accepting type params, you must prefix the type with the explicit type modifier. The named instance of the type, following the concept keyword is also considered to have the explicit modifier and will be matched only as a type.

type
  # Let's imagine a user-defined casting framework with operators
  # such as `val.to(string)` and `val.to(JSonValue)`. We can test
  # for these with the following concept:
  MyCastables = concept x
    x.to(type string)
    x.to(type JSonValue)
  
  # Let's define a couple of concepts, known from Algebra:
  AdditiveMonoid* = concept x, y, type T
    x + y is T
    T.zero is T # require a proc such as `int.zero` or `Position.zero`
  
  AdditiveGroup* = concept x, y, type T
    x is AdditiveMonoid
    -x is T
    x - y is T

Please note that the is operator allows one to easily verify the precise type signatures of the required operations, but since type inference and default parameters are still applied in the concept body, it's also possible to describe usage protocols that do not reveal implementation details.

Much like generics, concepts are instantiated exactly once for each tested type and any static code included within the body is executed only once.

Concept diagnostics

By default, the compiler will report the matching errors in concepts only when no other overload can be selected and a normal compilation error is produced. When you need to understand why the compiler is not matching a particular concept and, as a result, a wrong overload is selected, you can apply the explain pragma to either the concept body or a particular call-site.

type
  MyConcept {.explain.} = concept ...

overloadedProc(x, y, z) {.explain.}

This will provide hints in the compiler output either every time the concept is not matched or only on the particular call-site.

Generic concepts and type binding rules

The concept types can be parametric just like the regular generic types:

### matrixalgo.nim

import std/typetraits

type
  AnyMatrix*[R, C: static int; T] = concept m, var mvar, type M
    M.ValueType is T
    M.Rows == R
    M.Cols == C
    
    m[int, int] is T
    mvar[int, int] = T
    
    type TransposedType = stripGenericParams(M)[C, R, T]
  
  AnySquareMatrix*[N: static int, T] = AnyMatrix[N, N, T]
  
  AnyTransform3D* = AnyMatrix[4, 4, float]

proc transposed*(m: AnyMatrix): m.TransposedType =
  for r in 0 ..< m.R:
    for c in 0 ..< m.C:
      result[r, c] = m[c, r]

proc determinant*(m: AnySquareMatrix): int =
  ...

proc setPerspectiveProjection*(m: AnyTransform3D) =
  ...

--------------
### matrix.nim

type
  Matrix*[M, N: static int; T] = object
    data: array[M*N, T]

proc `[]`*(M: Matrix; m, n: int): M.T =
  M.data[m * M.N + n]

proc `[]=`*(M: var Matrix; m, n: int; v: M.T) =
  M.data[m * M.N + n] = v

# Adapt the Matrix type to the concept's requirements
template Rows*(M: typedesc[Matrix]): int = M.M
template Cols*(M: typedesc[Matrix]): int = M.N
template ValueType*(M: typedesc[Matrix]): typedesc = M.T

-------------
### usage.nim

import matrix, matrixalgo

var
  m: Matrix[3, 3, int]
  projectionMatrix: Matrix[4, 4, float]

echo m.transposed.determinant
setPerspectiveProjection projectionMatrix

When the concept type is matched against a concrete type, the unbound type parameters are inferred from the body of the concept in a way that closely resembles the way generic parameters of callable symbols are inferred on call sites.

Unbound types can appear both as params to calls such as s.push(T) and on the right-hand side of the is operator in cases such as x.pop is T and x.data is seq[T].

Unbound static params will be inferred from expressions involving the == operator and also when types dependent on them are being matched:

type
  MatrixReducer[M, N: static int; T] = concept x
    x.reduce(SquareMatrix[N, T]) is array[M, int]

The Nim compiler includes a simple linear equation solver, allowing it to infer static params in some situations where integer arithmetic is involved.

Just like in regular type classes, Nim discriminates between bind once and bind many types when matching the concept. You can add the distinct modifier to any of the otherwise inferable types to get a type that will be matched without permanently inferring it. This may be useful when you need to match several procs accepting the same wide class of types:

type
  Enumerable[T] = concept e
    for v in e:
      v is T

type
  MyConcept = concept o
    # this could be inferred to a type such as Enumerable[int]
    o.foo is distinct Enumerable
    
    # this could be inferred to a different type such as Enumerable[float]
    o.bar is distinct Enumerable
    
    # it's also possible to give an alias name to a `bind many` type class
    type Enum = distinct Enumerable
    o.baz is Enum

On the other hand, using bind once types allows you to test for equivalent types used in multiple signatures, without actually requiring any concrete types, thus allowing you to encode implementation-defined types:

type
  MyConcept = concept x
    type T1 = auto
    x.foo(T1)
    x.bar(T1) # both procs must accept the same type
    
    type T2 = seq[SomeNumber]
    x.alpha(T2)
    x.omega(T2) # both procs must accept the same type
                # and it must be a numeric sequence

As seen in the previous examples, you can refer to generic concepts such as Enumerable[T] just by their short name. Much like the regular generic types, the concept will be automatically instantiated with the bind once auto type in the place of each missing generic param.
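Generic concepts can also be instantiated explicitly. The following sketch reuses the Enumerable[T] concept with T fixed to int, so the hypothetical total proc accepts any container whose items yield int values, such as a seq or an array.

```nim
type
  # Matches any type that can be iterated with `for` and yields T values.
  Enumerable[T] = concept e
    for v in e:
      v is T

proc total(xs: Enumerable[int]): int =
  ## Sums every element of an int-yielding container.
  for x in xs:
    result += x

echo total(@[1, 2, 3])   # a seq[int] matches; prints 6
echo total([4, 5])       # so does an array of int; prints 9
```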

Please note that generic concepts such as Enumerable[T] can be matched against concrete types such as string. Nim doesn't require the concept type to have the same number of parameters as the type being matched. If you wish to express a requirement towards the generic parameters of the matched type, you can use a type mapping operator such as genericHead or stripGenericParams within the body of the concept to obtain the uninstantiated version of the type, which you can then try to instantiate in any required way. For example, here is how one might define the classic Functor concept from Haskell and then demonstrate that Nim's Option[T] type is an instance of it:

import std/[sugar, typetraits]

type
  Functor[A] = concept f
    type MatchedGenericType = genericHead(typeof(f))
      # `f` will be a value of a type such as `Option[T]`
      # `MatchedGenericType` will become the `Option` type
    
    f.val is A
      # The Functor should provide a way to obtain
      # a value stored inside it
    
    type T = auto
    map(f, A -> T) is MatchedGenericType[T]
      # And it should provide a way to map one instance of
      # the Functor to an instance of a different type, given
      # a suitable `map` operation for the enclosed values

import std/options
echo Option[int] is Functor # prints true

Concept derived values

All top-level constants or types appearing within the concept body are accessible through the dot operator in procs where the concept was successfully matched to a concrete type:

type
  DateTime = concept t1, t2, type T
    const Min = T.MinDate
    T.Now is T
    
    t1 < t2 is bool
    
    type TimeSpan = typeof(t1 - t2)
    TimeSpan * int is TimeSpan
    TimeSpan + TimeSpan is TimeSpan
    
    t1 + TimeSpan is T

proc eventsJitter(events: Enumerable[DateTime]): float =
  var
    # this variable will have the inferred TimeSpan type for
    # the concrete Date-like value the proc was called with:
    averageInterval: DateTime.TimeSpan
    
    deviation: float
  ...

Concept refinement

When the matched type within a concept is directly tested against a different concept, we say that the outer concept is a refinement of the inner concept and thus it is more specific. When both concepts are matched in a call during overload resolution, Nim will assign a higher precedence to the most specific one. As an alternative way of defining concept refinements, you can use the object inheritance syntax involving the of keyword:

type
  Graph = concept g, type G of EquallyComparable, Copyable
    type
      VertexType = G.VertexType
      EdgeType = G.EdgeType
    
    VertexType is Copyable
    EdgeType is Copyable
    
    var
      v: VertexType
      e: EdgeType
  
  IncidenceGraph = concept of Graph
    # symbols such as variables and types from the refined
    # concept are automatically in scope:
    
    g.source(e) is VertexType
    g.target(e) is VertexType
    
    g.outgoingEdges(v) is Enumerable[EdgeType]
  
  BidirectionalGraph = concept g, type G
    # The following will also turn the concept into a refinement when it
    # comes to overload resolution, but it doesn't provide the convenient
    # symbol inheritance
    g is IncidenceGraph
    
    g.incomingEdges(G.VertexType) is Enumerable[G.EdgeType]

proc f(g: IncidenceGraph)
proc f(g: BidirectionalGraph) # this one will be preferred if we pass a type
                              # matching the BidirectionalGraph concept

Dynamic arguments for bindSym

This experimental feature allows the symbol name argument of macros.bindSym to be computed dynamically.

{.experimental: "dynamicBindSym".}

import macros

macro callOp(opName, arg1, arg2): untyped =
  result = newCall(bindSym($opName), arg1, arg2)

echo callOp("+", 1, 2)
echo callOp("-", 5, 4)

Term rewriting macros

Term rewriting macros are macros or templates that have not only a name but also a pattern that is searched for after the semantic checking phase of the compiler. This means they provide an easy way to enhance the compilation pipeline with user-defined optimizations:

template optMul{`*`(a, 2)}(a: int): int = a + a

let x = 3
echo x * 2

The compiler now rewrites x * 2 as x + x. The code inside the curly brackets is the pattern to match against. The operators *, **, |, ~ have a special meaning in patterns if they are written in infix notation, so to match verbatim against * the ordinary function call syntax needs to be used.

Term rewriting macros are applied recursively, up to a limit. This means that if the result of a term rewriting macro is eligible for another rewriting, the compiler will try to perform it, and so on, until no more optimizations are applicable. To avoid putting the compiler into an infinite loop, there is a hard limit on how many times a single term rewriting macro can be applied. Once this limit has been passed, the term rewriting macro will be ignored.

Unfortunately, optimizations are hard to get right, and even this tiny example is wrong:

template optMul{`*`(a, 2)}(a: int): int = a + a

proc f(): int =
  echo "side effect!"
  result = 55

echo f() * 2

We cannot duplicate 'a' if it denotes an expression that has a side effect! Fortunately Nim supports side effect analysis:

template optMul{`*`(a, 2)}(a: int{noSideEffect}): int = a + a

proc f(): int =
  echo "side effect!"
  result = 55

echo f() * 2 # not optimized ;-)
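The effect of the noSideEffect constraint can be observed with a call counter. In this sketch, calls and f are hypothetical names; f mutates a global variable, so it is not side-effect free and the rewrite must not duplicate it. The results are the same whether or not term rewriting is enabled, which is exactly the property a correct rewrite has to preserve.

```nim
var calls = 0

proc f(): int =
  inc calls          # mutating a global makes f side-effecting
  result = 55

# Rewrite `a * 2` into `a + a`, but only when evaluating `a` twice is harmless.
template optMul{`*`(a, 2)}(a: int{noSideEffect}): int = a + a

let x = 3
echo x * 2           # eligible for the rewrite; prints 6 either way
echo f() * 2         # not rewritten, f() runs exactly once; prints 110
echo calls           # prints 1
```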

You can provide one overload with a constraint and one without; the constrained overload takes precedence, so both cases can be handled differently.

So what about 2 * a? We should tell the compiler that * is commutative. We cannot really do that, however, as the following code only swaps arguments blindly:

template mulIsCommutative{`*`(a, b)}(a, b: int): int = b * a

What optimizers really need to do is a canonicalization:

template canonMul{`*`(a, b)}(a: int{lit}, b: int): int = b * a

The int{lit} parameter pattern matches against an expression of type int, but only if it's a literal.

Parameter constraints

The parameter constraint expression can use the operators | (or), & (and) and ~ (not) and the following predicates:

Predicate       Meaning
atom            The matching node has no children.
lit             The matching node is a literal like "abc", 12.
sym             The matching node must be a symbol (a bound identifier).
ident           The matching node must be an identifier (an unbound identifier).
call            The matching AST must be a call/apply expression.
lvalue          The matching AST must be an lvalue.
sideeffect      The matching AST must have a side effect.
nosideeffect    The matching AST must have no side effect.
param           A symbol which is a parameter.
genericparam    A symbol which is a generic parameter.
module          A symbol which is a module.
type            A symbol which is a type.
var             A symbol which is a variable.
let             A symbol which is a let variable.
const           A symbol which is a constant.
result          The special result variable.
proc            A symbol which is a proc.
method          A symbol which is a method.
iterator        A symbol which is an iterator.
converter       A symbol which is a converter.
macro           A symbol which is a macro.
template        A symbol which is a template.
field           A symbol which is a field in a tuple or an object.
enumfield       A symbol which is a field in an enumeration.
forvar          A for loop variable.
label           A label (used in block statements).
nk*             The matching AST must have the specified kind. (Example: nkIfStmt denotes an if statement.)
alias           States that the marked parameter needs to alias with some other parameter.
noalias         States that every other parameter must not alias with the marked parameter.

Predicates that share their name with a keyword have to be escaped with backticks. The alias and noalias predicates refer not only to the matching AST, but also to every other bound parameter; syntactically they need to occur after the ordinary AST predicates:

template ex{a = b + c}(a: int{noalias}, b, c: int) =
  # this transformation is only valid if 'b' and 'c' do not alias 'a':
  a = b
  inc a, c

Another example:

proc somefunc(s: string)                 = assert s == "variable"
proc somefunc(s: string{nkStrLit})       = assert s == "literal"
proc somefunc(s: string{nkRStrLit})      = assert s == r"raw"
proc somefunc(s: string{nkTripleStrLit}) = assert s == """triple"""
proc somefunc(s: static[string])         = assert s == "constant"

# Use parameter constraints to provide overloads based on both the input parameter type and form.
var variable = "variable"
somefunc(variable)
const constant = "constant"
somefunc(constant)
somefunc("literal")
somefunc(r"raw")
somefunc("""triple""")

Pattern operators

The operators *, **, |, ~ have a special meaning in patterns if they are written in infix notation.

The | operator

When used as an infix operator, | creates an ordered choice:

template t{0|1}(): untyped = 3
let a = 1
# outputs 3:
echo a

The matching is performed after the compiler has performed some optimizations, such as constant folding, so the following does not work:

template t{0|1}(): untyped = 3
# outputs 1:
echo 1

The reason is that the compiler has already transformed the 1 into "1" for the echo statement. However, a term rewriting macro should not change the semantics anyway. In fact, term rewriting macros can be deactivated with the --patterns:off command line option or temporarily with the patterns pragma.

The {} operator

A pattern expression can be bound to a pattern parameter via the expr{param} notation:

template t{(0|1|2){x}}(x: untyped): untyped = x + 1
let a = 1
# outputs 2:
echo a

The ~ operator

The ~ operator is the 'not' operator in patterns:

template t{x = (~x){y} and (~x){z}}(x, y, z: bool) =
  x = y
  if x: x = z

var
  a = false
  b = true
  c = false
a = b and c
echo a

The * operator

The * operator can flatten a nested binary expression like a & b & c to &(a, b, c):

var
  calls = 0

proc `&&`(s: varargs[string]): string =
  result = s[0]
  for i in 1..len(s)-1: result.add s[i]
  inc calls

template optConc{ `&&` * a }(a: string): untyped = &&a

let space = " "
echo "my" && (space & "awe" && "some " ) && "concat"

# check that it's been optimized properly:
doAssert calls == 1

The second operand of * must be a parameter; it is used to gather all the arguments. The expression "my" && (space & "awe" && "some " ) && "concat" is passed to optConc in a as a special list (of kind nkArgList) which is flattened into a call expression; thus the invocation of optConc produces:

`&&`("my", space & "awe", "some ", "concat")

The ** operator

The ** operator is much like the * operator, except that it gathers not only all the arguments, but also the matched operators in reverse polish notation:

import std/macros

type
  Matrix = object
    dummy: int

proc `*`(a, b: Matrix): Matrix = discard
proc `+`(a, b: Matrix): Matrix = discard
proc `-`(a, b: Matrix): Matrix = discard
proc `$`(a: Matrix): string = result = $a.dummy
proc mat21(): Matrix =
  result.dummy = 21

macro optM{ (`+`|`-`|`*`) ** a }(a: Matrix): untyped =
  echo treeRepr(a)
  result = newCall(bindSym"mat21")

var x, y, z: Matrix

echo x + y * z - x

This passes the expression x + y * z - x to the optM macro as an nnkArgList node containing:

Arglist
  Sym "x"
  Sym "y"
  Sym "z"
  Sym "*"
  Sym "+"
  Sym "x"
  Sym "-"

(This is the reverse polish notation of x + y * z - x.)

Parameters

Parameters in a pattern are type checked in the matching process. If a parameter is of the type varargs, it is treated specially and can match 0 or more arguments in the AST to be matched against:

template optWrite{
  write(f, x)
  ((write|writeLine){w})(f, y)
}(x, y: varargs[untyped], f: File, w: untyped) =
  w(f, x, y)

noRewrite pragma

Term rewriting macros and templates are currently greedy: they keep rewriting as long as there is a match. Without the noRewrite pragma, there is no way to ensure that a rewrite happens only once, e.g. when rewriting a term to the same term plus extra content.

The noRewrite pragma prevents further rewriting of the marked code. With the following example, echo("ab") is rewritten just once:

template pwnEcho{echo(x)}(x: untyped) =
  {.noRewrite.}: echo("pwned!")

echo "ab"

The noRewrite pragma can be useful to control the recursion of term rewriting macros.

Example: Partial evaluation

The following example shows how some simple partial evaluation can be implemented with term rewriting:

proc p(x, y: int; cond: bool): int =
  result = if cond: x + y else: x - y

template optP1{p(x, y, true)}(x, y: untyped): untyped = x + y
template optP2{p(x, y, false)}(x, y: untyped): untyped = x - y

Example: Hoisting

The following example shows how some form of hoisting can be implemented:

import std/pegs

template optPeg{peg(pattern)}(pattern: string{lit}): Peg =
  var gl {.global, gensym.} = peg(pattern)
  gl

for i in 0 .. 3:
  echo match("(a b c)", peg"'(' @ ')'")
  echo match("W_HI_Le", peg"\y 'while'")

The optPeg template optimizes the case of a peg constructor with a string literal, so that the pattern will only be parsed once at program startup and stored in a global gl which is then re-used. This optimization is called hoisting because it is comparable to classical loop hoisting.

AST based overloading

Parameter constraints can also be used for ordinary routine parameters; these constraints then affect ordinary overloading resolution:

proc optLit(a: string{lit|`const`}) =
  echo "string literal"
proc optLit(a: string) =
  echo "no string literal"

const
  constant = "abc"

var
  variable = "xyz"

optLit("literal")
optLit(constant)
optLit(variable)
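To make the chosen overload observable, the sketch below returns a string instead of echoing. describe, greeting and name are hypothetical names; the expected results assume, as the constraint states, that string literals and named constants select the constrained overload while a variable falls back to the unconstrained one.

```nim
proc describe(a: string{lit|`const`}): string =
  ## Chosen for string literals and named constants.
  "literal or constant"

proc describe(a: string): string =
  ## Fallback for any other string expression.
  "runtime value"

const greeting = "hi"
var name = "nim"

echo describe("abc")     # constrained overload
echo describe(greeting)  # constrained overload
echo describe(name)      # unconstrained overload
```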

However, the constraints alias and noalias are not available in ordinary routines.

Parallel & Spawn

Nim has two flavors of parallelism:

  1. Structured parallelism via the parallel statement.
  2. Unstructured parallelism via the standalone spawn statement.

Nim has a built-in thread pool that can be used for CPU-intensive tasks. For IO-intensive tasks, the async and await features should be used instead. Both parallel and spawn need the threadpool module to work.

Somewhat confusingly, spawn is also used in the parallel statement with slightly different semantics. spawn always takes a call expression of the form f(a, ...). Let T be f's return type. If T is void, then spawn's return type is also void, otherwise it is FlowVar[T].

Within a parallel section, the FlowVar[T] is sometimes eliminated to T. This happens when T does not contain any GC'ed memory. The compiler can ensure the location in location = spawn f(...) is not read prematurely within a parallel section and so there is no need for the overhead of an indirection via FlowVar[T] to ensure correctness.

Note: Currently exceptions are not propagated between spawn'ed tasks!

This feature is likely to be removed in the future as external packages can have better solutions.

Spawn statement

The spawn statement can be used to pass a task to the thread pool:

import std/threadpool

proc processLine(line: string) =
  discard "do some heavy lifting here"

for x in lines("myinput.txt"):
  spawn processLine(x)
sync()

For reasons of type safety and implementation simplicity the expression that spawn takes is restricted:

  • It must be a call expression f(a, ...).
  • f must be gcsafe.
  • f must not have the calling convention closure.
  • f's parameters may not be of type var. This means one has to use raw ptrs for data passing, which reminds the programmer to be careful.
  • ref parameters are deeply copied, which is a subtle semantic change and can cause performance problems, but ensures memory safety. This deep copy is performed via system.deepCopy, so it can be overridden.
  • For safe data exchange between f and the caller, a global Channel needs to be used. However, since spawn can return a result, often no further communication is required.

spawn executes the passed expression on the thread pool and returns a data flow variable FlowVar[T] that can be read from. The reading with the ^ operator is blocking. However, one can use blockUntilAny to wait on multiple flow variables at the same time:

import std/threadpool, ...

# wait until 2 out of 3 servers received the update:
proc main =
  var responses = newSeq[FlowVarBase](3)
  for i in 0..2:
    responses[i] = spawn tellServer(Update, "key", "value")
  var index = blockUntilAny(responses)
  assert index >= 0
  responses.del(index)
  discard blockUntilAny(responses)

Data flow variables ensure that no data races are possible. Due to technical limitations, not every type T can be used in a data flow variable: T has to be a ref, string, seq or of a type that doesn't contain any GC'd type. This restriction is not hard to work around in practice.
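A minimal sketch of reading a data flow variable: spawn returns a FlowVar[int] because the hypothetical helper square returns int, and ^ blocks until the value is available. The program needs thread support (--threads:on, the default since Nim 2.0), and newer Nim versions may emit a deprecation warning for std/threadpool.

```nim
import std/threadpool

proc square(x: int): int =
  ## gcsafe, no var parameters and not a closure, so it may be spawned.
  x * x

let fv = spawn square(7)   # fv has type FlowVar[int]
echo ^fv                   # blocks until the task has finished; prints 49
```

Each FlowVar in this sketch is read only once; a second task is spawned when another result is needed.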

Parallel statement

Example:

# Compute pi in an inefficient way
import std/[strutils, math, threadpool]
{.experimental: "parallel".}

proc term(k: float): float = 4 * math.pow(-1, k) / (2*k + 1)

proc pi(n: int): float =
  var ch = newSeq[float](n + 1)
  parallel:
    for k in 0..ch.high:
      ch[k] = spawn term(float(k))
  for k in 0..ch.high:
    result += ch[k]

echo formatFloat(pi(5000))

The parallel statement is the preferred mechanism to introduce parallelism in a Nim program. Only a subset of the Nim language is valid within a parallel section. This subset is checked during semantic analysis to be free of data races. A sophisticated disjoint checker ensures that no data races are possible, even though shared memory is extensively supported!

The subset is in fact the full language with the following restrictions / changes:

  • spawn within a parallel section has special semantics.
  • Every location of the form a[i], a[i..j] and dest where dest is part of the pattern dest = spawn f(...) has to be provably disjoint. This is called the disjoint check.
  • Every other complex location loc that is used in a spawned proc (spawn f(loc)) has to be immutable for the duration of the parallel section. This is called the immutability check. Currently it is not specified what exactly "complex location" means. We need to make this an optimization!
  • Every array access has to be provably within bounds. This is called the bounds check.
  • Slices are optimized so that no copy is performed. This optimization is not yet performed for ordinary slices outside of a parallel section.

Strict definitions and out parameters

With experimental: "strictDefs" every local variable must be initialized explicitly before it can be used:

{.experimental: "strictDefs".}

proc test =
  var s: seq[string]
  s.add "abc" # invalid!

Needs to be written as:

{.experimental: "strictDefs".}

proc test =
  var s: seq[string] = @[]
  s.add "abc" # valid!

A control flow analysis is performed in order to prove that a variable has been written to before it is used. Thus the following is valid:

{.experimental: "strictDefs".}

proc test(cond: bool) =
  var s: seq[string]
  if cond:
    s = @["y"]
  else:
    s = @[]
  s.add "abc" # valid!

In this example every path does set s to a value before it is used.
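The control flow analysis also covers case statements. In this sketch, the hypothetical classify proc declares label without an initializer; because every branch, including else, assigns it, the subsequent read is accepted under strictDefs.

```nim
{.experimental: "strictDefs".}

proc classify(n: int): string =
  var label: string        # deliberately left without an initializer
  case n
  of 0: label = "zero"
  of 1..9: label = "small"
  else: label = "large"
  label                    # every branch above has written `label`

echo classify(0)           # prints zero
echo classify(42)          # prints large
```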

out parameters

An out parameter is like a var parameter but it must be written to before it can be used:


proc myopen(f: out File; name: string): bool =
  f = default(File)
  result = open(f, name)

While it is usually the better style to use the return type in order to return results, API and ABI considerations might make this infeasible. Like for var T, Nim maps out T to a hidden pointer. For example, POSIX's stat routine can be wrapped as:


proc stat*(a1: cstring, a2: out Stat): cint {.importc, header: "<sys/stat.h>".}

When the implementation of a routine with output parameters is analysed, the compiler checks that every path before the (implicit or explicit) return does set every output parameter:


proc p(x: out int; y: out string; cond: bool) =
  x = 4
  if cond:
    y = "abc"
  # error: not every path initializes 'y'
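By contrast, a routine that writes every out parameter on every path passes the check. quotRem, quot and rem below are hypothetical names; the strictDefs pragma is included on the assumption that the caller's variables, declared without initializers, count as initialized once they have been passed to out parameters.

```nim
{.experimental: "strictDefs".}

proc quotRem(a, b: int; q, r: out int) =
  ## Writes both out parameters on the only path through the body.
  q = a div b
  r = a mod b

var quot, rem: int         # no initializers: the call below writes them
quotRem(17, 5, quot, rem)
echo quot, " ", rem        # prints 3 2
```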

Out parameters and exception handling

The analysis should take exceptions into account (but currently does not):


proc p(x: out int; y: out string; cond: bool) =
  x = canRaise(45)
  y = "abc" # <-- error: not every path initializes 'y'

Once the implementation takes exceptions into account it is easy enough to use outParam = default(typeof(outParam)) in the beginning of the proc body.

Out parameters and inheritance

It is not valid to pass an lvalue of a supertype to an out T parameter:


type
  Superclass = object of RootObj
    a: int
  Subclass = object of Superclass
    s: string

proc init(x: out Superclass) =
  x = Superclass(a: 8)

var v: Subclass
init v
use v.s # the 's' field was never initialized!

However, in the future this could be allowed and provide a better way to write object constructors that take inheritance into account.

Note: The implementation of "strict definitions" and "out parameters" is experimental but the concept is solid and it is expected that eventually this mode becomes the default in later versions.

Strict case objects

With experimental: "strictCaseObjects", every access to a field that lies within a case section of an object is checked at compile time to be valid, that is, to occur within the case branch that declares the field.

{.experimental: "strictCaseObjects".}

type
  Foo = object
    case b: bool
    of false:
      s: string
    of true:
      x: int

var x = Foo(b: true, x: 4)
case x.b
of true:
  echo x.x # valid
of false:
  echo "no"

case x.b
of false:
  echo x.x # error: field access outside of valid case branch: x.x
of true:
  echo "no"

Note: The implementation of "strict case objects" is experimental but the concept is solid and it is expected that eventually this mode becomes the default in later versions.
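As a further sketch, a proc that only touches variant fields inside the matching case branch compiles under strictCaseObjects. Shape and area are hypothetical names used only for illustration.

```nim
import std/math

{.experimental: "strictCaseObjects".}

type
  Shape = object
    case isCircle: bool
    of true:
      radius: float
    of false:
      width, height: float

proc area(s: Shape): float =
  case s.isCircle
  of true:
    result = PI * s.radius * s.radius   # `radius` is read only in the `true` branch
  of false:
    result = s.width * s.height         # `width` and `height` only in the `false` branch

echo area(Shape(isCircle: false, width: 2.0, height: 3.0))   # prints 6.0
```

Moving the s.radius access into the false branch would be rejected with the same "field access outside of valid case branch" error shown above.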