Scoping Rules in Elixir (and Erlang)

For everyday use it is sufficient to understand the basics of scoping rules in Elixir: that there’s the top level scope and function clause scope, and that named functions have their own peculiar differences from the more conventional anonymous functions.

But there are, in fact, quite a few rules you need to know to get a complete picture of the way scopes work in Elixir. In this technical article we will take a close look at all of the scoping rules and learn in what ways they differ from Erlang.

Types of Scope

In Elixir there are two types of scope:

  • the top level scope
  • function clause scope

There are a number of constructs that create new scope:

  • modules and module-like structures: defmodule, defprotocol, defimpl
  • functions: fn, def, defp
  • comprehensions: for
  • try block bodies

Most of the time user code in Elixir is structured in the following way. At the top level we define modules. Each module contains a number of attributes and function clauses. Inside a function clause there can be an arbitrary number of expressions, including control flow constructs like case, if, or try:

abc = "abc"            T ---------------------+
                                              |
defmodule M do             M ---------------+ |
  @doc "factorial"                          | |
  @limit 13                                 | |
                                            | |
  def foo(n) do                F ---------+ | |
    x = case n do                         | | |  # T: top level scope
      0 -> 1                              | | |
      i when i > 0 -> n * foo(n - 1)      | | |  # M: module's scope
      _ -> :undef                         | | |
    end                                   | | |  # F: function clause scope
                                          | | |
    for x <- [1,2,3] do            C ---+ | | |  # C: comprehension's scope
      -x                                | | | |
    end                            -----+ | | |
                                          | | |
  end                          -----------+ | |
                                            | |
end                        -----------------+ |
                        ----------------------+

Another way to visualise that structure, schematically:

# Figure 1

+------------------------------------------------------------+
| Top level                                                  |
|                                                            |
|  +------------------------+     +------------------------+ |
|  | Module                 |     | Module                 | |
|  |                        |     |                        | |
|  | +--------------------+ |     | +--------------------+ | |
|  | | Function clause    | |     | | Function clause    | | |
|  | |                    | |     | |                    | | |
|  | | +----------------+ | |     | | +----------------+ | | |
|  | | | Comprehension  | | |     | | | Comprehension  | | | |
|  | | +----------------+ | |     | | +----------------+ | | |
|  | | +----------------+ | | ... | | +----------------+ | | |
|  | | | Anon. function | | |     | | | Anon. function | | | |
|  | | +----------------+ | |     | | +----------------+ | | |
|  | | +----------------+ | |     | | +----------------+ | | |
|  | | | Try block      | | |     | | | Try block      | | | |
|  | | +----------------+ | |     | | +----------------+ | | |
|  | +--------------------+ |     | +--------------------+ | |
|  +------------------------+     +------------------------+ |
|                                                            |
+------------------------------------------------------------+

When working in the interactive shell, the scope hierarchy is usually flat (“function clause” in the graphic below now refers to anonymous functions instead of named functions):

# Figure 2

+-----------------------+
| Top level             |
|                       |
|  +-----------------+  |
|  | Module          |  |
|  +-----------------+  |
|  +-----------------+  |
|  | Function clause |  |
|  +-----------------+  |
|  +-----------------+  |
|  | Comprehension   |  |
|  +-----------------+  |
|  +-----------------+  |
|  | Anon. function  |  |
|  +-----------------+  |
|  +-----------------+  |
|  | Try block       |  |
|  +-----------------+  |
|                       |
+-----------------------+

Those are the two most commonly seen structures for code organisation in Elixir.

In the general case, however, all scopes are arbitrarily nestable: we could imagine a case expression inside a comprehension or a top-level if expression defining different modules depending on some condition. For example:

f = fn x ->
  case x do
    1 ->
      defmodule M do
        def say do
          "one"
        end
      end
    2 ->
      defmodule N do
        def say do
          "two"
        end
      end
  end
end

# no module has been defined yet
M.say       #=> undefined function: M.say/0
N.say       #=> undefined function: N.say/0

# define M
f.(1)
M.say       #=> "one"
N.say       #=> undefined function: N.say/0

# define N
f.(2)
M.say       #=> "one"
N.say       #=> "two"

To understand how the example above works, you should be aware that a module definition creates the module as a side effect, so the module itself will be available globally. Only the name of the module is affected by the nesting of the defmodule call, as we’ll see later in this article.
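The side effect can also be observed directly. The following is a minimal sketch, not part of the original example: ScopeDemo.Lazy is a hypothetical module name, and Code.ensure_loaded?/1 is used to ask whether the module exists before and after the anonymous function runs.

```elixir
# ScopeDemo.Lazy is a hypothetical module name used only for this sketch.
define = fn ->
  defmodule ScopeDemo.Lazy do
    def say, do: "defined at run time"
  end
end

# Nothing has been defined yet: the anonymous function has not been called.
loaded_before = Code.ensure_loaded?(ScopeDemo.Lazy)

# Calling the function evaluates defmodule, which creates the module globally.
define.()
loaded_after = Code.ensure_loaded?(ScopeDemo.Lazy)

# apply/3 avoids a compile-time reference to a module that may not exist yet.
greeting = apply(ScopeDemo.Lazy, :say, [])
```

Here loaded_before is false and loaded_after is true, even though the defmodule call is buried inside a function clause scope.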

Elixir Scopes Are Lexical

This means that it is possible to determine the scope of every identifier only by looking at the source code.

All variable bindings introduced in a scope are available until the end of that scope. Elixir has a few special forms that treat scopes a little differently (namely require, import, and alias). We will examine them at the end of this article.

Scope Nesting and Shadowing

According to the rules of lexical scope, any variables defined in the surrounding scope are accessible in all other scopes it contains.

In Figure 1 above, any variable defined in the top level scope will be accessible in the module’s scope and any scope nested inside it, and so on.

There is an exception to this rule which applies only to named functions: any variable coming from the surrounding scope has to be unquoted inside a function clause body.

Any variable in a nested scope whose name coincides with a variable from the surrounding scope will shadow that outer variable. In other words, the variable inside the nested scope temporarily hides the variable from the surrounding scope, but does not affect it in any way.
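As a small sketch of shadowing (the names bump, inner, and outer are made up for this illustration), an anonymous function can read the outer variable and bind a new one with the same name without touching the original:

```elixir
x = 1

bump = fn ->
  # This match binds a new 'x' local to the anonymous function.
  # The right-hand side still reads the outer 'x'.
  x = x + 10
  x
end

inner = bump.()   # value of the inner 'x'
outer = x         # the outer 'x' is unchanged
```

After running this, inner is 11 while outer is still 1.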

The Top Level Scope

The top level scope includes every variable and identifier defined outside of any other scope.

x        #=> undefined function: x/0

x = 1
x        #=> 1

f = fn -> x end
f.()     #=> 1

Named functions cannot be defined at the top level because a named function always belongs within a module. However, named functions can be imported into any lexical scope (including the top level scope) like this:

import String, only: [reverse: 1]

reverse "Hello"  #=> "olleH"

In fact, all functions and macros from the Kernel module are autoimported in the top level scope by the compiler.

Function Clause Scope

Each function clause defines a new lexical scope: any new variable bound inside it will not be available outside of that clause:

defmodule M do
  def foo(x), do: -x

  # this 'x' is completely independent from the one in 'foo/1'
  def bar(x), do: 2*x

  x = 1

  # shadowing in action: the 'x' in the argument list creates a variable
  # local to the function clause's body and has nothing to do with the
  # previously defined 'x'
  f = fn(x) ->
    x = x + 1
  end

  y = f.(x)
  IO.puts "The correct answer is #{y} == #{f.(x)}"
  # output: The correct answer is 2 == 2

  # in this case the argument 'y' shadows the named function 'y/0'
  def y(y), do: y*2

  # here the reference to 'y' inside the function
  # body is actually a recursive call to 'y/0'
  def y, do: y*2
end

M.foo 3      #=> -3
M.bar 4      #=> 8

M.y -2       #=> -4
M.y          #=> infinite loop

Apart from named functions, a new function clause scope is created for each module-like block, anonymous function, try block body, or comprehension body (see below).

f = fn(x) ->
  a = x - 1
end

a            #=> undefined function: a/0

g = fn(f) ->
  g = f
end

f            #=> (still the anonymous function defined above)
g            #=> (the anonymous function we've just defined)

Named Functions And Modules

As mentioned before, named functions have a couple of peculiarities.

First, defining a named function does not introduce a new binding into the current scope:

defmodule M do
  def foo, do: "hi"

  foo()  # will cause CompileError: undefined function foo/0
end

Second, named functions cannot directly access the surrounding scope; one has to use unquote to achieve that:

defmodule M do
  a = 1

  # 'a' inside unquote() unambiguously refers to 'a' defined
  # in the module's scope
  def a, do: unquote(a)

  # 'a' inside the body unambiguously refers to the function 'a/0'
  def a(b), do: a + b
end

M.a          #=> 1
M.a 3        #=> 4

Module scope works just like function clause scope: any variables defined between defmodule (or defprotocol, etc.) and its corresponding end will not be accessible outside of the module, but they will be available in the nested scopes of that module as per usual (modulo the unquoting caveat of named functions mentioned above).

It is important to understand that a module’s scope exists only while the module is being compiled. In other words, variables are not “compiled into” the module. The Module.function syntax is only applicable to named functions, and that’s another thing that makes such functions special:

defmodule M do
  x = "hello"

  def hi, do: unquote(x)
end

M.hi         #=> "hello"
M.x          #=> undefined function: x/0
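Because the module body is ordinary code that runs at compile time, its variables can drive the generation of several named functions. The sketch below assumes a hypothetical module ScopeDemo.Colors and a made-up palette; it relies only on unquote inside def, the same mechanism used above.

```elixir
defmodule ScopeDemo.Colors do
  # 'palette' is an ordinary variable in the module's scope.
  palette = [red: "#ff0000", green: "#00ff00"]

  for {name, hex} <- palette do
    # unquote injects the current values of 'name' and 'hex' into the definition
    def unquote(name)(), do: unquote(hex)
  end
end

red = ScopeDemo.Colors.red()
green = ScopeDemo.Colors.green()

# 'palette' itself did not become a function of the module
has_palette = function_exported?(ScopeDemo.Colors, :palette, 0)
```

The values end up inside the generated functions, but the variable palette is gone once compilation finishes, so has_palette is false.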

You may be wondering how local function calls work when named functions don’t produce name bindings and don’t have direct access to the surrounding scope. The answer to this lies in the following rule followed by Elixir when trying to resolve an identifier to its value:

Any unbound identifier is treated as a local function call.

Let’s see how this works in code:

defmodule P do
  def f, do: "I am P's f"
  def g, do: f
end

defmodule Q do
  def f, do: "I am Q's f"
  def g, do: f
end

# both P's 'g' and Q's 'g' refer to their local buddy named 'f'
P.g          #=> "I am P's f"
Q.g          #=> "I am Q's f"

# let's make 'f' local in the top level scope
f            #=> undefined function: f/0
import P
f            #=> "I am P's f"
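The flip side of this rule is that a bound identifier is a variable, even when a function of the same name exists. The following sketch uses a hypothetical module ScopeDemo.Local and writes the call with explicit parentheses, so it does not depend on how a particular Elixir version resolves bare identifiers:

```elixir
defmodule ScopeDemo.Local do
  def f, do: "function f"

  def g do
    # Once bound, the bare name 'f' refers to the variable...
    f = "variable f"
    # ...while 'f()' with parentheses is always a call to the function f/0.
    {f, f()}
  end
end

pair = ScopeDemo.Local.g()
```

The result is the tuple {"variable f", "function f"}: the variable and the local function live side by side without clashing.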

One more note about module naming and nested modules: modules are always defined at the top level, no matter in what scope the actual call to defmodule is located. This means that as long as the VM can find the .beam file with the module’s code at run time, it does not matter in which scope you reference that module’s name.

What the scoping does affect is the name the module will get:

defmodule P do
  # The actual module name will be P.Q, but it is implicitly aliased to Q
  # in P's scope
  defmodule Q do
    def q(false), do: "sorry"
    def q(true) do
      # The actual module name will be P.Q.M
      defmodule M do
        def say, do: "hi"
      end
    end
  end

  # Q is resolved to P.Q
  def foo do
    Q.q false
  end

  # At run time, this has the same exact implementation as foo
  def bar do
    P.Q.q false
  end
end

P.foo         #=> "sorry"
P.bar         #=> "sorry"
P.Q.q false   #=> "sorry"

# the module hasn't been defined yet
P.Q.M.say     #=> undefined function: P.Q.M.say/0

# after this call the P.Q.M module will become available
P.Q.q true
P.Q.M.say     #=> "hi"
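A compact way to check the naming rule is to have the nested module report its own name through __MODULE__. This is only a sketch; ScopeDemo.Outer and Inner are hypothetical names chosen for the illustration.

```elixir
defmodule ScopeDemo.Outer do
  defmodule Inner do
    # __MODULE__ expands to the full name of the module being defined
    def name, do: __MODULE__
  end

  # 'Inner' is implicitly aliased to ScopeDemo.Outer.Inner in this scope
  def via_alias, do: Inner.name()
  def via_full_name, do: ScopeDemo.Outer.Inner.name()
end

short = ScopeDemo.Outer.via_alias()
full = ScopeDemo.Outer.via_full_name()
same = short == full and full == ScopeDemo.Outer.Inner
```

Both calls return ScopeDemo.Outer.Inner, so same is true: the short name is only an alias that exists inside the outer module's scope.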

Case-like Clauses

Control flow constructs case, receive, and cond share a common trait:

  • any variable introduced in a clause pattern/condition will be accessible only within that clause’s body
  • any variable introduced inside some (but not all) clause bodies will become available in the surrounding scope (possibly with the default nil value)

Here are some examples of those rules in action:

case x do
  # both 'result' and 'a' are visible only within this clause's body
  {:ok, result}=a -> IO.inspect(result); a

  # 'error' is actually bound in the surrounding scope; its value will be nil
  # if 'x' does not match :error
  :error -> error = true

  # ordinary shadowing: this 'x' is visible only within the clause's body and
  # it doesn't affect the 'x' from the surrounding scope
  [x] -> IO.inspect(x)
end

result  #=> undefined function: result/0
a       #=> undefined function: a/0

error   #=> true if x == :error, otherwise nil
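The first rule can be checked in isolation with binding/0, which returns the variables bound in the current scope. The sketch below deliberately returns the value from case instead of assigning inside a clause body, so it exercises only the rule about pattern variables; the names outcome and value_visible are made up for the illustration.

```elixir
x = {:ok, 42}

outcome =
  case x do
    # 'value' exists only inside this clause's body
    {:ok, value} -> value + 1
    _other -> :failed
  end

# binding/0 returns a keyword list of the variables bound at this point
value_visible = Keyword.has_key?(binding(), :value)
```

Here outcome is 43 and value_visible is false: the pattern variable did not escape the clause.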

Note: due to a bug in the 0.12.x series, cond’s conditions actually leak bindings to the surrounding scope. This should be fixed in 0.13.1.

cond do
  a0 = false -> a = a0
  b = 1      -> b
  c = 2      -> c = 2
  true       -> d = 3
end

a      #=> nil (the 1st condition is falsy, so `a = a0` was not evaluated)
b      #=> undefined function: b/0
c      #=> nil (the 2nd condition is truthy, so `c = 2` was not evaluated)
d      #=> nil (the body with `d = 3` was not evaluated,
       #        so 'd' also leaks with the default value)

if x = 3 do
  case y = :ok do
    :ok -> :ok
    :error -> a = "it's an error"
  end
else
  z = 11
end

x      #=> 3
y      #=> :ok
a      #=> nil
z      #=> nil

Try Blocks

The try block works similarly to case and receive, but it creates a new scope, so it never leaks variable bindings to the surrounding scope.

try do
  # all of the variables defined here are local to this block
  # (like in a function clause scope)
  a = 1
  b = a + 1
  c = d
rescue
  # these work like bindings in `case` patterns
  x in [RuntimeError] -> y = x
  x -> z = x
end

# none of the variables have leaked
a       #=> undefined function: a/0
b       #=> undefined function: b/0
c       #=> undefined function: c/0
d       #=> undefined function: d/0
x       #=> undefined function: x/0
y       #=> undefined function: y/0
z       #=> undefined function: z/0
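Since nothing leaks out of try, the usual way to get a value out is to use the result of the whole expression. A minimal sketch, with made-up names result and parsed_visible:

```elixir
result =
  try do
    # 'parsed' is local to the try block
    parsed = String.to_integer("42")
    parsed * 2
  rescue
    ArgumentError -> :not_a_number
  end

# binding/0 lists the variables bound in the current scope
parsed_visible = Keyword.has_key?(binding(), :parsed)
```

The result is 84, and parsed_visible is false because parsed never existed outside the block.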

Comprehensions

Comprehensions consist of two parts: the generator and the body.

Variables introduced in the generator part will only be visible within the body.

for a = x <- [1, 2, 3, 4], do: b = {a, x}
#=> [{1, 1}, {2, 2}, {3, 3}, {4, 4}]

a       #=> undefined function: a/0
x       #=> undefined function: x/0
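The same holds when a variable of that name already exists outside: the generator shadows it inside the comprehension and leaves it alone afterwards. A short sketch (doubled and x_after are names invented for the illustration):

```elixir
x = :outer

# The generator's 'x' shadows the outer one inside the comprehension only.
doubled = for x <- [1, 2, 3], do: x * 2

x_after = x
```

After the comprehension, doubled is [2, 4, 6] and x_after is still :outer.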

The comprehension body itself works like function clause scope:

for x <- ["abc", "def"] do
  # import takes effect only within the comprehension's body
  import String, only: [reverse: 1]
  b = reverse x
end
#=> ["cba", "fed"]

b
#=> undefined function: b/0

reverse "hello"
#=> undefined function: reverse/1

require, import, and alias

All of the rules described so far apply to variable bindings. The effect of these three special forms, on the other hand, persists until the end of the do block they are called in. Effectively, those forms see a slightly different scope division in which control flow constructs also create a new lexical scope:

# top level scope

defmodule M do
  # new scope
  import String, only: [reverse: 1]

  def foo do
    # new scope
    import String, only: [strip: 1]

    IO.puts reverse("abc")   # ok: inherited from the surrounding scope

    if true do
      # new scope
      import String, only: [downcase: 1]
    else
      # new scope
      import String, only: [upcase: 1]
    end

    " hello "
    |> strip      # ok: made local in the current scope with 'import'
    |> downcase   # error: no local function downcase/1
    |> upcase     # ditto
  end

  def bar do
    # new scope

    IO.puts reverse("abc")   # ok: inherited from the surrounding scope
    strip(" hello ")         # error: no local function strip/1
  end
end
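The example above is meant to be read rather than compiled, since some of its lines are errors on purpose. A runnable way to observe the same behaviour is to inspect __ENV__.functions, which lists the functions imported at a given point in the code. This is only a sketch; the variable names inside and outside are made up.

```elixir
inside =
  if true do
    # the import is in effect until the end of this do block
    import String, only: [reverse: 1]
    {reverse("abc"), Keyword.has_key?(__ENV__.functions, String)}
  end

# Back in the enclosing scope the import is no longer in effect.
outside = Keyword.has_key?(__ENV__.functions, String)
```

Inside the block the import is visible and reverse/1 can be called as a local function, giving {"cba", true}; once the block ends, outside is false.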

Differences from Erlang

Most of the scoping rules described here have been inherited from Erlang.

One notable difference is that Erlang modules simply contain forms and function clauses; they have no scope of their own and do not allow arbitrary expressions the way modules in Elixir do.

There are two differences in the way case clause scope works in Erlang:

  1. bindings introduced both in the pattern and in the body of a clause modify the surrounding scope
  2. those variables that are bound in some (but not all) of the clauses will remain unbound in the surrounding scope (instead of getting the nil value like they do in Elixir); they are also called unsafe variables

case 1 of
  1=A -> B = A;
  _   -> C = 1
end.

A.  %=> 1
B.  %=> 1
C.  %=> variable 'C' is unbound

There is an if construct in Erlang that looks similar to cond, but works differently. It only allows guard expressions as conditions and those do not let you introduce variable bindings. Variables bound in clause bodies leak to the surrounding scope the same way they do in case.

X = 1,
if
  X -> A = X;
  true -> B = X
end.

A.  %=> variable 'A' is unbound
B.  %=> 1

%%%

Y = true,
if
  Y -> P = Y;
  true -> Q = Y
end.

P.  %=> true
Q.  %=> variable 'Q' is unbound

Refer to this page for more information about Erlang control flow constructs.

An assorted list of resources that describe various aspects of Erlang’s scoping rules: