| Home > Technical articles | e-mail: laurie@tratt.net github: ltratt twitter: @laurencetratt |
|
March 3 2008 scopingrules. In this article I'm going to outline why Converge has the scoping rules it does. The first thing I need to do is to outline the problem. Although all my examples are framed in terms of a x := 2 ... y := xIn every language I know this says assign. So after this code is run both x and y will have the value 2.
The first major issue with regards to variables is global vs. local variables. Take the following code which is intended to represent the top-level of a source file:
x := 2
func f():
x := 3
f()
In a lot of BASIC-type languages there is only one underlying x variable in this program, so after this code is run the outer x will have the value 3. In essence we have a flat variable scope: all variables belong to the same, single, namespace. This not only makes writing programs error-prone (e.g. one function accidentally corrupts another's x), but makes certain styles of programming largely impractical (e.g. recursive functions). It's probably no coincidence that the first mainstream programming language to make a virtue out of recursion - Algol 60 - also was among the first to get its scoping rules in reasonable order.
Retro-fitting sane scoping rules is not easy if the first version of your language used the above scoping rule (note the deliberate use of the singular) - any change of scoping rule(s) has a high chance of breaking programs in nasty ways. Some of the BASIC-type languages I first used solved the backwards compatibility problem by making variables
x := 2
func f():
local x
x := 3
f()
This code now has two distinct x variables: one at the top-level and one in the f function. Every time f is called - even recursively - it will be given space for a new, fresh x. This feature makes programming a lot easier, even if it defaults to the insane globalscoping rule by default. As an aside, surprisingly (to me at least), you can still see the global by defaultscoping rule in the modern language Lua. This goes to show how fundamental scoping rules are: once they're in a language, users will resist nearly all meaningful change to them. Most programming languages adopt a slight variant of the above rules which are fairly easy to understand in practice. Essentially variables with the same name as a top-level variable reference that top-level variable directly, while all other variables are local. So in a C-type language the following code contains two variables: a top-level
x := 2
func f():
x := 3
y := 4
f()
Running the above code means that the top-level x is set to 3, while the y variable is local to f. Variations on this set of scoping rules underly many programming languages in use today.
The next major design challenge for scoping rules is much more subtle and confuses many of us to this day. Knowing that the
x = 2
def f():
x = 3
def g():
global x
print x
x = 4
print x
g()
f()
print x
What do you think this will print out? Let's try it out with Python 2.5:
$ python scope.py 3 2 4 $In other words, we get a result which is a long way from what we might have expected: the print statement in g prints 2 instead of the expected 3 and the final print statement prints 4 instead of the expected 2. What is it doing? What's happening is that most languages don't have nested scopes (as one might expect) but two scopes: a top-level (a.k.a. global or module) scope and a function scope. What this means is that the assignment to x in g references the top-level x, not the x in f; you might want to read that twice to check that you've really understood it.
It might at first seem that Python's scoping rules are simply silly; actually, they're not unreasonable and they're shared by most programming languages (e.g. Java). Why? The problem is that the function
func f():
x := 2
func g():
return x
return g
f()()
In other words, f returns a reference to g; when g is executed, the value of x known to f will have disappeared as, in most programming languages, variables are stored on the stack. This means that variables only exist for the duration of a function call. Since f's variables will have disappeared when g is executed, all sorts of bad things could happen.
Scheme was the first language that presented a practical solution to this problem of nested scopes in the form of closures. The standard way that closures are defined is guaranteed to confuse and I'm not going to repeat it. They're actually very simple: essentially each function allocates heap memory to store variables on. Thus if an inner function outlives an outer function there is no problem in referencing variables in the outer function even if the stack space has long since disappeared, since a function calls variables can outlive the function call itself. [As an aside, the fact that closures need to allocate heap memory (although it's often possible to statically analyse such allocations away) has been used as an argument against them in languages such as Java. That's the chief reason that Java has all sorts of complications like inner classes, final variables and so on: Java resisted closures, and then had to resort to hacks to get a poor facsimile of its functionality. It's hard to imagine any decent programming language being built now that doesn't implement closures (Converge certainly does), so closures are gradually losing their When I was designing Converge, I put some effort into deciding what its scoping rules were going to be. I wanted to make things safe by default (e.g. no
x := 2
func f():
x := 3
func g():
nonlocal x
Sys::println(x)
x := 4
Sys::println(x)
g()
func main():
f()
Sys::println(x)
Which when run does the expected:
$ converge s.cv 3 3 2 $The interesting thing here, in my mind at least, is the nonlocal keyword. Although it's a tad awkward sounding, it was the best combination of brevity and accuracy that I could think of. Unlike Python it would be incorrect to declare a variable as global since there are more than 2 scopes: in fact, there are an arbitrary number. What nonlocal is saying is when you see an . It's not a commonly used feature, but when you need it, it is priceless.
I said above that Converge has - or had - the simplest of any imperative programming language (at least as far as I'm aware). Some time after the first publications and release of Converge, the Python team decided to fix their scoping rules for their backwards-compatibility-breaking Python 3000 release. PEP 3104 contains the eventual proposal they came up with. Interestingly it is identical to Converge's scoping rules even down to using the The open question is this: why has it taken us, as a community, 50 years or more to define two simple scoping rules (assignment is local; |
| Home > Technical Articles | e-mail: laurie@tratt.net github: ltratt twitter: @laurencetratt |
| Copyright © 1995-2012 Laurence Tratt | |