Implementing Rogue's Module/Namespace System

I decided to add "modules" (AKA namespaces) to Rogue so that I could start writing auxiliary libraries without worrying about name collisions with end-developer code.

Here's the syntax, which is pretty straightforward and C++-esque:

# AlphaBeta.rogue
$module Alpha
class Value
...
endClass

class XYZ
...
endClass

$module Beta
class Value
...
endClass

# Test.rogue
$include "AlphaBeta.rogue"
println Value()         # doesn't work: there is no Value in the default namespace
println Alpha::Value()  # creates an Alpha::Value

$using Alpha
println Value()   # creates an Alpha::Value

$using Beta
println Value()   # creates a Beta::Value; a later $using can
                  # shadow same-named identifiers from earlier ones

println XYZ()     # creates an Alpha::XYZ because there is no Beta::XYZ

I first tried implementing this module system at the Parser level with limited success. There was always some edge case that complicated things or that I had missed.

I redid the system at the Tokenizer level, and it proved to be a simple, elegant solution. Here's an overview of the steps I took:

  1. Before implementing modules, my compiler processed files one at a time: tokenize a file, preprocess its tokens, parse those tokens, then repeat for the next file.

  2. I reordered my monolithic compile into whole-program passes: tokenize every file, then preprocess every file's tokens, and finally parse every file's tokens. That way every file's module declarations are known before any file's tokens reach the parser.

  3. Because :: is reserved solely as the module-name separator, I modified the Tokenizer to accept :: as part of an identifier, so Alpha::Value is read as a single identifier token.

  4. While preprocessing each file's tokens, I noted every $module ModuleName directive and used ModuleName:: as the module context for any class definitions that followed.

  5. Whenever I encountered a class X definition during preprocessing (all Rogue type definitions begin with class), I recorded the class name "X" in a central lookup table as an identifier declared under the current ModuleName namespace. For AlphaBeta.rogue, the collected table would be, in essence: {"Alpha":["Value","XYZ"],"Beta":["Value"]}.

  6. I then added a new step: between preprocessing every file's tokens and parsing them, I added a call to insert_module_prefixes(). It makes one more pass over every file's tokens and does the following:

    1. When it sees a $module X or a $using X, it adds all of module X's declared identifiers to the current module_id_map table. Adding both modules from the Step 5 table, Alpha first and then Beta, produces the following map; Beta's entry overwrites Alpha's for Value, which is how a later $using shadows an earlier one: {"Value":"Beta::Value","XYZ":"Alpha::XYZ"}.

    2. Whenever it sees an identifier token, it looks up the name in the module_id_map and substitutes any mapped ID it finds there. Like C++, I accept a leading :: on an identifier as a way to escape the module namespace and refer to the default namespace.

    3. That's it! class XYZ becomes class Alpha::XYZ and println Value() becomes println Beta::Value() by the time those tokens hit the parser. If an identifier is already prefixed with Alpha::, it doesn't occur in the module_id_map and passes through unchanged.

    4. One implication of this technique is that you could sprinkle :: in your identifiers quite arbitrarily and independently of the module namespace system. I don't see that as a bad thing, just an interesting one.
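A minimal Python sketch of the Step 3 tokenizer change follows. It is not Rogue's actual Tokenizer: the tokenize() function, its regular expressions, and the representation of tokens as plain strings are all hypothetical. What it illustrates is that the identifier pattern consumes ::-separated segments (plus an optional leading ::), so Alpha::Value and ::Value each come out as one token while a lone : stays a separate token.

```python
import re

# Hypothetical token patterns for a Rogue-like language (illustration only).
# IDENTIFIER: optional leading "::", then word segments joined by "::".
IDENTIFIER = re.compile(r"(?:::)?[A-Za-z_]\w*(?:::[A-Za-z_]\w*)*")
DIRECTIVE = re.compile(r"\$[A-Za-z_]\w*")  # e.g. $module, $using, $include
STRING = re.compile(r'"[^"]*"')            # simple double-quoted literal


def tokenize(source: str) -> list[str]:
    """Split source text into a flat list of token strings (sketch)."""
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1
            continue
        if ch == "#":  # a comment runs to the end of the line
            end = source.find("\n", i)
            i = len(source) if end == -1 else end
            continue
        for pattern in (DIRECTIVE, STRING, IDENTIFIER):
            m = pattern.match(source, i)
            if m:
                tokens.append(m.group())
                i = m.end()
                break
        else:
            tokens.append(ch)  # any other character is a one-char token
            i += 1
    return tokens
```

For example, tokenize('println Alpha::Value()') yields ['println', 'Alpha::Value', '(', ')'], so later passes never have to reassemble a qualified name from pieces.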
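Steps 4 and 5 amount to one scan per file that remembers the current $module and records each class name under it. The sketch below is an illustration, not Rogue's preprocessor: the collect_module_ids() function is hypothetical, tokens are assumed to be plain strings, and each file is assumed to start in the default namespace.

```python
from collections import defaultdict


def collect_module_ids(files: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each module name to the class names declared under it (sketch).

    Assumes `class X` is the only declaration form, since all Rogue type
    definitions begin with class.
    """
    module_ids: dict[str, list[str]] = defaultdict(list)
    for tokens in files.values():
        current_module = None  # assumption: files start in the default namespace
        for i, tok in enumerate(tokens):
            if tok == "$module" and i + 1 < len(tokens):
                current_module = tokens[i + 1]
            elif tok == "class" and current_module and i + 1 < len(tokens):
                name = tokens[i + 1]
                # Record each class name once per module.
                if name not in module_ids[current_module]:
                    module_ids[current_module].append(name)
    return dict(module_ids)
```

Run on the tokens of AlphaBeta.rogue, it produces the Step 5 table {"Alpha":["Value","XYZ"],"Beta":["Value"]}; classes declared outside any module are not recorded.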
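Step 6's insert_module_prefixes() pass can be sketched the same way. The function name comes from the article, but its signature, the per-file module_id_map, and the choice to strip a leading :: to reach the default namespace are assumptions made for this sketch. Because a later $module or $using overwrites entries already in the map, Value resolves to Beta::Value after $using Beta while XYZ still resolves to Alpha::XYZ.

```python
def insert_module_prefixes(files: dict[str, list[str]],
                           module_ids: dict[str, list[str]]) -> None:
    """Rewrite bare identifiers to module-qualified ones, in place (sketch)."""
    for tokens in files.values():
        module_id_map: dict[str, str] = {}  # assumption: one map per file
        i = 0
        while i < len(tokens):
            tok = tokens[i]
            if tok in ("$module", "$using") and i + 1 < len(tokens):
                module = tokens[i + 1]
                # Later modules overwrite earlier entries: later $using wins.
                for name in module_ids.get(module, []):
                    module_id_map[name] = f"{module}::{name}"
                i += 2  # skip the module name so it is never rewritten itself
                continue
            if tok.startswith("::"):
                # Assumption: a leading :: means "default namespace", so drop it.
                tokens[i] = tok[2:]
            elif tok in module_id_map:
                tokens[i] = module_id_map[tok]
            # Already-qualified names like Alpha::Value aren't map keys,
            # so they fall through unchanged.
            i += 1
```

Applied to the tokens of Test.rogue, the bare Value after $using Beta becomes Beta::Value, XYZ becomes Alpha::XYZ, and ::Value becomes plain Value, matching the behavior described above.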