Why I Prefer Tracing Garbage Collection Over ARC
I started rewriting my Rogue language compiler from scratch. Rogue is evolved from a series of earlier languages, so this is approximately the sixth full iteration of that compiler, with each version programmed using the compiler from the previous iteration. Rogue v2 is being programmed in Rogue v1, naturally.
I wasn't sure if I would ever rewrite the Rogue compiler again. I'd kind of hoped not 'cause it's a lot of work - v1 has been working well and has become very robust over time, but on the other hand rewriting it is fun and the end result is better architected every time.
I have three motivations for wanting to rewrite Rogue.
- I wanted to use my compiler development tool Froley to generate and maintain the scanner and parser components.
- Rogue v1 and its predecessors cross-compile (or transpile) to C++ but I've always thought it would be ideal to target straight C instead.
- Rewriting the compiler gives me a good opportunity to refine the language. Now that I've been using Rogue v1 for eight years I've got a few pain points I'm going to address and a few unused features I'm going to trim out.
Those motivations have been building for a while, but the actual catalyst was the release of the Playdate SDK. I'd like to mess around with Playdate and Rogue is almost the ideal language for doing so, but Playdate only (?) works with C, not the C++ that Rogue v1 generates. I know I might be able to make it work regardless, but it was a good excuse to pull the trigger on my Rogue v2 rewrite.
When I first started R2 (as I'm calling it during development) I wanted to consider alternatives to the Mark & Sweep tracing garbage collection that I've always used.
Mark & Sweep |
---|
|
As my first possible alternative I investigated Rust and its Borrow Checker system. Here's a summary from my brief look into it.
Borrow Checker |
---|
|
I'm very impressed with Rust's unique approach as it shifts the burden of memory management entirely onto the compiler (and to some extent the developer as they adhere to the required conventions) instead of requiring any tracking or tracing at runtime. However, while this system works well with Rust's procedural approach I don't believe it's possible to extend it to an object-oriented paradigm. Especially not with Rogue that is so tightly interleaved with C. Rogue passes data to inline C, C functions call Rogue methods, etc., and it's just not feasible to make those interconnections visible to a Borrow Checker-style system.
Finally I turned my thoughts towards Automatic Reference Counting, or ARC.
Automatic Reference Counting (ARC) |
---|
|
Choosing ARC
I had the following things in mind when I initially decided to use ARC for R2.
- Garbage Collection (implicitly meaning "traced" GC) has a bit of a bad reputation, especially with the occasional small GC freezes that used to (and maybe still do) happen on Android.
- Because GC takes longer the more objects there are, I tend to be uptight about creating lots of e.g. small objects in my games if I can accomplish the same things with non-reference data lists and such. It would be nice to not have to worry about that.
- Conversely ARC has more of a "professional" or "robust" halo around it, and you can create as many objects as you want - the execution speed will effectively remain constant in terms of memory management.
- It would be interesting to tackle the challenge of writing an ARC mechanism for R2.
Second Thoughts
I started to have some doubts as I developed my ARC system. These were, in both chronological order and in order of being increasingly troubling, as follows.
- Reference counting every variable didn't sound that bad - just affects the assignments, right? - until I started implementing the underlying mechanism. Then I realized there's just a great deal of overhead involved. Locals need to be ref-counted as well, so the straightforward approach is to to refcount every parameter and every
this
object context for every call. Returned references need to be retained beforehand, and consequently unused return values must be released by the caller. - Rogue/C interoperability becomes more complicated and fragile. C calls from Rogue must remember to retain their return values because Rogue will release them. Likewise C callers must remember to release values returned from Rogue.
- When I started implementing R2's list methods I realized that the ARC overhead was going to be even worse than I had already realized. For example, it's not uncommon for me to copy some or most objects from a main list to a work list to send to a method, clear the original list, copy the updated work list back to the original list, and clear the work list. Say the list has 100 objects. In a tracing GC that's about a hundred individual copies, a memset, a memcpy, and another memset. But in ARC that's a hundred copies with a hundred retains, a hundred individual releases to clear the original list, a hundred more individual copies and retains, and a hundred more releases.
Premature Optimization
At this point I started thinking about going back to good ol' Mark & Sweep. After all, there wasn't really a problem I was trying to solve. Rogue's GC runs very fast, under 2 ms in the average case, if I recall, and my Rogue games run the same speed even if I force GC 60 times a second (once per frame).
And then I had an epiphany that sealed the deal for a return to GC.
Viva GC
I realized I was completely overlooking an important aspect of the GC-vs-ARC argument.
I had been comparing, say, occasional 2 ms GC calls with 0.5 ms of extra ARC tracking overhead per frame, thinking that 0.5 ms every frame was better than 2 ms during a frame once in a while.
What I was missing was that allocation doesn't need to happen all the time! If I write my code well, GC doesn't ever need to happen during core gameplay. In my games, GC only happens when the system reaches a designated allocation threshold or when it's manually triggered. If my game engine is well-engineered, makes use of recycled entity pools and such, and if I trigger GC's at opportune moments between levels and such, then there's little chance that intensive gameplay could be interrupted by a poorly timed GC.
So the real choice is "very fast code with fast, appropriately-timed allocations and GC's" versus "ARC but a bit slow all the time". I choose fast GC.
Appendix: Rogue's GC Tricks
Here are some details on how R2 implements its tracing GC.
Referenced Object Marking Mechanism
Each new object is added to a master list of all objects. Each object has a signed 32-bit refcount
which isn't directly used by the Rogue runtime but can be incremented by native code to signal that the object is referenced in C.
An object is marked as in use by complementing (or flipping the bits of) the refcount
. 0
becomes -1
, 5
becomes -6
, etc. So checking the in use status is as simple as checking refcount < 0
versus refcount >= 0
. The refcount is complemented back to normal again at the end of GC.
GC In-between Calls to Rogue
Rogue's default GC mode is auto
, which means GC can happen any time an object is about to be allocated. This requires extra tracking overhead of making sure every local variable is accessible to the collector.
Rogue has a faster GC mode called manual
, where GC can only be invoked by C in-between calls to Rogue. This means that no local variables need be tracked, so the code runs faster and GC happens faster. This is ideal for games.
Separate Lists for Special Kinds of Objects
Any Rogue class can define an on_cleanup()
method that is automatically called when then object is about to be freed. Instances of that class are said to "require cleanup". Rogue handles these objects by placing them in a separate master list; we know that any unreferenced objects in the objects_requiring_cleanup
list need to have their on_cleanup()
method called.