Reference Count, Don't Garbage Collect

The basic rule of thumb is: if you have the option of reference counting, it's going to beat garbage collection, hands down. But there are a few scenarios where reference counting is not going to be available to you. Primarily I would guess these fall into the categories of:

  1. You have a buggy binary (or codebase) whose malloc library you want to replace with a GC because malloc/free/new/delete are being used incorrectly and swapping the library is simpler than fixing the code.

  2. You made an imperative object-oriented or scripting language, you want automatic memory management, you want object mutability, you want shallow references, you want reference cycles, you don't want any hint of cycle detection, you don't want weak pointers, and you're willing to pay a walloping performance penalty to have this configuration.

    Note that this is a very specific case. In array languages, which are functional and not imperative, we typically don't garbage collect at all. You can see one method of acyclic reference counting here or here.


I'm not going to do a real history of GC, but I'll share what my experience with it has been since the 90s. A long time ago we had C and C++, and the problem with that was they used malloc or new, and programmers had a hard time managing this. Fair enough, they're tricky and it's easy to leak memory or whatnot. So, many of the languages that were released afterwards wanted to make programming easier, in part by adding automatic memory management. There are a lot of corner cases in automatic memory management, and one sure way to hammer them all down is to use an old piece of technology known as a garbage collector. Garbage collectors appeared in many higher level languages, perhaps most famously in Java. What we learned later from these deployments was that garbage collection is bad for performance. It's slow, it happens at random times, it can be long running, it touches all of the data in your app, etc. The promise of the GC was that programmers would not have to spend time debugging memory leaks and other such issues, but I witnessed deployments where probably as much developer time was spent tuning an uncontrollable GC; a GC, which, in addition, made other goals impossible, because it necessitated bad and unpredictable performance when steady and predictable performance was required. "See this giant spike in the resource graph? This is where the garbage collector turned on."

In the meantime, because GCs seemed to solve the problem of automatic memory management, a lot of research continued in this area and a lot of students were taught about the merits of garbage collection. Later on, as real deployments struggled with performance issues inherent in garbage collection, other people remembered the technique of reference counting and found ways to institute it in their language, potentially making small sacrifices to do so. Reference counting is a lot faster and doesn't have a long-running background process. Provided your problem admits certain concessions that let you retreat from full-on garbage collection into reference counting, you're going to come out way ahead in the bargain. Objective-C and Swift use a nearly transparent form of reference counting which requires the use of weak pointers to avoid cycles. Python doesn't want to add weak types to pointers, so it reference counts whatever it doesn't need to garbage collect, and the only thing it needs to garbage collect are the few objects capable of forming cycles. Functional languages can often depend on immutability to simply reference count without worrying about garbage collection at all.


Because of the hangover from the preceding decades, there are still a lot of people around who think that garbage collection is a praiseworthy critical feature instead of sometimes-necessary evil. Here's this mistaken site that appeared on Hacker News:

This thing is highly voted and has appeared many times over the years, so I feel confident in asserting it represents commonly held misconceptions in the field.

I'll do a point-by-point refutation of their claims, since even though I think RC being on the defensive makes it sound like there is more of a case for GC than there is, the refutation is illustrative of the mechanics behinds the pros and cons of using reference counting in your application.

Promoting garbage collection over reference counting is ridiculous, a bit like lobbying for hopping on one foot over running as a technique for racing. Hey, what if you broke your ankle the day before, then you might have to hop, right? What if you stepped on a piece of glass, huh? What if it's field day and you're not allowed to use your leg? And so on.

Here's what their FAQ says:

Isn’t it much cheaper to use reference counts rather than garbage collection?

No, updating reference counts is quite expensive, and they have a couple of problems:

They can’t cope with cyclic data structures; that is, sets of objects that are referred to only by objects in that set, but that don’t have a zero reference count.

Reference counting gets more expensive if you have to allow for the count overflowing.

There are many systems that use reference counts, and avoid the problems described above by using a conventional garbage collector to complement it. This is usually done for real-time benefits. Unfortunately, experience shows that this is generally less efficient than implementing a proper real-time garbage collector, except in the case where most reference counts are one.

Let's do a point by point:

1. Updating reference counts is quite expensive.

No, it isn't. It's an atomic increment, perhaps with overflow checks for small integer widths. This is about as minimal as you can get short of nothing at all. The comparison is the ongoing garbage collection routine, which is going to be more expensive and occur unpredictably.

Increments and decrements happen once and at a predictable time. The GC is running all the time and traversing the universe of GC objects. Probably with bad locality, polluting the cache, etc.

Arguing this is just weird. This is what I am talking about with the hopping on one foot.

2. They can’t cope with cyclic data structures; that is, sets of objects that are referred to only by objects in that set, but that don’t have a zero reference count.

This is the only gripe with any merit. Arguably, this is the reason we have garbage collectors in the first place, because people got hung up on this issue before there was a better solution. Then they just kept building more garbage collectors.

In functional languages cyclic data structures aren't an issue because data is "immutable." You do a cycle-check at creation time and either error or split out a copy and reference that. RC still wins because the GC is going to have to make this traversal at least once anyway—this is game over for GC, there's no coming back from it. Performance-wise, a deep object cycle tends to be pathological, so it doesn't hurt your RC guarantees to check it.

In object-oriented languages, where you can literally have a pointer to something, you simply mark a reference as a weak reference if it might create a cycle. Cycles are still detectable at runtime and I think even at compile-time, so it's not going to be a surprise like with leaked memory. You get all the benefits of automatic memory management in your code, but you've made one concession to memory management instead of importing the slowness of GC. Apple did this with ARC and pretty much nobody had a problem with it. The whole point was putting this one requirement back on the developers was far and away worth it to avoid the garbage collector.

You might tell me there are some OO developers who can't handle even this small amount of reference management, but I would say they're in a scripting language with a hidden notion of "shallow" and "deep" copies, and so they're already doing it or they're blissfully unaware and screwing it up anyway.

Python uses a GC here but I wonder how the language would've fared from the beginning with a runtime cycle checker instead.

3. Reference counting gets more expensive if you have to allow for the count overflowing.

Not really. With an 8-byte counter you will never overflow. know...just expand up to 8-bytes as needed? Usually you can get by with a few bits.

Overflow is typically a pathological case, so pathological that I hesitate to treat it because it gives a legitimacy to their claim that isn't there. The dominant pattern is that objects have only a few references. Like, 1 or 3, not 2^63. So, OK, let's say you start doing something weird.

If you must overflow, e.g., you cannot afford an 8-byte counter and you need to overflow a 4-byte counter with billions of references, if you can copy it, you create a shallow copy. (Is a GC program even usable at this point?) Should this actually happen, and I can't think of any time that it does, RC is still going to beat GC. Making a shallow copy during a pathological case handily beats running a garbage collector all the time.

If you can't copy the object, well, when does this happen ever? Any non-static in-memory object that is being referenced (literally pointed to with a address pointer) millions of times from millions of different places is almost certainly an engineering flaw in any language. This is rightly in the class of error like "I tried to malloc 128 exabytes" or "I created too many objects on the stack and it crashed" or "I ran out of open file handles"—system concerns that you won't be able to avoid even from the comfort of your preferred scripting language. (Also, again, does a GC program even run here?)

The answer is really just "use up to 8 bytes of counter when needed."

4. Experience shows that this is generally less efficient than implementing a proper real-time garbage collector.

The opposite of my experience.


I talked with a few garbage collectors to sanity check my thoughts on the subject and I can already see that there is, I suppose not surprisingly, a lot more arguing to do before the issue is settled.

One of the arguments against my position is that it is arguing from experience, which is a fair critique. Every time I use GC it's slow and every time I use RC it's fast, and I'm telling you this. There is no way GC can ever cover the ground because it is broken by design.

I suppose my case would be stronger with controlled, randomized, double-blinded, and peer reviewed benchmarks, complete with video, showing that hopping on one foot is indeed slower. While I would like to believe this is true, I'm sure I'd have to argue even that. And then I'd have to explain why the latest papers demonstrating new and faster methods of hopping on one foot aren't really correct in one way or the other. I think I'd prefer to wait another ten or twenty years until this issue settles itself.

If I had to guess the cause of this situation, I would say it is a psychological one, that generations of computer programmers have been taught the wrong thing, so that when you tell them their thing is straight busted and that there is an alternative way to do things, internally they can't conceive why they weren't taught the simpler and faster way, and so this leads them to make faulty assumptions about how the alternative method works, in order to justify all the work they put in to understand the original thing.

For instance, I think there is this misconception that reference counting must work somehow like read-lock acquisition for functions, where you acquire and release the lock every time you inspect the element with a function call. This is not true. Basically, you attach the reference to the object graph once, and then free it when you're done with it. Most of the time you're not touching the reference at all. While your element is in scope, you can create an arbitrarily high stack of functions pointing back to it with zero additional cost. But this leads to an interesting realization. Even garbage collection has to touch the item once, probably many times, and free it, but then that would imply RC could only be strictly better and faster, which implies...(puzzled expression). Yes, exactly. People don't want to believe this so they view it as the conclusion to a reductio ad absurdem: RC must work some other way. No, it does not.

Reference counting sounds simple but even that simple bit is widely misinterpreted. On top of that, there's an entire discipline to using RC correctly, that is not widely known, and this technique is probably deep enough to teach in the schoolbooks, at equal or greater length to the time spent on garbage collection. But I don't recall any in-depth treatments like this. I've just seen lots of treatises on garbage collecting that skim over the reference counting part.

I've already stated I'm not going to do benchmarks. I am aware of two orgs who've already run extensive and far-reaching experiments on this: Apple, for use in their mobile phones, and the Python project. Apple's thing is self-explanatory, and actually it runs their entire platform, desktops included. But it has the wrinkle that they skipped from manual user reference counting to automatic reference counting without attempting garbage collecting in between because the GC was so terrible in Xcode, plus, you know, ARC requires more work from the developers, so it's not an airtight case here. The Python case is more inarguable. If GC is so good, why wouldn't Python just garbage collect everything, which they already did once and could trivially do, instead of going through the hassle of implementing reference counting for everything but the one case I mentioned? It is because RC outperforms garbage collecting in all these standard cases.