ConcurrentDictionary allocates … a lot

I was inspecting the latest build of Paint.NET with SciTech Memory Profiler and noticed that there were a lot of System.Object allocations. Thousands of them … then tens of thousands of them … and by the time I had opened 100 images, each 3440×1440 pixels, I had over 800,000 System.Objects on the heap. That’s ridiculous! Not only do those use up a ton of memory, they can also really slow down the garbage collector. (Yes, they’ll survive to gen2 and live a nice quiet retired life, for the most part … but they first have to survive a gen0 and then a gen1 collection.)

Obviously my question was, where are these coming from?! After poking around in the object graph for a bit, and then digging in with Reflector, it eventually became clear: every ConcurrentDictionary was allocating an Object[] array of size 128 and immediately populating it with brand new Object()s (it was not lazily populated). And Paint.NET uses a lot of ConcurrentDictionary instances!

Each of these Objects serves as a lock object, which helps ensure correctness and good performance when there are lots of writes to the dictionary. The reason it allocates 128 of them is its default policy for concurrencyLevel: 4 × ProcessorCount. My system is a 16-core dual Xeon E5-2687W with HyperThreading, which means ProcessorCount = 32, and 4 × 32 = 128.

There’s no way this level of concurrency is needed, so I quickly refactored my code to create every ConcurrentDictionary through a utility method instead of calling the constructor directly. Most places in the code only need a low concurrency level, like 2-4. Some places did warrant a higher concurrency level, but even there I capped it at 1 × ProcessorCount.
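
A minimal sketch of the kind of factory helper described above, in C#. The post doesn't show the actual Paint.NET code, so the class name ConcurrentDictionaryUtil, its Create and ClampConcurrencyLevel methods, and the default values are all hypothetical. It relies only on the public ConcurrentDictionary(int concurrencyLevel, int capacity, IEqualityComparer<TKey> comparer) constructor and Environment.ProcessorCount.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical helper (name and defaults are illustrative, not Paint.NET's code):
// every ConcurrentDictionary is created with an explicit, bounded concurrencyLevel
// instead of relying on the constructor's default of 4 x ProcessorCount.
public static class ConcurrentDictionaryUtil
{
    // Low default, in line with "most places only need 2-4".
    public const int DefaultConcurrencyLevel = 2;

    // Initial capacity; 31 is assumed to match the framework's own default.
    public const int DefaultCapacity = 31;

    // Clamp the requested level to [1, ProcessorCount]: the constructor rejects
    // values below 1, and even "hot" dictionaries are capped at 1 x ProcessorCount.
    public static int ClampConcurrencyLevel(int requested)
    {
        return Math.Max(1, Math.Min(requested, Environment.ProcessorCount));
    }

    public static ConcurrentDictionary<TKey, TValue> Create<TKey, TValue>(
        int concurrencyLevel = DefaultConcurrencyLevel,
        int capacity = DefaultCapacity,
        IEqualityComparer<TKey> comparer = null)
    {
        // Older framework versions throw ArgumentNullException for a null comparer,
        // so substitute the default comparer explicitly.
        return new ConcurrentDictionary<TKey, TValue>(
            ClampConcurrencyLevel(concurrencyLevel),
            capacity,
            comparer ?? EqualityComparer<TKey>.Default);
    }
}
```

Call sites then choose a level per dictionary: ConcurrentDictionaryUtil.Create<string, int>() for the common low-contention case, or Create<string, int>(Environment.ProcessorCount) for a genuinely write-heavy one. Routing all construction through one method also means the policy can be retuned in a single place later.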

Once this was done, I repeated the slightly contrived experiment of loading 100 images at 3440×1440, and the System.Object count was down to about 20,000. Not bad!

This may seem like a niche scenario. “Bah! Who buys a Dual Xeon? Most people just have a dual- or quad-core CPU!” That’s true today, but it will matter more as Intel’s Skylake-X and AMD’s Threadripper bring 16-core CPUs much closer to the mainstream. AMD is already doing a fantastic job with its mainstream 8-core Ryzen chips (which run Paint.NET really well, by the way!), and Intel has the 6-core Coffee Lake headed to mainstream systems later this year. Core counts are going up, which means ConcurrentDictionary’s memory usage is also going up.

So, if you’re writing a Windows app with the stock .NET Framework and you’re using ConcurrentDictionary a lot, I’d advise you to be careful with it and consider passing an explicit concurrencyLevel. It’s not as lightweight as you might think.

(The good news is that Stephen Toub changed this in the open source .NET Core 2.0 so that the default concurrency level is only 1 × ProcessorCount. Here’s the commit. Unfortunately, this doesn’t seem to have made it into the latest .NET Framework, 4.7.)

7 thoughts on “ConcurrentDictionary allocates … a lot”

  1. Reza says:

    Hi Rick. I’ve been there too! Fortunately, in my case I found out about the issues with ConcurrentDictionary, or let’s say with the concurrent collections in .NET as a whole, early enough to be able to switch to a completely different technique in the project. IMHO, stay away from those collections as much as you can. They are designed for very specific scenarios where performance is not that important and you really (really) need to offload the concurrent collection operations to the .NET Framework.

  2. dotnetchris says:

    I was writing high performance code that dealt with concurrent access. Thankfully I did not need to deal with active mutation of the data after load.

    I used hand-built collections with very high memory usage (4 GB+ of array allocations per collection). Read performance was 50-100x better than ConcurrentDictionary and 4-10x better than Dictionary.

    Truly a niche scenario, but as with everything involving real performance: profile, profile, profile. Everything else is just imagination.

    Who would ever assume, or even consider, that a Dictionary could be a bottleneck? Access is O(1), yet profiling showed it was a primary bottleneck, and replacing it provided massive throughput gains.

    Most people aren’t dealing with 1 billion objects in memory however.

  3. DarthVitrial says:

    Interesting that you found this. Have you spoken to Microsoft about this problem?