You are currently browsing the category archive for the ‘Performance’ category.

I mentioned in a previous blog post that I had finally written the code for Paint.NET so that it would run the animation timer at the correct rate. Namely, at the refresh that the monitor is actually running at instead of a constant 120 Hz.

For the 4.0 release, I chose 120 Hz as a compromise to not delay the release, partly because I’d been working on 4.0 for 5 years and was exhausted – any further delay was just not okay. I had already spent a bunch of time researching how to do this, but had not yet been successful in putting all the pieces together, and couldn’t justify spending more time on it.

The Code

So without further ado, here’s some code that shows exactly how to do this:

The 4 Steps

Here are the 4 steps that it takes to get the refresh rate:

Step 1: Get an HWND

An HWND is a “handle to a window.” This is pretty easy. Just do it.

In raw Win32, you should already have this. MFC, ATL, and WTL are super easy too.

In WinForms, just grab the value of the Form’s Handle property.

In WPF, you’ll want WindowInteropHelper’s help.

In UWP, you’ll want the help of ICoreWindowInterop.

Step 2: Get the HMONITOR

Thankfully this is pretty easy too, thanks to the MonitorFromWindow function. If your window is straddling multiple monitors then this will return the “best” one, at Windows’ discretion. Enumerating all of those monitors is probably much more complex, but I’d be surprised if it’s not possible somehow.


In other words, get information about the monitor. Again, the aptly named GetMonitorInfo function helps us out here.

Step 4: Get the monitor’s display settings

This includes the resolution and refresh rate and … all sorts of weird information that probably isn’t relevant in this decade. We using EnumDisplaySettings for this, which isn’t as obvious if you’re not well-versed in Win32 API patterns.

Also, note that I’m detailing how to query for this information. I didn’t look into how to get a notification for when this changes. I ultimately decided to key off of the standard window activation/focus events, because in order to change the refresh rate you have to click over to a control panel and back. I’m pretty sure WM_DISPLAYCHANGE would be more precise, but I didn’t verify this (it doesn’t say anything about refresh rate changes).

The Backstory

That didn’t seem too bad … so why did I say that this so difficult? The code is only a single page long in C#, and most of it’s just interop definitions!

Well, this is Win32 we’re talking about. Everything’s cryptic, and a lot of the real documentation is buried in the tribal knowledge of various Microsoft engineers. You may only end up with a few paragraphs of code, but it’s often a lot of work to get there (and this isn’t unique to Win32: I just described a lot of software development!).

DirectX, or rather DXGI, turned out to be more readable but way more complex, as we’ll see soon. I didn’t end up needing DirectX’s help in the end, as the code can testify to.

DXGI rabbit hole

My first research attempts were in DXGI to try and find this information. The DXGI_MODE_DESC has the refresh rate right in plain sight, and it’s a DXGI_RATIONAL so maybe it’ll be more accurate than an integer for those times when a display is running at something weird like 59.97 Hz.

Unfortunately, I was never able to find out how to query the current display mode for a specific monitor, window, or render target.

I got pretty close though:

1. Starting with ID2D1RenderTarget, call QueryInterface() to retrieve an interface pointer for ID2D1DeviceContext.

2. On ID2D1DeviceContext, call GetDevice() to get the ID2D1Device.

3. On the ID2D1Device, call QueryInterface() to retrieve an interface pointer for ID2D1Device2.

4. On the ID2D1Device2, call GetDxgiDevice() to get the IDXGIDevice.

5. On the IDXGIDevice, call GetAdapter() to get the IDXGIAdapter

From here you will need to use EnumOutputs to enumerate the IDXGIOutputs, then call GetDesc() on each one until you find one with the right HMONITOR for your HWND. Which means you can’t actually start from your render target (or device context) and get to the refresh rate. You still need some external information, namely the HWND, to get there.

IDXGIOutput also has FindClosestMatchingMode, which sounds promising, but it’s no help at all for what we need.

IDXGIOutput::GetDisplayModeList allows you to enumerate all the modes, but not the current mode. This just seems like an omission to me. IDXGIOutput1 through 5, retrievable via QueryInterface(), don’t have anything to help here either.

Not being able to go from the render target to the refresh rate actually makes sense, since a render target doesn’t have to be attached to a monitor (it could be pointed at a bitmap). However, not being able to query the monitor’s current mode is not something I’ve come up with a plausible explanation for.

Maybe I’m wrong and there is a way to do this with DXGI – like maybe the first entry in GetDisplayModeList is the current mode by convention. The documentation says nothing about this, however.

So, DXGI turned out to be an empty rabbit hole that left me frustrated and so I shelved the problem for a later date.


But, I did finally get it working! As is often the case, it mostly required deciding that this really was the most important thing to work on at the time (prioritization, in other words). Then I sat down for a few hours, did the research, wrote and experimented with some code in C so I wouldn’t have to worry about interop definitions, then ported it to a little C#/WPF sample app, debugged the interop bugs, and then it was ready for integration into Paint.NET (another hour or two).

And now, finally, as of version 4.0.17, Paint.NET is using a lot less CPU time any time it does any animations. This made opening many images run a lot faster (you know you can multiselect with File->Open, right?) – the image thumbnail lists does all sorts of neat animations when you’re doing that.


I was inspecting the latest build of Paint.NET with SciTech Memory Profiler  and noticed that there were a lot of System.Object allocations. Thousands of them … then, tens of thousands of them … and when I had opened 100 images, each of which were 3440×1440 pixels, I had over 800,000 System.Objects on the heap. That’s ridiculous! Not only do those use up a ton of memory, but they can really slow down the garbage collector. (Yes, they’ll survive to gen2 and live a nice quiet retired life, for the most part … but they also have to first survive a gen0 and then a gen1 collection.)

Obviously my question was, where are these coming from?! After poking around in the object graph for a bit, and then digging in with Reflector, it eventually became clear: every ConcurrentDictionary was allocating an Object[] array of size 128, and immediately populating it with brand new Object()s (it was not lazily populated). And Paint.NET uses a lot of ConcurrentDictionarys!

Each of these Objects serves as a locking object to help ensure correctness and good performance for when there are lots of writes to the dictionary. The reason it allocates 128 of these is based on its default policy for concurrencyLevel: 4 x ProcessorCount. My system is a 16-core Dual Xeon E5-2687W with HyperThreading, which means ProcessorCount = 32.

There’s no way this level of concurrency is needed, so I quickly refactored my code to use a utility method for creating ConcurrentDictionary instead of the constructor. Most places in the code only need a low concurrency level, like 2-4. Some places did warrant a higher concurrency level, but I still maxed it out at 1x ProcessorCount.

Once this was done, I recreated the slightly contrived experiment of loading up 100 x 3440×1440 images, and the System.Object count was down to about ~20,000. Not bad!

This may seem like a niche scenario. “Bah! Who buys a Dual Xeon? Most people just have a dual or quad core CPU!” That’s true today. But this will become more important as Intel’s Skylake-X and AMD’s Threadripper bring 16-core CPUs much closer to the mainstream. AMD is already doing a fantastic job with their mainstream 8-core Ryzen chips (which run Paint.NET really fantastically well, by the way!), and Intel has the 6-core Coffee Lake headed to mainstream systems later this year. Core counts are going up, which means ConcurrentDictionary’s memory usage is also going up.

So, if you’re writing a Windows app with the stock .NET Framework and you’re using ConcurrentDictionary a lot, I’d advise you to be careful with it. It’s not as lightweight as you think.

(The good news is that Stephen Toub updated this in the open source .NET Core 2.0 so that only 1x ProcessorCount is employed. Here’s the commit. This doesn’t seem to have made it into the latest .NET Framework 4.7, unfortunately.)