
I mentioned in a previous blog post that I had finally written the code for Paint.NET so that it would run the animation timer at the correct rate: namely, at the refresh rate that the monitor is actually running at instead of a constant 120 Hz.

For the 4.0 release, I chose 120 Hz as a compromise to not delay the release, partly because I’d been working on 4.0 for 5 years and was exhausted – any further delay was just not okay. I had already spent a bunch of time researching how to do this, but had not yet been successful in putting all the pieces together, and couldn’t justify spending more time on it.

The Code

So without further ado, here’s some code that shows exactly how to do this:

The 4 Steps

Here are the 4 steps that it takes to get the refresh rate:

Step 1: Get an HWND

An HWND is a “handle to a window.” This is pretty easy. Just do it.

In raw Win32, you should already have this. MFC, ATL, and WTL are super easy too.

In WinForms, just grab the value of the Form’s Handle property.

In WPF, you’ll want WindowInteropHelper’s help.

In UWP, you’ll want the help of ICoreWindowInterop.

Step 2: Get the HMONITOR

Thankfully this is pretty easy too, thanks to the MonitorFromWindow function. If your window is straddling multiple monitors then this will return the “best” one, at Windows’ discretion. Enumerating all of those monitors is probably much more complex, but I’d be surprised if it’s not possible somehow.


Step 3: Get the MONITORINFOEX

In other words, get information about the monitor. Again, the aptly named GetMonitorInfo function helps us out here.

Step 4: Get the monitor’s display settings

This includes the resolution and refresh rate and … all sorts of weird information that probably isn’t relevant in this decade. We use EnumDisplaySettings for this, which isn’t as obvious if you’re not well-versed in Win32 API patterns.

Also, note that I’m detailing how to query for this information. I didn’t look into how to get a notification for when this changes. I ultimately decided to key off of the standard window activation/focus events, because in order to change the refresh rate you have to click over to a control panel and back. I’m pretty sure WM_DISPLAYCHANGE would be more precise, but I didn’t verify this (it doesn’t say anything about refresh rate changes).
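The four steps above can be sketched roughly like this. Treat it as a minimal illustration rather than the exact code that ships in Paint.NET: the class name MonitorRefreshRate and the method name GetRefreshRateHz are made up for this example, and the interop definitions are my own transcription of the Win32 headers (Unicode versions). Only MonitorFromWindow, GetMonitorInfo, and EnumDisplaySettings come from the steps themselves.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper class; the name is an assumption for this sketch.
public static class MonitorRefreshRate
{
    private const uint MONITOR_DEFAULTTONEAREST = 2;
    private const int ENUM_CURRENT_SETTINGS = -1;

    [StructLayout(LayoutKind.Sequential)]
    public struct RECT { public int left, top, right, bottom; }

    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    public struct MONITORINFOEX
    {
        public int cbSize;
        public RECT rcMonitor;
        public RECT rcWork;
        public uint dwFlags;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 32)]
        public string szDevice; // e.g. \\.\DISPLAY1
    }

    // DEVMODEW, using the display branch of the first union.
    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    public struct DEVMODE
    {
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 32)]
        public string dmDeviceName;
        public ushort dmSpecVersion, dmDriverVersion, dmSize, dmDriverExtra;
        public uint dmFields;
        public int dmPositionX, dmPositionY;
        public uint dmDisplayOrientation, dmDisplayFixedOutput;
        public short dmColor, dmDuplex, dmYResolution, dmTTOption, dmCollate;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 32)]
        public string dmFormName;
        public ushort dmLogPixels;
        public uint dmBitsPerPel, dmPelsWidth, dmPelsHeight;
        public uint dmDisplayFlags, dmDisplayFrequency;
        public uint dmICMMethod, dmICMIntent, dmMediaType, dmDitherType;
        public uint dmReserved1, dmReserved2, dmPanningWidth, dmPanningHeight;
    }

    [DllImport("user32.dll")]
    private static extern IntPtr MonitorFromWindow(IntPtr hwnd, uint dwFlags);

    [DllImport("user32.dll", EntryPoint = "GetMonitorInfoW", CharSet = CharSet.Unicode)]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool GetMonitorInfo(IntPtr hMonitor, ref MONITORINFOEX lpmi);

    [DllImport("user32.dll", EntryPoint = "EnumDisplaySettingsW", CharSet = CharSet.Unicode)]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool EnumDisplaySettings(string deviceName, int modeNum, ref DEVMODE devMode);

    // Returns the refresh rate in Hz, or null if any step fails.
    public static int? GetRefreshRateHz(IntPtr hwnd)
    {
        // Step 2: HWND -> HMONITOR
        IntPtr hMonitor = MonitorFromWindow(hwnd, MONITOR_DEFAULTTONEAREST);
        if (hMonitor == IntPtr.Zero) return null;

        // Step 3: HMONITOR -> MONITORINFOEX (we need the device name)
        var info = new MONITORINFOEX { cbSize = Marshal.SizeOf(typeof(MONITORINFOEX)) };
        if (!GetMonitorInfo(hMonitor, ref info)) return null;

        // Step 4: device name -> current display settings
        var mode = new DEVMODE { dmSize = (ushort)Marshal.SizeOf(typeof(DEVMODE)) };
        if (!EnumDisplaySettings(info.szDevice, ENUM_CURRENT_SETTINGS, ref mode)) return null;

        // 0 and 1 mean "hardware default", so there is no usable number.
        return mode.dmDisplayFrequency > 1 ? (int?)mode.dmDisplayFrequency : null;
    }
}
```

In WinForms you would call something like MonitorRefreshRate.GetRefreshRateHz(form.Handle) (Step 1), and fall back to a sensible default such as 60 when it returns null.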

The Backstory

That didn’t seem too bad … so why did I say that this was so difficult? The code is only a single page long in C#, and most of it’s just interop definitions!

Well, this is Win32 we’re talking about. Everything’s cryptic, and a lot of the real documentation is buried in the tribal knowledge of various Microsoft engineers. You may only end up with a few paragraphs of code, but it’s often a lot of work to get there (and this isn’t unique to Win32: I just described a lot of software development!).

DirectX, or rather DXGI, turned out to be more readable but way more complex, as we’ll see soon. I didn’t end up needing DirectX’s help in the end, as the code can testify to.

DXGI rabbit hole

My first research attempts were in DXGI to try and find this information. The DXGI_MODE_DESC has the refresh rate right in plain sight, and it’s a DXGI_RATIONAL so maybe it’ll be more accurate than an integer for those times when a display is running at something weird like 59.97 Hz.

Unfortunately, I was never able to find out how to query the current display mode for a specific monitor, window, or render target.

I got pretty close though:

1. Starting with ID2D1RenderTarget, call QueryInterface() to retrieve an interface pointer for ID2D1DeviceContext.

2. On ID2D1DeviceContext, call GetDevice() to get the ID2D1Device.

3. On the ID2D1Device, call QueryInterface() to retrieve an interface pointer for ID2D1Device2.

4. On the ID2D1Device2, call GetDxgiDevice() to get the IDXGIDevice.

5. On the IDXGIDevice, call GetAdapter() to get the IDXGIAdapter

From here you will need to use EnumOutputs to enumerate the IDXGIOutputs, then call GetDesc() on each one until you find one with the right HMONITOR for your HWND. Which means you can’t actually start from your render target (or device context) and get to the refresh rate. You still need some external information, namely the HWND, to get there.

IDXGIOutput also has FindClosestMatchingMode, which sounds promising, but it’s no help at all for what we need.

IDXGIOutput::GetDisplayModeList allows you to enumerate all the modes, but not the current mode. This just seems like an omission to me. IDXGIOutput1 through 5, retrievable via QueryInterface(), don’t have anything to help here either.

Not being able to go from the render target to the refresh rate actually makes sense, since a render target doesn’t have to be attached to a monitor (it could be pointed at a bitmap). However, not being able to query the monitor’s current mode is not something I’ve come up with a plausible explanation for.

Maybe I’m wrong and there is a way to do this with DXGI – like maybe the first entry in GetDisplayModeList is the current mode by convention. The documentation says nothing about this, however.

So, DXGI turned out to be an empty rabbit hole that left me frustrated and so I shelved the problem for a later date.


But, I did finally get it working! As is often the case, it mostly required deciding that this really was the most important thing to work on at the time (prioritization, in other words). Then I sat down for a few hours, did the research, wrote and experimented with some code in C so I wouldn’t have to worry about interop definitions, then ported it to a little C#/WPF sample app, debugged the interop bugs, and then it was ready for integration into Paint.NET (another hour or two).

And now, finally, as of version 4.0.17, Paint.NET is using a lot less CPU time any time it does any animations. This made opening many images run a lot faster (you know you can multiselect with File->Open, right?) – the image thumbnail list does all sorts of neat animations when you’re doing that.


I was inspecting the latest build of Paint.NET with SciTech Memory Profiler and noticed that there were a lot of System.Object allocations. Thousands of them … then, tens of thousands of them … and when I had opened 100 images, each of which was 3440×1440 pixels, I had over 800,000 System.Objects on the heap. That’s ridiculous! Not only do those use up a ton of memory, but they can really slow down the garbage collector. (Yes, they’ll survive to gen2 and live a nice quiet retired life, for the most part … but they also have to first survive a gen0 and then a gen1 collection.)

Obviously my question was, where are these coming from?! After poking around in the object graph for a bit, and then digging in with Reflector, it eventually became clear: every ConcurrentDictionary was allocating an Object[] array of size 128, and immediately populating it with brand new Object()s (it was not lazily populated). And Paint.NET uses a lot of ConcurrentDictionarys!

Each of these Objects serves as a locking object to help ensure correctness and good performance for when there are lots of writes to the dictionary. The reason it allocates 128 of these is based on its default policy for concurrencyLevel: 4 x ProcessorCount. My system is a 16-core Dual Xeon E5-2687W with HyperThreading, which means ProcessorCount = 32.

There’s no way this level of concurrency is needed, so I quickly refactored my code to use a utility method for creating ConcurrentDictionary instead of the constructor. Most places in the code only need a low concurrency level, like 2-4. Some places did warrant a higher concurrency level, but I still maxed it out at 1x ProcessorCount.
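A utility method along those lines might look something like the following. This is only a sketch and not Paint.NET’s actual helper: the class name ConcurrentDictionaryFactory, the method names, and the default values are assumptions for illustration. The one real API it leans on is the ConcurrentDictionary(int concurrencyLevel, int capacity) constructor.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical factory; names and defaults are assumptions for this sketch.
public static class ConcurrentDictionaryFactory
{
    // Keeps the requested level between 1 and the number of processors.
    public static int ClampConcurrencyLevel(int requested, int processorCount)
    {
        return Math.Max(1, Math.Min(requested, processorCount));
    }

    // Creates a dictionary with a small, explicit number of lock objects
    // instead of the framework default of 4 x ProcessorCount.
    public static ConcurrentDictionary<TKey, TValue> Create<TKey, TValue>(
        int concurrencyLevel = 2,
        int capacity = 31)
    {
        int level = ClampConcurrencyLevel(concurrencyLevel, Environment.ProcessorCount);
        return new ConcurrentDictionary<TKey, TValue>(level, capacity);
    }
}
```

Call sites then read ConcurrentDictionaryFactory.Create<string, int>() for the common low-contention case, or pass a larger level where a dictionary really is hammered by many writers. Having a single choke point also makes it easy to change the policy later.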

Once this was done, I recreated the slightly contrived experiment of loading up 100 x 3440×1440 images, and the System.Object count was down to about 20,000. Not bad!

This may seem like a niche scenario. “Bah! Who buys a Dual Xeon? Most people just have a dual or quad core CPU!” That’s true today. But this will become more important as Intel’s Skylake-X and AMD’s Threadripper bring 16-core CPUs much closer to the mainstream. AMD is already doing a fantastic job with their mainstream 8-core Ryzen chips (which run Paint.NET really fantastically well, by the way!), and Intel has the 6-core Coffee Lake headed to mainstream systems later this year. Core counts are going up, which means ConcurrentDictionary’s memory usage is also going up.

So, if you’re writing a Windows app with the stock .NET Framework and you’re using ConcurrentDictionary a lot, I’d advise you to be careful with it. It’s not as lightweight as you think.

(The good news is that Stephen Toub updated this in the open source .NET Core 2.0 so that only 1x ProcessorCount is employed. Here’s the commit. This doesn’t seem to have made it into the latest .NET Framework 4.7, unfortunately.)

(Side note: expect a Paint.NET 4.0.6 update soon, hopefully timed with the availability of Windows 10. It’s not just a bugfix release, either. More info soon!)

If you have some code whose performance is limited by string building or concatenation, you may want to read this. For everyone else: close your eyes! Here be dragons! (just kidding, but seriously, be careful with this stuff)

Everybody knows that strings are immutable in .NET. When I first tried out C# (in 2003?) this annoyed me, as I was used to C and C++ and being able to do whatever I wanted. I quickly learned to love this property, and embraced immutability and functional programming style in general. The benefits far outweigh the initial clumsiness.

Everybody also knows that you can pin a string with unsafe and fixed, get a pointer to the first character, and read directly from it or even hand it off to native code. No copying required. Right? Right?!

And following from that, there’s a dirty little secret here that nobody really likes to talk about: .NET strings aren’t actually immutable. You can get a pointer to one, and you can write to that pointer because it’s allocated in the regular heap and not in some special read-only memory pages (e.g. via VirtualProtect). Strings in .NET don’t cache their hash code, therefore there’s nothing to cause an inconsistency if you do something like …

String s = "I'm immutable, right?";
fixed (char* p = s)
{
    *(p + 4) = '_';
    *(p + 5) = '_';
}

Console.WriteLine(s); // outputs "I'm __mutable, right?"

Note that if you do this anywhere other than while initializing a string, be prepared for the local townsfolk to come knocking on your door. With torches and pitchforks. It’s very important to respect the immutability contract once you hand off a String to external code.

If we browse around with Reflector or over on github in the coreclr repo, we can see that .NET itself takes advantage of this in functions such as System.String.Concat() and its helper function FillStringChecked():

public static String Concat(String str0, String str1, String str2) {
    // … parameter checking and other stuff omitted for brevity …
    int totalLength = str0.Length + str1.Length + str2.Length;
    String result = FastAllocateString(totalLength);
    FillStringChecked(result, 0, str0);
    FillStringChecked(result, str0.Length, str1);
    FillStringChecked(result, str0.Length + str1.Length, str2);
    return result;
}

unsafe private static void FillStringChecked(String dest, int destPos, String src) {
    // … parameter checking omitted for brevity
    fixed (char* pDest = &dest.m_firstChar)
    fixed (char* pSrc = &src.m_firstChar) {
        wstrcpy(pDest + destPos, pSrc, src.Length);
    }
}

And wstrcpy just calls into some regular ol’ memcpy/memmove type code. Some of those code paths (via wstrcpy) are managed code, others eventually thunk into native, but none of it’s magic: you can do all of this yourself! You can’t call FastAllocateString()*, but you can still allocate a string of a specified length via new String(char c, int count). Just use a placeholder character for the first parameter, like a space or even the null character '\0'. Also note that all .NET strings are null-terminated, so if you create one of length N, you actually have N+1 characters. The memory layout intentionally matches BSTR which gives a lot of flexibility for passing things to native code without having to do any copying. (I don’t know what would happen if you overwrote the length prefix, but feel free to try it out and post a comment!)

With this information you can build your own versions of String.Concat or StringBuilder that can avoid making copies in specialized situations. However, I’m not actually sure how useful this is: I looked through the Paint.NET codebase and found only a handful of places that do lots of string manipulation, and even then they were mostly in error reporting code paths whose performance was entirely unimportant. I decided to leave the code alone, preferring safety over a little bit of performance (and fun and creativity). Your Mileage May Vary.
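For the curious, here is roughly what a home-grown Concat could look like using the approach described above. It’s a sketch under a few assumptions: the names StringHacks and ConcatAll are invented for this example, it needs to be compiled with unsafe code enabled, and it only ever writes to the string while it is still being initialized (before anyone else has seen it).

```csharp
using System;

// Hypothetical helper; the names are assumptions for this sketch.
public static class StringHacks
{
    // Concatenates the parts with one allocation and no intermediate copies.
    // Null or empty parts are skipped.
    public static unsafe string ConcatAll(params string[] parts)
    {
        if (parts == null) throw new ArgumentNullException("parts");

        int totalLength = 0;
        foreach (string part in parts)
        {
            totalLength = checked(totalLength + (part == null ? 0 : part.Length));
        }

        // Never write into String.Empty: it is a shared instance.
        if (totalLength == 0) return string.Empty;

        // Stand-in for FastAllocateString(): allocate with a placeholder char.
        string result = new string('\0', totalLength);

        fixed (char* pDest = result)
        {
            char* p = pDest;
            foreach (string part in parts)
            {
                if (string.IsNullOrEmpty(part)) continue;

                fixed (char* pSrc = part)
                {
                    for (int i = 0; i < part.Length; i++)
                    {
                        p[i] = pSrc[i];
                    }
                }

                p += part.Length;
            }
        }

        // From here on the string must be treated as immutable again.
        return result;
    }
}
```

The early return for a zero total length matters: new String('\0', 0) hands back the shared empty string, and scribbling on that would affect every other user of it in the process.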

On a side note, the fact that you can do this at all is one of the reasons why I love .NET and C#. You get all the good stuff that a managed runtime and high-level language is supposed to provide, but you also get “escape hatches” to drill through the glass floors on the abstraction ladder. You can push the runtime and its safety guarantees out of the way in the name of performance (or for other reasons). Paint.NET takes advantage of this a lot: graphics code loves things like pointers and structs. Lately I’ve been doing a lot of work on Android** with Java, and the lack of things like pointers and structs … gosh, it just makes things so much more difficult to work with from a performance standpoint!

* Well, maybe you could with reflection ;) That would probably eliminate its “fast” property though.

** Sorry, not for Paint.NET.

.NET Framework 4.5 contains a very cool new feature called Multi-Core JIT. You can think of it as a profile-guided JIT prefetcher for application startup, and you can read about it in a few places …

I’ve been using .NET 4.0 to develop Paint.NET 4.0 for the past few years. Now that .NET 4.5 is out, I’ve been upgrading Paint.NET to require it. However, due to a circumstance beyond my control at this moment, I can’t actually use anything in .NET 4.5 (see below for why). So Paint.NET is compiled for .NET 4.0 and can’t use .NET 4.5’s features at compile time, but as it turns out they are still there at runtime.

I decided to see if it was possible to use the ProfileOptimization class via reflection even if I compiled for .NET 4.0. The answer: yes! You may ask why you’d want to do this at all instead of biting the bullet and requiring .NET 4.5. Well, you may need to keep your project on .NET 4.0 in order to maintain maximum compatibility with your customers who aren’t yet ready (or willing) to install .NET 4.5. Maybe you’d like to use the ProfileOptimization class in your next “dot release” (e.g. v1.0.1) as a free performance boost for those who’ve upgraded to .NET 4.5, but without displacing those who haven’t.

So, here’s the code, which I’ve verified as working just fine if you compile for .NET 4.0 but run with .NET 4.5 installed:

using System;
using System.IO;
using System.Reflection;

Type systemRuntimeProfileOptimizationType = Type.GetType("System.Runtime.ProfileOptimization", false);
if (systemRuntimeProfileOptimizationType != null)
{
    MethodInfo setProfileRootMethod = systemRuntimeProfileOptimizationType.GetMethod("SetProfileRoot", BindingFlags.Static | BindingFlags.Public, null, new Type[] { typeof(string) }, null);
    MethodInfo startProfileMethod = systemRuntimeProfileOptimizationType.GetMethod("StartProfile", BindingFlags.Static | BindingFlags.Public, null, new Type[] { typeof(string) }, null);

    if (setProfileRootMethod != null && startProfileMethod != null)
    {
        try
        {
            // Figure out where to put the profile (go ahead and customize this for your application)
            // This code will end up using something like, C:\Users\UserName\AppData\Local\YourAppName\ProfileOptimization\
            string localSettingsDir = Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData);
            string localAppSettingsDir = Path.Combine(localSettingsDir, "YourAppName");
            string profileDir = Path.Combine(localAppSettingsDir, "ProfileOptimization");

            setProfileRootMethod.Invoke(null, new object[] { profileDir });
            startProfileMethod.Invoke(null, new object[] { "Startup.profile" }); // don't need to be too clever here
        }
        catch (Exception)
        {
            // discard errors. good faith effort only.
        }
    }
}

I’m not sure I’ll be using this in Paint.NET 4.0 since it uses NGEN already, but it’s nice to have this code snippet around.

So, why can’t I use .NET 4.5? Well, they removed support for Setup projects (*.vdproj) in Visual Studio 2012, and I don’t yet have the time or energy to convert Paint.NET’s MSI to be built using WiX. I’m not willing to push back Paint.NET 4.0 any further because of this. Instead, I will continue using Visual Studio 2010 and compiling for .NET 4.0 (or maybe I’ll find a better approach). However, at install time and application startup, it will check for and require .NET 4.5. The installer will get it installed if necessary. Also, there’s a serialization bug in .NET 4.0 which has dire consequences for images saved in the native .PDN file format, but it’s fixed in .NET 4.5 (and for .NET 4.0 apps if 4.5 just happens to be what’s installed).

I’ve come up with a trick that can be used in some very specific scenarios in order to avoid extra array copying when calling into native code from managed code (e.g. C#). This won’t usually work for regular P/Invokes into all your favorite Win32 APIs, but I’m hopeful it’ll be useful for someone somewhere. It’s not even evil! No hacks required.

Many native methods require the caller to allocate the array and specify its length, and then the callee fills it in or returns an error code indicating that the buffer is too small. The technique described in this post is not necessary for those, as they can already be used optimally without any copying.

Instead, let’s talk about the general problem if you’re calling a native method which does the array allocation and then returns it. You can’t use it as a “managed array” unless you copy it into a brand new managed array (don’t forget to free the native array). In other words, native { T* pArray; size_t length; } cannot be used as a simple managed T[] as-is (or even with modification!). The managed runtime didn’t allocate it, won’t recognize it, and there’s nothing you can do about it. Very few managed methods will accept a pointer and a length; most require a managed array. This is particularly irksome when you want to use System.IO.Stream.Read() or Write() with bytes from a native-side buffer.

Paint.NET uses a library written in classic C called General Polygon Clipper (GPC), from The University of Manchester, to perform polygon clipping. This is used for, among other things, when you draw a selection with a mode such as add (union), subtract (exclude), intersect, and invert (“xor”). I blogged about this 4 years ago when version 3.35 was about to be released: using GPC made these operations immensely faster, and I saved a lot of time and headache by purchasing a commercial use license for the library and then integrating it into the Paint.NET code base. tl;dr: The algorithms for doing this are nontrivial and rife with special corner cases, and I’d been struggling to find enough sequential time to implement and debug it on my own.

Anyway, the data going into and coming out of GPC is an array of polygons. Each polygon is an array of points, each of which is just a struct containing X and Y as double-precision floating point values. To put it simply, it’s just a System.Windows.Point[][] (I actually use my own geometry primitives nowadays, but that’s another story, and it’s the same exact thing).

Getting this data into GPC from the managed side is easy. You pin every array, and then hand off the pinned pointers to GPC. Since you can’t use the “fixed” expression with a dynamic number of elements, I use GCHandle directly and stuff them all into GCHandle[] arrays for the duration of the native call. This is great because on the managed side I can work with regular ol’ managed arrays, and then send them off to GPC as “native arrays” by pinning them and using the pointers obtained from GCHandle.AddrOfPinnedObject().
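The pinning side of this can be sketched as a small disposable wrapper. This is an illustration, not the code from Paint.NET: the class name PinnedJaggedArray and its Pointers property are invented here, and it assumes T is a blittable struct (GCHandle.Alloc throws for anything else when asked to pin).

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper; the names are assumptions for this sketch.
// Pins every inner array of a T[][] for the lifetime of the wrapper.
public sealed class PinnedJaggedArray<T> : IDisposable
    where T : struct
{
    private GCHandle[] handles;

    // One native-usable pointer per inner array, valid until Dispose().
    public IntPtr[] Pointers { get; private set; }

    public PinnedJaggedArray(T[][] arrays)
    {
        if (arrays == null) throw new ArgumentNullException("arrays");

        this.handles = new GCHandle[arrays.Length];
        this.Pointers = new IntPtr[arrays.Length];

        try
        {
            for (int i = 0; i < arrays.Length; i++)
            {
                this.handles[i] = GCHandle.Alloc(arrays[i], GCHandleType.Pinned);
                this.Pointers[i] = this.handles[i].AddrOfPinnedObject();
            }
        }
        catch
        {
            // Don't leave earlier arrays pinned if a later one fails.
            Dispose();
            throw;
        }
    }

    public void Dispose()
    {
        if (this.handles == null) return;

        for (int i = 0; i < this.handles.Length; i++)
        {
            if (this.handles[i].IsAllocated) this.handles[i].Free();
        }

        this.handles = null;
    }
}
```

Wrapping the native call in a using block keeps the arrays pinned for exactly as long as the native side is reading them, which matters because long-lived pins fragment the GC heap.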

Now, here’s the heartbreaking part. GPC allocates the output polygon using good ol’ malloc*. So when I get the result back on the managed side, I must copy every single last one so that I can use it as a Point[] (a managed array). This ends up burning a lot of CPU time, and can cause virtual address space claustrophobia on 32-bit/x86 systems when working with complex selections (e.g. Magic Wand), as you must have enough memory for 2 copies of the result while you’re doing the copying. (Or you could free each native array after you copy it into a managed array, but that’s an optimization for another day, and isn’t as straightforward as you’d think because freeing the native memory requires another P/Invoke, and those add up, and so it might not actually be an optimization.)

But wait, there’s another way! Since the code for GPC is part of my build, I can modify it. So I added an extra parameter called gpc_vertex_calloc:

    typedef gpc_vertex * (__stdcall * gpc_vertex_calloc_fn)(int count);

    void __stdcall gpc_polygon_clip(
        gpc_op                set_operation,
        gpc_polygon          *subject_polygon,
        gpc_polygon          *clip_polygon,
        // result_polygon holds the arrays of gpc_vertex, aka System.Windows.Point
        gpc_polygon          *result_polygon,
        gpc_vertex_calloc_fn  gpc_vertex_calloc);

(“gpc_vertex” is GPC’s struct that has the same layout as System.Windows.Point: X and Y, each defined as a double.)

In short, I’ve changed GPC so that it uses an external allocator by passing in a function pointer it should use instead of malloc. And now if I want I can have it use malloc, HeapAlloc, VirtualAlloc, or even the secret sauce detailed below.

On the managed side, the interop delegate definition for gpc_vertex_calloc_fn gets defined as such:

    public delegate IntPtr gpc_vertex_calloc_fn(int count);

And gpc_polygon_clip’s interop definition is like so:

    [DllImport("PaintDotNet.SystemLayer.Native.x86.dll", CallingConvention = CallingConvention.StdCall)]
    public static extern void gpc_polygon_clip(
        [In] NativeConstants.gpc_op set_operation,
        [In] ref NativeStructs.gpc_polygon subject_polygon,
        [In] ref NativeStructs.gpc_polygon clip_polygon,
        [In, Out] ref NativeStructs.gpc_polygon result_polygon,
        [In] [MarshalAs(UnmanagedType.FunctionPtr)] NativeDelegates.gpc_vertex_calloc_fn gpc_vertex_calloc);

So, we’re half way there, and now we need to implement the allocator on the managed side.

    internal unsafe sealed class PinnedManagedArrayAllocator<T>
        : Disposable
          where T : struct
    {
        private Dictionary<IntPtr, T[]> pbArrayToArray;
        private Dictionary<IntPtr, GCHandle> pbArrayToGCHandle;

        public PinnedManagedArrayAllocator()
        {
            this.pbArrayToArray = new Dictionary<IntPtr, T[]>();
            this.pbArrayToGCHandle = new Dictionary<IntPtr, GCHandle>();
        }

        // (Finalizer is already implemented by the base class (Disposable))

        protected override void Dispose(bool disposing)
        {
            if (this.pbArrayToGCHandle != null)
            {
                foreach (GCHandle gcHandle in this.pbArrayToGCHandle.Values)
                {
                    // Unpin the managed array
                    gcHandle.Free();
                }

                this.pbArrayToGCHandle = null;
            }

            this.pbArrayToArray = null;

            base.Dispose(disposing);
        }

        // Pass a delegate to this method for "gpc_vertex_calloc_fn". Don't forget to use GC.KeepAlive() on the delegate!
        public IntPtr AllocateArray(int count)
        {
            T[] array = new T[count];
            GCHandle gcHandle = GCHandle.Alloc(array, GCHandleType.Pinned);
            IntPtr pbArray = gcHandle.AddrOfPinnedObject();
            this.pbArrayToArray.Add(pbArray, array);
            this.pbArrayToGCHandle.Add(pbArray, gcHandle);
            return pbArray;
        }

        // This is what you would use instead of, e.g. Marshal.Copy()
        public T[] GetManagedArray(IntPtr pbArray)
        {
            return this.pbArrayToArray[pbArray];
        }
    }

(“Disposable” is a base class which implements, you guessed it, IDisposable, while also ensuring that Dispose(bool) never gets called more than once. This is important for other places where thread safety is very important. This class is specifically not thread safe, but it should be reasonably easy to make it so.)

And that’s it! Well, almost. I’m omitting the guts of the interop code but nothing that should inhibit comprehension of this part of it. Also, the above code is not hardened for error cases, and should not be used as-is for anything running on a server or in a shared process. Oh, and I just noticed that my Dispose() method has an incorrect implementation, whereby it shouldn’t be using this.pbArrayToGCHandle, specifically it shouldn’t be foreach-ing on it, and should instead wrap that in its own IDisposable-implementing class … exercise for the reader? Or I can post a fix later if someone wants it.

After I’ve called gpc_polygon_clip, instead of copying all the arrays using something like System.Runtime.InteropServices.Marshal.Copy(), I just use GetManagedArray() and pass in the pointer that GPC retrieved from its gpc_vertex_calloc_fn, aka AllocateArray(). When I’m done, I dispose the PinnedManagedArrayAllocator and it unpins all the managed arrays. And this is much faster than making copies of everything.
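To make the flow of that paragraph concrete without dragging in the native DLL, here is a tiny self-contained simulation. Everything in it is an assumption for illustration: AllocatorDemo, AllocFn, and FakeNativeProduce are made-up names, and FakeNativeProduce merely stands in for a native callee like gpc_polygon_clip by writing through the raw pointer it gets back from the allocator callback.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hypothetical demo; every name here is an assumption for this sketch.
public static class AllocatorDemo
{
    public delegate IntPtr AllocFn(int count);

    // Stand-in for the native callee: asks the caller for memory,
    // then fills it in through the raw pointer.
    private static IntPtr FakeNativeProduce(AllocFn alloc, int count)
    {
        IntPtr p = alloc(count);
        double[] values = new double[count];
        for (int i = 0; i < count; i++) values[i] = i * 0.5;
        Marshal.Copy(values, 0, p, count);
        return p;
    }

    public static double[] Run(int count)
    {
        var arrays = new Dictionary<IntPtr, double[]>();
        var handles = new List<GCHandle>();

        // The "allocator": a pinned managed array handed out as a pointer.
        AllocFn alloc = n =>
        {
            double[] array = new double[n];
            GCHandle handle = GCHandle.Alloc(array, GCHandleType.Pinned);
            handles.Add(handle);
            IntPtr p = handle.AddrOfPinnedObject();
            arrays.Add(p, array);
            return p;
        };

        try
        {
            IntPtr result = FakeNativeProduce(alloc, count);

            // With a real P/Invoke, this keeps the delegate alive
            // until the native call has definitely returned.
            GC.KeepAlive(alloc);

            // No copy back: look up the managed array that owns the pointer.
            return arrays[result];
        }
        finally
        {
            foreach (GCHandle handle in handles) handle.Free();
        }
    }
}
```

The array returned by Run is the very same object the “native” side wrote into, which is the whole point: the data crosses the boundary once and is never copied again.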

Now, this isn’t the exact code I’m using. I’ve un-generalized it in the real code so I can allocate all of the arrays at once instead of incurring potentially hundreds of managed <—> native transitions for each allocation. The above implementation also doesn’t have a “FreeArray” method; I had one, but I ended up not needing it, so I removed it.

So the next time you find yourself calling into native code which either 1) allows you to specify an external allocator, or 2) is part of your build, and that 3) involves lots of data and thus lots of copying and wasted CPU time, you might just consider using the tactic above. Your users will thank you.

Lastly, I apologize for my blog’s poor code formatting.

Legal: I hereby place the code in this blog post into the public domain for anyone to do whatever they want with. Attribution is not required, but I’d certainly appreciate it if you send me an e-mail or post a comment and let me know it was useful for you.

* Actually in Paint.NET 3.5, I changed it to use HeapAlloc(). This way I can get an exception raised when it runs out of memory, instead of corrupt results. This does happen on 32-bit/x86, especially when using the Magic Wand on large images.