Marshaling native arrays back as managed arrays without copying

I’ve come up with a trick that can be used in some very specific scenarios in order to avoid extra array copying when calling into native code from managed code (e.g. C#). This won’t usually work for regular P/Invokes into all your favorite Win32 APIs, but I’m hopeful it’ll be useful for someone somewhere. It’s not even evil! No hacks required.

Many native methods require the caller to allocate the array and specify its length, and then the callee fills it in or returns an error code indicating that the buffer is too small. The technique described in this post is not necessary for those, as they can already be used optimally without any copying.
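For comparison, here is a minimal sketch of that caller-allocates pattern in C. The function name, the status codes, and the "fill with squares" payload are all made up for illustration; they don't come from any real API. The point is only that the callee never allocates, so the caller is free to hand it a pinned managed array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes, for illustration only. */
enum { FILL_OK = 0, FILL_BUFFER_TOO_SMALL = 1 };

/* Hypothetical callee: writes `needed` values into a caller-provided buffer.
   It never allocates. If the buffer is missing or too small, it reports the
   required length through *required and writes nothing to the buffer. */
int fill_squares(double *buffer, size_t capacity, size_t needed, size_t *required)
{
    assert(required != NULL);
    *required = needed;

    if (buffer == NULL || capacity < needed) {
        return FILL_BUFFER_TOO_SMALL;
    }

    for (size_t i = 0; i < needed; i++) {
        buffer[i] = (double)(i * i);
    }
    return FILL_OK;
}
```

A caller would typically try once, and if it gets the "too small" code, allocate `*required` elements and call again.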

Instead, let’s talk about the general problem of calling a native method which does the array allocation itself and then returns the array. You can’t use the result as a “managed array” unless you copy it into a brand new managed array (don’t forget to free the native array). In other words, native { T* pArray; size_t length; } cannot be used as a simple managed T[] as-is (or even with modification!). The managed runtime didn’t allocate it, won’t recognize it, and there’s nothing you can do about it. Very few managed methods will accept a pointer and a length; most require a managed array. This is particularly irksome when you want to use System.IO.Stream.Read() or Write() with bytes from a native-side buffer.

Paint.NET uses a library written in classic C called General Polygon Clipper (GPC), from The University of Manchester, to perform polygon clipping. This is used, among other things, when you draw a selection with a mode such as add (union), subtract (exclude), intersect, or invert (“xor”). I blogged about this 4 years ago when version 3.35 was about to be released: using GPC made these operations immensely faster, and I saved a lot of time and headache by purchasing a commercial use license for the library and then integrating it into the Paint.NET code base. tl;dr: The algorithms for doing this are nontrivial and rife with special corner cases, and I’d been struggling to find enough sequential time to implement and debug them on my own.

Anyway, the data going into and coming out of GPC is an array of polygons. Each polygon is an array of points, each of which is just a struct containing X and Y as double-precision floating point values. To put it simply, it’s just a System.Windows.Point[][] (I actually use my own geometry primitives nowadays, but that’s another story, and it’s the same exact thing).

Getting this data into GPC from the managed side is easy. You pin every array, and then hand off the pinned pointers to GPC. Since you can’t use the “fixed” expression with a dynamic number of elements, I use GCHandle directly and stuff them all into GCHandle[] arrays for the duration of the native call. This is great because on the managed side I can work with regular ol’ managed arrays, and then send them off to GPC as “native arrays” by pinning them and using the pointers obtained from GCHandle.AddrOfPinnedObject().

Now, here’s the heartbreaking part. GPC allocates the output polygon using good ol’ malloc*. So when I get the result back on the managed side, I must copy every single last array so that I can use it as a Point[] (a managed array). This ends up burning a lot of CPU time, and can cause virtual address space claustrophobia on 32-bit/x86 systems when working with complex selections (e.g. Magic Wand), as you must have enough memory for 2 copies of the result while you’re doing the copying. (Or you could free each native array after you copy it into a managed array, but that’s an optimization for another day, and isn’t as straightforward as you’d think because freeing the native memory requires another P/Invoke, and those add up, and so it might not actually be an optimization.)

But wait, there’s another way! Since the code for GPC is part of my build, I can modify it. So I added an extra parameter called gpc_vertex_calloc:

    typedef gpc_vertex *(__stdcall *gpc_vertex_calloc_fn)(int count);

    GPC_DLL_EXPORT(
    void, gpc_polygon_clip)(
        gpc_op                set_operation,
        gpc_polygon          *subject_polygon,
        gpc_polygon          *clip_polygon,
        // result_polygon holds the arrays of gpc_vertex (aka System.Windows.Point)
        gpc_polygon          *result_polygon,
        gpc_vertex_calloc_fn  gpc_vertex_calloc);

(“gpc_vertex” is GPC’s struct that has the same layout as System.Windows.Point: X and Y, each defined as a double.)

In short, I’ve changed GPC so that it uses an external allocator: I pass in a function pointer that it should call instead of malloc. And now if I want I can have it use malloc, HeapAlloc, VirtualAlloc, or even the secret sauce detailed below.
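If the pattern is unfamiliar, here is a stripped-down sketch of it in plain C. This is not GPC’s code: `make_diagonal` is a hypothetical stand-in for the clipper, the `vertex` struct just mirrors the X/Y-as-double layout, and I’ve left off `__stdcall` so it compiles anywhere. The library routine neither knows nor cares where the memory comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Same shape as the vertex described in the post: X and Y as doubles. */
typedef struct { double x; double y; } vertex;

/* Allocator callback: returns zeroed storage for `count` vertices, or NULL. */
typedef vertex *(*vertex_calloc_fn)(int count);

/* Hypothetical library routine standing in for the clipper. It produces
   `count` vertices on a diagonal, using memory from the caller's allocator. */
vertex *make_diagonal(int count, vertex_calloc_fn alloc)
{
    assert(alloc != NULL);

    vertex *result = alloc(count);
    if (result == NULL) {
        return NULL;
    }

    for (int i = 0; i < count; i++) {
        result[i].x = (double)i;
        result[i].y = (double)i;
    }
    return result;
}

/* One possible allocator: plain calloc. */
vertex *calloc_vertices(int count)
{
    if (count <= 0) {
        return NULL;
    }
    return calloc((size_t)count, sizeof(vertex));
}

/* Another possible allocator: same thing, but it counts how often it is
   called, which shows the library really does go through the callback. */
int counting_calls = 0;

vertex *counting_calloc_vertices(int count)
{
    counting_calls++;
    return calloc_vertices(count);
}
```

Whoever supplies the allocator also owns the memory afterwards, so in this sketch the caller frees the result with `free()`.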

On the managed side, the interop delegate definition for gpc_vertex_calloc_fn gets defined as such:

    [UnmanagedFunctionPointer(CallingConvention.StdCall)]
    public delegate IntPtr gpc_vertex_calloc_fn(int count);

And gpc_polygon_clip’s interop definition is like so:

    [DllImport("PaintDotNet.SystemLayer.Native.x86.dll", CallingConvention = CallingConvention.StdCall)]
    public static extern void gpc_polygon_clip(
        [In] NativeConstants.gpc_op set_operation,
        [In] ref NativeStructs.gpc_polygon subject_polygon,
        [In] ref NativeStructs.gpc_polygon clip_polygon,
        [In, Out] ref NativeStructs.gpc_polygon result_polygon,
        [In] [MarshalAs(UnmanagedType.FunctionPtr)] NativeDelegates.gpc_vertex_calloc_fn gpc_vertex_calloc);

So, we’re half way there, and now we need to implement the allocator on the managed side.

    internal unsafe sealed class PinnedManagedArrayAllocator<T>
        : Disposable
          where T : struct
    {
        private Dictionary<IntPtr, T[]> pbArrayToArray;
        private Dictionary<IntPtr, GCHandle> pbArrayToGCHandle;

        public PinnedManagedArrayAllocator()
        {
            this.pbArrayToArray = new Dictionary<IntPtr, T[]>();
            this.pbArrayToGCHandle = new Dictionary<IntPtr, GCHandle>();
        }
 
        // (Finalizer is already implemented by the base class (Disposable))

        protected override void Dispose(bool disposing)
        {
            if (this.pbArrayToGCHandle != null)
            {
                foreach (GCHandle gcHandle in this.pbArrayToGCHandle.Values)
                {
                    gcHandle.Free();
                }

                this.pbArrayToGCHandle = null;
            }

            this.pbArrayToArray = null;

            base.Dispose(disposing);
        }

        // Pass a delegate to this method for “gpc_vertex_calloc_fn”. Don’t forget to use GC.KeepAlive() on the delegate!
        public IntPtr AllocateArray(int count)
        {
            T[] array = new T[count];
            GCHandle gcHandle = GCHandle.Alloc(array, GCHandleType.Pinned);
            IntPtr pbArray = gcHandle.AddrOfPinnedObject();
            this.pbArrayToArray.Add(pbArray, array);
            this.pbArrayToGCHandle.Add(pbArray, gcHandle);
            return pbArray;
        }

        // This is what you would use instead of, e.g. Marshal.Copy()
        public T[] GetManagedArray(IntPtr pbArray)
        {
            return this.pbArrayToArray[pbArray];
        }
    }

(“Disposable” is a base class which implements, you guessed it, IDisposable, while also ensuring that Dispose(bool) never gets called more than once. This is important for other places where thread safety is very important. This class is specifically not thread safe, but it should be reasonably easy to make it so.)

And that’s it! Well, almost. I’m omitting the guts of the interop code but nothing that should inhibit comprehension of this part of it. Also, the above code is not hardened for error cases, and should not be used as-is for anything running on a server or in a shared process. Oh, and I just noticed that my Dispose() method has an incorrect implementation, whereby it shouldn’t be using this.pbArrayToGCHandle, specifically it shouldn’t be foreach-ing on it, and should instead wrap that in its own IDisposable-implementing class … exercise for the reader? Or I can post a fix later if someone wants it.

After I’ve called gpc_polygon_clip, instead of copying all the arrays using something like System.Runtime.InteropServices.Marshal.Copy(), I just use GetManagedArray() and pass in the pointer that GPC retrieved from its gpc_vertex_calloc_fn, aka AllocateArray(). When I’m done, I dispose the PinnedManagedArrayAllocator and it unpins all the managed arrays. And this is much faster than making copies of everything.

Now, this isn’t the exact code I’m using. I’ve un-generalized it in the real code so I can allocate all of the arrays at once instead of incurring potentially hundreds of managed <-> native transitions, one for each allocation. The above implementation also doesn’t have a “FreeArray” method; I had one, but I ended up not needing it, so I removed it.
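To give a rough idea of what "all at once" could look like on the native side, here is one hypothetical shape for a batched callback, sketched in C. To be clear, this is not the real Paint.NET or GPC code, and every name in it is invented: the idea is just that if the native routine knows every array length before it needs the memory, it can make one callback for the whole result instead of one per array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct { double x; double y; } vertex;

/* Hypothetical batched callback: receives every array length at once and
   fills out_arrays[i] with zeroed storage for counts[i] vertices.
   Returns 1 on success, 0 on failure. */
typedef int (*vertex_calloc_batch_fn)(const int *counts, int num_arrays,
                                      vertex **out_arrays);

/* Counts callback invocations (i.e. would-be interop transitions). */
int batch_calls = 0;

/* Stand-in allocator built on calloc. On failure it frees whatever it had
   already allocated. Every count is assumed to be positive. */
int calloc_batch(const int *counts, int num_arrays, vertex **out_arrays)
{
    batch_calls++;
    for (int i = 0; i < num_arrays; i++) {
        assert(counts[i] > 0);
        out_arrays[i] = calloc((size_t)counts[i], sizeof(vertex));
        if (out_arrays[i] == NULL) {
            while (i-- > 0) {
                free(out_arrays[i]);
                out_arrays[i] = NULL;
            }
            return 0;
        }
    }
    return 1;
}

/* Hypothetical native routine: the sizes are known up front, so a single
   callback covers every array, and then the arrays get filled in. */
int build_contours(const int *counts, int num_arrays, vertex **out_arrays,
                   vertex_calloc_batch_fn alloc_all)
{
    if (!alloc_all(counts, num_arrays, out_arrays)) {
        return 0;
    }
    for (int i = 0; i < num_arrays; i++) {
        for (int j = 0; j < counts[i]; j++) {
            out_arrays[i][j].x = (double)i;
            out_arrays[i][j].y = (double)j;
        }
    }
    return 1;
}
```

With a managed allocator behind the callback, three arrays would cost one transition instead of three; the trade-off is that the native code has to be restructured to compute all the lengths first.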

So the next time you find yourself calling into native code which either 1) allows you to specify an external allocator, or 2) is part of your build, and that 3) involves lots of data and thus lots of copying and wasted CPU time, you might just consider using the tactic above. Your users will thank you.

Lastly, I apologize for my blog’s poor code formatting.

Legal: I hereby place the code in this blog post into the public domain for anyone to do whatever they want with. Attribution is not required, but I certainly appreciate if you send me an e-mail or post a comment and let me know it was useful for you.

* Actually in Paint.NET 3.5, I changed it to use HeapAlloc(). This way I can get an exception raised when it runs out of memory, instead of corrupt results. This does happen on 32-bit/x86, especially when using the Magic Wand on large images.

New features for Paint.NET 4.0

It’s been a while since I talked about some of the smaller features that have been implemented for Paint.NET 4.0. So, without further ado …

Light Color Scheme

Paint.NET 3.5 uses a blue color scheme. For 4.0, you can still use that but the default is now the “Light” color scheme. The differences can be subtle but change is nice to have. The light theme also uses a gray canvas background (#CFCFCF to be precise), which can be important for color matching.


Color Picker Enhancements

Ed Harvey, who wrote and has been maintaining “Ed Harvey Effects,” one of the most popular and interesting plugin packs, has contributed some more features to Paint.NET 4.0 recently. The first two are in the Color Picker and give you the ability to set the sampling size as well as whether it will sample just the current layer or the whole image:

Copy Merged

Ed Harvey is also responsible for implementing another highly requested feature, Copy Merged. When you have a selection active, Edit->Copy will take the pixels from the current layer, while Edit->Copy Merged will use the whole image. In Paint.NET v3.5 you could do this but it required you to 1) Flatten the image, 2) Copy, and finally 3) Undo the Flatten. Paint.NET 4.0 will let you do that in one keystroke, and mirrors Photoshop’s functionality and keyboard shortcut. It also means you don’t have to wipe out your Redo history.

Tool Blending Modes

Paint.NET has always had an option to let you choose between Normal and Overwrite blending. The latter is necessary if you ever want to use anything but the Eraser tool in order to “draw transparent pixels.” This has been extended to include all of the layer blend modes, and still includes Overwrite. Currently this only works on the tools which have been upgraded to the new rendering system, namely the Pencil and Gradient tools, but all the others will be upgraded in due time. (I have already started upgrading the shape tools, for instance.)

Here’s an example comparing Normal and Xor blending modes with a rounded rectangle*:

Layer Reordering with Drag-and-Drop

In Paint.NET v3.5 you have to use the cumbersome Move Layer Up and Move Layer Down buttons to change layer ordering. Paint.NET 4.0 adds what you would naturally want to do here, namely the ability to just drag-and-drop the layers to reorder them. In addition, there are some nice animations for this and all the other things that can change the contents of the Layers window.

Antialiased Selections

Whenever you have a selection active, all drawing is clipped to it. Paint.NET 4.0 can finally do this clipping with antialiasing. This results in a much smoother edge. This was actually quite simple to implement with the new rendering engine that’s in place for 4.0. (Note: Feathered selections and other gizmos are another matter entirely and will hopefully make it into a post-4.0 release without too much of a wait.)

The first option gives you the same rendering that Paint.NET v3.5 and earlier use. The second uses 2×2 super sampling on the clipping mask, and the third uses 3×3 super sampling. I experimented with 4×4 super sampling but the improvement wasn’t very noticeable; in addition, performance went down and memory usage went up.

Here’s an example of the quality levels with a circular selection that’s had a gradient drawn inside of it:

Right now the default is Antialiased (2×2 super sampling). I’ll be doing some further experimenting, and decide whether the default should be High Quality and whether the “normal quality” option should even be present.

Anyway, that’s all for now!

* Astute readers may notice that the rounded rectangle’s corner radius does not match what 3.5 uses … yes, this will finally be configurable. Right now I’ve just got a test tool that renders a fixed size, but in short order the shape tools will get some fantastic upgrades, including configuring the corner radius for a rounded rectangle.

Paint.NET forums moving to a new server

This is just a little note to explain why the forum may not be accessible for a little while. It’s moving to a better server, although it’ll have the same http:// location.

You shouldn’t have to do anything other than please be patient for the next day or so, and then things should be back to normal once all the new DNS stuff propagates.

Apparently we were put on the wrong server during the last migration, and just recently we’ve been bumping into all of its CPU usage limitations 🙂