
.NET Framework 4.5 contains a very cool new feature called Multi-Core JIT. You can think of it as a profile-guided JIT prefetcher for application startup, and you can read about it in a few places …

I’ve been using .NET 4.0 to develop Paint.NET 4.0 for the past few years. Now that .NET 4.5 is out, I’ve been upgrading Paint.NET to require it. However, due to a circumstance beyond my control at this moment, I can’t actually use anything in .NET 4.5 (see below for why). So Paint.NET is compiled for .NET 4.0 and can’t use .NET 4.5’s features at compile time, but as it turns out they are still there at runtime.

I decided to see if it was possible to use the ProfileOptimization class via reflection even if I compiled for .NET 4.0. The answer: yes! You may ask why you’d want to do this at all instead of biting the bullet and requiring .NET 4.5. Well, you may need to keep your project on .NET 4.0 in order to maintain maximum compatibility with your customers who aren’t yet ready (or willing) to install .NET 4.5. Maybe you’d like to use the ProfileOptimization class in your next “dot release” (e.g. v1.0.1) as a free performance boost for those who’ve upgraded to .NET 4.5, but without displacing those who haven’t.

So, here’s the code, which I’ve verified as working just fine if you compile for .NET 4.0 but run with .NET 4.5 installed:

using System;
using System.IO;
using System.Reflection;

Type systemRuntimeProfileOptimizationType = Type.GetType("System.Runtime.ProfileOptimization", false);
if (systemRuntimeProfileOptimizationType != null)
{
    MethodInfo setProfileRootMethod = systemRuntimeProfileOptimizationType.GetMethod("SetProfileRoot", BindingFlags.Static | BindingFlags.Public, null, new Type[] { typeof(string) }, null);
    MethodInfo startProfileMethod = systemRuntimeProfileOptimizationType.GetMethod("StartProfile", BindingFlags.Static | BindingFlags.Public, null, new Type[] { typeof(string) }, null);

    if (setProfileRootMethod != null && startProfileMethod != null)
    {
        try
        {
            // Figure out where to put the profile (go ahead and customize this for your application)
            // This code will end up using something like, C:\Users\UserName\AppData\Local\YourAppName\ProfileOptimization\
            string localSettingsDir = Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData);
            string localAppSettingsDir = Path.Combine(localSettingsDir, "YourAppName");
            string profileDir = Path.Combine(localAppSettingsDir, "ProfileOptimization");
            Directory.CreateDirectory(profileDir);

            setProfileRootMethod.Invoke(null, new object[] { profileDir });
            startProfileMethod.Invoke(null, new object[] { "Startup.profile" }); // don’t need to be too clever here
        }

        catch (Exception)
        {
            // discard errors. good faith effort only.
        }
    }
}

I’m not sure I’ll be using this in Paint.NET 4.0 since it uses NGEN already, but it’s nice to have this code snippet around.

So, why can’t I use .NET 4.5? Well, Microsoft removed support for Setup projects (*.vdproj) in Visual Studio 2012, and I don’t yet have the time or energy to convert Paint.NET’s MSI to be built using WiX. I’m not willing to push back Paint.NET 4.0 any further because of this. Instead, I will continue using Visual Studio 2010 and compiling for .NET 4.0 (or maybe I’ll find a better approach). However, at install time and application startup, it will check for and require .NET 4.5. The installer will get it installed if necessary. Also, there’s a serialization bug in .NET 4.0 which has dire consequences for images saved in the native .PDN file format, but it’s fixed in .NET 4.5 (and for .NET 4.0 apps if 4.5 just happens to be what’s installed).

I’ve come up with a trick that can be used in some very specific scenarios in order to avoid extra array copying when calling into native code from managed code (e.g. C#). This won’t usually work for regular P/Invokes into all your favorite Win32 APIs, but I’m hopeful it’ll be useful for someone somewhere. It’s not even evil! No hacks required.

Many native methods require the caller to allocate the array and specify its length, and then the callee fills it in or returns an error code indicating that the buffer is too small. The technique described in this post is not necessary for those, as they can already be used optimally without any copying.
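For contrast, here is roughly what that caller-allocates pattern looks like on the native side. This is only a sketch: fill_squares and its status codes are made-up names for illustration, not part of any real API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes for this sketch. */
enum { FILL_OK = 0, FILL_BUFFER_TOO_SMALL = 1 };

/* Number of values this made-up function always produces. */
#define SQUARE_COUNT 5

/*
 * Writes SQUARE_COUNT square numbers into a buffer owned by the caller.
 * *required always receives the element count needed, so a caller whose
 * buffer was too small can allocate a bigger one and call again.
 */
int fill_squares(int *buffer, size_t capacity, size_t *required)
{
    assert(required != NULL);
    *required = SQUARE_COUNT;

    if (buffer == NULL || capacity < SQUARE_COUNT)
    {
        return FILL_BUFFER_TOO_SMALL;
    }

    for (size_t i = 0; i < SQUARE_COUNT; ++i)
    {
        buffer[i] = (int)(i * i);
    }

    return FILL_OK;
}
```

Because the caller owns the buffer, a managed array of a blittable type can typically be handed to a function shaped like this directly, and nothing needs to be copied afterwards.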

Instead, let’s talk about the general problem of calling a native method which does the array allocation itself and then returns the array. You can’t use it as a “managed array” unless you copy it into a brand new managed array (don’t forget to free the native array). In other words, native { T* pArray; size_t length; } cannot be used as a simple managed T[] as-is (or even with modification!). The managed runtime didn’t allocate it, won’t recognize it, and there’s nothing you can do about it. Very few managed methods will accept a pointer and a length; most require a managed array. This is particularly irksome when you want to use System.IO.Stream.Read() or Write() with bytes from a native-side buffer.

Paint.NET uses a library written in classic C called General Polygon Clipper (GPC), from The University of Manchester, to perform polygon clipping. Among other things, this is used when you draw a selection with a mode such as add (union), subtract (exclude), intersect, and invert (“xor”). I blogged about this 4 years ago when version 3.35 was about to be released: using GPC made these operations immensely faster, and I saved a lot of time and headache by purchasing a commercial use license for the library and then integrating it into the Paint.NET code base. tl;dr: The algorithms for doing this are nontrivial and rife with special corner cases, and I’d been struggling to find enough sequential time to implement and debug it on my own.

Anyway, the data going into and coming out of GPC is an array of polygons. Each polygon is an array of points, each of which is just a struct containing X and Y as double-precision floating point values. To put it simply, it’s just a System.Windows.Point[][] (I actually use my own geometry primitives nowadays, but that’s another story, and it’s the same exact thing).

Getting this data into GPC from the managed side is easy. You pin every array, and then hand off the pinned pointers to GPC. Since you can’t use the “fixed” expression with a dynamic number of elements, I use GCHandle directly and stuff them all into GCHandle[] arrays for the duration of the native call. This is great because on the managed side I can work with regular ol’ managed arrays, and then send them off to GPC as “native arrays” by pinning them and using the pointers obtained from GCHandle.AddrOfPinnedObject().

Now, here’s the heartbreaking part. GPC allocates the output polygon using good ol’ malloc*. So when I get the result back on the managed side, I must copy every single last one so that I can use it as a Point[] (a managed array). This ends up burning a lot of CPU time, and can cause virtual address space claustrophobia on 32-bit/x86 systems when working with complex selections (e.g. Magic Wand), as you must have enough memory for 2 copies of the result while you’re doing the copying. (Or you could free each native array after you copy it into a managed array, but that’s an optimization for another day, and isn’t as straightforward as you’d think because freeing the native memory requires another P/Invoke, and those add up, and so it might not actually be an optimization.)
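To make the cost concrete, here is the copy-then-free dance reduced to plain C. It is a sketch, not GPC’s code: vertex2d, make_diagonal and copy_and_release are hypothetical stand-ins, and the caller-owned array plays the role of the managed array.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical vertex type for this sketch: two doubles, X then Y. */
typedef struct { double x; double y; } vertex2d;

/* Stand-in for a library routine that allocates its own result with malloc. */
vertex2d *make_diagonal(int count)
{
    assert(count > 0);

    vertex2d *result = malloc((size_t)count * sizeof(vertex2d));
    if (result == NULL)
    {
        return NULL;
    }

    for (int i = 0; i < count; ++i)
    {
        result[i].x = i;
        result[i].y = i;
    }

    return result;
}

/*
 * Copies the library's result into caller-owned storage, then frees the
 * library's block. Between the memcpy and the free, both copies are alive.
 * Returns 1 on success, 0 if either pointer is NULL.
 */
int copy_and_release(vertex2d *native_result, vertex2d *owned, int count)
{
    if (native_result == NULL || owned == NULL)
    {
        return 0;
    }

    memcpy(owned, native_result, (size_t)count * sizeof(vertex2d));
    free(native_result);
    return 1;
}
```

Every result array goes through that memcpy, and on the managed side the free is an extra native call on top of it.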

But wait, there’s another way! Since the code for GPC is part of my build, I can modify it. So I added an extra parameter called gpc_vertex_calloc:

    typedef gpc_vertex *(__stdcall * gpc_vertex_calloc_fn)(int count);

    GPC_DLL_EXPORT(
    void, gpc_polygon_clip)(
        gpc_op                set_operation,
        gpc_polygon          *subject_polygon,
        gpc_polygon          *clip_polygon,
        // result_polygon holds the arrays of gpc_vertex (aka System.Windows.Point)
        gpc_polygon          *result_polygon, 
        gpc_vertex_calloc_fn  gpc_vertex_calloc);

(“gpc_vertex” is GPC’s struct that has the same layout as System.Windows.Point: X and Y, each defined as a double.)
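The trick depends on that layout match, so it can be worth checking on the native side. Here is a minimal sketch of such a check. vertex2d is a hypothetical stand-in with the two-doubles shape described above, and assert_vertex_layout is a made-up helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a vertex struct: X then Y, both doubles. */
typedef struct { double x; double y; } vertex2d;

/*
 * Aborts (in builds with assertions enabled) unless the struct is exactly
 * two consecutive doubles with no padding, which is what a managed struct
 * holding two double fields in sequential layout would expect.
 */
void assert_vertex_layout(void)
{
    assert(offsetof(vertex2d, x) == 0);
    assert(offsetof(vertex2d, y) == sizeof(double));
    assert(sizeof(vertex2d) == 2 * sizeof(double));
}
```

If one of these ever failed, a managed array could no longer be reinterpreted as an array of native vertices, and copying would be the only safe option.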

In short, I’ve changed GPC so that it uses an external allocator by passing in a function pointer it should use instead of malloc. And now if I want I can have it use malloc, HeapAlloc, VirtualAlloc, or even the secret sauce detailed below.
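Stripped of everything GPC-specific, the external allocator idea can be sketched in portable C as below. The names (vertex2d, vertex_calloc_fn, make_ramp and both allocators) are invented for the example, and __stdcall is left out so it builds outside of Windows compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical vertex type for this sketch: two doubles, X then Y. */
typedef struct { double x; double y; } vertex2d;

/* Allocator callback: returns zeroed storage for count vertices, or NULL. */
typedef vertex2d *(*vertex_calloc_fn)(int count);

/* Default allocator, backed by the C runtime heap. */
vertex2d *heap_vertex_calloc(int count)
{
    return calloc((size_t)count, sizeof(vertex2d));
}

/* Storage owned by the caller; it stands in for a pinned managed array. */
#define CALLER_CAPACITY 8
vertex2d caller_storage[CALLER_CAPACITY];

/* Alternative allocator that hands out the caller's own storage. */
vertex2d *caller_vertex_calloc(int count)
{
    if (count > CALLER_CAPACITY)
    {
        return NULL;
    }

    for (int i = 0; i < count; ++i)
    {
        caller_storage[i].x = 0.0;
        caller_storage[i].y = 0.0;
    }

    return caller_storage;
}

/* Stand-in for a library routine: the output comes from the callback, not malloc. */
vertex2d *make_ramp(int count, vertex_calloc_fn vertex_calloc)
{
    assert(count > 0 && vertex_calloc != NULL);

    vertex2d *result = vertex_calloc(count);
    if (result == NULL)
    {
        return NULL;
    }

    for (int i = 0; i < count; ++i)
    {
        result[i].x = i;
        result[i].y = 2.0 * i;
    }

    return result;
}
```

With caller_vertex_calloc, the library writes its output straight into memory the caller already owns, so there is nothing left to copy when the call returns.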

On the managed side, the interop delegate definition for gpc_vertex_calloc_fn gets defined as such:

    [UnmanagedFunctionPointer(CallingConvention.StdCall)]
    public delegate IntPtr gpc_vertex_calloc_fn(int count);

And gpc_polygon_clip’s interop definition is like so:

    [DllImport("PaintDotNet.SystemLayer.Native.x86.dll", CallingConvention = CallingConvention.StdCall)]
    public static extern void gpc_polygon_clip(
        [In] NativeConstants.gpc_op set_operation,
        [In] ref NativeStructs.gpc_polygon subject_polygon,
        [In] ref NativeStructs.gpc_polygon clip_polygon,
        [In, Out] ref NativeStructs.gpc_polygon result_polygon,
        [In] [MarshalAs(UnmanagedType.FunctionPtr)] NativeDelegates.gpc_vertex_calloc_fn gpc_vertex_calloc);

So, we’re half way there, and now we need to implement the allocator on the managed side.

    internal unsafe sealed class PinnedManagedArrayAllocator<T>
        : Disposable
          where T : struct
    {
        private Dictionary<IntPtr, T[]> pbArrayToArray;
        private Dictionary<IntPtr, GCHandle> pbArrayToGCHandle;

        public PinnedManagedArrayAllocator()
        {
            this.pbArrayToArray = new Dictionary<IntPtr, T[]>();
            this.pbArrayToGCHandle = new Dictionary<IntPtr, GCHandle>();
        }
 
        // (Finalizer is already implemented by the base class (Disposable))

        protected override void Dispose(bool disposing)
        {
            if (this.pbArrayToGCHandle != null)
            {
                foreach (GCHandle gcHandle in this.pbArrayToGCHandle.Values)
                {
                    gcHandle.Free();
                }

                this.pbArrayToGCHandle = null;
            }

            this.pbArrayToArray = null;

            base.Dispose(disposing);
        }

        // Pass a delegate to this method for “gpc_vertex_calloc_fn”. Don’t forget to use GC.KeepAlive() on the delegate!
        public IntPtr AllocateArray(int count)
        {
            T[] array = new T[count];
            GCHandle gcHandle = GCHandle.Alloc(array, GCHandleType.Pinned);
            IntPtr pbArray = gcHandle.AddrOfPinnedObject();
            this.pbArrayToArray.Add(pbArray, array);
            this.pbArrayToGCHandle.Add(pbArray, gcHandle);
            return pbArray;
        }

        // This is what you would use instead of, e.g. Marshal.Copy()
        public T[] GetManagedArray(IntPtr pbArray)
        {
            return this.pbArrayToArray[pbArray];
        }
    }

(“Disposable” is a base class which implements, you guessed it, IDisposable, while also ensuring that Dispose(bool) never gets called more than once. This matters in other places where thread safety is critical. This particular class is not thread safe, but it should be reasonably easy to make it so.)

And that’s it! Well, almost. I’m omitting the guts of the interop code, but nothing that should inhibit comprehension of this part of it. Also, the above code is not hardened for error cases, and should not be used as-is for anything running on a server or in a shared process. Oh, and I just noticed that my Dispose() method has an incorrect implementation: it shouldn’t be touching this.pbArrayToGCHandle (specifically, it shouldn’t be foreach-ing on it) when it’s called from the finalizer, since finalizer code shouldn’t rely on other managed objects. The GCHandles should instead be wrapped in their own IDisposable-implementing class … exercise for the reader? Or I can post a fix later if someone wants it.

After I’ve called gpc_polygon_clip, instead of copying all the arrays using something like System.Runtime.InteropServices.Marshal.Copy(), I just use GetManagedArray() and pass in the pointer that GPC retrieved from its gpc_vertex_calloc_fn, aka AllocateArray(). When I’m done, I dispose the PinnedManagedArrayAllocator and it unpins all the managed arrays. And this is much faster than making copies of everything.

Now, this isn’t the exact code I’m using. I’ve un-generalized it in the real code so I can allocate all of the arrays at once instead of incurring potentially hundreds of managed <-> native transitions for each allocation. The above implementation also doesn’t have a “FreeArray” method; I had one, but I ended up not needing it, so I removed it.
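One way the batching could look from the native side is sketched below. This is not the real Paint.NET or GPC code: the callback signature and all names (vertex_calloc_many_fn, emit_contours, batch_call_count) are assumptions made up for illustration. The library works out every array length first, then crosses the boundary once to get all of the arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical vertex type for this sketch: two doubles, X then Y. */
typedef struct { double x; double y; } vertex2d;

/*
 * Hypothetical batch callback: stores zeroed storage for counts[i] vertices
 * in out_arrays[i], for every i below array_count. Returns nonzero on success.
 */
typedef int (*vertex_calloc_many_fn)(int array_count, const int *counts, vertex2d **out_arrays);

/* Counts how often the batch allocator is entered (for demonstration only). */
int batch_call_count = 0;

/* Heap-backed batch allocator; all counts are expected to be positive. */
int heap_vertex_calloc_many(int array_count, const int *counts, vertex2d **out_arrays)
{
    ++batch_call_count;

    for (int i = 0; i < array_count; ++i)
    {
        out_arrays[i] = calloc((size_t)counts[i], sizeof(vertex2d));

        if (out_arrays[i] == NULL)
        {
            /* Undo the arrays allocated so far and report failure. */
            for (int j = 0; j < i; ++j)
            {
                free(out_arrays[j]);
                out_arrays[j] = NULL;
            }

            return 0;
        }
    }

    return 1;
}

/* Stand-in for a library routine that produces several contours at once. */
int emit_contours(int contour_count, const int *vertex_counts, vertex2d **contours,
                  vertex_calloc_many_fn alloc_many)
{
    assert(contour_count > 0 && vertex_counts != NULL);
    assert(contours != NULL && alloc_many != NULL);

    /* One callback covers every contour, instead of one callback per contour. */
    if (!alloc_many(contour_count, vertex_counts, contours))
    {
        return 0;
    }

    for (int c = 0; c < contour_count; ++c)
    {
        for (int v = 0; v < vertex_counts[c]; ++v)
        {
            contours[c][v].x = c;
            contours[c][v].y = v;
        }
    }

    return 1;
}
```

The catch is that the library has to know all of the lengths before it writes any output, which may mean an extra counting pass over its internal data.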

So the next time you find yourself calling into native code which either 1) allows you to specify an external allocator, or 2) is part of your build, and that 3) involves lots of data and thus lots of copying and wasted CPU time, you might just consider using the tactic above. Your users will thank you.

Lastly, I apologize for my blog’s poor code formatting.

Legal: I hereby place the code in this blog post into the public domain for anyone to do whatever they want with. Attribution is not required, but I certainly appreciate if you send me an e-mail or post a comment and let me know it was useful for you.

* Actually in Paint.NET 3.5, I changed it to use HeapAlloc(). This way I can get an exception raised when it runs out of memory, instead of corrupt results. This does happen on 32-bit/x86, especially when using the Magic Wand on large images.
