Introducing Paint.NET v3.5’s new selection renderer

In my previous post, I mentioned that I had rewritten the selection renderer in Paint.NET v3.5. As a pseudo-warning, this post is pretty technical.

The selection renderer in Paint.NET v3.36 uses GDI+. It draws the outline twice: once as a dashed black line, and the second as a dashed white line. A timer fires every 50 milliseconds or so and redraws the outline with a slightly increased “dash offset.” This gives the “dancing ants” animation. Our pal Emma Roberts will demonstrate:

 
“Pretend I’m animated!” — Ants

There are a few major problems with this implementation. The first is that while you are drawing or modifying a selection, it is only drawn with a black outline (once you are done, it will transition to the “dancing ants” mode). This makes it wickedly difficult to figure out what kind of selection you’re drawing if the underlying area is black. Try it with a solid black image and you’ll understand immediately. I had to do a solid outline instead of a dashed one because there were some horrible artifacts and performance problems if I didn’t.

Next up, the new render cache (hopefully I’ll discuss it in an upcoming post) splits up rendering into more tiles than it did before (for several good reasons!). GDI+ does not do multithreading. At all. In fact, it actively prevents you from making GDI+ calls on more than one thread even if you are clever and “trick” it. This was resulting in selection rendering performance falling off a cliff as all sorts of “setup” work (creating brushes, clipping, etc.) was being over-executed by up to a factor of 10 (that’s an estimate). Not to mention it was a non-starter for multithreaded performance scaling anyway.

Another problem is that the “dancing ants” animation consumes a lot of CPU power. This in turn slows down the rest of the system, and drains battery power (I’m a desktop/workstation guy, but the market has been majority-owned by laptops for awhile now). There are a few optimizations in there to throttle CPU usage if it gets “too high” but it’s never really been profiled or quantitatively proven to work well. My criteria at the time was “fewer people are complaining,” and then I went and drank beer (woohoo!).

Ahem. Anyway. The animation is really there for two reasons: not only does it look cool, but it guarantees contrast between the selection outline and the underlying image. The image you’re working on can’t animate, but the selection outline does, so there you go: it will never be confused for being part of the image.

The solution? I wrote my own polygon rasterizer renderer. From scratch. There’s a good reason people use libraries like GDI+ and Cairo to render graphics for them: it’s tricky! Get someone else to do it for you! Simple implementations aren’t difficult, but the complexity skyrockets once you add things like antialiasing, clipping, blending, various types of caching, serialization (critical for undo/redo), and safe multithreading.

However, I felt it was worth it to implement just enough of a pixel-pushing geometry rasterization “library” in order to render the selection since it is so crucial for Paint.NET. It’s taken “only” 2 months to get right, and it still isn’t quite finished. (but hey, if something is a core business of yours, do it yourself!). I’m now taking polygon lists and doing my own clipping, scanline rasterization, clipping, blending, etc. It’s fun, confusing, educational, and horrifying. I should probably have a cigarette once I’m all done and calmed down.

Bresenham invented the classic aliased line rasterization algorithm, and a kid named Xiaolin Wu later invented a smart antialiased rasterization algorithm. Both are in wide use today because of their inherent and wonderful simplicity. I implemented the latter, because antialiasing is important. The selection outline now uses “XOR” blending, so contrast is guaranteed except for a few less-common scenarios (prevalent 50% grey).

Notice how in this new screenshot, the selection outline effectively changes color as it goes through parts of the image which are varying colors. Where her hair is dark, the selection is bright, and where it crosses her lighter skin tone it becomes darker:

Normally you can’t use both XOR and antialiasing. XOR is meant to be “undoable” and so applying it a 2nd time simply gives you back the original data. It’s generally fast, cheap, and simple. With a naive antialiasing implementation you end up with seams and dots all over your polygon, as endpoints of lines give you pixels that are rendered twice, and very small lines (0 to 2 pixels long) don’t really know what to do.

My solution was to do the rendering in two passes. In the first pass I accumulate coverage values into the alpha channel. I accumulate towards zero, and use the inverse of that later as my real alpha value. I can do this because I know that the pixel is fully opaque at this stage in the rendering pipeline; this is because the “checkerboard” has already been applied. Thus, the alpha channel can be used for whatever I want.

In the second pass I apply the XOR operator to every pixel along the path at full strength (yar!), and then use alpha blending between the original pixel’s color value and the XOR’d color value with respect to the accumulated alpha. Yes, I’m doing stencil buffer accumulation in software. Video games use these a lot, especially for things like shadows and dancing teddy bears (ok maybe not the second one so much).

Oh, also, the selection is no longer animated because contrast is achievable without it, and because the performance benefit is profound. It would also be much more difficult to get an animated or dotted outline with the new code. I’d need 4x the lines of code, or I’d have to employ code generators or some other form of voodoo (if this were C++, a “policy-based template something-or-other” would be employed). As it is I still have a few higher-order functions and closures in there I need to get rid of. But the performance is still great, so I have deferred those optimizations until later (alpha or beta).

Because the selection renderer is now implemented in code that is completely owned by me, all opportunities for optimization are available. This includes changing the underlying storage model that defines the selection’s polygons – I now use a List of Point arrays (List<System.Windows.Point[]>), which makes interoperating with GPC and WPF easier and faster. I can optimize my clipping and do work ahead of time to ensure that rendering is fast.

I can split work like vector/matrix multiplication across multiple threads. I can even prefetch work ahead of time using “spare” CPU cycles. For instance, whenever the selection changes (moves, rotates, or is edited using add/subtract/union/intersection), I have to recompute the scans of the selection. This is a list of rectangles that I use to fill the interior of the selection with a blue tint. Well, in between the notification of a changed selection, and actually painting it (other stuff happens in the middle), I compute these scans in a background thread. This is a future at work for you!

Also, I can make sure that these “scans” are computed and stored in sorted order with respect to their top Y coordinate. Then, when clipping the rendering of the highlighted selection interior, I can use a nicely fast binary search to figure out which “scan” to start rendering from. Later on I’ll put in logic so that the computation itself can be clipped (I never render the highlight outside of the image bounds, so why calculate that part?). Oh, and did I mention that along the way I found some code on the critical path for the Magic Wand that was using insertion sort? I didn’t? Well, it’s fixed. That was embarrassing.

Other opportunities for optimization include being smarter about which areas of the canvas are redrawn when things change. With GDI+, it was difficult to do the boolean path algebra correctly (because of bugs in GDI+) to find a minimal invalidation region, and so several heuristics were put into place. Now that I control all of the code for both rendering and computational geometry, I’ve been able to implement this better. This improves performance, and fixes some visual glitches whereby little bits and pieces of the selection outline remained when moving/rotating a selection (the heuristic to fix this was, “every 1 second, redraw everything”).

This has all also served to help reacquaint myself with the Paint.NET codebase in areas that haven’t really seen much change in at least 2 years. Therefore, I’m better prepared for more refactoring after v3.5’s release. I’m changing gears for my work on v3.5: I’m going to stop fixing/refactoring things, and move to bugfixing mode. I’ve done a lot of optimizations, and there are still many more possible ones, but I also need to release something so that you all can use it! More optimizations can be trickled out over v3.5.1 or v3.6 releases, etc.

Anyway, that’s all for now. I hope you all will like it. One of my private beta testers sure does:

“…the speed improvements in comparison to my memory of v3.36 are greatly improved on any Windows machine I throw it at. Really well done! I think that this alone will be enough to make people excited.”

The theme of Paint.NET v3.5 is … performance

I sat down to write some notes before starting this blog entry, and I wound up with two full pages in OneNote on the 1920×1200 monitor it was sitting in. The more I’ve been working on it the more I’m excited about the Paint.NET v3.5 release. It isn’t one that introduces a lot of really cool or big-ticket features, but the list of small improvements is really adding up. I’ve been able to do a lot of research and prototyping in esoteric areas of multithreading and concurrency, and have gained both more mastery and more fear for these topics.

Performance work in Paint.NET v3.5 has wound up focusing on 3 areas:

  1. Scaling up. As everyone’s been saying for years, the future is increasingly multithreaded. My newest CPU upgrade leaves me with 8 threads in Task Manager (Intel Core i7 overclocked to 3.8GHz). A lot of research and work has gone into making sure that Paint.NET continues to scale with more threads, and that I have better tools for safely and correctly implementing this across more of the application.

    ”A high-end 64-bit Intel Core i7 desktop should run Paint.NET very fast.”

     

  2. Scaling down. Those $300 netbooks that are taking everyone by storm only run about as fast as what I was using 7 years ago (Pentium 4 at 2.0 – 2.5ghz). Clearly, classic optimization strategies are important as well: trimming cycles, removing or deferring code execution, and optimizing repainting.

    “A brand-new netbook with an Atom processor should run Paint.NET comfortably.”

  3. Reducing memory usage. I guess this goes with scaling down. I made a bet a long time ago that 64-bit would slowly take care of the way I was allocating memory, which simplified development work but has had the consequence of consuming vast amounts of virtual address space . I was wrong: 32-bit will be here for a long time, especially since most of those hot-like-pancakes $300 netbooks are not 64-bit capable. This is currently my top reliability issue, as running out of memory causes Paint.NET to crash.

    ”It’s not all yours.”

I’ve had to split this discussion over several blog entries because otherwise it was too long and even I would have fallen asleep reading it. I’ll summarize the results here though:

  • Images open much faster, especially on single-core/single-thread systems. Actually, I already wrote about this, so go read that first 🙂
  • I ordered and assembled my own Atom-based mini-desktop (“nettop”), in order to keep myself honest as I was working on my Core 2 Quad QX6700 2.67 GHz monster and subsequently as I upgraded to a Core i7 920 2.66GHz overclocked to 3.8 GHz.
  • The selection renderer has been completely rewritten. No more dancing ants and no more GDI+ means much lower CPU usage and better performance with multiple CPU cores.
  • Much better CPU scaling for the image composition rendering pipeline using LINQ-esque functional programming and deferred execution techniques.
  • A rewritten “render cache” has resulted in an average of 30-50% less memory usage when opening multiple images, especially those with just a single layer (PNG, JPEG). This means fewer out of memory crashes, and the ability to open more images without out-of-memory errors.

Paint.NET v3.5 is a stepping stone towards a hopefully epic v4.0. I’m slowly rebuilding the application from the inside out, and it takes a lot of time to do the necessary research and development. About 2 years ago, right around the time I was preparing to release Paint.NET v3.0, I had this nagging feeling in the back of my head that said basically “ur doin’ it wrong”. My document model was wrong, my application model was brittle, and I just couldn’t implement really cool features without using up a ton of memory. I also couldn’t provide features like scripting or a better extensibility model (plugins) in a manner that was both safe and powerful.

However, I didn’t really know how to solve all of this at a scale lower than the 50,000-foot view. Since then I’ve been slowly piecing together the tools and knowledge that I’ll need to create the best version of Paint.NET ever – one that’s great both outside (for users) and inside (for developers).

Now, if you’ll excuse me, I’ve got to stop breaking things and start fixing them so that I can push out an alpha release.