In my previous post, I mentioned that I had rewritten the selection renderer in Paint.NET v3.5. As a pseudo-warning, this post is pretty technical.

The selection renderer in Paint.NET v3.36 uses GDI+. It draws the outline twice: once as a dashed black line, and the second as a dashed white line. A timer fires every 50 milliseconds or so and redraws the outline with a slightly increased “dash offset.” This gives the “dancing ants” animation. Our pal Emma Roberts will demonstrate:

“Pretend I’m animated!” — Ants

There are a few major problems with this implementation. The first is that while you are drawing or modifying a selection, it is only drawn with a black outline (once you are done, it will transition to the “dancing ants” mode). This makes it wickedly difficult to figure out what kind of selection you’re drawing if the underlying area is black. Try it with a solid black image and you’ll understand immediately. I had to do a solid outline instead of a dashed one because there were some horrible artifacts and performance problems if I didn’t.

Next up, the new render cache (hopefully I’ll discuss it in an upcoming post) splits up rendering into more tiles than it did before (for several good reasons!). GDI+ does not do multithreading. At all. In fact, it actively prevents you from making GDI+ calls on more than one thread even if you are clever and “trick” it. This was resulting in selection rendering performance falling off a cliff as all sorts of “setup” work (creating brushes, clipping, etc.) was being over-executed by up to a factor of 10 (that’s an estimate). Not to mention it was a non-starter for multithreaded performance scaling anyway.

Another problem is that the “dancing ants” animation consumes a lot of CPU power. This in turn slows down the rest of the system, and drains battery power (I’m a desktop/workstation guy, but the market has been majority-owned by laptops for awhile now). There are a few optimizations in there to throttle CPU usage if it gets “too high” but it’s never really been profiled or quantitatively proven to work well. My criteria at the time was “fewer people are complaining,” and then I went and drank beer (woohoo!).

Ahem. Anyway. The animation is really there for two reasons: not only does it look cool, but it guarantees contrast between the selection outline and the underlying image. The image you’re working on can’t animate, but the selection outline does, so there you go: it will never be confused for being part of the image.

The solution? I wrote my own polygon rasterizer renderer. From scratch. There’s a good reason people use libraries like GDI+ and Cairo to render graphics for them: it’s tricky! Get someone else to do it for you! Simple implementations aren’t difficult, but the complexity skyrockets once you add things like antialiasing, clipping, blending, various types of caching, serialization (critical for undo/redo), and safe multithreading.

However, I felt it was worth it to implement just enough of a pixel-pushing geometry rasterization “library” in order to render the selection since it is so crucial for Paint.NET. It’s taken “only” 2 months to get right, and it still isn’t quite finished. (but hey, if something is a core business of yours, do it yourself!). I’m now taking polygon lists and doing my own clipping, scanline rasterization, clipping, blending, etc. It’s fun, confusing, educational, and horrifying. I should probably have a cigarette once I’m all done and calmed down.

Bresenham invented the classic aliased line rasterization algorithm, and a kid named Xiaolin Wu later invented a smart antialiased rasterization algorithm. Both are in wide use today because of their inherent and wonderful simplicity. I implemented the latter, because antialiasing is important. The selection outline now uses “XOR” blending, so contrast is guaranteed except for a few less-common scenarios (prevalent 50% grey).

Notice how in this new screenshot, the selection outline effectively changes color as it goes through parts of the image which are varying colors. Where her hair is dark, the selection is bright, and where it crosses her lighter skin tone it becomes darker:

Normally you can’t use both XOR and antialiasing. XOR is meant to be “undoable” and so applying it a 2nd time simply gives you back the original data. It’s generally fast, cheap, and simple. With a naive antialiasing implementation you end up with seams and dots all over your polygon, as endpoints of lines give you pixels that are rendered twice, and very small lines (0 to 2 pixels long) don’t really know what to do.

My solution was to do the rendering in two passes. In the first pass I accumulate coverage values into the alpha channel. I accumulate towards zero, and use the inverse of that later as my real alpha value. I can do this because I know that the pixel is fully opaque at this stage in the rendering pipeline; this is because the “checkerboard” has already been applied. Thus, the alpha channel can be used for whatever I want.

In the second pass I apply the XOR operator to every pixel along the path at full strength (yar!), and then use alpha blending between the original pixel’s color value and the XOR’d color value with respect to the accumulated alpha. Yes, I’m doing stencil buffer accumulation in software. Video games use these a lot, especially for things like shadows and dancing teddy bears (ok maybe not the second one so much).

Oh, also, the selection is no longer animated because contrast is achievable without it, and because the performance benefit is profound. It would also be much more difficult to get an animated or dotted outline with the new code. I’d need 4x the lines of code, or I’d have to employ code generators or some other form of voodoo (if this were C++, a “policy-based template something-or-other” would be employed). As it is I still have a few higher-order functions and closures in there I need to get rid of. But the performance is still great, so I have deferred those optimizations until later (alpha or beta).

Because the selection renderer is now implemented in code that is completely owned by me, all opportunities for optimization are available. This includes changing the underlying storage model that defines the selection’s polygons – I now use a List of Point arrays (List<System.Windows.Point[]>), which makes interoperating with GPC and WPF easier and faster. I can optimize my clipping and do work ahead of time to ensure that rendering is fast.

I can split work like vector/matrix multiplication across multiple threads. I can even prefetch work ahead of time using “spare” CPU cycles. For instance, whenever the selection changes (moves, rotates, or is edited using add/subtract/union/intersection), I have to recompute the scans of the selection. This is a list of rectangles that I use to fill the interior of the selection with a blue tint. Well, in between the notification of a changed selection, and actually painting it (other stuff happens in the middle), I compute these scans in a background thread. This is a future at work for you!

Also, I can make sure that these “scans” are computed and stored in sorted order with respect to their top Y coordinate. Then, when clipping the rendering of the highlighted selection interior, I can use a nicely fast binary search to figure out which “scan” to start rendering from. Later on I’ll put in logic so that the computation itself can be clipped (I never render the highlight outside of the image bounds, so why calculate that part?). Oh, and did I mention that along the way I found some code on the critical path for the Magic Wand that was using insertion sort? I didn’t? Well, it’s fixed. That was embarrassing.

Other opportunities for optimization include being smarter about which areas of the canvas are redrawn when things change. With GDI+, it was difficult to do the boolean path algebra correctly (because of bugs in GDI+) to find a minimal invalidation region, and so several heuristics were put into place. Now that I control all of the code for both rendering and computational geometry, I’ve been able to implement this better. This improves performance, and fixes some visual glitches whereby little bits and pieces of the selection outline remained when moving/rotating a selection (the heuristic to fix this was, “every 1 second, redraw everything”).

This has all also served to help reacquaint myself with the Paint.NET codebase in areas that haven’t really seen much change in at least 2 years. Therefore, I’m better prepared for more refactoring after v3.5’s release. I’m changing gears for my work on v3.5: I’m going to stop fixing/refactoring things, and move to bugfixing mode. I’ve done a lot of optimizations, and there are still many more possible ones, but I also need to release something so that you all can use it! More optimizations can be trickled out over v3.5.1 or v3.6 releases, etc.

Anyway, that’s all for now. I hope you all will like it. One of my private beta testers sure does:

“…the speed improvements in comparison to my memory of v3.36 are greatly improved on any Windows machine I throw it at. Really well done! I think that this alone will be enough to make people excited.”