Paint.NET’s performance is an interesting topic. There are some areas, such as startup, where Paint.NET is very fast. There are other areas, such as selection manipulation, where it is quite slow, even laughably so in non-pathological situations. If you’ve ever worked with the Magic Wand tool on an image of more than about 4 megapixels, then you probably understand what I’m talking about.
Just about the slowest thing you can do in Paint.NET v3.5.x is to use the Magic Wand on a large image and then try to move, rotate, or combine that selection. The geometry processing is slow, and the rendering takes just as long. I even ran into a test case where clicking the Magic Wand on a very high resolution image of the Milky Way took hours of processing time and a few gigabytes of memory. Ouch. (I actually killed the process after about 20 minutes, so “hours” is just an estimate.)
At this point I’m going to jump ahead to the results of what I’ve done. I took* a 6 MP image (3871 x 1595 pixels) of a 2011 BMW 5-series and loaded it up in v3.36, v3.5.10 and in 4.0. (For those who aren’t aware, 3.36 and 3.5.x are basically two separate generations of the Paint.NET codebase.) I then resized this image to be 5x larger, giving me a 136 MP image (19,355 x 7,975). I clicked with the Magic Wand tool in the same spot in each, waited for the selection to be created, and then did some simple things like scrolling, zooming, and drawing a gradient (which is clipped to the selection).
In 3.5.10, scrolling and zooming were very slow, limited by the rate at which the geometry of the selection could be continually rasterized. Drawing something like a gradient was able to run at about 1 frame every 2 seconds. The app just couldn’t deal with it. CPU usage was extremely high the whole time.
In 4.0, scrolling and zooming were perfectly fluid. Oftentimes the selection itself would lag behind what I was doing but it always caught up a moment later. It didn’t get in the way of me trying to find the area of the image that I wanted to do stuff with next. I could even draw a gradient, clipped to the selection, and it was responsive the whole time (the status bar and whatnot kept updating). Now, to be honest, the gradient actually takes longer to render than in 3.5.10, but it fills in gradually as each tile completes its work, and allows the user to keep doing other stuff while it’s in-progress. This is a much better situation to be in: I can and will optimize the gradient renderer to be faster at some point, but in the meantime I still have something I can ship and that people can get utility from.
If we go back to Paint.NET v3.36, the situation becomes almost comical. You see, 3.36 had an animated “dancing ants” selection outline, and it uses GDI+ to render this on the UI thread. As such, in this test case, it is constantly chewing up CPU while also providing an almost entirely non-responsive UI. You can’t even hover the mouse over the menus to get them to light up and indicate they can be clicked without waiting a second or two first. The 3.5 release improved this by removing the animation. 4.0 will give us the best of both worlds: we get the animation, and we also get trivial CPU usage. We also get all sorts of other improvements.
The way I do this in 4.0 is to render the selection on another thread. Actually, I render masks, which are 1-byte per pixel bitmaps that only store the alpha channel. Brushes are then used to fill in the color. I create 5 of these masks: 1 for the interior, and then 4 for the outline to take care of the animation. When I go to draw the selection back on the UI thread, I just use ID2D1RenderTarget::FillOpacityMask() three times. Once for the interior, once for the Nth mask with a black brush, and once more for the ((N+4)%4)th mask with a white brush. This increases memory usage and GPU fill rate requirements, but with dramatic improvements to responsiveness, CPU usage, and battery life.
Because I’m using Direct2D and do this mask filling with hardware acceleration, CPU usage for an animated selection almost non-existent. Click on the screenshot below for evidence.
In summary, I’ve traded 1.25 frame buffers of memory for a huge performance win. Manipulating selections is still just as slow as it ever was, and over time I plan to move that work off the UI thread or whatever it takes. Right now if you use the Move Selection tool on a complex selection, it does a bunch of matrix math to transform all the points before committing the changes to the Selection object. Well, clearly we can render that in a much easier way: take the cached bitmap and just apply the same transformation. etc. etc. One step at a time.
* By “take” I don’t mean that I was the photographer.