.NET Strings are immutable … except that they’re not

(Side note: expect a Paint.NET 4.0.6 update soon, hopefully timed with the availability of Windows 10. It’s not just a bugfix release, either. More info soon!)

If you have some code whose performance is limited by string building or concatenation, you may want to read this. For everyone else: close your eyes! Here be dragons! (just kidding, but seriously, be careful with this stuff)

Everybody knows that strings are immutable in .NET. When I first tried out C# (in 2003?) this annoyed me, as I was used to C and C++ and being able to do whatever I wanted. I quickly learned to love this property, and embraced immutability and functional programming style in general. The benefits far outweigh the initial clumsiness.

Everybody also knows that you can pin a string with unsafe and fixed, get a pointer to the first character, and read directly from it or even hand it off the native code. No copying required. Right? Right?!

And following from that, there’s a dirty little secret here that nobody really likes to talk about: .NET strings aren’t actually immutable. You can get a pointer to one, and you can write to that pointer because it’s allocated in the regular heap and not in some special read-only memory pages (e.g. via VirtualProtect). Strings in .NET don’t cache their hash code, therefore there’s nothing to cause an inconsistency if you do something like …

String s = “I’m immutable, right?”;
unsafe
{
    fixed (char* p = s)
    {
        *(p + 4) = ‘_‘;
        *(p + 5) = ‘_‘;        
    }
}

Console.WriteLine(s); // outputs “I’m __mutable, right?”;

Note that if you do this anywhere other than while initializing a string, be prepared for the local townsfolk to come knocking on your door. With torches and pitchforks. It’s very important to respect the immutability contract once you hand off a String to external code.

If we browse around with Reflector or over on github in the coreclr repo, we can see that .NET itself takes advantage of this in functions such as System.String.Concat()  and its helper function FillStringChecked():

public static String Concat(String str0, String str1, String str2) {
    // … parameter checking and other stuff omitted for brevity …
    int totalLength = str0.Length + str1.Length + str2.Length;
    String result = FastAllocateString(totalLength);
    FillStringChecked(result, 0, str0);
    FillStringChecked(result, str0.Length, str1);
    FillStringChecked(result, str0.Length + str1.Length, str2);
    return result;
}

unsafe private static void FillStringChecked(String dest, int destPos, String src)
{
    // … parameter checking omitted for brevity
    fixed(char *pDest = &dest.m_firstChar)
    fixed (char *pSrc = &src.m_firstChar) {
        wstrcpy(pDest + destPos, pSrc, src.Length);
    }
}

And wstrcpy just calls into some regular ol’ memcpy/memmove type code. Some of those code paths (via wstrcpy) are managed code, others eventually thunk into native, but none of it’s magic: you can do all of this yourself! You can’t call FastAllocateString()*, but you can still allocate a string of a specified length via new String(char c, int count). Just use a placeholder character for the first parameter, like a space or even null ‘’. Also note that all .NET strings are null-terminated, so if you create one of length N, you actually have N+1 characters. The memory layout intentionally matches BSTR which gives a lot of flexibility for passing things to native code without having to do any copying. (I don’t know what would happen if you overwrote the length prefix, but feel free to try it out and post a comment!)

With this information you can build your own versions of String.Concat or StringBuilder that can avoid making copies in specialized situations. However, I’m not actually sure how useful this is: I looked through the Paint.NET codebase and found only a handful of places that do lots of string manipulation, and even then they were mostly in error reporting code paths whose performance was entirely unimportant. I decided to leave the code alone, preferring safety over a little bit of performance (and fun and creativity). Your Mileage May Vary.

On a side note, the fact that you can do this at all is one of the reasons why I love .NET and C#. You get all the good stuff that a managed runtime and high-level language is supposed to provide, but you also get “escape hatches” to drill through the glass floors on the abstraction ladder. You can push the runtime and its safety guarantees out of the way in the name of performance (or for other reasons). Paint.NET takes advantage of this a lot: graphics code loves things like pointers and structs. Lately I’ve been doing a lot of work on Android** with Java, and the lack of things like pointers and structs … gosh, it just makes things so much more difficult to work with from a performance standpoint!

* Well, maybe you could with reflection Winking smile That would probably eliminate its “fast” property though.

** Sorry, not for Paint.NET.